Mission of the Machine Learning ⇌ Science Colaboratory

 
Scientific machine learning for data-driven discovery

Scientific discovery lacks, by definition, a ground truth. We don't know whether the problem is solvable or how well we can do. Benchmark datasets rarely exist. Data are often missing, and decidedly not at random, because of sensor failures and collection bias. A substantial body of prior knowledge must be taken into account: conservation laws, dynamical equations, integrity constraints. Prediction is seldom enough: causal understanding is the ultimate goal, and uncertainty evaluation and interpretability are prerequisites. Data acquisition is not mediated by analytics of web behaviour but by expensive, often unique experiments, and data modalities are often mixed, sometimes exotic.

We see substantial potential for new developments at the interface between ML and topical research. Besides the abundant algorithmic challenges in scaling, robustness, interpretability and the expression of inductive biases, there are opportunities at the edges of the ML pipeline, i.e. in the steps that are most actionable for domain scientists: problem definition, data collection, feature development, quality evaluation, and the formulation of new hypotheses and interventions to address causality.

A field is emerging, with its own specific challenges, methods, and standards. Some call this cross-disciplinary endeavour scientific machine learning.

 
A colaboratory seeds a community of practice

At the ml ⇌ science colab we seek to develop a community of practice that tackles scientific problems with machine learning methods. To that end:

  • We work with domain specialists across the natural sciences, social sciences and humanities to advance this field by gathering experience with currently open problems and fresh data.
  • We collaborate continuously with the ML research community in Tübingen to address the tough, specific challenges of this setting.
  • We train postgraduate scientists to understand data problems and use the latest algorithms and inference software.
  • We set out to find what works where, but also what fails and why. We make a point of communicating both, for the benefit of the community, in traditional and interactive formats.
  • We develop inference software that we consider of general interest to researchers across disciplines, and that we want to see outlive the career progression of its original author(s).

We are part of a larger worldwide effort to make machine learning a useful tool for a broader audience, and a more reliable one at that. Steps in that process include developing engineering practices for machine learning [1] and largely automating some of the craft that goes into it [2].

[1] Michael Jordan writes on ML as a human-centric engineering discipline
[2] Rich Caruana identifies Research opportunities in AutoML