Computation is Inference

Probabilistic Numerics: Computation under Uncertainty

Machine learning, in contrast to classic, rule-based AI, describes learning as a statistical inference problem. Solving this problem on a concrete machine requires the solution of numerical problems: Optimization (to find the best explanation for the data), Integration (to measure evidence), and Simulation (to predict the future). Applied mathematics has developed powerful algorithmic tools for this purpose.

The core tenet of our group's work is the insight that these numerical methods are themselves elementary learning machines. Computing a non-analytic numerical result means using the results of tractable (observable) computations — data — to infer an unknown number. And since numerical methods actively decide which computations to perform to solve their task efficiently, they really are autonomous agents. It must therefore also be possible to describe their behaviour in the mathematical terms of learning machines.

When this is done in the probabilistic framework, we call the resulting algorithms probabilistic numerical methods. These algorithms take as their input a generative model describing a numerical task, use computing resources to collect data, perform Bayesian inference to refine the generative model, and return a posterior distribution over the target quantity. This distribution can then be studied in analogy to how point estimates are studied in classic numerical analysis: the posterior should concentrate around the correct value (its mean should be close to the true value, and its standard deviation should reflect the distance between the two), and this concentration should be "fast" in a suitable sense. The theoretical work in the group focuses on developing such methods and on using them to provide novel, much-needed functionality for contemporary machine learning and beyond.
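
As a toy illustration of this loop (not one of the group's methods; all names and numbers below are invented for the example), consider estimating a sum that is too long to evaluate term by term: the algorithm decides which terms to compute, treats those evaluations as data, and returns a Gaussian posterior over the sum rather than a single number.

```python
import numpy as np

def estimate_sum(terms, budget, rng):
    """Toy probabilistic-numerics loop: act (choose computations), observe
    (evaluate a few terms), infer (return a posterior over the unknown sum)."""
    n_total = len(terms)
    # Act: decide which computations to perform -- here, a uniform random subsample.
    idx = rng.choice(n_total, size=budget, replace=False)
    observed = np.array([terms[i] for i in idx])
    # Infer: model the entries as exchangeable, i.i.d. Gaussian draws; under a flat
    # prior, the posterior over their common mean is (approximately) Gaussian, which
    # induces a Gaussian posterior over the full sum. (For brevity this ignores that
    # the evaluated entries are known exactly.)
    mean = observed.mean()
    std_of_mean = observed.std(ddof=1) / np.sqrt(budget)
    return n_total * mean, n_total * std_of_mean  # posterior mean and std of the sum

terms = [np.sin(0.01 * i) for i in range(10_000)]  # pretend each entry is expensive
post_mean, post_std = estimate_sum(terms, budget=200, rng=np.random.default_rng(0))
print(post_mean, post_std, sum(terms))  # the posterior should cover the true sum
```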


Here are some examples of past theoretical results developed in the group. Recent work can be found on the publications page.


Integration

Integration is the foundational operation of probabilistic machine learning. It is required to compute conditional and marginal distributions — to measure how many possible explanations are left for a set of observations, and how good these explanations are. We have contributed to the development of Bayesian quadrature methods, a conceptually clean formalism for active integration. These methods show increasingly strong empirical performance relative to their key competitors, Markov chain Monte Carlo methods. Improving integration is a hard challenge, but because this operation plays such a fundamental role in so many domains, improvements to the state of the art have wide-ranging implications.
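
To make the formalism concrete, here is a minimal Bayesian quadrature sketch (illustrative, not the group's implementation): a Gaussian-process prior with an RBF kernel is conditioned on a handful of integrand evaluations, which yields a closed-form Gaussian posterior over the integral of the integrand against a standard normal measure. Function names, the kernel, the measure, and all parameter values are choices made for this example.

```python
import numpy as np

def bayesian_quadrature(f, nodes, lengthscale=1.0, jitter=1e-6):
    """Posterior over Z = E_{x ~ N(0,1)}[f(x)] under a GP prior with RBF kernel.
    For this kernel/measure pair the kernel mean embeddings are available in
    closed form, so the posterior over the integral is Gaussian."""
    x = np.asarray(nodes, dtype=float)
    l2 = lengthscale ** 2
    # Gram matrix of the evaluation nodes (with a small jitter for stability).
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * l2)) + jitter * np.eye(len(x))
    # Kernel means z_i = integral of k(x, x_i) N(x; 0, 1) dx.
    z = np.sqrt(l2 / (l2 + 1)) * np.exp(-x ** 2 / (2 * (l2 + 1)))
    # Prior variance of the integral: double integral of k against the measure.
    prior_var = np.sqrt(l2 / (l2 + 2))
    fx = np.array([f(xi) for xi in x])
    weights = np.linalg.solve(K, z)      # Bayesian quadrature weights
    post_mean = weights @ fx             # posterior mean of the integral
    post_var = prior_var - z @ weights   # posterior variance of the integral
    return post_mean, post_var

# E[x**2] under N(0, 1) is exactly 1; the posterior mean should land nearby,
# with the posterior variance quantifying the remaining uncertainty.
mean, var = bayesian_quadrature(lambda t: t ** 2, np.linspace(-4, 4, 20))
print(mean, var)
```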


Linear Algebra

Methods for linear algebra — solving systems of linear equations, and finding structure (decompositions) within them that simplifies subsequent computations — are the bedrock of virtually all scientific computation. Although classic linear algebra methods are extremely efficient, they lack key functionality urgently required by contemporary big-data machine learning — in particular robustness to strong stochastic noise. We have provided key insights into the probabilistic interpretation of linear algebra methods, and developed custom linear solvers tailored to the kinds of problems encountered in machine learning, in particular least-squares problems involving kernel Gram matrices.
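
The following is a minimal sketch of what a probabilistic interpretation of a linear solver can look like (an illustrative construction, not one of the solvers mentioned above): a Gaussian belief over the solution of A x = b is conditioned, one projection at a time, on noise-free observations s^T A x = s^T b along search directions s, here chosen as the current residual. All names and the direction policy are choices made for this sketch.

```python
import numpy as np

def probabilistic_linear_solver(A, b, num_iters):
    """Gaussian belief over the solution of A x = b, updated by conditioning on
    projections of the system along search directions (here: the residual)."""
    n = len(b)
    mean = np.zeros(n)                   # prior mean over the solution
    cov = np.eye(n)                      # prior covariance over the solution
    for _ in range(num_iters):
        s = b - A @ mean                 # search direction: current residual
        if np.linalg.norm(s) < 1e-12:    # already solved to numerical precision
            break
        v = A.T @ s                      # observed functional: y = v^T x = s^T b
        denom = v @ cov @ v
        if denom < 1e-14:                # direction carries no new information
            break
        gain = cov @ v / denom
        mean = mean + gain * (s @ b - v @ mean)   # Gaussian conditioning (mean)
        cov = cov - np.outer(gain, cov @ v)       # Gaussian conditioning (covariance)
    return mean, cov                     # posterior belief over the solution

# Example: a small random symmetric positive-definite system.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
b = rng.standard_normal(5)
x_mean, x_cov = probabilistic_linear_solver(A, b, num_iters=5)
print(np.linalg.norm(A @ x_mean - b))    # residual of the posterior mean
```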


Optimization

Optimization — finding the minimum of a high-dimensional surface — is the core computational task of statistical learning, including most of deep learning. Stochasticity caused by batched big-data processing has shaken up this domain, which was once considered essentially solved. A bewildering array of optimization methods now exists, many of which leave the tuning of crucial hyper-parameters as an arduous task for the user. Our work has contributed insights into the causes of these new complications, pointed out misconceptions, and identified new computable quantities that are crucial for the tuning of hyper-parameters. We have also released high-quality software packages that make these quantities accessible, as well as benchmarks for the quantitative, undogmatic comparison of optimization methods.
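
One example of such a computable quantity is the scatter of individual gradients inside a mini-batch, which indicates how noisy the stochastic gradient is and hence how carefully hyper-parameters such as the learning rate or batch size must be set. The sketch below computes it for a plain least-squares model; the names and the model are illustrative, and this is not the group's software.

```python
import numpy as np

def minibatch_gradient_statistics(w, X, y):
    """Mini-batch gradient of a least-squares loss together with an estimate of
    the variance of that gradient estimate, obtained from per-example gradients."""
    residual = X @ w - y                        # shape (batch_size,)
    per_example_grads = residual[:, None] * X   # gradient of 0.5 * (x_i @ w - y_i)**2
    grad = per_example_grads.mean(axis=0)                       # mini-batch gradient
    grad_var = per_example_grads.var(axis=0, ddof=1) / len(y)   # variance of that mean
    return grad, grad_var

# Example: noisy linear data, statistics evaluated at w = 0.
rng = np.random.default_rng(1)
X = rng.standard_normal((32, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(32)
grad, grad_var = minibatch_gradient_statistics(np.zeros(3), X, y)
print(grad, np.sqrt(grad_var))   # gradient estimate and its per-coordinate noise level
```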


Differential Equations

Differential equations describe the behaviour of dynamical systems. In machine learning they show up as the continuous limit of certain deep architectures, as well as in model predictive control (a subset of reinforcement learning), where the learner has to “predict the future”; they also play a central role throughout the quantitative sciences, and thus in many scientific applications of machine learning. Our work has helped extend the classic formalism for the solution of differential equations (in particular, ordinary ones), adding probabilistic notions of uncertainty across the entire process. This unlocks new modelling paradigms, makes solvers for differential equations more robust to computational imprecision, provides more structured notions of output uncertainty, and can propagate uncertainty through the computation. These advantages also come to bear in the important problem of inferring the parameters of dynamical systems from finite observations.
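
As an illustration of the filtering view behind such solvers, here is a minimal probabilistic solver for a scalar ODE x'(t) = f(x(t)). It is a sketch under simplifying assumptions — a once-integrated Wiener-process prior, a fixed step size, an EK0-style update, and no calibration of the prior scale — and all names are chosen for this example.

```python
import numpy as np

def probabilistic_ode_solver(f, x0, t0, t1, num_steps=100, sigma=1.0):
    """Filtering-based probabilistic solver for the scalar ODE x' = f(x):
    a once-integrated Wiener-process prior on the solution, Kalman prediction
    between grid points, and an update that conditions the derivative component
    of the state on the vector field evaluated at the predicted mean."""
    h = (t1 - t0) / num_steps
    A = np.array([[1.0, h], [0.0, 1.0]])                    # prior transition
    Q = sigma ** 2 * np.array([[h ** 3 / 3, h ** 2 / 2],
                               [h ** 2 / 2, h]])            # process noise
    H = np.array([0.0, 1.0])                                # picks out the derivative

    m = np.array([x0, f(x0)])                               # state: [x(t), x'(t)]
    P = np.zeros((2, 2))                                    # exact initial condition
    means, stds = [m[0]], [0.0]
    for _ in range(num_steps):
        # Predict one step ahead under the prior.
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Update: the derivative component should equal f at the predicted value.
        innovation = f(m_pred[0]) - H @ m_pred
        S = H @ P_pred @ H + 1e-12
        K = P_pred @ H / S
        m = m_pred + K * innovation
        P = P_pred - np.outer(K, H @ P_pred)
        means.append(m[0])
        stds.append(np.sqrt(max(P[0, 0], 0.0)))
    return np.array(means), np.array(stds)

# Example: x' = -x with x(0) = 1; the posterior mean should track exp(-t).
means, stds = probabilistic_ode_solver(lambda x: -x, x0=1.0, t0=0.0, t1=2.0)
print(means[-1], np.exp(-2.0), stds[-1])
```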