# Bachelor of Science (B.Sc.) & Master of Science (M.Sc.):

# Available Thesis Topics

Both Bachelor and Master theses are extensive pieces of work that take full-time committment over several months. They are thus also deeply personal decisions by each student. This page lists a few topics we are currently seeking to address in the research group. If you find one of them appealing, please contact the person listed here for each project. *If you have an idea of your own — one that you are passionate about, and which fits into the remit of the Chair for the Methods of Machine Learning — please feel invited to pitch your idea. To do so, first contact Philipp Hennig to make an appointment for a first meeting.*

## BackPACK

We want to extend the functionality of our backpropagation library BackPACK [1] presented at ICLR 2020.

It is a high-quality software library that computes additional and novel numerical quantities with automatic differentiation that aim to improve the training of deep neural networks. Applicants should be interested in automatic differentiation and be experienced in PyTorch and Python. Students will learn about deep neural networks' operations and their autodifferentiation internals, as well as the quantities extracted by BackPACK. They thus offer an opportunity to gain expert knowledge in the algorithmic side of deep learning. Both are challenging projects, which require familiarity with the manipulation of tensors (indices!) and multivariate calculus (automatic differentiation). A significant amount of time will be spent on software engineering, as the works will be fully integrated into BackPACK and hopefully released in a future version. Results will be presented in forms of runtime benchmarks similar to the original work [1]. The students are encouraged to investigate further applications.

[1] F. Dangel, F. Kunstner & P. Hennig: BackPACK: Packing more into Backprop (2020)

## BackPACK for Recurrent Neural Networks (M.Sc. Thesis)

Supervisor: Felix Dangel

Backpropagation in recurrent neural networks (RNN) is more complicated than in convolutional neural networks, as a layer processes a temporal sequence of inputs. This introduces a cyclic structure in the computation graph. The student will formulate backpropagation through time for popular RNN architectures (Elman RNN, LSTM, ...) and, step by step, cover all quantities that can be computed with BackPACK.

## Speeding up BackPACK with custom PyTorch C++/CUDA operations (M.Sc. Thesis)

Supervisor: Felix Dangel

Our approach in BackPACK was to re-use existing PyTorch operations and work primarily in Python. This works, but we had to work around not having low-level control. Some operations for convolutions can be sped up by custom low-level code in C++ and CUDA.

## Probabilistic Linear Solvers

Linear systems A x=b are the bedrock of virtually all numerical computation. Machine learning poses specific challenges for the solution of such systems due to their scale, characteristic structure and their stochasticity. Datasets are often so large that data subsampling approaches need to be employed, inducing noise on A. In fact, usually only noise-corrupted matrix-vector products are available. Typical examples are large-scale empirical risk minimization problems. Classic linear solvers such as CG typically fail to solve such systems accurately since they rely on errors within machine precision.

Probabilistic linear solvers [1] aim to address these challenges raised by ML by treating the problem of solving the linear system itself as an inference task. This allows the incorporation of prior (generative) knowledge about the system, e.g. about its eigenspectrum and enables the solution of noisy systems.

[1] Hennig, P., Probabilistic Interpretation of Linear Solvers, *SIAM Journal on Optimization*, 2015, 25, 234-260

## Benchmarking Linear Solvers for Machine Learning (M.Sc. Thesis)

Supervisor: Jonathan Wenger

In order to compare linear solvers with respect to the specific linear systems which arise in machine learning, a benchmark suite of linear problems needs to be established. Such a list should include among others noisy systems (stochastic quadratic programs, empirical risk minimization), large scale systems (n > 100000), structured systems (sparsity, block diagonality) and systems with generative prior information (kernel Gram matrices). The student's task will be to establish such a benchmark based on interesting current research topics in ML and to evaluate existing linear solvers.

## [already taken] Efficient Computation of Matrix-variate Distributions (B.Sc. Thesis)

Supervisor: Jonathan Wenger

Matrix-variate distributions are a key component of matrix-based probabilistic linear solvers which perform inference on the system matrix or its inverse. In order to make use of the posterior distribution over the inverse as produced by the solver, efficient techniques for matrix-variate distributions need to be established. The student will derive and implement efficient sampling and evaluation of the probability density function of matrix-variate normal distributions with different covariance structures. The implementation will be in Python as part of the probnum package.

## Bayesian quadrature

Bayesian quadrature (BQ) treats numerical integration as an inference problem by constructing posterior measures over integrals given observations, i.e. evaluations of the integrand. Besides providing sound uncertainty estimates, the probabilistic approach permits the inclusion of prior knowledge about properties of the function to be integrated and leverages active learning schemes for node selection as well as transfer learning schemes, e.g. when multiple similar integrals have to be jointly estimated.

## [already taken] Bayesian quadrature on Riemannian manifolds (M.Sc. Thesis)

supervisors: Alexandra Gessner, Georgios Arvanitidis (MPI-IS)

Riemannian manifolds provide a principled way to model the nonlinear geometric structure assumed to be inherent in the data. The goal of the project is to use BQ to normalize probability distributions on these manifolds. In the iterative process of learning the distribution's parameters, the integral for the normalization has to be evaluated repeatedly. The setup is ideal for BQ: The integrand has known structure that can be encoded in the prior and function evaluations are expensive since they require computing geodesics (i.e. solving differential equations). The iterative nature of the integral updates further suggest a transfer learning approach to recycle information collected in previous iterations.

## Probabilistic Solutions to ODEs

Ordinary differential equations (ODEs) are central to mathematical models of physical phenomena. For example, the spread of a disease in a population can be predicted by approximating the solution of an ODE. Classical numerical analysis has developed a rich body of methods regarding the solution of this task.

By taking a probabilistic perspective, it is possible to derive an algorithm that returns a probability distribution describing the ODE solution. The variance of this posterior distribution is not only informed about numerical accuracy of the approximation but can be leveraged inside a chain of computation which has been useful, for instance in parameter inference problems involving ODEs.

## [already taken] Benchmarking Probabilistic ODE Solvers (M.Sc./B.Sc. Thesis)

**Supervisor:** Nicholas Krämer

In this project we want to compare different lines of thoughts regarding probabilistic ODE solvers. There are mainly two approaches: (i) computing samples of a stochastically perturbed "classical" method [1] and (ii) applying Bayesian filtering and smoothing to the ODE [2]. The students tasks will be to (i) implement sampling based methods in Python as part of the probnum package (filtering based methods already exist) and (ii) develop a set of benchmark problems on which computational speed and differences in posterior uncertainty estimates are evaluated.

[1] Patrick R Conrad, Mark Girolami, Simo Särkkä, Andrew Stuart and Konstantinos Zygalakis. Statistical analysis of differential equations: introducing probability measures on numerical solutions. Statistics and Computing, 2017

[2] Filip Tronarp, Hans Kersting, Simo Särkkä and Philipp Hennig. Probabilistic solutions to ordinary differential equations: a new perspective. Statistics and Computing, 2019.

## Probabilistic Computation of Geodesics (M.Sc. Thesis)

**Supervisor:** Nicholas Krämer

Straight lines on manifolds, so-called geodesics, are of high importance in computational geometry, and consequently in modern data analysis. They can usually not be computed analytically, instead one numerically solves so-called boundary value problems: second order ordinary differential equations with an initial condition and a terminal condition. In this project investigate the potential of using probabilistic ODE solvers on this task and see how much one gains in terms of (i) computational speed and (ii) quantification of (numerical) uncertainty.

The students task will initially be to reproduce previous studies on this topic [1] and focus on how much the most recent efforts in numerical solution of boundary value problems [2, 3] are able to contribute to an improvement. If there is enough time, we investigate the effect of these insights on challenging tasks in computational geometry and machine learning.

[1] Georgios Arvanitidis, Soren Hauberg, Philipp Hennig and Michael Schober. Fast and robust shortest paths on manifolds learned from data. AISTATS 2019.

[2] David John, Vincent Heuveline and Michael Schober. GOODE: a Gaussian off-the-shelf ordinary differential equation solver. ICML 2019.

[3] Filip Tronarp, Simo Särkkä and Philipp Hennig. Bayesian ODE solvers: The Maximum A Posteriori Estimate. arXiv:2004.00623.

## Approximate Bayesian Deep Learning

Bayesian inference is a principled way to enable deep networks to quantify their predictive uncertainty. The key idea is to put a prior over their weights and apply Bayes' rule given a dataset. The resulting posterior can then be marginalized to obtain predictive distribution. However, exact Bayesian inference on deep networks is intractable and thus one must resort to approximate Bayesian methods, such as Laplace approximations, variational Bayes, and Markov-Chain Monte-Carlo. One of the focus is to design cost-effective, scalable yet highly performant approximate Bayesian neural networks, both in terms of predictive accuracy and uncertainty quantification.

## Uncertainty aware top-k for image classification on large datasets (M.Sc. Thesis)

Supervisor: Marius Hobbhahn

On large datasets for image classification such as ImageNet [1], people often use a top-1 and top-5 ranking (i.e. the best and the best 5 predictions) to measure their success. An uncertainty-aware top-k ranking in contrast would not keep the number of predictions static but rather additionally use their uncertainty. If the prediction of an image is very certain you would use only the first estimate. If the prediction is very uncertain you might use 10 individual estimates.

We have already introduced an uncertainty-aware top-k ranking using the Laplace Bridge in a recent paper [2] where it seemed to yield interesting insights and improved results. However, there are multiple other ways to generate uncertainty for Neural Networks and different ways to implement them.

This thesis would be a rigorous exploration of different approaches to the uncertainty-aware top-k metric. A basic understanding of image classification and Neural Networks is recommended. A willingness to work with large datasets and large networks is required.

[1] ImageNet Large Scale Visual Recognition Challenge; https://arxiv.org/abs/1409.0575

[2] Fast Predictive Uncertainty for Classification with Bayesian Deep Networks; https://arxiv.org/abs/2003.01227

## [already taken] Last-layer Bayesian Inference---Laplace, Variational, or MCMC? (M.Sc. Thesis or M.Sc. Project )

Supervisor: Agustinus Kristiadi

Last-layer approximate Bayesian inference methods have been shown to be competitive yet cost-efficient for quantifying the predictive uncertainty of neural networks [1]. [2], [3], and [4] have independently used last-layer Laplace approximation, mean-field variational Bayes (VB), and stochastic-gradient MCMC (SG-MCMC) for different applications. However, a systematic and comprehensive comparison between all of them is still lacking.

The student shall implement and train various last-layer Bayesian methods (last-layer Laplace, mean-field VB, and SG-MCMC, and perhaps also their variants) and analyze them in terms of their costs, implementation convenience, and predictive uncertainty quality.

The student is required to be (i) familiar with PyTorch and deep learning in general and (ii) have passed the Probabilistic Machine Learning course.

[1] Ober, Sebastian W., and Carl Edward Rasmussen. "Benchmarking the Neural Linear Model for Regression." arXiv preprint arXiv:1912.08416 (2019).

[2] Kristiadi, Agustinus, Matthias Hein, and Philipp Hennig. "Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks." ICML (2020).

[3] Riquelme, Carlos, George Tucker, and Jasper Snoek. "Deep bayesian bandits showdown: An empirical Comparison of Bayesian Deep Networks for Thompson Sampling." ICLR (2018).

[4] Brosse, Nicolas, et al. "On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation." arXiv preprint arXiv:2001.08049 (2020).

## Last-layer Variational Bayes (M.Sc. Thesis or M.Sc. Project)

Supervisor: Agustinus Kristiadi

One of the most popular approximate Bayesian methods for neural networks (NNs) is the variational Bayesian inference, also known as variational Bayes (VB). It works by assuming a parametric form of the NNs' posterior (usually Gaussian) and subsequently optimizing the evidence-lower-bound (ELBO). However, even when the approximate posterior takes the simplest form, i.e. diagonal Gaussian, implementing VB to state-of-the-art networks requires a large amount of effort, not to mention that the ELBO optimization is non-trivial and requires a lot of tuning [1]. Last-layer VB (LLVB, also called the ``neural linear model'') [2] is a competitive alternative that is easier to implement and optimize. Furthermore, being a simple last-layer method, LLVB can still be cost-effective even if more granular covariance approximations in the posterior, such as a Kronecker-factored [3] and full covariance, were employed. Nevertheless, the effect of the granularity of the last-layer covariance approximation to the predictive uncertainty quantification, along with their respective cost, have not been extensively studied before.

The student will implement and train various LLVBs with different covariance assumptions (diagonal, Kronecker-factored, low-rank, full, etc.) on large networks and analyze them in terms of their costs and predictive uncertainty quality.

The student is required to be (i) amiliar with PyTorch and deep learning in general and (ii) have passed the Probabilistic Machine Learning course.

[1] Osawa, Kazuki, et al. "Practical Deep Learning with Bayesian Principles." NeurIPS (2019).

[2] Riquelme, Carlos, George Tucker, and Jasper Snoek. "An empirical Comparison of Bayesian Deep Networks for Thompson Sampling." ICLR (2018).

[3] Louizos, Christos, and Max Welling. "Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors." ICML (2016).