# Bachelor of Science (B.Sc.) & Master of Science (M.Sc.):

# Available Thesis Topics

Both Bachelor and Master theses are extensive pieces of work that require full-time commitment over several months. Choosing a topic is thus also a deeply personal decision for each student. This page lists a few topics we are currently seeking to address in the research group. If you find one of them appealing, please contact the person listed for that project. *If you have an idea of your own (one that you are passionate about and that fits into the remit of the Chair for the Methods of Machine Learning), please feel invited to pitch it. To do so, first contact Philipp Hennig to make an appointment for a first meeting.*

## Phase-Aware Deep Learning Optimization

Deep learning setups are typically characterized by heavily overparametrized models that are trained using gradient information on large amounts of data, providing state-of-the-art performance in a broad variety of scenarios.

One major downside of this setup is that training can become very resource-intensive, since DL optimizers rely on heuristic rules and often must be tuned through inefficient trial-and-error searches. Although dozens of such heuristics have been proposed, none of them clearly provides stable and performant training across different scenarios [1].

In this project, we aim to break the problem down into smaller parts, observing that the optimization process may go through different phases during training, each with distinct characteristics and goals.

[1] Schmidt, Schneider and Hennig 2020, "Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers"

## Investigating how Neural Networks learn using Batch Normalization (B.Sc. & M.Sc. Thesis)

Supervisor: Andres Fernandez Rodriguez

Batch Normalization (BN) is a very popular technique in Deep Learning, since it improves convergence and performance. However, it becomes unstable when the batch size is small, which is a very common scenario [1]. For this reason, recent efforts aim to replace BN [2].

In this project, we observe that BN makes use of an exponential moving average to track data statistics, and we propose to analyze the behaviour of this tracker in order to investigate questions like the following:

* How well do the statistics tracked by BN estimate those of individual instances at initialization?

* How does the estimation improve as a function of training and layer?

* What are the properties of a "well-trained" BN layer? Do they depend on the model and task?
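
To make the tracker concrete, here is a minimal NumPy sketch of the exponential moving average that BN keeps per feature. It follows the update rule given in the PyTorch `BatchNorm` documentation, running = (1 - momentum) * running + momentum * batch statistic, with the default momentum of 0.1 and running statistics initialized to mean 0 and variance 1. The helper names (`update_running_stats`, `track_statistics`), the synthetic Gaussian pre-activations and the use of the unbiased batch variance are assumptions for illustration. This is not the group's working example.

```python
import numpy as np


def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Return EMA-updated (mean, var) for a batch of shape (batch_size, features)."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0, ddof=1)  # unbiased per-feature variance (an assumption here)
    new_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
    new_var = (1.0 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var


def track_statistics(batches, momentum=0.1):
    """Run the tracker over a sequence of batches, starting from mean 0 and variance 1."""
    n_features = batches[0].shape[1]
    mean, var = np.zeros(n_features), np.ones(n_features)
    history = []
    for batch in batches:
        mean, var = update_running_stats(mean, var, batch, momentum)
        history.append((mean.copy(), var.copy()))  # keep the trajectory for later analysis
    return mean, var, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical pre-activations: 200 batches of 64 samples, 3 features, mean 3, std 2.
    batches = [rng.normal(loc=3.0, scale=2.0, size=(64, 3)) for _ in range(200)]
    mean, var, history = track_statistics(batches)
    print("running mean:", mean, "running var:", var)
    # How well do the tracked statistics normalize one individual instance?
    x = batches[-1][0]
    print("normalized instance:", (x - mean) / np.sqrt(var + 1e-5))
```

Storing the whole `history` makes it possible to ask how quickly the estimate approaches the data statistics, for example layer by layer in a real network.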

As a starting point, we provide a working example in Python+PyTorch. References and contact:

[1] Wu and He 2018, "Group Normalization"

[2] Hoedt, Hochreiter and Klambauer 2022, "Normalization is dead, long live normalization!"

B.Sc. & M.Sc. thesis; could also be a student assistant project.

## Investigating how Neural Networks learn using Compressed Sensing (B.Sc. & M.Sc. Thesis)

Supervisor: Andres Fernandez Rodriguez

The Information Bottleneck framework for Deep Learning gained traction because it proposes a model- and task-agnostic method to quantify the information loss through a neural network. However, it has been challenged [1], as it relies on statistical assumptions and methods that may not hold for every data instance or model.

On the other hand, the Compressed Sensing (CoSe) paradigm [2] is a framework that, given an underdetermined linear system Ax = b with dim(x) ≫ dim(b), provides mathematical guarantees for recovering x from A and b in a stable and efficient way using convex optimization. For that, CoSe relies on the sparsity of x, as well as on the restricted isometry and incoherence properties of A.
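
The standard convex program behind such guarantees is basis pursuit: minimize ‖x‖₁ subject to Ax = b. The sketch below is our own illustration, not the group's CVXPY example. It solves this program with SciPy's `linprog` by splitting x = u - v with u, v ≥ 0, and recovers a 4-sparse vector of dimension 100 from 40 random Gaussian measurements. The function name `basis_pursuit` and the problem sizes are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, b):
    """Solve min ||x||_1 s.t. A x = b as a linear program with x = u - v, u, v >= 0."""
    n = A.shape[1]
    c = np.ones(2 * n)          # sum(u) + sum(v), which equals ||x||_1 at the optimum
    A_eq = np.hstack([A, -A])   # encodes A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, k = 100, 40, 4                        # dim(x) = 100 >> dim(b) = 40, 4 nonzeros
    A = rng.normal(size=(m, n)) / np.sqrt(m)    # random Gaussian sensing matrix
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
    x_hat = basis_pursuit(A, A @ x_true)
    print("max recovery error:", np.max(np.abs(x_hat - x_true)))
```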

In this project, we propose to treat a ReLU neural network as a cascade of such linear systems. We can then apply CoSe to recover the inputs from the activations in order to investigate questions like the following:

* Is a randomly initialized network lossy?

* How does training affect the information loss for each layer?

* How do the objective, the activation sparsity and the nature of the dataset affect recovery?

* Which data samples can be better recovered, before and after training?
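
For a single layer, one way to make the cascade view concrete is sketched below under our own assumptions: a sparse input x, known weights W and no bias. The active units of y = relu(Wx) give linear equations W_A x = y_A, while the inactive units only give inequalities W_I x ≤ 0. Since the true x satisfies these inequalities, adding them to the ℓ1 program can only shrink the feasible set. The helper `recover_relu_input` is hypothetical and intended only as a starting point for questions like the ones above.

```python
import numpy as np
from scipy.optimize import linprog


def recover_relu_input(W, y):
    """Estimate a sparse input x from y = relu(W @ x) by L1 minimization.

    Active units (y > 0) give equalities W_A x = y_A; inactive units give
    inequalities W_I x <= 0. With x = u - v and u, v >= 0, the objective
    sum(u) + sum(v) equals ||x||_1 at the optimum.
    """
    n = W.shape[1]
    active = y > 0
    if not active.any():
        return np.zeros(n)  # the zero vector already explains an all-zero output
    W_act, W_inact = W[active], W[~active]
    has_inactive = W_inact.shape[0] > 0
    res = linprog(
        np.ones(2 * n),
        A_ub=np.hstack([W_inact, -W_inact]) if has_inactive else None,
        b_ub=np.zeros(W_inact.shape[0]) if has_inactive else None,
        A_eq=np.hstack([W_act, -W_act]),
        b_eq=y[active],
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_units, k = 100, 80, 4
    W = rng.normal(size=(n_units, n_in)) / np.sqrt(n_units)  # hypothetical random layer
    x_true = np.zeros(n_in)
    x_true[rng.choice(n_in, size=k, replace=False)] = rng.normal(size=k)
    y = np.maximum(W @ x_true, 0.0)                           # ReLU activations
    x_hat = recover_relu_input(W, y)
    print("active units:", int((y > 0).sum()), "max error:", np.max(np.abs(x_hat - x_true)))
```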

As a starting point, we provide a working example in Python+PyTorch+CVXPY. References and contact:

[1] Saxe et al. 2018, "On the Information Bottleneck Theory of Deep Learning"

[2] Candès and Wakin 2008, "An Introduction To Compressive Sampling"

B.Sc. & M.Sc. thesis; could also be a student assistant project.

## Probabilistic Solutions to ODEs

Ordinary differential equations (ODEs) are central to mathematical models of physical phenomena. For example, the spread of a disease in a population can be predicted by approximating the solution of an ODE. Classical numerical analysis has developed a rich body of methods for this task.

By taking a probabilistic perspective, it is possible to derive an algorithm that returns a probability distribution describing the ODE solution. The variance of this posterior distribution not only reflects the numerical accuracy of the approximation but can also be propagated through subsequent computations, which has proven useful, for instance, in parameter inference problems involving ODEs.
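
Here is a minimal sketch of such an algorithm for a scalar ODE y' = f(y). It assumes a once-integrated Wiener process prior on (y, y') and the simple EK0 linearization. Each step is a Kalman filter prediction followed by an update on the information y' - f(y) = 0, so the solver returns a mean and a variance for y at every grid point. The function name `ek0_solve`, the fixed unit diffusion and the logistic test problem are illustrative assumptions. Practical solvers add uncertainty calibration, higher-order priors and adaptive step sizes.

```python
import numpy as np


def ek0_solve(f, y0, t0, t1, h):
    """Filtering-based probabilistic solver for scalar y' = f(y) on a fixed grid.

    Returns the time grid, posterior means of y and posterior variances of y.
    """
    A = np.array([[1.0, h], [0.0, 1.0]])                  # IWP(1) transition over one step
    Q = np.array([[h**3 / 3, h**2 / 2], [h**2 / 2, h]])   # IWP(1) process noise, diffusion = 1
    H = np.array([0.0, 1.0])                              # EK0: observe the derivative component only
    m = np.array([y0, f(y0)])                             # exact initial value and derivative
    P = np.zeros((2, 2))
    n_steps = int(round((t1 - t0) / h))
    ts, means, variances = [t0], [m[0]], [P[0, 0]]
    for i in range(n_steps):
        m_pred = A @ m                                    # predict
        P_pred = A @ P @ A.T + Q
        z = m_pred[1] - f(m_pred[0])                      # residual of y' - f(y) = 0
        S = H @ P_pred @ H                                # scalar innovation variance
        K = P_pred @ H / S                                # Kalman gain, shape (2,)
        m = m_pred - K * z                                # update on the observation 0
        P = P_pred - S * np.outer(K, K)
        ts.append(t0 + (i + 1) * h)
        means.append(m[0])
        variances.append(P[0, 0])
    return np.array(ts), np.array(means), np.array(variances)


if __name__ == "__main__":
    ts, mu, var = ek0_solve(lambda y: y * (1.0 - y), 0.1, 0.0, 2.0, 0.01)
    exact = 1.0 / (1.0 + 9.0 * np.exp(-ts))
    print("max error:", np.max(np.abs(mu - exact)), "final std:", np.sqrt(var[-1]))
```

For the logistic equation y' = y(1 - y) with y(0) = 0.1, the mean can be checked against the exact solution 1 / (1 + 9e^(-t)). The variances stay small but nonzero, reflecting the modeled discretization error.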

## Probabilistic numerical ODE solvers for score-based generative models (M.Sc. Thesis)

Supervisor: Jonathan Schmidt (jonathan.schmidt@uni-tuebingen.de)

Score-based diffusion models are recent, highly performant generative models with a variety of applications, such as image generation, audio synthesis and music generation. The core of the computation involves solving an ordinary differential equation (ODE) that maps a simple distribution to the target distribution (e.g. the distribution of images).

The student will investigate the use of probabilistic numerical ODE solvers (ProbODE solvers) for this task. ProbODE solvers explicitly model the discretization error that arises in any numerical ODE solution. One aim is to understand how ProbODE solvers interact with the score-based generative model, and whether they can improve sample generation.
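
To illustrate what an ODE mapping a simple distribution to the target distribution looks like, here is a toy sketch of the probability-flow ODE from [1], dx/dt = f(x, t) - ½ g(t)² ∇ₓ log p_t(x). It rests on strong simplifying assumptions: one-dimensional data distributed as N(0, s0²), a forward SDE dx = dW (so f = 0 and g = 1), and therefore an exact score -x / (s0² + t) in place of a learned network. The explicit Euler loop, run backwards in time, is where a ProbODE solver could be substituted. `probability_flow_sample` is a hypothetical helper.

```python
import math


def probability_flow_sample(x_T, s0, T, n_steps):
    """Integrate the toy probability-flow ODE from t = T back to t = 0 with explicit Euler."""
    h = T / n_steps
    x, t = x_T, T
    for _ in range(n_steps):
        score = -x / (s0**2 + t)   # exact score of p_t = N(0, s0^2 + t)
        dxdt = -0.5 * score        # drift f - g^2 / 2 * score with f = 0 and g = 1
        x = x - h * dxdt           # one Euler step from t to t - h
        t = t - h
    return x


if __name__ == "__main__":
    s0, T = 1.0, 10.0
    x_T = 2.0  # e.g. a draw from the simple distribution N(0, s0^2 + T)
    x_0 = probability_flow_sample(x_T, s0, T, n_steps=1000)
    exact = x_T * math.sqrt(s0**2 / (s0**2 + T))
    print("Euler:", x_0, "closed form:", exact)
```

This ODE has the closed-form solution x(0) = x(T) · sqrt(s0² / (s0² + T)), so it maps samples from N(0, s0² + T) to samples from N(0, s0²). That makes it a convenient check on the solver's accuracy.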

**References:**

[1] Song et al. 2021, "Score-Based Generative Modeling through Stochastic Differential Equations", ICLR.

[2] Song et al. 2021, "Maximum Likelihood Training of Score-Based Diffusion Models", NeurIPS.

[3] Dhariwal and Nichol 2021, "Diffusion Models Beat GANs on Image Synthesis", NeurIPS.

[4] Hennig et al. 2022, "Probabilistic Numerics: Computation as Machine Learning", Chapter VI: "Solving Ordinary Differential Equations".

## BackPACK

We want to extend the functionality of our backpropagation library BackPACK [1] presented at ICLR 2020.

It is a high-quality software library that uses automatic differentiation to compute additional and novel numerical quantities aimed at improving the training of deep neural networks. Applicants should be interested in automatic differentiation and be experienced in Python and PyTorch. Students will learn about the operations of deep neural networks and their automatic differentiation internals, as well as the quantities extracted by BackPACK. The projects thus offer an opportunity to gain expert knowledge of the algorithmic side of deep learning. They are challenging, as they require familiarity with the manipulation of tensors (indices!) and multivariate calculus (automatic differentiation). A significant amount of time will be spent on software engineering, as the work will be fully integrated into BackPACK and hopefully released in a future version. Results will be presented in the form of runtime benchmarks similar to the original work [1]. Students are encouraged to investigate further applications.
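
Individual (per-sample) gradients are one example of the kind of quantity BackPACK extracts. Take a linear layer with a squared-error loss summed over the batch. Each sample's gradient is the outer product of its backpropagated output gradient and its input, and the usual gradient is their sum. The NumPy sketch below computes both by hand. It is not BackPACK's API, and the function names are hypothetical.

```python
import numpy as np


def individual_gradients(W, X, T):
    """Per-sample gradients of 0.5 * ||W x_n - t_n||^2 w.r.t. W, shape (N, out, in)."""
    residuals = X @ W.T - T                       # (N, out): gradient of each loss w.r.t. the output
    return np.einsum("no,ni->noi", residuals, X)  # outer product residual_n x_n^T per sample


def batch_gradient(W, X, T):
    """Gradient of the summed loss, i.e. what a standard backward pass accumulates."""
    return (X @ W.T - T).T @ X                    # (out, in)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 3))  # layer weights: 3 inputs, 2 outputs
    X = rng.normal(size=(8, 3))  # a batch of 8 inputs
    T = rng.normal(size=(8, 2))  # regression targets
    g = individual_gradients(W, X, T)
    print("sum matches batch gradient:", np.allclose(g.sum(axis=0), batch_gradient(W, X, T)))
    print("per-entry gradient variance:\n", g.var(axis=0))
```

With the per-sample gradients available, quantities such as the gradient variance across the batch become cheap to compute. A standard backward pass does not expose this quantity because it only accumulates the sum.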

[1] Dangel, Kunstner and Hennig 2020, "BackPACK: Packing more into Backprop", ICLR.

**Currently no projects available. Please feel free to reach out with your own ideas.**

## Probabilistic Linear Solvers

Linear systems Ax = b are the bedrock of virtually all numerical computation. Machine learning poses specific challenges for the solution of such systems due to their scale, characteristic structure and stochasticity. Datasets are often so large that data subsampling approaches need to be employed, inducing noise on A. In fact, usually only noise-corrupted matrix-vector products are available. Typical examples are large-scale empirical risk minimization problems. Classic linear solvers such as conjugate gradients (CG) typically fail to solve such systems accurately, since they assume that matrix-vector products are exact up to machine precision.

Probabilistic linear solvers [1] aim to address these ML-specific challenges by treating the solution of the linear system itself as an inference task. This allows the incorporation of prior (generative) knowledge about the system, e.g. about its eigenspectrum, and enables the solution of noisy systems.
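
The sketch below shows what solving as inference can look like in a simplified, solution-based variant. A Gaussian prior over x is conditioned on projected observations Sᵀ A x = Sᵀ b along search directions S, with optional observation noise standing in for noisy matrix-vector products. This is our own illustrative assumption and differs from the matrix-based interpretation in [1]. `gaussian_linear_solve` is a hypothetical helper.

```python
import numpy as np


def gaussian_linear_solve(A, b, S, x0, Sigma0, noise=0.0):
    """Posterior N(mean, cov) over the solution of A x = b.

    Prior: x ~ N(x0, Sigma0). Observations: S.T @ b = S.T @ A @ x + eps with
    eps ~ N(0, noise * I), where the columns of S are search directions.
    """
    M = S.T @ A                                           # (k, n) observation operator
    gram = M @ Sigma0 @ M.T + noise * np.eye(S.shape[1])  # (k, k) covariance of the observations
    residual = S.T @ b - M @ x0
    gain = np.linalg.solve(gram, M @ Sigma0).T            # Sigma0 M^T gram^{-1} (both symmetric)
    mean = x0 + gain @ residual
    cov = Sigma0 - gain @ M @ Sigma0
    return mean, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.normal(size=(5, 5))
    A = B @ B.T + 5.0 * np.eye(5)                         # a symmetric positive definite system
    b = rng.normal(size=5)
    S = rng.normal(size=(5, 2))                           # only two search directions
    mean, cov = gaussian_linear_solve(A, b, S, np.zeros(5), np.eye(5), noise=0.1)
    print("error:", np.linalg.norm(mean - np.linalg.solve(A, b)), "trace(cov):", np.trace(cov))
```

With n linearly independent directions and no noise, the posterior mean equals A⁻¹b and the covariance collapses to zero. With fewer directions, the remaining covariance quantifies how much about x is still unknown.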

[1] Hennig 2015, "Probabilistic Interpretation of Linear Solvers", *SIAM Journal on Optimization*, 25, 234-260.

**Currently no projects available. Feel free to reach out to Jonathan Wenger with your own ideas.**

## Bayesian quadrature

Bayesian quadrature (BQ) treats numerical integration as an inference problem by constructing posterior measures over integrals given observations, i.e. evaluations of the integrand. Besides providing sound uncertainty estimates, the probabilistic approach permits the inclusion of prior knowledge about properties of the function to be integrated. It also enables active learning schemes for node selection, as well as transfer learning schemes, e.g. when multiple similar integrals have to be estimated jointly.
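
Here is a minimal sketch of the idea. It assumes a zero-mean Gaussian process prior on the integrand with an RBF kernel k(x, x') = exp(-(x - x')² / (2ℓ²)), and a standard normal integration measure. With these choices, both the kernel mean ∫ k(x, xᵢ) N(x; 0, 1) dx = ℓ / sqrt(ℓ² + 1) · exp(-xᵢ² / (2(ℓ² + 1))) and the prior variance of the integral, ℓ / sqrt(ℓ² + 2), are available in closed form, so the posterior over the integral is Gaussian. The lengthscale, jitter, nodes and the name `bq_gaussian` are illustrative assumptions. Fixed nodes are used here instead of active node selection.

```python
import numpy as np


def bq_gaussian(f, nodes, lengthscale=1.0, jitter=1e-10):
    """Posterior mean and variance of E[f(x)], x ~ N(0, 1), under a GP(0, RBF) prior on f."""
    l2 = lengthscale**2
    diff = nodes[:, None] - nodes[None, :]
    K = np.exp(-diff**2 / (2.0 * l2)) + jitter * np.eye(len(nodes))  # Gram matrix plus jitter
    z = lengthscale / np.sqrt(l2 + 1.0) * np.exp(-nodes**2 / (2.0 * (l2 + 1.0)))  # kernel mean
    prior_var = lengthscale / np.sqrt(l2 + 2.0)                        # double integral of k
    weights = np.linalg.solve(K, z)                                    # BQ weights K^{-1} z
    mean = weights @ f(nodes)
    var = prior_var - z @ weights
    return mean, var


if __name__ == "__main__":
    nodes = np.linspace(-4.0, 4.0, 15)
    mean, var = bq_gaussian(np.cos, nodes)
    print("BQ estimate:", mean, "+/-", np.sqrt(max(var, 0.0)), "exact:", np.exp(-0.5))
```

For f = cos, the exact value is E[cos(x)] = e^(-1/2) ≈ 0.607. The posterior mean should approach this value as nodes are added, while the posterior variance shrinks.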

**Currently no projects available. Feel free to reach out with your own ideas.**

## Approximate Bayesian Deep Learning

Bayesian inference is a principled way to enable deep networks to quantify their predictive uncertainty. The key idea is to put a prior over their weights and apply Bayes' rule given a dataset. The resulting posterior can then be marginalized to obtain a predictive distribution. However, exact Bayesian inference in deep networks is intractable, so one must resort to approximate Bayesian methods such as Laplace approximations, variational Bayes and Markov chain Monte Carlo. One focus is to design cost-effective and scalable, yet highly performant approximate Bayesian neural networks, both in terms of predictive accuracy and uncertainty quantification.
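
As a small, self-contained illustration of one of these methods, the sketch below applies a Laplace approximation to Bayesian logistic regression, which can be viewed as a network without hidden layers. It finds the MAP estimate with Newton's method and uses the inverse Hessian of the negative log posterior as the covariance. It then approximates the predictive distribution with the probit approximation σ(μ / sqrt(1 + πs²/8)). The Gaussian prior with precision 1, the function names and the synthetic data are assumptions. For deep networks, the Hessian would need further approximations, e.g. diagonal or Kronecker-factored.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def neg_log_posterior_hessian(X, w, prior_precision):
    """X^T diag(p (1 - p)) X + prior_precision * I for logistic regression."""
    p = sigmoid(X @ w)
    return X.T @ (X * (p * (1.0 - p))[:, None]) + prior_precision * np.eye(X.shape[1])


def laplace_logistic(X, y, prior_precision=1.0, n_iter=25):
    """Laplace approximation N(w_map, H^{-1}) with prior N(0, I / prior_precision)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):  # Newton's method on the strictly convex negative log posterior
        grad = X.T @ (sigmoid(X @ w) - y) + prior_precision * w
        w = w - np.linalg.solve(neg_log_posterior_hessian(X, w, prior_precision), grad)
    cov = np.linalg.inv(neg_log_posterior_hessian(X, w, prior_precision))
    return w, cov


def predictive(x, w_map, cov):
    """Probit approximation of p(y = 1 | x) under the Gaussian posterior."""
    mean, var = x @ w_map, x @ cov @ x
    return sigmoid(mean / np.sqrt(1.0 + np.pi * var / 8.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))  # synthetic inputs
    y = (rng.random(100) < sigmoid(X @ np.array([2.0, -1.0]))).astype(float)
    w_map, cov = laplace_logistic(X, y)
    x_new = np.array([3.0, 3.0])
    print("MAP plug-in:", sigmoid(x_new @ w_map), "Laplace predictive:", predictive(x_new, w_map, cov))
```

The probit approximation divides the logit by a factor of at least one. As a result, the predictive probability is always pulled towards 0.5 relative to the MAP plug-in, most strongly where the posterior is uncertain.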

**Currently no projects available. Feel free to reach out with your own ideas.**