Bachelor of Science (B.Sc.) & Master of Science (M.Sc.):
Available Thesis Topics
Both Bachelor and Master theses are extensive pieces of work that take full-time commitment over several months. They are thus also deeply personal decisions by each student. This page lists a few topics we are currently seeking to address in the research group. If you find one of them appealing, please contact the person listed here for each project. If you have an idea of your own — one that you are passionate about, and which fits into the remit of the Chair for the Methods of Machine Learning — please feel invited to pitch your idea. To do so, first contact Philipp Hennig to make an appointment for a first meeting.
Deep learning setups are typically characterized by heavily overparametrized models that are trained using gradient information on large amounts of data, providing state-of-the-art performance in a broad variety of scenarios.
One major downside of this setup is that training can become very resource-intensive, since DL optimizers rely on heuristic rules and must often be tuned through inefficient trial-and-error searches. Despite dozens of such heuristics having been proposed, no clear option provides stable and performant training across different scenarios.
In this project, we aim to break the problem down into smaller parts, observing that the optimization process may pass through different phases during training, each with distinct characteristics and goals.
 Schmidt, Schneider and Hennig 2020, "Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers"
Supervisor: Andres Fernandez Rodriguez
The Information Bottleneck framework for deep learning gained traction because it proposes a model- and task-agnostic method to quantify the information loss through a neural network. Still, it has been challenged (Saxe et al. 2018), as it relies on statistical assumptions and methods that may not hold for every data instance or model.
On the other hand, the Compressed Sensing (CoSe) paradigm (Candès and Wakin 2008) is a framework that, given an underdetermined linear system Ax=b with dim(x) >> dim(b), provides mathematical guarantees for recovering x from A and b in a stable and efficient way using convex optimization. For that, CoSe relies on the sparsity of x as well as on the isometry and incoherence of A.
In this project, we propose to treat a ReLU neural network as a cascade of such linear systems. We can then apply CoSe to recover the inputs from the activations in order to investigate questions like the following:
* Is a randomly initialized network lossy?
* How does training affect the information loss for each layer?
* How do the objective, the activation sparsity, and the nature of the dataset affect recovery?
* Which data samples can be better recovered, before and after training?
As a starting point, we provide a working example in Python+PyTorch+CVXPY. References and contact:
 Saxe et al. 2018, "On the Information Bottleneck Theory of Deep Learning"
 Candès and Wakin, 2008, "An Introduction To Compressive Sampling"
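The sparse-recovery step at the heart of the project can be prototyped in a few lines. The sketch below is a simplified stand-in for the provided Python+PyTorch+CVXPY example: it uses only numpy and replaces the CVXPY solver with ISTA, a basic iterative soft-thresholding scheme for the L1-regularized least-squares problem. All dimensions and parameter values are illustrative choices.

```python
import numpy as np

# Sparse recovery sketch: solve min_x 0.5*||A x - b||^2 + lam*||x||_1
# with ISTA (iterative soft-thresholding). In the project, A would be a
# layer's weight matrix and b its outputs; here A is a random Gaussian
# matrix for illustration.
rng = np.random.default_rng(0)
n, m, k = 200, 60, 5                      # dim(x) >> dim(b); x is k-sparse
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
b = A @ x_true

def ista(A, b, lam=1e-2, n_iter=10_000):
    L = np.linalg.norm(A, 2) ** 2         # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - b) / L     # gradient step on the quadratic
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

x_hat = ista(A, b)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Because x is sparse and A is a Gaussian matrix (which satisfies the isometry and incoherence conditions with high probability), the recovery succeeds even though the system is heavily underdetermined.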
B.Sc. & M.Sc. Thesis, could also be a student assistant project
Ordinary differential equations (ODEs) are central to mathematical models of physical phenomena. For example, the spread of a disease in a population can be predicted by approximating the solution of an ODE. Classical numerical analysis has developed a rich body of methods regarding the solution of this task.
By taking a probabilistic perspective, it is possible to derive an algorithm that returns a probability distribution describing the ODE solution. The variance of this posterior distribution is not only informative about the numerical accuracy of the approximation, but can also be leveraged inside a larger chain of computation, which has proven useful, for instance, in parameter-inference problems involving ODEs.
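As a concrete instance of the disease-spread example above, the classical SIR model is a small ODE system that a standard (non-probabilistic) solver handles in a few lines. This is a numpy/scipy sketch with arbitrary illustrative parameter values; a probabilistic solver would return a posterior distribution over the trajectory instead of a point estimate.

```python
import numpy as np
from scipy.integrate import solve_ivp

# SIR model: susceptible/infected/recovered fractions of a population.
# beta is the infection rate, gamma the recovery rate (arbitrary values).
def sir(t, y, beta=0.3, gamma=0.1):
    S, I, R = y
    dS = -beta * S * I
    dI = beta * S * I - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

# Start with 1% of the population infected and integrate for 160 days.
sol = solve_ivp(sir, (0.0, 160.0), [0.99, 0.01, 0.0], rtol=1e-8, atol=1e-10)
S_end, I_end, R_end = sol.y[:, -1]
```

Note that the population fractions sum to one along the whole trajectory; this conserved quantity is a handy sanity check on the numerical solution.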
Supervisor: Jonathan Schmidt
Score-based diffusion models are recent and highly performant generative models with a variety of applications, such as image generation, audio synthesis, and music generation. The core of the computation involves the solution of an ordinary differential equation (ODE) that maps a simple distribution to the target distribution (e.g. the image distribution). The student will investigate the use of probabilistic numerical ODE solvers (ProbODE solvers) for this task. Probabilistic numerical ODE solvers explicitly model the discretization error that arises in any numerical ODE solution. One of the aims is to understand how ProbODE solvers interact with the score-based generative model, and whether sample generation can be improved.
Song et al. 2021, "Score-Based Generative Modeling through Stochastic Differential Equations", ICLR.
Song et al. 2021, "Maximum Likelihood Training of Score-Based Diffusion Models", NeurIPS.
Dhariwal and Nichol 2021, "Diffusion Models Beat GANs on Image Synthesis", NeurIPS.
Hennig et al. 2022, "Probabilistic Numerics: Computation as Machine Learning", Chapter VI: "Solving Ordinary Differential Equations".
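For intuition, the ODE in question (the probability flow ODE of Song et al. 2021) can be written down in closed form when the data distribution is a one-dimensional Gaussian, because the score is then analytic. The toy sketch below (numpy only; the distribution parameters, noise schedule, and step counts are all illustrative choices) integrates the flow backwards from noise and recovers the data distribution:

```python
import numpy as np

# Toy probability-flow ODE for a variance-exploding diffusion whose "data"
# distribution is a 1-D Gaussian N(mu, s^2). Choosing sigma(t)^2 = t, the
# marginal at time t is N(mu, s^2 + t), its score is -(x - mu)/(s^2 + t),
# and the probability flow ODE reads
#   dx/dt = -(1/2) * d(sigma^2)/dt * score(x, t) = (x - mu) / (2*(s^2 + t)).
rng = np.random.default_rng(0)
mu, s, T, n_steps = 2.0, 1.0, 9.0, 1000

# Draw samples from the time-T marginal and integrate the ODE backwards to
# t = 0 with explicit Euler (a ProbODE solver would additionally model the
# discretization error of exactly this step).
x = rng.normal(mu, np.sqrt(s**2 + T), size=50_000)
dt = T / n_steps
for k in range(n_steps):
    t = T - k * dt
    x = x - dt * (x - mu) / (2.0 * (s**2 + t))

# The samples should now be distributed (approximately) as N(mu, s^2).
x_mean, x_std = x.mean(), x.std()
```

In a real diffusion model the analytic score is replaced by a trained score network, and the discretization of this ODE is precisely where a ProbODE solver could quantify, and possibly exploit, the numerical error.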
Neural Ordinary Differential Equations (Neural ODEs) combine ideas from dynamical systems and parameter inference with deep learning methods. By putting a neural network inside the ODE problem, Neural ODEs are able to model complex data in continuous time, and their discretizations provide an interesting interpretation of some classic deep learning architectures, such as residual neural networks.
Chen et al. 2018, "Neural Ordinary Differential Equations".
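The connection to residual networks is easy to make concrete: one explicit-Euler step of dx/dt = f(x; θ) is exactly a residual block x + h·f(x; θ). A minimal numpy sketch, with a toy untrained vector field (the weights and dimensions are illustrative):

```python
import numpy as np

# One explicit-Euler step of dx/dt = f(x; theta) is exactly a residual
# block x <- x + h * f(x; theta). Toy vector field with small random
# weights (illustrative only, no training).
rng = np.random.default_rng(0)
d = 4
W1 = 0.1 * rng.standard_normal((d, d))
W2 = 0.1 * rng.standard_normal((d, d))

def f(x):
    return W2 @ np.tanh(W1 @ x)      # a tiny neural vector field

def euler_flow(x, n_steps, T=1.0):
    h = T / n_steps
    for _ in range(n_steps):
        x = x + h * f(x)             # == one residual block with step h
    return x

x0 = rng.standard_normal(d)
x_coarse = euler_flow(x0, n_steps=10)    # reads like a 10-block ResNet
x_fine = euler_flow(x0, n_steps=1000)    # closer to the exact ODE flow
```

The coarse and fine discretizations approximate the same continuous flow, which is the sense in which a deep residual network can be read as a discretized ODE.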
Supervisor: Nathanael Bosch
Neural Ordinary Differential Equations (Neural ODEs) are typically trained by computing derivatives through a numerical ODE solver. Since this is not quite as trivial as it sounds, there has been a lot of work on different methods to do this in a memory-efficient, numerically stable, and robust manner. But there are also methods for parameter inference in dynamical systems that do not rely on numerical solvers, such as gradient matching. This could lead to more stable training (in particular when the data oscillates), and overall to an easier loss objective and thus potentially to faster training. In this thesis, we want to explore the utility of gradient matching for neural ODEs.
Goals of the thesis project:
- Learn about neural ODEs (in particular about the many different established ways to train them), gradient matching, and related topics like Neural CDEs and multiple shooting. _Understanding the literature is an important part of this project._
- Develop and implement a gradient matching-based Neural ODE algorithm.
- Benchmark against the established alternatives and investigate advantages and limitations.
Chen et al. 2018, "Neural Ordinary Differential Equations".
Wenk et al. 2020, "ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems".
Kidger et al. 2020, "Neural Controlled Differential Equations for Irregular Time Series".
Turan et al. 2021, "Multiple shooting for training neural differential equations on time series".
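The core recipe of gradient matching can be sketched in a toy setting: first estimate state derivatives directly from the observed trajectory, then fit the vector field to those estimates, with no ODE solver in the training loop. In the numpy sketch below the "vector field" is dx/dt = -a·x, linear in its single parameter a, so the fit is closed-form least squares; a neural ODE would instead minimize the same matching loss by gradient descent. All values are illustrative.

```python
import numpy as np

# Gradient matching, toy version:
#   (1) estimate dx/dt from the observed trajectory,
#   (2) regress the vector field onto those derivative estimates.
rng = np.random.default_rng(0)
a_true = 0.5
t = np.linspace(0.0, 5.0, 200)
x = np.exp(-a_true * t) + 0.001 * rng.standard_normal(t.size)  # noisy data

dx_dt = np.gradient(x, t)                    # (1) finite-difference estimates
a_hat = -np.sum(dx_dt * x) / np.sum(x * x)   # (2) least-squares fit of a
```

Because no solver is differentiated through, each training step is cheap and cannot diverge; the price is that the derivative estimates are noisy, which is exactly the trade-off the thesis would investigate.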