Bachelor of Science (B.Sc.) & Master of Science (M.Sc.):

Available Thesis Topics

Both Bachelor's and Master's theses are extensive pieces of work that require a full-time commitment over several months. They are thus also deeply personal decisions by each student. This page lists a few topics we are currently seeking to address in the research group. If you find one of them appealing, please contact the person listed for that project. If you have an idea of your own (one that you are passionate about, and which fits into the remit of the Chair for the Methods of Machine Learning), please feel invited to pitch your idea. To do so, first contact Philipp Hennig to make an appointment for a first meeting.

Curvature-based step sizes for SGD (B.Sc. Project)

Supervisor: Lukas Tatzel

Finding appropriate learning rates remains an open problem for deep learning optimizers like SGD and Adam. Their updates have two components: a direction and a step size. The direction is the negative of a (stochastic) estimate of the loss function's gradient. The step size is a hyperparameter that typically requires tuning. In this project, we investigate the use of curvature information to adaptively set the step size used by SGD. We do so based on a (stochastic) quadratic approximation of the loss landscape: under this approximation, the optimal step size for a given direction can be computed cheaply in closed form. We measure the performance of these "optimal" step sizes and investigate an approach where the direction and the step size are computed from two different mini-batches, which eliminates a certain bias.
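To give a flavor of the idea: along a direction d, the quadratic model m(eta) = L + eta g^T d + 0.5 eta^2 d^T H d is minimized by eta* = -(g^T d) / (d^T H d), which requires only one Hessian-vector product. Below is a minimal PyTorch sketch of this computation; all names are chosen here for illustration and are not the project's prescribed implementation.

    import torch

    def curvature_optimal_step(loss, params, direction):
        # `direction` is a list of (detached) tensors matching `params`.
        # Keep the graph alive so we can differentiate the gradient again.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        g = torch.cat([gr.reshape(-1) for gr in grads])
        d = torch.cat([di.reshape(-1) for di in direction])
        # Hessian-vector product H d (Pearlmutter's trick): differentiate g^T d.
        Hd = torch.autograd.grad(g @ d, params)
        Hd = torch.cat([h.reshape(-1) for h in Hd])
        # Minimizer of the quadratic model along d, in closed form.
        return -(g @ d) / (d @ Hd)

In the two-mini-batch variant mentioned above, `loss` would be evaluated on one batch for the direction and on a second batch for the curvature term.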

Prerequisites:

  • (basic) experience in PyTorch
  • (basic) knowledge of deep learning

Leveraging Physical Structures in Probabilistic Linear Solvers (B.Sc. or M.Sc. Project)

Supervisor: Tim Weiland

Partial Differential Equations (PDEs) are a powerful tool to express mechanistic knowledge about the real world.
Gaussian processes (GPs) can be used to solve PDEs numerically by conditioning on linear operator observations involving the PDE.
The textbook method of computing a GP posterior involves a Cholesky factorization, which naively scales cubically in the number of data points and quickly becomes prohibitively expensive for real-world applications.
An alternative is to solve the associated linear system iteratively using a Probabilistic Linear Solver.
In each iteration, such solvers update their belief over the solution of the linear system using a projection of the data called an action.
Prior research suggests that exploiting the structure of PDE systems in particular when choosing these actions can save computation and memory, or yield better uncertainty estimates.
The goal of this project is to design action policies specifically for physics simulations.
For you as a student, it's a great opportunity to dive into the exciting intersection of probabilistic inference, linear algebra, and physics.
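To make the mechanics concrete, here is a heavily simplified sketch of such a solver for a symmetric system A x = b, written in plain NumPy. The policy and all names are illustrative; actual probabilistic linear solvers are considerably richer than this.

    import numpy as np

    def probabilistic_linear_solve(A, b, n_iters, policy):
        # Gaussian belief over the solution x of A x = b (A assumed symmetric).
        n = b.shape[0]
        mean = np.zeros(n)  # prior mean over the solution
        cov = np.eye(n)     # prior covariance over the solution
        for _ in range(n_iters):
            s = policy(A, b, mean)  # action: direction in which to probe the system
            As = A @ s
            residual = b - A @ mean
            # Condition the belief on the scalar observation s^T A x = s^T b.
            gain = cov @ As / (As @ cov @ As)
            mean = mean + gain * (s @ residual)
            cov = cov - np.outer(gain, cov @ As)
        return mean, cov

    # Example policy: probing along the current residual recovers a CG-like method.
    residual_policy = lambda A, b, mean: b - A @ mean

The heart of this project is precisely the `policy`: which actions to play when A comes from a discretized PDE, so that the solver spends its iterations and memory where the physics demands it.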


Prerequisites:

  • Intuition for and appreciation of linear algebra
  • (basic) knowledge of GP regression

AlgoPerf Submissions (M.Sc. Project)

Supervisor: Frank Schneider

At the moment, training a contemporary deep neural network is a time-consuming and resource-intensive process, largely due to the many crucial decisions practitioners must make regarding the underlying training algorithms (e.g., should I use SGD, Adam, or Shampoo? What learning rate (schedule)? Weight decay? etc.).
AlgoPerf, a benchmark (competition) measuring training speed, provides a platform to meaningfully compare various algorithmic choices and thus guide practitioners.
This project aims to prepare submissions for the AlgoPerf benchmark by either implementing and tweaking existing methods or by innovatively combining them.
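For illustration, these are the kinds of algorithmic choices a submission pins down, shown here as a plain PyTorch sketch (AlgoPerf defines its own submission API; see the MLCommons algorithmic-efficiency repository for the actual interface):

    import torch

    def configure_training(model, total_steps):
        # One concrete bundle of algorithmic choices:
        # optimizer, regularization strength, and learning-rate schedule.
        optimizer = torch.optim.AdamW(
            model.parameters(), lr=1e-3, weight_decay=1e-2)
        schedule = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=total_steps)
        return optimizer, schedule

A submission commits to such choices up front and is then scored by how quickly it trains a range of workloads to their target performance.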

Prerequisites:

  • (basic) experience in PyTorch or JAX
  • (basic) knowledge of deep learning

Efficient training of Fourier neural operators (B.Sc. or M.Sc. Project)

Supervisor: Emilia Magnani

There is growing interest in using data-driven deep learning models to solve PDEs, with significant applications in weather forecasting and climate modeling.
Once trained, such models have been shown to be faster than traditional numerical methods for solving PDEs.
However, they can be expensive to train.
Fourier neural operators (FNOs) are an especially promising instance, because they learn complex mappings between input and output function spaces while leveraging the advantages of Fourier representations.
Typically, training an FNO requires many (functional) observations in time, each on a dense grid; the model then extrapolates in time from these observations.
This process can be expensive, and sometimes infeasible, due to the cost of collecting the training data.
This project investigates how to choose the inputs for FNOs in a smart way, with the goal of reducing the amount of training data required, i.e., achieving the same performance at a lower computational cost.
This may involve studying the form of the FNO's Laplace tangent kernel to examine the learning process, or considering linear observations of the data.
The project also poses the question of how to deal with functional data on sparser grids.
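For orientation, the core building block of an FNO is a spectral convolution: transform the input to Fourier space, linearly mix a truncated set of low-frequency modes, and transform back. Here is a minimal 1D PyTorch sketch, with names and shapes chosen for illustration:

    import torch

    class SpectralConv1d(torch.nn.Module):
        # Sketch of an FNO spectral convolution layer in one spatial dimension.
        def __init__(self, channels, n_modes):
            super().__init__()
            self.n_modes = n_modes
            self.weights = torch.nn.Parameter(
                torch.randn(channels, channels, n_modes, dtype=torch.cfloat)
                / channels)

        def forward(self, x):  # x: (batch, channels, grid)
            x_ft = torch.fft.rfft(x)  # to Fourier space
            out_ft = torch.zeros_like(x_ft)
            # Mix channels on the lowest n_modes frequencies only.
            out_ft[..., :self.n_modes] = torch.einsum(
                "bim,iom->bom", x_ft[..., :self.n_modes], self.weights)
            return torch.fft.irfft(out_ft, n=x.size(-1))  # back to the grid

Because the learned weights live on Fourier modes rather than on grid points, the same trained layer can in principle be evaluated on grids of different resolutions, which is what makes the question of sparser training grids interesting.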

Prerequisites:

  • (basic) experience in PyTorch
  • (basic) knowledge of deep neural networks (DNNs)

A JAX Library for Practical Hessian Approximations (B.Sc. Project, starting earliest 1 January 2025)

Supervisor: Joanna Sliwa

The Hessian of a model stores the second-order partial derivatives of the loss function with respect to the model's weights. Computing and storing it exactly is infeasible for all but the smallest models. This project aims to create a JAX library for efficiently approximating Hessian matrices. The library will implement key approximation techniques such as the Generalized Gauss-Newton (GGN), Kronecker-Factored Approximate Curvature (KFAC), and the diagonal of the Fisher information matrix. The primary goals include computational and memory efficiency combined with a user-friendly design. By providing practical and accessible implementations within the JAX framework, this project seeks to offer a valuable tool for researchers and practitioners in various applications, including continual learning and model training.
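As a flavor of the matrix-free style such a library builds on (a sketch of standard JAX idioms, not the library's eventual API), Hessian- and GGN-vector products can be written without ever materializing the underlying matrices:

    import jax

    def hvp(loss_fn, params, v):
        # Hessian-vector product via forward-over-reverse differentiation.
        return jax.jvp(jax.grad(loss_fn), (params,), (v,))[1]

    def ggn_vp(model_fn, loss_fn, params, x, y, v):
        # Generalized Gauss-Newton product G v = J^T H_out J v, matrix-free.
        outs, Jv = jax.jvp(lambda p: model_fn(p, x), (params,), (v,))  # J v
        HJv = hvp(lambda o: loss_fn(o, y), outs, Jv)                   # H_out (J v)
        _, vjp_fn = jax.vjp(lambda p: model_fn(p, x), params)
        return vjp_fn(HJv)[0]                                          # J^T H_out J v

Kronecker-factored and diagonal approximations then compress the information that such products expose into structured, cheaply storable forms.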

Prerequisites:

  • experience in JAX
  • knowledge of deep learning
  • knowledge of linear algebra