Bachelor Theses at the Chair of Cognitive Systems (Prof. Dr. Andreas Zell)

Students who want to write a bachelor thesis should have attended at least one lecture by Prof. Zell and passed it with a good, or at least satisfactory, grade. Alternatively, they may have obtained the relevant background knowledge for the thesis from other, similar lectures.

Open Topics

Generating synthetic data from GTAV for person detection

Mentor: Yitong Quan

Email: yitong.quan at

This thesis is related to the SafeAI project, which aims to provide additional security to a system, e.g., by using sensor data of the environment to detect nearby persons. A robust object detection system should ideally be able to handle special cases such as persons being partially occluded by other objects. For example, in a real farming scenario, persons inside the field are very likely occluded by the plants.

However, generating a data set from real scenarios to train such a detector requires vast effort for data collection and labeling.

This thesis aims to shortcut the path to building such a data set by collecting data from the game GTAV with the plugin DeepGTAV, where the labeling (bounding boxes and masks) comes almost for free. To this end, the student is expected to complete the following tasks:

1. Set up reasonable environment variables, e.g., poses and trajectories of the camera and the persons in the files, as well as the time of day, season, and weather conditions.

2. Collect data (RGB images, depth images, and person labels) from the game engine.

3. Perform a statistical analysis of the collected data set.

4. Train publicly available models on our synthetic data set and compare their performance to that of models trained on other public real-world data sets.
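As an illustration of the statistical-analysis step, the sketch below computes simple statistics over a toy list of person annotations. The annotation schema here is purely hypothetical and would depend on how the DeepGTAV exports are post-processed.

```python
from collections import Counter

def bbox_stats(annotations):
    """Summarize a list of person annotations.

    Each annotation is assumed (hypothetically) to look like
    {"bbox": [x, y, w, h], "occluded": bool}; the exact schema
    depends on how the DeepGTAV exports are post-processed.
    """
    areas = [a["bbox"][2] * a["bbox"][3] for a in annotations]
    occ = Counter(a["occluded"] for a in annotations)
    return {
        "num_boxes": len(annotations),
        "mean_area": sum(areas) / len(areas) if areas else 0.0,
        "occluded_fraction": occ[True] / len(annotations) if annotations else 0.0,
    }

# Toy example with three synthetic boxes:
anns = [
    {"bbox": [10, 20, 30, 60], "occluded": False},
    {"bbox": [5, 5, 20, 40], "occluded": True},
    {"bbox": [0, 0, 10, 20], "occluded": True},
]
stats = bbox_stats(anns)
```

Statistics like occlusion fraction and box-size distribution are also exactly what one would compare against the real data sets in the final task.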

The requirements are knowledge in Python and PyTorch, as well as experience in training neural networks.



Event-camera, camera and robot arm calibration

Mentor: Andreas Ziegler

Email: andreas.zieglerspam

The cognitive systems group at the University of Tübingen uses a table tennis robot system to conduct research on various topics around robotics, control, computer vision, machine learning and reinforcement learning. So far the system uses up to five cameras for the perception pipeline. Recently, the group added event-based cameras to the sensor suite.

Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of μs), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision.
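To give a concrete sense of the output described above, the following minimal sketch accumulates an event stream into a signed 2D frame. The (t, x, y, polarity) row layout is an assumption for illustration; real event camera drivers each use their own format.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate a stream of events into a signed 2D frame.

    `events` is an (N, 4) array of (t, x, y, polarity) rows with
    polarity in {-1, +1}. Each event adds its sign at its pixel,
    so opposite-polarity events at the same location cancel out.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for _, x, y, p in events:
        frame[int(y), int(x)] += int(p)
    return frame

# Three events; the first two hit the same pixel with opposite polarity:
ev = np.array([
    [0.001, 2, 3, +1],
    [0.002, 2, 3, -1],
    [0.003, 5, 1, +1],
])
frame = events_to_frame(ev, height=8, width=8)
```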

With these new sensors in place, the question arises how the whole system should be calibrated. Eye-to-hand calibration (calibration between the camera and the robot arm) and camera calibration are well-studied topics in the literature. However, since event-based cameras are still relatively new, there is only very little literature on event-based camera calibration, especially for eye-to-hand calibration.

In a first step, the student should study the state-of-the-art calibration methods relevant for our table tennis robot system. Based on this, the goal of the thesis is to develop new methods for a calibration toolbox that allows the system to be calibrated automatically.
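For orientation, eye-to-hand calibration is classically posed as solving A·X = X·B for the unknown camera-to-gripper transform X. The numpy sketch below (all transforms are made-up toy values) constructs consistent gripper and camera motions from a known X and checks the residual; a real toolbox would instead estimate X from measured poses, e.g. with OpenCV's `cv2.calibrateHandEye`.

```python
import numpy as np

def rot_z(theta):
    """Homogeneous 4x4 rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def translate(x, y, z):
    """Homogeneous 4x4 translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

# Hypothetical ground-truth hand-eye transform X (camera w.r.t. gripper).
X = rot_z(0.3) @ translate(0.1, -0.05, 0.2)

# A relative gripper motion A; the corresponding camera motion
# must satisfy B = X^-1 A X, which is exactly the A X = X B relation.
A = rot_z(0.7) @ translate(0.2, 0.0, 0.0)
B = np.linalg.inv(X) @ A @ X

residual = np.linalg.norm(A @ X - X @ B)
```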

Requirements: Familiarity with "traditional" computer vision, C++ and/or Python

Drawing People with a Robot Arm

Mentor: Mario Laux

Email: mario.lauxspam

Description: In his recently completed bachelor thesis, Adrian Müller developed a system that takes the image of a person, performs line detection operations on the image, converts the binarized image to vector line segments, and finally draws the line segments with a robot arm. While this works well for features with high contrast, features with low contrast, like the nose in frontal images, are not well detected. The aim of this thesis is to train a deep neural network to recognize typical features of a human face in an image and to convert them into a line drawing sketch, improving the existing system. The sketch should then be drawn on a whiteboard using a Franka Emika Panda robot arm.
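The raster-to-vector step in the existing pipeline can be illustrated with a deliberately simplified sketch: it extracts horizontal runs of foreground pixels from a binarized image as line segments. A real vectorizer traces curves in arbitrary directions; this row-wise version only shows the principle.

```python
import numpy as np

def rows_to_segments(binary):
    """Convert a binarized image into horizontal line segments.

    Returns a list of ((x0, y), (x1, y)) pixel pairs, one per
    contiguous run of foreground pixels in each row.
    """
    segments = []
    for y, row in enumerate(binary):
        x, n = 0, len(row)
        while x < n:
            if row[x]:
                start = x
                while x < n and row[x]:
                    x += 1
                segments.append(((start, y), (x - 1, y)))
            else:
                x += 1
    return segments

img = np.array([
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
], dtype=bool)
segs = rows_to_segments(img)
```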

Requirements: C++, DNN, ROS

Implementation of Quantization Algorithms for Model Compression

Mentor: Rafia Rahim


Description: Deep-neural-network-based algorithms have brought huge accuracy improvements to stereo vision. However, they result in large models with long inference times. Our goal here is to implement algorithms for quantization and training of deep stereo vision models for model compression. To this end, one part will involve implementing algorithms for quantizing and compressing existing state-of-the-art deep stereo algorithms during training. The second part will focus on how to exploit model quantization and compression at inference time.
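As a starting point, the core operation behind quantization-aware training can be sketched as a uniform symmetric quantize-dequantize step. This minimal numpy illustration is not tied to any particular stereo model; in actual training, gradients would additionally flow through the rounding via a straight-through estimator.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Simulated uniform symmetric quantization (quantize-dequantize).

    Weights are rounded to a low-bit integer grid and mapped back to
    floats, so the rest of the network sees quantization error during
    training while staying in floating point.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale, scale

w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0])
w_q, scale = fake_quantize(w, num_bits=8)
```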

Requirements: good programming skills, deep learning knowledge.

Knowledge Distillation for Training a Lean Student Stereo Network

Mentor: Rafia Rahim


Description: Knowledge distillation is a way of transferring model capabilities from a deep, computationally expensive network to a lean, compact, and computationally efficient student network. The goal here is to explore knowledge distillation methods for training a lean student stereo network by distilling knowledge from a state-of-the-art 3D teacher network. To this end, one will experiment with different knowledge distillation methods for training student networks.
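The central ingredient of classical distillation is a KL-divergence loss between temperature-softened teacher and student distributions (Hinton-style, scaled by T²). A minimal numpy sketch with toy logits; in practice this term is mixed with the ordinary supervised loss.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)   # teacher "soft targets"
    q = softmax(student_logits, T)
    return T * T * np.sum(p * (np.log(p) - np.log(q)))

# Identical logits give zero loss; disagreeing logits give a larger loss.
loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1])
```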

Requirements: good programming skills, deep learning knowledge.

Multi-sensor object detection on UAVs using RGB and thermal footage

Mentor: Benjamin Kiefer


While RGB cameras have high resolution, lighting conditions severely affect object detection performance from a UAV's point of view. Thermal cameras, on the other hand, are very robust to different lighting conditions but have low resolution.

In this thesis, you should explore how the two sensors can be leveraged simultaneously to improve the performance of an object detector onboard a UAV in a maritime search and rescue scenario. To this end, you are given video footage of the same scenes shot simultaneously with an RGB and a thermal camera. By adapting simple out-of-the-box object detectors, the goal is to explore whether using an additional (thermal) channel helps detection performance. This should be evaluated in a variety of experiments.
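One simple way to use both sensors is "early fusion": stack the thermal image onto the RGB image as a fourth input channel for a detector whose first convolution is widened to four channels. The numpy sketch below assumes the two images are already registered and that the resolutions differ by integer factors (both assumptions; real footage needs alignment first).

```python
import numpy as np

def early_fusion(rgb, thermal):
    """Stack a lower-resolution thermal image onto an RGB image as a
    fourth channel. The thermal image is upsampled by nearest-neighbor
    replication; inputs are assumed to be registered.
    """
    h, w, _ = rgb.shape
    th, tw = thermal.shape
    assert h % th == 0 and w % tw == 0, "expects integer scale factors"
    up = np.repeat(np.repeat(thermal, h // th, axis=0), w // tw, axis=1)
    return np.concatenate([rgb, up[..., None]], axis=2)

rgb = np.zeros((4, 4, 3))
thermal = np.array([[0.2, 0.8],
                    [0.5, 0.1]])
fused = early_fusion(rgb, thermal)
```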

Requirements: Knowledge in Deep learning and Computer Vision, Python (PyTorch)

How can zooming in improve object detection on UAVs?

Mentor: Benjamin Kiefer


Object detection on UAVs is hard because objects of interest are often very small due to high flying altitudes. As a result, objects are barely visible in the footage: the ground sample distance is too large to capture them in detail.
In this thesis, you should explore the use of data coming from a camera with zoom-in functionality. Given data samples to experiment with, you should find ways to include the zoomed-in data to enhance object detection performance.
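One recurring bookkeeping step when working with zoomed-in data is mapping detections from the zoomed crop back to full-frame coordinates, so results from both views can be compared or merged. A minimal sketch (the parameter names and crop model are illustrative only):

```python
def crop_to_frame(box, crop_origin, zoom):
    """Map a detection box from a zoomed-in crop back to full-frame
    pixel coordinates.

    `box` is (x0, y0, x1, y1) in crop pixels; the crop covers the
    full-frame region starting at `crop_origin` with magnification
    `zoom`, so crop coordinates shrink by 1/zoom in the full frame.
    """
    ox, oy = crop_origin
    x0, y0, x1, y1 = box
    return (ox + x0 / zoom, oy + y0 / zoom,
            ox + x1 / zoom, oy + y1 / zoom)

# A box detected in a 2x zoomed crop whose top-left corner maps to
# full-frame pixel (100, 50):
full_box = crop_to_frame((40, 20, 80, 60), crop_origin=(100, 50), zoom=2.0)
```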

Requirements: Knowledge in Deep learning and Computer Vision, Python (PyTorch)

On the Necessity of Anchors in Object Detection

Mentor: Martin Meßmer

Email: martin.messmerspam

For a long time, anchor-based object detection has been the ne plus ultra in the research community. Many well-known one-shot detectors, like YOLOv2 and v3, SSD, and most recently EfficientDet, employ anchors with great success. Since FCOS (Zhi Tian et al., 2019), however, doubts have formed about the necessity of anchors in object detection. In this thesis, the student should examine both approaches theoretically and practically and compare them.
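To make the contrast concrete: the sketch below generates the dense set of anchor boxes that an anchor-based detector places on every feature-map cell; anchor-free detectors like FCOS instead regress box extents directly from each cell's single center point. The sizes and ratios here are illustrative defaults, not taken from any particular detector.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride, sizes=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (x0, y0, x1, y1) centered on every cell of a
    feature map, one per (size, aspect-ratio) pair. All ratios of a given
    size share the same area (size**2)."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in sizes:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

# 2x2 feature map with stride 16 -> 2*2 cells * 2 sizes * 3 ratios anchors.
a = make_anchors(feat_h=2, feat_w=2, stride=16)
```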

Requirements: basic deep learning knowledge, Python, good English or German

Racket tracking and 3D position estimation from a stereo event camera

Mentor: Thomas Gossard


Description: Professional table tennis players use the stroke movement to predict the ball's return trajectory and spin before it is even hit. Using a Convolutional Pose Machine to get the body pose gives a general idea of the stroke but is not enough. Indeed, there are different possible grips (Penhold, Shakehand, and Seemiller) and the wrist angles are hard to estimate. These factors have a huge impact on the returned ball. Thus, estimating the 3D position of the racket is very useful. Frame-based camera estimation has already been implemented [1]. However, due to the high speed of the movement, the racket appears quite blurry. To compensate for that, event cameras [2] can be used for sharper edge detection.

The objective of the thesis is to first implement a stereo event-camera simulator using ESIM [3] to generate data with a moving 3D racket model. The racket should then be tracked [4] and its 3D position estimated (using either machine learning or classic computer vision [4]). The developed algorithm will finally be tested on real data generated by our table tennis setup.
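The classic computer-vision route to the 3D position is stereo triangulation. The numpy sketch below implements the standard linear (DLT) triangulation of one point from two views; the camera intrinsics and baseline are made-up toy values, not those of our setup.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two pixel
    observations x1, x2 under 3x4 projection matrices P1, P2.
    Each observation contributes two rows to a homogeneous system
    A X = 0, solved via SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two hypothetical rectified cameras with a 0.2 m baseline along x:
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.1, -0.05, 2.0])
h1 = P1 @ np.append(X_true, 1.0)
x1 = h1[:2] / h1[2]
h2 = P2 @ np.append(X_true, 1.0)
x2 = h2[:2] / h2[2]
X_est = triangulate(P1, P2, x1, x2)
```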

Requirements: Python, Pytorch, Computer Vision (triangulation,...), ROS experience is useful in order to use rviz


Exploiting dependencies of output variables in neural networks for multi-output regression

Mentor: Valentin Bolz

E-Mail: valentin.bolz ( a t )

Description: Neural networks are powerful tools for performing regressions with multiple output variables. In this case, the neural network is trained on a data set with multiple output variables, each of which has a continuous state space. In this thesis, we aim to find different methods to incorporate the information about the dependencies of the output variables into the training process of the neural network. This could include loss function modifications and their influence on the gradient calculation or weight sharing between dependent variables.
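As one concrete example of a loss-function modification, a known dependency between outputs can be added as a penalty term. The sketch below assumes a purely hypothetical linear dependency y2 = 2·y1 between two outputs; predictions that violate it are penalized even when their per-output errors are similar.

```python
import numpy as np

def dependent_mse(pred, target, weight=0.1):
    """MSE plus a penalty encoding a known (hypothetical) dependency
    y2 = 2 * y1 between the two output variables. The penalty pulls
    predictions toward the dependency manifold."""
    mse = np.mean((pred - target) ** 2)
    dependency = np.mean((pred[:, 1] - 2.0 * pred[:, 0]) ** 2)
    return mse + weight * dependency

target = np.array([[1.0, 2.0], [0.5, 1.0]])       # satisfies y2 = 2*y1
consistent = np.array([[1.1, 2.2], [0.4, 0.8]])   # on the manifold
inconsistent = np.array([[1.1, 1.0], [0.4, 2.0]]) # off the manifold
loss_c = dependent_mse(consistent, target)
loss_i = dependent_mse(inconsistent, target)
```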

Requirements: good mathematical foundation, programming skills (Python), basic knowledge in neural networks

Dealing with different output scales in multi-output regression

Mentor: Valentin Bolz

E-Mail: valentin.bolz ( a t )

Description: In multi-output regression, multiple non-independent variables, each with a continuous state space, are approximated. A fairly simple method is linear regression, while more advanced methods can be realized with neural networks. While it is well known that many regression methods work best on normalized input variables, the scale of the output variables also plays an important role. The goal of this thesis is to investigate the robustness of regression methods with respect to the scaling of the output values and to find out how the methods behave under different scaling approaches. This requires a thorough look at the mathematical background of the regression methods.
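The baseline scaling approach to compare against is per-output standardization, the output-side analogue of input normalization: each target variable is mapped to zero mean and unit variance before fitting and predictions are mapped back afterwards. A minimal numpy sketch:

```python
import numpy as np

class TargetScaler:
    """Standardize each output variable to zero mean and unit variance
    before regression and map predictions back afterwards."""

    def fit(self, Y):
        self.mean = Y.mean(axis=0)
        self.std = Y.std(axis=0)
        return self

    def transform(self, Y):
        return (Y - self.mean) / self.std

    def inverse_transform(self, Z):
        return Z * self.std + self.mean

Y = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])   # outputs on very different scales
scaler = TargetScaler().fit(Y)
Z = scaler.transform(Y)
```

After scaling, both output dimensions contribute comparably to a squared-error loss, which is exactly the effect whose absence this thesis would study.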

Requirements: very good mathematical foundation, programming skills (Python), basic knowledge in neural networks