Bachelor Theses at the Chair of Cognitive Systems (Prof. Dr. Andreas Zell)

Students who want to write a bachelor thesis should have attended at least one of Prof. Zell's lectures and passed it with a good or at least satisfactory grade. Alternatively, they may have obtained the relevant background knowledge for the thesis from other, similar lectures.

Open Topics

Predict the ripeness level of fruit with Deep Learning

Mentor: Leon Varga

Email: leon.vargaspam

In previous research, we have shown that the ripeness level of fruit can be predicted with hyperspectral imaging and deep learning.
The cameras used record a wide range of wavelengths and are therefore not readily usable in practice.

In this thesis, you should evaluate different techniques to identify the important wavelengths. Furthermore, it should be evaluated whether a simpler camera could reach a similar accuracy.
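
One simple family of techniques for identifying important wavelengths is per-band scoring. The sketch below ranks spectral bands by a Fisher-style score (between-class separation over within-class spread); the toy "ripe"/"unripe" spectra and the three bands are made up for illustration, and real data would have hundreds of bands.

```python
# Rank spectral bands by a per-band Fisher score. Hypothetical toy data:
# two ripeness classes, three bands; band 1 is discriminative by construction.

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def fisher_scores(class_a, class_b):
    """class_a, class_b: lists of spectra (lists of band intensities)."""
    n_bands = len(class_a[0])
    scores = []
    for b in range(n_bands):
        a = [s[b] for s in class_a]
        c = [s[b] for s in class_b]
        # large when class means differ and within-class spread is small
        scores.append((mean(a) - mean(c)) ** 2 / (var(a) + var(c) + 1e-9))
    return scores

# unripe vs ripe: only band 1 differs systematically
unripe = [[0.50, 0.20, 0.80], [0.60, 0.25, 0.75], [0.55, 0.22, 0.82]]
ripe   = [[0.50, 0.70, 0.80], [0.60, 0.75, 0.78], [0.52, 0.72, 0.79]]

scores = fisher_scores(unripe, ripe)
best_band = max(range(len(scores)), key=scores.__getitem__)
```

A simpler camera could then be emulated by keeping only the top-ranked bands and re-measuring classification accuracy.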

Requirements: Linux, Python, Basic understanding of neural networks

Actor-Critic performance for a robotic table tennis simulation

Mentor: Jonas Tebbe


Description: Our table tennis robot learns to adapt its stroke using an actor-critic reinforcement learning model. The student is to adapt the algorithm to the OpenAI Gym interface. On a simplified physical simulation, the performance of our custom model and the OpenAI baseline models shall be compared. In contrast to comparisons in the literature, the focus should be on learning accuracy using only a limited amount of data, in our case a dataset of under 1000 table tennis strokes.
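
Adapting to the Gym interface means exposing the simulation through `reset`/`step`. The sketch below is a minimal Gym-style environment for a hypothetical one-step stroke task, written without importing `gym` itself; the class name, the stroke parameterization, and the reward are invented placeholders, not our actual simulation.

```python
import random

class StrokeEnv:
    """Minimal Gym-style interface (reset/step) for a hypothetical one-step
    table-tennis stroke task: the action is a single stroke parameter and
    the reward is the negative distance to a target parameter."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.target = 0.0

    def reset(self):
        # new incoming ball -> new target stroke parameter (the observation)
        self.target = self.rng.uniform(-1.0, 1.0)
        return [self.target]

    def step(self, action):
        reward = -abs(action - self.target)  # closer stroke -> higher reward
        done = True                          # one stroke per episode
        return [self.target], reward, done, {}

env = StrokeEnv()
obs = env.reset()
_, reward, done, _ = env.step(obs[0])  # acting on the target exactly
```

With this interface in place, both our custom actor-critic and the OpenAI baseline agents can be run against the same simulation.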

Requirements: Lecture on machine learning / neural networks.

Visualizing the responsiveness of a table tennis robot

Mentor: Jonas Tebbe


Description: In order for our table tennis robot to hit the ball, the future trajectory of the ball is predicted. With more measurements of the flying ball, the prediction becomes more and more accurate. In order to analyze the bottlenecks for the hitting accuracy, the motion of the robot should be considered in a simplified fashion. The student should analyze the following questions: How does the accuracy of the prediction, improving over time, influence the final hitting result? How sensitive is the final hitting result to sudden changes of the prediction accuracy, for the better or for the worse?
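
The effect studied here can be illustrated with a toy model: fit a constant-velocity line (standing in for the ballistic predictor) to the first k noisy position measurements and extrapolate to the hitting time. The noise values, velocity, and hitting time below are fixed and purely illustrative.

```python
# Toy sketch: prediction error at the hitting time vs. number of measurements.

def fit_line(ts, ps):
    """Closed-form least-squares line fit; returns (slope, intercept)."""
    mt = sum(ts) / len(ts)
    mp = sum(ps) / len(ps)
    slope = (sum((t - mt) * (p - mp) for t, p in zip(ts, ps))
             / sum((t - mt) ** 2 for t in ts))
    return slope, mp - slope * mt

true_v = 2.0
noise = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2]   # fixed measurement noise
ts = list(range(len(noise)))
ps = [true_v * t + n for t, n in zip(ts, noise)]

t_hit = 10.0
errors = []
for k in range(2, len(ts) + 1):
    v, p0 = fit_line(ts[:k], ps[:k])
    errors.append(abs((v * t_hit + p0) - true_v * t_hit))
# using all measurements generally predicts the hit point much better
# than using only the first two
```

In the thesis, the robot's motion constraints would then be layered on top of such a prediction-error curve.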

Requirements: Lecture on robotics helpful.

Drawing People with a Robot Arm

Mentor: Mario Laux

Email: mario.lauxspam

Description: In his recently completed bachelor thesis, Adrian Müller developed a system that takes the image of a person, performs line detection on the image, converts the binarized image to vector line segments, and finally draws the line segments with a robot arm. While this works well for features with high contrast, features with low contrast, like the nose in frontal images, are not well detected. The aim of this thesis is to train a deep neural network to recognize typical features of a human face in an image and to convert them into a line drawing sketch, improving the existing system. The sketch should then be drawn on a whiteboard using a Franka Emika Panda robot arm.

Requirements: C++, DNN, ROS

Salient Object Detection with Boolean Maps

Mentor: Daniel Weber


Description: Computing saliency maps has been shown to be beneficial for applications like image segmentation. A simple and efficient bottom-up approach is the Boolean Map based Saliency model (BMS) proposed in [Zhang, ICCV2013]. The goal of this thesis is to implement BMS in Python or C++ on Linux in order to detect unknown objects for which no dataset is available and a neural network approach is not possible. Additionally, it shall be tested whether the approach presented in the paper can be extended from RGB images to RGBD images.
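
The core BMS idea can be sketched in a few lines on a tiny grayscale grid: threshold the image at several levels into boolean maps, keep only regions that are fully surrounded (i.e. do not touch the image border) in each map and its complement, and average these attention maps into a saliency map. Real BMS additionally uses multiple color channels and post-processing as described in the paper; this is only the skeleton.

```python
from collections import deque

def surrounded(bmap):
    """1.0 where a True region does NOT touch the border, else 0.0."""
    h, w = len(bmap), len(bmap[0])
    outside = [[False] * w for _ in range(h)]
    q = deque((r, c) for r in range(h) for c in range(w)
              if bmap[r][c] and (r in (0, h - 1) or c in (0, w - 1)))
    for r, c in q:
        outside[r][c] = True
    while q:                       # flood fill from border-touching cells
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and bmap[rr][cc] and not outside[rr][cc]:
                outside[rr][cc] = True
                q.append((rr, cc))
    return [[1.0 if bmap[r][c] and not outside[r][c] else 0.0
             for c in range(w)] for r in range(h)]

def bms(img, thresholds):
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for t in thresholds:
        # boolean map and its complement, as in the paper
        for bmap in ([[v > t for v in row] for row in img],
                     [[v <= t for v in row] for row in img]):
            att = surrounded(bmap)
            for r in range(h):
                for c in range(w):
                    sal[r][c] += att[r][c]
    n = 2 * len(thresholds)
    return [[v / n for v in row] for row in sal]

img = [[0, 0, 0, 0, 0],
       [0, 9, 9, 9, 0],
       [0, 9, 9, 9, 0],
       [0, 0, 0, 0, 0]]
sal = bms(img, thresholds=[4])  # the enclosed bright blob becomes salient
```

The RGBD extension would add the depth channel as a further source of boolean maps.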

Requirements: Good programming skills

Hand Gesture Following Robot Using Deep Learning

Mentor: Hamd ul Moqeet Riaz


Description: The aim of this thesis is to investigate the application of deep learning techniques for recognizing human hand gestures. A TurtleBot would follow the hand instructions predicted by the trained neural network. The robot must follow at least five hand gestures. Either hand gesture data can be collected or a pre-trained network can be employed. The trained network must be tested on an NVIDIA Jetson TX2 module for real-time operation.

Requirements: Knowledge of deep learning and computer vision, programming in Python (TensorFlow/PyTorch), basic understanding of ROS and mobile robots

Benchmarking FourierNet on limited hardware

Mentor: Hamd ul Moqeet Riaz


Description: FourierNet is an instance segmentation network, which utilizes a Fourier series to decode the shape (mask) of an object from a compressed feature map. This thesis aims at training various configurations of FourierNet on different datasets and testing them on limited hardware for real-time operation. FourierNet should be trained on PASCAL VOC and MS COCO with various input scales. The mean average precision (mAP) should be evaluated on both datasets. The speed of all these networks should be tested on an NVIDIA Jetson TX2 or Xavier board, and a suitable candidate must be recommended.
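
The Fourier decoding step mentioned above can be sketched as follows: a contour is represented by complex Fourier coefficients, and contour points are reconstructed as a truncated Fourier series. The coefficient layout and normalization here are illustrative; FourierNet's exact parameterization follows the paper.

```python
import math

def decode_contour(coeffs, n_points):
    """coeffs: dict harmonic -> complex coefficient; returns (x, y) points."""
    pts = []
    for i in range(n_points):
        t = 2 * math.pi * i / n_points
        # truncated Fourier series evaluated at parameter t
        z = sum(c * complex(math.cos(k * t), math.sin(k * t))
                for k, c in coeffs.items())
        pts.append((z.real, z.imag))
    return pts

# A single first harmonic of magnitude 2 decodes to a circle of radius 2
# centered at the zeroth ("DC") coefficient (5, 5).
pts = decode_contour({0: complex(5.0, 5.0), 1: complex(2.0, 0.0)}, 16)
```

Truncating the number of harmonics is exactly what makes the representation compressed: fewer coefficients yield a smoother, cheaper mask.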

Requirements: Knowledge in Deep learning and computer vision, Python (PyTorch)

Machine Learning for 6D Pose Estimation

Mentor: Timon Höfer


Description: One of the most important components of modern computer vision systems for applications such as mobile robotic manipulation and augmented reality is a reliable and fast 6D object detection module. 6D pose estimation is the task of detecting the 6D pose of an object, which includes its 3D location and 3D orientation; the "D" stands for degrees of freedom.

In this project, you will have a look at state-of-the-art pose estimators and test them on different publicly available datasets. Optionally, you could run tests with our own cameras and obtain qualitative results for different methods.

Requirements: Basic Python, experience with GitHub is useful

Camera Absolute Pose Estimation in Urban Environment for UAVs

Mentor: Chenhao Yang


Description: We have collected a UAV image database covering a variety of typical outdoor environments in urban areas, which can be applied to deep learning training for UAV outdoor localization. The database contains images collected by drones at different heights. Through the structure-from-motion method, the 6D camera pose (3D position, 3D orientation) corresponding to each image is obtained. Based on this database, we are planning to combine camera relative pose estimation with a nearest-neighbor image retrieval system to achieve complete camera absolute pose estimation.
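
The retrieval stage of such a pipeline can be sketched simply: each database image is summarized by a global descriptor vector, and a query is coarsely localized by finding the database image with the most similar descriptor. The 3-element descriptors and image IDs below are made up; in practice the descriptors would come from a deep network, and each database entry would carry its SfM camera pose.

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query, database):
    """database: list of (image_id, descriptor); returns the best image_id."""
    return max(database, key=lambda entry: cosine(query, entry[1]))[0]

db = [("img_001", [1.0, 0.0, 0.0]),
      ("img_002", [0.0, 1.0, 0.0]),
      ("img_003", [0.7, 0.7, 0.0])]
best = retrieve([0.9, 0.1, 0.0], db)
```

The relative pose estimator would then refine the query's absolute pose starting from the retrieved image's known SfM pose.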

Requirements: deep learning experience, Python or C++ experience

On the Necessity of Anchors in Object Detection

Mentor: Martin Meßmer

Email: martin.messmerspam

For a long time, anchor-based object detection has been the non plus ultra in the research community. Many well-known one-shot detectors, like YOLOv2 and v3, SSD, and most recently EfficientDet, have employed anchors with great success. Since FCOS (Zhi Tian et al., 2019), however, doubts have formed about the necessity of anchors in object detection. In this thesis, the student should take a theoretical and a practical look at both paradigms and compare the two approaches.
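
The conceptual difference between the two paradigms shows up already in how training targets are assigned. The sketch below contrasts them on a single ground-truth box: anchor-based matching assigns anchors by IoU with the box, while an FCOS-style anchor-free scheme assigns feature-map locations that fall inside the box. Boxes are (x1, y1, x2, y2); all numbers are illustrative, and real FCOS additionally uses scale ranges and center sampling.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def match_anchors(anchors, gt, thresh=0.5):
    """Anchor-based: indices of anchors overlapping the ground truth enough."""
    return [i for i, a in enumerate(anchors) if iou(a, gt) >= thresh]

def match_locations(locations, gt):
    """Anchor-free (FCOS-style): indices of locations inside the box."""
    return [i for i, (x, y) in enumerate(locations)
            if gt[0] <= x <= gt[2] and gt[1] <= y <= gt[3]]

gt = (10, 10, 50, 50)
anchors = [(8, 8, 48, 48), (40, 40, 80, 80), (100, 100, 140, 140)]
locations = [(30, 30), (45, 20), (90, 90)]
pos_anchors = match_anchors(anchors, gt)
pos_locations = match_locations(locations, gt)
```

The practical comparison in the thesis would then measure how these assignment strategies affect detection accuracy.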

Requirements: basic deep learning knowledge, Python, good English or German

An evaluation of multiple regression methods

Mentor: Valentin Bolz

Email: valentin.bolzspam

Description: There are multiple frameworks and libraries which offer a variety of different regression tools. The goal of this thesis is to apply the most common techniques, such as linear and polynomial regression, random forests, or neural networks, to datasets in which the target variables are continuous. Furthermore, a mathematical formulation of these techniques and an evaluation of their respective advantages and disadvantages should be given.
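
As a minimal sketch of two of the techniques named above, the block below fits a linear and a polynomial model to a toy continuous target via the normal equations and compares their mean squared errors. The data are made up; the actual thesis would use library implementations and proper datasets.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    n = degree + 1
    A = [[x ** j for j in range(n)] for x in xs]
    M = [[sum(a[i] * a[j] for a in A) for j in range(n)] for i in range(n)]
    v = [sum(a[i] * y for a, y in zip(A, ys)) for i in range(n)]
    for col in range(n):                      # Gaussian elimination
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):            # back substitution
        coef[r] = (v[r] - sum(M[r][c] * coef[c]
                              for c in range(r + 1, n))) / M[r][r]
    return coef

def mse(xs, ys, coef):
    pred = [sum(c * x ** j for j, c in enumerate(coef)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]                       # quadratic ground truth
mse_linear = mse(xs, ys, polyfit(xs, ys, 1))   # linear model underfits
mse_quadratic = mse(xs, ys, polyfit(xs, ys, 2))
```

The same fit-then-score pattern carries over directly to random forests and neural networks once a library is chosen.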

Requirements: good mathematical skills, Python programming skills, basics in neural networks

Accelerating DNN Training Using Weight Extrapolation

Mentor: Benjamin Kiefer


Description: The backpropagation algorithm (BP) has proven to be a robust tool for training deep neural networks (DNNs). However, training times of DNNs grow larger and larger due to increasing dataset sizes. In this thesis, the student should explore an acceleration technique based on weight extrapolation. In particular, different ways in which extrapolation can be applied, and its effects on various modern DNN architectures, shall be studied.
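
The basic idea can be demonstrated on a 1-D quadratic toy loss f(w) = (w - 3)^2: run plain gradient descent, and every few steps jump further along the most recent weight difference, w <- w + beta * (w - w_prev). The schedule and beta below are illustrative, not a prescription for the thesis.

```python
def grad(w):
    # gradient of the toy loss f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

def train(steps, lr=0.1, extrapolate=False, every=5, beta=1.0):
    w, w_prev = 0.0, 0.0
    for i in range(1, steps + 1):
        w_prev, w = w, w - lr * grad(w)       # plain gradient step
        if extrapolate and i % every == 0:
            w = w + beta * (w - w_prev)       # extrapolation jump
    return w

plain = train(30)
accel = train(30, extrapolate=True)
# both approach the optimum w* = 3; the extrapolated run gets closer
# within the same number of steps
```

On real DNNs the same idea is applied to the full weight vector, and the thesis would study where and how often such jumps are safe.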

Requirements: Basic knowledge of deep neural networks. Basic knowledge of PyTorch or TensorFlow is beneficial.

Development of Human-Machine Interface Prototype for the Visualization of Object Detectors

Mentor: Benjamin Kiefer



The field of artificial intelligence is growing fast. Despite its impressive results in very specific areas such as image classification or object detection, it is often unclear how these algorithms can be leveraged to help humans. However, a machine can only ever be as good as its interface to humans. In most safety-critical systems, rather than having a completely autonomous system that provides no insights to a user, a human-machine symbiosis in the form of a human-machine interface can enable operators to understand, coordinate, and control further actions. Related to explainable artificial intelligence, these systems lay the foundation for a wider acceptance of artificial intelligence and machine-learning-based algorithms in many use cases.

In this bachelor thesis, the student is asked to build a simple human-machine interface for the task of object detection. The use-case is as follows:

An underlying high-resolution video stream aboard a UAV is compressed such that it can be sent to a ground station in real time. Along with this compressed stream, a set of coordinates depicting interesting regions (which have already been computed with a light-weight neural network aboard the UAV) is also sent to the ground station. These regions of interest (which cannot be too large) are extracted from the original video stream and also sent to the ground station in full resolution. The human-machine interface should visualize the regions of interest in full quality alongside the highly compressed complete stream. The regions of interest are furthermore processed by a large deep neural network to perform object detection at an object-size level. The human-machine interface should be dynamically changeable in that the operator can shift and resize the regions of interest or define their own.

In the thesis, the student is given several exemplary videos and optionally (trained) neural networks serving as a testbed.
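
The dynamic region-of-interest handling described above boils down to an ROI that the operator can shift and resize, always clamped to the frame so the crop sent for full-resolution processing stays valid. The class and field names below are hypothetical; the interface itself could wrap such a structure in any of the allowed languages.

```python
class ROI:
    """Axis-aligned region of interest, clamped to the frame bounds."""

    def __init__(self, x, y, w, h, frame_w, frame_h):
        self.frame_w, self.frame_h = frame_w, frame_h
        self.x, self.y, self.w, self.h = x, y, w, h
        self._clamp()

    def _clamp(self):
        # keep the ROI at least 1 px and fully inside the frame
        self.w = max(1, min(self.w, self.frame_w))
        self.h = max(1, min(self.h, self.frame_h))
        self.x = max(0, min(self.x, self.frame_w - self.w))
        self.y = max(0, min(self.y, self.frame_h - self.h))

    def shift(self, dx, dy):
        self.x += dx
        self.y += dy
        self._clamp()

    def resize(self, dw, dh):
        self.w += dw
        self.h += dh
        self._clamp()

roi = ROI(100, 100, 320, 240, frame_w=1920, frame_h=1080)
roi.shift(2000, 0)  # pushed past the right edge -> clamped to the frame
```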


Requirements: The student should have basic knowledge in deep neural networks and computer vision. The interface can be programmed in Python, Java or C++.

Empirical upper bound in 3D object detection

Mentor: Nuri Benbarka


Description: Object detection in 3D space is an essential task for mobile robots. In 3D object detection, there are usually three steps: generating 3D proposals, then classification and bounding box regression of those proposals. In this thesis, we will focus on the second step only, classifying the ground-truth bounding boxes using state-of-the-art networks for point clouds, like PointNet and others.
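
What makes PointNet-style networks suitable for classifying point clouds is a shared per-point transformation followed by a symmetric pooling function (max), so the output is independent of point ordering. The tiny weights and features below are made up solely to demonstrate that property.

```python
def pointnet_features(points, weights):
    """points: list of (x, y, z); weights: one weight triple per feature.
    Shared linear map per point, then max-pooling over points."""
    per_point = [[sum(w * c for w, c in zip(wrow, p)) for wrow in weights]
                 for p in points]
    # symmetric aggregation: max over all points, per feature
    return [max(f[j] for f in per_point) for j in range(len(weights))]

weights = [(1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]
cloud = [(0.0, 1.0, 2.0), (3.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
shuffled = [cloud[2], cloud[0], cloud[1]]
f1 = pointnet_features(cloud, weights)
f2 = pointnet_features(shuffled, weights)  # identical: order is irrelevant
```

In the thesis, such pooled features (produced by a real trained network) would feed a classifier over the ground-truth boxes, giving an empirical upper bound on the classification step.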

Requirements: Experience with Python and interest in learning PyTorch.

Implementation of Quantization Algorithms for Model Compression

Mentor: Rafia Rahim


Description: Algorithms based on deep neural networks have brought huge accuracy improvements to stereo vision. However, they result in large models with long inference times. Our goal here is to implement algorithms for the quantization and training of deep stereo vision networks for model compression. To this end, one part will involve writing algorithms for the quantization and compression of existing state-of-the-art deep stereo algorithms during training. The second part will focus on how to exploit model quantization and compression at inference time.
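
The basic building block of such schemes is uniform affine quantization: map float weights to 8-bit integers with a scale and zero-point, and dequantize back. The per-tensor min/max calibration sketched below is the simplest possible choice; the thesis would explore more refined variants and their interaction with training.

```python
def quantize(weights, n_bits=8):
    """Uniform affine quantization; returns (q, scale, zero_point)."""
    lo, hi = min(weights), max(weights)
    qmax = 2 ** n_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

w = [-0.51, -0.02, 0.33, 0.49, 0.07]
q, s, zp = quantize(w)
w_hat = dequantize(q, s, zp)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))  # bounded by ~scale/2
```

The inference-time part of the thesis would then exploit the fact that the integer tensor `q` is four times smaller than the float weights and admits integer arithmetic.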

Requirements: good programming skills, deep learning knowledge.

Knowledge Distillation for the Training of a Lean Student Stereo Network

Mentor: Rafia Rahim


Description: Knowledge distillation is a way of transferring model capabilities from a deep, computationally expensive network to a lean, compact, and computationally efficient student network. The goal here is to explore knowledge distillation methods for training a lean student stereo network by distilling the knowledge of a state-of-the-art 3D teacher network. To this end, one will carry out different knowledge distillation experiments for the training of student networks.
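
A common distillation loss softens teacher and student logits with a temperature T and penalizes the KL divergence between the two distributions. The logits below are made up; in the stereo setting they could, for example, be per-pixel disparity scores, and the exact loss used in the thesis is an open design choice.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.5]
loss_far = distillation_loss([0.1, 3.0, 0.2], teacher)
loss_near = distillation_loss([3.8, 1.1, 0.4], teacher)  # closer to teacher
```

During training this term is typically mixed with the ordinary supervised loss, pulling the student toward the teacher's soft predictions.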

Requirements: good programming skills, deep learning knowledge.

Exploring Deep Semi-Supervised Learning Methods on non-typical datasets

Mentor: Axel Fehrenbach
Email: axel.fehrenbachspam

Description: In recent years, deep semi-supervised learning has been very successful in reducing the amount of labeled data needed while still retaining good results on image classification tasks. In this thesis, the student would compare deep semi-supervised methods on a task and dataset outside the usual scope of image classification benchmarks such as CIFAR-10.
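
One of the simplest methods in this family is pseudo-labeling: a model trained on the labeled data assigns labels to unlabeled samples, but only predictions above a confidence threshold are kept for the next training round. The "classifier" below is a stand-in scoring function invented for illustration.

```python
def pseudo_label(unlabeled, predict, threshold=0.9):
    """predict(x) -> (label, confidence); keep only confident predictions."""
    kept = []
    for x in unlabeled:
        label, conf = predict(x)
        if conf >= threshold:
            kept.append((x, label))
    return kept

# hypothetical classifier: confident for inputs far from the decision point
def toy_predict(x):
    conf = min(1.0, 0.5 + abs(x))
    return (1 if x > 0 else 0), conf

batch = [-1.2, 0.05, 0.8, -0.1]
labeled = pseudo_label(batch, toy_predict)  # only the confident samples
```

More recent methods such as consistency regularization refine this idea, and the thesis would compare several of them on the non-typical target dataset.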

Requirements: basic deep learning knowledge, basic Python knowledge

Feature selection in IR Spectral data for Quality Control

Mentor: Axel Fehrenbach
Email: axel.fehrenbachspam

Description: Infrared spectral data analysis is a successful approach for quality control, from plastics and plants to pharmaceuticals. Traditionally, a small number of wavelength bands is selected and features are computed from them, which are then fed into a classifier (e.g., SVM, ANN, Random Forest, kNN), whereas deep convolutional neural networks (DCNNs) usually work well on the raw input data. This thesis should investigate at least three different methods of wavelength feature selection on plastics data and compare them with DCNNs in terms of training time, inference time, and accuracy.

Requirements: basic deep learning knowledge, basic Python knowledge

Fusion of point clouds obtained from two depth sensors

Mentor: Dr. Faranak Shamsafar

Email: faranak.shamsafarspam

Description: In the last decade, different types of depth cameras have been introduced for various applications. The goal of this thesis is to develop an efficient algorithm to fuse the point clouds obtained by two cameras from two different viewpoints in order to achieve a better depth perception. Both depth cameras are recent active TOF (time-of-flight) ranging devices called "Azure Kinect DK". For this, at least two existing point cloud registration strategies should be tested on small plastic parts, and the better one should be further improved. The improvement of the combined point cloud over the individual ones should be evaluated.
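
The simplest ingredient of point cloud registration can be sketched as follows: with known correspondences, the translation aligning one cloud to the other is just the difference of centroids (the rotation would come from, e.g., the Kabsch algorithm, and full ICP alternates correspondence search with such alignment steps). The two toy clouds below stand in for the views of the two cameras.

```python
def centroid(cloud):
    n = len(cloud)
    return tuple(sum(p[i] for p in cloud) / n for i in range(3))

def align_translation(source, target):
    """Translation-only alignment assuming point-to-point correspondences."""
    cs, ct = centroid(source), centroid(target)
    t = tuple(ct[i] - cs[i] for i in range(3))
    moved = [tuple(p[i] + t[i] for i in range(3)) for p in source]
    return moved, t

# the same small part seen from two viewpoints, here differing by a shift
cam1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
cam2 = [(2.0, 3.0, 1.0), (3.0, 3.0, 1.0), (2.0, 4.0, 1.0)]
fused, t = align_translation(cam1, cam2)
```

The registration strategies to be tested in the thesis (e.g. ICP variants) generalize this step to unknown correspondences and full rigid transforms.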

Requirements: Good programming skills in Python, C++ (helpful), Experience in DNN (helpful)