Master Theses at the Chair of Cognitive Systems (Prof. Dr. Andreas Zell)

Students who want to do a Master thesis should have attended at least one lecture by Prof. Zell and passed it with good or at least satisfactory grades. Alternatively, they may have obtained the relevant background knowledge for the thesis from other, similar lectures.

Various Topics for LLMs and VLMs

Mentor: Benjamin Kiefer
Email: benjamin.kiefer@uni-tuebingen.de

Contact me for more information.

Requirements: Python programming, some experience with PyTorch, no fear of working with robots

360° Computer Vision and 2D to 3D Projection

Mentor: Benjamin Kiefer
Email: benjamin.kiefer@uni-tuebingen.de

Delve into the immersive world of 360° computer vision in this Master thesis project. The focus is on working with different lens models and establishing accurate 2D-to-3D projection methodologies. The scope of the project further extends to applying these techniques in real-world scenarios such as autonomous navigation, virtual tours, or robotics. Candidates with expertise in computer vision and a keen interest in practical applications are encouraged to apply. Embark with us on this 360° journey of expanding the horizon of computer vision technology.
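
As a concrete starting point, the sketch below shows one standard lens model: mapping a pixel of an equirectangular 360° panorama to a 3D viewing ray. It is a minimal NumPy illustration, not a prescribed solution; fisheye or other lens models would require their own projection functions.

    import numpy as np

    def equirect_pixel_to_ray(u, v, width, height):
        """Map a pixel (u, v) of an equirectangular 360-degree image
        to a unit direction vector in 3D camera coordinates."""
        # Longitude in [-pi, pi), latitude in [-pi/2, pi/2]
        lon = (u / width - 0.5) * 2.0 * np.pi
        lat = (0.5 - v / height) * np.pi
        x = np.cos(lat) * np.sin(lon)
        y = np.sin(lat)
        z = np.cos(lat) * np.cos(lon)
        return np.array([x, y, z])

    # Example: the image center maps to the forward axis (0, 0, 1).
    print(equirect_pixel_to_ray(1920, 960, 3840, 1920))

The inverse mapping (3D point to pixel) follows by converting the ray back to longitude and latitude.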

Requirements: Python programming, basic robotics and computer vision background; Unity is a plus

Augmented Reality Applications for Boats

Mentor: Benjamin Kiefer
Email: benjamin.kiefer@uni-tuebingen.de

Engage in innovative research to create a robust AR application, ideally using Unity or Python. The focus of this Master thesis is on developing solutions for video stabilization, horizon detection, and the establishment of stable 2D to 3D projections. Applicants should possess knowledge in robotics and computer vision. This project provides an exceptional opportunity to delve into practical application and exploration in the rapidly evolving AR technology landscape. Join us as we navigate uncharted waters in this exciting field.
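
One building block of such an application is horizon detection. The following is a minimal sketch using OpenCV's Canny edge detector and Hough transform; the thresholds are illustrative, and a robust solution at sea will need considerably more than this.

    import cv2
    import numpy as np

    def detect_horizon(frame):
        """Estimate the horizon as the strongest near-horizontal line.
        Returns (rho, theta) in the Hough parameterization, or None."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=200)
        if lines is None:
            return None
        # Keep the strongest line whose angle is close to horizontal.
        for rho, theta in lines[:, 0]:
            if abs(theta - np.pi / 2) < np.radians(20):
                return rho, theta
        return None

The boat's roll angle can then be read off from theta, and stabilization amounts to rotating frames so that the detected horizon stays level.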

Requirements: Python programming, basic robotics and computer vision background; Unity is a plus

Segment Anything for Multispectral Image Data

Mentor: Hannah Frank
Email: hannah.frank@uni-tuebingen.de

With Segment Anything [0], Meta AI recently introduced a powerful new approach to instance segmentation and mask generation. However, it is restricted to RGB data. The goal of this thesis is to adapt the method for direct use on multispectral recordings. To this end, the student needs to extend the source code of the Segment Anything Model (SAM) so that it can be applied to inputs with multiple channels (e.g., RGB + 2 NIR channels), and potentially fine-tune the model on this special kind of data. The adapted model should be evaluated on an existing multispectral dataset or, ideally, on multispectral recordings from our current research project. The impact and advantage of using additional spectral channels should also be analyzed.

[0] A. Kirillov et al., "Segment Anything," arXiv preprint, 2023. (https://arxiv.org/abs/2304.02643)
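
To give an impression of the kind of change involved: SAM's ViT image encoder embeds 16x16 RGB patches with a single Conv2d, so accepting additional channels mainly means widening that layer. The sketch below (assuming the official segment-anything package; the checkpoint path and channel count are illustrative) reuses the pretrained RGB filters and initializes the extra channels from their mean.

    import torch
    import torch.nn as nn
    from segment_anything import sam_model_registry

    # Load a pretrained SAM (checkpoint path is illustrative).
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")

    # The ViT image encoder embeds 16x16 RGB patches with one Conv2d.
    old_proj = sam.image_encoder.patch_embed.proj

    n_channels = 5  # e.g., RGB + 2 NIR channels
    new_proj = nn.Conv2d(
        n_channels,
        old_proj.out_channels,
        kernel_size=old_proj.kernel_size,
        stride=old_proj.stride,
        padding=old_proj.padding,
    )

    with torch.no_grad():
        # Reuse the pretrained RGB filters; initialize the extra
        # channels with the mean of the RGB filters as a starting point.
        new_proj.weight[:, :3] = old_proj.weight
        new_proj.weight[:, 3:] = old_proj.weight.mean(dim=1, keepdim=True)
        new_proj.bias.copy_(old_proj.bias)

    sam.image_encoder.patch_embed.proj = new_proj

Note that SAM's preprocessing (3-channel pixel mean/std normalization) would have to be extended accordingly before fine-tuning.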

Requirements: Deep learning, Python programming (esp. PyTorch)

Transfer Learning for Hyperspectral Image Classification

Mentor: Hannah Frank
Email: hannah.frank@uni-tuebingen.de

Transfer learning [1] is a machine learning technique in which knowledge gained from one task is applied to another, related task. It has already shown promising results, especially in the case of limited labeled data. In this thesis, the student should research, implement, and evaluate different transfer learning methods (such as pretraining and fine-tuning, or multi-task learning) and apply them to hyperspectral image data from different sources (e.g., fruit ripeness, material characterization, and remote sensing data) to improve generalization and potentially also performance on the individual classification tasks.

[1] S. J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, 2010.
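
As an illustration of the pretraining-and-fine-tuning variant, the sketch below freezes a hypothetical spectral backbone pretrained on a source task and retrains only a new classification head on the target task; the architecture, band count, and class counts are placeholders.

    import torch
    import torch.nn as nn

    # Hypothetical 1D-CNN backbone over spectral bands; the concrete
    # architecture is a design choice of the thesis.
    class SpectralNet(nn.Module):
        def __init__(self, n_bands, n_classes):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            )
            self.head = nn.Linear(64, n_classes)

        def forward(self, x):  # x: (batch, 1, n_bands)
            return self.head(self.backbone(x))

    # Pretrain on a large source task (e.g., remote sensing), then...
    model = SpectralNet(n_bands=200, n_classes=16)
    # model.load_state_dict(torch.load("source_task.pth"))  # pretrained

    # ...fine-tune on the target task: freeze the backbone, new head.
    for p in model.backbone.parameters():
        p.requires_grad = False
    model.head = nn.Linear(64, 4)  # e.g., 4 fruit-ripeness classes

    optimizer = torch.optim.Adam(
        filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)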

Requirements: Deep learning basics, Python programming and experience with PyTorch/TensorFlow

Spiking Neural Networks for Event-based Ball Detection

Mentor: Andreas Ziegler

Email: andreas.ziegler@uni-tuebingen.de

Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of μs), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision.

So far, most learning approaches applied to event data convert a batch of events into a tensor and then use conventional CNNs. While such approaches achieve state-of-the-art performance, they do not exploit the asynchronous nature of the event data. Spiking Neural Networks (SNNs), on the other hand, are bio-inspired networks that can process the output of event-based cameras directly. SNNs process information conveyed as temporal spikes rather than numeric values. This makes SNNs an ideal counterpart to event-based cameras.

The goal of this thesis is to investigate and evaluate how an SNN can be used together with our event-based cameras to detect and track table tennis balls. The Cognitive Systems group has a table tennis robot system in which the developed ball tracker can be used and compared to other methods.
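
To illustrate how an SNN processes such data, the sketch below implements a minimal leaky integrate-and-fire (LIF) layer in plain PyTorch. It is meant to convey the dynamics, not the thesis architecture; real training would use surrogate gradients for the non-differentiable threshold (e.g., via libraries such as snnTorch).

    import torch
    import torch.nn as nn

    class LIFLayer(nn.Module):
        """Minimal leaky integrate-and-fire layer: the membrane potential
        integrates weighted input spikes, leaks over time, and emits a
        binary spike whenever it crosses the threshold."""
        def __init__(self, n_in, n_out, beta=0.9, threshold=1.0):
            super().__init__()
            self.fc = nn.Linear(n_in, n_out)
            self.beta = beta            # membrane leak factor per step
            self.threshold = threshold

        def forward(self, spike_train):  # (time, batch, n_in), binary
            mem = torch.zeros(spike_train.shape[1], self.fc.out_features)
            out = []
            for x_t in spike_train:      # iterate over time steps
                mem = self.beta * mem + self.fc(x_t)
                spikes = (mem >= self.threshold).float()
                mem = mem - spikes * self.threshold  # reset after firing
                out.append(spikes)
            return torch.stack(out)      # (time, batch, n_out)

    # Events binned into a spike train: 10 steps, 1 sample, 64 inputs.
    spikes_in = (torch.rand(10, 1, 64) < 0.1).float()
    print(LIFLayer(64, 8)(spikes_in).sum())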

Requirements: Familiarity with "traditional" computer vision, deep learning, Python

Pushing an Event Simulator to Its Limits

Mentor: Andreas Ziegler

Email: andreas.ziegler@uni-tuebingen.de

Event cameras are bio-inspired sensors that asynchronously report timestamped changes in pixel intensity and offer advantages over conventional frame-based cameras in terms of low-latency, low redundancy sensing and high dynamic range. Hence, event cameras have a large potential for robotics and computer vision.

Currently, a practical obstacle to the adoption of event camera technology is the high cost of several thousand dollars per camera, similar to the situation with early time-of-flight cameras. In a recent project [1], we developed an event simulator that takes frames from a conventional frame-based camera as input and outputs events in real time.
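
The core idea behind such a simulator is per-pixel thresholding of log-intensity changes, as in ESIM-style simulators. The sketch below illustrates that principle only; it is not the real-time implementation from [1], and in particular it emits no timestamps and does not interpolate between frames.

    import numpy as np

    def frames_to_events(frame, ref_log, threshold=0.2):
        """Emit simplified events wherever the log intensity changed by
        more than the contrast threshold since the last event at that
        pixel. Returns an (N, 3) array of (x, y, polarity)."""
        log_img = np.log(frame.astype(np.float32) + 1e-3)
        diff = log_img - ref_log
        fired_pos = diff > threshold
        fired_neg = diff < -threshold
        fired = fired_pos | fired_neg
        ys, xs = np.nonzero(fired)
        polarity = np.where(fired_pos[ys, xs], 1, -1)
        ref_log[fired] = log_img[fired]  # update per-pixel reference
        return np.stack([xs, ys, polarity], axis=1)

    # Usage: initialize the reference from the first frame, e.g.
    # ref_log = np.log(first_frame.astype(np.float32) + 1e-3),
    # then call frames_to_events(frame, ref_log) for every new frame.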

The goal of this thesis is to evaluate the limits of this event simulator by applying it in different real-time use cases. Two such scenarios are real-time tracking of a fast-moving object and balancing a ball on a 2D plane with a robot arm.

The student should be familiar with "traditional" computer vision and robotics. A good command of C++ or Python from previous projects would be beneficial.

[1] A. Ziegler, D. Teigland, J. Tebbe, T. Gossard, and A. Zell, “Real-time event simulation with frame-based cameras.” arXiv, Sep. 10, 2022. Accessed: Dec. 09, 2022. [Online]. Available: arxiv.org/abs/2209.04634

Asynchronous Graph-based Neural Networks for Ball Detection with Event Cameras

Mentor: Andreas Ziegler

Email: andreas.ziegler@uni-tuebingen.de

Event cameras are bio-inspired sensors that asynchronously report timestamped changes in pixel intensity and offer advantages over conventional frame-based cameras in terms of low-latency, low redundancy sensing and high dynamic range. Hence, event cameras have a large potential for robotics and computer vision.

State-of-the-art machine-learning methods for event cameras treat events as dense representations and process them with CNNs. Thus, they fail to maintain the sparsity and asynchronous nature of event data, thereby imposing significant computation and latency constraints. A recent line of work [1]–[5] tackles this issue by modeling events as spatio-temporally evolving graphs that can be efficiently and asynchronously processed using graph neural networks. These works showed impressive reductions in computation.

The goal of this thesis is to apply these graph-based networks to ball detection with event cameras. Existing graph-based networks were designed for more general object detection tasks [4], [5]. Since we only want to detect balls, in a first step the student will investigate whether a network architecture tailored to our use case could further improve inference time.
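
The first step in any such pipeline is turning a batch of events into a spatio-temporal graph. Below is a minimal sketch assuming PyTorch Geometric; the sensor resolution, radius, and neighbor limit are illustrative hyperparameters.

    import torch
    from torch_geometric.data import Data
    from torch_geometric.nn import radius_graph

    def events_to_graph(events, radius=0.05, max_neighbors=16):
        """Build a spatio-temporal graph from events given as an (N, 4)
        tensor of (x, y, t, polarity). Nodes are events; edges connect
        events that are close in normalized space-time."""
        pos = events[:, :3].clone()
        pos[:, 0] /= 346.0  # normalize x by sensor width (illustrative)
        pos[:, 1] /= 260.0  # normalize y by sensor height
        pos[:, 2] /= pos[:, 2].max().clamp(min=1e-9)  # normalize time
        edge_index = radius_graph(pos, r=radius,
                                  max_num_neighbors=max_neighbors)
        return Data(x=events[:, 3:4], pos=pos, edge_index=edge_index)

    # 1000 random events on a 346x260 sensor over 10 ms.
    ev = torch.rand(1000, 4) * torch.tensor([346.0, 260.0, 0.01, 1.0])
    graph = events_to_graph(ev)

The cited works then process such graphs asynchronously, updating only the subgraph affected by each new event instead of recomputing everything.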

The student should be familiar with "traditional" computer vision and deep learning. Experience with Python and PyTorch from previous projects would be beneficial.

[1] Y. Li et al., “Graph-based Asynchronous Event Processing for Rapid Object Recognition,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, Oct. 2021, pp. 914–923. doi: 10.1109/ICCV48922.2021.00097.

[2] Y. Deng, H. Chen, H. Liu, and Y. Li, “A Voxel Graph CNN for Object Classification with Event Cameras,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, Jun. 2022, pp. 1162–1171. doi: 10.1109/CVPR52688.2022.00124.

[3] A. Mitrokhin, Z. Hua, C. Fermuller, and Y. Aloimonos, “Learning Visual Motion Segmentation Using Event Surfaces,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, Jun. 2020, pp. 14402–14411. doi: 10.1109/CVPR42600.2020.01442.

[4] S. Schaefer, D. Gehrig, and D. Scaramuzza, “AEGNN: Asynchronous Event-based Graph Neural Networks,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, Jun. 2022, pp. 12361–12371. doi: 10.1109/CVPR52688.2022.01205.

[5] D. Gehrig and D. Scaramuzza, “Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras.” arXiv, Nov. 22, 2022. Accessed: Dec. 16, 2022. [Online]. Available: arxiv.org/abs/2211.12324

Multi-Object Tracking via Event-based Motion Segmentation with Event Cameras

Mentor: Andreas Ziegler

Email: andreas.ziegler@uni-tuebingen.de

Event cameras are bio-inspired sensors that asynchronously report timestamped changes in pixel intensity and offer advantages over conventional frame-based cameras in terms of low-latency, low redundancy sensing and high dynamic range. Hence, event cameras have a large potential for robotics and computer vision.

Since event cameras report per-pixel intensity changes, their output resembles an image gradient in which mainly edges and corners are present. The contrast maximization framework (CMax) [1] exploits this fact by optimizing the sharpness of accumulated events to solve computer vision tasks such as the estimation of motion, depth, or optical flow. Most recent works on event-based (multi-)object segmentation [2]–[4] apply this CMax framework. The common scheme is to jointly assign events to an object and fit a motion model that best explains the data.
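
In its simplest form, CMax for a single linear motion model can be written in a few lines: warp events by a candidate image-plane velocity, accumulate them into an image, and score its sharpness. The sketch below uses a grid search and image variance purely for illustration; the cited works use gradient-based optimization and richer motion models.

    import numpy as np

    def contrast(events, velocity, resolution=(260, 346)):
        """Warp events (x, y, t) by a candidate image-plane velocity,
        accumulate them into an image, and score sharpness (variance).
        The correct velocity compensates the motion, producing a sharp
        image of accumulated events and thus maximal contrast."""
        x, y, t = events[:, 0], events[:, 1], events[:, 2]
        t_ref = t[0]
        xw = np.round(x - velocity[0] * (t - t_ref)).astype(int)
        yw = np.round(y - velocity[1] * (t - t_ref)).astype(int)
        h, w = resolution
        valid = (xw >= 0) & (xw < w) & (yw >= 0) & (yw < h)
        image = np.zeros(resolution)
        np.add.at(image, (yw[valid], xw[valid]), 1.0)
        return image.var()

    def best_velocity(events, candidates):
        """Exhaustive CMax over a grid of candidate velocities (px/s)."""
        scores = [contrast(events, v) for v in candidates]
        return candidates[int(np.argmax(scores))]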

The goal of this thesis is to develop a real-time capable (multi-)object tracking pipeline based on multi-object segmentation. After becoming familiar with the recent literature, the student should choose a suitable multi-object segmentation approach and adapt it to our use case, namely a table tennis setup. Afterwards, different object tracking approaches should be developed, evaluated, and compared against each other.

The student should be familiar with "traditional" computer vision. Experience with C++ and/or optimization from previous projects or coursework would be beneficial.

[1] G. Gallego, M. Gehrig, and D. Scaramuzza, “Focus Is All You Need: Loss Functions for Event-Based Vision,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, Jun. 2019, pp. 12272–12281. doi: 10.1109/CVPR.2019.01256.

[2] X. Lu, Y. Zhou, and S. Shen, “Event-based Motion Segmentation by Cascaded Two-Level Multi-Model Fitting.” arXiv, Nov. 05, 2021. Accessed: Jan. 05, 2023. [Online]. Available: http://arxiv.org/abs/2111.03483

[3] T. Stoffregen, G. Gallego, T. Drummond, L. Kleeman, and D. Scaramuzza, “Event-Based Motion Segmentation by Motion Compensation,” ArXiv190401293 Cs, Aug. 2019, Accessed: Jun. 14, 2021. [Online]. Available: http://arxiv.org/abs/1904.01293

[4] Y. Zhou, G. Gallego, X. Lu, S. Liu, and S. Shen, “Event-based Motion Segmentation with Spatio-Temporal Graph Cuts,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–13, 2021, doi: 10.1109/TNNLS.2021.3124580.

A Comparison of Robot Arm Motion Planners

Mentor: Mario Laux

Email: mario.laux@uni-tuebingen.de

Description: The aim of this thesis is to review, evaluate, and compare different motion planners for robot arms. Suitable metrics have to be developed, and the corresponding simulations and real-world experiments have to be analyzed statistically.
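
As a hint of what such metrics might look like, the sketch below computes joint-space path length, a jerk-based smoothness score, and a success rate from sampled trajectories; the exact metric definitions are part of the thesis work.

    import numpy as np

    def path_length(traj):
        """Total joint-space path length of a trajectory given as an
        (N, n_joints) array of sampled configurations."""
        return np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))

    def smoothness(traj, dt):
        """Mean squared joint jerk (third finite difference); lower is
        smoother. Assumes uniform sampling with time step dt."""
        jerk = np.diff(traj, n=3, axis=0) / dt**3
        return np.mean(jerk**2)

    def success_rate(results):
        """Fraction of planning queries solved within the time budget,
        given a list of booleans (one per query)."""
        return float(np.mean(results))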

Requirements: C++, calculus, statistics, ROS, DNNs, MoveIt

Exploiting Drone Metadata for Multi-Object Tracking (MOT)

Mentor: Martin Meßmer

Email: martin.messmer@uni-tuebingen.de

Although some deep learning methods such as correlation filters and Siamese networks show great promise for multi-object tracking, these approaches are far from working perfectly. Therefore, in specific use cases it is necessary to impose additional priors or leverage additional data. Luckily, when working with drones, metadata such as the height or velocity of the drone is freely available. In this thesis, the student should develop useful ideas on how to exploit this data to increase the performance of a MOT model, then implement those ideas and compare them with other approaches.
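
As one example of such a prior: with a nadir-looking camera, the drone's altitude and the camera's focal length (in pixels) give the ground sampling distance, which bounds the plausible pixel size of objects. The sketch below is illustrative (pinhole model, flat-ground assumption, made-up size limits) and could serve to filter implausible detections before association.

    def ground_sampling_distance(altitude_m, focal_px):
        """Meters on the ground per image pixel for a nadir-looking
        camera at the given altitude (pinhole, flat-ground model)."""
        return altitude_m / focal_px

    def plausible_box(box_w_px, box_h_px, altitude_m, focal_px,
                      min_size_m=0.3, max_size_m=6.0):
        """Reject detections whose ground footprint is implausibly
        small or large for the object classes of interest."""
        gsd = ground_sampling_distance(altitude_m, focal_px)
        w_m, h_m = box_w_px * gsd, box_h_px * gsd
        return min_size_m <= max(w_m, h_m) <= max_size_m

    # A 50 px box seen from 40 m with a 1200 px focal length is a
    # roughly 1.7 m object, so it passes the size check.
    print(plausible_box(50, 30, altitude_m=40.0, focal_px=1200.0))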

Requirements: deep learning knowledge, Python, good English or German

Robust Trajectory Prediction for Autonomous Driving

Mentor: Marcel Hallgarten

Email: marcel.hallgarten@uni-tuebingen.de

Predicting the future behavior of dynamic agents within a scene (given by a lane graph) is a crucial task for safe (i.e., collision-free) autonomous driving. To this end, current state-of-the-art approaches take the scene context (e.g., lane graph, traffic-light states, position and extent of static objects) as well as the context of agents within the scene (e.g., position, velocity, and heading over the last observed timesteps) as input and predict how the future will unroll by predicting the most likely future trajectories for each agent.

While state-of-the-art approaches yield impressive results w.r.t. displacement errors and off-road rates on the test sets of various large-scale open-source datasets, they have been proven to be vulnerable to realistic adversarial examples. These results suggest that including the agent history as a feature causes the model to perform the prediction by extrapolating the past without sufficiently taking the lane graph into account.

The goal of this thesis is to evaluate different approaches to overcome this. Therefore, approaches such as removing the agent history from the input features or adding a multi-task training objective that enforces a strong correlation between prediction and lane graph (e.g., self-supervised lane-graph completion based on the prediction) should be evaluated.
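
A minimal sketch of the multi-task idea is given below: the trajectory regression loss is combined with an auxiliary lane-graph objective (stated here as lane-node classification purely for illustration) so that gradients force the encoder to actually use the map. The loss weighting and the concrete auxiliary task are design choices of the thesis.

    import torch.nn as nn

    def multitask_loss(pred_traj, gt_traj, lane_logits, lane_labels,
                       alpha=0.5):
        """Combine the trajectory regression loss with an auxiliary
        lane-graph objective so the encoder cannot ignore the map and
        purely extrapolate the agent history."""
        traj_loss = nn.functional.smooth_l1_loss(pred_traj, gt_traj)
        lane_loss = nn.functional.cross_entropy(lane_logits, lane_labels)
        return traj_loss + alpha * lane_loss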

Requirements: Knowledge in Deep Learning, Python (PyTorch)