Proseminar/Seminar: 3D Vision

This combined proseminar/seminar is on 3D Computer Vision. It can be taken by both Bachelor students (as Proseminar) and Master students (as Seminar). Working in groups of two, students write and review reports and present a topic in the field of 3D vision.

Qualification Goals

Students gain a deep understanding of a scientific topic. They learn to efficiently search, navigate and read relevant literature, and to summarize a topic clearly in their own words in a written report. Moreover, students present their topic to an audience of students and researchers, and provide feedback to others in the form of reviews and discussions. During the seminar, students learn to put scientific research into context, practice critical thinking, and identify advantages and problems of a studied scientific method.


  • Course number: ML-4507
  • Credits: 3 ECTS (2h)
  • Total Workload: 90h
  • The seminar is held in a physical format in the MvL6 lecture hall. Students must bring a 3G proof, a mobile phone with a QR-scanning app, and their university credentials (username/password) for registration and contact tracing.
  • Attendance is mandatory during all scheduled sessions


  • Report (5-6 pages, double column, excluding references)
  • Presentation (25-30 minutes, max. 20 slides)
  • Review of another report (1 page, double column)
  • Discussion (during all presentations)


  • Basic Computer Science skills: Variables, functions, loops, classes, algorithms
  • Basic Math skills: Linear algebra, analysis, probability theory
  • Basic knowledge of Deep Learning is beneficial, but not required


  • To participate in this seminar, you must register in the ILIAS booking pool


Links to LaTeX/Overleaf templates for reports, reviews and slides. Reports and reviews must use the corresponding template. Presentation slides may be prepared with other tools, e.g., PowerPoint or Keynote.





Introduction and Assignment of Topics and Reviews


Introduction to Scientific Reading, Writing and Presenting


No Seminar


TA Feedback Sessions (per group, via Zoom)


No Seminar


TA Feedback Sessions (per group, via Zoom)


Deadline for Initial Reports and Slides


Presentation 1 and 2, Deadline for all Reviews

Christmas Break


Presentation 3 and 4


Presentation 5


No Seminar


Deadline for Final Drafts and Slides


Students may choose among the following topics for their seminar report.

1. Novel View Synthesis
Novel View Synthesis (NVS) addresses the problem of rendering a scene from unobserved viewpoints, given a number of RGB images and camera poses as input, e.g., for interactive exploration.
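The rendering step behind many recent NVS methods (e.g., NeRF-style approaches) can be illustrated with a minimal, hedged sketch of volume rendering along a single camera ray: densities sampled along the ray are converted to opacities and alpha-composited into a pixel color. All names here are illustrative, not taken from any particular codebase.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one camera ray (NeRF-style volume rendering).

    densities: volume density sigma_i at each sample (non-negative)
    colors:    (r, g, b) color at each sample
    deltas:    distance between consecutive samples
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        weight = transmittance * alpha           # contribution of this sample
        rgb = [c + weight * ci for c, ci in zip(rgb, color)]
        transmittance *= 1.0 - alpha
    return rgb

# A nearly opaque red sample in front of a blue one: the ray returns red,
# because almost no light reaches the second sample.
pixel = composite_ray([50.0, 50.0], [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [1.0, 1.0])
```

In a learned NVS method, the densities and colors would come from a neural network queried at 3D sample positions rather than being given directly.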

2. Generative models for 3D objects and scenes
Generative models like Generative Adversarial Networks allow generating images that resemble objects or scenes from the training dataset. However, by default, it is not possible to perform 3D manipulations like viewpoint changes or transformations of individual objects in the generated scenes. Recently, several works add inductive biases to generative models that enable 3D controllability.

3. 3D Reconstruction based on Deep Implicit Representations
A 3D reconstruction pipeline receives one or multiple RGB images, and optionally depth maps, and tries to infer the underlying geometry from these sparse inputs. Deep Implicit Representations compactly encode the geometry of a scene as the level set of a deep neural network. Recently, promising results have been achieved by adopting Deep Implicit Representations for 3D reconstruction.
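The level-set idea can be sketched with an analytic signed distance function standing in for the learned network; the surface is the zero level set, and sphere tracing finds it along a ray. This is a simplified illustration under that assumption, not the method of any specific paper.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface,
    positive outside. A deep implicit representation replaces this analytic
    formula with a learned network f(p)."""
    return math.dist(p, center) - radius

def ray_march(origin, direction, f, max_steps=64, eps=1e-4):
    """Sphere tracing: step along the (unit-length) ray by the current distance
    value until the zero level set, i.e. the surface, is reached."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = f(p)
        if dist < eps:
            return t  # distance from the origin to the surface
        t += dist
    return None  # ray missed the surface

# A ray starting at x = -3 pointing along +x hits the unit sphere at t = 2.
t_hit = ray_march((-3.0, 0.0, 0.0), (1.0, 0.0, 0.0), sphere_sdf)
```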

4. 3D Reconstruction based on Multi-View Stereo
Multi-View Stereo (MVS) is a classical technique for dense 3D reconstruction that takes as input multiple RGB images of a scene and the corresponding camera poses. The MVS algorithm tries to find pixels in different views (images) that correspond to the same 3D point. Solving this correspondence problem allows estimating depth and normal maps that can be processed by subsequent stages of the reconstruction pipeline into a mesh. Recently, Deep Neural Networks have been used to improve the performance of MVS.
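The correspondence search can be illustrated, under heavy simplification, for a rectified stereo pair: each candidate depth implies a disparity, and the depth whose implied correspondence matches best photometrically wins (a one-pixel plane sweep). The function and parameter names are illustrative.

```python
def best_depth(ref, src, x, focal, baseline, candidate_depths):
    """Plane-sweep over candidate depths for one pixel of a rectified stereo pair.
    Each depth hypothesis z implies a disparity d = focal * baseline / z; the
    hypothesis with the lowest photometric matching cost is selected."""
    best = None
    for z in candidate_depths:
        d = focal * baseline / z
        xs = round(x - d)                # corresponding column in the source view
        if not 0 <= xs < len(src):
            continue                     # projection falls outside the image
        cost = abs(ref[x] - src[xs])     # simple absolute-difference cost
        if best is None or cost < best[0]:
            best = (cost, z)
    return best[1] if best else None

# Toy 1D "images": a bright pixel at column 5 in the reference appears at
# column 3 in the source, i.e. disparity 2. With focal=1, baseline=4 the
# correct depth is 4 / 2 = 2.
ref = [0, 0, 0, 0, 0, 9, 0, 0]
src = [0, 0, 0, 9, 0, 0, 0, 0]
```

Real MVS systems sweep fronto-parallel planes over full images, use robust window-based costs, and handle unrectified views via the camera poses.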

5. Semantic segmentation of 3D scenes
In semantic segmentation of 3D data, the goal is to partition the input space into segments and assign a semantic label to each segment. Since in the majority of 3D vision applications the input data comes in the form of point clouds (e.g., LiDAR sensor data), the input 3D space can be discretized into a regular grid of voxels, making the problem more similar to 2D image segmentation. With the popularization of coordinate MLPs, recent methods can be applied directly to 3D points and are able to efficiently process large point clouds of indoor and outdoor scenes.
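The voxel discretization mentioned above can be sketched in a few lines: points are bucketed into grid cells, and each cell takes the majority label of its points. This is a hedged toy example with illustrative names, not a production pipeline.

```python
from collections import Counter, defaultdict

def voxelize_labels(points, labels, voxel_size):
    """Discretize a labeled 3D point cloud into a voxel grid and assign each
    occupied voxel the majority semantic label of the points inside it."""
    buckets = defaultdict(list)
    for (x, y, z), lab in zip(points, labels):
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append(lab)
    return {k: Counter(v).most_common(1)[0][0] for k, v in buckets.items()}

# Two labeled points fall into voxel (0, 0, 0), one into voxel (1, 0, 0).
points = [(0.1, 0.2, 0.0), (0.4, 0.1, 0.3), (1.2, 0.0, 0.0)]
labels = ["road", "road", "car"]
voxel_labels = voxelize_labels(points, labels, voxel_size=1.0)
```

On such a grid, 3D convolutions can then be applied analogously to 2D CNNs on images; point-based methods skip the discretization entirely.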

6. 3D object detection (and tracking)
The classical task of 2D object detection predicts a 2D bounding box that localizes an object in image space. In contrast, a 3D object detector should output a 3D bounding box that estimates the object's location, pose and size in 3D space, which is especially useful for applications such as self-driving and surveillance. Additionally, tracking detected objects across time may be of interest. 3D detection methods can work on RGB images, point clouds, or combine multiple modalities in order to solve the task.
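Detections are typically evaluated (and matched across frames for tracking) via Intersection-over-Union between boxes. A minimal sketch for axis-aligned 3D boxes follows; real driving benchmarks use rotated boxes, whose overlap computation is more involved.

```python
def iou_3d(a, b):
    """Intersection-over-Union of two axis-aligned 3D boxes, each given as
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):                       # overlap along x, y, z
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0                       # boxes do not overlap on this axis
        inter *= hi - lo
    def vol(c):
        return (c[3] - c[0]) * (c[4] - c[1]) * (c[5] - c[2])
    return inter / (vol(a) + vol(b) - inter)

# A unit cube against a copy shifted by half its width along x: IoU = 1/3.
score = iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1))
```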

7. Optical and scene flow
Both optical and scene flow deal with the problem of estimating a dense motion field between two consecutive time steps (e.g., two image captures) of a dynamic scene. Scene flow estimates, for each pixel, a 3D vector that represents the motion of the scene surface point visible in that pixel, while optical flow describes the 2D displacement of each pixel between the two frames.
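What a 2D flow field means can be made concrete by applying it: each pixel is moved by its flow vector to produce the next frame. The toy sketch below uses integer flow for clarity; real flow fields are sub-pixel and are usually applied by backward warping with interpolation.

```python
def warp_forward(image, flow):
    """Move each pixel of `image` by its optical-flow vector (u, v).
    `image` is a list of rows; `flow[y][x]` is the (u, v) displacement of
    pixel (x, y). Integer flow only, for illustration."""
    h, w = len(image), len(image[0])
    out = [[0 for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            nx, ny = x + u, y + v
            if 0 <= nx < w and 0 <= ny < h:    # drop pixels leaving the frame
                out[ny][nx] = image[y][x]
    return out

# The whole 2x2 frame translates one pixel to the right: the bright pixel
# moves from column 0 to column 1.
frame1 = [[9, 0], [0, 0]]
flow = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]
frame2 = warp_forward(frame1, flow)
```

Scene flow generalizes this to a 3D displacement per visible surface point instead of a 2D one per pixel.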

8. Visual SLAM
Visual Simultaneous Localization And Mapping (SLAM) is a technique for estimating the pose of an agent and, at the same time, a reconstruction (map) of the surrounding environment using only visual input (images). Visual SLAM is a crucial component of many autonomous systems. However, despite the popularity and breadth of deep learning applications in computer vision, current state-of-the-art systems are still based on more traditional approaches (optimization, geometry, etc.).
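The localization half can be illustrated by dead reckoning in 2D: relative pose estimates between consecutive frames (as visual odometry would produce) are chained into a global trajectory. This is a hedged toy sketch; small errors in each step accumulate as drift, which is one reason full SLAM systems add loop closures and global optimization.

```python
import math

def integrate_odometry(relative_motions):
    """Chain relative 2D motions (dx, dy, dtheta), each expressed in the
    previous frame's coordinates, into global poses (x, y, theta)."""
    x = y = theta = 0.0
    poses = [(x, y, theta)]
    for dx, dy, dtheta in relative_motions:
        # rotate the local step into the global frame, then translate
        x += math.cos(theta) * dx - math.sin(theta) * dy
        y += math.sin(theta) * dx + math.cos(theta) * dy
        theta += dtheta
        poses.append((x, y, theta))
    return poses

# Drive 1 m forward, turn 90 degrees left, drive 1 m forward:
# the agent ends up near (1, 1), facing along +y.
trajectory = integrate_odometry([(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)])
```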