Algorithms in Bioinformatics

Bachelor Theses

Here is a listing of possible topics for projects. Please come and see us if you want to learn more about the topics. Further suggestions are welcome.


  • Phylogenetic networks on SARS-CoV-2 genomes - Systematically investigate the application of phylogenetic tree and network methods - AVAILABLE
    Survey and compare different approaches to computing phylogenetic trees on SARS-CoV-2 genomes. Look into the application of phylogenetic network methods.
  • Application of AnnoTree in metagenomic analyses - AVAILABLE
    AnnoTree is a visualization tool built on re-annotations of GTDB genomes. In addition to providing visualizations, the database containing annotations is also available. This thesis will explore the idea of using AnnoTree re-annotations in functional analysis of metagenome datasets, and compare it to existing KEGG and Pfam classifications.
  • Systematic analysis of metagenomic analysis based on AnnoTree - AVAILABLE
    MEGAN and a few other tools have been used in functional analysis of metagenome data, but there has been no systematic study of how well these tools perform. The aim of this thesis is to take AnnoTree annotations as ground truth, simulate short and long reads from AnnoTree genomes, and systematically evaluate the performance of functional metagenome analysis tools on the simulated datasets.
  • Determining plant-growth promoting genes in incomplete genomes - AVAILABLE
  • Analysis of long read metagenomic samples - Comparison of methods published in Arumugam et al. 2019 and EL Moss et al. 2020 - AVAILABLE
  • Text mining for PGPT traits in publicly available literature and comparison to the novel PGPT-Ontology - AVAILABLE
    The project aims to improve and update the current collection of orthologous protein groups of plant growth-promoting traits (PGPTs) by developing an automatic approach that mines publications (e.g. via WhatIzIt, iTextMine) for newly announced PGPTs. In more detail, it will report on their taxonomic and functional assignments and on their experimental evidence.
  • Comparison of hybrid assemblers for short and long sequencing reads - AVAILABLE
    Long read sequencing methods are useful for generating genome assemblies because long reads resolve repetitive sequences better than short reads. However, they usually have a high per-base error rate. This has led to the development of hybrid assembly approaches that use both short and long read data as input. The goal of the thesis is to test different hybrid assembly approaches on real data and evaluate the assemblies with various metrics.
  • Develop a Python API for MEGAN and use it to perform analysis of metagenome data - AVAILABLE
    MEGAN is scriptable through the command line, and using a scripting language like Python makes it possible to build complex and reproducible analysis workflows. The goal of the thesis is to write a Python package that gives the user access to the most important functions of MEGAN in a Python environment. As a proof of concept, this API should then be used to conduct a metagenome analysis.


  • Analysis of metagenomic long-read sequences from a bioreactor - done
  • Distinguishing between human and microbial long reads - expired
  • Using Mash, Dashing and other k-mer approaches to compute phylogenetic networks on bacteria - done


  • Analysis of different cell populations before and after therapy - done
  • Improved taxonomic profiles based on simulated binning - done
  • Completeness, contamination and annotation of long read metagenome datasets - done


  • Analysis of antimicrobial resistance in microbiome sequencing data - done
  • Application of MetaCyc in microbiome analysis - done
  • Classification of phages and visualization - done
  • Comparison of different KEGG-based microbiome analysis approaches - done


  • Development of a mobile app/website for advising on proper antibiotic use - done
  • Phage identification in metagenomics - done
  • Fast computation of consensus splits - done
  • Structural variants of Bacteroides vulgatus - done
  • E. coli assembly - done
  • NanoChain: An Empirical Nanopore Read Simulator - done


  • Design and implementation of a web portal for microbiome data - done
  • Analysis of microbiome data in the context of obesity - done
  • Analysis of publicly available 16S human gut microbiome data - done
  • Deploying bioinformatics tools in the cloud using Docker - done
  • Steered Molecular Dynamics of 2fdt structure - done
  • Influence of Processing on 16S Analysis Data - done


  • Visualization of taxonomic data with Voronoi trees - done
  • Fast comparison of metagenomic samples - done
  • Functional analysis using BioCyc - done
  • Genes associated with diabetes in the microbiome - done


  • Comparison of metagenomic DNA - done
  • Cloud computing in bioinformatics - done
  • Android app for phylogenetic trees - done
  • Hybrid approach to analysis of 16S rRNA data - done


  • Transcriptome analysis of cancer cells - done


  • Comparison of existing metagenomics pipelines - done
  • Analysis of ancient DNA - done


  • Platform for next-gen transcriptome analysis - done