Research Topics

Visual Analytics of Life Science Data

For a better understanding of complex biological data, our group develops interactive visualization tools for diverse fields, ranging from multi-omics analysis to genomic and phylogenetic visualization.

TueVis: A central resource for visualization tools developed by visualization groups of IBMI Tuebingen.

OmicsTIDE: The Omics Trend-comparing Interactive Data Explorer (OmicsTIDE) is an interactive visual analytics tool for the integration of transcriptomics and proteomics data. The tool compares two data sets that share the same conditions. Trends of genes with shared behavior are extracted and compared visually between the two data sets using profile plots connected by a Sankey diagram. Because the two data sets are clustered jointly, genes can be grouped into those with concordant and those with discordant behavior across both sets, which provides an intuitive mental model for comparing two data sets.
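To illustrate the concordant/discordant split, the following minimal sketch assumes that every gene has already been assigned a trend-cluster label in each data set (for example by the joint clustering described above) and tabulates how genes move between clusters; these counts correspond to the link widths of a Sankey diagram. The function names and the rule "same label in both sets means concordant" are simplifying assumptions for illustration, not OmicsTIDE's implementation.

```python
from collections import Counter


def sankey_links(labels_a, labels_b):
    """Count genes flowing from each trend cluster in data set A to each in B.

    labels_a, labels_b: dicts mapping gene ID -> trend-cluster label.
    Only genes present in both data sets are considered.
    """
    shared = labels_a.keys() & labels_b.keys()
    return Counter((labels_a[g], labels_b[g]) for g in shared)


def split_concordant(labels_a, labels_b):
    """Assumption: a gene is concordant if it has the same trend label in both sets."""
    shared = sorted(labels_a.keys() & labels_b.keys())
    concordant = [g for g in shared if labels_a[g] == labels_b[g]]
    discordant = [g for g in shared if labels_a[g] != labels_b[g]]
    return concordant, discordant


# Hypothetical trend labels for four genes in a transcriptome and a proteome data set.
rna = {"geneA": "up", "geneB": "up", "geneC": "down", "geneD": "flat"}
protein = {"geneA": "up", "geneB": "down", "geneC": "down", "geneD": "up"}

links = sankey_links(rna, protein)
concordant, discordant = split_concordant(rna, protein)
print(links[("up", "up")], concordant, discordant)  # 1 ['geneA', 'geneC'] ['geneB', 'geneD']
```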

GO-Compass: A visual analytics tool for the functional comparison of long lists of genes, originating, for example, from differential expression analysis of high-throughput transcriptome data. The tool performs a GO (Gene Ontology) enrichment analysis for each gene list, resulting in lists of GO terms. These lists can be redundant, since they often contain both general parent terms and their more specific child terms. The redundancy is reduced by clustering the terms based on their semantic similarity. The results are visualized in an interactive dashboard, where users can decide on the desired level of redundancy.
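The redundancy reduction can be sketched as a greedy clustering: terms are visited from most to least significant, and each term either joins an existing representative it is similar enough to or becomes a new representative. Here semantic similarity is approximated by the Jaccard index of the terms' ancestor sets, which is an assumption for illustration (GO-Compass may use a different semantic similarity measure); the `threshold` parameter stands in for the user-chosen level of redundancy, and the term IDs are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two GO terms, approximated (assumption) by overlap of their ancestor sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def reduce_redundancy(terms, ancestors, threshold):
    """Greedily cluster GO terms; return {representative: [members]}.

    terms: list of (term_id, p_value) pairs from an enrichment analysis.
    ancestors: dict term_id -> set of ancestor IDs (including the term itself).
    threshold: similarity at or above which a term is merged into a representative;
               lower values merge more aggressively (less redundancy).
    """
    clusters = {}
    for term, _ in sorted(terms, key=lambda t: t[1]):  # most significant first
        for rep, members in clusters.items():
            if jaccard(ancestors[term], ancestors[rep]) >= threshold:
                members.append(term)
                break
        else:
            clusters[term] = [term]  # no similar representative: start a new cluster
    return clusters


# Hypothetical terms: T2 is a child of T1; T3 lies in an unrelated branch.
ancestors = {
    "T1": {"root", "A", "T1"},
    "T2": {"root", "A", "T1", "T2"},
    "T3": {"root", "B", "T3"},
}
terms = [("T1", 0.01), ("T2", 0.001), ("T3", 0.02)]
print(reduce_redundancy(terms, ancestors, threshold=0.5))
# {'T2': ['T2', 'T1'], 'T3': ['T3']}
```

Raising the threshold to 0.9 keeps all three terms as separate representatives, i.e. more redundancy is tolerated.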

Evidente: A visual analytics tool for data enrichment in SNP-based phylogenetic trees that allows parallel interaction with the phylogenetic tree as well as with the underlying SNPs and metadata. Furthermore, it allows an enrichment analysis of the different phylogenetic clades to detect over-represented features within them. These can be genomic features, such as GO terms, or taxonomic characteristics, such as antibiotic resistance.
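Over-representation of a feature within a clade is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test; Evidente's exact statistics, including any multiple-testing correction across clades, may differ, and `clade_enrichment_p` is a hypothetical name.

```python
from math import comb


def clade_enrichment_p(total, with_feature, clade_size, clade_with_feature):
    """One-sided hypergeometric p-value P(X >= clade_with_feature).

    total: number of leaves (isolates) in the tree
    with_feature: leaves carrying the feature (e.g. a resistance phenotype)
    clade_size: leaves in the clade under test
    clade_with_feature: leaves in the clade carrying the feature
    """
    upper = min(clade_size, with_feature)
    hits = sum(
        comb(with_feature, i) * comb(total - with_feature, clade_size - i)
        for i in range(clade_with_feature, upper + 1)
    )
    return hits / comb(total, clade_size)


# 20 isolates, 5 of them resistant; a clade of 5 isolates contains all 5 resistant ones.
print(clade_enrichment_p(20, 5, 5, 5))  # equals 1 / comb(20, 5) = 1 / 15504
```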

BLASTphylo: A tool that allows users to run BLAST (blastn, blastp or blastx) for a query of interest and interactively visualizes the occurrence of the query across a taxonomy. Furthermore, it performs and visualizes a phylogenetic analysis of the BLAST hits.
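Visualizing hits across a taxonomy requires aggregating hit counts from each hit's organism up to all of its ancestors. A minimal sketch, assuming a simple child-to-parent map (in practice this would come from a reference taxonomy such as the NCBI Taxonomy); the toy taxonomy and the function name are hypothetical, and this is not BLASTphylo's code.

```python
from collections import Counter


def propagate_hits(hit_taxa, parent):
    """Count BLAST hits per taxon, including hits in all descendant taxa.

    hit_taxa: list of taxon names, one per BLAST hit (the hit's source organism)
    parent: dict taxon -> parent taxon; the root maps to None
    """
    counts = Counter()
    for taxon in hit_taxa:
        node = taxon
        while node is not None:  # walk from the hit's taxon up to the root
            counts[node] += 1
            node = parent[node]
    return counts


# Toy taxonomy (hypothetical, heavily abbreviated).
parent = {
    "Bacteria": None,
    "Bacillota": "Bacteria",
    "Staphylococcus": "Bacillota",
    "Bacillus": "Bacillota",
    "Pseudomonadota": "Bacteria",
    "Escherichia": "Pseudomonadota",
}
counts = propagate_hits(["Staphylococcus", "Staphylococcus", "Bacillus", "Escherichia"], parent)
print(counts["Bacteria"], counts["Bacillota"], counts["Staphylococcus"])  # 4 3 2
```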

Dimensionality Reduction

VIPurPCA: This tool offers a visualization of uncertainty propagated through principal component analysis (PCA), a widely used dimensionality reduction technique. It combines classic error propagation by linearization with the power of modern automatic differentiation. The tool visualizes the output uncertainty in an animated fashion such that researchers can assess the stability of the low-dimensional map. This project is part of the Cluster of Excellence Machine Learning in the Sciences.
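The idea of propagating uncertainty by linearization can be shown on a toy case: the variance of the leading eigenvalue of a 2×2 covariance matrix, given variances of its entries, is approximated by the first-order formula Var[f] ≈ Σᵢ (∂f/∂xᵢ)² Var[xᵢ]. This sketch makes strong simplifying assumptions that VIPurPCA does not: only the leading eigenvalue (not the eigenvectors) is considered, the matrix entries are treated as uncorrelated, and central finite differences stand in for automatic differentiation. All names are hypothetical.

```python
import math


def leading_eigenvalue(a, b, d):
    """Largest eigenvalue of the symmetric 2x2 covariance matrix [[a, b], [b, d]]."""
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)


def gradient(f, x, h=1e-6):
    """Central finite-difference gradient (a stand-in for automatic differentiation)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(*up) - f(*down)) / (2 * h))
    return grad


def propagate_variance(f, x, variances):
    """First-order (linearized) propagation: Var[f] ~ sum_i (df/dx_i)^2 Var[x_i].

    Assumes the inputs x are uncorrelated.
    """
    return sum(g ** 2 * v for g, v in zip(gradient(f, x), variances))


# Entries (a, b, d) of a diagonal covariance matrix and their (hypothetical) variances.
var = propagate_variance(leading_eigenvalue, [3.0, 0.0, 1.0], [0.04, 0.01, 0.09])
print(round(var, 6))  # 0.04
```

For this diagonal example the leading eigenvalue equals a, so only the uncertainty of a propagates to the output.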

Development of Methods for Expression Analysis

Understanding the basic principles of gene expression, and of gene regulation in particular, is still one of the open problems in biology. We develop and apply algorithms and tools for the analysis and visualization of large-scale expression data.

Mayday and Mayday SeaSight provide a powerful workbench for data from microarray and next-generation sequencing technologies, including a graphical user interface with a flexible and fully controllable approach to background correction, normalization and expression value computation from heterogeneous data.
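As an example of one normalization step, quantile normalization forces all samples onto a common value distribution: each value is replaced by the mean, across samples, of the values with the same rank. This is a generic sketch of that well-known method, not Mayday's implementation; for brevity, tied values are not averaged as some implementations do.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples so they share the same value distribution.

    columns: list of samples, each a list of expression values (one per gene),
             all of equal length. Ties are resolved by order of appearance.
    """
    n_genes = len(columns[0])
    # Rank order of genes within each sample (indices sorted by value).
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest values across samples.
    reference = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n_genes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for k, gene in enumerate(order):
            out[gene] = reference[k]  # gene with rank k gets the k-th reference value
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```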

DeSeq2-Vis is a Shiny app that makes the DESeq2 R package usable interactively in a user-friendly web app, without requiring any programming expertise. DESeq2 is an R package commonly used for differential expression analysis, but using it directly requires some programming experience. In addition to the standard DESeq2 methods, DeSeq2-Vis provides further normalization options as well as the visualization of gene profiles.
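The core of DESeq2's normalization is the median-of-ratios method: each sample's size factor is the median, over genes, of the ratio of its count to that gene's geometric mean across samples. The following stand-alone sketch reproduces that idea in plain Python for illustration; it is not DESeq2 or DeSeq2-Vis code, and in practice one would call DESeq2's own `estimateSizeFactors`.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw read counts (one per sample).
    Genes with a zero count in any sample are skipped, since their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median on the log scale, then back-transform.
    return [math.exp(median(r)) for r in log_ratios]


# Sample 2 was sequenced twice as deeply as sample 1; the last gene is skipped.
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
print([round(s, 4) for s in size_factors(counts)])  # [0.7071, 1.4142]
```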

Development of Methods for Transcriptome Architecture Analysis

Besides quantifying gene expression, the identification and prediction of transcriptional features play an important role in our research. For example, with TSSpredator we have developed software for the automated detection and classification of transcription start sites (TSS) from 5’-enriched RNA-seq data. Moreover, TSSCaptur characterizes TSS signals identified from any 5’-enriched RNA-seq data that cannot be assigned to annotated genes: it predicts a plausible 3’ end and a function for the transcript, and it runs a motif analysis on the promoter regions to identify known or novel transcription factor binding sites. With nocoRNAc, we provide a program for the prediction and characterization of non-coding RNA (ncRNA) transcripts in bacteria, which can operate solely on the genomic sequence of the target organism.
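To make the TSS classes concrete, the following sketch assigns classes in the spirit of TSSpredator: a TSS within a maximal UTR length upstream of a gene on the same strand is primary if it is the strongest such TSS for that gene and secondary otherwise, a TSS inside a gene on the sense strand is internal, one on the opposite strand inside or near a gene is antisense, and anything else is orphan. The thresholds (300 nt UTR, 100 nt antisense distance), the input tuples and the function name are assumptions for illustration; TSSpredator's actual parameters and rules may differ.

```python
def classify_tss(tss_list, genes, max_utr=300, antisense_dist=100):
    """Assign TSS classes (assumed rules and thresholds, see lead-in).

    tss_list: list of (position, strand, height) tuples
    genes: list of (name, start, end, strand) with start <= end (1-based)
    Returns one set of class labels per TSS (a TSS may get several).
    """
    classes = [set() for _ in tss_list]
    upstream = {}  # gene name -> indices of TSS in its 5' UTR window
    for i, (pos, strand, _) in enumerate(tss_list):
        for name, start, end, g_strand in genes:
            five_prime = start if g_strand == "+" else end
            utr_len = five_prime - pos if g_strand == "+" else pos - five_prime
            if strand == g_strand:
                if 0 <= utr_len <= max_utr:
                    upstream.setdefault(name, []).append(i)
                elif start <= pos <= end:
                    classes[i].add("internal")
            elif start - antisense_dist <= pos <= end + antisense_dist:
                classes[i].add("antisense")
    for name, idxs in upstream.items():
        strongest = max(idxs, key=lambda k: tss_list[k][2])
        for i in idxs:
            classes[i].add("primary" if i == strongest else "secondary")
    for c in classes:
        if not c:
            c.add("orphan")
    return classes


genes = [("geneA", 1000, 2000, "+"), ("geneB", 5000, 6000, "-")]
tss = [(900, "+", 50), (950, "+", 80), (1500, "+", 30),
       (1500, "-", 20), (6150, "-", 40), (8000, "+", 10)]
print(classify_tss(tss, genes))
# [{'secondary'}, {'primary'}, {'internal'}, {'antisense'}, {'primary'}, {'orphan'}]
```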


Taxonomic classification is the process of assigning a sequence to a specific node in the taxonomy at a predefined taxonomic rank, e.g. genus. This is particularly relevant in metagenomics, where the origin of millions of reads needs to be identified. Recently, we published a paper that uses k-mer frequencies, feature space balancing, and simple machine learning models to classify DNA sequences of fixed length (1500 nt) at the superkingdom, phylum and genus levels. We are currently developing an approach that classifies DNA sequences at the species level.
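A k-mer frequency representation of a DNA sequence can be computed as follows. This is a minimal sketch; the value of k, the handling of ambiguous bases and the function name are assumptions, and the feature space balancing and classifiers of our published approach are not reproduced here. The resulting fixed-length vectors could then be passed to any standard classifier.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers in a sequence, in lexicographic order.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


freqs = kmer_frequencies("ACGTACGT", k=2)
print(len(freqs), freqs[1])  # 16 features; "AC" (index 1) occurs in 2 of 7 windows
```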

Machine Learning for Bacterial MS/MS Data

Cells can modify proteins by attaching small chemical groups, such as phosphate or acetyl groups, to amino acids. In the past, these post-translational modifications (PTMs) have mostly been overlooked in bacteria, where they occur at lower abundance and are therefore harder to measure than in eukaryotes. Nevertheless, bacterial cells use this mechanism in a wide range of important cellular processes, such as signal transduction, regulation of metabolism and pathogenicity. In collaboration with the Compomics group headed by Lennart Martens (Ghent University) and the Quantitative Proteomics group headed by Boris Macek, we apply and develop novel machine-learning-based strategies to analyze mass spectrometry data in order to gain a deeper and more comprehensive understanding of the highly diverse epi-proteome of bacteria. This project is part of the Cluster of Excellence Machine Learning in the Sciences.

Tübingen KI Zentrum für Mediziner (TüKITZMed)

The use of artificial intelligence (AI) is becoming increasingly important, especially in the field of medicine. This development presents prospective medical professionals with the challenge of not only acquiring sound specialist knowledge in the medical field, but also developing qualified application expertise and orientation knowledge for the appropriate use of AI applications. In order to ensure the efficient use of AI in medical practice, targeted training of people who work with these technologies is essential. Understanding how AI systems work and how they can be integrated into everyday medical practice is crucial in order to optimize the benefits of these technologies. This is why the BMBF-funded "TüKITZMed" project has set itself the goal of developing and establishing a cross-faculty, interprofessional curriculum with a focus on "AI in medicine" in the long term.

The curriculum of TüKITZMed offers courses in mathematics and machine learning methods at different levels, depending on the goals of the participants. The extent to which machine learning methods are already being used in medical research and in everyday clinical practice is also part of the curriculum.

For more information, visit TüKITZMed.

Computational Paleogenetics and Ancient Genomics

We develop automated analysis pipelines that perform mapping-based reconstruction of genomes from short-read data (see EAGER and MUSIAL) and integrate methods specifically tailored to ancient DNA (aDNA). We also develop de novo assembly methods (see MADAM) that are tuned for DNA data from ancient bacteria.
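The final step of a mapping-based reconstruction is typically consensus calling from the bases of the reads aligned to each reference position. The sketch below calls a base only where coverage and base agreement reach thresholds and writes `N` otherwise; the thresholds and the input format are hypothetical, and aDNA-specific handling (for example, accounting for C-to-T changes caused by deamination near read ends) is deliberately left out. It is not code from EAGER or MUSIAL.

```python
from collections import Counter


def call_consensus(pileup, min_coverage=3, min_fraction=0.9):
    """Majority consensus from per-position base observations.

    pileup: list, one entry per reference position, of the bases observed in
            mapped reads at that position (e.g. "AAAT").
    Positions with too little coverage or no sufficiently dominant base get "N".
    Thresholds are illustrative assumptions.
    """
    consensus = []
    for bases in pileup:
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_coverage:
            consensus.append("N")
            continue
        base, n = counts.most_common(1)[0]
        consensus.append(base if n / depth >= min_fraction else "N")
    return "".join(consensus)


pileup = ["AAAA", "CC", "GGGGT", "TTTT"]
print(call_consensus(pileup))  # ANNT (low coverage at pos 2, only 80% agreement at pos 3)
```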


Our group focuses on the research and development of methods for computing pan-genomes (see PanGee) using our SuperGenome approach, and for the interactive visualization of pan-genomes (see PanTetris).
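A common coordinate system for several genomes can be sketched by mapping each genome position to the alignment column that contains it. In this sketch we assume the common (SuperGenome-like) coordinates are simply the columns of a single gap-padded multiple alignment block without rearrangements or strand changes, which real whole-genome alignments do not guarantee; the names are hypothetical and this is not PanGee code.

```python
def supergenome_maps(alignment):
    """Map each genome's own coordinates to common alignment-column coordinates.

    alignment: dict genome name -> aligned sequence of equal length ('-' = gap)
    Returns dict genome name -> list where entry p-1 is the 1-based
    common column of the genome's position p.
    """
    lengths = {len(row) for row in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    maps = {}
    for name, row in alignment.items():
        # Every non-gap character is one genome position; record its column.
        maps[name] = [col for col, base in enumerate(row, start=1) if base != "-"]
    return maps


maps = supergenome_maps({"g1": "AC-GT", "g2": "ACTG-"})
print(maps)  # {'g1': [1, 2, 4, 5], 'g2': [1, 2, 3, 4]}
```

In this toy example, position 3 of g1 and position 4 of g2 (both G) fall into the same common column 4, so features annotated in either genome can be compared directly.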


Cooperation Projects

Omics analysis for the transregional collaborative research center TRR 261

Cellular Mechanisms of Antibiotic Action and Production (ANTIBIOTIC CellMAP): As partners, we support TRR 261 with the generation, bioinformatic analysis, and interpretation of genomics and transcriptomics data to study bacterial adaptation.

Bioinformatic analysis for the Cluster of Excellence "Controlling Microbes to Fight Infections" (CMFI)

Researchers in the Cluster of Excellence Controlling Microbes to Fight Infections aim to find new, targeted agents that have a positive effect on the microbiome. Beneficial bacteria are known to help keep harmful ones in check. To understand and exploit the underlying mechanisms, the Cluster of Excellence brings together researchers from molecular, bioinformatic and clinical disciplines. Our group carries out a range of bioinformatic analyses to help reach the goals of this Cluster of Excellence.

Genomic Landscape of Treponema pallidum

Our group is part of an international research cooperation (with Natasha Arora (University of Zürich), Marta Diaz (University of Valencia), Fernando Gonzalo (University of Valencia), Justin Radolf (UConn Health), Kelly Hawley (UConn Health) and Jonathan Parr (UNC School of Medicine)) that aims to develop a globally applicable syphilis vaccine. We reconstruct and analyze large numbers of genomes from short-read sequencing data of DNA extracted from clinical samples of individuals infected with the bacterium Treponema pallidum, with a special focus on exploring the variability of outer membrane proteins.
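One simple way to locate variable regions, for example in outer membrane protein genes, is the Shannon entropy of each column of a multiple alignment. The sketch below is an illustrative choice of variability measure, not necessarily the one used in the project; gaps are treated as an additional symbol, and the example sequences are made up.

```python
import math
from collections import Counter


def column_entropy(sequences):
    """Shannon entropy (bits) of each alignment column; gaps count as a symbol.

    sequences: aligned sequences of equal length.
    Higher values indicate more variable positions.
    """
    entropies = []
    for column in zip(*sequences):
        counts = Counter(column)
        n = len(column)
        # H = sum_x p(x) * log2(1 / p(x))
        entropies.append(sum(c / n * math.log2(n / c) for c in counts.values()))
    return entropies


seqs = ["ATG", "ATC", "AAG", "ATT"]
print([round(e, 3) for e in column_entropy(seqs)])  # [0.0, 0.811, 1.5]
```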

Post-translational modifications in bacteria

Collaboration with the Compomics group headed by Lennart Martens (Ghent University), aimed at applying open modification search methods to find PTMs in bacterial proteomics data. This is one of the international cooperations of the Cluster of Excellence Machine Learning in the Sciences.
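The core idea of an open modification search is to allow an arbitrary mass difference between the observed precursor and the unmodified peptide and then interpret that difference. A minimal sketch, assuming a tiny list of monoisotopic shifts for phosphorylation and acetylation (in practice candidates would come from a modification database such as Unimod) and a ppm tolerance chosen for illustration; real searches additionally score the fragment spectra, and the function name is hypothetical.

```python
# Illustrative subset: monoisotopic mass shifts (Da) of two common PTMs.
KNOWN_SHIFTS = {
    "Phospho": 79.966331,
    "Acetyl": 42.010565,
}


def explain_mass_shift(precursor_mass, peptide_mass, tolerance_ppm=10.0):
    """Return PTM names whose mass matches the observed precursor mass shift.

    precursor_mass: observed (neutral) precursor mass in Da
    peptide_mass: theoretical mass of the unmodified peptide in Da
    tolerance_ppm: assumed mass tolerance, relative to the precursor mass
    """
    delta = precursor_mass - peptide_mass
    tolerance = precursor_mass * tolerance_ppm * 1e-6
    return [name for name, shift in KNOWN_SHIFTS.items() if abs(delta - shift) <= tolerance]


print(explain_mass_shift(1079.9665, 1000.0))  # ['Phospho']
```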

Genomic and Transcriptomic Analyses of Enterohaemorrhagic Escherichia coli (EHEC)

Collaboration with Herbert Schmidt (University of Hohenheim).

Genomic and Transcriptomic Analyses of Lactococcus lactis

Collaboration with Herbert Schmidt (University of Hohenheim).

Microbial Remediation of Overexploited Soils in Malawi

Collaboration with Herbert Schmidt (University of Hohenheim) and Keston Nijra (University of Malawi).

Former Projects


Pathogenomics of Staphylococci

Friedrich Götz (Microbial Genetics, University of Tübingen), Ralph Bertram (Microbial Genetics, University of Tübingen), Jörg Bernhardt (Microbial Physiology and Molecular Biology, University of Greifswald)

In this collaboration project we computationally identified several non-coding RNAs in Staphylococcus equorum that putatively act as antisense RNAs in a type I toxin-antitoxin (TA) system. With further in silico analyses, we assessed their structural conservation as well as their potential for RNA-RNA interaction with their target mRNAs. Publication within the project:
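RNA-RNA interaction potential is usually predicted with energy-based tools (for example IntaRNA), but the underlying notion of complementarity can be illustrated by finding the longest stretch of perfect Watson-Crick pairing between a small RNA and an mRNA. This sketch ignores G-U wobble pairs, bulges and accessibility, uses hypothetical names and sequences, and is not the analysis used in the publication.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (T is treated as U)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq.upper().replace("T", "U")))


def longest_complementary_stretch(srna, mrna):
    """Length and mRNA start (0-based) of the longest perfect Watson-Crick duplex.

    Found as the longest common substring between the mRNA and the reverse
    complement of the sRNA, via dynamic programming in O(len(srna) * len(mrna)).
    """
    target = mrna.upper().replace("T", "U")
    probe = reverse_complement_rna(srna)
    best_len, best_end = 0, 0
    prev = [0] * (len(probe) + 1)
    for i in range(1, len(target) + 1):
        curr = [0] * (len(probe) + 1)
        for j in range(1, len(probe) + 1):
            if target[i - 1] == probe[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the match ending at (i-1, j-1)
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, best_end - best_len


print(longest_complementary_stretch("GGUAGCC", "AAAGGCUACCAAA"))  # (7, 3)
```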

Schuster CF, Park JH, Prax M, Herbig A, Nieselt K, Rosenstein R, Inouye M, Bertram R.
Characterization of a mazEF toxin-antitoxin homologue from Staphylococcus equorum.
J Bacteriol 2013, 195(1):115-25.


Steffen Hüttner (HB Technologies), Michael Bonin (Microarray Facility Tübingen)
State-of-the-art RNA-seq protocols allow gene expression profiling of known genes, annotation of unknown transcripts, differential splicing analysis, variant calling and estimation of allele-specific expression. The NGS technologies used for this produce tens of millions of reads, which in turn require substantial computing resources for subsequent analyses. One bottleneck is the mapping step, which requires not only powerful compute resources but also a reference genome. PASSAGE, short for ‘Parallel Sequencing Systems for the Analysis of Gene Expression’, is a newly developed experimental protocol with accompanying computational methods.

PASSAGE extends the idea of SAGE (serial analysis of gene expression) by sequencing only reads that originate from well-defined genomic positions. This is achieved with a specialized library preparation protocol, in which full-length cDNAs are synthesized and digested with RsaI.
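RsaI recognizes the sequence GTAC and cuts between T and A (GT^AC), leaving blunt ends; because every copy of a transcript is cut at the same sites, fragments start and end at well-defined positions. The in silico digestion below illustrates this; which fragment ends are actually sequenced is determined by the library protocol and is not modeled, and the function name is hypothetical.

```python
def rsai_fragments(cdna):
    """Split a cDNA sequence at RsaI sites (GT^AC, blunt cut between T and A)."""
    seq = cdna.upper()
    cuts = []
    start = seq.find("GTAC")
    while start != -1:
        cuts.append(start + 2)  # cut after "GT"
        start = seq.find("GTAC", start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(rsai_fragments("AAGTACCCGTACTT"))  # ['AAGT', 'ACCCGT', 'ACTT']
```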

We have developed an efficient algorithm that rapidly clusters reads from a common genomic locus and estimates expression levels for the corresponding transcripts in time linear in the number of reads. Because it does not need a reference genome, PASSAGE is an ideal system for high-throughput gene expression studies of non-model organisms.
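Linear-time clustering becomes plausible once reads from the same locus share an identical start, as the well-defined read origins of the library protocol described above suggest: reads can then be bucketed in a single pass by a hash of their leading bases. This is a deliberate simplification, not the published algorithm: exact prefix matching does not tolerate sequencing errors, and `prefix_len` is an arbitrary illustrative parameter.

```python
from collections import defaultdict


def cluster_by_prefix(reads, prefix_len=20):
    """Group reads sharing the same leading prefix_len bases in a single pass.

    Each dictionary insertion and lookup takes expected O(1) time per read
    (for a fixed prefix length), so total time grows linearly with the number
    of reads. Returns dict prefix -> list of read indices; the cluster size
    can serve as a simple expression estimate. Reads shorter than prefix_len
    are skipped.
    """
    clusters = defaultdict(list)
    for i, read in enumerate(reads):
        if len(read) >= prefix_len:
            clusters[read[:prefix_len]].append(i)
    return dict(clusters)


reads = ["ACGTAA", "ACGTCC", "TTGCAA", "ACGTGG", "TTG"]
print(cluster_by_prefix(reads, prefix_len=4))  # {'ACGT': [0, 1, 3], 'TTGC': [2]}
```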

PASSAGE is supported by the Zentrales Innovationsprogramm Mittelstand (ZIM) (AIF) to establish a full-service technology platform together with HB Technologies (Dr. Steffen Hüttner) and MFT (Dr. Michael Bonin). Publication within this project:

Battke F, Körner S, Hüttner S, Nieselt K.
Efficient sequence clustering for RNA-seq data without a reference genome.
Proceedings of the German Conference on Bioinformatics 2010, Lecture Notes in Informatics P-173:21-30.


A social network for collaboration projects


Collact is a social network focused on online collaboration, where users create projects and manage them from anywhere, at any time. Collact helps people get in contact with their colleagues, project partners, employees and students, and run projects together. It offers a clean, simple and user-friendly interface and useful tools such as an integrated QR-code generator, a BibTeX importer and a Twitter topic analyzer (