Projects
Current Research Topics
- Visual Analytics of Life Science Data
- Dimensionality Reduction
- Expression Analysis
- Transcriptome Analysis
- Machine Learning for Bacterial MS/MS Data
- Tübingen KI Zentrum für Mediziner (TüKITZMed)
- Computational Paleogenetics and Ancient Genomics
- Pan-Genomics
Current Cooperation Projects
- Omics analysis for the transregional collaborative research center TRR 261
- Bioinformatic analysis for the Cluster of Excellence "Controlling Microbes to Fight Infection" (CMFI)
- Genomic Landscape of Treponema pallidum
- Genomic and Transcriptomic Analyses of Enterohaemorrhagic Escherichia coli (EHEC)
- Post-translational modifications in bacteria
- Genomic and Transcriptomic Analyses of Lactococcus lactis
- Microbial Remediation of Overexploited Soils in Malawi
Former Projects
(to be completed)
Research Topics
Visual Analytics of Life Science Data
To support a better understanding of complex biological data, our group develops interactive visualization tools that cover diverse fields, ranging from multi-omics analysis to genomic and phylogenetic visualization.
TueVis: A central resource for visualization tools developed by visualization groups of IBMI Tuebingen.
OmicsTIDE: Omics Trend-comparing Interactive Data Explorer (OmicsTIDE) is an interactive visual analytics tool for the integration of transcriptomics and proteomics data. The tool offers a comparison of two data sets that share the same conditions. Trends of shared behavior of genes are extracted and compared visually between the two data sets using profile plots that are connected by a Sankey diagram. Since the data sets are clustered together, the genes can be grouped into genes with discordant and concordant behavior in both sets, which provides an intuitive mental model for comparing two data sets.
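The split into concordant and discordant genes can be illustrated with a minimal sketch. It is a simplification and not the OmicsTIDE implementation: instead of joint clustering, it assumes a gene's trend is simply the sign pattern of its changes between consecutive conditions. The function names `trend` and `split_concordant` and the gene labels are hypothetical.

```python
def trend(profile):
    """Sign pattern of changes between consecutive conditions: +1 up, -1 down, 0 flat."""
    return tuple((b > a) - (b < a) for a, b in zip(profile, profile[1:]))


def split_concordant(dataset1, dataset2):
    """Split genes present in both data sets into concordant and discordant lists.

    Each data set maps a gene name to its values over the same ordered conditions.
    Assumption for illustration: two profiles are concordant when their sign
    patterns are identical.
    """
    concordant, discordant = [], []
    for gene in sorted(dataset1.keys() & dataset2.keys()):
        if trend(dataset1[gene]) == trend(dataset2[gene]):
            concordant.append(gene)
        else:
            discordant.append(gene)
    return concordant, discordant


# Toy example: transcript and protein levels over three conditions.
transcriptome = {"geneA": [1, 2, 3], "geneB": [3, 2, 1], "geneC": [1, 2, 1]}
proteome = {"geneA": [10, 20, 40], "geneB": [1, 2, 3], "geneD": [1, 1, 1]}
concordant, discordant = split_concordant(transcriptome, proteome)
```

Genes that occur in only one of the two data sets are ignored in this sketch.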
GO-Compass: A visual analytics tool for the functional comparison of long lists of genes, originating, for example, from the analysis of differential expression in high throughput transcriptome data. The tool performs a GO (Gene Ontology) enrichment for each of the lists resulting in lists of GO terms. Lists of GO terms can be redundant, since they often contain parent and subcategories of functionality. The redundancy is reduced using a clustering based on semantic similarity. The results are visualized in an interactive dashboard, where users can interactively decide on the desired level of redundancy.
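The idea of reducing redundancy at a user-chosen level can be sketched as follows. This is a greedy stand-in for illustration only; the clustering method used by GO-Compass is not specified here. The function name `reduce_redundancy`, the term labels and the similarity values are all hypothetical.

```python
def reduce_redundancy(terms, similarity, threshold):
    """Greedily group terms around representatives.

    terms:      term names, ordered from most to least important.
    similarity: dict mapping frozenset({term_a, term_b}) to a similarity in [0, 1];
                missing pairs are treated as similarity 0.
    threshold:  a term joins the first representative it is at least this similar to.
    Returns a dict mapping each representative to the members of its group.
    """
    groups = {}
    for term in terms:
        for representative in groups:
            if similarity.get(frozenset((term, representative)), 0.0) >= threshold:
                groups[representative].append(term)
                break
        else:
            # No sufficiently similar representative found: start a new group.
            groups[term] = [term]
    return groups


# Toy input: a parent term, one of its subcategories, and an unrelated term.
terms = ["parent_term", "child_term", "unrelated_term"]
similarity = {
    frozenset(("parent_term", "child_term")): 0.9,
    frozenset(("parent_term", "unrelated_term")): 0.1,
    frozenset(("child_term", "unrelated_term")): 0.2,
}
coarse = reduce_redundancy(terms, similarity, threshold=0.7)
fine = reduce_redundancy(terms, similarity, threshold=0.95)
```

Lowering the threshold merges more terms into each group, which corresponds to a stronger reduction of redundancy.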
Evidente: A visual analytics tool for data enrichment in SNP-based phylogenetic trees that allows parallel interaction with the phylogenetic tree as well as with the underlying SNPs and metadata. Furthermore, it allows an enrichment analysis on the different phylogenetic clades to detect over-represented features within them. These could be genomic features such as GO terms, or taxonomic characteristics such as antibiotic resistance.
BLASTphylo: A tool that allows users to run BLAST (blastn, blastp or blastx) for a query of interest and interactively visualizes the occurrence of the query across a taxonomy. Furthermore, it performs and visualizes a phylogenetic analysis of the BLAST hits.
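One common way to test whether a feature is over-represented in a clade is a one-sided hypergeometric test. The sketch below shows that test under the assumption that samples in a clade can be treated as a draw without replacement from all samples in the tree. Whether Evidente uses exactly this test is not stated here, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_samples, n_with_feature, clade_size, clade_with_feature):
    """Probability of seeing at least `clade_with_feature` feature carriers in a clade.

    n_samples:          number of samples in the whole tree
    n_with_feature:     samples in the tree that carry the feature
    clade_size:         number of samples in the clade of interest
    clade_with_feature: samples in the clade that carry the feature
    """
    max_hits = min(n_with_feature, clade_size)
    all_draws = comb(n_samples, clade_size)
    # Sum the hypergeometric probabilities of the observed count and all larger counts.
    tail = sum(
        comb(n_with_feature, k) * comb(n_samples - n_with_feature, clade_size - k)
        for k in range(clade_with_feature, max_hits + 1)
    )
    return tail / all_draws


# Toy example: 5 of 10 samples carry the feature, and all 5 sit in one clade of size 5.
p = enrichment_p_value(10, 5, 5, 5)
```

When many clades and features are tested, the resulting p-values would additionally need a multiple-testing correction.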
Dimensionality Reduction
VIPurPCA: This tool offers a visualization of uncertainty propagated through principal component analysis (PCA), a widely used dimensionality reduction technique. It combines classic error propagation by linearization with the power of modern automatic differentiation. The tool visualizes the output uncertainty in an animated fashion such that researchers can assess the stability of the low-dimensional map. This project is part of the Cluster of Excellence Machine Learning in the Sciences.
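Error propagation by linearization can be illustrated on a deliberately small case. The sketch below is not the VIPurPCA implementation: it uses two-dimensional data, takes the angle of the first principal axis as the only output, assumes independent input errors, and replaces automatic differentiation with central finite differences. All function names are hypothetical.

```python
import math


def first_pc_angle(points):
    """Angle in radians of the first principal axis of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed form for the leading eigenvector direction of a 2x2 covariance matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def propagate_angle_variance(points, sigmas, eps=1e-6):
    """First-order variance of the principal-axis angle.

    sigmas[i][j] is the assumed standard deviation of points[i][j].
    With independent inputs, the linearized output variance is
    sum over inputs of (d angle / d input)^2 * sigma^2.
    """
    variance = 0.0
    for i in range(len(points)):
        for j in range(2):
            plus = [list(p) for p in points]
            minus = [list(p) for p in points]
            plus[i][j] += eps
            minus[i][j] -= eps
            # Central finite difference stands in for automatic differentiation.
            derivative = (first_pc_angle(plus) - first_pc_angle(minus)) / (2 * eps)
            variance += (derivative * sigmas[i][j]) ** 2
    return variance


points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.9)]
sigmas = [(0.1, 0.1)] * len(points)
angle_variance = propagate_angle_variance(points, sigmas)
```

A small variance of the angle suggests that the orientation of the low-dimensional map is stable under the assumed measurement errors. The linearization is only trustworthy when the input errors are small relative to the spread of the data.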
Development of Methods for Expression Analysis
The understanding and analysis of the basic principles of gene expression and moreover gene regulation is still one of the open and unsolved problems in biology. We develop and apply algorithms and tools for the analysis and visualization of large-scale expression data.
Mayday and Mayday SeaSight provide a powerful workbench for the analysis of microarray and next-generation sequencing data, including a graphical user interface with a flexible and fully controllable approach to background correction, normalization and expression value computation from heterogeneous data.
DeSeq2-Vis is a Shiny app that makes the DESeq2 R package usable interactively in a user-friendly web app, without any programming expertise. DESeq2 is an R package commonly used for calculating differential expression, and using it directly requires some programming experience. In addition to providing the standard DESeq2 methods, DeSeq2-Vis provides further normalization options, as well as the visualization of gene profiles.
Development of Methods for Transcriptome Architecture Analysis
Besides quantifying gene expression, the identification and prediction of transcriptional features plays an important role in our research. For example, with TSSpredator we have developed software for the automated detection and classification of transcription start sites (TSS) from 5’ enriched RNA-seq data. Moreover, TSSCaptur allows the characterization of TSS signals identified from any 5’ enriched RNA-seq data that cannot be allocated to known labelled genes. It predicts a plausible 3’ end and a function of the transcript. Furthermore, it runs a motif analysis on the promoter regions to identify known or novel transcription factor binding sites. With nocoRNAc, we provide a program for the prediction and characterization of ncRNA transcripts in bacteria, which is able to operate solely on the genomic sequence of the target organism.
Machine Learning for Bacterial MS/MS Data
Cells can modify proteins by attaching small molecules like phosphate or acetyl groups to the amino acids. In the past, these post-translational modifications (PTMs) have mostly been overlooked in bacteria, where they occur in lower abundance and are therefore harder to measure than in eukaryotes. Nevertheless, bacterial cells use this mechanism in a wide range of important cellular processes such as signal transduction, metabolism regulation and pathogenicity. In collaboration with the Compomics group headed by Lennart Martens (Ghent University) and the Quantitative Proteomics group headed by Boris Macek, we apply and develop novel machine-learning-based strategies to analyze mass spectrometry data in order to gain a deeper and more comprehensive understanding of the highly diverse epi-proteome of bacteria. This project is part of the Cluster of Excellence Machine Learning in the Sciences.
Tübingen KI Zentrum für Mediziner (TüKITZMed)
In this project, an online learning environment for medical personnel is developed. The main goal is to convey knowledge about machine learning and artificial intelligence to learners at different knowledge levels. To this end, TüKITZMed provides videos with various difficulty levels. In addition, all videos are connected via a graph structure to guide the user through the learning environment. These interconnections allow users to freely traverse the video pool and find their personal learning path. Since knowledge about the ever-growing field of machine learning and artificial intelligence requires further reading, TüKITZMed also provides links to external knowledge resources.
The content of TüKITZMed is structured into fundamentals, advanced topics, and practical applications. Connections exist between all three areas, so the user can, for example, start with an application before learning anything about the underlying machine learning method. Additionally, some applications can be used and tested in virtual reality (VR) as well as augmented reality (AR).
Computational Paleogenetics and Ancient Genomics
Cooperation Projects
Omics analysis for the transregional collaborative research center TRR 261
Cellular Mechanisms of Antibiotic Action and Production (ANTIBIOTIC CellMAP). As partners, we support the TRR261 with the generation, bioinformatics analyses, and interpretation of genomics and transcriptomics data to analyze bacterial adaptation.
Bioinformatic analysis for the Cluster of Excellence "Controlling Microbes to Fight Infection" (CMFI)
Researchers in the Cluster of Excellence Controlling Microbes to Fight Infections aim to find new, targeted agents which will have a positive effect on the microbiome. We know that useful bacteria help to keep down the harmful ones. In order to understand and exploit the underlying mechanisms, the Cluster of Excellence will bring together researchers from molecular, bioinformatic and clinical disciplines. Our group runs different bioinformatic analyses to reach the goals of this Cluster of Excellence.
Genomic Landscape of Treponema pallidum
Our group is part of an international research cooperation (with Natasha Arora (University of Zürich), Marta Diaz (University of Valencia), Fernando Gonzalo (University of Valencia), Justin Radolph (UConn Health), Kelly Hawley (UConn Health) and Jonathan Parr (UNC School of Medicine)) aiming for the development of a globally applicable syphilis vaccine. We reconstruct and analyze large numbers of genomes from short-read DNA extracted from clinical samples of individuals infected with the bacterium Treponema pallidum, with a special focus on exploring the variability of outer membrane proteins.
Post-translational modifications in bacteria
Collaboration with the Compomics Group headed by Lennart Martens, aimed at applying open modification search methods to find PTMs in bacterial proteomics data. This is one of the international cooperations of the Cluster of Excellence Machine Learning in the Sciences.
Genomic and Transcriptomic Analyses of Enterohaemorrhagic Escherichia coli (EHEC)
Collaboration with Herbert Schmidt (University of Hohenheim).
Genomic and Transcriptomic Analyses of Lactococcus lactis
Collaboration with Herbert Schmidt (University of Hohenheim).
Microbial Remediation of Overexploited Soils in Malawi
Collaboration with Herbert Schmidt (University of Hohenheim) and Keston Nijra (University of Malawi).
Former Projects
(to be completed)
Pathogenomics of Staphylococci
Friedrich Götz (Microbial Genetics, University of Tübingen), Ralph Bertram (Microbial Genetics, University of Tübingen), Jörg Bernhardt (Microbial Physiology and Molecular Biology, University of Greifswald)
In this collaboration project, we computationally identified several non-coding RNAs in Staphylococcus equorum which putatively act as antisense RNAs in a type I toxin-antitoxin (TA) system. With further in silico analyses, we assessed their structural conservation as well as their RNA-RNA interaction potential with their target mRNAs. Publication within the project:
Schuster CF, Park JH, Prax M, Herbig A, Nieselt K, Rosenstein R, Inouye M, Bertram R.
Characterization of a mazEF toxin-antitoxin homologue from Staphylococcus equorum.
J Bacteriol 2013, 195(1):115-25.
PASSAGE
Steffen Hüttner (HB Technologies), Michael Bonin (Microarray Facility Tübingen)
State-of-the-art RNA-seq protocols allow gene expression profiling of known genes, annotation of unknown transcripts, differential splicing analysis, variant calling and estimation of allele-specific expression. The NGS technologies used for this produce tens of millions of reads, which, in turn, require substantial computing resources for subsequent analyses. One bottleneck is the mapping step, which requires not only powerful compute resources but also a reference genome. PASSAGE, short for ‘Parallel Sequencing Systems for the Analysis of Gene Expression’, comprises a newly developed experimental protocol and computational methods.
PASSAGE extends the idea of SAGE by sequencing reads originating only from well-defined genomic positions. This is achieved by using a specialized library preparation protocol, for which full-length cDNAs are synthesized and digested with RsaI.
We have developed an efficient algorithm that rapidly clusters reads from a common genomic locus and estimates expression levels for the corresponding transcripts in time linear in the number of read sequences. Because it does not need a reference genome, PASSAGE is an ideal system for high-throughput gene expression studies in non-model organisms.
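Why reference-free clustering can run in linear time is easiest to see in a small sketch. The code below is a simplified illustration, not the published PASSAGE algorithm. It assumes that reads from the same locus start at the same position and therefore share an identical prefix, so that a single hash-table lookup per read is enough. The function names and the `prefix_len` parameter are hypothetical.

```python
from collections import defaultdict


def cluster_reads_by_prefix(reads, prefix_len=20):
    """Group reads that share the same first `prefix_len` bases.

    Each read is handled once with one dictionary operation on a fixed-length key,
    so the expected running time grows linearly with the number of reads.
    Reads shorter than `prefix_len` are skipped.
    """
    clusters = defaultdict(list)
    for read in reads:
        if len(read) >= prefix_len:
            clusters[read[:prefix_len]].append(read)
    return dict(clusters)


def relative_expression(clusters):
    """Estimate expression per cluster as its share of all clustered reads."""
    total = sum(len(members) for members in clusters.values())
    if total == 0:
        return {}
    return {key: len(members) / total for key, members in clusters.items()}


# Toy reads: two from one locus, one from another, and one that is too short.
reads = ["ACGTTGCAAT", "ACGTTGCACC", "ACTTGGGGAA", "ACG"]
clusters = cluster_reads_by_prefix(reads, prefix_len=8)
expression = relative_expression(clusters)
```

Exact prefix matching ignores sequencing errors. A practical implementation would need some tolerance for mismatches, which this sketch leaves out.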
PASSAGE is supported by the Zentrales Innovationsprogramm Mittelstand (ZIM) (AIF) to establish a full-service technology platform together with HB Technologies (Dr. Steffen Hüttner) and MFT (Dr. Michael Bonin). Publication within this project:
Battke F, Körner S, Hüttner S, Nieselt K.
Efficient sequence clustering for RNA-seq data without a reference genome
German Conference Bioinformatics 2010. Lecture Notes in Informatics. Proceedings of the German Conference on Bioinformatics 2010, Vol P-173, 21-30.
A social network for collaboration projects
Collact
Collact is a social network focusing on online collaboration, where users create projects and manage them from anywhere, at any time. Collact helps people get in contact with their colleagues, project partners, employees and students, and run projects together. It offers a clean, simple and user-friendly interface and useful tools such as an integrated QR-code generator, a BibTeX importer and a Twitter topic analyzer (Collact.me).