Keynote Speakers

Niko Beerenwinkel

ETH Zurich

"Computational analysis of tumor single-cell sequencing data"

Cancer progression is an evolutionary process, characterized by the accumulation of genetic alterations, that drives tumor growth, clinical progression, and the development of drug resistance. We discuss how to reconstruct the evolutionary history of a tumor from single-cell sequencing data and present probabilistic models and efficient inference algorithms for mutation calling and for learning tumor phylogenies from mutation and copy number data. We present methods for integrating single-cell DNA and RNA data obtained from tumor biopsies and for detecting common patterns of tumor evolution among patients, including recurring evolutionary trajectories and clonally exclusive mutations.

David Bryant

University of Otago, Dunedin

"The Coalescent and Deep Phylogeny"

The coalescent is an exceptionally useful model for the genealogies of individuals in the same species or within closely related species. It is curious, then, that methods based on the coalescent have become a standard tool for deep phylogenetics, even supplanting traditional approaches. In this talk I will use a combination of mathematical and statistical thought experiments to argue that ancient phylogenetics and recent phylogenetics are really quite different endeavours.

Caroline Friedel

Ludwig-Maximilians-Universität, Munich

"Be careful what you look at: lessons for transcriptomics analyses from HSV-1 infection"

Herpes simplex virus 1 (HSV-1) is one of nine herpesviruses infecting humans and is commonly known for causing cold sores on the lips. A characteristic of HSV-1 lytic infection is the induction of a profound host shut-off. A key role in this process is played by the HSV-1 vhs protein, an endonuclease that cleaves both cellular and viral mRNAs but not circular RNAs (circRNAs). In addition, HSV-1 massively downregulates host transcriptional activity and leads to a widespread disruption of transcription termination. This results in read-through transcription for tens of thousands of nucleotides beyond poly(A) sites and into downstream genes. In this talk, I will discuss how these characteristics of HSV-1 infection can mislead standard transcriptome analyses, and have done so, if not properly taken into account. While HSV-1 infection represents a unique combination of these effects, most of them are also present in other conditions, e.g. cellular stress or other virus infections, and can massively bias the corresponding analyses and the interpretation of their results.

Nils Gehlenborg

Harvard Medical School, Boston

"A Fresh Look at Genomics Data with Grammar-Based Visualization"

Visualization of genomics data for exploration and communication has a long history in molecular biology. Over the years, dozens of techniques and hundreds of tools to view and explore genomics data have been developed. This rich set of tools and techniques demonstrates the importance of data visualization in genomics. However, it also poses significant challenges for data analysts, who often need to convert between different data formats and use multiple tools for their analysis tasks. To address these challenges, we designed the Gosling visualization grammar (http://gosling-lang.org) that can be used to generate virtually any previously described interactive visualization technique for genome-mapped data. I will explain how we developed Gosling and introduce the tool ecosystem that we built to support Gosling-based visualizations. Finally, I will propose opportunities for future research in genomics data visualization.

Jean-Pierre Hubaux

École polytechnique fédérale de Lausanne

"GDPR-Compliant Federated Learning for Health Data"

Datasets are frequently siloed, notably for data protection reasons. We provide secure and privacy-preserving federated learning that supports the training of an ML model on siloed datasets. This software technique is based on a combination of fully homomorphic encryption and secure multi-party computation. We show that it works at scale, notably for the computation of Kaplan-Meier survival curves, for genome-wide association studies, and for single-cell analysis. This approach preserves the confidentiality of each institution’s input data, of any intermediate values, and of the trained model parameters.

Manja Marz

Friedrich Schiller University, Jena

"Sequence-based bioinformatics approaches to working with RNA viruses"

In bioinformatics, viruses are generally understudied. This is particularly notable because most of the available tools developed for bacteria, eukaryotes, or other organisms are not applicable to viruses. This ranges from simple multiple sequence alignments to genome annotation, RNA secondary structure and phylogenetic analyses, and up to neural-network-based algorithms for virus and host classification. In this talk, we will learn about methods recently developed by the AG Marz, as well as tools developed within the European Virus Bioinformatics Center (EVBC).

Hans-Ulrich Prokosch

Friedrich-Alexander-Universität, Erlangen-Nuremberg

"Large Data Sharing Initiatives and Infrastructures in Germany"

In the last decade, numerous large data and biosample sharing initiatives have been launched in Germany (German Biobank Node/Alliance, Medical Informatics Initiative, Network University Medicine CODEX, ...), all of them defining organizational data sharing frameworks and regulations and establishing new IT infrastructures. The talk will illustrate these initiatives, their conceptual approaches and IT architectures, as well as current attempts to create synergies between them.

Matthias Rarey

University of Hamburg

"Good algorithms might help: Practical examples from early drug discovery"

Addressing challenging problems in early-phase drug discovery requires the full arsenal of computer science. Recently, machine learning in pharmaceutical research has gained a lot of attention, but it is only one successful technology among several. Efficient data management systems, combinatorial algorithms, and optimization can have a substantial impact on the development of next-generation scientific software solutions. In this presentation, a few illustrative examples, ranging from navigation in chemical space to the modeling of structure-activity relationships, will be shown.

Oliver Stegle

German Cancer Research Center, Heidelberg

"From genotype to phenotype with single-cell resolution"

Olga Vitek

Northeastern University, Boston

"Statistical methods and tools for mass spectrometry-based proteomics"

Mass spectrometry-based proteomics studies proteins in complex biological mixtures. Statistical experimental design and analysis are key for this field, as they allow us to reduce bias and inefficiencies, distinguish systematic variation from random artifacts, and maximize the reproducibility of the results. The talk will discuss strategies for leveraging modern statistical and machine learning techniques for the design and interpretation of these experiments. We will show that approaches that specifically account for the properties of these data improve upon standard off-the-shelf statistical and machine learning methods. Finally, the talk will give an overview of the implementations of these specialized methods in the open-source software developed by our lab.

Tilmann Weber

DTU Biosustain, Lyngby

"Mining soils for drugs (and more…): Integrating Informatics and Metabolic Engineering for the discovery of novel Natural Products"

Genome analyses of many microorganisms, and also of higher organisms, indicate that the genetic potential to synthesize specialized metabolites far exceeds the number of molecules observed in traditional screenings. With the availability of cheap and easy-to-obtain whole genome sequences, in silico genome mining has become an indispensable tool to complement the classical chemistry-centered approach to identifying and characterizing novel secondary/specialized metabolites. Since its initial release in 2011, the open-source genome mining pipeline antiSMASH(1) (https://antismash.secondarymetabolites.org), which we develop in collaboration with the group of M. Medema (U. Wageningen, Netherlands) and many international contributors, has become one of the most widely used tools in the field. We are currently working on version 7 of antiSMASH, including the detection of novel BGC families, new visualizations, and further improvements. With antiSMASH, specialist and non-specialist users can easily analyze genomic sequences for the presence of secondary metabolite biosynthetic gene clusters (BGCs).

To provide extensive options for analyzing the data generated with antiSMASH, we have extended the framework with several databases (2-4). The antiSMASH database (https://antismash-db.secondarymetabolites.org/)(2) contains 147,517 high-quality BGC regions from 388 archaeal, 25,236 bacterial, and 177 fungal “high-quality” genomes.

These genome mining technologies form the foundation of further in silico studies towards a more comprehensive “Genome Analytics” platform, which we use to streamline our natural product discovery and characterization efforts.

Although streptomycetes have been studied for many decades as proficient producers of bioactive compounds, the limited efficiency of mutagenesis protocols still severely hampers systems metabolic engineering and Synthetic Biology approaches. We have therefore developed an extensive CRISPR/Cas9-based toolkit (5-8) for streptomycetes that now also includes tools that use multiplexing and DSB-free base editing technology to engineer actinomycetes highly efficiently.