Algorithms in Bioinformatics

Book on Phylogenetic Networks

This is the official website for our book entitled "Phylogenetic Networks" by Daniel H. Huson, Regula Rupp and Celine Scornavacca, published by Cambridge University Press (2010).

Although many biologists believe that reticulate events such as hybridization, horizontal gene transfer, recombination and reassortment play an important role in evolution, most published studies use trees to represent the evolutionary history for the set of species studied. One reason for this is the lack of robust and accepted methods for inferring non-tree histories or phylogenetic networks. A lot of work has been done in recent years to address this problem.  Our work in this area was originally focused on developing methods for inferring unrooted phylogenetic networks and our computer program, SplitsTree, is currently the most widely-used software for the construction of phylogenetic networks. More recently, we have focused on developing methods for rooted phylogenetic networks, which we will make available via our  tree- and network drawing program Dendroscope.

Phylogenetic networks were one of the main topics of the four month research programme on Phylogenetics held at the Newton Institute of Cambridge University in 2007. There we formed the impression that the field of phylogenetic networks had advanced to a point where there was enough material to warrant a book that gives an introduction to the field and attempts to present the different questions and approaches in a unified manner. Our goal was to write a book that covers the field in a style that is accessible to bioinformatics, biologists that are interested in methods and algorithms and computer scientists that are interested in evolution. This book is the ideal companion to our SplitsTree4 and Dendroscope programs.

'Networks - rather than just trees - are fast becoming the essential tool for making sense of the complexities of evolution, and conflicting signal[s] in genomic data. Phylogenetic Networks provides a long-overdue exposition of network-based methods, their possible uses, and details on practical software. A detailed and unified treatment of the many different types of networks is complemented by a crisp synopsis of the underlying theory. Numerous example[s] and illustrations make the text easy to follow. This book will further transform the way biologists use genomic data to study evolution. The Tubingen group has led the development of phylogenetic network algorithms, and this book delivers a clear exposition for biologists bewildered by a plethora of recent methods, as well as for bioinformaticians aiming to develop the field further. It is essential reading for any scientist or student seeking to understand how genomic data can be used to represent and study the intricate 'web of life'.' Mike Steel, University of Canterbury.

'This textbook, by one of the leaders of the field (Daniel Huson) and his co-authors, provides a mathematically rigorous introduction to one of the most exciting and beautiful research areas in computational biology: phylogenetic networks. The text is clear and provides all the necessary biology background; it should be accessible to graduate students (or upper-division undergraduates) in mathematics, computer science, or statistics.' Tandy Warnow, University of Texas.

'This wonderfully accessible book is by far the most thorough and up-to-date treatment of phylogenetic networks about. Many evolutionary processes in nature do not conform to the simple model of phylogenetic trees; examples are hybridizations, symbioses, and lateral gene transfer. The more we probe nature with genomics, the more significant and numerous these examples become, so there is a real need for using networks in phylogenetics. This volume is a must for researchers working with phylogenetic networks. It is for an advanced college audience. Beautifully organized and clearly written, it really fills a void.' Bill Martin, University of Düsseldorf.

Table of contents
Part I Introduction 
1 Basics
1.1 Overview 
1.2 Undirected and directed graphs
1.3 Trees 
1.4 Rooted DAGs 
1.5 Traversals of trees and DAGs 
1.6 Taxa, clusters, clades and splits 
2 Sequence Alignment
2.1 Overview 
2.2 Pairwise sequence alignment 
2.3 Multiple sequence alignment 
3 Phylogenetic Trees 
3.1 Overview 
3.2 Phylogenetic trees 
3.3 The number of phylogenetic trees 
3.4 Models of DNA evolution 
3.5 The phylogenetic tree reconstruction problem 
3.6 Sequence-based methods 
3.7 Maximum parsimony 
3.8 Branch-swapping methods 
3.9 Maximum likelihood estimation 
3.10 Bootstrap analysis 
3.11 Bayesian methods 
3.12 Distance-based methods 
3.13 UPGMA 
3.14 Neighbor-joining
3.15 Balanced Minimum Evolution
3.16 Comparing trees
3.17 Consensus trees
3.18 The Newick format
4 Introduction to Phylogenetic Networks
4.1 Overview
4.2 What is a phylogenetic network?
4.3 Unrooted phylogenetic networks
4.4 Rooted phylogenetic networks
4.5 The extended Newick format
4.6 Which types of networks are currently used in practice?
Part II Theory
5 Splits and Unrooted Phylogenetic Networks
5.1 Overview
5.2 Splits
5.3 Compatibility and incompatibility
5.4 Splits and clusters
5.5 Split networks
5.6 The canonical split network
5.7 Circular splits and planar split networks
5.8 Weak compatibility
5.9 The split decomposition
5.10 Representing trees in a split network
5.11 Comparing split networks
5.12 T-Theory
6 Clusters and Rooted Phylogenetic Networks
6.1 Overview
6.2 Clusters, compatibility and incompatibility
6.3 Hasse diagrams
6.4 Cluster networks
6.5 Rooted phylogenetic networks
6.6 The lowest stable ancestor
6.7 Representing trees in rooted networks
6.8 Hardwired and softwired clusters
6.9 Minimum rooted phylogenetic networks
6.10 Decomposability
6.11 Topological constraints on rooted networks 148
6.12 Cluster containment in rooted networks
6.13 Tree containment
6.14 Comparing rooted networks
Part III Algorithms and Applications
7 Phylogenetic Networks from Splits
7.1 The convex hull algorithm
7.2 The circular network algorithm
8 Phylogenetic Networks from Clusters
8.1 Cluster networks
8.2 Divide-and-conquer using decomposition
8.3 Galled trees
8.4 Galled networks
8.5 Level-k networks from clusters
9 Phylogenetic Networks from Sequences
9.1 Condensed alignments
9.2 Binary sequences and splits
9.3 Parsimony splits
9.4 Median networks
9.5 Quasi-median networks
9.6 Median-joining
9.7 Pruned quasi-median network
9.8 Recombination networks
9.9 Galled trees
10 Phylogenetic Networks from Distances
10.1 Distances and splits
10.2 Minimum spanning network
10.3 Split decomposition
10.4 Neighbor-net
10.5 T-rex
11 Phylogenetic Networks from Trees
11.1 Consensus split networks
11.2 Consensus super split networks for unrooted trees
11.3 Distortion-filtered super split networks for unrooted trees
11.4 Consensus cluster networks for rooted trees
11.5 Minimum hybridization networks
11.6 Minimum hybridization networks and galled trees
11.7 Networks from multi-labeled trees
11.8 DLT reconciliation of gene- and species trees
12 Phylogenetic Networks from Triples or Quartets
12.1 Trees from rooted triples 272
12.2 Level-k networks from rooted triples
12.3 The quartet-net method
13 Drawing Phylogenetic Networks
13.1 Overview
13.2 Cladograms for rooted phylogenetic trees
13.3 Cladograms for rooted phylogenetic networks
13.4 Phylograms for rooted phylogenetic trees
13.5 Phylograms for rooted phylogenetic networks
13.6 Drawing rooted phylogenetic networks with transfer edges
13.7 Radial diagrams for unrooted trees
13.8 Radial diagrams for split networks
14 Software
14.1 SplitsTree
14.2 Network
14.3 TCS
14.4 Dendroscope
14.5 Other programs
376 pages, ~190 figures, ~80 exercises

Cambridge University Press, book
Most of the phylogenetic networks in the book were generated using our programs SplitsTree and Dendroscope. Source files for most of the figures are available here.
Errata (first printing)
Page 4: The condition that every edge is incident to two different nodes excludes parallel edges or multi-graphs (not multi-edges).
Page 6: Clarification: strongly connected requires a path from any one node to any other, so, in particular, two paths between each pair of nodes.
Page 121: Lemma 5.10.1: Philip Gambette pointed out that this result does not hold for general split networks. Indeed, the network shown in Figure 5.7b is a counterexample. We believe that this can be fixed as follows: The result holds for canonical split networks. It also holds for split networks constructed using the circular network algorithm.
Page 130: An incompatible cluster set is a set of clusters in which at least one pair of clusters is pairwise incompatible (and not, as erroneously stated, a set of clusters in which all clusters are pairwise incompatible).
Page 167: In Exercise 6.11.21, we erroneously state that the network displayed in Figure6.24c is a cluster network. This is not true since for this network the uniqueness condition of Definition 6.4.1 is not fulfilled. Please use  this network  instead (each node is    shown as a small disk).
Page 168: Exercise 6.12.2 is correct, but not easy... (Hint: use the result that the lca of any two nodes can be computed in constant time, after a linear amount of preprocessing (Harel and Tarjan 1984).)
Page 176: The last tripartition of tree T_2 is: ({b,c},\emptyset,{a,d,e})
Page 181: In the list of nested labels for the network N_2, remove the curly brackets around taxon a in each of the first three lines (thanks to Benjamin Albrecht for pointing this out).
The follow errors were pointed out by Simone Linz:
Page 275: 4th line of section 11.5.1: Script C should be plain C.
Page 278: 21st line: ... then that subtree is a component... should be ...then that subtree is contained in a component...
Page 281: Theorem 11.5.7. Should also state that if the acyclic agreement forest ist maximum with h components, then any phylogenetic network N on X that represents both T1 and T2 will have at least h-1 reticulations.
Page 282: 4th line: ...problem of finding an acyclic agreement forest... should be ... problem of finding a maximum acyclic agreement forest...
Page 283: 20th line, holds: should be hold: