ParSeq: A Software Tool for Searching Motifs with Structural and Biochemical Properties in Biological Sequences

Project and Members

The following departments of the University of Tübingen are involved in the research project, which is supported by the "Landesschwerpunktprogramm" of Baden-Württemberg, Germany.

Computer Science Departments:
- Computer Engineering
- Theoretical Computer Science
- Parallel Computation
- Computer Architecture
Biology Departments:
- Animal Genetics
- Microbe Genetics

Abstract

Searching for variable motifs, such as protein binding sites or promoter regions, is more complex than searching for ordinary motifs. For amino acid sequences, comparing motifs alone is usually insufficient to detect regions that represent proteins with a specific function, because the function depends on biochemical properties of individual amino acids (such as polarity, hydrophobicity, and electric charge). Pure string matching programs are not able to find these motifs.

Hence, we propose a software tool that combines the search for motifs with certain structural properties, the verification of biochemical properties, and an approximate search mechanism. Because it is very difficult to describe such motifs exactly, the tool supports a step-by-step creation of this description by allowing searches on previously obtained results. The description itself is written in a query language based on regular expressions and extended with the ability to formulate conditions on biochemical properties.

To be useful in practice, the response time must be within seconds or, in the worst case, minutes. By intelligently distributing the computation over a number of machines, the response time can be reduced sufficiently. In this project, parallel sequence analysis algorithms are developed. The algorithms are planned to run as a service on the Kepler Cluster, a highly parallel cluster (98 dual Pentium III PC nodes with a Myrinet interconnect) located at the University of Tübingen.
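The idea of combining a regular-expression motif with a condition on a biochemical property, and then refining earlier results, can be illustrated with a minimal Python sketch. This is not ParSeq's query language or implementation: the function names `find_motifs` and `refine`, the example sequence, and the choice of mean Kyte-Doolittle hydropathy as the property are assumptions made purely for illustration.

```python
import re

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average hydropathy of a non-empty amino acid segment."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def find_motifs(sequence, pattern, min_hydropathy):
    """Hypothetical helper: regex matches that also satisfy a property condition.

    Returns a list of (start, segment) tuples. The pattern is wrapped in a
    lookahead so that overlapping occurrences are reported as well.
    """
    hits = []
    for match in re.finditer(f"(?=({pattern}))", sequence):
        segment = match.group(1)
        if segment and mean_hydropathy(segment) >= min_hydropathy:
            hits.append((match.start(), segment))
    return hits


def refine(hits, predicate):
    """Hypothetical helper: search again on previously obtained results."""
    return [(start, segment) for start, segment in hits if predicate(segment)]


# Made-up example sequence, not taken from any database.
sequence = "MKTLLVAGGDEKRLIVAVNCDE"

# Step 1: structural pattern only (threshold 0.0 accepts both candidates).
all_hits = find_motifs(sequence, "L[ILV]{2}A", 0.0)
print(all_hits)  # [(3, 'LLVA'), (13, 'LIVA')]

# Step 2: same pattern, but require a strongly hydrophobic segment.
print(find_motifs(sequence, "L[ILV]{2}A", 3.5))  # [(13, 'LIVA')]

# Step 3: refine the earlier result set with a further condition.
print(refine(all_hits, lambda seg: "I" in seg))  # [(13, 'LIVA')]
```

A pure regular expression accepts both `LLVA` and `LIVA`; only the additional hydropathy condition separates them, which is the kind of distinction that plain string matching cannot express.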

The current version of ParSeq can be used to run searches on your local computer using raw sequence files. Within the next months, we will make it possible to use remote computing resources, such as a parallel computer like the Kepler Cluster or an ordinary pool of workstations. The user will be able to start either a local or a remote search session from the same user interface. If you want to try out motif searches with our program, you will find a Java Web Start link at the end of this page.

Screenshots

A screenshot of the GUI for sequence analysis

Download

On this site you can download or start the first version of ParSeq, which was published in the journal Bioinformatics. If you are interested in the most recent version, please follow this link to the latest version. Otherwise, if you want to use the version described in the Bioinformatics Applications Note, please use the software provided below.
The software is deployed using Java Web Start technology. Please refer to http://java.sun.com/products/javawebstart/ for more information about Java Web Start.

Sequential version of ParSeq (Java Web Start Application) (requires Java 1.4 or higher)

User Documentation in German (PDF file)

User Documentation (PDF file)