Searches for variable motifs, such as protein binding sites or promoter regions, are more complex than searches for ordinary motifs. For amino acid sequences, comparing motifs alone usually proves insufficient to detect regions that correspond to proteins with a particular function, because that function depends on biochemical properties of the individual amino acids (such as polarity, hydrophobicity, and electric charge). Pure string-matching programs cannot find these motifs. Hence, we propose a software tool that combines the search for motifs with certain structural properties, the verification of biochemical properties, and an approximate search mechanism. Because such motifs are very difficult to describe exactly, the tool supports building the description step by step by allowing users to search within previously obtained results. The description itself is written in a query language based on regular expressions and extended with the ability to formulate conditions on biochemical properties. To be acceptable to users in practice, the response time must be within seconds or, in the worst case, minutes. Distributing the computation intelligently over a number of machines can reduce the response time sufficiently. In this project, parallel sequence analysis algorithms are being developed. The algorithms are planned to run as a service on the Kepler Cluster, a highly parallel cluster of 98 dual Pentium III PC nodes with a Myrinet interconnect, located at the University of Tübingen.
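The combination described above (a regular expression that captures the structure of a motif, a check of biochemical properties on each candidate, and a refinement step that searches only within earlier results) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ParSeq's implementation: it uses Python's `re` syntax as a stand-in for ParSeq's own query language, the Kyte-Doolittle hydropathy scale and a crude net-charge count as example properties, and the hypothetical names `Hit`, `search`, and `refine`. Approximate matching and distribution over cluster nodes are left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Kyte-Doolittle hydropathy values; positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# A condition receives a matched fragment and decides whether to keep it.
Condition = Callable[[str], bool]


@dataclass(frozen=True)
class Hit:  # hypothetical result type, not part of ParSeq
    seq_id: str  # identifier of the sequence the hit came from
    start: int   # 0-based start in the original sequence
    end: int     # exclusive end in the original sequence
    text: str    # the matched fragment


def mean_hydropathy(fragment: str) -> float:
    """Average Kyte-Doolittle value over a non-empty fragment."""
    return sum(KYTE_DOOLITTLE[aa] for aa in fragment) / len(fragment)


def net_charge(fragment: str) -> int:
    """Crude charge at neutral pH: K/R count +1, D/E count -1, H is ignored."""
    return sum((aa in "KR") - (aa in "DE") for aa in fragment)


def _scan(regions: Iterable[Tuple[str, int, str]], pattern: str,
          condition: Condition) -> List[Hit]:
    # The regex finds candidates with the right structure; the condition then
    # checks biochemical properties of each candidate fragment.
    regex = re.compile(pattern)
    hits = []
    for seq_id, offset, text in regions:
        for m in regex.finditer(text):  # non-overlapping matches only
            frag = m.group()
            if frag and condition(frag):  # skip empty matches
                hits.append(Hit(seq_id, offset + m.start(), offset + m.end(), frag))
    return hits


def search(sequences: Dict[str, str], pattern: str,
           condition: Condition = lambda frag: True) -> List[Hit]:
    """Query whole sequences."""
    return _scan(((sid, 0, seq) for sid, seq in sequences.items()),
                 pattern, condition)


def refine(hits: List[Hit], pattern: str,
           condition: Condition = lambda frag: True) -> List[Hit]:
    """Query only inside fragments returned by an earlier query,
    keeping positions relative to the original sequence."""
    return _scan(((h.seq_id, h.start, h.text) for h in hits),
                 pattern, condition)


# Step 1: stretches of >= 4 small/hydrophobic residues with mean hydropathy >= 2.5.
sequences = {"p1": "MKTLLVAAGLLLAKKDEE", "p2": "MDEGGGGAKR"}
step1 = search(sequences, r"[AILMFVG]{4,}", lambda f: mean_hydropathy(f) >= 2.5)
# step1 == [Hit(seq_id='p1', start=3, end=13, text='LLVAAGLLLA')]

# Step 2: refine the previous result to a run of three leucines.
step2 = refine(step1, r"L{3}")
# step2 == [Hit(seq_id='p1', start=9, end=12, text='LLL')]
```

The first query keeps `LLVAAGLLLA` in `p1` but discards `GGGGA` in `p2`, which matches the structural pattern yet is not hydrophobic enough; the second query narrows the result to the `LLL` core. Because each hit carries its offset, refined results still point into the original sequence, so queries can be chained as often as needed.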
The current version of ParSeq can be used to run searches on your local computer using raw sequence files. Within the next few months, we will add the option to use remote computing resources, e.g., a parallel computer such as the Kepler Cluster or an ordinary workstation pool. The user will then be able to start either a local or a remote search session from the same user interface. If you want to try searching for motifs with our program, you will find a Java Web Start link at the end of this page.