Parallel Sequence Algorithms for PC-Clusters
State priority program ("Landesschwerpunktprogramm")
In modern molecular biology, computers are used for many different tasks. Probably the most frequent application is searching for similarities in DNA or amino-acid sequences. Classical tools like FASTA and BLAST provide fast and efficient search in large databases of annotated sequences when the query sequence is known. The search problem becomes much more complicated if the query cannot be defined precisely but only described in fuzzy terms, and if the search is to be performed in unannotated and possibly non-coding regions of a DNA sequence. Because the query is fuzzy, efficient string-matching heuristics cannot be applied directly, and the computational effort is much higher.
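One simple way to picture a fuzzy query is a pattern written with IUPAC nucleotide ambiguity codes and matched with a limited number of mismatches. The sketch below is only an illustration of that idea, not the method used in this project; the function name `fuzzy_find` and its parameters are assumptions made for this example. It scans every window of the sequence, which shows why the effort grows compared with exact matching.

```python
# IUPAC nucleotide ambiguity codes: each query symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def fuzzy_find(sequence, query, max_mismatches=0):
    """Return (position, mismatches) for every window of `sequence` that
    matches `query` with at most `max_mismatches` mismatching positions.

    Hypothetical helper for illustration: a naive O(n * m) scan.
    """
    hits = []
    m = len(query)
    for start in range(len(sequence) - m + 1):
        mismatches = 0
        for offset, symbol in enumerate(query):
            if sequence[start + offset] not in IUPAC[symbol]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, give up on this window
        else:
            # Inner loop finished without a break: the window is a hit.
            hits.append((start, mismatches))
    return hits
```

For example, `fuzzy_find("ACGTTATAAAGG", "TATAWA")` looks for a TATA-box-like motif in which the fifth position may be either A or T.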
To be useful in practice, the response time must be within seconds or,
in the worst case, minutes. By intelligently distributing the
computation over a number of machines, the response time can be reduced
sufficiently. In this project, parallel sequence analysis algorithms
are developed and combined with advanced heuristics, such as neural
networks and evolutionary algorithms, for non-deterministic
high-probability pattern matching. The algorithms are planned to run as
a service on Kepler (shown in the photo above), a highly parallel
cluster of 98 dual Pentium III PC nodes with a Myrinet interconnect,
located at the University of Tübingen. Envisioned applications are
differential analysis of staphylococcal genomes and molecular phylogeny
of development-relevant transcription factors.
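The idea of distributing a search over several workers can be sketched by splitting the sequence into overlapping chunks and scanning each chunk independently. This is a minimal sketch under stated assumptions, not the project's implementation: all function names are hypothetical, plain mismatch counting stands in for the actual heuristics, and a local thread pool stands in for cluster nodes. Python threads do not speed up CPU-bound work, so a real deployment would pass a process pool or use a message-passing layer instead.

```python
from concurrent.futures import ThreadPoolExecutor


def count_mismatches(window, pattern):
    """Number of positions at which two equally long strings differ."""
    return sum(1 for a, b in zip(window, pattern) if a != b)


def scan_chunk(task):
    """Scan one chunk and report hit positions in global coordinates."""
    offset, chunk, pattern, max_mismatches = task
    m = len(pattern)
    return [
        offset + i
        for i in range(len(chunk) - m + 1)
        if count_mismatches(chunk[i:i + m], pattern) <= max_mismatches
    ]


def make_tasks(sequence, pattern, max_mismatches, n_chunks):
    """Assign every start position to exactly one chunk.

    Neighbouring chunks overlap by len(pattern) - 1 characters so that
    matches spanning a chunk boundary are not lost.
    """
    m = len(pattern)
    n_starts = len(sequence) - m + 1  # number of candidate start positions
    if n_starts <= 0:
        return []
    step = -(-n_starts // n_chunks)  # ceiling division
    tasks = []
    for begin in range(0, n_starts, step):
        end = min(begin + step, n_starts)
        tasks.append((begin, sequence[begin:end + m - 1], pattern, max_mismatches))
    return tasks


def parallel_search(sequence, pattern, max_mismatches=0, n_chunks=4, executor=None):
    """Search all chunks concurrently and merge the hits in ascending order.

    `executor` may be any object with a `map` method that preserves input
    order, such as a concurrent.futures executor; by default a local
    thread pool is used as a stand-in for cluster nodes.
    """
    tasks = make_tasks(sequence, pattern, max_mismatches, n_chunks)
    if executor is None:
        with ThreadPoolExecutor(max_workers=n_chunks) as pool:
            results = list(pool.map(scan_chunk, tasks))
    else:
        results = list(executor.map(scan_chunk, tasks))
    return [position for hits in results for position in hits]
```

Because each start position belongs to exactly one chunk, the merged result contains no duplicates, and `parallel_search("ACGTACGTTACGAACGT", "ACGT", n_chunks=3)` returns the same positions as a single serial scan.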
Additional resources
Java Web Start prototype (requires Java 1.4 or higher)
Documentation (PDF file, in German)
A screenshot of the GUI for sequence analysis
Project Members
- Computer Science Departments:
- Biology Departments:
Contact
Igor Fischer, Phone: +49 7071 29-77176, fischer@informatik.uni-tuebingen.de

