The Helix research group
String algorithmics
When macromolecules are seen as strings over a 4- or 20-letter alphabet

At the most basic level, biological molecules can be seen as strings over a 4-letter alphabet (the nucleotides of DNA or RNA) or a 20-letter alphabet (the amino acids of proteins). Many classical problems in string algorithmics, such as string comparison and pattern matching, have therefore found applications in computational biology.
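As an illustration of the classical exact pattern-matching problem mentioned above, here is a minimal sketch of the Knuth-Morris-Pratt algorithm applied to a DNA string. The function names `failure_function` and `kmp_search` are hypothetical and chosen only for this example. The code is not a description of any specific Helix tool.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next character can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start positions of every exact occurrence of pattern in text."""
    if not pattern:
        return list(range(len(text) + 1))
    fail = failure_function(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going so that overlapping occurrences are found
    return hits


print(kmp_search("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```

The text is scanned once and never backtracks, so the search runs in time linear in the combined length of text and pattern. This matters when the "text" is a whole genome.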

Biology, however, has characteristics and problems of its own. Chief among these characteristics is that biological molecules can mutate, or exhibit a certain degree of variability, up to a point without losing the function(s) encoded by their sequence of nucleic or amino acids. Various models of approximate matching are therefore required when dealing with biological sequences.
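One of the simplest models of approximate matching uses edit distance, which counts substitutions, insertions and deletions. The sketch below implements Sellers' dynamic-programming algorithm. It reports every text position where some substring ending at that position is within edit distance k of the pattern. The name `approximate_occurrences` is hypothetical, and the unit costs are an assumption. Biological applications often use substitution matrices and gap penalties instead.

```python
def approximate_occurrences(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Sellers-style DP: return (end_position, distance) for every 0-based text
    position where a substring ending there is within edit distance k of pattern."""
    m = len(pattern)
    # column[i] = smallest edit distance between pattern[:i] and a substring of
    # text ending at the current position (unit cost for every edit operation).
    column = list(range(m + 1))
    hits = []
    for j, c in enumerate(text):
        prev_diag = column[0]  # column[i-1] value from the previous text position
        column[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            prev = column[i]   # column[i] value from the previous text position
            cost = 0 if pattern[i - 1] == c else 1
            column[i] = min(prev + 1,            # extra character in the text
                            column[i - 1] + 1,   # pattern character missing from the text
                            prev_diag + cost)    # match or substitution
            prev_diag = prev
        if column[m] <= k:
            hits.append((j, column[m]))
    return hits


print(approximate_occurrences("ACGTTGCA", "GTT", 1))
```

With k = 1, the call above reports end positions 3, 4 and 5, at distances 1, 0 and 1: "GT" (one deletion), "GTT" (exact) and "GTTG" (one insertion). Only one column of the DP table is kept, so memory is proportional to the pattern length.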

As for the problems, far more is unknown than known. Rather than searching for occurrences of known patterns, one is therefore often faced with having to infer the existence of patterns with "unusual" characteristics, about which very little or even nothing may be known at the start.
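To make the inference problem concrete, the sketch below uses a deliberately naive formulation. It enumerates every word of length k over the DNA alphabet and keeps those that occur, with at most d mismatches (Hamming distance), in at least a quorum of the input sequences. The names `infer_motifs` and `occurs_with_mismatches` are hypothetical, and so is the choice of a Hamming-distance, quorum-based model. Exhaustive enumeration of all 4^k candidates is only practical for small k. More scalable approaches typically rely on index structures such as suffix trees.

```python
from itertools import product


def occurs_with_mismatches(motif: str, seq: str, d: int) -> bool:
    """True if some length-len(motif) window of seq differs from motif in at most d positions."""
    k = len(motif)
    for start in range(len(seq) - k + 1):
        mismatches = 0
        for a, b in zip(motif, seq[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > d:
                    break  # this window cannot qualify; try the next one
        if mismatches <= d:
            return True
    return False


def infer_motifs(seqs: list[str], k: int, d: int, quorum: int, alphabet: str = "ACGT") -> list[str]:
    """Brute force: keep every length-k word over alphabet that occurs
    (with at most d mismatches) in at least `quorum` of the sequences."""
    found = []
    for letters in product(alphabet, repeat=k):
        motif = "".join(letters)
        support = sum(occurs_with_mismatches(motif, s, d) for s in seqs)
        if support >= quorum:
            found.append(motif)
    return found


seqs = ["AACGTT", "TTACGAA", "GGACGTC"]
print(infer_motifs(seqs, k=4, d=0, quorum=2))  # ['ACGT']
```

Nothing about the motif is given in advance. It emerges only from its being shared, exactly or approximately, by enough of the sequences. Allowing d = 1 here makes "ACGT" supported by all three sequences, because "ACGA" in the second sequence differs from it in a single position.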

Although problems such as pattern inference (patterns are also called motifs in a more general context) are often classified as syntactic parsing problems, or first-level syntactic annotation, the more advanced current research in the area already attempts to integrate relational information into the inference process.
