At a first level, biological molecules may be seen as strings over a 4- or 20-letter alphabet (nucleotides for DNA and RNA, amino acids for proteins), and many classical problems in string algorithms, such as string comparison or pattern matching, have therefore found applications in computational biology.
Biology, however, presents both characteristics and problems that are peculiar to it. Chief among these characteristics is the fact that biological molecules can mutate up to a point, that is, present a certain degree of variability, without losing the function(s) encoded by their sequence of nucleic or amino acids. Various models of approximate matching are therefore required when dealing with biological sequences.
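As an illustration of what "approximate matching" can mean, the sketch below uses the simplest such model, the Hamming distance, which counts only substitutions; edit-distance models, which also allow insertions and deletions, are a common alternative. The function name `approximate_occurrences` and the mismatch threshold are assumptions made for this example, not a method taken from the text.

```python
def approximate_occurrences(text: str, pattern: str, max_mismatches: int) -> list[int]:
    """Return the start positions where `pattern` occurs in `text`
    with at most `max_mismatches` substitutions (Hamming distance).

    Illustrative sketch only: hypothetical helper, naive O(n * m) scan.
    """
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window can no longer qualify
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Exact matching finds ACGT at 0 and 8; allowing one mismatch
    # also reports the variant ACGA at position 4.
    print(approximate_occurrences("ACGTACGAACGT", "ACGT", 0))
    print(approximate_occurrences("ACGTACGAACGT", "ACGT", 1))
```

Allowing a single substitution is enough to recover the variant occurrence that exact matching misses, which is the kind of tolerance to variability described above.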
As for problems, far more is unknown than known. Instead of looking for occurrences of known patterns, one is therefore often confronted with the problem of inferring the existence of patterns with "unusual" characteristics, about which very little, or even nothing, may be known at the outset.
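One common way to make pattern inference concrete (assumed here for illustration, not necessarily the model used in this work) is a quorum formulation: report every word of a given length that occurs, with at most a given number of mismatches, in at least a given number of the input sequences. The naive sketch below enumerates all candidate words, so its cost grows exponentially with the word length; the names `infer_motifs` and `occurs_approx` are hypothetical.

```python
from itertools import product


def occurs_approx(seq: str, word: str, max_mismatches: int) -> bool:
    """True if `word` occurs in `seq` with at most `max_mismatches` substitutions."""
    n = len(word)
    for i in range(len(seq) - n + 1):
        if sum(a != b for a, b in zip(seq[i:i + n], word)) <= max_mismatches:
            return True
    return False


def infer_motifs(sequences: list[str], length: int, max_mismatches: int,
                 quorum: int, alphabet: str = "ACGT") -> list[str]:
    """Hypothetical brute-force motif inference under a quorum constraint.

    Every word over `alphabet` of the given length is tested; it is kept
    if it occurs approximately in at least `quorum` input sequences.
    Note that the motif itself need not occur exactly anywhere.
    """
    motifs = []
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        support = sum(occurs_approx(s, word, max_mismatches) for s in sequences)
        if support >= quorum:
            motifs.append(word)
    return motifs


if __name__ == "__main__":
    seqs = ["TTACGTT", "GGACGAG", "CCACTTC"]
    # No 4-letter word is shared exactly by all three sequences ...
    print(infer_motifs(seqs, 4, 0, 3))
    # ... but allowing one mismatch reveals candidates such as ACGT.
    print("ACGT" in infer_motifs(seqs, 4, 1, 3))
```

The example shows why little needs to be known in advance: no pattern is supplied as input, only the constraints (length, tolerated variability, quorum) that an "interesting" pattern must satisfy.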
Although problems such as pattern inference (a pattern is also called a motif in a more general context) are often categorized as syntactic parsing problems, or first-level syntactic annotation, more advanced current research in the area is already trying to integrate relational information into the inference process.