The Helix research group
Research themes
Work in progress and results
Publications
Software and databases
News from Helix
Publications
Home page
Site map Mail to Helix
Pattern inference under many guises
in B. Reed and C. L. Sales (eds.), Recent advances in algorithms and combinatorics, Springer Verlag, 2003.
 
M.-F. Sagot, Y. Wakabayashi
 
The paper surveys some of the main combinatorial methods for inferring patterns from a string, or a set of strings. The types of problems that will be addressed are repeat identification and common pattern inference. The strings that will concern us represent biological entities, nucleic acid and protein sequences or, in some cases, structures. As is well-known, exact ("identical") patterns hardly make sense in biology; we consider here two types of similar ("non-identical") patterns. One comes from looking at what "hides" behind each letter of the DNA/RNA or protein alphabet while the other corresponds to the more familiar notion of "errors". The errors concern mutational events that may affect a molecule during DNA replication. Those that will be of interest to us are point mutations, that is, mutations operating on single letters of a biological sequence: substitution, insertion or deletion. Various notions of "similarity" are discussed, as well as other constraints the patterns may have to satisfy depending on the biological situation one is facing. Algorithms for the various notions, together with complexity analyses, are presented in some detail.
    Top of page   Home page  Prepare to print