The Helix research group
Research themes
Work in progress and results
Publications
Software and databases
News from Helix
The Helix research group > Members > Post-doctoral students
Home page
Site map Mail to Helix
Dufayard, Jean-François
ATER, université Claude Bernard
 

INRIA Rhône-Alpes

655 avenue de l'Europe, Montbonnot
38334 Saint Ismier cedex
France

Tel. +33 4 76 61 55 05
Fax +33 4 76 61 54 08

Secretary (Françoise de Coninck)
Tel. +33 4 76 61 53 63

Jean-Francois.Dufayard@inrialpes.fr


-  Short curriculum vitae
-  Present work, my PhD thesis subject
-  Courses
-  Selected publications and talks


 

Short curriculum vitae


I was born in Grenoble on November 15th 1976. I joined the Helix research group, headed by François Rechenmann, during my Bachelor in computer science in september 1999. My first research subject dealt with tree reconciliation: I developped and implemented an algorithm which locates duplication nodes in large databases of homologous gene families. This algorithm has been applied to databases that are developped at the Pôle Bio-Informatique Lyonnais, such as HOVERGEN. This work was directed by Laurent Duret and François Rechenmann.

During my Master of Science in computer science (DEA ISC, University Joseph Fourier, Grenoble I), I worked on a tree pattern matching problem. I developped and implemented an algorithm which allows to formulate phylogenetic requests in a database of homologous gene families. This algorithm has been embedded in Famfetch, the client program of databases developped in the Pôle Bio-Informatique Lyonnais. This work was directed by Laurent Duret, Guy Perrière and François Rechenmann.

I'm currently in my third year of PhD, and my subject is described below.


 

Present work, my PhD subject - Incremental algorithms for the management of families of similar sequences


There are many different methods for retrieving a multiple alignment and a phylogenetic tree from a family of similar sequences. These methods are very efficient and can be used to study large databases of families.

However, the most critical parameter remains the size of each family. A family that gathers more than 1,000 sequences begins to be quite difficult to manage with classical algorithms, particularly with the alignment methods. Moreover, new sequences are frequently added to sequence families, and the only way to deal with these new sequences using classical algorithms is to re-compute entirely the alignment and the tree.

I have thus developed and implemented an incremental algorithm which allows to add new sequences to a multiple alignment and its phylogenetic tree. The algorithm computes simultaneously the alignment and the tree.

-  The location of the new sequence is searched for in the tree, and the result provides its location in the alignment.
-  Then, the alignment is modified using the progressive multiple alignment method. The phylogenetic tree replaces advantageously the guide tree.

The main target of these efforts is the European Small Subunit Ribosomal Database [2,4,5] which contains 20,000 manually aligned RNA sequences. The objective is to be able to add automatically every new sequence.

The general method is based on the progressive multiple alignment method [1,3,4]. It consists in aligning successive blocks of sequences, following the topology of a guide tree. The closest the guide tree is to the evolutionary tree, the best is the alignment. So, we maintain the phylogenetic tree during the addition of the new sequence. The tree and the alignment are very closely related, and to maintain both the tree and the alignment in the same time appears to be one of the most valuable approaches to solve the problem.

The search for the location of the new sequence in the tree is based on an internal branche recomputation, using the Neighbor Joining method [6,8]. The new sequence S is roughly aligned with the block formed by all others sequences, and some internal branches are recomputed. Doing this, a first location is found for S. Then S is realigned with the block that has been found and some internal branches that start form this point are reccomputed, providing a better location for S, and so one... Until the location becomes stable.

The alignment is modified, in order to fit with what the alignment would be if entirely recomputed with S using the progressive multiple alignment. Thus, block alignments from the insertion node of S to the root are re-evaluated.

This work is co-directed by Manolo Gouy, Guy Perrière and François Rechenmann.

Reference:
[1] Gotoh, O. (1993). Optimal alignment between groups of sequences and its application to multiple sequence alignment. the Computer Applications in the Biosciences 9: 361-370.
[2] Maidak, B. L., J. R. Cole, et al. (2000). The RDP (Ribosomal Database Project) continues. Nucleid Acids Research 28: 173-174.
[3] Needleman, S. B. and C. D. Wunch (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology 48: 443-453.
[4] Rijk, P. D., J. Wuyts, et al. (2000). The european large subunit ribosomal RNA database. Nucleid Acids Research 28: 177-178.
[5] Rijk, P. D., J. Wuyts, et al. (2000). The european small subunit ribosomal RNA database. Nucleid Acids Research 28: 175-176.
[6] Saitou, N. and M. Nei (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406-425.
[7] Smith, T. F. and M. S. Waterman (1981). Identification of common molecular subsequences. Journal of molecular biology 147: 195-197.
[8] Studier, J. A. and K. J. Keppler (1988). A note on the neighbor-joining method of Saitou and Nei. Molecular Biology and Evolution 6: 729-731.


 

Courses


I give courses in the IUT Services réseau et communications since 2001.

Year 2001/2002:
-  TP Java, oriented-object programming (60 hours)
-  TP networking, client-server programming (30 hours)

Year 2002/2003
-  TD structure of the computers (24 hours)
-  TP Java, oriented object programming. (60 hours)
-  TP networking, client-server programming (6 hours)

Mini-glossary of french courses description:
TP = Lab stations (Travaux pratiques in French)
TD = Training courses (Travaux dirigés in French)


 

Selected publications and talks


Talk Journées post génomique de la Doua - 415.7 ko
Talk Journées post génomique de la Doua
A 20 minutes talk given in Lyon during the Journées post-génomiques de la Doua in May 2002: tree pattern matching in phylogenetic trees, without any algorithmic description. Slides in English.
(PDF, 415.7 ko)
Talk Séminaire du club génome - 668.9 ko
Talk Séminaire du club génome
A 40 minutes talk given in Marseille during the Séminaire du Club Génome in October 2002: tree pattern matching algorithm and its transposition in phylogenetic trees. Slides in French.
(PDF, 668.9 ko)
Poster JOBIM 2002 - 252.2 ko
Poster JOBIM 2002
Poster presented in Saint Malo during JOBIM in June 2002: phylogenetic tree pattern matching and its application in Famfetch.
(PDF, 252.2 ko)
Proceedings JOBIM 2002 - 403 ko
Proceedings JOBIM 2002
A short paper written (in French) for the proceedings of JOBIM in June 2002, in Saint Malo (France): a complete description of the implementation of the tree pattern matching in Famfetch, and the tree reconciliation applied to HOVERGEN. Paper in French.
(PDF, 403 ko)
Talk for the inter EPST project - 669 ko
Talk for the inter EPST project
A 30 minutes talk given during the inter-EPST project meeting, in March 2003: the algorithm we have developped to add a new sequence in a tree and the associated alignment, as it was in March 2003. Slides in English.
(PDF, 669 ko)
Proceedings JOBIM 2004 - 129.6 ko
Proceedings JOBIM 2004
Short article, in French, describing my work on incremental alignments and phylogenies. This article is publicated in the proceedings of JOBIM 2004 (Montreal).
(PDF, 129.6 ko)
Talk JOBIM 2004 - 442.6 ko
Talk JOBIM 2004
Short talk of 5 minutes, in French, presented during JOBIM 2004 (Montreal). It deals with incremental alignments and phylogenies.
(PDF, 442.6 ko)

    Top of page   Home page  Prepare to print