The Helix research group
Inferring gene functions from homology relationships

The difficulties are not over once a gene has been found: its function or functions still have to be determined. Sequence similarity suggests homology, and homologous genes often have related functions, so the usual strategy is to search a database for genes with a similar sequence. But this method has its limits. Once a significant similarity has been identified, genes that are orthologues must be distinguished from those that are paralogues. What does this difference mean in practice? Genes are quite commonly duplicated in the course of evolution. Orthologues are copies of the same gene in different species, separated only by speciation events, and they generally retain the original function. Paralogues are copies that arose by duplication; the duplicates may evolve independently and acquire completely different functions. The two cases can only be distinguished through an evolutionary analysis, by constructing phylogenetic trees.
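To make the distinction concrete, here is a minimal sketch of one simple way to label gene pairs from a gene tree that has already been built. It assumes the "species overlap" rule: an internal node is treated as a duplication if some species appears on both sides of it, and as a speciation otherwise. The tree, the species and the gene names are hypothetical, and this rule is only one of several possible approaches, not necessarily the one used by any particular tool.

```python
# Gene tree as nested tuples. A leaf is (species, gene_name);
# an internal node is a pair (left_subtree, right_subtree).
# All names are hypothetical and chosen only for illustration.
tree = (
    (("human", "A_human"), ("mouse", "A_mouse")),
    (("human", "B_human"), ("mouse", "B_mouse")),
)

def is_leaf(node):
    # Leaves start with a species string; internal nodes start with a subtree.
    return isinstance(node[0], str)

def species_of(node):
    """Set of species found below this node."""
    if is_leaf(node):
        return {node[0]}
    left, right = node
    return species_of(left) | species_of(right)

def genes_of(node):
    """List of gene names found below this node."""
    if is_leaf(node):
        return [node[1]]
    left, right = node
    return genes_of(left) + genes_of(right)

def classify_pairs(node, result=None):
    """Label each gene pair by the event inferred at the node where they split."""
    if result is None:
        result = {}
    if is_leaf(node):
        return result
    left, right = node
    # Assumed rule: shared species on both sides means a duplication node.
    duplicated = bool(species_of(left) & species_of(right))
    label = "paralogues" if duplicated else "orthologues"
    for a in genes_of(left):
        for b in genes_of(right):
            result[frozenset((a, b))] = label
    classify_pairs(left, result)
    classify_pairs(right, result)
    return result

pairs = classify_pairs(tree)
for pair, label in sorted((sorted(p), l) for p, l in pairs.items()):
    print(pair, label)
```

In this toy tree the root is inferred to be a duplication, so every A gene is a paralogue of every B gene, while the human and mouse copies within each family are orthologues.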

The first step is to "align" the sequences of the homologous genes, that is, to estimate which mutations (substitutions, insertions and deletions) have appeared during their divergence from a common ancestor. When only two sequences are to be aligned, a dynamic-programming algorithm finds an optimal alignment for a given scoring scheme (see inset "Aligning two sequences"). Where large numbers of sequences are available, as is the case with certain genes coding for ribosomal RNA, faster heuristics have to be used, but these are not guaranteed to find an optimal alignment. After choosing an evolutionary model, the number of changes along the branches of the phylogenetic tree linking each pair of sequences can be estimated, and it is normally possible to differentiate between paralogues and orthologues from the resulting tree. However, the tree cannot be validated experimentally. At best it can be checked against prior knowledge from the field of systematics.
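The dynamic-programming idea for two sequences can be sketched as follows. This is a minimal global alignment in the style of Needleman and Wunsch, assuming a very simple scoring scheme (match +1, mismatch -1, linear gap penalty -2). These values and the function name are illustrative assumptions; real tools use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

best, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(best)
print(row_a)
print(row_b)
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths. That cost is acceptable for a pair of sequences but is one reason why heuristics are preferred when many sequences must be aligned together.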
