Clearly the difficulties are not over once a gene is discovered: its function or functions still have to be determined. Structural homology suggests functional homology, so the usual strategy is to search a database for genes with a similar sequence. But this method also has its limits. Once a certain similarity has been identified, genes which are orthologues must be distinguished from those which are paralogues. What does this difference mean in real terms? It is quite common for a gene to be duplicated within a genome. Orthologues are the copies of a single ancestral gene found in different species, separated only by speciation; they generally retain the original function (hence the name 'orthologue', for the direct counterpart). Paralogues are the copies produced by a duplication; they may evolve independently and acquire completely different functions. These two cases can only be distinguished through an evolutionary analysis, by constructing phylogenetic trees.
The first step is to "align" the sequences of homologous genes, that is, to estimate which mutations have appeared during their divergent evolution from a common ancestor. If only two sequences are available, a dynamic-programming algorithm is used (see inset "Aligning two sequences"). Where large numbers of sequences are available, as is the case with certain genes coding for ribosomal RNA, faster heuristics have to be used, but these are not guaranteed to find an optimal alignment. Once an evolutionary model has been chosen, it is normally possible to differentiate between paralogues and orthologues by estimating the total number of changes along the branches of the phylogenetic tree linking each pair of sequences. However, the resulting tree cannot be validated experimentally. At best it can be checked against prior knowledge from the field of systematics.
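To make the dynamic-programming step concrete, here is a minimal sketch of a global pairwise alignment in the style of the classic Needleman-Wunsch algorithm. The text does not say which algorithm the inset describes, so this choice is an assumption. The function name and the scoring values (match, mismatch, gap) are also illustrative assumptions; real analyses use substitution matrices and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b), with '-' marking a gap.
    The scoring parameters are illustrative, not biologically calibrated.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    # Fill the table: each cell extends the best of three smaller alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] opposite a gap
                score[i][j - 1] + gap,       # b[j-1] opposite a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(top)
    print(bottom)
```

The table has (n + 1) × (m + 1) cells, so time and memory grow with the product of the two sequence lengths. This is manageable for a pair of genes, but it is one reason why exact methods give way to heuristics when many sequences must be aligned together.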