Finding genes in prokaryotic genomes
Translated from "Donner un sens au génome", La Recherche, n° 332, June 2000

A prokaryotic genome is fairly dense: almost the entire sequence corresponds to genes, and we know the codons (sets of three nucleotides) that mark the beginning and end of translation of a protein-coding region. Unfortunately, it is not that simple, because there are ambiguities. For example, the codons that mark the beginning of translation also code for an amino acid: ATG, the most common start codon, codes for methionine wherever it occurs inside a gene. So there is only one "necessary condition" defining where to look for a coding sequence: it must lie between two codons that mark the end of translation (known as STOP codons), in what is called an Open Reading Frame (ORF).
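The necessary condition above can be illustrated with a minimal sketch that splits one reading frame into stop-free stretches. It assumes the standard genetic code (STOP codons TAA, TAG and TGA), and the function name `open_reading_frames` is hypothetical, chosen for illustration only. As a simplification, stretches that touch either end of the sequence are also reported, even though they are bounded by a STOP codon on one side only.

```python
# Assumption: standard genetic code; some organisms reassign these codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, frame=0):
    """Return (start, end) index pairs of stop-free stretches in one frame.

    `frame` is 0, 1 or 2: the offset at which codon grouping begins.
    `end` is exclusive and never includes the closing STOP codon.
    """
    seq = seq.upper()
    orfs = []
    orf_start = frame  # first nucleotide of the current stop-free stretch
    last = frame       # end of the last complete codon seen so far
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if i > orf_start:
                orfs.append((orf_start, i))
            orf_start = i + 3  # next stretch begins after the STOP codon
        last = i + 3
    # Trailing stretch that runs to the end of the sequence without a STOP.
    if last > orf_start:
        orfs.append((orf_start, last))
    return orfs


if __name__ == "__main__":
    print(open_reading_frames("ATGAAATAACCCGGGTGA"))
```

Being stop-free says nothing yet about whether a stretch actually codes for a protein; it only delimits where a coding sequence could lie.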

Any sequence included in an ORF which begins with a START codon and which is judged to be long enough (for example 300 nucleotides for a prokaryote, which corresponds to a protein of 100 amino acids) is considered to be a potential coding region. If significant sub-sequences, particularly a promoter or a ribosome binding site, are found upstream of this region, this supports the hypothesis, as does the existence of similar sequences in the nucleotide and protein databases. Finally, the same sequence can be "read" in three different ways, grouping the letters in threes, codon by codon, and each of the two complementary strands of DNA can be read, so that in practice the search for coding regions must be carried out on six different virtual sequences. Together with Antoine Danchin's group at the Institut Pasteur, the authors have developed software tools to facilitate genome analysis [1], but there are many others.
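The six-frame search described above can be sketched as follows. This is a simplified illustration, not the software cited in [1]: the function names are hypothetical, the standard genetic code is assumed, only ATG is treated as a START codon by default, only the most upstream START in each ORF is reported, and a candidate must be closed by a STOP codon. Upstream signals and database similarity are not checked.

```python
# Assumption: standard genetic code; ATG is the only START codon by default.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_coding_regions(seq, min_length=300, start_codons=frozenset({"ATG"})):
    """Scan the six reading frames for START...STOP candidates.

    Returns (strand, frame, start, end) tuples. `end` is exclusive and points
    at the closing STOP codon. Coordinates on the "-" strand are relative to
    the reverse-complemented sequence, not to the input sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first START since the last STOP
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in start_codons:
                    start = i
                elif codon in STOP_CODONS:
                    # Keep the candidate only if it is long enough.
                    if start is not None and i - start >= min_length:
                        results.append((strand, frame, start, i))
                    start = None
    return results


if __name__ == "__main__":
    toy = "ATG" + "AAA" * 4 + "TAA"
    # A tiny threshold is used here only so the toy sequence qualifies.
    print(candidate_coding_regions(toy, min_length=15))
```

The default `min_length=300` follows the threshold quoted in the text; passing a larger `start_codons` set is one way to experiment with alternative START codons.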

[1] Médigue, C. et al., Bioinformatics, vol. 15, no. 1, p. 2, 1999.