Helix bioinformatics



Finding genes in prokaryotic genomes


		Translated from "Donner un sens au génome", La Recherche, n° 332, June 2000

A prokaryotic genome is fairly dense: almost the entire sequence corresponds to genes. We also know the codons (sets of three nucleotides) that mark the beginning and end of translation of a protein-coding region. Unfortunately, finding genes is not that simple, because these signals are ambiguous. For example, the codons that mark the beginning of translation also code for an amino acid: ATG, the most common start codon, codes for methionine. So there is only one "necessary condition" that defines where to look for a coding sequence: it must lie between two codons that mark the end of translation (known as STOP codons), in what is called an Open Reading Frame (ORF).
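As an illustration of this "necessary condition", the following minimal Python sketch lists the stop-free stretches of a sequence in a single reading frame. It assumes the standard STOP codons (TAA, TAG, TGA) and a plain string of A, C, G and T; the function name `stop_to_stop_orfs` is hypothetical and is not taken from any tool mentioned in this article.

```python
# Illustrative sketch only: assumes the standard STOP codons and an
# unambiguous DNA string (A, C, G, T). Names are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq, frame=0):
    """Return (start, end) index pairs of stop-free stretches in one frame.

    `start` is the index of the first codon after a STOP codon (or the
    start of the frame); `end` is the index of the STOP codon that closes
    the stretch (or the end of the last complete codon).
    """
    seq = seq.upper()
    orfs = []
    orf_start = frame
    # Walk the sequence codon by codon in the chosen frame.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if i > orf_start:
                orfs.append((orf_start, i))
            orf_start = i + 3
    # Keep the trailing stretch that is not closed by a STOP codon.
    end = frame + ((len(seq) - frame) // 3) * 3
    if end > orf_start:
        orfs.append((orf_start, end))
    return orfs


print(stop_to_stop_orfs("ATGAAATAGCCCATGTGA", frame=0))
```

Each interval returned is only a place where a coding sequence could lie; nothing here says that it actually codes for a protein.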

Any sequence included in an ORF that begins with a START codon and is judged to be long enough (for example, 300 nucleotides for a prokaryote, which corresponds to a protein of 100 amino acids) is considered to be a potential coding region. If significant sub-sequences, particularly a promoter or a ribosome binding site, are found upstream of this region, they support the hypothesis, as does the existence of similar sequences in the nucleotide and protein databases. Finally, the same sequence can be "read" in three different ways, depending on where the grouping of letters into codons begins, and each of the two complementary strands of DNA can be read, so in practice the search for coding regions must be carried out on six different virtual sequences (the six reading frames). Together with Antoine Danchin's group at the Institut Pasteur, the authors have developed software tools to facilitate genome analysis [1], but there are many others.
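The six-frame search described above can be sketched in a few lines of Python. This is a simplified illustration, not the method used by the authors' tools: it assumes ATG as the only START codon, the standard STOP codons, the 300-nucleotide threshold quoted above, and it ignores regions that run off the end of the sequence without a STOP codon. The names `reverse_complement` and `candidate_coding_regions` are hypothetical.

```python
# Illustrative sketch only; names and defaults are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG"}  # most common START codon; others exist
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_coding_regions(seq, min_length=300):
    """Return (strand, frame, start, end) for each candidate coding region.

    A candidate runs from the first START codon found after a STOP codon
    up to (but excluding) the next STOP codon in the same frame, and must
    span at least `min_length` nucleotides. Indices are 0-based positions
    on the strand being read.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in STOP_CODONS:
                    if start is not None and i - start >= min_length:
                        results.append((strand, frame, start, i))
                    start = None
                elif start is None and codon in START_CODONS:
                    start = i
    return results


# A toy gene: START codon, 100 lysine codons, STOP codon.
toy = "ATG" + "AAA" * 100 + "TAA"
print(candidate_coding_regions(toy))
```

A real gene finder would go further and weigh the additional evidence mentioned above, such as an upstream ribosome binding site or similarity to known sequences.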

[1] Médigue, C. et al., Bioinformatics, vol. 15, no. 1, p. 2, 1999.

	The first genome projects
	Whole genome sequencing
	Genomic databases
	The problem of heterogeneous databases
	Searching for homology through similarity of sequences
	Finding genes in prokaryotic genomes
	Finding genes in eukaryotic genomes
	Inferring gene functions from homology relationships
	The quest for gene function has not yet found an algorithmic solution
	Modeling and simulating gene interaction networks and metabolic pathways
	Biological data and knowledge need to be formalized