Helix bioinformatics



Searching for homology through similarity of sequences


		Translated from "Donner un sens au génome", La Recherche, n° 332, June 2000

To return to sequencing, if we use the classic metaphor in which the DNA bases are seen as letters, then once the text (the sequence) has been obtained, the first difficulty is to identify the words (the genes) that make it up. Next comes the question of meaning: the function of those genes.

A biologist's first reflex, when a new sequence becomes available, is to compare it, together with its potential translations into protein sequences, with those already held in banks and databases, looking for similar rather than identical sequences. Apart from sequencing errors, any differences represent mutations that have accumulated in the course of evolution. If there is enough similarity, the two fragments are considered to have diverged from a single ancestral fragment, and they are said to be homologous. If the fragment includes a gene, homology suggests that the encoded proteins have a similar function, but it does not prove this, as will be seen later. The search for similarity has led to a wealth of technical and methodological developments, both to shorten the computing time needed to compare a sequence with all the sequences already known, and to take prior knowledge about evolutionary mechanisms into account when designing algorithms.

There are limits to what this strategy can achieve. A similarity search may fail simply because no homologous sequence has yet been identified. When the yeast genome was sequenced, almost half its genes were completely unknown and resembled nothing found in the banks. Such genes are known as "orphans". Moreover, relying exclusively on the information in the databanks means that if this information is incorrect, as is all too often the case, the errors are propagated, resulting in what some researchers call a "house of cards".

It is therefore essential to have direct gene identification methods that do not rely on homology. Identifying genes in this way is much easier when the genome in question comes from a prokaryote (a bacterium) than when it comes from a eukaryote (any other organism).
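To make the idea of searching for similar rather than identical sequences concrete, here is a minimal sketch in Python. It scores a query against each entry of a toy "bank" using a Smith-Waterman style local alignment and ranks the entries by score. The function names, the example bank and the scoring values (match, mismatch, gap) are assumptions chosen for illustration; the article does not specify any particular algorithm, and production tools rely on heuristics and substitution matrices to keep run times manageable.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the article.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1], rewarding identity and penalizing mutation.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query, bank):
    """Return (score, name) pairs for a toy bank, most similar entry first."""
    scored = [(local_alignment_score(query, seq), name) for name, seq in bank.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical sequences, made up for the example.
    toy_bank = {"seq_a": "TTACGTTT", "seq_b": "GGGGCCCC", "seq_c": "ACCT"}
    for score, name in rank_by_similarity("ACGT", toy_bank):
        print(name, score)
```

A high score only indicates similarity. Deciding whether that similarity is strong enough to infer homology requires a statistical threshold, which this sketch leaves out.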

	The first genome projects
	Whole genome sequencing
	Genomic databases
	The problem of heterogeneous databases
	Searching for homology through similarity of sequences
	Finding genes in prokaryotic genomes
	Finding genes in eukaryotic genomes
	Inferring gene functions from homology relationships
	The quest for gene function has not yet found an algorithmic solution
	Modeling and simulating gene interaction networks and metabolic pathways
	Biological data and knowledge need to be formalized