Helix bioinformatics



The problem of heterogeneous databases


		Translated from "Donner un sens au génome", La Recherche, n° 332, June 2000

Each database addresses different biological questions, and those questions shape the way its data are structured. Each therefore has its own conceptual schema, so hoping to organise all genomic data (the sequences and the various other data attached to them) within a single database is a lost cause. Their integration does need to be improved, however: it should be easier to search several databases at once in response to a complex query from a biologist who has his or her own way of approaching a problem. This is as much a conceptual issue as a technical one. How can databases be reconciled when their structures rest on different definitions, all too often left implicit, of concepts as fundamental as the gene itself? Some databases consider the gene to be limited to the regions of DNA that code for its product or products (protein or RNA), while for others it also includes the various regions that come into play during transcription (from DNA into RNA) and translation (from RNA into proteins), that is, a large number of regulatory sequences.
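The difficulty can be made concrete with a small sketch. It assumes two hypothetical databases that describe the same gene, one recording only the coding region and the other including the flanking regulatory sequences. The record layout, the names `GeneRecord` and `reconcile`, the gene name and the coordinates are all invented for illustration and do not correspond to any real database or schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    start: int  # 1-based position on the sequence, inclusive
    end: int    # inclusive


@dataclass(frozen=True)
class GeneRecord:
    name: str
    source: str       # hypothetical database name
    definition: str   # the definition of "gene" used by that database, made explicit
    extent: Interval


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the region shared by two intervals, or None if they are disjoint."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Interval(start, end) if start <= end else None


def reconcile(a: GeneRecord, b: GeneRecord) -> dict:
    """Compare two records and report where they agree and where they differ."""
    return {
        "same_name": a.name == b.name,
        "same_definition": a.definition == b.definition,
        "identical_extent": a.extent == b.extent,
        "shared_region": overlap(a.extent, b.extent),
    }


# Invented example data: the same gene as seen by two databases.
db_a = GeneRecord("geneX", "database_A", "coding_only", Interval(1200, 2100))
db_b = GeneRecord("geneX", "database_B", "coding_plus_regulatory", Interval(1000, 2300))

report = reconcile(db_a, db_b)
print(report)
```

A naive comparison of coordinates would conclude that the two records disagree. Once each record states which definition of "gene" it uses, the difference in extent can be read as a difference of definition rather than a conflict in the data.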

Remember that the term 'genome' is not without ambiguity either. Generally, it refers to the DNA macromolecule contained in the chromosomes, but there is also non-chromosomal DNA, in the plasmids of bacteria and the organelles (mitochondria or chloroplasts for example) of eukaryotic organisms. The term also applies to the whole set of genes of an organism.

	The first genome projects
	Whole genome sequencing
	Genomic databases
	The problem of heterogeneous databases
	Searching for homology through similarity of sequences
	Finding genes in prokaryotic genomes
	Finding genes in eukaryotic genomes
	Inferring gene functions from homology relationships
	The quest for gene function has not yet found an algorithmic solution
	Modeling and simulating gene interaction networks and metabolic pathways
	Biological data and knowledge need to be formalized