Helix bioinformatics



HAMAP


		High-quality Automated and Manual Annotation of microbial Proteomes

In the past few years, the sequencing of complete genomes has been the main reason for the exponential growth of protein databases. Currently, about one genome is submitted to the DNA databases every two weeks. These genomes vary greatly in size, from about 500 protein coding sequences (CDS) in 0.58 megabases (Mb) for Mycoplasma genitalium to about 8400 CDS in 9.11 Mb for Bradyrhizobium japonicum. As a result, in contrast with the situation 10 years ago, a large fraction of protein sequences now have no experimental characterization data. Coding sequences are generally predicted very reliably in prokaryotic genomes, but the quality of the functional annotation of the predicted genes varies widely.

Because of the amount of data involved, we have decided to supplement the traditional curation process of Swiss-Prot with a semi-automatic annotation approach that interacts closely with human experts. This process, named HAMAP for "High-quality Automated and Manual Annotation of microbial Proteomes", aims to achieve the same annotation quality as manual curation rather than maximal coverage. The procedure therefore includes many checks to prevent the propagation of wrong annotation at the protein level and to spot problematic cases, which are channelled to manual curation. The annotation process is applied only to two very distinct subsets of proteins: sequences with no detectable similarity, and members of well-defined curated families whose functions are known (e.g. proteins involved in a metabolic process).
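The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HAMAP implementation: the `Protein` fields, the `Route` labels and the `route` function are hypothetical names introduced here, and the outcome of the similarity search and of the family checks is assumed to be already known.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    """Possible destinations for a protein (hypothetical labels)."""
    AUTO_NO_SIMILARITY = "automatic: no detectable similarity"
    AUTO_FAMILY = "automatic: member of a curated family"
    MANUAL = "manual curation"


@dataclass
class Protein:
    accession: str
    has_detectable_similarity: bool
    curated_family: Optional[str] = None  # hypothetical family identifier
    passes_family_checks: bool = False    # assumed result of the consistency checks


def route(protein: Protein) -> Route:
    """Decide how a protein is handled, following the two subsets in the text."""
    # Subset 1: sequences with no detectable similarity.
    if not protein.has_detectable_similarity:
        return Route.AUTO_NO_SIMILARITY
    # Subset 2: members of a well-defined curated family that pass every check.
    if protein.curated_family is not None and protein.passes_family_checks:
        return Route.AUTO_FAMILY
    # Everything else, including problematic family members, goes to an expert.
    return Route.MANUAL
```

In this sketch, a family member that fails a check is never annotated automatically. That reflects the stated priority of annotation quality over coverage.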

Because complete prokaryotic proteomes (i.e. the set of all annotated proteins of an organism) are available, it is now possible to check the consistency of annotation at the organism level. To help annotators with this task, a specific inference system that uses expert rules and metabolic knowledge about microbial organisms is being developed. This system is named Herbs, for "HAMAP Expert Rule Based System".
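As an illustration of what an organism-level consistency check might look like, the sketch below flags pathways that are only partially covered by the annotated protein families of a proteome. The rule format is an assumption made for this example: the text does not describe how Herbs rules are written, and `PathwayRule` and `check_proteome` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class PathwayRule:
    """Hypothetical expert rule: a pathway and the families it requires."""
    pathway: str
    required_families: FrozenSet[str]


def check_proteome(families_present: Set[str], rules: List[PathwayRule]) -> List[str]:
    """Return one warning per pathway that is only partially annotated."""
    warnings = []
    for rule in rules:
        found = rule.required_families & families_present
        missing = rule.required_families - families_present
        # A pathway with some, but not all, of its families looks inconsistent
        # and is reported so that an annotator can review it.
        if found and missing:
            warnings.append(f"{rule.pathway}: missing {', '.join(sorted(missing))}")
    return warnings
```

A pathway that is entirely absent produces no warning here, since an organism may simply lack it. Only partial coverage is treated as a sign of a possible annotation problem.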

	HAMAP
	The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003