Far from being an end in itself, the availability of a complete genome sequence opens up the possibility of a systematic approach to the genes within it. But progress is still needed. The human genome sequence is now available, and the mouse genome soon will be, but making reliable predictions of eukaryotic genes on the basis of those sequences is a classic example of a problem which is still wide open.
It is the extreme variety of the information available, and the way in which it is interrelated, which causes the problem, rather than its volume. In fact, improvements in the efficiency of comparative techniques are keeping pace with the availability of sequences of new organisms. But to reap the benefit of this multiplier effect, whereby new information is produced on the basis of existing information and the analysis of new data, it is no longer good enough to record this information only in textual form and in natural language, even if it is stored electronically. This unstructured form is an obstacle to wide-ranging, integrated searches of large numbers of databases, even when powerful search engines are used. The key issues in bioinformatics research are therefore not only to design new, increasingly powerful and above all appropriate algorithms or heuristics, but also to provide tools which will make it easier to model, structure, examine and visualise biological knowledge.