InfoGene database of known and predicted genes

Genomic information is growing faster every day, but the proportion of experimentally confirmed data is decreasing and extracting useful information is becoming more complex. The InfoGene database [4, 5] is one of the most complete gene-centered genomic databases: it brings together practically all the coherent information that can be obtained from the GenBank feature tables. The necessary information can therefore be obtained without searching through the many GenBank entries in which information about a particular gene might be stored.

The InfoGene database includes known and predicted gene structures with a description of their basic functional signals and gene components. All major organisms are presented in separate divisions. The information about a single gene structure might be collected from dozens of GenBank entries. This information can be used to create different sets of functional gene components and to extract their significant characteristics for use in gene prediction systems. InfoGene is implemented in an interactive Java environment [15] that provides visual analysis of known information about complex gene structures (Figure 3.12) and supports searches for different gene components and signals. The database is available through the WWW server of the Computational Genomics Group at http://genomic.sanger.ac.uk/infodb.shtml.
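The gene-centered idea described above, collecting the features of one gene from several GenBank entries, can be illustrated with a minimal sketch. This is not InfoGene's actual implementation: the function names (parse_features, build_gene_index), the entry identifiers (ENTRY_A, ENTRY_B) and the gene names (GENE1, GENE2) are hypothetical, and the parser assumes a simplified feature table in which each feature line is followed by a /gene qualifier.

```python
import re
from collections import defaultdict

# Simplified GenBank feature-table layout (an assumption for this sketch):
# a feature line is indented by 5 spaces: "     exon            100..200"
# a qualifier line is indented further:   '                     /gene="GENE1"'
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)$")
GENE_RE = re.compile(r'^\s+/gene="([^"]+)"$')


def parse_features(accession, feature_table):
    """Yield (gene, accession, key, location) for features with a /gene qualifier."""
    current = None  # (key, location) of the feature being read
    for line in feature_table.splitlines():
        feature = FEATURE_RE.match(line)
        if feature:
            current = (feature.group(1), feature.group(2))
            continue
        gene = GENE_RE.match(line)
        if gene and current is not None:
            yield gene.group(1), accession, current[0], current[1]
            current = None  # emit each feature at most once


def build_gene_index(entries):
    """Group features from many entries under the gene they belong to."""
    index = defaultdict(list)
    for accession, table in entries.items():
        for gene, acc, key, location in parse_features(accession, table):
            index[gene].append((acc, key, location))
    return dict(index)


# Hypothetical entries; the identifiers and coordinates are illustrative only.
ENTRIES = {
    "ENTRY_A": (
        "     exon            100..200\n"
        '                     /gene="GENE1"\n'
        "     exon            300..420\n"
        '                     /gene="GENE1"\n'
    ),
    "ENTRY_B": (
        "     exon            50..180\n"
        '                     /gene="GENE1"\n'
        "     CDS             10..90\n"
        '                     /gene="GENE2"\n'
    ),
}

index = build_gene_index(ENTRIES)
for gene_name, features in index.items():
    print(gene_name, features)
```

With an index of this kind, everything recorded for a gene can be read from one place, whichever entries it came from. A real feature table has many more qualifiers and multi-line locations, which this sketch deliberately ignores.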

The value of sequence information for the biomedical community will strongly depend on the availability of candidate genes that are computationally predicted in these sequences. Currently, information about a predicted gene is absent from sequence databases if the gene has no protein-level similarity to a known protein. Using gene prediction, the scientific community can start experimental work with most human genes, because gene-finding programs usually predict accurately at least the major part of the exons in a gene sequence. InfoGene includes all predicted genes for the Human and Drosophila genome drafts and for several chromosomes of the Arabidopsis genome. The database is currently available through the WWW server at http://www.softberry.com/infodb.html. Recently, a similar project, Ensembl, was started as a collaboration between the Sanger Center and the European Bioinformatics Institute (http://www.ensembl.org/).

Figure 3.12. InfoGene [15] presentation of the human PACE4 (AB001898) gene. This gene has several alternative forms and is described in 17 GenBank entries. Continuous sequence regions corresponding to different GenBank entries are separated by vertical bars.
