Functional analysis and verification of predicted genes

Large-scale functional analysis of predicted and known genes can be carried out using expression microarray technology (see Chapter 5 of Volume 2 of this book). Genes are often represented on the chips by unique oligonucleotides close to the 3'-end of the mRNA. However, many predicted new genes have no known corresponding EST sequences. The expression of such genes can be studied in a large number of human tissues using predicted exon sequences represented on one or several Affymetrix-type DNA chips. As a result, we learn not only the expression properties of the genes but also which predicted exons are real. By observing coordinated expression of neighboring exons in different tissues, it will often be possible to define gene boundaries, which is very difficult with ab initio gene prediction. Moreover, such experiments may have additional value in identifying disease tissue-specific genes, which can be used for the development of potential therapeutics.

Tab. 3.18 PfamA domains identified in the predicted human genes. Domains of the same type localized in neighboring exons were counted only once.

| Number | PfamA short name | Name |
|---|---|---|
| 467 | Pkinase | Eukaryotic protein kinase domain |
| 372 | 7tm_1 | 7 transmembrane receptor (rhodopsin family) |
| 308 | Myc_N_term | Myc amino-terminal region |
| 256 | Topoisomerase_I | Eukaryotic DNA topoisomerase I |
| 224 | Ig | Immunoglobulin domain |
| 183 | Rrm | RNA recognition motif |
| 182 | PH | PH domain |
| 180 | Myosin_tail | Myosin tail |
| 166 | EGF | EGF-like domain |
| 159 | Filament | Intermediate filament proteins |
| 154 | Syndecan | Syndecan domain |
| 143 | Ras | Ras family |
| 138 | RNA_pol_A | RNA polymerase A/beta'/A'' subunit |
| 123 | BTB | BTB/POZ domain |
| 119 | Granin | Granin (chromogranin or secretogranin) |
| 119 | Troponin | Troponin |
| 113 | Herpes_glycop_D | Herpesvirus glycoprotein D |
| 111 | Homeobox | Homeobox domain |
| 110 | SH3 | SH3 domain |
| 102 | Trypsin | Trypsin |
| 102 | helicase_C | Helicases conserved C-terminal domain |
| 100 | KRAB | KRAB box |
| 98 | dehydrin | Dehydrins |
| 96 | ABC_tran | ABC transporter |
| 95 | ERM | Ezrin/radixin/moesin family |
| 89 | Collagen | Collagen triple helix repeat |
| 87 | Tryp_mucin | Mucin-like glycoprotein |
| 84 | Fn3 | Fibronectin type III domain |
| 81 | pro_isomerase | Cyclophilin type peptidyl-prolyl cis-trans isomerase |
| 81 | HMG_box | HMG (high mobility group) box |
| 79 | SH2 | Src homology domain 2 |

The chip designed by EOS Biotechnology included all exons from Chromosome 22 predicted by Fgenesh and Genscan, as well as exons predicted by Fgenesh in human genomic sequences of Phase 2 and 3. The predicted exon sequences were found to be a good alternative to EST sequences, which opens the possibility of working with predicted genes on a large scale.

Figure 3.13 shows an example of the expression behavior of five sequential exons along the Chromosome 22 sequence (expression data were obtained at EOS Biotechnology Inc.). Exons 2, 3 and 4 are Myoglobin gene exons. Their tissue-specific expression is clearly seen, with the major peaks located in skeletal muscle, heart and diaphragm tissues. The level of expression in these tissues is 10-100 times higher than the signal level in other tissues and than the average expression level of randomly chosen exons. The expression levels of the three Myoglobin exons have a correlation coefficient of 0.99, whereas for random exons it is about 0.06. These exons were predicted correctly by the Fgenesh program and were used for the selection of oligonucleotide probes. From this result we can conclude that predicted exons can be used as gene representatives. At the same time, the two flanking exons (1 and 5), which belong to different genes, show no correlation with the Myoglobin exons. This clearly demonstrates how expression data can be used to define gene boundaries.

Another application of expression data is functional analysis and the identification of alternatively spliced genes (exons): in particular tissues, some exons (or their parts) may have very different expression intensities compared with the other exons of the same gene. If 5'-alternative exons define different functional forms of genes (normal and disease-specific, for example), then probes generated from 3'-ESTs cannot be used to identify disease-specific gene variants.

Fig. 3.13 Coordinated expression of three exons (EOS34842, EOS34842, EOS34842) of the Human Myoglobin gene from Chromosome 22 in 50 different tissues (the exons were predicted by the Fgenesh program and used to design the EOS Biotechnology Human genome chip). A high level of expression is observed only in several specific tissues. The two exons predicted to the left (EOS34841) and to the right (EOS37009) of Myoglobin show completely different patterns of expression. The visualization was produced by the SELTAG program for the analysis of expression data, developed by Softberry Inc.
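The use of co-expression to delimit genes, as in the Myoglobin example above, can be illustrated with a minimal sketch. It is not the procedure used by EOS Biotechnology or by SELTAG. It assumes that each predicted exon has one expression value per tissue, that exons are listed in chromosomal order, and that a Pearson correlation cutoff of 0.9 (a hypothetical threshold chosen only for illustration) separates exons of the same gene from exons of neighboring genes. The exon names and expression values are invented toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def group_adjacent_exons(profiles, threshold=0.9):
    """Split exons, given in chromosomal order, into runs of co-expressed neighbors.

    profiles:  list of (exon_id, [expression value per tissue]) tuples.
    threshold: assumed cutoff; a new group starts whenever the correlation
               with the preceding exon falls below it.
    """
    groups = []
    for i, (exon_id, values) in enumerate(profiles):
        if i > 0 and pearson(profiles[i - 1][1], values) >= threshold:
            groups[-1].append(exon_id)   # same expression pattern: same putative gene
        else:
            groups.append([exon_id])     # pattern changes: putative gene boundary
    return groups


# Invented toy data: five neighboring exons measured in six tissues.
toy_profiles = [
    ("exon1", [5, 40, 8, 35, 6, 30]),
    ("exon2", [900, 10, 700, 12, 400, 9]),
    ("exon3", [950, 12, 720, 11, 420, 10]),
    ("exon4", [880, 9, 690, 13, 390, 8]),
    ("exon5", [20, 22, 19, 60, 21, 18]),
]

groups = group_adjacent_exons(toy_profiles)
print(groups)  # [['exon1'], ['exon2', 'exon3', 'exon4'], ['exon5']]
```

Only adjacent exons are compared, so a single poorly measured exon splits a gene into two groups. A more tolerant variant could compare each exon with the mean profile of the current group instead.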

3 Structure, Properties and Computer Identification of Eukaryotic Genes 3.9
