Internal exon recognition

For internal exon prediction we consider all open reading frames in a given sequence that are flanked by AG (on the left) and GT (on the right) dinucleotides as potential internal exons. The structure of such exons is presented in Figure 3.9. The components of the recognition function for internal exon

Different functional regions of the first (a), internal (b), last (c) and single exons corresponding to components of recognition functions.


90 | 3 Structure, Properties and Computer identification of Eukaryotic Genes

Tab. 3.10 Significance of selected internal exon characteristics. Characteristics 1 and 2 are the values of the donor and acceptor site recognition functions; 3 is the octanucleotide preferences for being coding of the potential exon region; 4 and 5 are the octanucleotide preferences for being intron of the 70 bp region on the left and the 70 bp region on the right of the potential exon region.

Characteristics        1      2      3      4      5
a  Individual D²     15.0   12.1    0.4    0.2    1.5
b  Combined D²       15.0   25.3   25.8   25.8   25.9

prediction consist of the octanucleotide preferences for being intron in the 70 bp region to the left of the potential exon region, the value of the acceptor splice site recognition function, the octanucleotide preferences for being coding in the ORF, the value of the donor splice site recognition function, and the octanucleotide preferences for being intron in the 70 bp region to the right of the potential exon region. The values of these five characteristics were calculated for 952 authentic exons and for 690 714 pseudo-exon sequences from the training set. The Mahalanobis distances showing the significance of each characteristic are given in Table 3.10. The strongest characteristics for exons are the values of the recognition functions of the flanking donor and acceptor splice sites (D² = 15.04 and D² = 12.06, respectively). The preference of the ORF for being a coding region has D² = 1.47, the adjacent left intron region has D² = 0.41, and the right intron region has D² = 0.18.
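To make the role of the Mahalanobis distance concrete, the following is a minimal sketch of how D² between two classes can be computed for one characteristic or for a combination of characteristics. It assumes the usual two-class form, (m1 - m2)ᵀ S⁻¹ (m1 - m2), with S the pooled within-class covariance matrix; the text does not state which estimator was used, so that choice, the function names, and the toy data are all illustrative assumptions rather than the authors' procedure.

```python
from statistics import fmean

def pooled_covariance(a, b):
    """Return class means and the pooled within-class covariance of two samples."""
    k = len(a[0])
    mean_a = [fmean(row[j] for row in a) for j in range(k)]
    mean_b = [fmean(row[j] for row in b) for j in range(k)]
    scatter = [[0.0] * k for _ in range(k)]
    for rows, mean in ((a, mean_a), (b, mean_b)):
        for row in rows:
            d = [row[j] - mean[j] for j in range(k)]
            for i in range(k):
                for j in range(k):
                    scatter[i][j] += d[i] * d[j]
    dof = len(a) + len(b) - 2  # pooled degrees of freedom (assumed convention)
    return mean_a, mean_b, [[v / dof for v in r] for r in scatter]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def mahalanobis_d2(a, b, features):
    """D^2 between classes a and b using only the listed feature indices."""
    sub_a = [[row[j] for j in features] for row in a]
    sub_b = [[row[j] for j in features] for row in b]
    mean_a, mean_b, cov = pooled_covariance(sub_a, sub_b)
    diff = [x - y for x, y in zip(mean_a, mean_b)]
    weights = solve(cov, diff)  # cov^-1 * diff without forming the inverse
    return sum(d * w for d, w in zip(diff, weights))

# Toy data (hypothetical): rows are sequences, columns are two characteristics.
authentic = [[1.0, 0.0], [3.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
pseudo = [[5.0, 1.0], [7.0, 2.0], [5.0, 2.0], [7.0, 1.0]]

individual = [mahalanobis_d2(authentic, pseudo, [j]) for j in range(2)]
combined = [mahalanobis_d2(authentic, pseudo, list(range(j + 1))) for j in range(2)]
```

On this toy data the individual values come out at roughly 12 and 3 and the combined value at roughly 15. Adding a characteristic can only keep the combined D² the same or increase it, which mirrors the pattern of the combined values in Table 3.10.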

The accuracy of the discriminant function based on these characteristics was evaluated on 451 exon and 246 693 pseudo-exon sequences from the test set. The overall accuracy of exact internal exon prediction is 77%, with a specificity of 79%. At the level of individual nucleotides, the sensitivity of exon prediction is 89% with a specificity of 89%, and the sensitivity of intron position prediction is 98% with a specificity of 98%. This is better than the dynamic programming and neural network-based method [46], which predicts exact internal exons with 75% accuracy and a specificity of 67%. The discriminant function method therefore makes 12% fewer false exon assignments (a specificity of 79% versus 67%) while also predicting more exons correctly.
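The exon-level and nucleotide-level figures above can be illustrated with a short sketch. It assumes the convention common in gene-prediction benchmarks, in which sensitivity is the fraction of true items that are predicted and specificity is the fraction of predicted items that are true; the text does not spell out its definitions, so this is an assumption. The function names and coordinates are hypothetical.

```python
def exon_level_accuracy(true_exons, predicted_exons):
    """Sensitivity and specificity counting only exons with both ends exact."""
    true_set, pred_set = set(true_exons), set(predicted_exons)
    exact = len(true_set & pred_set)
    sensitivity = exact / len(true_set) if true_set else 0.0
    specificity = exact / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity

def covered_positions(exons):
    """Set of nucleotide positions covered by (start, end) inclusive intervals."""
    return {pos for start, end in exons for pos in range(start, end + 1)}

def nucleotide_level_accuracy(true_exons, predicted_exons):
    """Sensitivity and specificity counted per exonic nucleotide."""
    true_pos = covered_positions(true_exons)
    pred_pos = covered_positions(predicted_exons)
    shared = len(true_pos & pred_pos)
    sensitivity = shared / len(true_pos) if true_pos else 0.0
    specificity = shared / len(pred_pos) if pred_pos else 0.0
    return sensitivity, specificity

# Hypothetical annotation and prediction, as inclusive (start, end) pairs.
true_exons = [(10, 20), (40, 60), (80, 90)]
predicted_exons = [(10, 20), (40, 55), (100, 110)]

exon_sn, exon_sp = exon_level_accuracy(true_exons, predicted_exons)
nuc_sn, nuc_sp = nucleotide_level_accuracy(true_exons, predicted_exons)
```

In this example the second predicted exon overlaps a true exon but misses its right boundary, so it counts toward nucleotide-level accuracy and not toward exact exon accuracy. This is why nucleotide-level figures are typically higher than exact exon figures.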
