This chapter concentrates on secondary structure prediction and on protein threading methods because, given current data and the state of the art in the field, these are the most promising approaches to tertiary structure prediction, fold recognition and remote homology detection on a large genomic scale. Ab initio methods are not covered here because, to date, they are less relevant to applications, with the exception of recent approaches that combine local sequence-structure similarities with ab initio conformation sampling (see Section 6.2.5). Homology modeling based on an alignment or on a rough structural model is discussed in Chapter 5 of this volume.

Bioinformatics approaches to protein structure prediction and to the identification of homology and similarity involve solving search problems. As the range of possible protein sequences and folding conformations is astronomical, appropriate pruning of the search space is essential. The so-called Levinthal paradox [4] states that even real proteins cannot try out all possible conformations in the time they take to fold into their native structure. One possibility would be to concentrate on local subproblems and assemble the subsolutions into overall solutions. Fortunately, proteins are composed of highly regular, recurring elements (secondary structures) such as helices and strands. The secondary structures, in turn, form certain supersecondary structure motifs. Unfortunately, building up a protein structure locally is not directly possible, because short peptide segments can fold into different structures in different protein environments. The 3D structure clearly brings amino acids that are distant in the sequence into close contact in the fold. Such long-range contacts may often override local conformational tendencies and have to be dealt with, e.g. via conformational search approaches [5].
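The scale behind the Levinthal argument can be illustrated with a back-of-the-envelope calculation. The sketch below is only an illustration: the chain length (100 residues), the number of conformational states per residue (3) and the sampling rate (10^13 conformations per second) are assumed values chosen for the example, not figures taken from this chapter or from [4], and the function name is hypothetical.

```python
# Back-of-the-envelope Levinthal estimate.
# All three parameters below are illustrative assumptions, not values from the text.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues, states_per_residue, samples_per_second):
    """Return (number of conformations, years needed to visit each one once)."""
    n_conformations = states_per_residue ** n_residues  # exact Python integer
    seconds = n_conformations / samples_per_second
    return n_conformations, seconds / SECONDS_PER_YEAR


# Assumed: 100 residues, 3 states per residue, 1e13 conformations sampled per second.
conformations, years = exhaustive_search_years(100, 3, 1e13)
print(f"conformations: {conformations:.2e}")
print(f"years for exhaustive search: {years:.2e}")
```

Under these assumptions the exhaustive search time exceeds any biologically meaningful timescale by many orders of magnitude, which is why both folding proteins and prediction methods must prune the search space rather than enumerate it.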

Detecting remote homologies may require investigating complex evolutionary and functional relationships. Homologues are proteins that descend from a common evolutionary ancestor. An evolutionary relationship often points to similar structure and function. Close homologues, such as protein family members or related (paralogous, see also Chapter 5) sequences from other species, can be detected via similarities found with standard sequence search methods. Such methods include sequence alignment [6-8] and heuristic statistical search methods such as BLAST [9-13] and FASTA [14-18]. Current profile and multiple alignment methods, hidden Markov models (HMMs) [19-28] and iterative searches [29] make it possible to identify more distant and less certain homologies (see also Chapter 2 on sequence analysis in this volume). Sequence, structure and function analysis problems are closely intertwined, and so are the associated prediction problems. Structure prediction problems range from aligning a sequence to a sequence of known structure, through threading the sequence onto structural environments without significant sequence similarity and mapping the sequence to structural templates according to contacts between residues within protein structures [30], to mere functional similarity based on indicative functional motifs [31]. Typically, biological problems have fuzzy, non-local aspects that are highly interconnected. This is also the case for protein structure prediction, which at first looks like a simple optimization problem but turns out to be a knowledge discovery task that requires efficient algorithms to exploit heterogeneous information in the most discriminative and critical way.

The practical problem of making as detailed a prediction as possible for a given sequence usually involves a series of analysis steps (Figure 6.1), in which results from early steps can and should be used in subsequent steps. For example, predicted secondary structure can help to select fold sets or improve sequence-structure alignments, multiple alignments can improve both secondary structure prediction and fold recognition, and functional predictions can provide further evidence for top-scoring threading models. To deal with the various kinds of information in the individual steps, it is helpful to provide appropriate means of storing the features and results of prediction steps. One approach to representing and visualizing sequences, intermediate results and features in a uniform way, and to exploiting them efficiently in subsequent prediction programs, is to use an extensible (XML-based) protein description language (ProML [32, 33]).
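The idea of storing the results of each step so that later steps can reuse them can be sketched as a small staged pipeline. This is a minimal illustration only: it does not use or reproduce ProML, and the names `PredictionRecord`, `Step`, `run_pipeline` and the three `toy_*` functions are hypothetical. The toy steps return dummy values and merely demonstrate the data flow; they are not prediction methods.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PredictionRecord:
    """Hypothetical container: a sequence plus every feature computed so far."""
    sequence: str
    features: Dict[str, object] = field(default_factory=dict)


@dataclass
class Step:
    """One analysis step and the names of the earlier results it depends on."""
    name: str
    requires: List[str]
    run: Callable[[PredictionRecord], object]


def run_pipeline(record: PredictionRecord, steps: List[Step]) -> PredictionRecord:
    """Run the steps in order, storing each result under the step's name."""
    for step in steps:
        missing = [r for r in step.requires if r not in record.features]
        if missing:
            raise ValueError(f"{step.name} needs missing features: {missing}")
        record.features[step.name] = step.run(record)
    return record


# Placeholder steps (assumptions for illustration): they return dummy values.
def toy_multiple_alignment(rec: PredictionRecord) -> List[str]:
    return [rec.sequence]  # an "alignment" containing only the query


def toy_secondary_structure(rec: PredictionRecord) -> str:
    return "C" * len(rec.sequence)  # every residue labeled as coil


def toy_fold_recognition(rec: PredictionRecord) -> Dict[str, int]:
    # Reads the stored results of both earlier steps.
    return {
        "n_aligned_sequences": len(rec.features["multiple_alignment"]),
        "n_ss_labels": len(rec.features["secondary_structure"]),
    }


steps = [
    Step("multiple_alignment", [], toy_multiple_alignment),
    Step("secondary_structure", ["multiple_alignment"], toy_secondary_structure),
    Step("fold_recognition",
         ["multiple_alignment", "secondary_structure"], toy_fold_recognition),
]
record = run_pipeline(PredictionRecord("MKTAYIAKQR"), steps)
print(list(record.features))
```

Declaring the dependencies of each step explicitly makes the order constraint visible: a step that is run before the results it needs are available fails immediately instead of silently working with incomplete information.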

It is probably fair to conclude that most of the improvements in prediction performance are due to efficient use of more evolutionary information (homologues) in individual methods and/or to clever analysis procedures [317] that try to mimic the approach of human experts, which is still the state of the art [34] in protein structure prediction.
