# A

This figure illustrates the RDP procedure: (a) illustrates the finding of initial hits and how to proceed with further searches based on the contact structure of the template and the remaining not yet aligned profiles, sequence parts, (b) illustrates the basic

This figure illustrates the RDP procedure: (a) illustrates the finding of initial hits and how to proceed with further searches based on the contact structure of the template and the remaining not yet aligned profiles, sequence parts, (b) illustrates the basic search step given a partial alignment: the partial alignment implies contact profiles for the remaining positions. Optimal threadings can be found using dynamic programming of sequence parts onto the

positions. The initial hit also partitions the query sequence into aligned and not aligned parts. Now, the procedure uses dynamic programming to find the best possible match from the not aligned parts of the query with the structural regions represented by the contact profile. This results in more partial alignments extending the partial alignment under consideration. The procedure is continued recursively until no further significant partial alignment can be found in the still unaligned parts. The alignment is then complete and the rest of the structure and the query is considered to be insertions and treated as gaps in the alignment.

This recursive process starts with a set of initial partial alignments each partitioning the problem into a set of subproblems, which need to be solved to obtain an overall solution. From all solutions for each member of the set, the best overall solution(s) are considered as solutions to the overall problem.

Especially in the early phases of this process, many incorrect or suboptimal partial alignments have to be expected due to the partial information used so far. Therefore many alternatives are considered using a variety of oracles.

Technically, the search space of RDP can be represented as an AND-OR tree with alternating levels of AND and OR nodes. An AND node contains a set of subproblems to be solved in order to solve the problem represented by the node. An OR node contains a set of problems, where the solution of any of them could be used as the solution of the overall problem represented by the OR node.

Usually, the search is organized using a priority queue, holding the nodes in the order of most promising subproblems first. The search selects the most promising subproblem and determines solutions for it in the context of the partial assignments already made.

A detailed description and discussion can be found in [188]. The approach has been shown to improve on profile threading methods on several comprehensive benchmark sets containing structural pairs of increasing difficulty.

The approach is fast enough to allow threading a query sequence against a small fold library of several hundred domains. For large scale threading applications using thousands of template structures or for predicting thousands of sequences, e.g. for whole genome analysis, the normal mode of operation is to apply fast threading methods such as ToPLign/123D* first to select a smaller set of candidates, against which the query is threaded with RDP to obtain additional evidence or more accurate alignments.

Additional functional information and constraints derived from experiments can be exploited in similarity methods to balance inaccurate scoring functions or excessive computing requirements in order to improve fold recognition and alignments of sequences to structures. Distance constraints provide orthogonal measures for scoring alignments and structural models and means for drastically reducing the search space, and thus computing and memory resources. There are essentially two ways to incorporate such distance constraints into an improved prediction method: the first uses the constraints in evaluating the predicted models either via discarding invalid (i.e. incompatible with the measured constraints) models or by re-sorting the ranked list of models via a score combining threading score and fulfilled constraints. The second approach uses the constraints in the prediction algorithm itself thereby excluding partial model structures in conflict with the experimental constraints. Here a XML-based specification language ProML is employed to represent and access the additional information on the target sequence and the fold template in the prediction algorithm [32, 33].

The RDP method lends itself naturally to such an approach. The search procedure can be easily tailored to special needs via using specialized generating functions to compute the sets of partial solutions and specialized evaluation functions to select the best assembled solutions. If a set of distance constraints is given, the generating functions can be restricted such that only solutions respecting the set of constraints are generated, thereby focusing on consistent alignments in order to guide the search process towards valid alignments and discarding huge search spaces not containing any valid alignment. On the other hand, distance constraints can also be checked between aligned residues after having assembled partial solutions into overall solutions. Therefore, the evaluation function is also restricted with respect to the constraints in order to filter out invalid alignments. Again, the idea is to eliminate invalid alignments as early as possible to accelerate the computation and to improve the alignment quality of solutions.

Experimentally measured distance constraints can be of valuable use in similarity methods for structure prediction. Even for moderate numbers (sequence length/10) of distance constraints, we could gain one-third more recognized folds by re-ranking 123D alignments and about two-thirds by employing better alignments on a representative benchmark set [32].