vector machine is trained. The dotted lines denote the margin between the separating surface and the closest preclassified points. The mapping from (b) to (c) is performed implicitly by a so-called kernel function $

vector machine is trained. The dotted lines denote the margin between the separating surface and the closest preclassified points. The mapping from (b) to (c) is performed implicitly by a so-called kernel function $

Often hyperplanes are not sufficient for the classification of point sets. Support vector machines allow nonlinear separation between the two point classes (Figure 3 b). They do so by mapping the points nonlinearly into a higher-dimensional feature space and performing linear separation there (Figure 3 c). The mapping is done implicitly by a so-called kernel function, thereby making very high-dimensional feature spaces computationally affordable. The choice of the kernel function is the essential ingredient in the development of the support vector machine.

Support vector machines receive their name through their second feature. Rather than taking all preclassified points into account, they only consider those points that come to lie close to the separating surface. These points are called the support vectors. The surface is placed such that the distance to the closest support vectors is maximized (modulo preventing overfitting, ^ Regularization). Here we allow a limited number of misplacements.

Support vector machines have been applied to a wide variety of classification problems in bioinformatics including finding translation initiation sites in DNA sequences (Zien et al., 2000), classifying genes via their promoter regions (Pavlidis et al., 2001a), analyzing gene expression data (Brown et al., 2000; Furey et al., 2000), prediction of gene function (Pavlidis et al., 2001b), determining secondary structures of proteins (Hua & Sun, 2001), protein fold recognition (Ding & Dubchak, 2001), prediction of proteinprotein interactions (Bock & Gough, 2001), and protein sorting (Cai et al., 2000).

Tabu search

Meta-heuristic for local search strategies. The goal is to avoid getting caught in cycles while traversing the search space. Tabu search starts out by looking for local minima. To avoid retracing the moves just made the method protocols characteristic features of solutions that were recently visited in one or several tabu lists, each list for a certain class of solution attributes. Tabu search methods vary in the details of the management of the tabu lists (Glover, 1989; Glover, 1990). Tabu search is mentioned in Chapter 7 of Volume I. Additonal material on tabu search in docking can be found in (Hou et al., 1999; Westhead et al., 1997).


I am grateful to Holger Clau├čen, Daniel Hoffmann, Jochen Selbig, Alexander Zien, and Ralf Zimmer for helpful comments on this glossary. Alexander Zien provided Figure 3.


Althaus, E., Kohlbacher, O., Lenhoe, H. P. & Muller, P. (2000). A combinatorial approach to protein docking with flexible side-chains. In Annual International Conference on Computational Molecular Biology (RECOMB), pp. 8-14. ACM Press.

Anzali, S., Barnickel, G., Krug, M., Sadowski, J., Wagener, M., Gasteiger, J. & Poianski, J. (1996). The comparison of geometric and electronic properties of molecular surfaces by neural networks: application to the analysis of corticosteroid-binding globulin activity of steroids. J Comput Aided Mol Des 10(6), 521-34.

Asai, K., Hayamizu, S. & Handa, K. (1993). Prediction of protein secondary structure by the hidden Markov model. Comput Appl Biosci 9(2), 141-6.

Baldi, P. & Brunak, S. (1998). Bioinformatics: the machine learning approach. Adaptive computation and machine learning, MIT Press, Cambridge, Mass.

Baldi, P. & Chauvin, Y. (1994). Hidden Markov Models of the G-protein-coupled receptor family. J Comput Biol 1(4), 311-36.

Baum, L. E., Pertie, T., Soules, G. & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41(1), 164-171.

Binder, K. & Landau, D. P. (2000). A Guide to Monte Carlo Simulations in Statistical Physics, Cambridge University Press.

Bock, J. R. & Gough, D. A. (2001). Predicting protein-protein interactions from primary structure. Bioinformatics 17(5), 455-60.

Bracewell, R. N. (1986). The Fourier transform and its applications. 2nd, rev. edit. McGraw-Hill series in electrical engineering. Circuits and systems, McGraw-Hill, New York.

Brown, M. P., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, M., Jr. & Haussler, D. (2000). Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 97(1), 262-7.

Brunger, A. T., Adams, P. D. & Rice, L. M. (1997). New applications of simulated annealing in X-ray crystallography and solution NMR. Structure 5(3), 325-36.

Burges, C. (1998). A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 2(2), 121-167.

Cai, Y. & Chen, C. (1995). Artificial neural network method for discriminating coding regions of eukaryotic genes. Comput Appl Biosci 11(5), 497-501.

Cai, Y. D., Liu, X. J., Xu, X. & Chou, K. C. (2000). Support vector machines for prediction of protein subcellular location. Mol Cell Biol Res Commun 4(4), 230-3.

Cai, Y. D., Yu, H. & Chou, K. C. (1998). Artificial neural network method for predicting HIV protease cleavage sites in protein. J Protein Chem 17(7), 607-15.

Cappelli, A., Donati, A., Anzini, M., Vomero, S., De Benedetti, P. G., Menziani, M. C. & Langer, T. (1996). Molecular structure and dynamics of some potent 5-HT3 receptor antagonists. Insight into the interaction with the receptor. Bioorg Med Chem 4(8), 1255-69.

Christensen, I. T. & Jorgensen, F. S. (1997). Molecular mechanics calculations of proteins. Comparison of different energy minimization strategies. J Biomol Struct Dyn 15(3), 473-88.

Christianini, N. & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines, University Press, Cambridge, UK.

Church, B. W. & Shalloway, D. (2001). Top-down free-energy minimization on protein potential energy landscapes. Proc Natl Acad SciUSA 98(11), 6098-103.

Cooley, J. W. & Tukey, J. W. (1965). An algorithm for the nachine calculation of complex Fourier series. Mathematics of Computation 19(90), 297-301.

Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1990). Introduction to algorithms, MIT Press, Cambridge, MA, USA.

Desmet, J., De Maeyer, M., Hazes, B. & Lasters, I. (1992). The deadend elimination theorem and its use in protein side-chain positioning. Nature 356, 539-542.

Ding, C. H. & Dubchak, I. (2001). Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17(4), 349-58.

Eddy, S. R. (1998). Profile Hidden Markov Models. Bioinformatics 14(9), 755-63.

Elber, R., Meller, J. & Olender, R. (1999). Stochastic path approach to compute atomically detailed trajectories: application to the folding of C peptide. Journal of Physical Chemistry B 103(6), 899-911.

Feller, W. (1968). An Introduction to Probability Theory % Its Applications, Vol 1. Probability & Mathematical Statistics.

Ferran, E. A., Pflugfelder, B. & Ferrara, P. (1994). Self-organized neural maps of human protein sequences. Protein Sci 3(3), 507-21.

Fogel, L. J., Owens, A. J. & Walsh, M. J. (1966). Intelligent decision making through a simulation of evolution. Behav Sci 11(4), 253-72.

Forney, G. D., Jr. (1973). The Viterbi algorithm. Proceedings of the IEEE 61(3), 268-78.

Furey, T. S., Cristianini, N., Duffy, N., Bednarski, D. W., Schummer, M. & Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 906-14.

Gardiner, E. J., Willett, P. & Artymiuk, P. J. (2001). Protein docking using a genetic algorithm. Proteins 44(1), 44-56.

Geman, S. G. D., Geman, S. & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and Bayesian restoration of images. IEEE Transactions on Pattern Analysis % Machine Intelligence PAMI-6(6), 721-41.

Glover, F. (1989). Tabu search. 1. ORSA Journal on Computing 1(3), 190-206.

Glover, F. (1990). Tabu search. II. ORSA Journal on Computing 2(1), 4-32.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley.

Grate, L., Herbster, M., Hughey, R., Haussler, D., Mian, I. S. & Noller, H. (1994). RNA modeling using Gibbs sampling and stochastic context free grammars. Proc Int Conf Intell Syst Mol Biol 2, 138-46.

Gusfield, D. (1997). Algorithms on strings, trees, and sequences: Computer science and computational biology, Cambridge University Press, Cambridge [England]; New York.

Hare, B. J. & Prestegard, J. H. (1994). Application of neural networks to automated assignment of NMR spectra of proteins. J Biomol NMR 4(1), 35-46.

Hart, P., Nilsson, N. & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Systems Sei Cybernet 4(2), 100-107.

Henderson, J., Salzberg, S. & Fasman, K. H. (1997). Finding genes in DNA with a Hidden Markov Model. J Comput Biol 4(2), 127-41.

Holland, J. H. (1973). Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing 2(2), 88-105.

Hou, T., Wang, J., Chen, L. & Xu, X. (1999). Automated docking of peptides and proteins by using a genetic algorithm combined with a tabu search. Protein Eng 12(8), 639-48.

Hua, S. & Sun, Z. (2001). A novel method of protein secondary structure prediction with high segment overlap measure: Support Vector Machine approach. J Mol Biol 308(2), 397-407.

Illgen, K., Enderle, T., Broger, C. & Weber, L. (2000). Simulated molecular evolution in a full combinatorial library. Chem Biol 7(6), 433-41.

Jacoboni, I., Martelli, P. L., Fariselli, P., De Pinto, V. & Casadio, R. (2001). Prediction of the transmembrane regions of beta-barrel membrane proteins with a neural network-based predictor. Protein Sei 10(4), 779-87.

Karplus, K., Barrett, C. & Hughey, R. (1998). Hidden Markov models for detecting remote protein homologies. Bioinformaties 14(10), 846-56.

Karplus, K., Sjolander, K., Barrett, C., Cline, M., Haussler, D., Hughey, R., Holm, L. & Sander, C. (1997). Predicting protein structure using hidden Markov models. Proteins Suppl(1), 134-9.

Kaufman, L. & Rousseeuw, P. J. (1990). Finding groups in data: an introduction to cluster analysis. Wiley series in probability and mathematical statistics. Applied probability and statistics, Wiley, New York.

Kirew, D. B., Chretien, J. R., Bernard, P. & Ros, F. (1998). Application of Kohonen Neural Networks in classification of biologically active compounds. SAR QSAR Environ Res 8(1-2), 93-107.

Kirkpatrick, S. (1984). Optimization by simulated annealing: quantitative studies. Journal of Statistical Physics 34(5-6), 975-86.

Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE 78(9), 1464-80.

Kothekar, V., Raha, K. & Prasad, H. K. (1998). Molecular dynamics simulation of interaction of histone-like protein of mycobacterium tuberculosis (Hlpmt) and histone of clostridium pasteurianum (DBHclopa) with 35 based paired GC rich U-bend DNA. J Biomol Struct Dyn 16(2), 223-35.

Krogh, A., Brown, M., Mian, I. S., Sjolander, K. & Haussler, D. (1994). Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 235(5), 1501-31.

Kulp, D., Haussler, D., Reese, M. G. & Eeckman, F. H. (1996). A generalized hidden Markov model for the recognition of human genes in DNA. In Conference on Intelligent Systems for Molecular Biology (ISMB), Vol. 4, pp. 134-42.

Lathrop, R. H. & Smith, T. F. (1996). Global optimum protein threading with gapped alignment and empirical pair score functions. J Mol Biol 255(4), 641-65.

Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S.,

Neuwald, A. F. & Wootton, J. C. (1993). Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262(5131), 208-14.

Leach, A. R. (1994). Ligand docking to proteins with discrete side-chain flexibility. J Mol Biol 235(1), 345-56.

Leach, A. R. & Lemon, A. P. (1998). Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. Proteins 33(2), 227-39.

Li, L., Darden, T. A., Freedman, S. J., Furie, B. C., Furie, B., Baleja, J. D., Smith, H., Hiskey, R. G. & Pedersen, L. G. (1997). Refinement of the NMR solution structure of the gamma-carboxyglutamic acid domain of coagulation factor IX using molecular dynamics simulation with initial Ca2+ positions determined by a genetic algorithm. Biochemistry 36(8), 2132-8.

Matis, S., Xu, Y., Shah, M., Guan, X., Einstein, J. R., Mural, R. & Uberbacher, E. (1996). Detection of RNA polymerase II promoters and polyadenylation sites in human DNA sequence. Comput Chem 20(1), 135-40.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. (1953). Equations of states calculations by fast computing machines. J. Chem. Phys. 21, 1087-1092.

Nair, T. M., Tambe, S. S. & Kulkarni, B. D. (1995). Analysis of transcription control signals using artificial neural networks. Comput Appl Biosci 11(3), 293-300.

Nielsen, H. & Krogh, A. (1998). Prediction of signal peptides and signal anchors by a hidden Markov model. Proc Int Conf Intell Syst Mol Biol 6, 122-30.

Pavlidis, P., Furey, T. S., Liberto, M., Haussler, D. & Grundy, W. N. (2001a). Promoter region-based classification of genes. Pac Symp Biocomput, 151-63.

Pavlidis, P., Weston, J., Cai, J. & Grundy, W. N. (2001b). Gene functional classification from heterogeneous data. In Annual International Conference on Computational Biology, pp. 249-255. ACM Press.

Pedersen, A. G., Baldi, P., Brunak, S. & Chauvin, Y. (1996). Characterization of prokaryotic and eukaryotic promoters using hidden Markov models. Proc Int Conf Intell Syst Mol Biol 4, 182-91.

Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech recognition. Proceedings of the IEEE 77(2), 257-286.

Rost, B., Sander, C. & Schneider, R. (1994). PHD - an automatic mail server for protein secondary structure prediction. Comput Appl Biosci 10(1), 53-60.

Russell, S. J. & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Prentice Hall, Upper Saddle River, NJ.

Sadowski, J. & Kubinyi, H. (1998). A scoring scheme for discriminating between drugs and nondrugs. J Med Chem 41(18), 3325-9.

Schneider, G. & Wrede, P. (1998). Artificial neural networks for computer-based molecular design. Progress in Biophysics % Molecular Biology 70(3), 175-222.

Schuchhardt, J., Schneider, G., Reichelt, J., Schomburg, D. & Wrede, P. (1996). Local structural motifs of protein backbones are classified by self-organizing neural networks. Protein Eng 9(10), 833-42.

Schulze-Kremer, S. (2000). Genetic algorithms and protein folding. Methods Mol Biol 143, 175-222.

Sonnhammer, E. L., von Heijne, G. & Krogh, A. (1998). A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conflntell Syst Mol Biol 6, 175-82.

Sprik, M. (1996). Introduction to molecular dynamics methods. Proceedings of the Conference on Monte Carlo and Molecular Dynamics of Condensed Matter Systems 49, 43-88.

Stahl, M., Taroni, C. & Schneider, G. (2000). Mapping of protein surface cavities and prediction of enzyme class by a self-organizing neural network. Protein Eng 13(2), 83-8.

Szustakowski, J. D. & Weng, Z. (2000). Protein structure alignment using a genetic algorithm. Proteins 38(4), 428-40.

Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E. S. & Golub, T. R. (1999). Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96(6), 2907-12.

Tanner, J. J., Smith, P. E. & Krause, K. L. (1993). Molecular dynamics simulations and rigid body (TLS) analysis of aspartate carbamoyltransferase: evidence for an uncoupled R state. Protein Sci 2(6), 927-35.

Trosset, J. Y. & Scheraga, H. A. (1998). Reaching the global minimum in docking simulations: a Monte Carlo energy minimization approach using Bezier splines. Proc Natl Acad Sci U S A 95(14), 8011-5.

Vapnik, V. N. (2000). The nature of statistical learning theory. 2nd edit. Statistics for engineering and information science, Springer, New York.

Wang, H. C., Badger, J., Kearney, P. & Li, M. (2001). Analysis of codon usage patterns of bacterial genomes using the self-organizing map. Mol Biol Evol 18(5), 792-800.

Waterman, M. S. (1995). Introduction to computational biology: Maps, sequences and genomes. 1st edit, Chapman & Hall, London; New York, NY.

Westhead, D. R., Ciark, D. E. & Murray, C. W. (1997). A comparison of heuristic search algorithms for molecular docking. J Comput Aided Mol Des 11(3), 209-28.

Willett, P. (1995). Genetic algorithms in molecular recognition and design. Trends Biotechnol 13(12), 516-21.

Zien, A., Ratsch, G., Mika, S., Scholkopf, B., Lengauer, T. & Muller, K. R. (2000). Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics 16(9), 799807.

Continue reading here: Integrating and Accessing Molecular Biology Resources

Was this article helpful?

0 0