To evaluate the diversity of compounds and of combinatorial libraries, each substance must first be described by a number or a set of numbers that allows a quantitative comparison of the structural differences between two entities. Instead of a set of numbers, a bitstring is often used, which records each property simply as "absent" (i.e. 0) or "present" (i.e. 1). This representation allows a fast comparison of two bitstrings.
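The following minimal sketch illustrates why bitstrings compare quickly: two bitwise operations and two bit counts are enough. The Tanimoto coefficient used here is an assumption, since the text does not name a specific similarity measure, and the two example bitstrings are invented for illustration.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bitstrings stored as Python integers.

    The choice of this measure is an assumption; the text only states
    that bitstrings can be compared quickly.
    """
    common = bin(a & b).count("1")  # properties present in both molecules
    total = bin(a | b).count("1")   # properties present in at least one
    # Two empty bitstrings are treated as identical by convention.
    return common / total if total else 1.0


# Hypothetical 8-bit descriptors for two compounds.
fp_a = int("11010010", 2)
fp_b = int("11000011", 2)
similarity = tanimoto(fp_a, fp_b)
```

A value of 1 means that both compounds set exactly the same bits, and a value of 0 means that they share none.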
Descriptors can be differentiated according to their dimensionality [21-23]:
• One-dimensional (1D) descriptors describe a molecule as a whole with a single number, such as log P, molecular weight, or the number of hydrogen bond donors and acceptors. These values have a large impact on pharmacokinetic parameters and are often used as filters in library design rather than as actual descriptors for molecular similarity and diversity.
• Two-dimensional (2D) descriptors are based on the two-dimensional representation of molecules, i.e. the structural formula. They normally use bitstrings to describe the presence or absence of structural fragments (e.g. carboxylic acids or certain ring systems) or of certain atom patterns 2-7 atoms in length (molecular fingerprints). For structural fragments, the MACCS (Molecular ACCess System) structural fragment keys are often used, which were originally developed for substructure searching in chemical databases. These fragment keys suffer from a lack of generality, since a set of fragment keys of reasonable size cannot describe every possible structural fragment. Molecular fingerprints generate a bitstring by indexing all possible paths of defined lengths through a molecule. Since the number of possible paths is too high to assign each path its own bit, the fingerprints are hashed, meaning that each bit can be associated with several paths. This hashing makes the description less accurate, because different paths can set the same bit.
• Three-dimensional (3D) descriptors take the spatial relationships of chemical features into account. Since distances and angles between functional groups can adopt continuous values, a distance or angle range (e.g. 2-10 Å or 0-180°) is defined and subdivided into a number of bins of a certain width. A set of bits is then assigned to these bins to encode all possible conformations of predefined pairs of features between which the distances or angles are determined. These descriptors do not consider high-energy conformations but treat all conformations that arise from rotation around C-C bonds as equal. The computational effort required for these flexible descriptors is very high. Therefore, rigid descriptors are sometimes used instead, which consider only one or a few minimum-energy conformations of the molecule.
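The hashed path fingerprints described under the 2D descriptors can be sketched as follows. This is a simplified illustration and not the algorithm of any particular software package: the molecule is given as a plain list of element symbols and bonds, bond orders and hydrogens are ignored, and the function name `path_fingerprint`, the use of `zlib.crc32` as hash function, and the default of 64 bits are all assumptions.

```python
import zlib


def path_fingerprint(atoms, bonds, min_atoms=2, max_atoms=7, n_bits=64):
    """Hash all linear atom paths of min_atoms..max_atoms into n_bits bits.

    Simplified sketch: atoms are element symbols, bonds are index pairs,
    and bond orders are ignored.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    paths = set()

    def walk(path):
        if len(path) >= min_atoms:
            labels = [atoms[i] for i in path]
            forward = "-".join(labels)
            backward = "-".join(reversed(labels))
            # A path and its reverse are the same fragment; keep one form.
            paths.add(min(forward, backward))
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:  # no atom may be visited twice
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])

    fingerprint = 0
    for p in paths:
        # Several paths can land on the same bit; this is the hashing
        # collision that reduces the accuracy of the description.
        bit = zlib.crc32(p.encode()) % n_bits
        fingerprint |= 1 << bit
    return fingerprint, paths


# Heavy atoms of acetic acid: CH3-C(=O)-OH
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1), (1, 2), (1, 3)]
fp, paths = path_fingerprint(atoms, bonds)
```

With only two bits available (`n_bits=2`), the four distinct paths of this toy molecule must share bits, which shows in miniature how hashing loses information.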
Pharmacophores are also used as 3D descriptors. These pharmacophores are specified by three interaction centers, each belonging to one of seven center types that are responsible for the interaction with molecular receptor sites: hydrogen bond donors, hydrogen bond acceptors, combined hydrogen bond donors and acceptors, aromatic rings, hydrophobic regions, acidic sites, and basic sites. While an arrangement of three pharmacophoric points does not allow enantiomeric compounds to be distinguished, a further refinement of this concept uses four pharmacophoric points, which also capture the chirality of the arrangement. A major drawback of three- or four-point pharmacophores is their rather long calculation time (about 1 min per compound in the case of four-point pharmacophores), which can be a limiting factor in the evaluation of whole virtual libraries containing numerous compounds.
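The binning of distances and the three-point pharmacophore idea can be combined in a short sketch for a single rigid conformation. The 2-10 Å range follows the example given above, whereas the 1 Å bin width, the function names, the key layout, and the coordinates are assumptions made purely for illustration. Mirroring the coordinates leaves all distances unchanged, so the sketch also shows why three points cannot distinguish enantiomers.

```python
import math
from itertools import combinations


def distance_bin(d, lo=2.0, hi=10.0, width=1.0):
    """Return the bin index of distance d, or None outside [lo, hi)."""
    if d < lo or d >= hi:
        return None
    return int((d - lo) // width)


def triplet_keys(features, lo=2.0, hi=10.0, width=1.0):
    """Collect one key per triplet of typed feature points.

    features: list of (center_type, (x, y, z)) for one conformation.
    A key is the sorted set of the three sides, each side being the
    two center types plus the bin of the distance between them.
    """
    keys = set()
    for trio in combinations(features, 3):
        sides = []
        for (t1, p1), (t2, p2) in combinations(trio, 2):
            b = distance_bin(math.dist(p1, p2), lo, hi, width)
            if b is None:
                break  # a distance outside the range: skip this triplet
            sides.append((tuple(sorted((t1, t2))), b))
        else:
            keys.add(tuple(sorted(sides)))
    return keys


# Hypothetical feature points of one conformation (coordinates in Å).
features = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.5, 0.0, 0.0)),
    ("aromatic", (0.0, 4.5, 0.0)),
    ("hydrophobic", (1.0, 1.0, 5.2)),
]
# Mirror image of the same arrangement (x coordinate negated).
mirrored = [(t, (-x, y, z)) for t, (x, y, z) in features]

keys = triplet_keys(features)
same_for_mirror = keys == triplet_keys(mirrored)
```

A flexible descriptor would repeat this for every sampled conformation and merge the resulting keys, which is where the high computational cost mentioned above comes from.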