Fig. 25.3. Clustering may produce diverse sets of compounds, in which diversity occurs along one dimension.
Depending on the values of K and M, the Jarvis-Patrick procedure tends to produce either very large clusters (if M and K are rather low) or many singleton clusters (if M and K are high), but it is a relatively fast algorithm compared with the hierarchical clustering methods.
A major drawback of clustering is that it gives no information about how much of chemical space is covered. The clusters may appear widely spread, but if diversity occurs along only one or a few dimensions of chemical space, the other dimensions may not be covered at all. In Fig. 25.3, the compound clusters seem diverse, but they are spread almost exclusively along the x-axis; there is hardly any diversity along the y-axis.

Partition-based selection is an alternative approach that circumvents this problem. In this method, each axis of chemical space is divided into a number of segments. Together, these segments define a set of smaller volume elements ("bins") in the chemical space considered. A diverse set of compounds can then be chosen to maximize the number of different bins filled with representatives. Partitioning the same set of compounds as in Fig. 25.3 shows (Fig. 25.4) that the chemical space along the x-axis is covered very uniformly, whereas the space along the y-axis is hardly covered at all.
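The partitioning idea can be illustrated with a minimal Python sketch. It is not the implementation of any particular software package: the function names (`bin_index`, `select_by_partition`, `axis_coverage`), the two-dimensional descriptor values, and the choice of five segments per axis are all assumptions made for illustration. Each compound is assigned to a bin, the first compound found in each bin is kept as its representative, and the fraction of occupied segments is then reported per axis.

```python
def bin_index(point, lower, upper, segments):
    """Return the bin of a point as a tuple of segment indices, one per axis."""
    idx = []
    for x, lo, hi, n in zip(point, lower, upper, segments):
        # Scale the coordinate to [0, n) and clamp, so x == hi lands in the last segment.
        i = int((x - lo) / (hi - lo) * n)
        idx.append(min(max(i, 0), n - 1))
    return tuple(idx)


def select_by_partition(compounds, lower, upper, segments):
    """Keep one representative per occupied bin (the first compound encountered)."""
    chosen = {}
    for name, point in compounds.items():
        chosen.setdefault(bin_index(point, lower, upper, segments), name)
    return chosen


def axis_coverage(chosen, segments):
    """Fraction of segments along each axis that contain at least one representative."""
    return [len({b[d] for b in chosen}) / n for d, n in enumerate(segments)]


# Hypothetical descriptor values: well spread along x, nearly constant along y.
compounds = {
    "c1": (0.05, 0.48), "c2": (0.25, 0.52), "c3": (0.45, 0.50),
    "c4": (0.65, 0.47), "c5": (0.85, 0.55), "c6": (0.86, 0.51),
}
segments = (5, 5)  # assumed: five segments per axis, i.e. 25 bins
chosen = select_by_partition(compounds, (0.0, 0.0), (1.0, 1.0), segments)
coverage = axis_coverage(chosen, segments)

print(sorted(chosen.values()))  # representatives, one per occupied bin
print(coverage)                 # per-axis coverage, x first
```

With these made-up values, every segment along the x-axis is occupied but only one segment along the y-axis is, which is the kind of imbalance a clustering result alone does not reveal. Keeping the first compound per bin is the simplest possible rule; a real selection might instead choose the compound nearest the bin center.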