[Figure: gene expression time-series panels; legible panel labels include "cyclin A" and "Other".]

expression. Colors distinguish among recognized functional categories: red, neurotransmitter receptors; purple, neuroglial markers; green, peptide signaling; blue, diverse. Genes are clustered into "waves," with the average temporal expression pattern for all genes in a wave graphed below (see the later discussion of clustering). From Wen et al. [8].

Computational analyses in drug target discovery

Shannon entropy

Although it may be tempting to examine individual gene expression patterns and note that those patterns concur with the literature, a large data set becomes much more useful when examined globally using computational analysis. A certain amount of analysis can be accomplished by visual inspection, but larger data sets, such as those provided by DNA microarrays, require more computing power.

One method of organizing temporal gene expression data is by applying a measure known as Shannon entropy, originally developed for analyzing telegraphic signals [11]. This is an information theoretic measure that provides the complexity or information content of a series of events. Fuhrman et al. [12] have proposed its application to gene expression data as a way of identifying drug target candidates from among thousands of genes expressed in parallel. Shannon entropy is a very direct way for pharmaceutical scientists to deal with large-scale gene expression data, and is defined as $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the probability (frequency) of event $i$.
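As a minimal sketch of the definition above, the following Python function computes H from the observed frequencies of a sequence of discrete events. The function name `shannon_entropy` is hypothetical and is not taken from the cited references; the events could be, for example, the bin labels assigned to a gene's expression values.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Return H = -sum(p_i * log2(p_i)), with p_i the observed frequency of each event."""
    n = len(events)
    if n == 0:
        raise ValueError("events must be non-empty")
    counts = Counter(events)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Two equally frequent events carry one bit of information.
print(shannon_entropy([0, 0, 1, 1]))  # 1.0
```

A series in which every event is identical has zero entropy, which corresponds to a gene whose binned expression never changes.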

Shannon entropy is a measure of variation or change over a time series, and this is the same criterion used by biologists to determine which genes are the most interesting. Genes that exhibit significant changes are regarded as good drug target candidates, while those that remain relatively invariant in expression are more often ignored. In most cases these changes are observed in single-time point studies, such as comparisons of non-diseased versus diseased tissue. For data sets containing multiple time points, however, Shannon entropy is useful in selecting out genes with the greatest variation in expression over an entire time course.

Figure 5.2 explains how Shannon entropy works. Expression data must be binned for this calculation, which can be performed with spreadsheet software; Fuhrman et al. [14] give instructions for calculating Shannon entropy in a computer spreadsheet. The number of bins is determined by the number of time points in the series, and should be no greater than log2 of the number of time points (Bruce Sawhill, Santa Fe Institute, unpublished). So, for fewer than eight time points, two bins may be used; for eight or more time points, three bins; for 16 or more, four bins, etc. It is preferable to use a large number of time points, in part to help avoid the problem of binning artifacts caused by expression values that lie close to bin boundaries.
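The binning rule and entropy calculation described above can also be sketched in Python rather than a spreadsheet. This is an illustration under stated assumptions, not the procedure of the cited authors: the text does not say how bin boundaries are chosen, so equal-width bins spanning each gene's own minimum and maximum are assumed here, and the function names, gene names, and expression values are all hypothetical.

```python
import math
from collections import Counter


def number_of_bins(n_time_points):
    """Largest whole number of bins not exceeding log2(n_time_points), with a floor of two."""
    if n_time_points < 1:
        raise ValueError("need at least one time point")
    # For a positive integer n, n.bit_length() - 1 equals floor(log2(n)).
    return max(2, n_time_points.bit_length() - 1)


def bin_series(values, n_bins):
    """Assign each value to an equal-width bin between the series minimum and maximum (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # invariant gene: every value falls in one bin
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def series_entropy(values):
    """Shannon entropy (in bits) of the binned time series."""
    bins = bin_series(values, number_of_bins(len(values)))
    n = len(bins)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def rank_by_entropy(expression):
    """Return (gene, entropy) pairs sorted from highest to lowest entropy."""
    scored = [(gene, series_entropy(series)) for gene, series in expression.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical expression values at nine time points.
expression = {
    "gene_flat": [5.0] * 9,
    "gene_varied": [0, 0, 0, 5, 5, 5, 10, 10, 10],
}
for gene, h in rank_by_entropy(expression):
    print(gene, round(h, 3))
```

Because values near a bin boundary can flip bins with small measurement noise, a different boundary choice (for example, quantile-based bins) could change the ranking; the equal-width choice here is only one option.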

Once the calculation of entropy is completed, the genes can be rank-ordered. In terms of rank-ordering of genes, Shannon entropy distinguishes itself as a measure of complexity when applied to a temporal series with at least eight time points; with fewer than eight time points, Shannon entropy provides the same rank-ordering as the variance. Genes with the highest
