Research project
Cheminformatics: Analyzing small-molecule activity data
While bioinformatics methods deal with the analysis of sequence information (be it proteins or DNA), the field of cheminformatics is concerned with the analysis of small-molecule datasets.
Contact: Gerard van Westen
Although the two fields share several computational methods, a protein sequence and a small molecule differ fundamentally. For instance, small molecules are difficult to represent as linear sequences (think of rings), and the types of analysis performed can be quite different.
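To make the point about rings concrete, here is a minimal sketch that treats a molecule as a graph of atoms and bonds rather than as a string. The function name `ring_count` and the atom numbering are illustrative assumptions, not part of any particular toolkit. It counts independent rings as bonds minus atoms plus connected components. Linear notations such as SMILES handle rings with ring-closure digits (benzene is written `c1ccccc1`), which is exactly the kind of bookkeeping a protein sequence never needs.

```python
def ring_count(n_atoms, bonds):
    """Count independent rings in a molecular graph.

    n_atoms: number of atoms, indexed 0 .. n_atoms - 1
    bonds:   list of (i, j) atom-index pairs, one per bond
    Uses the graph identity: rings = bonds - atoms + connected components.
    """
    parent = list(range(n_atoms))

    def find(i):
        # Follow parent links to the representative of i's component.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components
            components -= 1
    return len(bonds) - n_atoms + components


# Heavy-atom skeletons only (hydrogens omitted), hypothetical numbering.
hexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # open chain
cyclohexane = hexane + [(5, 0)]                      # same chain, closed into a ring

print(ring_count(6, hexane))       # chain: no rings
print(ring_count(6, cyclohexane))  # one ring
```

The chain can be read off atom by atom like a sequence; the ring cannot, because its last atom is bonded back to its first.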

The figure shows a plot of all FDA-approved drugs (yellow, approximately 1000), a comparable set of preclinical compounds (blue, approximately 1000), and a set of compounds aimed at a certain class of proteins (red, approximately 300). One goal of such a plot is to visualize how similar these three rather different sets are.
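How the figure itself was produced is not stated here. As one common way to put a number on "how similar" two compound sets are, the sketch below assumes each compound has already been reduced to a fingerprint, represented as a set of integer feature identifiers, and compares fingerprints with the Tanimoto coefficient (shared features divided by total distinct features). The fingerprints and the names `tanimoto` and `mean_cross_similarity` are hypothetical.

```python
from itertools import product


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of feature ids."""
    fp_a, fp_b = set(fp_a), set(fp_b)
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def mean_cross_similarity(set_a, set_b):
    """Average Tanimoto coefficient over all pairs (one compound from each set)."""
    pairs = list(product(set_a, set_b))
    if not pairs:
        raise ValueError("both compound sets must be non-empty")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints, invented for illustration only.
approved = [{1, 2, 3}, {2, 3, 4}]
preclinical = [{2, 3, 4}, {7, 8, 9}]

print(tanimoto({1, 2, 3}, {2, 3, 4}))            # 2 shared / 4 distinct = 0.5
print(mean_cross_similarity(approved, preclinical))
```

A pairwise similarity matrix of this kind is a typical input for projecting compounds into two dimensions for plotting.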
Cheminformatics can do more than this, however. Some typical questions are:
• We have a large database of inhibitors of a given enzyme. Which substructures are correlated with activity? (This information can be used to optimize inhibitor activity in drug discovery programs.)
• What is the solubility of structure A? (Drugs need to be soluble in water to reach their site of action; cheminformatics methods can generate structure-property models to predict properties such as water solubility.)
• We want to synthesize allosteric inhibitors of a certain protein. What is a good place to start?
• We want to systematically store the results from the synthesis and testing of compounds in our group, such that these structures are searchable. How do we best do that?
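The first question above, which substructures go along with activity, can be sketched in a few lines. This is only an illustration under strong assumptions: each compound is already annotated with a set of substructure labels (in practice a cheminformatics toolkit would derive these from the structures), activity is a simple yes/no flag, and the data are invented. The function name `substructure_enrichment` is hypothetical. For each substructure it reports the active fraction among compounds containing it, divided by the overall active fraction.

```python
from collections import defaultdict


def substructure_enrichment(compounds):
    """Return {substructure: enrichment} for a list of (labels, is_active) pairs.

    Enrichment > 1 means compounds with that substructure are active more
    often than the dataset as a whole; < 1 means less often.
    """
    if not compounds:
        raise ValueError("no compounds given")
    overall = sum(bool(active) for _, active in compounds) / len(compounds)
    if overall == 0:
        raise ValueError("no active compounds; enrichment is undefined")

    counts = defaultdict(lambda: [0, 0])  # label -> [n_with, n_active_with]
    for labels, active in compounds:
        for label in labels:
            counts[label][0] += 1
            counts[label][1] += int(bool(active))

    return {
        label: (n_active / n_with) / overall
        for label, (n_with, n_active) in counts.items()
    }


# Invented toy data: (substructure labels, active against the enzyme?)
compounds = [
    ({"sulfonamide", "phenyl"}, True),
    ({"sulfonamide"}, True),
    ({"phenyl"}, False),
    ({"phenyl", "ester"}, False),
]

result = substructure_enrichment(compounds)
for label, score in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {score:.2f}")
```

With real data one would also want a statistical test and far more compounds per substructure; a ratio on four compounds only shows the shape of the calculation.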
