PubChem is a free database of information about small organic molecules and their activities in biological assays. It was created by NIH in 2004 and is maintained by the National Library of Medicine. The database connects chemical information with biomedical research and clinical information, organizing facts from numerous databases into a unified whole.
PubChem consists of three dynamically growing databases:
- PubChem Compound: Contains pure and characterized chemical compounds.
- PubChem Substance: Contains mixtures, extracts, complexes and uncharacterized substances.
- PubChem BioAssay: Contains results from high-throughput screening programs, amounting to several million values.
The integration of these databases makes PubChem a critical tool to speed the development of new treatments for patients, bringing information about the biological activities of chemical substances to biomedical researchers on a broad scale. NCATS makes assay data available in PubChem.
PubChem Data Guideline
The quantitative high-throughput screening (qHTS) data in PubChem are preliminary. For this reason and because of limited compound quantities, PubChem does not supply probe compounds to investigators other than those who originally submitted the assay.
NCATS-generated data presented in PubChem represent primary qHTS data. Each sample is tested as a titration series to provide a concentration-response output. Although the results accurately describe the effect of the sample on the assay end point, the “actives” are not necessarily due to effects on the intended target (i.e., some may be false positives). PubChem provides these primary data to enable analysis using cheminformatic algorithms, to guide the selection of compounds for subsequent chemistry optimization, and to populate the “chemical genomics” database of compound-activity profiles. The value of this database increases as additional assays and compounds are added.
In interpreting and using qHTS data, investigators should remember the following:
- The sample tested is limited in quantity, so NCATS cannot supply screening samples upon request. Some samples are commercially available and can be purchased inexpensively from vendors directly.
- The effect of the sample on the assay described in PubChem may reflect artifacts that result from the sample’s physical or spectroscopic properties, such as interference in the assay due to aggregation in aqueous buffer or absorbance of the fluorescence emitted for signal detection. Flags indicating the propensity of samples in the library for such interfering phenomena are included in the data set.
- Quality control information is not necessarily current. The results are reported for “samples” rather than “compounds” because the term “compound” implies a single chemical entity, which has not been confirmed for every sample. Subsequent analysis by liquid chromatography-mass spectrometry and verification of the activity are performed for a subset of the samples, and these data are entered into PubChem.
The IC50/EC50 values (referred to as AC50 values) determined from the normalized titration-response data (n = 1) are estimates. Curve-fitting artifacts can occur due to the high-throughput nature of the analysis. Flags indicating whether a curve fit is verified are updated over time. The primary data are available for others to interpret.
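To illustrate why an AC50 derived from a single titration series is only an estimate, the sketch below fits a Hill (sigmoidal concentration-response) model to one normalized series. It is a minimal illustration, not the curve-fitting procedure NCATS uses: the Hill model, the coarse grid search, the fixed 0 to 100 response range, and the names `hill` and `fit_ac50` are all assumptions made for this example, and the data are synthetic.

```python
import math

def hill(conc, ac50, slope, bottom=0.0, top=100.0):
    """Hill model: normalized response (%) at a given concentration."""
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)

def fit_ac50(concs, responses, bottom=0.0, top=100.0, steps=200):
    """Coarse least-squares grid search over log10(AC50) and Hill slope.

    Hypothetical helper for illustration only; bottom and top are held
    fixed because the responses are assumed to be normalized already.
    """
    lo = math.log10(min(concs))
    hi = math.log10(max(concs))
    best = None
    for i in range(steps + 1):
        ac50 = 10 ** (lo + (hi - lo) * i / steps)
        for slope in (0.5, 0.75, 1.0, 1.5, 2.0, 3.0):
            # Sum of squared errors between model and observed responses
            sse = sum(
                (hill(c, ac50, slope, bottom, top) - r) ** 2
                for c, r in zip(concs, responses)
            )
            if best is None or sse < best[0]:
                best = (sse, ac50, slope)
    sse, ac50, slope = best
    return ac50, slope, sse

# Synthetic single-replicate titration (n = 1): 8 concentrations, 5-fold steps, in molar
concs = [1e-9 * 5 ** k for k in range(8)]
true_ac50 = 1e-7
responses = [hill(c, true_ac50, 1.0) for c in concs]

ac50, slope, sse = fit_ac50(concs, responses)
print(f"estimated AC50 = {ac50:.3g} M, Hill slope = {slope}, SSE = {sse:.3g}")
```

Even with noise-free synthetic data, the estimate is limited by the resolution of the search grid. With real single-replicate data, measurement noise and incomplete curves add further uncertainty, which is one reason a fit may need later verification.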