Core Technologies


Broadly, the goal of informatics is to transform raw, numeric data obtained from large-scale experiments into actionable decisions in chemistry and biology. Given the wide range of science carried out at NCATS, the Center’s informatics scientists apply techniques from a broad array of disciplines, including cheminformatics, bioinformatics, computational biology and chemistry, to enable experimental decision-making. Key to these activities is the development of algorithms and software to disseminate research results to the broader community. The informatics scientists at NCATS also form collaborative relationships with other investigators to develop robust assay designs and analytics. Additionally, NCATS’ informatics experts develop chemical libraries, such as the NCATS Pharmaceutical Collection and the broader collection of drug-like compounds.

NCATS’ informatics activities can be broadly classified into two areas. The first involves day-to-day support of data processing and analysis from high- and low-throughput screens. The second area consists of research and development of novel methodologies to enhance the analysis of high-throughput and high-content screening data for small molecule and RNAi screens.


General informatics capabilities at NCATS include:

  • Cheminformatics and computational chemistry
  • Bioinformatics
  • Mathematical and statistical modeling
  • Scientific software development

The informatics team collaborates closely with experimentalists to develop robust assay designs and analytics. NCATS experts tackle a wide variety of ligand- and protein structure–related modeling tasks, ranging from quantitative structure-activity relationship modeling to docking and molecular dynamics simulations. The team also supports bioinformatics analyses for RNAi screens and is developing protocols to integrate results from RNAi and small molecule screens. Team members are also developing infrastructure and software for the analysis of high-content screens, with the aim of integrating multiple imaging hardware and software platforms.

In addition to scientific tasks, the informatics team supports the backend databases and services on which many tools and applications depend, and team members develop Web interfaces for a variety of high-throughput screening operations.

Informatics and Quantitative High-Throughput Screening (qHTS)

NCATS’ qHTS technology depends directly on the informatics team’s computational infrastructure to convert measured responses from millions of microtitre plate wells into dose-response curves, enabling the identification of active and inactive compounds. Given that a screen can generate results for more than 400,000 compounds, the informatics team has developed an efficient grid-based curve-fitting algorithm that has been shown to outperform R and Excel. A stand-alone version of the code is also available.
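To make the idea of grid-based curve fitting concrete, the following is a minimal sketch and not the NCATS implementation. It assumes a four-parameter Hill model, searches a grid of log AC50 and Hill slope values, and solves for the two plateau parameters by ordinary least squares at each grid point. The function names, the grid ranges and the synthetic data are all hypothetical choices made for illustration.

```python
import math


def hill_fraction(log_conc, log_ac50, slope):
    """Fraction of full response at one log10 concentration (Hill model)."""
    return 1.0 / (1.0 + 10.0 ** (slope * (log_ac50 - log_conc)))


def fit_hill_grid(log_concs, responses, log_ac50_grid, slope_grid):
    """Grid search over (log AC50, slope); plateaus come from linear least squares.

    Illustrative assumption: response = zero_activity + span * hill_fraction.
    For fixed log AC50 and slope this is linear in zero_activity and span.
    """
    n = len(log_concs)
    mean_y = sum(responses) / n
    best = None
    for log_ac50 in log_ac50_grid:
        for slope in slope_grid:
            f = [hill_fraction(x, log_ac50, slope) for x in log_concs]
            mean_f = sum(f) / n
            sff = sum((fi - mean_f) ** 2 for fi in f)
            if sff < 1e-12:
                # The curve is flat over the tested range; skip this grid point.
                continue
            sfy = sum((fi - mean_f) * (yi - mean_y) for fi, yi in zip(f, responses))
            span = sfy / sff
            zero_activity = mean_y - span * mean_f
            sse = sum((zero_activity + span * fi - yi) ** 2
                      for fi, yi in zip(f, responses))
            if best is None or sse < best["sse"]:
                best = {
                    "log_ac50": log_ac50,
                    "slope": slope,
                    "zero_activity": zero_activity,
                    "inf_activity": zero_activity + span,
                    "sse": sse,
                }
    return best


# Synthetic, noise-free inhibition curve (hypothetical values, not screening data).
log_concs = [-9.0 + 0.5 * k for k in range(11)]
responses = [-100.0 * hill_fraction(x, -6.0, 1.0) for x in log_concs]

log_ac50_grid = [-9.0 + 0.25 * k for k in range(21)]  # -9.0 to -4.0
slope_grid = [0.5 * k for k in range(1, 9)]           # 0.5 to 4.0

fit = fit_hill_grid(log_concs, responses, log_ac50_grid, slope_grid)
```

A grid search of this kind cannot get stuck in a poor local minimum and needs no starting guesses, which is one plausible reason to prefer it when fitting very large numbers of curves; the resolution of the result is limited by the grid spacing.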

Currently, NCATS’ backend databases host upwards of 60 million dose-response curves. Although access to dose-response curves provides many advantages in the high-throughput screening workflow, the team has further enhanced use of these data by developing a heuristic classification. Using this classification scheme, team members can rapidly identify high-quality curves, representing active compounds; curves with no fit, representing inactive compounds; and curves whose fits are of low quality, representing compounds with inconclusive activity.
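The three outcomes described above can be illustrated with a small rule-based function. This is a hypothetical sketch only: the inputs (a goodness-of-fit value and an efficacy) and the threshold values are assumptions chosen for illustration and do not reproduce the classification scheme used by NCATS.

```python
from typing import Optional


def classify_curve(r_squared: Optional[float],
                   efficacy: Optional[float],
                   min_r2: float = 0.9,
                   min_efficacy: float = 30.0) -> str:
    """Assign a curve to one of three illustrative activity outcomes.

    r_squared and efficacy are None when no curve could be fitted.
    min_r2 and min_efficacy are hypothetical thresholds.
    """
    if r_squared is None or efficacy is None:
        # No fit: treated as an inactive compound.
        return "inactive"
    if r_squared >= min_r2 and abs(efficacy) >= min_efficacy:
        # Well-fitted curve with a substantial response: active.
        return "active"
    # A fit exists but its quality or magnitude is low: inconclusive.
    return "inconclusive"


labels = [
    classify_curve(0.97, -85.0),  # well-fitted, strong inhibition
    classify_curve(None, None),   # no fit
    classify_curve(0.55, -85.0),  # poor fit
]
```

Encoding the rules as a single pure function makes it straightforward to apply the same classification consistently across a large set of stored curves.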


Algorithm implementation tools include:

  • NCATS Chemical Genomics Center CurveFit
    A public, open-source version of the informatics team’s curve-fitting software, which automatically fits and classifies thousands of dose-response curves.
  • PubChem Fingerprint (FP) for JChem
    Implementation of PubChem’s fragment-based FP using ChemAxon’s JChem library, provided for public use for integration in JChem-based software development projects.
  • Chemical Structure Processing
    A Java class for generating canonical structures from a high-throughput screening perspective so that common chemical entities can be identified from compound collections.
  • Atom Pair Descriptors
    Implementation of atom pair descriptors using the JChem library, provided for public use for integration in JChem-based software development projects.
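As an illustration of what an atom pair descriptor encodes, the sketch below counts pairs of atoms together with the topological distance between them. It is written in plain Python and does not use or imitate the JChem API. Two simplifications are assumptions of this sketch: atoms are typed only by element and number of heavy-atom neighbors, and distance is measured as the number of bonds on the shortest path. The function names and the input format (a list of element symbols plus a list of bonded index pairs) are hypothetical.

```python
from collections import Counter, deque


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-count distance from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def atom_pair_descriptors(elements, bonds):
    """Count (atom type, distance, atom type) triples for a heavy-atom graph.

    Simplified typing (an assumption of this sketch): element symbol
    followed by the number of heavy-atom neighbors, e.g. "C2".
    """
    adjacency = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    types = [f"{el}{len(adjacency[i])}" for i, el in enumerate(elements)]

    counts = Counter()
    for i in range(len(elements)):
        dist = shortest_path_lengths(adjacency, i)
        for j, d in dist.items():
            if j > i:  # count each unordered pair once
                first, second = sorted((types[i], types[j]))
                counts[(first, d, second)] += 1
    return counts


# Ethanol heavy atoms: C-C-O
ethanol = atom_pair_descriptors(["C", "C", "O"], [(0, 1), (1, 2)])
```

Sorting the two atom types before counting makes the descriptor independent of atom ordering, so the same molecule yields the same counts however its atoms are numbered.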


  • Rajarshi Guha, Ph.D.
  • Xin Hu, Ph.D.
  • Ruili Huang, Ph.D.
  • Dac-Trung Nguyen
  • Min Shen, Ph.D.
  • Noel Southall, Ph.D.
  • Hongmao Sun, Ph.D.
  • Yuhong Wang
  • Tongan Zhao