
Scientific Computing, Services and Research

The Informatics Core spearheads efforts to increase the computing capacity of large consortia, support research and provide solutions to the wider translational science community.

Research Infrastructure Support

We provide informatics support for conducting consortium-wide projects and analyses.

NCATS Biomedical Data Translator

The Biomedical Data Translator program is a consortium of NCATS and extramural data science researchers. It integrates existing medical and biological data sources into tools that augment human reasoning and inference about the pathophysiology of human disease. The informatics backbone of this effort is a set of community standards for data reuse: the Biolink Model as a semantic standard, SmartAPI for discoverability and the Translator Reasoner API (TRAPI) as a communication standard.
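To make the role of these standards more concrete, the sketch below builds a one-hop query of the kind a Translator service might receive. The field names (`ids`, `categories`, `predicates`) reflect our understanding of TRAPI 1.x and should be checked against the current specification. The function name `build_one_hop_query` and the disease identifier are illustrative assumptions, not part of any Translator tool.

```python
import json


def build_one_hop_query(disease_curie: str) -> dict:
    """Build a one-hop query graph: which chemical entities treat this disease?

    Hypothetical helper; the message layout is assumed from TRAPI 1.x.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # n0 is pinned to a specific identifier (a CURIE).
                    "n0": {"ids": [disease_curie]},
                    # n1 is left open; only its Biolink category is constrained.
                    "n1": {"categories": ["biolink:ChemicalEntity"]},
                },
                "edges": {
                    "e0": {
                        "subject": "n1",
                        "object": "n0",
                        "predicates": ["biolink:treats"],
                    }
                },
            }
        }
    }


# Illustrative disease identifier; any valid CURIE could be substituted.
query = build_one_hop_query("MONDO:0005148")
print(json.dumps(query, indent=2))
```

In this sketch, Biolink supplies the vocabulary (categories and predicates) and the message envelope plays the part of the communication standard. Service discovery is not shown.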

COMETS Analytics

COMETS Analytics supports and streamlines consortium-based analyses of metabolomics data. Its distinguishing feature is an algorithmic, reproducible approach to diagnosing, documenting and fixing model issues. This approach enables users to run standardized models across many cohorts in a timely manner and eliminates the need to customize models manually for each cohort, which can be very time-consuming and error-prone.
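The idea of diagnosing, documenting and fixing model issues automatically can be illustrated with a minimal sketch. It is not the COMETS Analytics implementation: the function `prepare_covariates`, the cohort data and the two checks (missing covariate, zero variance) are assumptions chosen only to show one standardized covariate list being adapted per cohort with a written record of each change.

```python
from statistics import pvariance


def prepare_covariates(cohort_name, data, covariates):
    """Drop covariates that cannot be fit in this cohort and record why.

    Hypothetical helper: returns (kept covariates, log entries).
    """
    kept, log = [], []
    for cov in covariates:
        values = data.get(cov)
        if values is None:
            # Diagnose: the cohort did not collect this variable.
            log.append(f"{cohort_name}: dropped '{cov}' (missing from cohort)")
        elif pvariance(values) == 0:
            # Diagnose: a constant covariate carries no information.
            log.append(f"{cohort_name}: dropped '{cov}' (zero variance)")
        else:
            kept.append(cov)
    return kept, log


# Made-up toy data for two cohorts.
cohorts = {
    "cohort_a": {"age": [50, 61, 47], "bmi": [22.0, 27.5, 31.2], "female": [1, 0, 1]},
    "cohort_b": {"age": [55, 58, 63], "bmi": [24.1, 29.0, 26.3], "female": [1, 1, 1]},
}
covariates = ["age", "bmi", "female", "smoker"]

for name, data in cohorts.items():
    kept, log = prepare_covariates(name, data, covariates)
    print(name, "fits with", kept)
    for entry in log:
        print("  ", entry)
```

Because the same rules run for every cohort and each adjustment is logged, the per-cohort differences stay reproducible rather than depending on manual edits.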

Cheminformatics and Other Utilities


Molwitch

Molwitch is a cheminformatics bridge layer application programming interface (API) that allows users to switch the underlying cheminformatics library, such as JChem, CDK or Indigo, without having to recompile their code.
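The bridge-layer idea can be sketched in a few lines. The example below is not Molwitch's API: the `Backend` protocol, the two toy backends and the `get_backend` registry are all hypothetical, and the atom counting handles only very simple SMILES strings. It shows only the pattern, in which client code depends on one interface and the concrete library is chosen by name at run time.

```python
import re
from typing import Dict, Protocol


class Backend(Protocol):
    """Interface the client code depends on (hypothetical)."""

    def atom_count(self, smiles: str) -> int: ...


class RegexBackend:
    """Toy backend: counts atom symbols with a regular expression."""

    _atom = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

    def atom_count(self, smiles: str) -> int:
        return len(self._atom.findall(smiles))


class ScanBackend:
    """Toy backend: counts atom symbols with a character scan."""

    def atom_count(self, smiles: str) -> int:
        count, i = 0, 0
        while i < len(smiles):
            if smiles[i:i + 2] in ("Cl", "Br"):
                count, i = count + 1, i + 2
            elif smiles[i] in "BCNOPSFIbcnops":
                count, i = count + 1, i + 1
            else:
                i += 1  # skip bonds, ring digits, parentheses
        return count


# Registry: the backend is selected by name, e.g. from configuration.
BACKENDS: Dict[str, type] = {"regex": RegexBackend, "scan": ScanBackend}


def get_backend(name: str) -> Backend:
    return BACKENDS[name]()


def describe(smiles: str, backend: Backend) -> str:
    """Client code: written once against the interface."""
    return f"{smiles} has {backend.atom_count(smiles)} heavy atoms"


for name in BACKENDS:
    print(name, "->", describe("c1ccccc1O", get_backend(name)))
```

Swapping the backend changes only the name passed to `get_backend`; `describe` is untouched, which is the property a bridge layer is meant to provide.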

OpenData Renderer

Molwitch-renderer takes a chemical structure in molfile or SMILES format and produces a rendered image of that structure. The software uses the Molwitch library.


Stitcher

Stitcher provides a graph-based approach to entity stitching and resolution using clique detection. This software is currently used to support work on providing reference data sets for drugs and rare diseases.
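As a rough illustration of clique-based entity resolution, the sketch below links records that share an identifier and then enumerates maximal cliques with the basic Bron-Kerbosch recursion. It is a generic example under stated assumptions, not Stitcher's algorithm or data model: the record names and identifiers are made up, and the rule "share at least one identifier" is a simplification.

```python
from itertools import combinations


def build_graph(records):
    """Link two records whenever they share at least one identifier."""
    graph = {name: set() for name in records}
    for a, b in combinations(records, 2):
        if records[a] & records[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Made-up records from different sources, each with placeholder identifiers.
records = {
    "r1": {"id:A", "id:B"},
    "r2": {"id:B", "id:C"},
    "r3": {"id:A", "id:C"},
    "r4": {"id:Z"},
}

for clique in maximal_cliques(build_graph(records)):
    print(sorted(clique))
```

Here `r1`, `r2` and `r3` are pairwise linked and form one clique, which a resolver could treat as a single entity, while `r4` stays on its own. Requiring a clique rather than a connected component means every pair of merged records has direct supporting evidence.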

Structure Indexer

Structure Indexer is an inverted index data structure that supports fast structure searching. The implementation is based on Apache Lucene. The software can be used standalone or embedded within a service. It is currently used by the Global Substance Registration System (G-SRS) software.
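A minimal sketch of the inverted-index idea follows. It does not reproduce Structure Indexer or use Lucene: the class `InvertedIndex` is hypothetical, and character n-grams of SMILES strings stand in for the structural features a real system would derive from the molecular graph. The index maps each feature to the entries that contain it, so a query only needs to intersect a few posting sets instead of scanning every structure.

```python
from collections import defaultdict


def features(smiles, n=3):
    """Toy feature extractor: the set of character n-grams in a SMILES string."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


class InvertedIndex:
    """Maps each feature to the set of entry ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, smiles):
        for f in features(smiles):
            self.postings[f].add(doc_id)

    def candidates(self, query):
        """Return ids of entries that contain every feature of the query."""
        sets = [self.postings.get(f, set()) for f in features(query)]
        return set.intersection(*sets) if sets else set()


index = InvertedIndex()
index.add("ethanol", "CCO")
index.add("propanol", "CCCO")
index.add("acetic acid", "CC(=O)O")
index.add("benzene", "c1ccccc1")

print(sorted(index.candidates("CCO")))
```

The result is a candidate set only. In a structure search, candidates would normally be confirmed with an exact graph-matching step, since string n-grams (or fingerprints) can miss or overstate true substructure relationships.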

Support for NCATS Scientific Computing

The Informatics (IFX) Core produces customized computational workflows to enable and streamline the analysis of data obtained from novel technologies (e.g., metabolomics, RASL-Seq). These workflows are embedded within the NCATS scientific computing environment to meet the needs of the Division of Preclinical Innovation (DPI), and they could readily be embedded in other environments as well.

Examples of customized computational workflows include bulk and single-cell RNA sequencing pipelines, high-throughput screening analyses using Spotfire, compound registration and management, and comprehensive metabolomic profiling.

Collaborative Research Efforts

The IFX Core applies state-of-the-art analysis methodologies, some of which are developed by our group, to large molecular and -omics data sets collected in translational research. Generally, we aim to identify molecules (e.g., DNA, RNA, proteins, metabolites) that characterize cellular and disease states and to facilitate interpretation of these complex data to further our knowledge of the biological and cellular mechanisms underlying disease.

Metabolomics and Multi-Omics Profiling to Identify Putative Biomarkers and Elucidate Disease Processes

  • Evaluate metabolomic and proteomic profiles in 2-D and 3-D lung models to understand cellular responses to infection.
  • Conduct metabolomic analysis of human blood samples in prospective studies to identify markers of disease severity (e.g., macular degeneration, COVID-19 and other infectious diseases).
  • Characterize the effects of diet and prebiotic supplementation on the microbiome and metabolome that lead to the development of aberrant crypt foci and behavioral changes, respectively.
  • Use comprehensive metabolomic and lipidomic characterization of dedifferentiated liposarcoma cell lines to identify MDM2-dependent molecular rewiring that underlies chemoresistance.

Single-Cell Sequencing Techniques to Gain Insights into Small-Molecule Chemical Biology

  • Evaluate stem cell differentiation through single- and multi-compound studies to optimize for the intended cellular fate, including cell type classification and tracking marker gene sets through differentiation time courses.
  • Evaluate cellular response to small molecules in cancer models to understand cell type–specific responses and response heterogeneity.