Scientific Computing, Services and Research

Research Infrastructure Support

NCATS Biomedical Data Translator

The Biomedical Data Translator program is a consortium of NCATS and extramural data science researchers. It supports the integration of existing medical and biological data sources to produce tools that augment human reasoning and inference about the pathophysiology of human disease. The informatics backbone of this effort is the development of community standards for data reuse, including Biolink as a semantic standard, Smart-API for discoverability, and the Reasoner API as a communication standard.
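To make the role of a communication standard concrete, the sketch below builds a one-hop query message in the general style of the Reasoner API, with node categories and an edge predicate drawn from Biolink. The exact field layout, the helper name `build_one_hop_query`, and the example disease identifier are assumptions made for illustration; the published specification is the authority on the message format.

```python
import json


def build_one_hop_query(disease_curie: str) -> dict:
    """Build a one-hop query graph asking which chemical entities treat a disease.

    The field names below are an assumed, simplified rendering of a
    Reasoner-API-style message, not a validated request.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # n0 is pinned to a specific disease identifier (CURIE).
                    "n0": {"ids": [disease_curie], "categories": ["biolink:Disease"]},
                    # n1 is left open: any chemical entity may fill it.
                    "n1": {"categories": ["biolink:ChemicalEntity"]},
                },
                "edges": {
                    # e0 links the unknown chemical to the known disease.
                    "e0": {
                        "subject": "n1",
                        "object": "n0",
                        "predicates": ["biolink:treats"],
                    }
                },
            }
        }
    }


# "MONDO:0005148" is used here only as an example identifier.
query = build_one_hop_query("MONDO:0005148")
payload = json.dumps(query, indent=2)
```

Because every service in such a consortium would accept and return the same message shape, a client can send one payload to many knowledge sources without source-specific code.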

COMETS Analytics

COMETS Analytics supports and streamlines consortium-based analyses of metabolomics data. Unique features of COMETS Analytics include an algorithmic and reproducible approach to diagnose, document, and fix model issues. These features enable users to run standardized models across many cohorts in a timely manner and eliminate the need for manually customizing models by cohort, which can be very time-consuming and error-prone.

UI/UX Research and Design

UI/UX research and design is increasingly being applied to improve the usability of our apps. We take a user-centered approach to software development, using user research methods to query and understand user needs. This process helps ensure that the apps we develop meet those needs, thereby minimizing risk and development effort.

Cheminformatics and Other Utilities

Layered Chemical Identifier (LyChI)

The Layered Chemical Identifier (LyChI) is a chemical standardization tool that generates a layered hash for each chemical. The hash can be used for quick fuzzy uniqueness checks and searches. A distinctive feature of LyChI hash keys is that they are, to a certain extent, lexicologically meaningful.
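The idea of a layered key can be illustrated with a toy example. The sketch below is not the LyChI algorithm: the layer contents, the SHA-1 digest, and the function names are all assumptions chosen for illustration. It shows only how hashing progressively more detailed descriptions of a structure yields keys whose shared leading blocks indicate how closely two structures match.

```python
import hashlib


def _digest(text: str, length: int = 6) -> str:
    """Return a short hexadecimal digest of the given text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


def layered_key(layers: list[str]) -> str:
    """Build a key with one block per layer.

    Each block hashes every layer up to and including its own, so two keys
    agree on a block only if they agree on all coarser layers as well.
    """
    blocks = []
    cumulative = ""
    for layer in layers:
        cumulative += "|" + layer
        blocks.append(_digest(cumulative))
    return "-".join(blocks)


def shared_layers(key_a: str, key_b: str) -> int:
    """Count how many leading blocks two keys have in common."""
    count = 0
    for a, b in zip(key_a.split("-"), key_b.split("-")):
        if a != b:
            break
        count += 1
    return count


# Hypothetical layers: heavy-atom skeleton, connectivity, stereo label.
form_r = layered_key(["C3NO2", "CC(N)C(=O)O", "R"])
form_s = layered_key(["C3NO2", "CC(N)C(=O)O", "S"])
other = layered_key(["C2O", "CCO", "none"])

same_up_to_stereo = shared_layers(form_r, form_s)
unrelated = shared_layers(form_r, other)
```

Under these assumptions, a prefix comparison is enough for a fuzzy uniqueness check: two keys that share all but the last block describe structures that differ only in the finest layer.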




MolVec

MolVec is optical chemical structure recognition software that converts images into structured data for computation. The software takes images of chemical renderings in a variety of formats (e.g., PNG, TIFF, GIF) as input and produces vectorized 2-D formats (e.g., SDF) that faithfully reconstruct the drawn structures. MolVec currently is considered one of the most accurate open-source tools for this task.


Molwitch

Molwitch is a cheminformatics bridge-layer application programming interface (API) that allows users to switch the underlying cheminformatics library, such as JChem, CDK or Indigo, without having to recompile their code.
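The bridge-layer idea can be sketched in a few lines. The example below is a generic illustration of the pattern, not Molwitch's API: the registry, the two toy backends, and the function `heavy_atom_count` are hypothetical, and the toy backends handle only simple, non-aromatic SMILES-like strings.

```python
import re
from typing import Callable, Dict

# Hypothetical registry mapping a backend name to its implementation.
_BACKENDS: Dict[str, Callable[[str], int]] = {}


def register(name: str):
    """Decorator that registers a backend implementation under a name."""
    def wrap(fn: Callable[[str], int]) -> Callable[[str], int]:
        _BACKENDS[name] = fn
        return fn
    return wrap


@register("toy_a")
def _count_by_uppercase(smiles: str) -> int:
    # Toy rule: every element symbol starts with one uppercase letter.
    return sum(1 for ch in smiles if ch.isupper())


@register("toy_b")
def _count_by_regex(smiles: str) -> int:
    # Toy rule: match two-letter halogens first, then one-letter symbols.
    return len(re.findall(r"Cl|Br|[BCNOPSFI]", smiles))


def heavy_atom_count(smiles: str, backend: str = "toy_a") -> int:
    """Caller-facing function; the backend is chosen by name at run time."""
    return _BACKENDS[backend](smiles)


results = {name: heavy_atom_count("CC(=O)O", backend=name) for name in _BACKENDS}
```

Calling code depends only on `heavy_atom_count`, so changing the backend name, for example from a configuration value, swaps the implementation without touching the callers.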


Molwitch-renderer

Molwitch-renderer takes a chemical structure in molfile or SMILES format and produces a rendered image of that structure. The software uses the Molwitch library.

Scaffold Hopper

Scaffold Hopper allows a user to “hop” between related contexts (i.e., structures, documents, targets, MeSH terms) with a single click. A novel feature of this software is that it can automatically perceive R group decomposition.


Stitcher

Stitcher provides a graph-based approach to entity stitching and resolution using clique detection. This software currently is used to support work on providing reference data sets for drugs and rare diseases.
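As a rough illustration of clique-based entity resolution, the sketch below links records that share at least one identifier and then enumerates maximal cliques with the basic Bron-Kerbosch algorithm. This is an assumption about how such an approach can work in general, not a description of Stitcher's implementation; the record names and identifiers are placeholders.

```python
from itertools import combinations


def build_graph(records: dict[str, set[str]]) -> dict[str, set[str]]:
    """Connect two records when they share at least one identifier."""
    graph: dict[str, set[str]] = {name: set() for name in records}
    for a, b in combinations(records, 2):
        if records[a] & records[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph: dict[str, set[str]]) -> list[set[str]]:
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques: list[set[str]] = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later cliques at this level

    expand(set(), set(graph), set())
    return cliques


# Placeholder records from three hypothetical sources.
records = {
    "r1": {"id:A", "id:B"},
    "r2": {"id:B", "id:C"},
    "r3": {"id:A", "id:C"},
    "r4": {"id:Z"},
}
cliques = maximal_cliques(build_graph(records))
resolved = sorted(sorted(c) for c in cliques)
```

Requiring a clique, rather than mere connectivity, means every record in a stitched entity is directly supported by every other record, which guards against chaining unrelated records through a single ambiguous identifier.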

Structure Indexer

Structure Indexer is an inverted index data structure to support fast structure searching. The implementation is based on Apache Lucene. The software can be used as a standalone or embedded within a service. It currently is used by the Global Substance Registration System (G-SRS) software.
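The following toy example shows the inverted-index idea in isolation. It is a minimal sketch under stated assumptions: the class `ToyStructureIndex`, the fragment-style keys, and the structure names are hypothetical, and nothing here reflects how Structure Indexer uses Apache Lucene. Each key maps to the set of structures that contain it, so a query reduces to intersecting a few posting sets.

```python
from collections import defaultdict


class ToyStructureIndex:
    """Map each fragment key to the set of structure IDs that contain it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)
        self._ids: set[str] = set()

    def add(self, structure_id: str, keys: set[str]) -> None:
        """Index one structure under every key it contains."""
        self._ids.add(structure_id)
        for key in keys:
            self._postings[key].add(structure_id)

    def candidates(self, query_keys: set[str]) -> set[str]:
        """Return IDs of structures that contain every query key."""
        if not query_keys:
            return set(self._ids)
        # Intersect the smallest posting sets first to shrink work early.
        postings = sorted(
            (self._postings.get(key, set()) for key in query_keys), key=len
        )
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
        return result


index = ToyStructureIndex()
index.add("aspirin", {"ring:benzene", "group:carboxylic_acid", "group:ester"})
index.add("benzoic_acid", {"ring:benzene", "group:carboxylic_acid"})
index.add("ethanol", {"group:hydroxyl"})

acid_hits = index.candidates({"ring:benzene", "group:carboxylic_acid"})
ester_hits = index.candidates({"group:ester"})
```

In a real structure search, a screen like this would typically be followed by an exact matching step on the surviving candidates, since sharing keys does not by itself prove a substructure match.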

Support for NCATS Scientific Computing

The Informatics (IFX) Core produces customized computational workflows to enable and streamline the analysis of data obtained from novel technologies (e.g., metabolomics, RASL-Seq). These workflows are then embedded within the NCATS scientific computing environment to meet the needs of the Division of Preclinical Innovation (DPI). These methodologies could readily be embedded in other environments as well.

Examples of customized computational workflows include bulk and single-cell RNA sequencing pipelines, high-throughput screening analyses using Spotfire, compound registration and management, and qHTS and matrix data analysis applications.

Collaborative Research Efforts

The IFX Core applies state-of-the-art analysis methodologies, some of which are developed by our group, to large molecular and -omics data sets collected in translational research. Generally, we aim to identify molecules (e.g., DNA, RNA, proteins, metabolites) that characterize cellular and disease states and to facilitate interpretation of these complex data, furthering our knowledge of the biological and cellular mechanisms underlying disease.

Metabolomics and Multi-Omics Profiling to Identify Putative Biomarkers and Elucidate Disease Processes

  • Use comprehensive metabolomic and lipidomic characterization of dedifferentiated liposarcoma cell lines to identify MDM2-dependent molecular rewiring that underlies chemoresistance.
  • Evaluate metabolomic and proteomic profiles in 2-D and 3-D lung models to understand cellular responses to infection.
  • Conduct metabolomic analysis of human plasma samples in a prospective study of COVID-19 patients to identify markers of disease severity.
  • Characterize the effects of diet and prebiotic supplementation on the microbiome and metabolome that lead to the development of aberrant crypt foci and behavioral changes, respectively.

Single-Cell Sequencing Techniques to Gain Insights into Small-Molecule Chemical Biology

  • Evaluate stem cell differentiation through single- and multi-compound studies to optimize for the intended cellular fate, including cell type classification and tracking marker gene sets through differentiation time courses.
  • Evaluate cellular response to small molecules in cancer models to understand cell type–specific responses and response heterogeneity.