Translational Data Analytics
Our research activities involve developing novel methodologies for analyzing various types of research data and applying these methodologies, or existing ones, to translational research projects.
Machine Learning and Descriptive Modeling
The IFX Core engages in various research efforts where machine learning (ML) methods are applied in new ways or novel ML approaches are developed. Our goal is to use state-of-the-art computational techniques to systematically translate large-scale biomedical data into knowledge that supports clinical and preclinical research. Our approaches allow us to collect, integrate and analyze large and diverse bodies of preclinical and clinical data, which can help researchers prioritize therapeutic hypotheses and reveal hidden relations between drugs, targets and diseases.
Prediction of Chemical Properties
- Building machine learning models to predict a wide range of ADME (Absorption, Distribution, Metabolism and Excretion) properties (e.g. rat liver microsomal stability, parallel artificial membrane permeability assays, kinetic aqueous solubility, cytochrome P450 mediated metabolism) using the ADME database
- Curated multispecies acute toxicity data from ChemIDplus, focusing primarily on endpoints such as lethal dose 50 (LD50), lethal dose low (LDLo) and toxic dose low (TDLo). We developed multitask prediction models using random forests, deep neural networks and graph-based neural networks.
- Curated bioactivity data for hERG channel inhibition; the data were obtained from ChEMBL and integrated with NCATS’ in-house data from a thallium-flux assay, a high-throughput assay for measuring hERG channel activity. We provide prediction models built on the integrated data set using both classical and modern AI approaches.
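
As a rough illustration of what a multitask model for several toxicity endpoints can look like, the sketch below fits a single multi-output random forest with scikit-learn. It is not the IFX Core's implementation: the data are synthetic, the 64-bit "fingerprints" and the endpoint values are invented for the example, and every compound is assumed to have a label for every endpoint, which real curated data often do not.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 200 "compounds" x 64 binary fingerprint bits.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64)).astype(float)

# Three endpoints that share an underlying signal plus endpoint-specific noise.
# The endpoint names come from the text; the values are made up for illustration.
endpoints = ["LD50", "LDLo", "TDLo"]
weights = rng.normal(size=64)
shared_signal = X @ weights
Y = np.column_stack(
    [shared_signal + rng.normal(scale=0.5, size=200) + shift
     for shift in (0.0, -0.5, -1.0)]
)

# Simple hold-out split: first 150 compounds for training, last 50 for testing.
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

# Passing a 2-D target makes the forest predict all endpoints jointly.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)  # one column per endpoint

# Report the root-mean-square error for each endpoint separately.
for j, name in enumerate(endpoints):
    rmse = float(np.sqrt(np.mean((Y_pred[:, j] - Y_test[:, j]) ** 2)))
    print(f"{name}: RMSE = {rmse:.2f}")
```

In practice the inputs would be molecular descriptors or fingerprints computed from chemical structures, and missing endpoint labels would need explicit handling, for example by masking them in a neural network loss.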
Extracting Knowledge From Data in Rare Diseases
- Development of natural language processing (NLP)–based approaches to systematically analyze PubMed abstracts, social media, and NIH funding pertinent to rare diseases. These analyses help identify gaps and scientific challenges that remain unaddressed in rare disease research (John, et al., AMIA Annu Symp Proc. 2021; Zhu, et al., Orphanet J Rare Dis. 2021; Karas, et al., Front. Artif. Intell. 2022; Zhu, et al., Front. Artif. Intell. 2022; Kariampuzha, et al., J Transl Med. 2023).
- Development of a computational approach to support data harmonization and data interoperability with existing standardized terminologies and ontologies for NCATS’ Genetics and Rare Diseases (GARD) Information Center. One outcome from these efforts, the GARD Data Tree, has facilitated curation efforts (Zhu, et al., JMIR Med Inform. Oct 2020).
Molecular Profiling and Multi-Omic Methods
The IFX Core is actively developing algorithms and tools to help interpret omic and multi-omic data. These efforts are highly collaborative and involve investigators within NCATS’ DPI and beyond.
- Development of pathway-based and numerical integration methods for multi-omic data (e.g., metabolomics, proteomics, transcriptomics)
- Development of methods for the analysis of dose-response transcriptomic profiles
- Recent advances in mass spectrometry-based computational metabolomics
- Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources