February 7, 2024: Unlocking the Power of Data for Faster Health Solutions
Finding important links among different types of data is key to bringing more treatments to all people more quickly. We have invested in new ways to connect, access, and learn from large and complex data sets. We also create and use data tools and methods in new ways to speed translational research.
Data science and informatics drive so much of our work. Activities include data-linking efforts like RARe-SOURCE™ for understanding and advancing treatments for rare diseases and research platforms like OpenData Portal for openly and quickly sharing experimental data. These also can be pivoted to address emerging needs. CURE ID, our collaboration with the U.S. Food and Drug Administration, has a new emphasis on the experiences of patients and providers with long COVID. You can learn more about how we turn data into solutions on Our Impact on Big Data page.
Two NCATS projects stand out as examples of data innovation that lead to better health. The NCATS National COVID Cohort Collaborative (N3C) connects de-identified data from electronic health records to provide quick and useful public health insights. Our Biomedical Data Translator (“Translator”) combines data from more than 100 public sources to find existing drugs that could be tested for new uses through clinical trials. Both projects show that using bold and rigorous approaches and team science are crucial.
N3C: A Sixth Sense for COVID-19 Research
N3C has been a key resource in COVID-19 research, working almost like a sixth sense to help researchers and others gain new information. In addition to its large collection of clinical data, N3C currently links to 71 other data sets to create a more complete picture of COVID-19 health outcomes. Multidisciplinary research teams continue to work within the N3C Data Enclave to explore important COVID-19 health questions.
A current project uses N3C data to test whether the diabetes drug metformin can reduce or even prevent severe outcomes of COVID-19, including long COVID. The findings will add to data from clinical trials, such as ACTIV-6, potentially offering more definitive conclusions sooner and helping to assess the usefulness of N3C’s real-world data to simulate traditional clinical trials. Such emulation trials could be an important tool for optimizing clinical trials before they launch. Other projects have used N3C to find that a drug typically used to treat depression (SSRIs) could reduce the risk of getting long COVID and that increasing access to Paxlovid could save lives.
N3C, which built upon data science work done by the Rare Diseases Clinical Research Network, was created through the incredible leadership and determination of our Clinical and Translational Science Awards (CTSA) Program. N3C’s data come from nearly all institutions funded by the CTSA Program, plus others funded by a program of NIH’s National Institute of General Medical Sciences, as well as OCHIN’s vast network of community health organizations. A team of grantees, federal staff, and data analytics partners ensured that the data were compatible and could be compared in a secure platform.
Our N3C secure data platform is considered “best in class.” It is a model that is being used and adapted by other major players in this space, including the All of Us Research Program and Advanced Research Projects Agency for Health, or ARPA-H.
Translator: Evidence-Based Connections for Drug Repurposing
Translator pulls together public data from research databases and scientific publications, as well as open clinical data from every branch of biomedicine. It uses a set of software tools to organize this data into knowledge graphs that can show how and why concepts are connected in ways that previously may not have been known, or readily discovered, such as that between a molecular pathway and a disease phenotype.
Translator’s cutting-edge reasoning tools make evidence-based connections, rather than those based on likelihood, like ChatGPT and other large language models (LLMs). That said, we are looking at ways to add LLMs to Translator and other NCATS projects. NCATS’ recent colloquium on LLMs covered the challenges for data sharing, reproducibility, education and training, regulatory oversight, and ethics and trust that we will need to address as a community.
The alpha version of Translator is now available. This release is currently limited to a set of specific queries, but it will demonstrate the immense capability of Translator for drug repurposing. For example, a research team using Translator found that a common blood pressure medicine, guanfacine, possibly could be used to treat a young girl with a rare neurodevelopmental disorder. Given guanfacine’s known safety and mechanism of action, the researchers shared this information with the medical team, who decided to try it as a treatment. After a few months on the medicine, the girl’s motor, social, and behavioral skills improved. Translator insights also led to a clinical trial testing ketamine for another neurodevelopmental disorder known as ADNP syndrome.
Translator’s software tools and user interface were built by a consortium of more than a dozen teams. Their members are part of committees, working groups, and task forces that meet regularly to enhance Translator’s abilities. They also are finding ways to improve the validity and reproducibility of the results, include clinical data in a useful way, and keep up with regular updates to the open-source data model.
We are just scratching the surface of how N3C and Translator can use data to quickly address unmet medical needs. We recently finished a pilot project to explore whether non-COVID research can be done using N3C’s secure data enclave structure. The findings will help us move toward a National Clinical Cohort Collaborative.
It may even be possible for tools like N3C and Translator to connect, but we must think carefully and thoughtfully about how to do that. Our vision is to make data useful to all researchers. While data have the power to create equity, they also have the power to create more inequity. We have to understand the limitations of data, build well-tested tools for using them, and, most importantly, protect the privacy of people who share their data so that the knowledge gained helps all people.
Your partner in science and health,
Joni L. Rutter, Ph.D.
Director
National Center for Advancing Translational Sciences