February 26, 2021: Informatics: The ultimate translational team sport

A colleague of mine refers to the mass collection of data in the belief that it will automatically produce novel insights as “data composting”. As an avid gardener, I can attest to the magic by which piles of leaves and grass produce a wonderfully rich product all by themselves. But as an avid scientist, I can also attest to the utter inability of piles of data to do the same.

I have reflected in previous messages on the power of data sharing to accelerate translation. But simply sharing data is not enough: It must be shared in a way that makes it useful when combined with other data and interpretable by others. The necessary ingredient in transforming piles of data into actionable information is the involvement of translational team members from the fields of data science and informatics.

NCATS places enormous emphasis on informatics in all its programs. That’s because data are the most easily shared translational reagent, both instantaneously and in virtually unlimited quantity. But as with so many other issues in translational science, the methods, standards, and operational best practices required to efficiently produce useful new insights from aggregated data have yet to be developed and demonstrated. Our preclinical informatics work in drug screening, repurposing, and toxicology, our rare diseases informatics efforts, and the ambitious Biomedical Data Translator program have all begun to greatly accelerate translational discovery.

Clinical informatics, the sharing and analysis of data about and from humans, has both special challenges and unprecedented potential for insights of immediate translational value for medicine and patients. Human data must be protected from misuse with the utmost security, so NCATS has spent the past several years developing a data enclave environment with unmatched protections. At the same time, NCATS informaticians worked with their colleagues across the CTSA Program consortium and at the Center for Data to Health (CD2H) to connect nearly a hundred health care centers across the country and enable secure sharing, harmonization, and analysis of clinical data, focusing on the systematic barriers to data interoperability and to the use of electronic health records (EHRs) for research.

We all knew that this massive team effort would yield insights into the causes, characterization, and treatment of human disease in ways previously inconceivable. But we never could have foreseen how timely and critical this work would be in addressing the greatest public health crisis in more than 100 years: COVID-19. These efforts enabled us to quickly stand up the National COVID Cohort Collaborative (N3C) Data Enclave, a centralized, harmonized, high-granularity EHR repository. It has grown to become the largest, most representative U.S. cohort for COVID-19, with EHR data from more than 800,000 people diagnosed with COVID-19 and 2.6 million controls. Through more than 100 projects involving 1,900 investigators from more than 500 institutions across the country who are using the N3C Data Enclave, we are learning an enormous amount, in close to real time, about both COVID-19 and the post-COVID-19 syndrome that is still being defined.

I often speak of NCATS’ “3 Ds” operational paradigm: to Develop, Demonstrate and Disseminate solutions to barriers to translational efficiency and effectiveness. N3C is the largest Demonstration project we have ever undertaken. It was made possible by years of Development by hundreds of informaticians and data scientists all working together in an amazing and inspiring team effort. Once COVID-19 is finally behind us, we look forward to Disseminating what N3C is teaching us to improve clinical research and care for all diseases.

Stay well,

Christopher P. Austin, M.D.
National Center for Advancing Translational Sciences