September 2, 2020
Researchers studying COVID-19 can now access a new analytics platform that contains clinical data from the electronic health records of people who were tested for the novel coronavirus or who have had related symptoms. The N3C Data Enclave, part of the NCATS National COVID Cohort Collaborative (N3C), is a centralized and secure data platform with powerful analytics capabilities for online discovery, visualization and collaboration. The data are robust in scale and scope and are transformed into a harmonized data set to help scientists study COVID-19, including potential risk factors, protective factors and long-term health consequences.
The N3C Data Enclave is anticipated to be one of the largest collections of data on COVID-19 patients in the United States. The enclave supports data analysis in both R and Python, the most widely used open-source languages for statistical analysis and data science (Watch a demonstration of the platform). Researchers requesting access to, or working within, the enclave are encouraged to assemble collaborative teams with diverse expertise in areas such as clinical research, statistical analysis and informatics to make the best use of the N3C Data Enclave.
Researchers interested in accessing the data will need to register with N3C and submit a Data Use Request for review by the N3C Data Access Committee. Learn more about the data access process and requirements, including data security training.
A Resource Unlike Any Other
- Harmonized data
  - The platform translates the different ways that contributing hospitals store patient data into a single, common format, enabling combined “apples-to-apples” analyses.
- Robust in scale and scope
  - Currently, 57 sites across the country have agreed to transfer diverse data from individuals tested for COVID-19, including demographics, symptoms, laboratory test results, procedures, medications, medical conditions, physical measurements and more.
  - By drawing on the national reach of the Clinical and Translational Science Awards Program network, N3C is working to ensure that the data represent the diversity of the country, so researchers can understand and address geographic and population disparities during the pandemic.
- Powerful analytics capabilities
  - The platform is built for machine-learning approaches and rigorous statistical analyses that can identify connections and patterns more quickly than traditional methods. These approaches let researchers explore multiple questions simultaneously, and surface likely answers, at a large scale.
- Centralized and secure
  - The data reside and remain in NCATS’ secure, cloud-based database, which is certified through the Federal Risk and Authorization Management Program (FedRAMP). FedRAMP provides standardized security assessment, authorization and continuous monitoring of cloud products and services, helping to protect the integrity of the data and the privacy of patients.
  - Three levels of protected data are included for analysis: a synthetic data set that is statistically comparable to the original data and contains no protected health information (PHI); a de-identified data set that has been stripped of PHI identifiers; and a Limited Data Set that includes only two of the 18 HIPAA-defined identifiers: patient ZIP code and dates of service.
Learn more about N3C data including data stewardship and protections.
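To make the difference between the de-identified tier and the Limited Data Set concrete, here is a minimal Python sketch. It assumes a patient record is a flat dictionary and uses hypothetical field names (`zip_code`, `service_date`, `lab_test` and so on). It illustrates only the distinction described in the list above, not N3C's actual pipeline: it does not model the synthetic tier, and real de-identification involves more than dropping fields.

```python
from datetime import date

# Hypothetical clinical fields kept by both tiers; real N3C tables are far richer.
CLINICAL_FIELDS = {"sex", "lab_test", "lab_result", "medication"}

# The two HIPAA-defined identifiers that the Limited Data Set retains.
LDS_IDENTIFIERS = {"zip_code", "service_date"}


def limited_data_set(record: dict) -> dict:
    """Keep clinical fields plus ZIP code and dates of service; drop all other fields."""
    allowed = CLINICAL_FIELDS | LDS_IDENTIFIERS
    return {key: value for key, value in record.items() if key in allowed}


def deidentified(record: dict) -> dict:
    """Keep clinical fields only; ZIP code and dates of service are dropped as well."""
    return {key: value for key, value in record.items() if key in CLINICAL_FIELDS}


if __name__ == "__main__":
    # Made-up example record mixing direct identifiers and clinical values.
    record = {
        "name": "Jane Doe",
        "mrn": "000123",
        "zip_code": "20892",
        "service_date": date(2020, 4, 1),
        "lab_test": "SARS-CoV-2 PCR",
        "lab_result": "positive",
    }
    print(limited_data_set(record))
    print(deidentified(record))
```

The sketch uses an allowlist rather than a list of identifiers to remove, so any field not explicitly approved, including the other 16 HIPAA identifiers, is excluded by default.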
A Powerful Tool for Researchers
Having access to a centralized enclave of this magnitude will help research teams study, probe and answer clinically important questions about COVID-19 that they previously could not, such as:
- Can we predict who might have severe outcomes?
- Who might need dialysis because of kidney failure?
- Who might need a ventilator because of lung failure?
- Do some therapies work better than others?
- Why are some people asymptomatic?
- What kind of long-term health consequences will people need to know about?
“The exciting transformation this platform represents is in providing an environment where the power of the analytics can be used to quickly examine new COVID-19 hypotheses.”
—Warren A. Kibbe, Ph.D.
Chief, Translational Biomedical Informatics in the Department of Biostatistics and Bioinformatics
Chief Data Officer, Duke Cancer Institute
The N3C is a collaborative and community-driven partnership among the NCATS-supported Clinical and Translational Science Awards (CTSA) Program hubs and the National Center for Data to Health (CD2H), with overall stewardship by NCATS.
If you have questions about the N3C, please email NCATS_N3C@nih.gov.