N3C Data Overview

The NCATS National COVID Cohort Collaborative (N3C) Data Enclave is a centralized, secure, national clinical data resource with powerful analytics capabilities that the research community can use to study COVID-19, including potential risk factors, protective factors and long-term health consequences.

Data Dashboard

This dashboard provides a snapshot of the status of N3C’s data and will be updated frequently.

N3C key metrics as of November 23, 2020:

  • Total patients: 2.1M+
  • COVID-19 positive patients: 292,226
  • Rows of data: 2.0B+
  • Sites contributing data: 72

To learn more about N3C, visit the N3C page on the NCATS website.

Data Types

The N3C systematically and regularly collects data derived from the electronic health records of people who were tested for the novel coronavirus or who had related symptoms. It also collects data from individuals infected with other pathogens, such as SARS 1, MERS and H1N1, to support comparative studies. The data set includes demographics, symptoms, lab test results, procedures, medications, medical conditions, physical measurements and more.

Three levels of data are available for analysis:

  • Synthetic data set: an artificial, statistically comparable computational derivative of the original data; it contains no individually identifiable health information, also known as protected health information (PHI) as defined by HIPAA
  • De-identified data set: patient data from which the PHI identifiers defined by HIPAA have been removed
  • Limited data set: patient data that retain only two of the 18 PHI elements defined under the HIPAA Privacy Rule: dates of service and patient ZIP code

Access Requirements by Data Level

Data Stewardship and Protection

As the steward of the data, NCATS takes every reasonable precaution to protect the confidentiality, security and integrity of the data. NCATS oversees use of the enclave through user registration, federated login, data use agreements with institutions and data use requests from users. All work must be done within the enclave, and no data may be downloaded. The N3C Data Enclave's secure, cloud-based environment is certified through the Federal Risk and Authorization Management Program (FedRAMP), which provides standardized assessment, authorization and continuous monitoring for cloud products and services. This certification helps ensure the validity of the data while protecting patient privacy. NCATS monitors the data protections in place on an ongoing basis and may adjust or augment them. See FAQs about privacy and security.

Data Sources and Harmonization

NCATS has established a COVID-19 Data Transfer Agreement (DTA) that sets the terms and conditions for data transfer and outlines the general terms of data use. Institutions contributing data sign the DTA and then work with NCATS to transfer, on a recurring basis, a COVID-19-relevant limited data set derived from electronic health records and formatted in the institution's preferred common data model. The N3C data harmonization team ingests each limited data set, runs quality checks and transforms the data from the different common data models into a single harmonized OMOP analytics data set.

See a list of institutions that have executed DTAs with NCATS.

Data Access

N3C data may be used only for COVID-19 research. Before researchers can request access to the data, their institutions must execute a Data Use Agreement (DUA) with NCATS. Once a DUA is in place, researchers can submit a Data Use Request (DUR). This sample DUR form shows the information that will be requested, including the project research title, the names of project personnel, a non-confidential research statement, the project proposal and the requested data access level. Researchers must also review and agree to comply with the N3C Data User Code of Conduct. The N3C Data Access Committee reviews and approves DURs. Learn more about how to apply for data access or see FAQs about using the data.