N3C Data Overview

The NCATS National COVID Cohort Collaborative (N3C) Data Enclave is a centralized, secure, national clinical data resource with powerful analytics capabilities that the research community can use to study COVID-19, including potential risk factors, protective factors and long-term health consequences.

Data Dashboard

This dashboard provides a snapshot of the status of N3C’s data and will be updated frequently.

Image showing N3C key metrics as of July 12, 2021 (Total Patients: 6.3M+; COVID-19 Positive Patients: 2.1M+; Rows of Data: 7.2B+; Approved Projects: 224). To learn more about N3C, visit the N3C page on the NCATS website.

Data Types

The N3C systematically and regularly collects data derived from the electronic health records of people who were tested for the novel coronavirus or who had related symptoms, as well as data from individuals infected with pathogens that can support comparative studies, such as SARS 1, MERS and H1N1. The data set includes such information as demographics, symptoms, lab test results, procedures, medications, medical conditions, physical measurements and more. 

NCATS asks medical institutions and health care organizations to contribute a limited data set, pursuant to the requirements in the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.

A limited data set is defined as protected health information that excludes certain direct identifiers of an individual or of relatives, employers or household members of the individual — but may include city, state, ZIP code and elements of dates. A limited data set can be disclosed only for purposes of research, public health or health care operations.

Three levels of data are available for analysis:

  • Limited Data Set (LDS): Consists of patient data that retain the following protected health information —
    • dates of service
    • patient ZIP code
  • De-identified Data Set: Consists of patient data from the LDS with the following changes —
    • Dates of service are algorithmically shifted to protect patient privacy.
    • Patient ZIP codes are truncated to the first three digits or removed entirely if the ZIP code represents fewer than 20,000 individuals.
  • Synthetic Data Set: Consists of data that are computationally derived from the LDS and that resemble patient information statistically but are not actual patient data.
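
The two de-identification changes described above can be illustrated with a minimal Python sketch. This is not the enclave's actual implementation: the text does not say how dates are shifted, so the per-patient offset derived from a keyed hash, the 180-day window, and the function names shift_days_for, shift_date and deidentify_zip are all assumptions made for illustration. Only the three-digit truncation and the 20,000-person threshold come from the description above.

```python
import hashlib
from datetime import date, timedelta
from typing import Optional

MIN_ZIP_POPULATION = 20_000  # threshold stated in the data level description
MAX_SHIFT_DAYS = 180         # assumption: the real shift window is not published here


def shift_days_for(patient_id: str, secret: str) -> int:
    """Derive one fixed offset per patient (assumed scheme, not N3C's)."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode()).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    # Map the first four hash bytes onto the range [-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS].
    return int.from_bytes(digest[:4], "big") % span - MAX_SHIFT_DAYS


def shift_date(service_date: date, patient_id: str, secret: str) -> date:
    """Shift a date of service by the patient's fixed offset."""
    return service_date + timedelta(days=shift_days_for(patient_id, secret))


def deidentify_zip(zip_code: str, population: int) -> Optional[str]:
    """Keep the first three digits, or drop the ZIP code for small populations."""
    if population < MIN_ZIP_POPULATION:
        return None
    return zip_code[:3]
```

Using the same offset for every record of a given patient keeps the intervals between that patient's visits intact, which is one common reason for choosing a per-patient shift over a per-record shift.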

Access Requirements by Data Level

Data Stewardship and Protection

As the steward of the data, NCATS is taking every reasonable precaution to guarantee the confidentiality, security and integrity of the data. NCATS oversees the use of the enclave through user registration, federated login, data use agreements with institutions and data use requests with users. All work must be done within the enclave, and no data may be downloaded. The N3C Data Enclave’s secure, cloud-based environment is certified through the Federal Risk and Authorization Management Program, or FedRAMP, which provides standardized assessment, authorization and continuous monitoring for cloud products and services, ensuring the validity of the data while protecting patient privacy. NCATS monitors the data protections in place on an ongoing basis and may adjust or augment them. Additionally, in conjunction with the U.S. Department of Health and Human Services, NCATS participates in ongoing security testing of multiple aspects of the enclave. See FAQs about privacy and security.

Data Sources and Harmonization

NCATS has established a COVID-19 Data Transfer Agreement (DTA) that provides terms and conditions for data transfer and outlines the general terms of data use. Institutions contributing data sign the DTA, then work with NCATS to transfer a limited data set relevant to COVID-19 in the institution’s preferred common data model (derived from electronic health records) to the N3C Data Enclave on a recurring basis. The N3C data harmonization team ingests the limited data set, runs quality checks and transforms different data models into a harmonized OMOP analytics data set.
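
The ingest, quality-check and transform steps can be pictured with a short Python sketch. It is a toy illustration under stated assumptions, not the N3C harmonization pipeline: the source record layout (patient_key, sex, birth_year) and the functions quality_check, to_omop_person and harmonize are hypothetical. The target field names and the gender concept IDs follow the OMOP person table convention.

```python
from datetime import date
from typing import Dict, List, Tuple

# Standard OMOP gender concepts; 0 is the OMOP convention for "no matching concept".
OMOP_GENDER = {"M": 8507, "F": 8532}


def quality_check(record: Dict) -> List[str]:
    """Return a list of problems found in one hypothetical source record."""
    problems = []
    if record.get("patient_key") is None:
        problems.append("missing patient_key")
    birth_year = record.get("birth_year")
    if not isinstance(birth_year, int) or not 1900 <= birth_year <= date.today().year:
        problems.append("implausible birth_year")
    return problems


def to_omop_person(record: Dict) -> Dict:
    """Map one source record onto a subset of OMOP person fields."""
    return {
        "person_id": record["patient_key"],
        "gender_concept_id": OMOP_GENDER.get(record.get("sex"), 0),
        "year_of_birth": record["birth_year"],
    }


def harmonize(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split records into harmonized rows and rejected source records."""
    accepted, rejected = [], []
    for record in records:
        if quality_check(record):
            rejected.append(record)
        else:
            accepted.append(to_omop_person(record))
    return accepted, rejected
```

In this sketch, records that fail a check are set aside rather than silently repaired, so that data quality problems stay visible to whoever reviews the load.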

See a list of institutions that have executed DTAs with NCATS.

Data Access

N3C data may be used only for COVID-19 research purposes. Before researchers can request access to the data, their institutions must execute a Data Use Agreement (DUA) with NCATS. Once a DUA is in place, researchers can submit a Data Use Request (DUR) through the N3C Data Enclave. In the DUR, researchers will need to include, among other information, the project research title, names of project personnel, a non-confidential research statement, the project proposal and the requested data access level. Additional DUR requirements include reviewing and agreeing to comply with the N3C Data User Code of Conduct. The N3C Data Access Committee reviews and approves DURs. Learn more about how to apply for data access or see FAQs about using the data.
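
As a pre-submission aid, the DUR items listed above can be written as a simple checklist in Python. This is a hypothetical sketch and not part of the N3C Data Enclave: the DataUseRequest class, its field names, the access level labels and the missing_items function are assumptions for illustration, and the real DUR form asks for other information as well.

```python
from dataclasses import dataclass, field
from typing import List

# Labels for the three data levels; the exact values used by the enclave are assumed.
ACCESS_LEVELS = {"limited", "de-identified", "synthetic"}


@dataclass
class DataUseRequest:
    research_title: str = ""
    personnel: List[str] = field(default_factory=list)
    research_statement: str = ""  # non-confidential
    proposal: str = ""
    access_level: str = ""
    code_of_conduct_accepted: bool = False


def missing_items(dur: DataUseRequest, institution_has_dua: bool) -> List[str]:
    """List what still blocks submission; an empty list means ready to submit."""
    missing = []
    if not institution_has_dua:
        missing.append("institutional Data Use Agreement")
    if not dur.research_title.strip():
        missing.append("project research title")
    if not dur.personnel:
        missing.append("names of project personnel")
    if not dur.research_statement.strip():
        missing.append("non-confidential research statement")
    if not dur.proposal.strip():
        missing.append("project proposal")
    if dur.access_level not in ACCESS_LEVELS:
        missing.append("requested data access level")
    if not dur.code_of_conduct_accepted:
        missing.append("agreement to the N3C Data User Code of Conduct")
    return missing
```

An empty result only means the listed items are present; approval still rests with the N3C Data Access Committee.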