N3C COVID Enclave Data Overview
The N3C COVID Data Enclave is a centralized, secure, national clinical data resource with powerful analytics capabilities that the research community can use to study COVID-19.
About N3C COVID Enclave Data
Participating institutions release data to N3C for the COVID Enclave under the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Read more about the HIPAA Privacy Rule.
Data Types
The N3C COVID Enclave systematically and regularly collects data derived from the electronic health records of people who were tested for COVID-19 or who had related symptoms, as well as data from individuals infected with pathogens that can support comparative studies, such as SARS 1, MERS and H1N1. The data set includes such information as demographics, symptoms, lab test results, procedures, medications, medical conditions, physical measurements and more.
For detailed information about the data elements, download the N3C data dictionary (PDF - 439KB).
NCATS asks medical institutions and health care organizations to contribute this information as a limited data set, pursuant to the requirements in the HIPAA Privacy Rule.
A limited data set is defined as protected health information that excludes certain direct identifiers of an individual or of relatives, employers or household members of the individual — but may include city, state, ZIP code and elements of dates. A limited data set can be disclosed only for purposes of research, public health or health care operations.
Three levels of data are available for analysis:
- Limited Data Set (LDS): Consists of patient data that retain the following protected health information:
  - dates of service
  - patient ZIP code
- De-identified Data Set: Consists of patient data from the LDS with the following changes:
  - Dates of service are algorithmically shifted to protect patient privacy.
  - Patient ZIP codes are truncated to the first three digits or removed entirely if the ZIP code represents fewer than 20,000 individuals or represents Tribal lands.
- Synthetic Data Set: Consists of data that are computationally derived from the LDS and that resemble patient information statistically but are not actual patient data.
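The two de-identification changes listed above can be illustrated with a minimal Python sketch. It is not N3C's actual algorithm, which this page does not describe. The per-patient keyed offset, the 180-day window, the function names, and the population and Tribal-land lookups are all assumptions made for illustration, and the sketch reads the 20,000 threshold as applying to the three-digit ZIP area.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import Dict, Optional, Set

MAX_SHIFT_DAYS = 180      # assumed shift window; not stated in the source
MIN_POPULATION = 20_000   # threshold named in the text above


def date_offset(patient_id: str, secret: bytes) -> timedelta:
    """Derive one fixed offset per patient so intervals between visits survive."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    days = int.from_bytes(digest[:4], "big") % span - MAX_SHIFT_DAYS
    return timedelta(days=days)


def shift_date(service_date: date, patient_id: str, secret: bytes) -> date:
    """Shift a date of service by the patient's fixed offset."""
    return service_date + date_offset(patient_id, secret)


def deidentify_zip(
    zip_code: str,
    population_by_zip3: Dict[str, int],  # hypothetical lookup: ZIP3 -> population
    tribal_zip3: Set[str],               # hypothetical set of Tribal-land ZIP3 areas
) -> Optional[str]:
    """Return the first three digits, or None when the ZIP must be removed."""
    zip3 = zip_code.strip()[:3]
    if len(zip3) < 3 or not zip3.isdigit():
        return None
    if zip3 in tribal_zip3:
        return None
    if population_by_zip3.get(zip3, 0) < MIN_POPULATION:
        return None
    return zip3
```

Because every date for a given patient moves by the same offset, the number of days between two visits is unchanged after shifting, which keeps time-to-event analyses meaningful on the de-identified data.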
Access Requirements for Researchers by Data Level
Additional Access Requirements
N3C COVID Enclave data may be used only for COVID-19 research purposes. Before researchers can request access to the data, their institutions must execute a Data Use Agreement (DUA) (PDF - 826KB) with NCATS. Once a DUA is in place, researchers can submit a Data Use Request (DUR) through the N3C COVID Enclave. In the DUR, researchers will need to include, among other information, the project research title, names of project personnel, a non-confidential research statement, the project proposal and the requested data access level. Additional DUR requirements include reviewing and agreeing to comply with the N3C Data User Code of Conduct. The N3C Data Access Committee reviews and approves DURs.
Learn more about how to apply for data access or see FAQs about using the data.
Data Stewardship and Protection
As the steward of the data, NCATS is taking every reasonable precaution to guarantee the confidentiality, security and integrity of the data. NCATS oversees the use of the enclave through user registration, federated login, data use agreements with institutions and data use requests with users. All work must be done within the enclave, and no data may be downloaded. The N3C COVID Enclave’s secure, cloud-based environment is certified through the Federal Risk and Authorization Management Program, or FedRAMP, which provides standardized assessment, authorization and continuous monitoring for cloud products and services, ensuring the validity of the data while protecting patient privacy. NCATS monitors the data protections in place on an ongoing basis and may adjust or augment them. Additionally, in conjunction with the U.S. Department of Health and Human Services, NCATS participates in ongoing security testing of multiple aspects of the enclave.
For more information
- Read about N3C’s four pillars of data security – Regulatory and Policy, Privacy Measures, Security Testing and Monitoring, and Researcher Responsibilities.
- Read our FAQs about privacy and security.
Data Sources and Harmonization
NCATS has established a COVID-19 Data Transfer Agreement (DTA) (PDF - 139KB) that provides terms and conditions for data transfer and outlines the general terms of data use. Institutions contributing data sign the DTA, then work with NCATS to transfer a limited data set relevant to COVID-19 to the N3C COVID Enclave on a recurring basis. The limited data set is derived from electronic health records and is submitted in the institution's preferred common data model. The N3C data harmonization team ingests the limited data set, runs quality checks and transforms the different data models into a harmonized OMOP (Observational Medical Outcomes Partnership) analytics data set.
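The ingest, quality-check and transform steps can be sketched as a small Python pipeline. This is only an illustration, not the N3C harmonization team's code: the two source models (`model_a`, `model_b`), their field names and the single quality rule are hypothetical. The target loosely follows three columns of the OMOP `person` table, and the gender concept IDs 8507 (male) and 8532 (female) come from the OMOP standard vocabulary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Tuple

GENDER_CONCEPT_IDS = {"M": 8507, "F": 8532}  # OMOP standard gender concepts
UNKNOWN_CONCEPT_ID = 0                       # OMOP convention for "no matching concept"


@dataclass(frozen=True)
class Person:
    """A small subset of the OMOP person table."""
    person_id: int
    gender_concept_id: int
    year_of_birth: int


def from_model_a(row: dict) -> Person:
    # Hypothetical source model A: string IDs, one-letter sex, ISO birth date.
    return Person(
        person_id=int(row["patid"]),
        gender_concept_id=GENDER_CONCEPT_IDS.get(row["sex"].upper(), UNKNOWN_CONCEPT_ID),
        year_of_birth=date.fromisoformat(row["birth_date"]).year,
    )


def from_model_b(row: dict) -> Person:
    # Hypothetical source model B: integer IDs, spelled-out gender, birth year.
    sex_code = row["gender"].strip().upper()[:1]
    return Person(
        person_id=int(row["patient_num"]),
        gender_concept_id=GENDER_CONCEPT_IDS.get(sex_code, UNKNOWN_CONCEPT_ID),
        year_of_birth=int(row["birth_year"]),
    )


TRANSFORMS: Dict[str, Callable[[dict], Person]] = {
    "model_a": from_model_a,
    "model_b": from_model_b,
}


def quality_problems(person: Person) -> List[str]:
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    if not 1900 <= person.year_of_birth <= date.today().year:
        problems.append(f"implausible year_of_birth: {person.year_of_birth}")
    return problems


def harmonize(source_model: str, rows: List[dict]) -> Tuple[List[Person], List[Tuple[dict, List[str]]]]:
    """Transform rows to the common shape and split them into accepted and rejected."""
    transform = TRANSFORMS[source_model]
    accepted, rejected = [], []
    for row in rows:
        person = transform(row)
        problems = quality_problems(person)
        if problems:
            rejected.append((row, problems))
        else:
            accepted.append(person)
    return accepted, rejected
```

Keeping one transform function per source model means a site using a different common data model only needs a new entry in `TRANSFORMS`; the quality checks and downstream analytics see a single harmonized shape.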
See a list of institutions that have executed DTAs with NCATS.
Privacy Preserving Record Linkage
Privacy Preserving Record Linkage (PPRL) is a means of using secure pseudonymization processes to connect records that refer to the same individual across different data sources while maintaining that individual's privacy. Linking multiple data sets enhances COVID-19 real-world data research in an N3C enclave.
All organizations contributing data to the N3C COVID Enclave must have an approved Data Transfer Agreement (DTA). In addition to the DTA, these organizations have the option of signing the Linkage Honest Broker Agreement (LHBA) (PDF - 1.1MB). The LHBA is an agreement between the organization, NCATS and The Regenstrief Institute, which serves as the linkage honest broker. The data remain under the complete control of the organizations that provide data to N3C and are never accessible by or under the control of the linkage honest broker.
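The general idea behind pseudonymized linkage can be sketched in a few lines of Python. This is a generic illustration, not N3C's or the honest broker's actual process, which this page does not specify. It assumes each data source holds the same secret key, normalizes the same identifying fields, and shares only keyed-hash tokens; the function names and field choices are hypothetical.

```python
import hashlib
import hmac
import unicodedata
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Strip accents, case, spaces and punctuation so equivalent inputs match."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first_name: str, last_name: str, birth_date: str, key: bytes) -> str:
    """Build a pseudonymous token with HMAC-SHA256; the identifiers never leave the site."""
    message = "|".join(normalize(part) for part in (first_name, last_name, birth_date))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(tokens_a: Dict[str, str], tokens_b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair local record IDs whose tokens match; only tokens are compared."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[token], tokens_b[token]) for token in shared)
```

In this sketch the party that performs the match sees only opaque tokens and local record IDs, never names or birth dates, which mirrors the separation of roles described above. Exact-match tokens do not tolerate typos in the source fields; production PPRL systems typically generate several tokens per person from different field combinations to compensate.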
Learn how N3C’s PPRL initiative enables data connectivity while maintaining security.
Read frequently asked questions about PPRL and the LHBA.
Watch a demonstration of N3C’s PPRL initiative (video length: 0:51).