N3C Data Overview
The NCATS National COVID Cohort Collaborative (N3C) Data Enclave is a centralized, secure, national clinical data resource with powerful analytics capabilities that the research community can use to study COVID-19.
About N3C Data
Participating institutions release data to N3C under the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Read more about the HIPAA Privacy Rule.
For the most up-to-date information about N3C’s data, visit the N3C Dashboards. The Dashboards are updated frequently and give a sense of the scope and size of the information in the N3C Data Enclave. They include information about participating institutions, research projects and recent publications related to N3C. There is also an interactive public health data browser, which allows you to explore N3C data on COVID-19 and specific topics, including long COVID, reinfection, diabetes, smoking and medication.
The N3C systematically and regularly collects data derived from the electronic health records of people who were tested for COVID-19 or who had related symptoms, as well as data from individuals infected with pathogens that can support comparative studies, such as SARS 1, MERS and H1N1. The data set includes such information as demographics, symptoms, lab test results, procedures, medications, medical conditions, physical measurements and more.
For detailed information about the data elements, download the N3C data dictionary.
NCATS asks medical institutions and health care organizations to contribute this information as a limited data set, pursuant to the requirements in the HIPAA Privacy Rule.
A limited data set is defined as protected health information that excludes certain direct identifiers of an individual or of relatives, employers or household members of the individual — but may include city, state, ZIP code and elements of dates. A limited data set can be disclosed only for purposes of research, public health or health care operations.
Three levels of data are available for analysis:
- Limited Data Set (LDS): Consists of patient data that retain the following protected health information:
  - dates of service
  - patient ZIP code
- De-identified Data Set: Consists of patient data from the LDS with the following changes:
  - Dates of service are algorithmically shifted to protect patient privacy.
  - Patient ZIP codes are truncated to the first three digits or removed entirely if the ZIP code represents fewer than 20,000 individuals or represents Tribal lands.
- Synthetic Data Set: Consists of data that are computationally derived from the LDS and that resemble patient information statistically but are not actual patient data.
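The two changes that turn LDS records into de-identified records can be illustrated with a short Python sketch. It is not N3C's actual algorithm, which this page does not describe. The sketch assumes a per-patient date offset derived from a secret (so intervals between a patient's visits are preserved), assumes the 20,000 threshold is evaluated per three-digit ZIP prefix, and uses made-up population figures. The names `Visit`, `ZIP3_POPULATION`, `MAX_SHIFT_DAYS`, `shift_days`, `truncate_zip` and `deidentify` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional

MIN_POPULATION = 20_000   # threshold stated in the text above
MAX_SHIFT_DAYS = 180      # assumption: the real shift window is not published here

# Hypothetical population counts per three-digit ZIP prefix (illustrative numbers).
ZIP3_POPULATION = {"208": 500_000, "036": 12_000}


@dataclass(frozen=True)
class Visit:
    patient_id: str
    service_date: date
    zip_code: Optional[str]


def shift_days(patient_id: str, secret: str) -> int:
    """Derive one stable offset per patient, in [-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS]."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode("utf-8")).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    return int.from_bytes(digest[:4], "big") % span - MAX_SHIFT_DAYS


def truncate_zip(zip_code: Optional[str],
                 tribal_prefixes: frozenset = frozenset()) -> Optional[str]:
    """Keep the first three digits, or drop the ZIP for small or Tribal areas."""
    if not zip_code or len(zip_code) < 3:
        return None
    prefix = zip_code[:3]
    if prefix in tribal_prefixes:
        return None
    # Unknown prefixes are treated as small areas and removed (conservative choice).
    if ZIP3_POPULATION.get(prefix, 0) < MIN_POPULATION:
        return None
    return prefix


def deidentify(visit: Visit, secret: str) -> Visit:
    """Return a copy of the visit with a shifted date and a truncated ZIP code."""
    offset = timedelta(days=shift_days(visit.patient_id, secret))
    return replace(visit,
                   service_date=visit.service_date + offset,
                   zip_code=truncate_zip(visit.zip_code))
```

Because the offset depends only on the patient and the secret, two visits by the same patient stay the same number of days apart after shifting, which keeps time-to-event analyses meaningful.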
Access Requirements for Researchers by Data Level
Additional Access Requirements
N3C data may be used only for COVID-19 research purposes. Before researchers can request access to the data, their institutions must execute a Data Use Agreement (DUA) with NCATS. Once a DUA is in place, researchers can submit a Data Use Request (DUR) through the N3C Data Enclave. In the DUR, researchers will need to include, among other information, the project research title, names of project personnel, a non-confidential research statement, the project proposal and the requested data access level. Additional DUR requirements include reviewing and agreeing to comply with the N3C Data User Code of Conduct. The N3C Data Access Committee reviews and approves DURs.
Data Stewardship and Protection
As the steward of the data, NCATS is taking every reasonable precaution to guarantee the confidentiality, security and integrity of the data. NCATS oversees the use of the enclave through user registration, federated login, data use agreements with institutions and data use requests with users. All work must be done within the enclave, and no data may be downloaded. The N3C Data Enclave’s secure, cloud-based environment is certified through the Federal Risk and Authorization Management Program, or FedRAMP, which provides standardized assessment, authorization and continuous monitoring for cloud products and services, ensuring the validity of the data while protecting patient privacy. NCATS monitors the data protections in place on an ongoing basis and may adjust or augment them. Additionally, in conjunction with the U.S. Department of Health and Human Services, NCATS participates in ongoing security testing of multiple aspects of the enclave.
For more information
- Read about N3C’s four pillars of data security – Regulatory and Policy, Privacy Measures, Security Testing and Monitoring, and Researcher Responsibilities.
- Read our FAQs about privacy and security.
Data Sources and Harmonization
NCATS has established a COVID-19 Data Transfer Agreement (DTA) that provides terms and conditions for data transfer and outlines the general terms of data use. Institutions contributing data sign the DTA, then work with NCATS to transfer a limited data set relevant to COVID-19 in the institution’s preferred common data model (derived from electronic health records) to the N3C Data Enclave on a recurring basis. The N3C data harmonization team ingests the limited data set, runs quality checks and transforms different data models into a harmonized OMOP analytics data set. For detailed information about the data set, download the N3C data dictionary.
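As a rough illustration of the "ingest, check, transform" flow described above, the Python sketch below maps demographic rows from two source layouts into one OMOP-style person record and runs a basic quality check. It is a simplified assumption, not the N3C harmonization pipeline. The source layouts (`model_a`, `model_b`) and their field names are hypothetical. The target fields `person_source_value`, `gender_concept_id` and `year_of_birth` and the gender concept IDs 8507 (male) and 8532 (female) come from the OMOP Common Data Model.

```python
from datetime import date

# OMOP standard gender concepts; 0 means "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def from_model_a(row: dict) -> dict:
    """Hypothetical layout A: single-letter sex code and ISO birth date."""
    sex = {"M": "male", "F": "female"}.get(row["sex"], "")
    return {
        "person_source_value": row["patient_id"],
        "gender_concept_id": GENDER_CONCEPTS.get(sex, 0),
        "year_of_birth": date.fromisoformat(row["birth_date"]).year,
    }


def from_model_b(row: dict) -> dict:
    """Hypothetical layout B: spelled-out gender and integer birth year."""
    return {
        "person_source_value": row["pid"],
        "gender_concept_id": GENDER_CONCEPTS.get(row["gender"].lower(), 0),
        "year_of_birth": int(row["birth_year"]),
    }


def quality_check(person: dict, current_year: int) -> list:
    """Return a list of problems found in one harmonized record (empty if none)."""
    problems = []
    if not person["person_source_value"]:
        problems.append("missing source identifier")
    if not 1900 <= person["year_of_birth"] <= current_year:
        problems.append("implausible year_of_birth")
    return problems
```

Once every source layout has its own small mapping function, downstream analyses only need to understand the single harmonized record shape.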
Privacy Preserving Record Linkage
Privacy Preserving Record Linkage (PPRL) is a means of using secure pseudonymization processes to connect records that refer to the same individual across different data sources while maintaining that individual’s privacy. NCATS is piloting PPRL technology to determine whether linking multiple data sets enhances COVID-19 real-world data research in the N3C.
All organizations contributing data to the N3C Data Enclave must have an approved Data Transfer Agreement (DTA). In addition to the DTA, these organizations have the option of signing the Linkage Honest Broker Agreement (LHBA) to participate in the PPRL pilot. The LHBA is an agreement between the organization, NCATS and The Regenstrief Institute, which serves as the linkage honest broker. In the PPRL infrastructure, a linkage honest broker is a party that holds de-identified tokens and operates a service that matches tokens generated across disparate data sets to produce a single Match ID for a specific use case. The data remain under the complete control of the organizations that provide data to N3C and are never accessible by or under the control of the linkage honest broker.
PPRL enables three functions within N3C: deduplication of patient records, linkage of a patient’s records from different sources and cohort discovery. Deduplication is required of any organization that participates in the LHBA because of its importance to the data quality of the N3C Data Enclave and its scientific mission. Organizations participating in the LHBA can also choose to participate in linking multiple data sets and in cohort discovery.
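The general idea of token matching can be sketched in Python. This is a toy illustration under stated assumptions, not the tokenization scheme used in the N3C pilot. It assumes each contributing site derives a keyed hash (HMAC-SHA256) from normalized identifiers using a key shared among sites but not with the broker, and that the broker assigns one Match ID per distinct token. `make_token` and `HonestBroker` are hypothetical names.

```python
import hashlib
import hmac
from itertools import count


def make_token(first: str, last: str, dob: str, key: bytes) -> str:
    """Run at the contributing site: turn identifiers into an opaque token."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class HonestBroker:
    """Sees only tokens, never identifiers or clinical data."""

    def __init__(self) -> None:
        self._match_ids = {}
        self._next_id = count(1)

    def match_id(self, token: str) -> int:
        """Return the Match ID for a token, creating one on first sight."""
        if token not in self._match_ids:
            self._match_ids[token] = next(self._next_id)
        return self._match_ids[token]
```

A short usage example: records for the same person submitted by two sites receive the same Match ID, which is what makes deduplication and cross-source linkage possible without exposing identifiers.

```python
key = b"site-shared-secret"  # hypothetical key, never given to the broker
broker = HonestBroker()

site_a = broker.match_id(make_token("Ana", "Lopez", "1980-05-17", key))
site_b = broker.match_id(make_token(" ana ", "LOPEZ", "1980-05-17", key))
other = broker.match_id(make_token("Ben", "Ito", "1975-01-02", key))

print(site_a == site_b)  # same person, same Match ID
print(site_a == other)   # different person, different Match ID
```

Exact-match hashing like this fails on typos or name changes; production PPRL systems typically generate several tokens per person from different identifier combinations to tolerate such variation.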