About the National COVID Cohort Collaborative

Updates from N3C

National COVID Cohort Collaborative (N3C) Data Enclave

  • More than 5 million COVID-19-positive patients and more than 14 billion rows of data are included in this enclave. Apply for N3C access today.
  • The public-facing N3C Cohort Exploration Dashboard provides high-level information about the N3C cohort and N3C Data Enclave.
  • More than 300 projects are underway using the enclave data to examine associations between COVID-19 patient outcomes and social determinants of health. View the current list of publications.
  • The N3C Data Enclave has a library of more than 30 external data sets, including mortality, viral variant and environmental data, that can be linked to the clinical data.

 

The National COVID Cohort Collaborative (N3C) maintains one of the largest collections of clinical data related to COVID-19 symptoms and patient outcomes in the United States. With stewardship from NCATS, more than 70 institutions worked together to build this extensive database. Having access to a large, centralized data resource allows research teams to study COVID-19 and identify potential treatments as the pandemic evolves. Learn the facts about the N3C by downloading and sharing the N3C fact sheet (PDF - 382KB).

What is the N3C?

The N3C is a partnership among many organizations to provide clinical data in close to real time to improve our knowledge of COVID-19 and potential treatment strategies.

The N3C effort is centered on the following:

  1. Establishing a secure data repository (the N3C Data Enclave) for studying COVID-19-related data.
  2. Receiving existing patient data derived from electronic health records (EHRs) provided by participating U.S. health care sites.
  3. Providing operational support to help researchers navigate the N3C Data Enclave platform and collaborate on COVID-19 research.
  4. Ensuring research using the N3C Data Enclave follows the rules and expectations of NCATS and its partners to keep the data secure and protect patient privacy.

Since September 2020, the N3C has made data accessible to more than 3,000 researchers and clinicians to study the progression of COVID-19, identify risk and protective factors, search for effective treatments, understand the long-term disease effects, and determine how best to care for those with the disease.

N3C partners include the following:

  • Health care providers that provide the data in the N3C, including NCATS Clinical and Translational Science Awards (CTSA) Program hubs and the institutions supported by the NIH National Institute of General Medical Sciences’ Institutional Development Award Networks for Clinical and Translational Research (IDeA-CTR).
  • The National Center for Data to Health (CD2H), which guides and governs the collaborative science environment within the N3C and also serves as the CTSA Program’s informatics coordinating center.
  • NCATS, which provides governance, oversight and the secure research platform — the N3C Data Enclave — to maintain and protect the data.
  • The scientific community and research leaders with data science and clinical expertise, who harmonize the data so that they can be studied together and compared across the nation.

How does the N3C improve public health?

Identifying the Public Health Need

Since the beginning of the pandemic, health care providers and researchers have worked diligently to understand the novel SARS-CoV-2 virus and the disease it causes, COVID-19. Our understanding of COVID-19 — its signs, symptoms and effective courses of treatment — at the start of the pandemic was very limited. As this disease spread rapidly through communities, cities and countries, many clinics and hospitals collected important patient health information. However, when data are collected at different institutions and in different formats, it is difficult to put them together in a way that helps researchers and doctors understand the characteristics of the disease.

The medical and research communities urgently need large amounts of data to better understand COVID-19, including how it spreads, who is most at risk, which treatments help, and what the effects of the disease are, including long-term effects.

Making COVID-19 Data More Available for Research

In response to this urgent need, NCATS and its partners developed the N3C to collect existing EHR data from hospitals and clinics and to make these data available to researchers seeking to understand COVID-19.

The N3C receives patient information from more than 60 health care institutions across the country. NCATS harmonizes data from these institutions into a single format and makes them available for researchers and clinicians inside the N3C Data Enclave so they can study COVID-19 and potential treatments as the pandemic evolves. The N3C Data Enclave is a secure, cloud-based research environment with a powerful analytics platform provided by NCATS, which serves as the steward of N3C’s data. Data cannot be removed from the N3C Data Enclave.
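As a rough illustration of what putting data "into a single format" can involve, the sketch below renames site-specific fields to one shared set of field names. It is not the N3C's actual pipeline: the site names, field names and the `harmonize` helper are all hypothetical and chosen only for this example.

```python
# Hypothetical per-site mappings from local field names to a shared format.
# None of these names come from the N3C; they are assumptions for illustration.
SITE_FIELD_MAPS = {
    "site_a": {"pt_age": "age", "wt_kg": "weight_kg"},
    "site_b": {"AgeYears": "age", "WeightKg": "weight_kg"},
}


def harmonize(site: str, row: dict) -> dict:
    """Rename a site's local fields to the shared field names.

    Fields that have no entry in the site's mapping are dropped.
    """
    mapping = SITE_FIELD_MAPS[site]
    return {mapping[key]: value for key, value in row.items() if key in mapping}


# Two sites describe the same kind of record with different field names.
row_a = harmonize("site_a", {"pt_age": 54, "wt_kg": 80.0})
row_b = harmonize("site_b", {"AgeYears": 54, "WeightKg": 80.0})
```

After this step both rows use the same field names, so records from different institutions can be stored and compared together.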

Since the N3C Data Enclave opened to researchers in September 2020, researchers have used the data to improve our understanding of COVID-19 and health equity, diabetes, cancer, COVID-19 medications and chronic obstructive pulmonary disease. Researchers currently are studying HIV and COVID-19 risk, mortality rates in rural populations, long COVID and much more using the N3C Data Enclave.

What data does the N3C have and where does it come from?

The N3C's data come from existing patient records at participating institutions. The N3C receives data derived from EHRs of people who were tested for COVID-19 or who had related symptoms. EHRs include such information as age, sex, height and weight, medical history, lab results, health issues, medications, and treatments.

Participating partners and other collaborators provide data to the N3C after they execute a Data Transfer Agreement with NCATS. The N3C harmonizes the data and manages them in a way that maintains their validity while protecting patient privacy.

Participating institutions do not obtain consent from individual patients for the data they send to the N3C. The 1996 Health Insurance Portability and Accountability Act (HIPAA) allows medical and health care institutions to release data for research without obtaining an individual’s authorization if direct identifying information is removed and appropriate oversight and agreements are in place.

Under the HIPAA Privacy Rule requirements, these institutions can release what is called a limited data set. This is what participating health sites send to the N3C.

The data set is “limited” because it leaves out 16 types of direct identifying information about the patient and their relatives, employers, or household members — such as names, account numbers, telephone numbers, email addresses and social security numbers. A limited data set may include city, state, ZIP code and elements of dates. The limited data set that the N3C receives includes ZIP codes and dates of service because these are critical for tracking the progress of the pandemic over time and place.
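The idea of a limited data set can be sketched in a few lines of code. This is a minimal illustration, not how participating sites actually prepare their data: the field names and the `to_limited_data_set` helper are hypothetical, and the set of identifiers covers only the examples named above rather than all 16 types.

```python
# Hypothetical field names for some of the direct identifiers named in the text.
# This is only a subset; a real limited data set excludes all 16 types.
DIRECT_IDENTIFIERS = {
    "name",
    "account_number",
    "telephone_number",
    "email_address",
    "social_security_number",
}


def to_limited_data_set(record: dict) -> dict:
    """Return a copy of the record without the direct-identifier fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }


# A made-up record used only to show the effect of the filter.
record = {
    "name": "Jane Doe",
    "email_address": "jane@example.org",
    "zip_code": "20892",
    "date_of_service": "2021-03-15",
    "lab_result": "positive",
}
limited = to_limited_data_set(record)
```

In this sketch the name and email address are dropped, while the ZIP code and date of service remain, mirroring the elements the text says the N3C keeps for tracking the pandemic over time and place.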

The N3C does not contain direct identifying information, and additional measures have been put in place to protect patient privacy. As a result, NCATS received a waiver of consent from an NIH Institutional Review Board, conforming to the Federal Policy for the Protection of Human Subjects (“Common Rule”).

How does the N3C keep data secure and protect patient privacy?

NCATS recognizes that the data it receives represent people. As the steward of those data, NCATS takes seriously its responsibility to keep them safe. It has taken a comprehensive approach to securing the N3C Data Enclave and protecting patient privacy, and it has invested significant time, resources and effort to keep N3C data private and secure.

NCATS follows all applicable policies and regulations, has integrated key privacy measures into the N3C Data Enclave and its governance processes, and performs security testing and monitoring of activity inside the N3C Data Enclave. It also requires researchers to, among other things, adhere to a code of conduct, sign an agreement with NCATS outlining terms and conditions for using the data, and take NIH information technology security training.

The lists below, organized under the N3C’s four pillars of data protection, provide additional detail about the steps NCATS takes to keep data secure and protect patient privacy.

Regulatory and Policy

  • Data-contributing sites abide by the HIPAA Privacy Rule
  • N3C research is subject to the Federal Policy for the Protection of Human Subjects in research ("Common Rule")
  • Data are provided as a HIPAA-defined limited data set
  • NIH IRB oversight and waiver of consent
  • For COVID-19–related research only
  • No genomic data
  • No emergency public health authorities were used to obtain the data under these conditions

Privacy Measures

  • Certificate of Confidentiality
  • Data stay within the N3C Data Enclave: No download or capture of raw data
  • Privacy Impact Assessment
  • Review of project requests by the Data Access Committee
  • Additional Tribal data privacy measures (while seeking a consultation with Tribal Nations)

Security Testing and Monitoring

  • Federal government–compliant enclave managed by NCATS
  • Meets government security controls for cloud security and privacy
  • Data encryption in transit and at rest, without exception
  • Scheduled penetration testing
  • Active monitoring and logging by NIH and HHS
  • Auditing of activities in the N3C Data Enclave

Researcher Responsibilities

  • A user's organization signs a Data Use Agreement with NCATS for terms and conditions of use
  • Users adhere to the N3C Data User Code of Conduct
  • Required NIH IT security training
  • Required Human Subjects Research Protection training
  • Follow N3C’s Community Guiding Principles

If you have questions about the N3C, please email NCATS_N3C@nih.gov.