Find answers to frequently asked questions about the National COVID Cohort Collaborative (N3C).
Purpose and Goals
The N3C represents a collaborative vision for a national data resource that will turn data into the knowledge that is urgently needed to address the COVID-19 pandemic. The N3C is systematically collecting data derived from electronic health records from different institutions and harmonizing these data in the NCATS N3C Data Enclave, a centralized resource available for collaborative research.
The N3C is making data available for the clinical and research community to use for studying COVID-19 and for identifying potential treatments as the pandemic continues to evolve. The N3C strives to coordinate and harmonize needed data derived from electronic health records to support efforts in understanding how best to direct research efforts and care of COVID-19 patients. Rapid collection of clinical, laboratory and diagnostic data from hospitals and health care plans during the pandemic will contribute to an understanding of the disease, informing the design of clinical studies and trials, facilitating the identification of effective interventions and informing care decisions. The goal is to aggregate and harmonize enough clinical data on a frequent basis from COVID-19 tested patients, or those with related symptoms, to support data analytics and statistics that require a large amount of data. The N3C aims to include clinical data for patients who represent diverse (e.g., geographic, socioeconomic, racial/ethnic, age, those with underlying medical conditions) populations.
The goals of the N3C are: 1) to create a robust data pipeline to harmonize electronic health records data into a common data model; 2) to make it fast and easy for the clinical and research community to access a wealth of COVID-19 clinical data and use it to research COVID-19 and identify effective interventions as the pandemic continues to evolve; 3) to establish a resource for the next 5 years to understand the long-term health impact of COVID-19; and 4) to create a state-of-the-art analytics platform to enable novel analyses that will serve to address COVID-19 as well as to demonstrate that this collaborative analytics approach could be invaluable for addressing other diseases in the future.
The N3C is a partnership among the NCATS-supported Clinical and Translational Science Awards Program hubs and the National Center for Data to Health, with overall stewardship by NCATS. Collaborators contribute and use COVID-19 clinical data to answer critical research questions to address the pandemic.
Additional details about the program are available on the National Center for Data to Health’s N3C website.
Privacy and Security
NCATS is taking multiple precautions for security and privacy to keep these data safe within its protected cloud infrastructure, including role-based access controls and full system log entries; granular host and network level logging; robust end-to-end encryption of all traffic via SSL/TLS, authentication, white-listing mechanisms; and comprehensive auditing of all data processing and access within the cloud platform. The Palantir platform in use resides in Amazon Web Services GovCloud and is Federal Risk and Authorization Management Program authorized at a Moderate impact level. The Data Use Agreement specifies that N3C data will be used only for clinical and translational research and public health surveillance of COVID-19. The Limited Data Set contains demographics including patient zip codes and dates of service. Specific institutions will not be identified, though it might be possible to infer institutional identity. Disclosure of this information is prohibited. Data use is governed by an oversight committee. Users must be “approved” and can only analyze data within the platform; data cannot be removed or downloaded. NCATS monitors the data protections in place on an ongoing basis and may adjust or augment them.
NCATS is taking multiple precautions for security and privacy to keep these data safe within its protected cloud infrastructure:
- The data resides and remains in the NCATS environment. Approved users can analyze data only within the platform.
- All data is encrypted both in transit and at rest, without exception.
- The data can be used only for COVID-19 research-related purposes. A Certificate of Confidentiality will protect the privacy of individuals and their data by prohibiting disclosure of identifiable, sensitive research information to anyone not connected to the research except when consent is obtained, or in a few other specific situations.
- NCATS oversees the use of N3C through user registration, federated login, Data Use Agreements with institutions and Data Use Requests with users.
NCATS uses Palantir for its software and expertise in the platform’s execution. Palantir is hosted by NCATS within this instance, and no data can leave this enclave or be accessed by the company for its use. All contractors with access to the NCATS GovCloud instance to implement and maintain the NCATS N3C Data Enclave are subject to all relevant NIH-specified clearances, non-disclosure agreements, training, rules and restrictions. Contractors are not allowed to independently access NCATS N3C Data Enclave data, remove it from the enclave or use it for commercial purposes.
Any data access incident will be reported no later than 2 business days after discovery by the researchers or the Accessing Institution to NCATSDataAccessIncidents@nih.gov. The occurrence of a data access incident may be grounds for termination or suspension of access to data. NCATS may also seek injunctive relief against the Accessing Institution to prevent any disclosure of data to anyone other than NCATS.
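The two-business-day reporting window can be worked out with simple date arithmetic. The following is a minimal sketch in Python, assuming that business days run Monday through Friday and that federal holidays are ignored. The function name `incident_report_deadline` is hypothetical and is not part of any NCATS tooling.

```python
from datetime import date, timedelta


def incident_report_deadline(discovered: date, business_days: int = 2) -> date:
    """Return the latest reporting date for an incident found on `discovered`.

    Assumption: business days are Monday-Friday; holidays are not handled.
    """
    current = discovered
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


# An incident discovered on Friday, 11 September 2020
print(incident_report_deadline(date(2020, 9, 11)))  # 2020-09-15
```

Under these assumptions, an incident discovered on a Friday would need to be reported by the following Tuesday. The Data Use Agreement remains the authoritative source for how the deadline is counted.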
Yes. To ensure the safety and security of the data, the NCATS platform is the only place to access and analyze these data.
The NCATS N3C Data Enclave is an Amazon Web Services GovCloud system and is aligned with the following certifications, frameworks and attestations: SSAE18 SOC 2 Type II, ISAE 3000 SOC 2 Type II, Federal Risk and Authorization Management Program Moderate, and the Trusted Information Security Assessment Exchange (in process). The platform software includes several data and information protection functionalities to comply with regulations and industry requirements, such as the Health Insurance Portability and Accountability Act, Federal Information Security Management Act, International Security Management Association, California Consumer Privacy Act, Criminal Justice Information Services, Department of Defense Impact Level 4 and General Data Protection Regulation.
FedRAMP is a U.S. Government-wide program that provides a standardized approach to security assessment, authorization and continuous monitoring for cloud products and services. Documentation and control levels are available on the FedRAMP website.
GovCloud is a service provided by Amazon Web Services designed to host sensitive data and regulated workloads, and address the most stringent U.S. government security and compliance requirements.
Participate in the N3C
Partnerships are welcome. Data provision and data access are open to all entities that execute the NCATS Data Transfer Agreement (DTA) and NCATS Data Use Agreement (DUA), respectively. Contributing data is not required to access the data.
Joining the N3C provides an opportunity to contribute to the national response to the pandemic. Provision of COVID-19 clinical data will allow researchers to better understand the presentation and course of the disease in different populations, including potential health impact over time, to identify best practices for patient care, and to design and prioritize clinical studies and trials.
Researchers affiliated with organizations as well as citizen scientists can join the N3C. See an overview of the process for applying to access N3C data.
NCATS has established a COVID-19 Data Transfer Agreement (DTA) that provides terms and conditions for data transfer and outlines the general terms of data use. Institutions sign the DTA, then work with NCATS to transfer a Limited Data Set relevant to COVID-19 in the institution’s preferred common data model (derived from electronic health records) to the NCATS N3C Data Enclave, a centralized, secure, cloud-based data repository and investigational platform, on a recurring basis.
Note that if a Clinical and Translational Science Awards Program hub has multiple partners, it is likely that each institution will be required to sign separate DTAs, unless the hub can demonstrate that it has the legal authority to use data of the partner institution.
No; under an approved Data Use Agreement (DUA) with NCATS, anyone can access N3C data after receiving approval for their Data Use Request (DUR). Learn more about DUAs and DURs. N3C users can include, but are not limited to, nonprofit or not-for-profit organizations; federal, state and local health departments; researchers from industry; and citizen scientists. Access is dependent on the level of data being requested; human subjects research ethics training and IRB approval may be needed. All approved users must agree to a code of conduct and take NIH IT security training.
The term citizen scientist refers to any member of the public not affiliated with a research organization who may submit an N3C Data Use Request (DUR) outlining a proposed COVID-19 related research project that uses synthetic data. Citizen scientists need to execute a Data Use Agreement (DUA) with NCATS before submitting a DUR. Learn more about applying for data access.
- Clinical and Translational Science Award (CTSA) Program hubs and other entities can contribute data as soon as a Data Transfer Agreement (DTA) has been executed.
- If needed, limited funds are available to current CTSA UL1s under NOT-TR-20-028 Emergency Notice of Special Interest to support personnel and resources for data transfer (awards anticipated to be $50,000 to $100,000 Total Costs).
- NOT-TR-20-028 includes rolling submission of applications; however, as the end of the fiscal year approaches, NCATS must defer some applications for FY21 funding considerations.
- The number of awards is contingent upon NIH appropriations and the submission of a sufficient number of meritorious applications.
- Neither a DTA nor a Data Use Agreement (DUA) needs to be in place before submission of an application in response to NOT-TR-20-028; awards will be issued with restrictive terms and conditions that prohibit use of funds for these activities until the DTA and/or DUA is executed.
About the Data
The NCATS N3C Data Enclave contains real world data from patients who were tested for COVID-19 or whose symptoms are consistent with COVID-19, as well as data from individuals infected with pathogens such as SARS 1, MERS and H1N1, which can support comparative studies. The data includes information such as demographics, symptoms, lab test results, procedures, medications, medical conditions, physical measurements and more; see the full list of data.
The NCATS N3C Data Enclave is focused only on retrospective electronic health record data.
There are 3 tiers of data available for analysis:
- Synthetic dataset: an artificial, statistically comparable, computational derivative of the original data; it does not contain individually identifiable health information, also known as protected health information (PHI) as defined by the Health Insurance Portability and Accountability Act (HIPAA).
- De-identified dataset: patient data that has been stripped of PHI identifiers as defined by HIPAA.
- Limited Data Set: patient data that includes only two of the 18 elements defined as PHI under the HIPAA Privacy Rule (dates of service and patient zip code).
See an overview of access requirements for different levels of data.
Yes, the N3C aims to be as inclusive as possible.
The data are being provided to NCATS as a Limited Data Set (LDS) that retains only two of the 18 elements defined as protected health information under the Health Insurance Portability and Accountability Act (HIPAA): patient zip code and dates of service. Through a Data Use Request, the N3C will provide access to different levels of data: an algorithmically derived “synthetic” data set, a de-identified data set from the LDS that excludes the two HIPAA elements, or the LDS itself. Institutional Review Board review is required to access the LDS and may be required for the HIPAA de-identified data set.
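The relationship between the Limited Data Set and the de-identified tier can be illustrated with a small sketch. It assumes a flat, dictionary-shaped record with made-up values. The field names (`zip_code`, `date_of_service`) and the helper `to_deidentified` are hypothetical and do not reflect the actual N3C schema or de-identification process.

```python
# Hypothetical field names for the two HIPAA elements retained in the LDS.
LDS_ONLY_FIELDS = frozenset({"zip_code", "date_of_service"})


def to_deidentified(lds_record: dict) -> dict:
    """Return a copy of an LDS record without the two retained HIPAA elements."""
    return {k: v for k, v in lds_record.items() if k not in LDS_ONLY_FIELDS}


# Illustrative record only; the values are invented for this example.
record = {
    "age_group": "40-49",
    "lab_result": "positive",
    "zip_code": "00000",
    "date_of_service": "2020-06-01",
}
print(to_deidentified(record))
# {'age_group': '40-49', 'lab_result': 'positive'}
```

The sketch shows only the field-level difference between the two tiers. The synthetic tier is produced by a separate derivation and is not modeled here.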
For more information about LDS and HIPAA regulation, please see:
- U.S. Department of Health and Human Services (HHS) Information on Limited Data Sets
- HHS Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule
- NIH Information on How Covered Entities Can Use and Disclose Protected Health Information for Research and Comply with the Privacy Rule
Because of the national urgency and the need to make this resource available, NCATS encourages interested parties to contact its Office of Strategic Alliances at NCATSPartnerships@mail.nih.gov for instructions on executing the Data Transfer Agreement (DTA). The goal is to start acquiring clinical data immediately and show proof of principle for answering important COVID-19 research and health care questions as soon as possible.
- Step 1: Institutional Review Board (IRB) approval to transfer the Limited Data Set (LDS): Johns Hopkins University (JHU) has set up a central IRB (cIRB) that will serve as the reviewing IRB for institutions to transfer data. Using the JHU cIRB is optional, and an institution may choose to use a local IRB instead.
- Step 2: Data Acquisition: To assist sites in transferring data, the N3C has written a series of scripts to give to the sites that will pull the data based on the institution's common data model and database. N3C will reach out to the site’s technical team and work with them to configure the transfer process.
- Step 3: Data Harmonization: Once the team is ready to transfer the data, the N3C data harmonization team will set up a secure file transfer protocol site that is specific to the institution. The data harmonization team will ingest the LDS and run quality checks and transform different data models into a harmonized OMOP analytics data set.
- Step 4: Collaborative Analytics: Once the institution has signed the NCATS Data Use Agreement, the investigators can apply to get access to the NCATS N3C Data Enclave. To get access, investigators will submit a Data Use Request. Once reviewed by the NCATS N3C Data Access Committee to ensure appropriateness, investigators will be given access to the collaborative analytics platform. Access to the analytics platform is free of charge and includes training and ongoing support.
Likely, yes. However, the requirement for an IRB review will vary by institution, so please check the local requirements of the institution. Contact Tricia Francis for an approved copy of the N3C protocol to submit to an institution’s IRB.
Reliance on a central IRB is optional but encouraged. The Johns Hopkins Medicine (JHM) IRB is acting as the single-site IRB of record for any organizations providing data to the N3C program that want to use this service. Sites that would like to rely on the JHM IRB must ensure that their institution is enrolled in the SMART IRB platform and has an executed Letter of Indemnification with JHM.
JHM has created a streamlined process to guide sites through onboarding:
- Step 1: Sites that want to be part of the collaboration contact JHM’s Tricia Francis.
- Step 2: JHM will determine whether the site’s institution is part of the SMART IRB and has a Letter of Indemnification in place. If either of these requirements is missing, JHM can assist in fulfilling it.
- Step 3: Once participation in SMART IRB and the Letter of Indemnification are in place, sites will provide JHM with an email expressing willingness to rely on the JHM IRB as the ethics board of record.
- Step 4: JHM will send a tailored letter stating that the institution will cede oversight to the JHM IRB as well as the original IRB approval letter, the JHM-approved protocol, a Health Insurance Portability and Accountability Act (HIPAA) form and a Local Context Questionnaire (LCQ). The LCQ then must be completed and returned.
- Step 5: Once JHM receives the completed LCQ, it will submit the site’s information to its IRB for approval. JHM will provide written approval from the JHM IRB of the site’s participation.
For more details, please contact JHM’s Tricia Francis.
No; however, as cited in Article 11 of the Data Transfer Agreement, an institution may discontinue its participation for any reason. Data from that institution will no longer be accessible for new Data Use Requests.
Synthetic data is an artificial, statistically comparable, computational derivative of the original data. There are multiple reasons that organizations providing data to the enclave cannot submit synthetic data, including:
- Synthetic data is derived from a Limited Data Set (LDS). The process of creating the synthetic data sets takes place within the NCATS secure enclave.
- Validation, harmonization and quality control must be done prior to deriving the synthetic data. All common data model information is raw and requires a significant amount of cleaning (transformation) to be used for analytics. The transformation process cannot be done on synthetically derived data. N3C data includes a validation pipeline that transforms LDS across four different data models to ensure syntactical and semantic harmonization.
Please email specific questions to NCATS_N3C@nih.gov.
Use the Data
No fee will be charged.
Data availability is dependent upon when data are deposited into the platform, the execution of a Data Use Agreement and an approved Data Use Request.
Yes. To ensure the safety and security of the data, the NCATS platform is the only place to access and analyze these data.
No. Under the stipulations in the current Data Transfer Agreement, data cannot be downloaded or removed from this enclave.
A Data Use Request (DUR) must be submitted for a project being established for the first time. The DUR is project-specific (i.e., if you will be working on separate studies within N3C, you must submit separate DURs). Access to the NCATS N3C Data Enclave workspace(s) for approved DUR(s) will be effective for a period of one year starting from the date access is granted. A Data Use Agreement (DUA) must be in place for the entire term of a DUR. Learn more about DUAs and DURs.
No. Users requesting access to certain tiers of data will need to complete additional steps for the Data Use Request. See an overview of data access requirements. These requirements may change over time as NCATS adjusts and/or augments security and privacy measures in place to keep these data safe within its protected cloud infrastructure.
Data Use Requests (DURs) will be renewable. DURs approved by the Data Access Committee will be effective for one year starting from the date access is granted. When users renew their DURs, they will need to attest at that time that their training for access to the N3C Data Enclave is up to date. A Data Use Agreement must be in place for the entire term of a DUR.
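The one-year term of an approved DUR can be expressed as a short date calculation. This is a minimal sketch, assuming that "one year" means the same calendar day in the following year and that access ends on that day. Both function names are hypothetical, and the exact expiration rule is an assumption rather than stated NCATS policy.

```python
from datetime import date


def dur_expiration(granted: date) -> date:
    """Return the date one year after access was granted.

    Assumption: a grant on 29 February falls back to 28 February.
    """
    try:
        return granted.replace(year=granted.year + 1)
    except ValueError:  # 29 February has no counterpart in a non-leap year
        return granted.replace(year=granted.year + 1, day=28)


def dur_is_active(granted: date, today: date) -> bool:
    """True from the grant date up to, but not including, the expiration date."""
    return granted <= today < dur_expiration(granted)


print(dur_expiration(date(2020, 9, 1)))  # 2021-09-01
```

A renewal reminder could compare `today` against `dur_expiration(granted)` some weeks ahead, so that training attestations are current before the DUR lapses.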
The N3C Data Access Committee (DAC) is composed of federal employees with appropriate scientific, bioethics, information technology and/or human subjects research expertise who review and approve Data Use Requests (DURs). The DAC will:
- Ensure that the DUR is for COVID-19 related research or COVID-related operational activities.
- Assess that the level of access requested is justified based on the research project described and the rationale provided.
- Ensure that the DUR includes an attestation to human subjects training, if necessary, for the level of access requested.
- Ensure that certification of institutional review board (IRB) approval for human subjects research is provided for research that will involve use of the Limited Data Set.
- Ensure that NIH IT Security Training has been attested to (or documentation provided if requested).
- Ensure that human subjects research protections training has been attested to (or documentation provided if requested) for access requests for the HIPAA de-identified dataset or Limited Data Set requests.
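The checks listed above that depend on the requested data tier can be summarized as a small rule table. The sketch below is illustrative only: the `DataUseRequest` class, its field names and the tier labels are hypothetical, and the judgment-based review of whether the requested access level is justified is not modeled.

```python
from dataclasses import dataclass

TIERS = ("synthetic", "deidentified", "limited")  # hypothetical labels


@dataclass
class DataUseRequest:
    tier: str                             # one of TIERS
    covid_related: bool                   # COVID-19 related purpose stated
    it_security_training: bool            # NIH IT Security Training attested
    human_subjects_training: bool = False
    irb_approval: bool = False


def missing_requirements(dur: DataUseRequest) -> list[str]:
    """List the tier-dependent checks that a request does not yet satisfy."""
    if dur.tier not in TIERS:
        raise ValueError(f"unknown tier: {dur.tier!r}")
    missing = []
    if not dur.covid_related:
        missing.append("COVID-19 related purpose")
    if not dur.it_security_training:
        missing.append("NIH IT Security Training")
    if dur.tier in ("deidentified", "limited") and not dur.human_subjects_training:
        missing.append("human subjects research protections training")
    if dur.tier == "limited" and not dur.irb_approval:
        missing.append("IRB approval")
    return missing


request = DataUseRequest("limited", True, True, human_subjects_training=True)
print(missing_requirements(request))  # ['IRB approval']
```

An empty list means only that the attestations modeled here are present. Approval itself remains a decision of the Data Access Committee.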
Yes, with certain limitations. If the researcher is not affiliated with a U.S. institution, the researcher can only request access to the de-identified and synthetic datasets. All institutions must have a DUA with NCATS and can email NCATS_N3C@mail.nih.gov with questions.
Researchers should contact NCATS at NCATS_N3C@mail.nih.gov to initiate the change in their institution affiliation on their Data Use Request. Their approved N3C project will be held in the NCATS N3C Data Enclave while the change of institutional affiliation is made. If the institution does not have an active Data Use Agreement (DUA), the researcher will need to request the institution execute a DUA with NCATS. See the list of institutions with active DUAs.
Who can address any additional questions?
Please email questions about the N3C to NCATS_N3C@nih.gov.