Read answers to frequently asked questions about the National COVID Cohort Collaborative (N3C).
1. What is the National COVID Cohort Collaborative, or the N3C?
The N3C represents a collaborative vision for a national data resource that will turn data into the knowledge that is urgently needed to address the COVID-19 pandemic. The N3C will systematically and regularly collect data derived from electronic health records from different institutions and harmonize these data in the NCATS N3C Data Enclave, a centralized resource made available for collaborative research.
2. Why is the N3C needed?
The N3C will make data available for the clinical and research community to use for studying COVID-19 and for identifying potential treatments as the pandemic continues to evolve. The N3C strives to coordinate and harmonize needed data derived from electronic health records to support efforts in understanding how best to direct research efforts and care of COVID-19 patients. Rapid collection of clinical, laboratory and diagnostic data from hospitals and health care plans at the peak of the pandemic and as the pandemic evolves will contribute to an understanding of the disease, informing the design of clinical studies and trials, facilitating the identification of effective interventions and informing care decisions. The goal is to aggregate and harmonize enough clinical data on a recurring basis from COVID-19 tested patients, or those with related symptoms, to support data analytics and statistics that require a large amount of data. The N3C aims to include clinical data for patients who represent diverse (e.g., geographic, socioeconomic, racial/ethnic, age, those with underlying medical conditions) populations.
3. What are the goals of the N3C?
The goals of the N3C are: 1) to create a robust data pipeline to harmonize electronic health records data into a common data model; 2) to make it fast and easy for the clinical and research community to access a wealth of COVID-19 clinical data and use it to research COVID-19 and identify effective interventions as the pandemic continues to evolve; 3) to establish a resource for the next 5 years to understand long-term health impact of COVID-19; and 4) to create a state-of-the-art analytics platform to enable novel analyses that will serve to address COVID-19 as well as to demonstrate that this collaborative analytics approach could be invaluable for addressing other diseases in the future.
4. Who is the N3C?
The N3C is a partnership among the NCATS-supported Clinical and Translational Science Awards Program hubs and the National Center for Data to Health, with overall stewardship by NCATS. Collaborators will contribute and use COVID-19 clinical data to answer critical research questions to address the pandemic.
5. Where can interested parties/institutions learn more about the N3C?
Additional details about the program are available on the National Center for Data to Health’s N3C website.
For Clinical and Translational Science Awards (CTSA) Program investigators: Watch the presentation NCATS Director Christopher P. Austin, M.D., gave to the CTSA Program consortium on May 8, 2020, or download the slides.
6. Who can join, and why should they join?
Partnerships are welcome. Data provision and data access are open to all entities that execute the NCATS Data Transfer Agreement and NCATS Data Use Agreement, respectively. You do not have to contribute data to be able to access the data.
Joining the N3C provides an opportunity to contribute to the national response to the pandemic. Provision of COVID-19 clinical data will allow researchers to better understand the presentation and course of the disease in different populations, including potential health impact over time, to identify best practices for patient care, and to design and prioritize clinical studies and trials.
7. What is expected of a participating institution?
NCATS has established a COVID-19 Data Transfer Agreement (DTA) that provides terms and conditions for data transfer and outlines the general terms of data use. Institutions sign the DTA, then work with NCATS to transfer a Limited Data Set relevant to COVID-19 in the institution’s preferred common data model (derived from electronic health records) to the NCATS N3C Data Enclave, a centralized, secure, cloud-based data repository and investigational platform, on a recurring basis.
Note that if a Clinical and Translational Science Awards Program hub has multiple partners, it is likely that each institution will be required to sign separate DTAs, unless the hub can demonstrate that it has the legal authority to use data of the partner institution.
8. What fields are the N3C requesting as part of the Limited Data Set?
9. When can data transfer start?
Because of the national urgency and the need to make this resource available, NCATS encourages interested parties to contact its Office of Strategic Alliances at NCATSPartnerships@mail.nih.gov for instructions on executing the Data Transfer Agreement. The goal is to start acquiring clinical data immediately and show proof of principle for answering important COVID-19 research and health care questions as soon as possible.
10. Who is the point of contact regarding data transfer or technical questions about the platform?
Please email specific questions to NCATS_N3C@nih.gov.
11. What happens after signing the Data Transfer Agreement?
- Step 1: Institutional Review Board (IRB) approval to transfer the Limited Data Set (LDS): Johns Hopkins University (JHU) has set up a central IRB (cIRB) that will serve as the reviewing IRB for institutions to transfer data. Using the JHU cIRB is optional, and an institution may choose to use a local IRB instead.
- Step 2: Data Acquisition: To assist sites in transferring data, the N3C has written a series of scripts to give to the sites that will pull the data based on the institution's common data model and database. N3C will reach out to the site’s technical team and work with them to configure the transfer process.
- Step 3: Data Harmonization: Once the team is ready to transfer the data, the N3C data harmonization team will set up a secure file transfer protocol site that is specific to the institution. The data harmonization team will ingest the LDS and run quality checks and transform different data models into a harmonized OMOP analytics data set.
- Step 4: Collaborative Analytics: Once the institution has signed the NCATS Data Use Agreement (forthcoming), the investigators can apply to get access to the NCATS N3C Data Enclave. To get access, investigators will submit a Data Use Request. Once reviewed by the NCATS N3C Data Access Committee to ensure appropriateness, investigators will be given access to the collaborative analytics platform. Access to the analytics platform is free of charge and includes training and ongoing support.
12. Can analytics on the data enclave only be done within the NCATS platform?
Yes. To ensure the safety and security of the data, the NCATS platform is the only place to access and analyze these data.
13. Can data be downloaded or removed from the NCATS N3C Data Enclave in any form?
No. Under the stipulations of the current Data Transfer Agreement, data cannot be downloaded or removed from this enclave.
14. What sorts of compliance and certification does the platform have?
The NCATS N3C Data Enclave is an Amazon Web Services GovCloud system and is aligned with the following certifications, frameworks and attestations: SSAE18 SOC 2 Type II, ISAE 3000 SOC 2 Type II, Federal Risk and Authorization Management Program Moderate, and the Trusted Information Security Assessment Exchange (in process). The platform software includes several data and information protection functionalities to comply with regulations and industry requirements, such as the Health Insurance Portability and Accountability Act, Federal Information Security Management Act, International Security Management Association, California Consumer Privacy Act, Criminal Justice Information Services, Department of Defense Impact Level 4 and General Data Protection Regulation.
15. How does NCATS plan to ensure data security and privacy?
NCATS is taking multiple precautions for security and privacy to keep these data safe within its protected cloud infrastructure, including role-based access controls and full system log entries; granular host and network level logging; robust end-to-end encryption of all traffic via SSL/TLS, authentication and white-listing mechanisms; and comprehensive auditing of all data processing and access within the cloud platform. The Palantir platform in use resides in Amazon Web Services GovCloud and is Federal Risk and Authorization Management Program authorized at a Moderate impact level. The Data Use Agreement specifies that N3C data will be used only for clinical and translational research and public health surveillance of COVID-19. The Limited Data Set will contain demographics including patient zip codes and dates of service. Specific institutions will not be identified, though it might be possible to infer institutional identity. Disclosure of this information will be prohibited. Data use will be governed by an oversight committee. Users must be “approved” and can only analyze data within the platform; data cannot be removed or downloaded.
16. What is the Federal Risk and Authorization Management Program (FedRAMP)?
FedRAMP is a U.S. Government-wide program that provides a standardized approach to security assessment, authorization and continuous monitoring for cloud products and services. Documentation and control levels are available on the FedRAMP website.
17. I’m worried the data will be used for other purposes. How is NCATS safeguarding against this?
NCATS is taking multiple precautions for security and privacy to keep these data safe within its protected cloud infrastructure:
- The data resides and remains in the NCATS environment. Approved users can analyze data only within the platform.
- All data is encrypted both in transit and at rest, without exception.
- The data can be used only for COVID-19 research-related purposes. A Certificate of Confidentiality will protect the privacy of individuals and their data by prohibiting disclosure of identifiable, sensitive research information to anyone not connected to the research except when consent is obtained, or in a few other specific situations.
- NCATS oversees the use of N3C through user registration, federated login, Data Use Agreements with institutions and Data Use Requests with users.
NCATS uses Palantir for its software and expertise in the platform’s execution. Palantir is hosted by NCATS within this instance, and no data can leave this enclave or be accessed by the company for its use. All contractors with access to the NCATS GovCloud instance to implement and maintain the NCATS N3C Data Enclave are subject to all relevant NIH-specified clearances, non-disclosure agreements, training, rules and restrictions. Contractors are not allowed to independently access NCATS N3C Data Enclave data, remove it from the enclave or use it for commercial purposes.
18. How long until the data can be used?
The goal is to start acquiring clinical data immediately and show proof of principle for answering important COVID-19 research and health care questions as soon as possible. Data availability is dependent upon when data are deposited into the platform and the execution of a Data Use Agreement, which is currently under development.
19. What types of data are contained within the NCATS N3C Data Enclave?
The NCATS N3C Data Enclave contains real world data from patients who were tested for COVID-19 or whose symptoms are consistent with COVID-19, as well as data from individuals infected with pathogens such as SARS 1, MERS and H1N1, which can support comparative studies. The data includes information such as demographics, symptoms, lab test results, procedures, medications, medical conditions, physical measurements and more; see the full list of data.
The NCATS N3C Data Enclave receives data as a Limited Data Set that includes only two of the 18 Health Insurance Portability and Accountability Act (HIPAA) defined elements: patient zip code and dates of service. The N3C can also generate synthetic data, which is an artificial, statistically-comparable, computational derivative of the original data. Synthetic data does not contain personal health information as defined under HIPAA.
The NCATS N3C Data Enclave is focused only on retrospective electronic health record data.
20. Will the data set include clinical information about various populations (e.g., children, the elderly, patients who represent racial and ethnic minority populations)?
Yes, the N3C aims to be as inclusive as possible.
21. Why can’t synthetic data be submitted to the NCATS N3C Data Enclave?
Synthetic data is an artificial, statistically-comparable, computational derivative of the original data. There are multiple reasons that organizations providing data to the enclave cannot submit synthetic data, including:
- Synthetic data is derived from a Limited Data Set (LDS). The process of creating the synthetic data sets takes place within the NCATS secure enclave.
- Validation, harmonization and quality control must be done prior to deriving the synthetic data. All common data model information is raw and requires a significant amount of cleaning (transformation) to be used for analytics. The transformation process cannot be done on synthetically derived data. N3C data includes a validation pipeline that transforms LDS across four different data models to ensure syntactical and semantic harmonization.
- Synthetic data has not been validated. Currently, synthetic data is being validated through a pilot that includes Washington University in St. Louis, University of Indiana and University of Washington. The purpose of the pilot is to establish whether the LDS data can be fully de-identified and whether the synthetic data derivative is statistically sound and can be used to accurately derive results.
22. What funding opportunities are available to support researchers’ participation?
- Clinical and Translational Science Award (CTSA) Program hubs and other entities can contribute data as soon as a Data Transfer Agreement (DTA) has been executed.
- If needed, limited funds are available to current CTSA UL1s under NOT-TR-20-028 Emergency Notice of Special Interest to support personnel and resources for data transfer (awards anticipated to be $50,000 to $100,000 Total Costs).
- NOT-TR-20-028 includes rolling submission of applications; however, as the end of the fiscal year approaches, NCATS must defer some applications for FY21 funding considerations.
- The number of awards is contingent upon NIH appropriations and the submission of a sufficient number of meritorious applications.
- Neither a DTA nor a Data Use Agreement (DUA) needs to be in place before submission of an application in response to NOT-TR-20-028; awards will be issued with restrictive terms and conditions that prohibit use of funds for these activities until the DTA and/or DUA is executed.
23. Is Institutional Review Board (IRB) review needed for providing data to N3C?
Likely, yes. However, the requirement for an IRB review will vary by institution, so please check the local requirements of the institution. Contact Tricia Francis for an approved copy of the N3C protocol to submit to an institution’s IRB.
Reliance on a central IRB is optional but encouraged. The Johns Hopkins Medicine (JHM) IRB is acting as the single-site IRB of record for any organizations providing data to the N3C program that want to use this service. Sites that would like to rely on the JHM IRB must ensure that their institution is enrolled in the SMART IRB platform and has an executed Letter of Indemnification with JHM.
JHM has created a streamlined process to guide sites through onboarding:
- Step 1: Sites that want to be part of the collaboration contact JHM’s Tricia Francis.
- Step 2: JHM will determine whether the site’s institution is part of the SMART IRB and has a Letter of Indemnification in place. If either of these requirements is missing, JHM can assist in fulfilling it.
- Step 3: Once participation in SMART IRB and the Letter of Indemnification are in place, sites will provide JHM with an email expressing willingness to rely on the JHM IRB as the ethics board of record.
- Step 4: JHM will send a tailored letter stating that the institution will cede oversight to the JHM IRB as well as the original IRB approval letter, the JHM-approved protocol, a Health Insurance Portability and Accountability Act (HIPAA) form and a Local Context Questionnaire (LCQ). The LCQ then must be completed and returned.
- Step 5: Once JHM receives the completed LCQ, it will submit the site’s information to its IRB for approval. JHM will provide written approval from the JHM IRB of the site’s participation.
For more details, please contact JHM’s Tricia Francis.
24. Will data access be limited only to those who contributed data?
No; under an approved Data Use Agreement with NCATS, anyone can access N3C data after receiving approval for their Data Use Request. N3C users can include, but are not limited to, nonprofit or not-for-profit organizations; federal, state and local health departments; researchers from industry; and citizen scientists. Access is dependent on the level of data being requested, and IRB approval may be needed. All approved users must agree to a code of conduct and take NIH IT security training and human subjects research ethics training.
25. Will there be a fee to access the N3C data?
No fee will be charged.
26. Can an organization that contributed data request to have it removed?
No; however, as cited in Article 11 of the Data Transfer Agreement, an institution may discontinue its participation for any reason. Data from that institution will no longer be accessible for new Data Use Requests.
27. Are the data being sent to N3C expected to be de-identified? If not, will NCATS de-identify the data?
The data are being provided to NCATS as a Limited Data Set (LDS) that retains only two of the 18 elements defined in the Health Insurance Portability and Accountability Act (HIPAA): patient zip code and dates of service. Through a Data Use Request, the N3C will provide access to different levels of data: an algorithmically derived “synthetic” data set, a de-identified data set from the LDS that excludes the two HIPAA elements, or the LDS itself. Institutional Review Board review is required to access the LDS and may be required for the HIPAA de-identified data set.
For more information about LDS and HIPAA regulation, please see:
- U.S. Department of Health and Human Services (HHS) Information on Limited Data Sets
- HHS Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule
- NIH Information on How Covered Entities Can Use and Disclose Protected Health Information for Research and Comply with the Privacy Rule
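As a rough illustration of the difference between the Limited Data Set and the de-identified level described in this answer, the sketch below drops the two retained HIPAA elements (patient zip code and dates of service) from a record. The field names and sample record are hypothetical, since this FAQ does not give the actual N3C schema, and the real N3C de-identification process may involve additional steps not shown here.

```python
# Hypothetical field names for the two HIPAA elements retained in the LDS.
# The actual N3C schema is not specified in this FAQ.
LDS_ONLY_FIELDS = {"zip_code", "date_of_service"}


def strip_lds_fields(record: dict) -> dict:
    """Return a copy of the record without the two LDS-only HIPAA elements."""
    return {key: value for key, value in record.items()
            if key not in LDS_ONLY_FIELDS}


# Made-up example record, for illustration only.
lds_record = {
    "patient_key": "example-001",
    "zip_code": "00000",
    "date_of_service": "2020-05-08",
    "lab_result": "positive",
}

deidentified_record = strip_lds_fields(lds_record)
print(sorted(deidentified_record))
```

The function returns a new dictionary and leaves the original record unchanged, which mirrors the idea that the de-identified level is derived from the LDS rather than replacing it.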
28. Who can address any additional questions?
Please email questions about the N3C to NCATS_N3C@nih.gov.