N3C COVID Enclave Frequently Asked Questions
Find answers to frequently asked questions about this data enclave available for COVID-19 research.
Privacy and Security
How does NCATS plan to ensure data security and privacy?
NCATS is taking multiple precautions to keep these data secure and private within its protected cloud infrastructure, including participation in ongoing security testing in conjunction with the U.S. Department of Health and Human Services; role-based access controls and full system log entries; granular host- and network-level logging; robust end-to-end encryption of all traffic via SSL/TLS; authentication and whitelisting mechanisms; and comprehensive auditing of all data processing and access within the cloud platform. The Palantir platform in use resides in Amazon Web Services GovCloud and is authorized under the Federal Risk and Authorization Management Program at the Moderate impact level. The Data Use Agreement (PDF - 826KB) specifies that N3C COVID Enclave data will be used only for clinical and translational research and public health surveillance of COVID-19. The Limited Data Set contains demographics, including patient ZIP codes, and dates of service. Specific institutions will not be identified, though it might be possible to infer an institution’s identity; disclosing this information is prohibited. Data use is governed by an oversight committee. Users must be approved and can analyze data only within the platform; data cannot be removed or downloaded. NCATS monitors the data protections in place on an ongoing basis and may adjust or augment them.
Learn more about how NCATS protects data, including regulatory and policy protections, privacy measures, security testing and monitoring, and researcher responsibilities.
How is data access restricted?
The NIH maintains strict control of the NCATS N3C COVID Enclave and makes all administrative determinations regarding user access and data permissions. NIH administrators can restrict access to any data set stored in the enclave, down to the row or column level. The N3C COVID Enclave’s access control framework also lets administrators secure information at the level of data sets and derivative analytical work products and assign specific degrees of access to different user groups. Data access tiers have been determined by the NIH and the central Institutional Review Board and are implemented within the N3C COVID Enclave.
I’m worried the data will be used for other purposes. How is NCATS safeguarding against this?
NCATS is taking multiple precautions for security and privacy to keep these data safe within its protected cloud infrastructure:
- The data resides and remains in the NCATS environment. Approved users can analyze data only within the platform. No data from the EHRs may be downloaded. Users are allowed to download research results after they are reviewed by a data download committee.
- All data is encrypted both in transit and at rest, without exception.
- The data can be used only for COVID-19-related research. A Certificate of Confidentiality will protect the privacy of individuals and their data by prohibiting disclosure of identifiable, sensitive research information to anyone not connected to the research, except when consent is obtained or in a few other specific situations.
- NCATS oversees the use of N3C through user registration, federated login, Data Use Agreements (PDF - 826KB) with institutions and Data Use Requests with users.
NCATS uses Palantir for its software and for its expertise in running the platform. The Palantir software is hosted by NCATS within this instance, and no data can leave the enclave or be accessed by the company for its own use. All contractors with access to the NCATS GovCloud instance to implement and maintain the N3C COVID Enclave are subject to all relevant NIH-specified clearances, nondisclosure agreements, training, rules and restrictions. Contractors are not allowed to independently access N3C COVID Enclave data, remove it from the enclave or use it for commercial purposes.
Does the Data Use Agreement prohibit the identification of the care providers?
Yes. The Data Use Agreement and N3C Data User Code of Conduct prohibit the re-identification of individuals, individual providers and sites of care.
How will data access incidents be handled?
Any data access incident will be reported by email no later than 2 business days after discovery by the researchers or the Accessing Institution. The occurrence of a data access incident may be grounds for termination or suspension of access to data. NCATS may also seek injunctive relief against the Accessing Institution to prevent any disclosure of data to anyone other than NCATS.
Must analytics on N3C COVID Enclave data be performed only within the NCATS platform?
Yes. To ensure the safety and security of the data, the NCATS platform is the only place to access and analyze these data.
What sorts of compliance and certification does the platform have?
The NCATS N3C COVID Enclave is an Amazon Web Services GovCloud system and is aligned with the following certifications, frameworks and attestations: SSAE 18 SOC 2 Type II, ISAE 3000 SOC 2 Type II, Federal Risk and Authorization Management Program Moderate and the Trusted Information Security Assessment Exchange (in process). The platform software includes several data and information protection functionalities to comply with regulations and industry requirements, such as the Health Insurance Portability and Accountability Act, the Federal Information Security Management Act, International Security Management Association, the California Consumer Privacy Act, Criminal Justice Information Services, Department of Defense Impact Level 4 and the General Data Protection Regulation.
What is the Federal Risk and Authorization Management Program (FedRAMP)?
FedRAMP is a U.S. Government-wide program that provides a standardized approach to security assessment, authorization and continuous monitoring for cloud products and services. Documentation and control levels are available on the FedRAMP website.
What is GovCloud?
GovCloud is a service provided by Amazon Web Services designed to host sensitive data and regulated workloads and address the most stringent U.S. government security and compliance requirements.
Participate in the N3C
Who can join, and why should they join?
Partnerships are welcome. Data provision and data access are open to all entities that execute the NCATS Data Transfer Agreement (DTA) (PDF - 139KB) and NCATS Data Use Agreement (DUA) (PDF - 826KB), respectively. Contributing data is not required to access the data.
Provision of COVID-19 clinical data will allow researchers to better understand the presentation and course of the disease in different populations, including potential health impact over time, to identify best practices for patient care, and to design and prioritize clinical studies and trials.
See a list of institutions that have executed a DTA.
Researchers affiliated with organizations as well as citizen scientists can join the N3C. See an overview of the process for applying to access N3C COVID Enclave data.
What is the difference between the Data Transfer Agreement and the Data Use Agreement?
The Data Transfer Agreement (DTA) (PDF - 139KB) is a legal agreement between the contributing institution’s business official and NCATS that outlines how contributing institutions will transfer patient data to the N3C COVID Enclave.
The Data Use Agreement (DUA) (PDF - 826KB) is an umbrella agreement between the institution and NCATS that outlines the terms and conditions of how the institution’s users will access the patient data from the N3C COVID Enclave. The institution does not need to sign a DTA to sign a DUA.
What is expected of a participating institution interested in contributing data?
NCATS has established a COVID-19 Data Transfer Agreement (DTA) (PDF - 139KB) that provides terms and conditions for data transfer and outlines the general terms of data use. Institutions sign the DTA, then work with NCATS to transfer a Limited Data Set relevant to COVID-19 in the institution’s preferred common data model (derived from electronic health records) to the NCATS N3C COVID Enclave, a centralized, secure, cloud-based data repository and investigational platform, on a recurring basis.
Note that if a Clinical and Translational Science Awards Program hub has multiple partner institutions, each partner institution will likely be required to sign a separate DTA unless the hub can demonstrate that it has the legal authority to use the partner institution’s data.
Is data access limited only to institutions that contributed data?
No. Under an approved Data Use Agreement (DUA) with NCATS, anyone can access N3C COVID Enclave data after their Data Use Request (DUR) is approved. Learn more about DUAs and DURs. N3C users can include, but are not limited to, nonprofit or not-for-profit organizations; federal, state and local health departments; researchers from industry; and citizen scientists. Access depends on the level of data being requested; human subjects research ethics training and IRB approval may be needed. All approved users must agree to a code of conduct and take NIH IT security training.
Does an institution sign a Data Use Agreement for each research project conducted at that institution?
No. The Data Use Agreement (PDF - 826KB) is signed once by the institution’s authorized signatory and covers all users at the institution who will access data from the N3C COVID Enclave. Individual users, however, must submit project-specific Data Use Requests (DURs) that will be reviewed and approved by the Data Access Committee. For more information about DURs, see the Use the Data FAQs.
Can the formal agreements for using and contributing N3C COVID Enclave data be extended?
Yes. The initial data transfer, data use and data linkage agreements will begin to expire in 2025. Institutions may extend their agreements via amendment through Sept. 30, 2029. Researchers who access N3C COVID Enclave data will need to check that their parent organization has obtained an N3C data use agreement extension. Access the extension amendment forms from the Forms and Resources page.
About the Data
What types of data are contained within the NCATS N3C COVID Enclave?
The NCATS N3C COVID Enclave contains real-world data from patients who were tested for COVID-19 or whose symptoms are consistent with COVID-19, as well as data from individuals infected with pathogens such as SARS-CoV-1, MERS and H1N1, which can support comparative studies. The data includes information such as demographics, symptoms, lab test results, procedures, medications, medical conditions, physical measurements and more. The enclave contains only retrospective electronic health record data.
For a detailed list of the data N3C requests from participating institutions, download the N3C’s data dictionary (PDF - 439KB).
What levels of data are available for analysis?
There are 3 tiers of data available for analysis:
- Limited Data Set (LDS): Consists of patient data that retain the following protected health information:
  - Dates of service
  - Patient ZIP code
- De-identified Data Set: Consists of patient data from the LDS with the following changes:
  - Dates of service are algorithmically shifted to protect patient privacy.
  - Patient ZIP codes are truncated to the first three digits or removed entirely if the ZIP code represents fewer than 20,000 individuals.
- Synthetic Data Set: Consists of data that are computationally derived from the LDS and that resemble patient information statistically but are not actual patient data.
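The FAQ does not say how dates of service are shifted. One common approach, shown here only as an illustrative sketch, gives each patient a single pseudo-random offset and applies it to all of that patient's dates, so intervals between clinical events survive de-identification. The key `SECRET_KEY`, the ±180-day window `MAX_SHIFT_DAYS` and the function names are hypothetical assumptions, not N3C parameters.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical values for illustration only; they are not N3C parameters.
SECRET_KEY = b"hypothetical-secret-held-inside-the-enclave"
MAX_SHIFT_DAYS = 180


def patient_shift_days(patient_id: str) -> int:
    """Derive a stable per-patient offset in [-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS] from a keyed hash."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest[:8], "big")
    return value % (2 * MAX_SHIFT_DAYS + 1) - MAX_SHIFT_DAYS


def shift_date(patient_id: str, service_date: date) -> date:
    """Shift one date of service by that patient's fixed offset."""
    return service_date + timedelta(days=patient_shift_days(patient_id))
```

Because every date for a given patient moves by the same amount, two visits on March 1 and March 15 remain 14 days apart after shifting, and keying the hash means the offset cannot be recomputed without the secret.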
See an overview of access requirements for different levels of data.
How are these data collected?
Participants are not recruited for the N3C. Rather, contributing sites provide existing data derived from the electronic health records (EHRs) of people who were tested for COVID-19 or who had related symptoms. EHRs are digital, machine-readable versions of patients’ paper charts. They contain clinical information such as medical history, diagnoses, demographics, immunization records, lab results, medications and more. EHRs also contain data that may identify a person, known as protected health information.
Under the 1996 Health Insurance Portability and Accountability Act (HIPAA), covered entities, such as health care providers, may release data for research without obtaining an individual’s authorization if direct identifying information is removed and appropriate oversight and agreements are in place. Under the HIPAA Privacy Rule’s Limited Data Set provisions, health information from which direct identifiers have been removed may be used and disclosed for research purposes under a data use agreement. The N3C received a waiver of consent from the NIH Institutional Review Board, and NIH is taking care to ensure that the highest privacy and security requirements are met and adhered to for housing and protecting these data in the NIH-managed N3C COVID Enclave.
How has N3C managed records identified as American Indian and Alaska Native (AI/AN)?
From the establishment of the N3C, NCATS obscured AI/AN data in the platform and worked with the NIH Tribal Health Research Office (THRO) to seek Tribal Consultation on whether and how to make AI/AN data accessible to researchers. NCATS released its Tribal Consultation report (PDF - 1.3MB) in July 2022, summarizing the input received from Tribal Leaders and the Center’s responses. Having consulted with Tribal leaders, NCATS removed additional privacy measures placed on AI/AN data in the N3C COVID Enclave so that the benefits of the data can be realized for individuals identified as AI/AN. Implications and protections provided for AI/AN data are outlined in the Tribal Consultation Report and N3C’s updated four pillars of data protection.
Do records identified as AI/AN include Tribal affiliation?
No. The race and ethnicity information collected only includes AI/AN as a category — it does not include Tribal affiliation.
Why did NCATS make the decision to obscure the AI/AN data, given that these data could have been used to better understand the impact of the pandemic on Tribal communities?
In alignment with the National Institutes of Health Guidance on the Implementation of the HHS Tribal Consultation Policy, NCATS sought input from Tribal Nations on whether and how to provide AI/AN data within the N3C. For this reason, NCATS did not make this data accessible until after it had engaged with Tribal Leaders and completed a Tribal Consultation. Now that the Tribal Consultation has been conducted, AI/AN data is available for research in a manner that reflects the input that NIH received through the consultation process. The Tribal Consultation report (PDF - 1.3MB) details how this data is now being made available.
Will the data set include clinical information about various populations (e.g., children, the elderly, patients who represent racial and ethnic minority populations)?
Yes, the N3C aims to be as inclusive as possible. Contributing sites provide race and ethnicity data to the N3C COVID Enclave as part of a data set collected from patient electronic health records. Race and ethnicity information from health care settings is self-identified.
Are the data being sent to the N3C COVID Enclave expected to be de-identified? If not, will NCATS de-identify the data?
NCATS asks medical institutions and health care organizations to contribute a limited data set pursuant to the requirements in the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.
A limited data set is defined as protected health information that excludes certain direct identifiers of an individual or of relatives, employers or household members of the individual — but may include city, state, ZIP code and elements of dates. An LDS can be disclosed only for purposes of research, public health or health care operations.
The limited data set provided to NCATS retains dates of service and patient ZIP codes.
Within the N3C COVID Enclave, NCATS can use the limited data set to create de-identified data by algorithmically shifting the dates of service and truncating patient ZIP codes to the first three digits, or by removing them entirely if the ZIP code represents fewer than 20,000 individuals or represents Tribal lands.
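A minimal sketch of the ZIP code rule described above, assuming a hypothetical lookup `prefix_population` of population counts keyed by three-digit ZIP prefix and a hypothetical set `tribal_zips` of ZIP codes flagged as representing Tribal lands. It also assumes the 20,000-person threshold is applied to the three-digit prefix area, as in the HIPAA Safe Harbor method; the FAQ does not say which geographic unit N3C actually uses.

```python
from collections.abc import Mapping, Set
from typing import Optional

# Threshold stated in the FAQ; applying it per three-digit prefix is an assumption.
POPULATION_THRESHOLD = 20_000


def generalize_zip(
    zip_code: str,
    prefix_population: Mapping[str, int],
    tribal_zips: Set[str],
) -> Optional[str]:
    """Return the three-digit prefix for the de-identified data set, or None to remove the ZIP code."""
    zip5 = zip_code.strip()[:5]  # drops any ZIP+4 suffix such as "-1234"
    if len(zip5) != 5 or not zip5.isdigit():
        return None  # malformed values are removed rather than guessed at
    if zip5 in tribal_zips:
        return None  # ZIP codes representing Tribal lands are removed
    prefix = zip5[:3]
    # An unknown prefix is treated as below the threshold, which errs toward removal.
    if prefix_population.get(prefix, 0) < POPULATION_THRESHOLD:
        return None
    return prefix
```

Returning `None` for unknown or malformed values is a deliberately conservative choice in this sketch: when in doubt, the ZIP code is dropped rather than retained.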
All research using N3C data must be conducted in accordance with the Federal Policy for the Protection of Human Subjects, also known as the Common Rule. Users must be affiliated with an organization that has signed a Data Use Agreement with NCATS prior to submitting their DUR.
For more information about limited data sets, the HIPAA Privacy Rule and the Common Rule, please see —
What is Privacy Preserving Record Linkage?
Privacy Preserving Record Linkage (PPRL) is a means of using secure pseudonymization processes to connect records that refer to the same individual across different data sources while maintaining that individual’s privacy. NCATS uses PPRL technology to link multiple data sets, which enhances COVID-19 real-world data research in the N3C COVID Enclave.
Organizations contributing data to the N3C COVID Enclave have the option of signing the Linkage Honest Broker Agreement (LHBA). The LHBA is an agreement among the organization, NCATS and the Regenstrief Institute, which serves as the linkage honest broker. In PPRL infrastructure, a linkage honest broker is a party that holds de-identified tokens and operates a service that matches tokens generated across disparate data sets to produce a single Match ID for a specific use case. The data remains under the complete control of the organizations that provide data to N3C and is never accessible by or under the control of the linkage honest broker.
PPRL enables three functions within N3C: deduplication of patient records, linkage of a patient’s records from different sources and cohort discovery. Deduplication is required for any organization that participates in the LHBA because of its importance to the quality of N3C COVID Enclave data.
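The general idea of token-based linkage can be sketched as follows: sites derive a keyed hash token from normalized identifiers, and the broker, which never sees the identifiers, assigns one Match ID per distinct token. Records sharing a Match ID within one site are candidate duplicates; across sites, they are linked records. This is a simplified exact-match illustration under stated assumptions, not the tokenization used by N3C or the Regenstrief Institute; `SITE_SHARED_KEY`, the identifier fields and the `LinkageBroker` class are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical key shared among contributing sites and withheld from the broker.
SITE_SHARED_KEY = b"hypothetical-key-known-only-to-sites"


def normalize(first: str, last: str, birth_date: str, sex: str) -> str:
    """Trim and lowercase identifier fields so formatting differences do not block a match."""
    return "|".join(field.strip().lower() for field in (first, last, birth_date, sex))


def make_token(first: str, last: str, birth_date: str, sex: str) -> str:
    """Site-side step: turn identifiers into a keyed hash (HMAC-SHA256) token."""
    message = normalize(first, last, birth_date, sex).encode("utf-8")
    return hmac.new(SITE_SHARED_KEY, message, hashlib.sha256).hexdigest()


class LinkageBroker:
    """Broker-side step: sees only tokens and maps each distinct token to one Match ID."""

    def __init__(self) -> None:
        self._match_ids: dict[str, str] = {}

    def assign(self, token: str) -> str:
        if token not in self._match_ids:
            # Sequential IDs keep the example readable.
            self._match_ids[token] = f"M{len(self._match_ids) + 1:06d}"
        return self._match_ids[token]


def shared_match_ids(records: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group (site, local_record_id, match_id) rows and keep Match IDs seen on more than one record."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for site, local_id, match_id in records:
        groups[match_id].append((site, local_id))
    return {mid: rows for mid, rows in groups.items() if len(rows) > 1}
```

Real PPRL systems typically tolerate typos and formatting variation more robustly than this exact-match sketch, which only normalizes whitespace and case.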
Learn how N3C’s PPRL initiative enables data connectivity while maintaining security.
Read more FAQs about PPRL and the LHBA.
Contribute Data
Who owns the data in the NCATS N3C COVID Enclave?
The institutions that contribute data retain full ownership of their data at all times per the Data Use Agreement (PDF - 826KB) and Data Transfer Agreement (PDF - 139KB).
What happens after signing the Data Transfer Agreement?
- Step 1: Institutional Review Board (IRB) approval to transfer the Limited Data Set (LDS): Johns Hopkins University (JHU) has set up a central IRB (cIRB) that will serve as the reviewing IRB for institutions transferring data. Using the JHU cIRB is optional; an institution may choose to use a local IRB instead.
- Step 2: Data Acquisition: To help sites transfer data, N3C has written a series of scripts that pull the data based on the institution’s common data model and database. N3C will reach out to the site’s technical team and work with them to configure the transfer process.
- Step 3: Data Harmonization: Once the site’s team is ready to transfer the data, the N3C data harmonization team will set up a secure file transfer protocol site specific to the institution. The data harmonization team will ingest the LDS, run quality checks and transform the different data models into a harmonized OMOP analytics data set.
- Step 4: Collaborative Analytics: Once the institution has signed the NCATS Data Use Agreement (PDF - 826KB), its investigators can apply for access to the NCATS N3C COVID Enclave by submitting a Data Use Request. After the NCATS N3C COVID Enclave Data Access Committee reviews the request to ensure appropriateness, investigators will be given access to the collaborative analytics platform. Access to the analytics platform is free of charge and includes training and ongoing support.
Is Institutional Review Board (IRB) review needed for providing data to the N3C COVID Enclave?
Likely, yes. However, IRB review requirements vary by institution, so please check your institution’s local requirements. Contact Tricia Francis for an approved copy of the N3C protocol to submit to your institution’s IRB.
Reliance on a central IRB is optional but encouraged. The Johns Hopkins Medicine (JHM) IRB is acting as the single-site IRB of record for any organizations providing data to the N3C program that want to use this service. Sites that would like to rely on the JHM IRB must ensure that their institution is enrolled in the SMART IRB platform and has an executed Letter of Indemnification with JHM.
JHM has created a streamlined process to guide sites through onboarding:
- Step 1: Sites that want to be part of the collaboration contact JHM’s Tricia Francis.
- Step 2: JHM will determine whether the site’s institution is part of the SMART IRB and has a Letter of Indemnification in place. If either of these requirements is missing, JHM can assist in fulfilling it.
- Step 3: Once participation in SMART IRB and the Letter of Indemnification are in place, sites will provide JHM with an email expressing willingness to rely on the JHM IRB as the ethics board of record.
- Step 4: JHM will send a tailored letter stating that the institution will cede oversight to the JHM IRB, along with the original IRB approval letter, the JHM-approved protocol, a Health Insurance Portability and Accountability Act (HIPAA) form and a Local Context Questionnaire (LCQ). The site must then complete and return the LCQ.
- Step 5: Once JHM receives the completed LCQ, it will submit the site’s information to its IRB for approval. JHM will provide written approval from the JHM IRB of the site’s participation.
For more details, please contact JHM’s Tricia Francis.
Can an organization that contributed data request to have it removed?
No. However, as stated in Article 11 of the Data Transfer Agreement (PDF - 139KB), an institution may discontinue its participation for any reason. Data from that institution will no longer be accessible for new Data Use Requests.
Why can’t synthetic data be submitted to the NCATS N3C COVID Enclave?
Synthetic data is an artificial, statistically comparable, computational derivative of the original data. Organizations providing data to the enclave cannot submit synthetic data for several reasons, including:
- Synthetic data is derived from a Limited Data Set (LDS), and the synthetic data sets are created within the NCATS secure enclave.
- Validation, harmonization and quality control must be done before synthetic data is derived. Common data model information arrives raw and requires a significant amount of cleaning (transformation) before it can be used for analytics, and this transformation cannot be performed on synthetically derived data. N3C’s validation pipeline transforms LDS data from four different data models to ensure syntactic and semantic harmonization.
Who is the point of contact regarding data transfer or technical questions about the platform?
Please email specific questions to NCATS.
Use the Data
Will there be a fee to access the N3C data?
No fee will be charged.
How long until the data can be used?
Data availability is dependent upon when data are deposited into the platform, the execution of a Data Use Agreement and an approved Data Use Request.
Must analytics on N3C COVID Enclave data be performed only within the NCATS platform?
Yes. To ensure the safety and security of the data, the NCATS platform is the only place to access and analyze these data.
Can data be downloaded or removed from the NCATS N3C COVID Enclave in any form?
No. Under the stipulations of the current Data Transfer Agreement (PDF - 139KB), data cannot be downloaded or removed from this enclave.
Who needs to submit a Data Use Request?
Investigators who want to access data in the N3C COVID Enclave for their research must submit a separate Data Use Request (DUR) for each project they want to establish or join as a collaborator:
- Investigators starting a new project: Establishing a new project requires submitting a DUR through the N3C COVID Enclave. Once the DUR is approved by the N3C Data Access Committee (DAC), a workspace for the project will be created in the N3C COVID Enclave and the investigator will be given access to that workspace.
- Investigators joining an existing project as a collaborator: Joining an existing project as a collaborator requires submitting a DUR through the N3C COVID Enclave. Once the DUR is approved by the DAC, the collaborator will be given access to the existing project’s workspace within the N3C COVID Enclave.
Learn more about applying for data access.
I already have an approved Data Use Request (DUR) for one project, but I want to start a new project or join a different project as a collaborator. Do I need to submit another DUR?
Yes. DURs are project-specific. Investigators starting a new project or joining an already established project will need to submit a separate DUR to the N3C Data Access Committee for that particular project, even if they already have an approved DUR for a different project.
How long does a Data Use Request last?
Once a Data Use Request (DUR) is approved by the N3C Data Access Committee, access to the N3C COVID Enclave workspace for that project is effective for one year from the date access is granted. A Data Use Agreement must be in place for the entire term of a DUR. DURs are renewable; when renewing, users will need to attest that their training for access to the N3C COVID Enclave is up to date.
Are access requirements the same across all tiers of data?
No. Users requesting access to certain tiers of data will need to complete additional steps for the Data Use Request. See an overview of data access requirements. These requirements may change over time as NCATS adjusts and/or augments security and privacy measures in place to keep these data safe within its protected cloud infrastructure.
How long does it take for the N3C Data Access Committee to review a Data Use Request?
The entire process — from when a researcher submits a Data Use Request (DUR) in the N3C COVID Enclave to when the researcher receives notice that a workspace has been created for their project — usually takes 15 business days.
In most cases, the N3C Data Access Committee (DAC) will assign, review and make decisions on a primary DUR (i.e., a DUR to establish a new project) within 10 business days. However, more time for review may be needed if DURs are routed to the DAC Chair for additional discussion or adjudication, or if the DAC has a question about the relationship of institutional review board (IRB) documentation to a project that a collaborator is seeking to join.
After the DAC has made a decision, submitters will receive an email notification with reviewer or Chair comments, if any, in the footer. If the DAC has approved data access, a project workspace will be established within three business days, and a follow-up email will be sent to the submitters to confirm that the workspace has been created in the N3C COVID Enclave.
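For rough planning, the typical review steps above (up to 10 business days for a DAC decision on a primary DUR, then up to three business days to create the workspace) can be converted into calendar dates. This sketch assumes weekends are the only non-business days and ignores federal holidays and the longer reviews the FAQ mentions; `add_business_days` and the submission date are hypothetical, not part of any N3C tool.

```python
from datetime import date, timedelta


def add_business_days(start: date, business_days: int) -> date:
    """Return the date that falls `business_days` weekdays after `start`.

    Assumption: only Saturdays and Sundays are skipped; holidays are ignored.
    """
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


submitted = date(2024, 1, 1)  # hypothetical submission date (a Monday)
dac_decision = add_business_days(submitted, 10)
workspace_ready = add_business_days(dac_decision, 3)
print(dac_decision, workspace_ready)  # prints 2024-01-15 2024-01-18
```

Actual timelines depend on routing to the DAC Chair, IRB questions and holidays, so such an estimate is only a lower bound for planning.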
A common cause of delays in the DAC review process is a lack of clarity or detail provided in certain sections of the DUR or in the IRB documentation.
In their DURs, researchers should explain the following clearly:
- In the Public-Facing Abstract section, the COVID-19-related research question(s) being addressed.
- In the Rationale to N3C DAC section, the specific need for the data level requested to support the aims of the proposed research project (e.g., if exact dates or ZIP codes are requested, how they will be used to support the aim of the research project).
The IRB documentation provided should show clearly that: (1) the project described in the DUR has been reviewed and approved by the IRB, and (2) the investigator submitting the documentation is part of the research team as approved by the IRB.
What is the role of the N3C Data Access Committee?
The N3C Data Access Committee (DAC) is composed of federal employees with appropriate scientific, bioethics, information technology and/or human subjects research expertise who review and approve Data Use Requests (DURs).
The DAC will:
- Ensure that the DUR is for COVID-19-related research.
- Assess that the level of access requested is justified based on the research project described and the rationale provided.
- Ensure that the DUR includes an attestation to human subjects training.
- Ensure that certification of institutional review board (IRB) approval for human subjects research is provided for research that will involve use of the Limited Data Set.
- Ensure that NIH IT Security Training has been attested to (or documentation provided if requested).
- Ensure that human subjects research protections training has been attested to (or documentation provided if requested) for requests to access the de-identified data set or the Limited Data Set.
Can a researcher from an institution outside of the United States request access to the data in the N3C COVID Enclave?
Yes, with certain limitations. If the researcher is not affiliated with a U.S. institution, the researcher can only request access to the de-identified and synthetic data sets. Learn more about access requirements for different data sets. All institutions must have a DUA with NCATS and can email NCATS with questions.
What should researchers do if they leave their institution but want to retain access to their approved N3C project and related data?
Researchers should contact NCATS to initiate the change in their institution affiliation on their Data Use Request. Their approved N3C project will be held in the NCATS N3C COVID Enclave while the change of institutional affiliation is made. If the institution does not have an active Data Use Agreement (DUA), the researcher will need to request the institution execute a DUA with NCATS. See the list of institutions with active DUAs.
Who can address any additional questions?
Please email questions about the N3C.