Diversity, Equity, Inclusion & Accessibility

NCATS leverages diversity, equity, inclusion and accessibility to produce research outcomes that are relevant to the full diversity of the population.

Using Health Record Data to Reveal Disparities in Diagnosing Long COVID

For millions of Americans, the COVID-19 pandemic has left a lingering condition known as long COVID. Long COVID encompasses complex symptoms and challenging health problems that last for weeks, months, or years after a COVID-19 infection.

Researchers in NIH’s Researching COVID to Enhance Recovery (RECOVER) Initiative assessed what a new long COVID diagnostic code reveals about who’s developing the condition — and whose diagnoses may be missed. Central to their study was the National COVID Cohort Collaborative (N3C) Data Enclave, a nationwide database developed by NCATS and its partners that reflects the diversity of the country.

The researchers looked at N3C data from the electronic health records of 33,782 adults and children who received a long COVID diagnosis between October 2021 and May 2022. All had been given a diagnosis of “post COVID-19 condition, unspecified,” the diagnostic code introduced in U.S. health care systems in October 2021.

In studying people’s profiles and symptoms, the researchers found multiple patterns. Among the more striking findings was that most of the people were white, female, non-Hispanic and likely to live in areas with low poverty and greater access to health care.

Those findings stood out, given what researchers already knew about the disproportionate impact of COVID on people of color and economically disadvantaged populations. The pattern suggested that not all patients who have long COVID are being diagnosed, said Emily Pfaff, Ph.D., a study author and assistant professor in the Division of Endocrinology and Metabolism at the University of North Carolina, Chapel Hill. Those disparities in diagnosis lead to poor outcomes and less access to treatments.


The reasons for underdiagnosis vary. In addition to long-documented health disparities based on race and other factors, Pfaff explained, women are more likely than men to seek health care in general. People with the time and resources to access health care also tend to be disproportionately represented in clinical data.

“You can see all the different ways these diagnostic codes can provide insight, but they can also skew the whole story,” Pfaff said.

Still, she added, the insights help. She and her team found, for example, that most of the people with long COVID had mild to moderate symptoms of acute infection. They also discovered that long-term symptoms could be grouped into common clusters — cardiopulmonary, neurological, gastrointestinal and coexisting conditions — as well as by age.

“Looking at COVID-19 outcomes from EHR data that are representative of the U.S. population, including the communities hardest hit, has been a key priority of the N3C,” explained NCATS Director Joni L. Rutter, Ph.D. “By linking clinical data with demographic information, the N3C has helped us learn more about how risks for COVID-19 vary across ages, races, chronic conditions, and treatment regimens.”

From machine-learning models that better identify who has long COVID to initiatives such as N3C Public Health Answers to Speed Tractable Results (PHASTR) that speed answers to COVID-19 health outcomes, NCATS has developed research tools to sharpen understanding of long COVID, overcome treatment disparities and deliver solutions to everyone who needs them.


Last updated on July 19, 2024