
WEST AI Algorithm May Help Speed Diagnosis of Rare Diseases

April 14, 2026

Rare diseases are often hard to diagnose, sending patients on a “diagnostic odyssey” that can last years. A new artificial intelligence (AI) program, developed through the NCATS Clinical and Translational Science Awards (CTSA) Program by NIH-supported researchers at Harvard Medical School and Boston Children’s Hospital, uses data from electronic health records (EHRs) to predict when a patient may have a rare disease. The researchers recently published their work in npj Digital Medicine.

“A lot of these young patients exhibit a strange array of symptoms, and it takes a while for clinicians to reach a diagnosis because they have limited prior experience with these rare and complex conditions,” said Kimberly Greco, M.P.H., a Ph.D. candidate in the Department of Biostatistics at Harvard T.H. Chan School of Public Health.

The WEakly Supervised Transformer (WEST) algorithm can use “noisy” (incomplete, incorrect, or noninformative) data in EHRs to suggest whether a patient has a particular rare condition. A transformer, like WEST, is a type of computer program that takes a sequence of data, such as diagnosis codes, lab results, and chart notes spanning a patient’s medical history, and predicts a future sequence or turns it into a different type of data, such as a suggested diagnosis. Many well-known AI programs, like ChatGPT, are based on transformers. The “weak supervision” used by WEST allows more types of data to be incorporated into its predictions. Specifically, the model can learn from patients both with and without confirmed diagnoses, using less precise outcome information to identify diagnostic patterns. This is especially important when studying rare diseases, where knowledge may be limited and the high-quality labeled data typically needed for model training are often unavailable.

“The key innovation in this approach is that we pretend these data are labeled and use them to guide us, but we acknowledge they’re not perfect,” explained Tianxi Cai, Sc.D., professor in the Department of Biomedical Informatics at Harvard Medical School.
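To make the idea of learning from imperfect labels concrete, here is a minimal sketch in Python. It is not the WEST model: it swaps the transformer for a plain logistic regression, and the two features, the 20 percent label flip rate, and all function names are assumptions invented for illustration. The model is trained only on noisy labels and is then scored against the hidden true status.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate_cohort(n, flip_rate, rng):
    """Return (features, true_status, noisy_label) for n synthetic patients.

    The two features and the flip rate are made-up stand-ins for EHR
    signals and label noise; they are not taken from the WEST study.
    """
    features, truth, noisy = [], [], []
    for _ in range(n):
        d = 1 if rng.random() < 0.5 else 0
        center = 1.0 if d else -1.0
        features.append([rng.gauss(center, 1.0), rng.gauss(0.5 * center, 1.0)])
        truth.append(d)
        # With probability flip_rate the recorded label is wrong.
        noisy.append(1 - d if rng.random() < flip_rate else d)
    return features, truth, noisy

def fit_logistic(features, labels, epochs=100, lr=0.5):
    """Full-batch gradient descent on the logistic loss."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, features, truth):
    """Share of patients whose thresholded prediction matches the true status."""
    hits = 0
    for x, d in zip(features, truth):
        pred = 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5 else 0
        hits += pred == d
    return hits / len(truth)

def run_demo(seed=0):
    rng = random.Random(seed)
    train_x, _, train_noisy = simulate_cohort(1000, flip_rate=0.2, rng=rng)
    test_x, test_truth, _ = simulate_cohort(500, flip_rate=0.2, rng=rng)
    w, b = fit_logistic(train_x, train_noisy)   # trained only on imperfect labels
    return accuracy(w, b, test_x, test_truth)   # scored against the true status

if __name__ == "__main__":
    print(f"accuracy against true status: {run_demo():.2f}")
```

In this toy setting the label errors are random, so the overall pattern linking features to disease survives the noise. Real EHR noise is rarely this well behaved, which is part of what makes the problem hard.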

WEST was tested using EHR data from patients at risk for two rare lung diseases. Pulmonary hypertension (high blood pressure in the lung arteries) is a condition that often looks like asthma, but a misdiagnosis can delay important treatments and lead to lung damage. Some patients in the study had already been diagnosed with pulmonary hypertension, while others had symptoms consistent with the disease. Similar analyses were conducted using EHR data from people with or at risk for severe asthma. WEST outperformed all baseline models in identifying patients whom expert doctors had diagnosed with pulmonary hypertension or severe asthma.

To build a model that could find rare diseases in noisy EHR data, the research team first mapped how pulmonary hypertension and asthma relate to other clinical features in the medical record. They then trained WEST on patient records containing these relevant events. As the model learned key patterns, its predictions were iteratively updated. “Our model’s performance becomes better and better, and eventually we achieve a performance that aligns closely with expert-reviewed diagnoses,” explained Zongxin Yang, Ph.D., a research fellow in the Department of Biomedical Informatics at Harvard Medical School.
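The iterative updating described above can be loosely illustrated with a toy label-refinement loop. The sketch below is an assumption-laden stand-in, not the published WEST procedure: it uses a one-feature logistic regression on synthetic data, and the blending weight `trust_model`, the 25 percent flip rate, and the function names are all hypothetical. Each round fits a model to the current working labels, then revises those labels by blending the original noisy label with the model's predicted probability.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_slope_and_bias(xs, labels, epochs=100, lr=0.5):
    """One-feature logistic regression via full-batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def refine_labels(xs, noisy, rounds=4, trust_model=0.7):
    """Alternate between fitting a model and revising the working labels.

    Each round blends the original noisy label with the model's probability
    and thresholds the blend at 0.5. `trust_model` is a hypothetical knob.
    Returns the working labels after each round (entry 0 is the noisy input).
    """
    working = list(noisy)
    history = [list(working)]
    for _ in range(rounds):
        w, b = fit_slope_and_bias(xs, working)
        working = [
            1 if (1 - trust_model) * y0 + trust_model * sigmoid(w * x + b) > 0.5 else 0
            for x, y0 in zip(xs, noisy)
        ]
        history.append(list(working))
    return history

def agreement(labels, truth):
    """Fraction of labels that match the hidden true status."""
    return sum(a == t for a, t in zip(labels, truth)) / len(truth)

def run_demo(seed=0, n=1000, flip_rate=0.25):
    rng = random.Random(seed)
    truth = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    xs = [rng.gauss(1.0 if d else -1.0, 1.0) for d in truth]
    noisy = [1 - d if rng.random() < flip_rate else d for d in truth]
    history = refine_labels(xs, noisy)
    return [agreement(labels, truth) for labels in history]

if __name__ == "__main__":
    for round_index, score in enumerate(run_demo()):
        print(f"round {round_index}: agreement with true status = {score:.3f}")
```

Printing the per-round agreement shows how the working labels drift toward the true status in this synthetic cohort. The true status is used only for scoring, never for training, which mirrors the role of expert-reviewed diagnoses as a check on the model.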

A program with this capability would be particularly useful for primary care doctors who see a patient with unfamiliar symptoms and are unsure how to diagnose them. “We are hopeful this tool can help the primary care doctor who doesn’t know what to do,” Cai said. The team is continuing to scale WEST to analyze patient data over time, with the aim of suggesting when a patient might develop a disease or how that patient will respond to treatment. “We hope our model can help find rare diseases in early stages,” Yang explained. Greco added, “We want to know more about what a patient looks like just before and just after they have the disease, which will give us more knowledge about that turning point in the diagnostic process.”

“Results from this program demonstrate important advancements in AI-based tools and allow researchers to identify and extract meaningful information on a rare disease from existing electronic health records,” said Eric Sid, M.D., M.H.A., a program officer in the Division of Rare Diseases Research Innovation at NCATS. “Their forward-thinking approach in developing an AI-based model with a framework with an eye on adapting it to other types of rare diseases emphasizes the translational science principle of creating a generalizable solution to address common challenges that are shared across thousands of individually unique and heterogeneous rare diseases.”


 

Last updated on April 14, 2026