NCATS-Supported Researchers Recruit Citizen Scientists to Help Mine Biomedical Literature

Biomedical scientists are publishing new discoveries at a rapid rate. Currently, PubMed, the primary database for biomedical literature housed by NIH’s National Center for Biotechnology Information, contains more than 26 million articles and is expanding by more than 1 million articles per year — that’s two articles per minute. The research boom is great news for biomedical science, but it is challenging for humans to keep up with such massive amounts of information.

Enter a team of bioinformatics scientists from the Scripps Translational Science Institute (STSI) at The Scripps Research Institute (TSRI) in La Jolla, California, an NCATS Clinical and Translational Science Awards (CTSA) Program hub. Led by Andrew Su, Ph.D., associate professor in the Department of Molecular and Experimental Medicine at TSRI and director of computational biology at STSI, the group invented a web-based technology platform to arrange biomedical literature into a format that is easier for computers to organize and analyze. Because of the size of such a task, the platform, called Mark2Cure, is designed to employ crowdsourcing. Crowdsourcing is the practice of recruiting large numbers of people to help solve a complex problem—in this case, sorting and organizing thousands of biomedical papers. Su and his bioinformatics colleagues Max Nanis, Ginger Tsueng, Ph.D., and Benjamin Good, Ph.D., hope that Mark2Cure can make the biomedical literature more manageable and useful by enabling scientists to rediscover buried knowledge that can spur new research hypotheses.

“The Mark2Cure platform exemplifies the CTSA Program’s mission to develop innovative solutions that will improve the efficiency, quality and impact of the process for turning observations in the laboratory, clinic and community into interventions that improve the health of individuals and the public,” said NCATS Director Christopher P. Austin, M.D.

Citizen Scientists as Research Partners

Citizen scientists, like Judy and A.J. Eckhart, are the most important part of a citizen science project.

Citizen scientists Judy and A.J. Eckhart share their motivations for contributing to the project. Citizen scientists contribute much more than data, offering valuable insight, suggestions and feedback on how to improve a project. The Eckharts and many other participants provided feedback on how to improve Mark2Cure’s tutorials and interface. (Su Lab, The Scripps Research Institute)

Humans are better than computers at certain tasks, such as scanning text and recognizing keywords and the relationships between them.

“Computers have trouble interpreting the free text in a scientific article very well. We call this an information extraction problem,” Su explained. “On the other hand, humans have a well-developed sense of how to parse language, understand grammar and infer meaning, even when the language is technical and jargon-filled.”

With this knowledge, the team set out to build a platform by which volunteers from the general public, whom they call “citizen scientists,” help solve the information extraction problem. After logging onto the web-based platform and undergoing a brief training exercise to become comfortable with scientific language and concept identification, the citizen scientists complete a two-step process. First they identify relevant concepts, such as genes, proteins, drugs or diseases, in the text, and then they define the relationships between concepts. A relationship might state, for example, that a particular drug treats a certain disease: insulin (drug) treats diabetes (disease).
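To make the two-step process concrete, here is a minimal sketch in Python of how the marked concepts and their relationships could be stored so that a computer can mine them. The class names (`Concept`, `Relation`), the helper `mark_concept` and the example sentence are hypothetical illustrations, not part of the Mark2Cure software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A phrase a volunteer highlighted in the text (step 1)."""
    text: str
    kind: str   # e.g. "gene", "protein", "drug" or "disease"
    start: int  # character offset where the phrase begins
    end: int    # character offset just past the phrase


@dataclass(frozen=True)
class Relation:
    """A link a volunteer drew between two concepts (step 2)."""
    subject: Concept
    predicate: str
    obj: Concept


def mark_concept(sentence: str, phrase: str, kind: str) -> Concept:
    """Record the first occurrence of `phrase` in `sentence` as a concept."""
    start = sentence.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} does not occur in the sentence")
    return Concept(phrase, kind, start, start + len(phrase))


# Hypothetical example sentence, echoing the insulin example above.
sentence = "Insulin treats diabetes."

# Step 1: identify the concepts.
drug = mark_concept(sentence, "Insulin", "drug")
disease = mark_concept(sentence, "diabetes", "disease")

# Step 2: define the relationship between them.
relation = Relation(drug, "treats", disease)

print(f"{relation.subject.text} ({relation.subject.kind}) "
      f"{relation.predicate} "
      f"{relation.obj.text} ({relation.obj.kind})")
```

Storing character offsets alongside each phrase keeps every annotation tied to the exact place in the article it came from, which is one plausible way to let annotations from many volunteers be compared later.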

“If we carried out those two steps for every article published, we would have a powerful knowledge base that computers could mine very effectively,” Su said.

In an initial experiment, Su and his team tested the effectiveness of the Mark2Cure approach by comparing the citizen scientists’ efforts with those of experts performing the same task. The Scripps researchers found that, in the aggregate, the citizen scientists performed the task of highlighting disease mentions within biomedical text with very high accuracy, comparable to that of the experts. The researchers also found, through survey responses, that the citizen scientists were highly motivated to volunteer for Mark2Cure; most respondents cited advancing science or learning as their motivation for participating. The Scripps scientists are publishing these findings in Citizen Science: Theory and Practice.
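One common way to judge annotations "in the aggregate" is to keep only the highlights that several volunteers agree on and then score the result against an expert's highlights. The sketch below illustrates that general idea under stated assumptions: the vote threshold, the function names and the sample data are all hypothetical, and the article does not describe the exact method the Scripps team used.

```python
from collections import Counter


def aggregate(annotations, min_votes):
    """Keep the spans highlighted by at least `min_votes` volunteers."""
    votes = Counter(span for spans in annotations for span in set(spans))
    return {span for span, count in votes.items() if count >= min_votes}


def precision_recall_f1(predicted, gold):
    """Score a set of predicted spans against a reference set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: each volunteer's highlights as (start, end) offsets.
volunteers = [
    [(0, 8), (20, 30)],
    [(0, 8), (40, 45)],  # (40, 45) is a stray highlight nobody else made
    [(0, 8), (20, 30)],
]
expert = {(0, 8), (20, 30)}

# Require agreement from at least two of the three volunteers.
consensus = aggregate(volunteers, min_votes=2)
precision, recall, f1 = precision_recall_f1(consensus, expert)
print(sorted(consensus), precision, recall, f1)
```

In this made-up example the stray highlight is outvoted, so the consensus matches the expert exactly. That is the intuition behind aggregation: individual mistakes tend to be filtered out when many people annotate the same text.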

Using Mark2Cure to Study a Rare Disease


A knowledge network constructed from concepts identified by citizen scientists within months after Mark2Cure’s launch. Blue text indicates disease-related terms, green text indicates gene-related terms, and pink text indicates treatment-related terms. Many highly motivated citizen scientists were concerned about the quality of their work and were reassured to see that high-quality annotations dominated knowledge networks generated from their work. (Su Lab, The Scripps Research Institute)

Su and his group now have turned to testing Mark2Cure in the context of an actual disease. To start, they are focusing on N-glycanase 1 (NGLY1) deficiency, an extremely rare inherited disorder that affects multiple organs, causing developmental delays, movement problems and seizures, among other symptoms.

According to Su, rare diseases are an ideal starting point, because the literature base is relatively small and members of patient groups are often well-informed and highly motivated to help with research. Indeed, one of the ways NCATS seeks to improve the process of developing interventions is by engaging patient communities and advocacy groups and partnering with them to carry out research.

Already, the citizen scientists have identified a potential treatment for NGLY1 deficiency that is not typically associated with the condition: adrenocorticotropic hormone. The hormone improved some of the symptoms in one patient, and while it ultimately was not a viable treatment option, the discovery points to the potential of the Mark2Cure approach.

Beyond NGLY1 Deficiency

Mark2Cure is designed to help explore virtually any disease, rare or common, and the Scripps team plans to expand beyond NGLY1 deficiency in future projects. It is open-source, meaning that the software and data are available for anyone to access, but the researchers caution that the technology remains experimental. Still, the hope is that Mark2Cure will enable researchers to make unexpected connections between various diseases, their underlying mechanisms and potential treatments.

“By looking for commonalities across diseases, scientists have the potential to accelerate the development and demonstration of treatments for multiple diseases at once,” Austin said. “Mark2Cure’s use of crowdsourcing, engaging citizens and the rare diseases community, and identifying unexpected biological connections are all shared aspects of NCATS’ approach to speed translation to get more treatments to more patients more quickly.”

The work was supported in equal parts by the Scripps CTSA and NIH’s Big Data to Knowledge program.

Posted December 2016