Scientists from Harvard Medical School and Stanford University have used artificial intelligence to develop a diagnostic tool that can detect diseases on chest X-rays based on the natural-language descriptions in the clinical reports that accompany those X-rays.
The approach is considered a major advance in clinical AI design because most current AI models require laborious human annotation of enormous amounts of data before that labeled data can be fed into the model to train it.
The model, called CheXzero, performed on par with human radiologists in its capacity to identify pathologies on chest X-rays, according to a report on the work that was published Sept. 15 in Nature Biomedical Engineering.
The group has made the model’s source code openly accessible to other academics.
To correctly identify pathologies during their “training,” the majority of AI models need labeled datasets. Since this process requires extensive annotation by human clinicians, which is frequently expensive and time-consuming, it is particularly difficult for tasks involving the interpretation of medical images. For example, skilled radiologists would have to examine hundreds of thousands of X-ray images one at a time and explicitly annotate each one with the conditions detected in order to label a chest X-ray dataset. Even though more recent AI models have tried to solve this labeling problem by learning from unlabeled data during a “pre-training” stage, they still need to be fine-tuned on labeled data to perform well.
The new model, however, is self-supervised: it learns on its own without requiring manually labeled data either before or after training. The only supervision it needs comes from the English-language notes found in the reports that accompany the chest X-rays.
“We’re living in the early days of the next-generation medical AI models that are able to perform flexible tasks by directly learning from text,” said study principal investigator Pranav Rajpurkar, assistant professor of biomedical informatics in the Blavatnik Institute at HMS. Until now, he noted, most AI models have relied on manual annotation of enormous amounts of data, as many as 100,000 images, to perform well; the new method requires no such disease-specific annotations.
“With CheXzero, one can simply feed the model a chest X-ray and associated radiology report, and it will learn that the image and the text in the report should be considered similar—in other words, it learns to match chest X-rays with their associated reports,” Rajpurkar continued. Eventually, the model learns which concepts in the unstructured text correspond to which visual patterns in the image.
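This kind of image-report pairing can be pictured as contrastive learning in the style of CLIP: an image encoder and a text encoder are trained so that each X-ray’s embedding lands closest to the embedding of its own report and far from the reports of other images in the batch. The sketch below illustrates that objective only; the encoder architectures, dimensions, and names are placeholders for illustration, not CheXzero’s actual implementation.

```python
# Minimal sketch of contrastive training on (image, report) pairs.
# Encoders, dimensions, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairedEncoder(nn.Module):
    """Toy image/text towers standing in for the real backbones."""
    def __init__(self, embed_dim=512):
        super().__init__()
        # A real setup would use a vision backbone for the X-ray
        # and a text transformer for the report.
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, embed_dim))
        self.text_encoder = nn.Linear(768, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # learnable temperature (log scale)

    def forward(self, images, report_features):
        img = F.normalize(self.image_encoder(images), dim=-1)
        txt = F.normalize(self.text_encoder(report_features), dim=-1)
        return img, txt, self.logit_scale.exp()

def contrastive_loss(img, txt, scale):
    # Each X-ray should be most similar to its own report, and vice versa:
    # matching pairs sit on the diagonal of the similarity matrix.
    logits = scale * img @ txt.t()            # (batch, batch) image-to-text similarities
    targets = torch.arange(len(img))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy batch: 8 single-channel "X-rays" and 8 pre-embedded report texts.
model = PairedEncoder()
images = torch.randn(8, 1, 224, 224)
reports = torch.randn(8, 768)
img, txt, scale = model(images, reports)
contrastive_loss(img, txt, scale).backward()
```

The symmetric cross-entropy pulls matching image-report pairs together and pushes mismatched pairs apart, which is what would let such a model later connect free-text phrases to visual findings.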
The model was “trained” on a publicly accessible dataset containing more than 377,000 chest X-rays and more than 227,000 corresponding clinical notes. Its performance was then tested on two separate datasets of chest X-rays and accompanying notes from two different institutions, one of them in another country. This diversity of datasets was meant to ensure that the model performed equally well when exposed to clinical notes that might use different terminology to describe the same finding.
During testing, CheXzero was able to recognize pathologies that human clinicians had not explicitly annotated. It performed better than other self-supervised AI tools and had an accuracy comparable to that of radiologists.
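One way to see how a model trained only on report text can flag findings it was never explicitly labeled for is zero-shot classification: the X-ray is embedded once and compared against short text prompts, such as a pathology name and its negation. The sketch below assumes the same kind of contrastively trained encoders as above, with placeholder prompt wording; it is illustrative rather than the study’s actual procedure.

```python
# Sketch of zero-shot pathology scoring with contrastively trained encoders.
# Encoders, prompt wording, and the pathology list are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in encoders and text embedder; in practice these would be the trained
# image and text towers plus a real tokenizer.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 512))
text_encoder = nn.Linear(768, 512)
embed_prompt = lambda prompts: torch.randn(len(prompts), 768)  # dummy text features

@torch.no_grad()
def zero_shot_score(image, pathology):
    """Compare an X-ray against a positive and a negative prompt for one pathology."""
    txt = F.normalize(text_encoder(embed_prompt([pathology, f"no {pathology}"])), dim=-1)
    img = F.normalize(image_encoder(image), dim=-1)
    probs = (img @ txt.t()).softmax(dim=-1)   # [P(pathology present), P(absent)]
    return probs[0, 0].item()

xray = torch.randn(1, 1, 224, 224)
for pathology in ["pneumonia", "cardiomegaly", "pleural effusion"]:
    print(pathology, round(zero_shot_score(xray, pathology), 3))
```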
According to the researchers, the method could eventually be applied to imaging modalities well beyond X-rays, including CT scans, MRIs, and echocardiograms.
“CheXzero shows that the accuracy of complex medical image interpretation no longer needs to be at the mercy of large labeled datasets,” said Ekin Tiu, a Stanford undergraduate student and visiting researcher at HMS. “We use chest X-rays as a motivating example, but CheXzero’s capability is generalizable to a wide range of medical settings where unstructured data is the norm. It precisely embodies the promise of getting around the large-scale labeling bottleneck that has dogged the field of medical machine learning.”
Stanford alumni Pujan Patel and Ellie Talius, together with Tiu, served as the paper’s co-first authors and as visiting researchers in the Rajpurkar lab. Andrew Ng and Curtis Langlotz, both of Stanford, also contributed to the study.