Characterized by a weakened or damaged heart muscle, heart failure results in a gradual buildup of fluid in a patient's lungs, legs, feet, and other parts of the body. The condition is chronic and incurable, and it often leads to arrhythmias or sudden cardiac death. For many centuries, bloodletting and leeching were the preferred treatments, famously practiced by barber surgeons in Europe during a time when physicians rarely operated on patients.
In the 21st century, the management of heart failure has become decidedly less medieval: today, patients rely on healthy lifestyle changes, prescription medications, and sometimes the use of a pacemaker. Yet heart failure remains one of the leading causes of morbidity and mortality, placing a substantial burden on health care systems worldwide.
“About half of people with heart failure will die within five years of diagnosis,” says Thea Bergamaschi, an MIT PhD student in the laboratory of Colin Stultz, the Nina T. and Robert H. Rubin Professor, and co-first author of a new paper presenting deep learning models for predicting heart failure. “Understanding how a patient will be treated after hospitalization is really important in allocating limited resources.”
In a paper published in The Lancet's eClinicalMedicine, a team of researchers from MIT, Mass General Brigham, and Harvard Medical School shared the results of the development and testing of PULSE-HF, a model meant to “predict changes in left ventricular systolic function from the ECG of patients with heart failure.” The project was conducted in Stultz's lab, which is affiliated with the MIT Abdul Latif Jameel Clinic for Machine Learning in Health. Developed and retrospectively tested in three different patient cohorts, drawn from Massachusetts General Hospital, Brigham and Women's Hospital, and MIMIC-IV (a publicly available dataset), the deep learning model accurately predicts changes in left ventricular ejection fraction (LVEF), the percentage of blood pumped out of the heart's left ventricle.
A healthy human heart pumps about 50 to 70 percent of the blood from the left ventricle with each beat; anything less is considered a sign of a potential problem. “The model takes an [electrocardiogram] and predicts whether the ejection fraction will fall below 40 percent within the next year,” says Tiffany Yau, an MIT PhD student in Stultz's lab who is also co-first author of the PULSE-HF paper. “This is the most serious subgroup of heart failure.”
If PULSE-HF predicts that a patient's ejection fraction is likely to deteriorate within a year, the physician can prioritize that patient for follow-up. Meanwhile, low-risk patients can be spared some hospital visits, along with the time it takes to have 10 electrodes attached to their body for a 12-lead ECG. The model could also be deployed in low-resource clinical settings, including doctors' offices in rural areas, where a cardiac sonographer is not typically on staff to run ultrasounds on a daily basis.
“The biggest thing that differentiates [PULSE-HF] from other heart failure ECG methods is that it predicts, rather than detects, heart failure,” Yau says. The paper notes that to date, no other methods exist to predict future LVEF decline among heart failure patients.
During the testing and validation process, the researchers used a metric called “area under the receiver operating characteristic curve” (AUROC) to measure the performance of PULSE-HF. AUROC measures a model's ability to discriminate between classes on a scale from 0 to 1, with 0.5 being random and 1 being perfect. PULSE-HF achieved AUROCs ranging from 0.87 to 0.91 across all three patient cohorts.
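The paper's own evaluation code isn't shown here, but AUROC itself is a standard metric with a simple probabilistic interpretation: the chance that a randomly chosen positive case gets a higher model score than a randomly chosen negative one. A minimal, dependency-free sketch (the labels and scores below are invented toy data, not from the study):

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive example scores
    higher than a random negative one; ties count as half a win.
    O(n_pos * n_neg) -- fine for illustration, not for large datasets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = LVEF fell below 40% within a year, 0 = it did not.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]  # hypothetical model outputs
print(round(auroc(labels, scores), 3))  # → 0.917
```

A perfect ranking would yield 1.0; shuffled scores would hover near 0.5, which is what makes the 0.87 to 0.91 range notable.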
Notably, the researchers have also created a version of PULSE-HF for single-lead ECG, meaning only one electrode needs to be placed on the body. While the 12-lead ECG is generally considered superior for being more comprehensive and accurate, the performance of the single-lead version of PULSE-HF was just as strong as the 12-lead version.
The elegant simplicity of the idea behind PULSE-HF belies, as with most clinical AI research, a laborious execution. “It took years [to complete this project],” recalls Bergamaschi. “It went through several iterations.”
One of the team's biggest challenges was collecting, processing, and cleaning the ECG and echocardiogram datasets. While the goal of the model is to predict a patient's ejection fraction, labels for the training data were not always readily available. Much as a student learns from a textbook with an answer key, machine-learning models rely on labels to correctly identify patterns in data.
Clean, linear text in TXT files usually works best for training models. But echocardiogram reports typically come as PDFs, and when PDFs are converted to TXT files, the text becomes difficult for models to read, broken up by line breaks and formatting artifacts. The unpredictable nature of real-life scenarios, such as restless patients or loose ECG leads, also skewed the data. “There are a lot of signal artifacts that need to be cleaned up,” says Bergamaschi. “It's kind of a never-ending rabbit hole.”
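The label-extraction problem the team describes can be sketched in a few lines. The report text, line breaks, and regular expression below are hypothetical illustrations of the general technique, not the paper's actual pipeline:

```python
import re

# Hypothetical echocardiogram report text after PDF-to-TXT conversion;
# the mid-sentence line break mimics the formatting damage described above.
report = """ECHOCARDIOGRAM REPORT
Left Ventricle:
  The left ventricular ejection
  fraction is estimated at 35 %.
Conclusions: mildly dilated LV."""

def extract_lvef(text):
    """Collapse line breaks, then search for an ejection-fraction value.
    Returns the LVEF as an integer percentage, or None if not found."""
    flat = re.sub(r"\s+", " ", text)  # rejoin lines broken by the PDF export
    m = re.search(r"ejection\s+fraction[^0-9]{0,40}(\d{1,2})\s*%",
                  flat, flags=re.IGNORECASE)
    return int(m.group(1)) if m else None

print(extract_lvef(report))  # → 35
```

Real reports vary far more than this single pattern can handle, which is part of why the cleanup felt like a “never-ending rabbit hole.”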
While Bergamaschi and Yau acknowledge that more complex methods can help filter the data for better signals, there is a limit to the usefulness of these approaches. “At what point do you stop?” Yau asks. “You have to think about the use case – is it easier to have a model that works on data that’s a little messy? Because it probably will be.”
The researchers anticipate that the next step for PULSE-HF will be to test the model in a prospective study in real patients whose future ejection fraction is unknown.
Despite the inherent challenges in getting a clinical AI tool like PULSE-HF across the finish line, including the potential risk of extending the PhD by another year, the students feel that the years of hard work were worthwhile.
“I think things are rewarding partly because they're challenging,” says Bergamaschi. “A friend told me, ‘If your calling is really a calling, it will still be your calling an extra year after you graduate.’ The way we are measured as researchers in [the ML and health] space is different from other researchers in the ML space. Everyone in this community understands the unique challenges that exist here.”
“There is so much suffering in the world,” says Yau, who joined Stultz’s lab after a health program and realized the importance of machine learning in health care. “Anything that tries to reduce suffering is something I would consider a valuable use of my time.”