“Medicine is not made up of just feel-good stories,” said Dr. Christine Tsien Silvers, a healthcare executive adviser at Amazon Web Services. “I’ve cared for patients who my colleagues and I were not able to save, and I’ve seen people make mistakes.”
At the AWS Summit in June in Washington, D.C., she recounted the experience of a pediatric patient who was given the wrong dose of a medication because no one on the care team realized it was written incorrectly. While it’s unclear exactly how many patients die annually because of medical errors, estimates are in the hundreds of thousands.
“There are good people working in a bad system that needs to be made safer,” Silvers said. “The U.S. performs worst despite spending the most. What can we do?”
While U.S. physician burnout has dropped below 50% for the first time since 2020, it remains a challenge for the healthcare industry and, in the worst cases, can affect patient care. Silvers pointed out that generative artificial intelligence tools could help mitigate clinician burnout by autogenerating referrals, summarizing research, drafting clinical notes through ambient listening, explaining medications to patients and checking on patients admitted to the hospital.
From clinical and operational efficiencies to medical research and patient experience, generative AI can help in many aspects of healthcare. In one real-world use case, Harvard Medical School is using generative AI to help interpret arterial blood gas test results quickly.
Generative AI Interprets Arterial Blood Gas Test Results
According to Cleveland Clinic, an ABG test measures the blood's oxygen and carbon dioxide levels as well as its pH balance. It requires a blood sample drawn from an artery and can be used to evaluate conditions including acute respiratory distress syndrome, sepsis, hypovolemic shock, asthma attack, cardiac arrest, respiratory failure and heart failure. The test is often conducted in emergencies, where correct interpretation of its results can be crucial.
Dr. Praveen N. Meka, a physician at the Dana-Farber Cancer Institute and instructor at Harvard Medical School, spoke about how the university applied generative AI to a database of ABG test results to increase the speed of interpretation.
He and his team first built an ABG database of 50 results covering different medical scenarios, then used the Anthropic Claude v2 large language model via Amazon Bedrock to interpret them. On the first pass, the model's accuracy was below 50%. The team then applied prompt engineering, clarifying how ranges of the partial pressure of carbon dioxide should be interpreted, and introduced a retrieval-augmented generation (RAG) architecture that supplies more context to the general-purpose LLM to reduce hallucinations. Accuracy rose to 75%, which Meka said was still far short of what a clinical setting requires.
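To make the approach concrete, here is a minimal sketch of what such a Bedrock call to Claude v2 might look like. The reference-range text, the sample ABG result, the prompt wording and the region are illustrative assumptions, not the Harvard team's actual code; in a full RAG setup the reference context would come from a retrieval step rather than being hard-coded.

```python
import json
import boto3

# Bedrock runtime client (region is an assumption for this sketch)
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative context a RAG retrieval step might supply; hard-coded here
# so the example is self-contained.
reference_context = (
    "Normal arterial blood gas ranges: pH 7.35-7.45; "
    "PaCO2 35-45 mm Hg; HCO3 22-26 mEq/L. "
    "PaCO2 above 45 mm Hg suggests respiratory acidosis; "
    "below 35 mm Hg suggests respiratory alkalosis."
)

abg_result = "pH 7.30, PaCO2 30 mm Hg, HCO3 14 mEq/L"  # sample input

# Claude v2 on Bedrock uses the Human/Assistant text-completion format.
prompt = (
    f"\n\nHuman: Using only the reference ranges below, interpret this "
    f"arterial blood gas result.\n\nReference:\n{reference_context}\n\n"
    f"Result: {abg_result}\n\nAssistant:"
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 500}),
)
print(json.loads(response["body"].read())["completion"])
```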
The team also had to work around a limitation of the model involving mathematical calculations, so they created a math scratchpad to help it interpret the ABG test. The LLM first reformats the user's free-text input into a structure that Python can easily read; once the data is prepared in the scratchpad, Python code performs the calculations, and the results feed a new prompt from which the LLM generates more reliable answers. With this two-step process, the team achieved up to 98% accuracy.
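A rough sketch of that two-step scratchpad pattern appears below. The extracted values are hard-coded where the LLM would normally produce them, and Winter's formula for respiratory compensation in metabolic acidosis is used as an illustrative calculation; the team's actual scratchpad logic was not published.

```python
import json

# Step 1 (LLM): the model would extract free-text ABG values into a
# machine-readable form. Hard-coded here so the sketch runs standalone.
extracted = {"pH": 7.30, "PaCO2": 30.0, "HCO3": 14.0}

# Step 2 (Python scratchpad): do the arithmetic LLMs are unreliable at.
# Winter's formula: expected PaCO2 = 1.5 * HCO3 + 8, +/- 2 mm Hg.
expected_paco2 = 1.5 * extracted["HCO3"] + 8
compensation = (
    "appropriate" if abs(extracted["PaCO2"] - expected_paco2) <= 2
    else "inappropriate"
)

# Step 3 (LLM): build a second prompt from the verified numbers so the
# model only has to reason over them, not perform the math itself.
second_prompt = (
    f"ABG values: {json.dumps(extracted)}. "
    f"Expected PaCO2 by Winter's formula: {expected_paco2:.1f} mm Hg; "
    f"measured compensation appears {compensation}. "
    "Summarize the acid-base interpretation for a clinician."
)
print(second_prompt)  # this string would be sent back to the LLM
```

Keeping the arithmetic in deterministic Python while the LLM handles extraction and summarization is what pushed the reliability of the answers up in the team's reported results.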
Meka also emphasized the importance of visualization tools, so that the LLM doesn't appear to clinicians as a confusing black box.
While clinical decision support is not new, generative AI tools can deliver it with greater speed and efficiency.
“Generative AI in healthcare is here to stay,” Meka said. “There’s a lot of interest in the field, and generative AI could be helpful.”