Jul 01 2024

The Evolution of LLMs in Healthcare

Large language models analyze content to generate their own content and, in some cases, make predictions. That presents opportunities for healthcare, provided developers put the right safeguards in place.

Just six months after OpenAI released the large language model ChatGPT, three Stanford University physicians penned a commentary wondering if LLMs “will reshape modern medicine.” Writing in JAMA Internal Medicine in April 2023, they continued, “Good or bad, ready or not, Pandora’s box has already been opened.”

LLMs, defined by Gartner as artificial intelligence models “trained on vast amounts of text to understand existing content and generate original content,” have found many use cases in healthcare in less than two years. Some center on facilitating communication with patients, while others attempt to analyze large sets of unstructured data for clues about a patient’s condition or to determine appropriate billing codes.

LLMs in healthcare are undergoing near-constant change as developers refine their models to improve accuracy and remove bias, and health systems determine the workflows that are appropriate for using AI. As a second JAMA paper from Stanford concluded, the industry needs to ensure models make medical professionals more productive, rather than simply automating tasks they already know how to do.

PREPARE: Expert guidance helps healthcare organizations achieve meaningful transformation with AI.

Five Key Roles for LLMs in Healthcare

There are two general categories of LLMs, according to the second Stanford piece. One category is trained on medical documents, ranging from progress notes to medical literature, and is typically deployed to summarize lengthy records or answer clinical questions. The second is trained on structured medical codes, generates a “high-dimensional vector representing the patient’s medical record,” and aims to predict medical events such as readmissions or lengthy hospital stays.
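The second category can be pictured as a two-step pipeline: reduce a patient’s coded history to a fixed-length vector, then score that vector with a classifier. The sketch below is a hypothetical illustration of that idea, not any specific vendor’s model; the vector values, weights and feature meanings are invented for demonstration.

```python
import math

# Hypothetical patient embedding, e.g., a fixed-length vector an LLM
# derives from structured medical codes. Values are invented.
patient_vector = [0.8, -0.2, 1.5, 0.0]

# Invented classifier parameters; a real system would learn these
# from historical outcomes.
weights = [0.5, 1.0, 0.7, -0.3]
bias = -1.2

def readmission_risk(vec):
    """Logistic score over the embedding: higher means greater predicted risk."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    return 1 / (1 + math.exp(-z))

print(f"Predicted readmission risk: {readmission_risk(patient_vector):.2f}")
```

The design point is that prediction happens on the vector, not the raw record: once the history is embedded, the same representation can feed many downstream predictors (readmission, length of stay and so on).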

“LLMs excel at summarizing information accurately and even suggesting decisions based on their analysis,” says Venky Ananth, executive vice president and global head of healthcare at Infosys.

As a result, LLMs are well positioned to address two valuable clinical use cases that otherwise involve heavy reading: disease management and prior authorization. “We’re seeing high adoption in the prior authorization space,” he says, “as this often involves wading through massive amounts of clinical data, physicians’ notes and lab results spread across dozens of pages.”

A 2023 paper in Communications Medicine listed three additional use cases for LLMs in improving patient care:

  • Better communication between patients and provider organizations, especially after business hours and for difficult conditions that carry a social stigma
  • Fast and accurate translations of medical information into “plain, everyday language” as well as languages other than English
  • Reporting, documentation and other administrative requirements, which researchers estimate can occupy at least 25 percent of a clinician’s workday


LLMs and EHR Integrations

The clinical documentation use case has seen significant activity in recent months. The news site Healthcare IT Today lists nearly 20 ambient clinical voice vendors, with AI-powered tools that document clinical visits and automatically generate summaries. These vendors range from recently launched startups to Nuance, a speech recognition company that debuted in 1992 and was acquired by Microsoft three decades later.

A major contributor to the success of these tools is their ability to integrate with electronic health record systems, the applications where clinical staff spend the bulk of their day. It comes as no surprise that major EHR vendors such as athenahealth, eClinicalWorks, Epic, NextGen Healthcare and Cerner owner Oracle Health have released (or are working on) their own ambient AI tools, often in conjunction with partners.

“Ambient listening is freeing up doctors and nurses from spending hours on tedious documentation, allowing them to dedicate more time to interact with patients as well as reduce burnout,” Ananth says.

An additional benefit to integrating LLMs into EHR systems is the ability to layer on models that read and respond to portal messages from patients. Some health systems have adopted such models to help clinicians address a growing influx of inbound communication, much of which must be addressed outside of normal business hours due to the busy nature of clinical practice.

In multiple studies, these LLMs have shown their effectiveness in responding to patients while reducing physician workloads, though concerns remain about accuracy and the amount of time clinical staff spend generating messages.

  • Mass General Brigham found ChatGPT’s messages were appropriate for patients without any edits from physicians 58 percent of the time, though about 8 percent of recommended responses posed a risk of harm. Overall, ChatGPT responses were more educational but less directive than responses from physicians. “LLM assistance is a promising avenue to reduce clinician workload but has implications that could have downstream effect on patient outcomes,” the paper concluded.
  • One Stanford study found physicians using AI models to generate responses spent 22 percent longer reading the messages, and sent messages that were 18 percent longer. Some messages needed to be edited to remove clinical advice outside the scope of the patient’s original question. That said, much of the additional language was attributed to “those extra personal touches that are highly valued by patients” in messages, such as an empathetic tone.
  • Another Stanford study pointed to improvements in perceived burnout and administrative burdens despite no time savings when using LLMs to respond to messages. “It may be that switching from writing to editing may be less cognitively taxing despite taking the same amount of time,” researchers concluded, adding that “perceptions of time may be different from time captured via EHR metadata.”

DISCOVER: Remote patient monitoring and AI personalize care.

AMIE and Other LLMs from Big Tech

Big technology companies are also getting into the LLM game. Amazon Web Services, Google and Microsoft all have released AI-powered documentation tools, though all three vendors have their eyes on bigger prizes.

In January, Google announced the Articulate Medical Intelligence Explorer. For now a research-only system, AMIE has been “optimized for diagnostic reasoning and conversations,” according to Google. It’s meant to help determine a patient’s possible diagnosis based on the information the patient provides to a text-based chat. This announcement came on the heels of MedLM, which HCA Healthcare has been piloting to support documentation in the emergency department.

Beyond the work of Nuance, Microsoft last October announced Azure AI Health Bot, a cloud service that comes with a symptom checker and medical database and is meant to help organizations develop their own LLMs. According to the company, insurers are using the service to help members check the status of a claim or see what services are covered under their insurance plan. Providers have implemented instances for letting patients find a nearby doctor or determine the appropriate care setting given their symptoms.

AWS is similarly focused on providing the foundation for cloud-based model development through its managed service known as Amazon Bedrock. Provider organizations have built AWS-hosted LLMs for data extraction and real-time analysis to create discharge summaries and identify at-risk patients, among other use cases.


LLMs, Data Privacy, Patient Safety and Hallucinations 

Earlier this year, the World Health Organization released guidelines for the ethical use of LLMs and other AI in healthcare. WHO recognized the potential for LLMs while highlighting a range of risks, including but not limited to inaccuracy, bias, lack of accountability, threats to privacy and a further widening of the digital divide. “How LMMs are accessed and used is new, with both novel benefits and risks that societies, health systems and end users may not yet be prepared to address fully,” WHO noted.

Ananth says it’s critical for both developers and users of LLMs to be accountable for their actions. This includes the harm that may be caused by using LLM outputs without having a human in the loop. He recommends six safeguards:

  • Set guidelines on where LLMs and generative AI can and cannot be used throughout the organization.
  • Use diverse data sets and apply both rigorous testing and human feedback to them for “reinforced learning.”
  • Protect data sets from cyberattacks by ensuring only authorized individuals and systems can access them and requiring identity verification prior to access.
  • Integrate “explainable AI techniques” so end users can understand why an LLM made a given recommendation.
  • Maintain a transparent development process with “continuous dialogue” about the capabilities and limitations of LLMs.
  • Monitor and evaluate the performance of LLMs, particularly in the way they impact outcomes, to maintain compliance with regulatory and ethical standards.

One area of concern for LLMs is “hallucinations” — model outputs that are flat-out wrong. (Think of images of people with seven fingers or three arms.) The stakes are certainly high in healthcare, particularly when it comes to making a diagnosis or determining a billing code. That explains in large part why physicians using LLMs to respond to patient portal messages take the time to review model outputs.

Prashant Natarajan, vice president of strategy and products for H2O.ai, says developers and users should recognize that hallucinations are an inherent part of LLMs, and keep that in mind as they deploy them.

“Generative AI models are designed to process large amounts of text data. They do a good job predicting the next token in a sequence,” he says, such as the letters most likely to come after “Q” in a word. “It’s not a mathematical prediction model.”

LLMs need to be tested, Natarajan says, and organizations need to look at the hallucinations that emerge. “In some cases, you want hallucinations because you can use known techniques to reduce them. You need to understand where hallucinations will be useful. You won’t know unless you do it.”
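Natarajan’s point about next-token prediction can be made concrete with a toy character-level model. The sketch below, a deliberately simplified stand-in for how production LLMs actually work, just counts which character follows each character in a small corpus and picks the most frequent one; the corpus is invented for illustration.

```python
from collections import Counter, defaultdict

# Invented toy corpus; any English text would do.
corpus = "queen quiz quick quote aqua request equal squad query quiet"

# Count which character follows each character (a bigram model).
follow_counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follow_counts[a][b] += 1

def predict_next(ch):
    """Return the character most likely to follow `ch` in the corpus."""
    return follow_counts[ch].most_common(1)[0][0]

print(predict_next("q"))  # in this corpus, "u" always follows "q"
```

The prediction is statistical, not factual: the model outputs whatever continuation is most probable given its training text, which is exactly why plausible-sounding but wrong outputs, hallucinations, are inherent to the approach rather than an occasional glitch.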

EXPLORE: Medical schools train the next generation of clinicians to better understand AI.

The Future of LLMs in Healthcare

An analysis from Stanford suggested there may be significant untapped potential for LLMs in healthcare. Many LLMs to date have been used for tasks such as augmenting diagnostics or communicating with patients. Far fewer models are addressing the administrative tasks that contribute to burnout in clinicians.

“We urgently need to set up evaluation loops for LLMs where models are built, implemented and then continuously evaluated via user feedback,” the study’s authors concluded.

Natarajan says LLMs are useful now, albeit in a limited context. The “frontier,” he says, is when LLMs are further embedded in the applications that clinicians and patients use every day, appearing to complete a task and then disappearing when they’re done.

“AI is moving to interaction, behavior, context and intelligent agents. It’s connecting to behaviors, reactions and emotions,” he says. “The world is expanding beyond writing emails.”
