
Jan 27 2025
Artificial Intelligence

How Does Retrieval-Augmented Generation (RAG) Support Healthcare AI Initiatives?

RAG takes large language models a step further by drawing on trusted sources of domain-specific information. This brings clear benefits to healthcare, where access to technical and medical data is critical for informed decision-making.

Large language models — generative artificial intelligence tools that process and generate text — are proving popular in healthcare. Because LLMs can respond to diverse prompts and process complex concepts, they show promise for augmenting medical research, patient education and clinical documentation. 

That said, models trained on general data often fall short on nuanced healthcare questions. A 2024 Mayo Clinic study found accuracy rates below 40% when responses from ChatGPT, Microsoft Bing Chat and Google Bard AI to kidney care questions were measured against in-depth literature searches. “In critical areas like healthcare decision making, the impact of such inaccuracies is considerably heightened,” the authors write, “highlighting the need for models that are more reliable and precise.”

This is where retrieval-augmented generation comes into play. RAG draws on additional, newer and domain-specific data sources. This lets an LLM parse more data than it was initially trained on and answer questions with greater accuracy and less bias — both of which are critical for ensuring the responsible use of generative AI in healthcare.

“The world we live in has its own set of canonical literature, whether it’s medical policies or claims processing manuals or technical literature,” says Corrine Stroum, head of emerging technology at SCAN Health Plan. “RAG will go to a trusted source of material and tell you, ‘This is where I found your answer.’”


What Is Retrieval-Augmented Generation?

With RAG, an LLM is better positioned to optimize its output before generating a response, says Tehsin Syed, Amazon Web Services’ general manager of health AI. This is valuable when a user is asking specific or technical questions.

“An authoritative external knowledge base is usually more current than the model’s training data, which is a key advantage,” he says. “For healthcare, this means LLMs can tap into the latest medical research, clinical guidelines and patient data to provide more accurate and contextually relevant responses.”

Along with improving accuracy, RAG can help organizations address concerns about bias in AI models that misrepresent risk and underestimate the need for care in minority populations. Where standard LLMs rely solely on pretrained knowledge, Syed explains, RAG allows organizations to “curate more diverse, representative knowledge bases” and enables users to trace responses back to the source of information.

It’s important to note that RAG goes beyond simply fine-tuning an existing LLM. Fine-tuning adapts a model to a specific domain and requires an extensive feedback loop of inputting additional training material and generating new questions and answers, Stroum says. Not surprisingly, that can be time-intensive and expensive.

RAG, on the other hand, doesn’t change the model but “augments its capabilities by retrieving and incorporating external information at runtime,” Syed says. “This approach offers greater flexibility, allowing the model to access the most current information without needing to be retrained.”

EXPLORE: Here are three areas where RAG implementation can be improved.

Benefits of RAG for Healthcare Institutions

By pulling in up-to-date information, RAG is meant to address the limitations of more traditional LLMs that don’t have access to the latest medical research, Syed says. Use cases for integrating the Amazon Comprehend Medical natural language processing service into a RAG workflow include automating medical coding, generating clinical summaries, analyzing medication side effects and deploying decision support systems.
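As a rough illustration of that extraction step, the sketch below uses the Comprehend Medical client in boto3 to pull structured entities out of a clinical note, the kind of output that might feed medical coding or a RAG index. The note text, region and downstream use are assumptions for the example, not details from the article; the detect_entities_v2 call is the service's standard entity-detection API.

```python
# A minimal sketch: extracting structured entities from a clinical note
# with Amazon Comprehend Medical before handing them to a RAG workflow.
# AWS credentials are assumed to be configured; the note is invented.
import boto3

client = boto3.client("comprehendmedical", region_name="us-east-1")

note = "Patient reports nausea after starting 500 mg metformin twice daily."

# detect_entities_v2 returns medications, conditions and related
# attributes such as dosage, each with a confidence score.
response = client.detect_entities_v2(Text=note)

for entity in response["Entities"]:
    print(entity["Category"], "->", entity["Text"])
# Expected output along the lines of:
#   MEDICATION -> metformin
#   MEDICAL_CONDITION -> nausea
```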

Internally, RAG makes it possible for LLMs to pull in patient records and other confidential sources that general-purpose LLMs were never trained on. Health systems can use RAG to create highly personalized patient education materials, Syed notes. 

This highlights a key benefit of RAG: its ability to navigate unstructured data. Stroum points to evidence-of-coverage documents; an insurer operating in multiple states can easily have hundreds of these. With RAG, it’s possible to prompt a model to pull up the copay for a specific procedure under a specific plan in a specific county, as sketched below.
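A hypothetical sketch of how such a lookup might narrow the field before any semantic search runs; the plans, counties and copays below are invented for illustration:

```python
# Hypothetical sketch: narrowing a large set of evidence-of-coverage
# documents by structured metadata before semantic search ranks them.
documents = [
    {"plan": "Gold HMO", "county": "Maricopa", "state": "AZ",
     "text": "Outpatient MRI copay: $75 per visit."},
    {"plan": "Gold HMO", "county": "Pima", "state": "AZ",
     "text": "Outpatient MRI copay: $60 per visit."},
    {"plan": "Silver PPO", "county": "Maricopa", "state": "AZ",
     "text": "Outpatient MRI copay: $120 per visit."},
]

def filter_candidates(plan: str, county: str) -> list[dict]:
    # Metadata filter: only documents for the right plan and county
    # are passed along as context for the model to read.
    return [d for d in documents
            if d["plan"] == plan and d["county"] == county]

candidates = filter_candidates("Gold HMO", "Pima")
print(candidates[0]["text"])  # Outpatient MRI copay: $60 per visit.
```

In practice, the filter would run against metadata stored alongside the vectors, with similarity search then ranking the filtered set.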

RAG is also a significant step forward from traditional search functionality, which struggles to recognize that differences between verb tenses (such as ran and run) shouldn’t necessarily impact search results.

“Today’s models can see what you’re asking, and they’re more forgiving,” Stroum says.

As a result, RAG is more accessible to less tech-savvy end users who might otherwise get frustrated. It also allows for more in-depth prompts. An HR team, for example, can search a repository of resumes for candidates with at least three years of experience in Current Procedural Terminology coding. “RAG still uses the base expectations of the language model, but now you can modulate the level of the conversation,” Stroum adds.
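A toy illustration of the verb-tense gap Stroum describes, contrasting exact keyword matching with the meaning-based matching an embedding search provides; the record text and query are invented:

```python
# Toy illustration: an exact-match keyword search misses a relevant
# record because "run" and "ran" are different strings.
records = ["Patient ran a low-grade fever overnight"]
query_terms = ["run", "fever"]

keyword_hits = [
    r for r in records
    if all(term in r.lower().split() for term in query_terms)
]
print(keyword_hits)  # [] -- "ran" never literally matches "run"

# An embedding model maps "ran a fever" and "run a fever" to nearby
# vectors, so a similarity search would still surface this record.
```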


Building and Deploying a RAG Pipeline

While RAG may represent the latest in search technology, the most important principle in deploying it is the familiar adage: garbage in, garbage out.

“The most critical thing isn’t the AI but the knowledge repository you hook it up to,” Stroum says. A RAG pipeline is easy to create with just a few clicks. The challenge is scrutinizing the knowledge base: addressing multiple versions of the same document, for example, or weeding out outdated documents such as expired contracts. “Redundant, trivial and obsolete data will destroy RAG.”
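A minimal, hypothetical hygiene pass along those lines, dropping expired documents and exact duplicates before anything reaches the vector database; the field names and dates are assumptions for the example:

```python
# Hypothetical pre-ingestion hygiene pass: drop expired documents and
# exact duplicates before they ever reach the RAG knowledge base.
import hashlib
from datetime import date

documents = [
    {"text": "2023 claims manual, v2", "expires": date(2024, 12, 31)},
    {"text": "2025 claims manual", "expires": date(2025, 12, 31)},
    {"text": "2025 claims manual", "expires": date(2025, 12, 31)},  # duplicate
]

def clean(docs: list[dict], today: date) -> list[dict]:
    seen: set[str] = set()
    kept = []
    for doc in docs:
        if doc["expires"] < today:
            continue  # obsolete: expired contracts, outdated policies
        digest = hashlib.sha256(doc["text"].encode()).hexdigest()
        if digest in seen:
            continue  # redundant: exact-duplicate content
        seen.add(digest)
        kept.append(doc)
    return kept

print(len(clean(documents, date(2025, 6, 1))))  # 1 document survives
```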

From there, the RAG process consists of four steps, Syed says.

  1. Create embeddings, or numeric representations of text, and ingest the documents into a vector database. This step requires significant data cleansing and formatting, but it happens only once.
  2. Submit a query in natural language. This step, as well as the two that follow, occurs every time a user conducts a search.
  3. Use an orchestrator to perform a similarity search in the vector database, retrieve relevant data and add that context to the prompt and query.
  4. Use the orchestrator to send the query and context to the LLM, which then generates a response.
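A self-contained toy sketch of those four steps follows. The bag-of-words embedding, in-memory index and stubbed model call are stand-ins for a real embedding model, vector database and LLM endpoint, and the documents are invented:

```python
# Toy end-to-end sketch of the four-step RAG flow described above.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Step 1: a numeric representation of text (here, simple token
    # counts standing in for a learned embedding).
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1 (happens once): ingest documents into the "vector database".
documents = [
    "Plan members pay a $40 copay for outpatient imaging.",
    "Prior authorization is required for inpatient rehabilitation.",
]
index = [(doc, embed(doc)) for doc in documents]

def call_llm(prompt: str) -> str:
    # Stub for any model endpoint (e.g., a hosted or on-premises LLM).
    return f"[stubbed model response to a {len(prompt)}-character prompt]"

def answer(query: str, k: int = 1) -> str:
    # Step 2: the user's natural-language query arrives.
    q_vec = embed(query)
    # Step 3: the orchestrator runs a similarity search and adds the
    # best-matching documents to the prompt as context.
    ranked = sorted(index, key=lambda pair: cosine(q_vec, pair[1]),
                    reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:k])
    # Step 4: the orchestrator sends query plus context to the LLM.
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {query}")
    return call_llm(prompt)

print(answer("What is the copay for outpatient imaging?"))
```

In a real deployment, an orchestration framework and a purpose-built vector database replace the hand-rolled loop here, but the data flow through the four steps is the same.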

From the user’s perspective, Syed says, using RAG is like interacting with any LLM. “However, the system knows much more about the content in question and provides answers that are fine-tuned to the organization's knowledge base,” he adds.

This may upend end users’ conception of traditional software, Stroum says. A typical application will provide the same answer if a user repeatedly asks the same question. By contrast, RAG may provide a slightly different answer as information and context change.

As a result, Stroum says, using RAG should be more like an open-ended conversation between the application and the end user. “You can give the model feedback. You can alter a prompt. You can ask a follow-up question,” she says. “You can say, ‘Yes, this is diabetes,’ or ‘No, this isn’t diabetes,’ and that will help the model understand.”

UP NEXT: Check out this overview of 2025 AI trends in healthcare.
