1. What Is Retrieval-Augmented Generation?
RAG is a way to leverage already-trained large language models such as GPT-4, Gemini, Bard and Llama when building applications powered by artificial intelligence. By adding local knowledge (such as hospital policy or protocol information), context (such as clinician profile information) or history (such as patient clinical data), RAG augments LLMs to avoid common AI problems, such as a lack of specific information and LLM hallucinations.
2. How Does RAG Work?
RAG essentially “wraps” an LLM by adding relevant information to the prompt (query) that is sent to the LLM. For example, suppose a clinician wants to ask a question: “Should I increase the dosage of this drug for this patient?” With RAG, the question is processed first to understand the kind of query and specifics being addressed. Then, the RAG tool might retrieve the hospital protocol for the drug, manufacturer’s recommendations, patient history and recent lab results, sending all of this to the LLM along with the question from the clinician. This gives the LLM local knowledge, context and history to help answer the question. All of this is invisible to the clinician because the RAG wrapper is doing the work of choosing what to send to the LLM.
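To make that flow concrete, here is a minimal sketch of such a wrapper, assuming a hypothetical protocol store and clinical database. The helper functions, their names and the `llm` callable are illustrative stand-ins, not a specific product's API.

```python
# Minimal RAG "wrapper" sketch: retrieve local knowledge and patient context,
# fold it into the prompt, then call the LLM. All names here are assumptions.

def fetch_drug_protocol(drug: str) -> str:
    # Assumed lookup against the hospital's protocol store.
    return f"Hospital protocol text for {drug}..."

def fetch_patient_context(patient_id: str) -> str:
    # Assumed lookup against the clinical database (history, recent labs).
    return f"Recent history and lab results for patient {patient_id}..."

def answer_with_rag(question: str, drug: str, patient_id: str, llm) -> str:
    """Assemble local knowledge, context and history into one prompt."""
    retrieved = "\n\n".join([
        fetch_drug_protocol(drug),
        fetch_patient_context(patient_id),
    ])
    prompt = (
        "Use only the context below to answer the clinician's question.\n\n"
        f"Context:\n{retrieved}\n\n"
        f"Question: {question}"
    )
    # `llm` is any callable that takes a prompt string and returns text.
    return llm(prompt)
```

The clinician only ever sees the question and the answer; the retrieval and prompt assembly happen inside the wrapper.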
3. How Does RAG Compare to Fine-Tuning?
Fine-tuning an existing LLM adds information to the model itself, usually private data, by further training its weights. This can be useful for making the LLM better at specific tasks. RAG instead supplies up-to-date and contextually useful information at the moment the LLM is queried. The patient data isn’t saved in the model, yet the model always has the latest information it needs to help answer questions, which also reduces the risk of confidential data leaking out through the model itself.
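A short sketch of that difference, under the same assumptions as above: with RAG, patient data is fetched fresh on every call and lives only in the prompt, while fine-tuning would bake a snapshot of the data into the model's weights at training time.

```python
# With RAG, confidential data is re-fetched per query and never written into
# model weights. The lab-system lookup below is an assumed placeholder.

from datetime import date

def latest_lab_results(patient_id: str) -> str:
    # Assumed real-time query against the lab system.
    return f"Lab results for patient {patient_id} as of {date.today()}"

def rag_query(question: str, patient_id: str, llm) -> str:
    context = latest_lab_results(patient_id)  # re-fetched on every call
    return llm(f"Context:\n{context}\n\nQuestion: {question}")

# A fine-tuned model, by contrast, answers from whatever data snapshot it was
# trained on, and that data persists inside the model.
```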
4. What Are the Benefits of Using RAG?
RAG extends the value of LLMs by giving the model additional information: local, relevant documents and protocols as well as real-time information from clinical databases. Clinicians and researchers can ask questions based on what is happening right now, not on what was true when the LLM was trained. This additional information lets the LLM deliver answers that are both more relevant and more accurate. IT teams can also build in higher levels of security and tighter access controls, feeding in only information that the person asking the question is allowed to know.
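One way such access control might look in practice is to filter retrieved documents against the requester's permissions before anything reaches the LLM. The document structure and role names in this sketch are assumptions, not a prescribed design.

```python
# Permission-aware retrieval sketch: restricted documents never make it into
# the prompt. Relevance ranking is omitted to keep the focus on access control.

from dataclasses import dataclass

@dataclass
class Document:
    text: str
    allowed_roles: set[str]

def retrieve_for_user(query: str, user_roles: set[str],
                      all_docs: list[Document]) -> list[str]:
    """Return only documents the requesting user is permitted to see."""
    return [d.text for d in all_docs if d.allowed_roles & user_roles]

docs = [
    Document("Dosing protocol for drug X", {"clinician", "pharmacist"}),
    Document("Patient 123 psychiatric notes", {"attending_physician"}),
]

print(retrieve_for_user("drug X dosage", {"clinician"}, docs))
# Only the dosing protocol is returned; the restricted notes never reach the prompt.
```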
5. What Are the Challenges of RAG?
RAG applications must preprocess user prompts to decide what additional information to retrieve and pass along. That can be a difficult job, and there’s a risk that the RAG application will retrieve the wrong data. Also, giving an LLM additional information doesn’t guarantee that it will correctly interpret that data and incorporate it into its response.
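As a toy illustration of that preprocessing step, a RAG layer might classify the question to decide which sources to query. The keyword rules and source names below are illustrative assumptions; production systems often use an intent classifier or a separate LLM call for this step, and getting it wrong is exactly the failure mode described above.

```python
# Toy retrieval-planning sketch: decide which knowledge sources to query
# based on the wording of the question. Rules and source names are assumed.

def plan_retrieval(question: str) -> list[str]:
    """Decide which knowledge sources the RAG layer should query."""
    q = question.lower()
    sources = []
    if "dosage" in q or "dose" in q:
        sources += ["drug_protocols", "manufacturer_recommendations"]
    if "patient" in q:
        sources += ["patient_history", "recent_labs"]
    return sources or ["general_guidelines"]  # fallback when nothing matches

print(plan_retrieval("Should I increase the dosage of this drug for this patient?"))
# ['drug_protocols', 'manufacturer_recommendations', 'patient_history', 'recent_labs']
```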