RAG (Retrieval-Augmented Generation)
A pattern where an LLM retrieves relevant context from a knowledge base before generating a response, grounding output in source material.
Full definition
RAG (Retrieval-Augmented Generation) is the dominant architecture for grounding LLM output in factual sources. Instead of relying solely on what the model learned during training, a RAG system queries an external store — typically a vector database of embedded documents — fetches the most relevant chunks, and includes them in the prompt. The LLM then generates an answer constrained to that context. RAG is the go-to choice when knowledge changes frequently, when you need source attribution, or when the corpus is too large to fit in a context window. Quality is bottlenecked by retrieval, not generation, so most engineering effort goes into chunking, embedding choice, and reranking.
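The retrieve-then-prompt loop described above can be sketched in a few lines. This is a toy illustration, not a production pattern: the bag-of-words `embed` stands in for a trained embedding model, the in-memory list stands in for a vector database, and the function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts.
    # Real systems use a trained embedding model and store vectors in a vector DB.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank chunks by similarity to the query and return the top-k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    # Put the retrieved chunks into the prompt so the answer is grounded in them.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG retrieves relevant chunks before generation.",
    "Fine-tuning adjusts model weights on a dataset.",
    "Vector databases store document embeddings.",
]
print(build_prompt("How does RAG retrieve context", docs))
```

Every quality lever named above lives in this loop: chunking decides what each element of `docs` contains, embedding choice replaces `embed`, and reranking would reorder the `retrieve` output with a stronger model before prompting.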
Frequently asked
What does RAG stand for?
Retrieval-Augmented Generation.
Is RAG better than fine-tuning?
For factual recall over a changing knowledge base, yes. For enforcing a consistent output format or behavior, fine-tuning often wins; the two are complementary and frequently combined.