
RAG (Retrieval-Augmented Generation)

A pattern where an LLM retrieves relevant context from a knowledge base before generating a response, grounding output in source material.

Full definition

RAG (Retrieval-Augmented Generation) is the dominant architecture for grounding LLM output in factual sources. Instead of relying solely on what the model learned during training, a RAG system queries an external store — typically a vector database of embedded documents — fetches the most relevant chunks, and includes them in the prompt. The LLM then generates an answer constrained to that context. RAG is the go-to choice when knowledge changes frequently, when you need source attribution, or when the corpus is too large to fit in a context window. Quality is bottlenecked by retrieval, not generation, so most engineering effort goes into chunking, embedding choice, and reranking.
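The retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration, not a production pipeline: it uses a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database, and the function and variable names are invented for this example.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts. A real RAG system would call a
    # learned embedding model and store vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    # Rank document chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=2):
    # Assemble the augmented prompt: retrieved context + the question.
    # The prompt string then goes to the LLM, which is omitted here.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "RAG retrieves relevant context before generating an answer.",
    "Fine-tuning updates model weights on a task-specific dataset.",
    "Vector databases index embeddings for fast similarity search.",
]
print(build_prompt("how does RAG ground its answers?", chunks, k=1))
```

In a real system each stage here is a tuning surface: how documents are split into chunks, which embedding model is used, and whether a reranker rescores the top candidates before they reach the prompt.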

Frequently asked

What does RAG stand for?

Retrieval-Augmented Generation.

Is RAG better than fine-tuning?

For factual recall over changing knowledge, yes. For consistent output format or behavior, fine-tuning often wins.