RAG (retrieval-augmented generation)

An AI architecture where the model retrieves relevant documents from a knowledge base before generating an answer. The standard way to ground LLM responses in real, current business facts.

What it means

RAG is the architectural pattern behind most production AI agents. Instead of relying on the LLM's general training knowledge (which can be outdated, incorrect, or generic), RAG runs every customer question through a two-step process: first retrieve the most relevant documents from a connected knowledge base, then generate the response using both the question and the retrieved documents as context.
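A minimal sketch of that two-step loop, in Python and purely for illustration: the retriever below is a naive word-overlap scorer, and generate() just builds the prompt a real system would send to an LLM. None of these names belong to a specific product or library.

```python
import string

def words(text: str) -> set[str]:
    """Lowercase words with punctuation stripped, for crude overlap matching."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def retrieve(question: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Step 1: return the top_k documents sharing the most words with the question."""
    q_words = words(question)
    ranked = sorted(documents, key=lambda d: len(q_words & words(d)), reverse=True)
    return ranked[:top_k]

def generate(question: str, context: list[str]) -> str:
    """Step 2: stand-in for the LLM call; a real system sends this prompt to a model."""
    return ("Answer the customer using only the context below.\n\n"
            "Context:\n" + "\n".join(context) +
            "\n\nQuestion: " + question)

def answer(question: str, knowledge_base: list[str]) -> str:
    docs = retrieve(question, knowledge_base)  # retrieve first...
    return generate(question, docs)            # ...then generate with the docs as context
```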

The knowledge base can be plain documents, structured data (CRM tables, product catalogs), live API responses, or any combination. The retrieval layer typically uses vector embeddings to find semantically similar content even when wording differs.
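The word-overlap scorer in the sketch above misses rephrasings; real retrieval layers compare vectors instead. Here is a toy version of that comparison, assuming the question and documents have already been turned into vectors by some embedding model (producing those vectors is out of scope here):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """How strongly two vectors point the same way (1.0 means identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(question_vec: list[float],
                    doc_vecs: list[list[float]],
                    docs: list[str],
                    top_k: int = 3) -> list[str]:
    """Rank documents by how close their vectors sit to the question's vector."""
    ranked = sorted(zip(docs, doc_vecs),
                    key=lambda pair: cosine_similarity(question_vec, pair[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```

Because similarity is measured between vectors rather than exact words, a question about "cost" can still surface a document that only says "price".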

Why it matters

RAG is what makes an AI agent reliably accurate on business-specific questions. A bare LLM does not know your prices, your policies, your services. A RAG agent looks them up before answering, every single time.

RAG also makes the agent updateable. Change a price in the knowledge base, and the agent reflects it in seconds. Compare that to fine-tuning, where the same update would require retraining the model.

Example

A clinic's RAG agent: a customer asks 'How much for whitening?' The system searches the knowledge base, finds the price-sheet document, and passes it to the LLM as context. The LLM replies 'Whitening starts at $480 for the basic package.' Swap the price-sheet document and the answer changes immediately.
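The same flow, run through the toy answer() sketch from the "What it means" section (the knowledge-base entries are invented for illustration, reusing the $480 figure from the example):

```python
knowledge_base = [
    "Teeth whitening: basic package starts at $480.",
    "Opening hours: Mon-Fri 9am-6pm.",
]
print(answer("How much for whitening?", knowledge_base))
# The price-sheet entry ranks first and lands in the prompt, so a real model's
# reply would be grounded in the current $480 figure. Edit that entry and ask
# again: the next answer reflects the new price, with no retraining.
```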
