RAG Agent Evaluation Tutorial
This tutorial walks you through the entire process of building a reliable RAG (Retrieval-Augmented Generation) QA Agent, from initial development to iterative improvement through deepeval's evaluations. We'll build this RAG QA Agent using OpenAI, LangChain and DeepEval.

This tutorial focuses on building a RAG-based QA agent for an infamous company called Theranos. However, the concepts and practices used throughout apply to any RAG-based application, so the tutorial should be useful whichever RAG system you are working on.
Overview
DeepEval is an open-source LLM evaluation framework that supports a wide range of metrics to help you evaluate and iterate on your LLM applications.
The tutorial is organized into four stages, and you can jump to whichever stage you need:
1. Develop Your RAG
- Build a RAG with OpenAI & LangChain
- Use OpenAI Embeddings
- Use LangChain's vector stores
- Create a full RAG QA Agent
2. Evaluate Your Retriever & Generator
- Define your evaluation criteria
- Evaluate your retriever and generator in isolation
- Evaluate your RAG as a whole
- Create datasets for robust eval pipelines
3. Improve Your RAG Using Evals
- Define your hyperparameters
- Test different configurations with DeepEval
- Find the best set of hyperparameters for your RAG
4. Deploy and Test Your RAG in Prod
- Trace your RAG components for each QA
- Choose the metrics to apply in prod
- Test your RAG for every new doc you push to your knowledge base
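To make the third stage concrete, here is a minimal sketch of a hyperparameter sweep in plain Python. It is not DeepEval's API: the grid keys (`chunk_size`, `top_k`, `model`), their values, and `placeholder_score` are all hypothetical stand-ins. In a real pipeline the scoring function would run your evaluation metrics against the RAG built with each configuration.

```python
from itertools import product

# Hypothetical hyperparameter grid; names and values are illustrative only.
GRID = {
    "chunk_size": [256, 512],
    "top_k": [2, 4],
    "model": ["model-a", "model-b"],
}


def configurations(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def best_configuration(grid, score_fn):
    """Score every configuration and return the (config, score) pair that scores highest."""
    scored = [(cfg, score_fn(cfg)) for cfg in configurations(grid)]
    return max(scored, key=lambda pair: pair[1])


def placeholder_score(cfg):
    # Stand-in for a real evaluation run (an assumption, not a real metric):
    # it simply prefers a larger top_k and a chunk_size close to 512.
    return cfg["top_k"] / 4 - abs(cfg["chunk_size"] - 512) / 1024


if __name__ == "__main__":
    config, score = best_configuration(GRID, placeholder_score)
    print(config, score)
```

Keeping the scoring function as a parameter means the sweep logic stays the same when the placeholder is swapped for a real evaluation.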
What You Will Evaluate
RAG (Retrieval-Augmented Generation) agents let companies build domain-specific assistants without fine-tuning large models. In this tutorial, you'll create a RAG QA agent that answers questions about Theranos, a blood diagnostics company. We will evaluate the agent's ability to:
- Generate relevant and accurate answers
- Provide correct citations for its answers
Below is an example of what Theranos's internal RAG QA agent might look like:
In the following sections of this tutorial, you'll learn how to build a reliable RAG QA Agent that retrieves correct data and generates an accurate answer based on the retrieved context.
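As a rough illustration of the retrieve-then-generate split and of what "correct citations" can mean, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the two documents are invented placeholder text (not real Theranos data), word overlap stands in for embedding similarity, and `generate` stands in for an LLM call. The agent built in the tutorial uses OpenAI and LangChain instead.

```python
import re

# Hypothetical knowledge base with invented placeholder text.
DOCS = {
    "doc-1": "The analyzer runs tests on small blood samples",
    "doc-2": "Lab reports are delivered within a few hours",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, top_k=1):
    """Rank documents by word overlap with the question (a stand-in for embedding similarity)."""
    question_words = tokenize(question)
    ranked = sorted(
        docs,
        key=lambda doc_id: len(question_words & tokenize(docs[doc_id])),
        reverse=True,
    )
    return ranked[:top_k]


def generate(question, docs, retrieved_ids):
    """Stand-in for an LLM call: echo the retrieved text and cite its sources."""
    answer = " ".join(docs[doc_id] for doc_id in retrieved_ids)
    return {"answer": answer, "citations": list(retrieved_ids)}


def citations_are_grounded(response, retrieved_ids):
    """True when at least one source is cited and every cited source was retrieved."""
    cited = set(response["citations"])
    return bool(cited) and cited <= set(retrieved_ids)


if __name__ == "__main__":
    question = "How are blood samples analyzed?"
    ids = retrieve(question, DOCS)
    response = generate(question, DOCS, ids)
    print(response, citations_are_grounded(response, ids))
```

Because the retriever and generator are separate functions, each can be checked on its own: the retriever by whether it returns the right documents, and the generator by whether its answer and citations stay within what was retrieved.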