RAG Agent Evaluation Tutorial

This tutorial walks you through the entire process of building a reliable RAG (Retrieval-Augmented Generation) QA Agent, from initial development to iterative improvement driven by DeepEval's evaluations. We'll build this RAG QA Agent using OpenAI, LangChain, and DeepEval.

Note

This tutorial builds a RAG-based QA agent for Theranos, an infamous blood diagnostics company. However, the concepts and practices used throughout apply to any RAG-based application, so the tutorial should be useful whatever RAG system you are working on.

Overview

DeepEval is an open-source LLM evaluation framework that supports a wide range of metrics to help you evaluate and iterate on your LLM applications.

You can click on the links below and jump to any stage of this tutorial as you like:

1. Develop Your RAG

Build a RAG with OpenAI & LangChain
Use OpenAI Embeddings
Use LangChain's vector stores
Create a full RAG QA Agent

2. Evaluate Your Retriever & Generator

Define your evaluation criteria
Evaluate your retriever and generator in isolation
Evaluate your RAG as a whole
Create datasets for robust eval pipelines

3. Improve your RAG using evals

Define your hyperparameters
Test different configurations with DeepEval
Find the best set of hyperparameters for your RAG

4. Deploy and test your RAG in prod

Trace your RAG components for each QA
Choose the metrics to apply in prod
Test your RAG for every new doc you push to your knowledge base
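To make stages 2 and 3 more concrete, here is a minimal, library-free sketch of scoring a retriever in isolation and sweeping a small grid of hyperparameters. It is not DeepEval's API: the golden dataset, the chunk ids, the ranked results, and the `chunk_size` and `top_k` hyperparameters are all made-up assumptions, and recall@k stands in for whichever retriever metric you end up choosing.

```python
from itertools import product

# Hypothetical golden dataset: the chunk ids a good retriever should return.
GOLDENS = [
    {"question": "q1", "expected_chunks": {"c1", "c2"}},
    {"question": "q2", "expected_chunks": {"c3"}},
]

# Made-up ranked retrieval results per chunk size. In a real pipeline these
# would come from re-indexing the knowledge base and querying the vector store.
RANKED_RESULTS = {
    256: {"q1": ["c1", "c9", "c2"], "q2": ["c8", "c3", "c7"]},
    512: {"q1": ["c1", "c2", "c9"], "q2": ["c3", "c8", "c7"]},
}

def recall_at_k(retrieved, expected, k):
    """Fraction of expected chunks found in the top-k retrieved chunks."""
    hits = expected & set(retrieved[:k])
    return len(hits) / len(expected)

def score_config(chunk_size, top_k):
    """Mean recall@k over the golden dataset for one configuration."""
    scores = [
        recall_at_k(RANKED_RESULTS[chunk_size][g["question"]], g["expected_chunks"], top_k)
        for g in GOLDENS
    ]
    return sum(scores) / len(scores)

# Hypothetical hyperparameter grid.
GRID = {"chunk_size": [256, 512], "top_k": [1, 2]}

def sweep(grid):
    """Score every combination in the grid, best configuration first."""
    names = list(grid)
    results = []
    for values in product(*grid.values()):
        config = dict(zip(names, values))
        results.append((score_config(**config), config))
    return sorted(results, key=lambda r: r[0], reverse=True)

results = sweep(GRID)
best_score, best_config = results[0]
print(best_config, best_score)
```

The same loop shape carries over when the scoring function is replaced by real evaluation runs; only `score_config` needs to change.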

What You Will Evaluate

RAG (Retrieval-Augmented Generation) agents let companies build domain-specific assistants without fine-tuning large models. In this tutorial, you'll create a RAG QA agent that answers questions about Theranos, a blood diagnostics company. We will evaluate the agent's ability to:

Generate relevant and accurate answers
Provide correct citations for its answers
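As a rough illustration of the citation criterion, the sketch below checks only that every citation marker in an answer points at a chunk that was actually retrieved. The `[n]` citation format, the function names, and the placeholder chunks are assumptions made for this example; whether a cited chunk really supports the claim is a separate question that needs a proper evaluation metric.

```python
import re

def cited_ids(answer):
    """Extract numeric citation markers such as [1] or [2] from an answer."""
    return {int(m) for m in re.findall(r"\[(\d+)\]", answer)}

def check_citations(answer, retrieved_chunks):
    """Compare cited markers against the 1-based positions of retrieved chunks."""
    valid = set(range(1, len(retrieved_chunks) + 1))
    cited = cited_ids(answer)
    return {
        "has_citation": bool(cited),
        "dangling": sorted(cited - valid),  # cited but never retrieved
        "uncited": sorted(valid - cited),   # retrieved but never cited
    }

# Placeholder data, not real knowledge-base content.
chunks = ["placeholder chunk one", "placeholder chunk two"]
answer = "The first document covers this [1]. See also [3]."

report = check_citations(answer, chunks)
print(report)
```

A non-empty `dangling` list is a cheap signal that the generator invented a source, which can be caught before running any model-based metric.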

Below is an example of what Theranos's internal RAG QA agent might look like:

MadeUpCompany's RAG QA Agent

In the following sections of this tutorial, you'll learn how to build a reliable RAG QA Agent that retrieves correct data and generates an accurate answer based on the retrieved context.
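Before reaching for real components, it can help to see the retrieve-then-generate shape in miniature. The sketch below uses a toy word-overlap retriever as a stand-in for OpenAI embeddings and a LangChain vector store, and it stops at building the prompt rather than calling a model. The documents, function names, and prompt wording are all placeholders invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(question, documents, top_k=2):
    """Rank documents by word overlap with the question (toy retriever)."""
    q = Counter(tokenize(question))
    scored = []
    for doc_id, text in documents.items():
        overlap = sum((q & Counter(tokenize(text))).values())
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in scored[:top_k] if overlap > 0]

def build_prompt(question, documents, doc_ids):
    """Assemble the generator prompt from the retrieved context."""
    context = "\n".join(
        f"[{i}] {documents[d]}" for i, d in enumerate(doc_ids, start=1)
    )
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Placeholder knowledge base, not real company documents.
DOCS = {
    "doc_a": "Placeholder text about sample collection procedures.",
    "doc_b": "Placeholder text about device calibration steps.",
    "doc_c": "Placeholder text about office holiday schedule.",
}

question = "How does device calibration work?"
doc_ids = retrieve(question, DOCS, top_k=2)
prompt = build_prompt(question, DOCS, doc_ids)
print(prompt)
```

Keeping retrieval and prompt construction as separate functions mirrors the split between retriever and generator, which is what makes it possible to evaluate each in isolation later.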