RAG Agent Evaluation Tutorial

This tutorial walks you through the entire process of building a reliable RAG (Retrieval-Augmented Generation) QA Agent, from initial development to iterative improvement through deepeval's evaluations. We'll build this RAG QA Agent using OpenAI, LangChain and DeepEval.


note

This tutorial focuses on building a RAG-based QA agent for Theranos, an infamous blood diagnostics company. However, the concepts and practices used throughout apply to any RAG-based application, so this tutorial will be helpful for any RAG work.

Overview

DeepEval is an open-source LLM evaluation framework that supports a wide range of metrics to help you evaluate and iterate on your LLM applications.


What You Will Evaluate

RAG (Retrieval-Augmented Generation) agents let companies build domain-specific assistants without fine-tuning large models. In this tutorial, you'll create a RAG QA agent that answers questions about Theranos, a blood diagnostics company. We will evaluate the agent's ability to:

  • Generate relevant and accurate answers
  • Provide correct citations for its answers
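To make these two criteria concrete, here is a toy, library-free sketch of what checking them might look like. The word-overlap scoring and exact-match citation check are illustrative simplifications (the tutorial itself uses deepeval's LLM-based metrics), and all strings below are made up.

```python
def tokens(text: str) -> set[str]:
    """Lowercased words with trailing punctuation stripped."""
    return {word.strip(".,!?").lower() for word in text.split()}

def overlap_score(answer: str, context: str) -> float:
    """Fraction of answer words that also appear in the retrieved context."""
    answer_words = tokens(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & tokens(context)) / len(answer_words)

retrieved_context = "Theranos develops blood diagnostics technology."
answer = "Theranos works on blood diagnostics."
citation = "Theranos develops blood diagnostics technology."

relevance = overlap_score(answer, retrieved_context)  # 3 of 5 answer words match
cites_correctly = citation in retrieved_context       # naive exact-match check
```

In practice, both checks are done by LLM-based metrics rather than string matching, which is exactly what deepeval provides.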

Below is an example of what Theranos's internal RAG QA agent might look like:

Theranos's RAG QA Agent

In the following sections of this tutorial, you'll learn how to build a reliable RAG QA Agent that retrieves correct data and generates an accurate answer based on the retrieved context.