RAG Agent Evaluation Tutorial

This tutorial walks you through the entire process of building a reliable RAG (Retrieval-Augmented Generation) QA Agent, from initial development to iterative improvement through deepeval's evaluations. We'll build this RAG QA Agent using OpenAI, LangChain and DeepEval.

Overview

DeepEval is an open-source LLM evaluation framework that supports a wide range of metrics to help you evaluate and iterate on your LLM applications.

What You Will Evaluate

RAG (Retrieval-Augmented Generation) agents let companies build domain-specific assistants without fine-tuning large models. In this tutorial, you'll create a RAG QA agent that answers questions about Theranos, a blood diagnostics company. You'll evaluate the agent's ability to:

  • Generate relevant and accurate answers
  • Provide correct citations for its answers

Below is an example of what Theranos's internal RAG QA agent might look like:

MadeUpCompany's RAG QA Agent

In the following sections of this tutorial, you'll learn how to build a reliable RAG QA Agent that retrieves correct data and generates an accurate answer based on the retrieved context.
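To make the retrieve-then-generate flow concrete before the real build, here is a minimal, dependency-free sketch. It is not DeepEval's API or the implementation used later in this tutorial. The `Document` and `AgentAnswer` types, the word-overlap retriever, the quote-the-passage "generator", and the placeholder corpus are all assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class AgentAnswer:
    answer: str
    citations: list


# Placeholder corpus, invented for illustration only.
CORPUS = [
    Document("doc-1", "The sample lab opens at nine on weekdays."),
    Document("doc-2", "Results are emailed within two business days."),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, corpus, top_k=1):
    """Toy retriever: rank documents by word overlap with the question."""
    query = tokenize(question)
    scored = [(len(query & tokenize(doc.text)), doc) for doc in corpus]
    # Keep only documents that share at least one word with the question.
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches[:top_k]]


def answer_question(question, corpus):
    """Stand-in for the LLM step: quote the top passage and cite its id."""
    retrieved = retrieve(question, corpus)
    if not retrieved:
        return AgentAnswer(answer="I don't know.", citations=[])
    return AgentAnswer(
        answer=retrieved[0].text,
        citations=[doc.doc_id for doc in retrieved],
    )


def citations_are_valid(result, corpus):
    """Check that the answer cites at least one document that exists."""
    known_ids = {doc.doc_id for doc in corpus}
    return bool(result.citations) and all(
        cid in known_ids for cid in result.citations
    )


result = answer_question("When are results emailed?", CORPUS)
print(result.answer, result.citations, citations_are_valid(result, CORPUS))
```

The two properties this tutorial evaluates map onto the two halves of the sketch: answer quality depends on what `retrieve` returns and how the generator uses it, while citation correctness is a separate check on the ids attached to the answer. In a real agent, an embedding-based retriever and an LLM call would replace the toy functions.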
