Introduction to Summarizer Evaluation

Learn how to build, evaluate, and deploy a reliable LLM-powered meeting summarization agent using OpenAI and DeepEval.

OpenAI · DeepEval

note

If you're working with LLMs for summarization, this tutorial is for you. While we'll specifically focus on evaluating a meeting summarizer, the concepts and practices here can be applied to any LLM application tasked with summary generation.

Get Started

DeepEval is an open-source LLM evaluation framework that supports a wide range of metrics to help you evaluate and iterate on your LLM applications.
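To give a sense of what that looks like in practice, here is a minimal sketch of scoring a single summary with one of DeepEval's metrics (GEval). The transcript and summary strings are placeholders, and the exact API surface may vary slightly between deepeval versions:

```python
# Minimal sketch: scoring one summary with DeepEval's GEval metric.
# Requires an LLM judge (by default, an OpenAI key in your environment).
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Define what "good" means for this metric in plain language.
concision = GEval(
    name="Concision",
    criteria="Determine whether the actual output is a concise, faithful summary of the input transcript.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

# Wrap one transcript/summary pair as a test case (placeholder strings here).
test_case = LLMTestCase(
    input="<full meeting transcript goes here>",
    actual_output="<summary produced by your agent goes here>",
)

concision.measure(test_case)
print(concision.score, concision.reason)
```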


What You Will Evaluate

In this tutorial, you will build and evaluate a meeting summarization agent similar to those behind tools like Otter.ai and Circleback, which generate summaries and action items from meeting transcripts. You will use deepeval to evaluate the summarization agent's ability to generate:

  • A concise summary of the discussion
  • A clear list of action items

Below is an example of what a deliverable from a meeting summarization platform might look like:

(Image: an example meeting summary with its list of action items.)

In the next section, we'll build this summarization agent from scratch using the OpenAI API.
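As a preview, a bare-bones version of such an agent might look like the sketch below; the model name and prompt are illustrative assumptions rather than the exact code you'll write in the next section:

```python
# Sketch of a meeting summarizer built on the OpenAI Chat Completions API.
# Model name and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_meeting(transcript: str) -> str:
    """Return a concise summary followed by a list of action items."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Summarize the meeting transcript, then list the action items.",
            },
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```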

tip

If you already have an LLM agent to evaluate, you can skip to the Evaluation section of this tutorial.