Introduction to Summarizer Evaluation

Learn how to build, evaluate, and deploy a reliable LLM-powered meeting summarization agent using OpenAI and DeepEval.

OpenAI · DeepEval

note

If you're working with LLMs for summarization, this tutorial is for you. While we'll specifically focus on evaluating a meeting summarizer, the concepts and practices here can be applied to any LLM application tasked with summary generation.

Get Started

DeepEval is an open-source LLM evaluation framework that supports a wide range of metrics to help you evaluate and iterate on your LLM applications.
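To give a sense of what that looks like in practice, here is a minimal sketch of scoring a single summary with one of DeepEval's metrics (GEval). The transcript and summary strings are placeholders, and the exact API surface may vary slightly between deepeval versions:

```python
# Minimal sketch: scoring one summary with DeepEval's GEval metric.
# Requires an LLM judge (by default, an OpenAI key in your environment).
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# Define what "good" means for this metric in plain language.
concision = GEval(
    name="Concision",
    criteria="Determine whether the actual output is a concise, faithful summary of the input transcript.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

# Wrap one transcript/summary pair as a test case (placeholder strings here).
test_case = LLMTestCase(
    input="<full meeting transcript goes here>",
    actual_output="<summary produced by your agent goes here>",
)

concision.measure(test_case)
print(concision.score, concision.reason)
```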


What You Will Evaluate

In this tutorial, you will build and evaluate a meeting summarization agent similar to those behind tools like Otter.ai and Circleback, which generate summaries and action items from meeting transcripts. You will use deepeval to evaluate the summarization agent's ability to generate:

  • A concise summary of the discussion
  • A clear list of action items

Below is an example of what a deliverable from a meeting summarization platform might look like:

(Image: an example meeting summary with its list of action items.)

In the next section, we'll build this summarization agent from scratch using the OpenAI API.
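As a preview, a bare-bones version of such an agent might look like the sketch below; the model name and prompt are illustrative assumptions rather than the exact code you'll write in the next section:

```python
# Sketch of a meeting summarizer built on the OpenAI Chat Completions API.
# Model name and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_meeting(transcript: str) -> str:
    """Return a concise summary followed by a list of action items."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Summarize the meeting transcript, then list the action items.",
            },
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```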

tip

If you already have an LLM agent to evaluate, you can skip to the Evaluation section of this tutorial.