Introduction to Summarizer Evaluation
Learn how to build, evaluate, and deploy a reliable LLM-powered meeting summarization agent using OpenAI and DeepEval.

If you're working with LLMs for summarization, this tutorial is for you. While we'll specifically focus on evaluating a meeting summarizer, the concepts and practices here can be applied to any LLM application tasked with summary generation.
Get Started
DeepEval is an open-source LLM evaluation framework that supports a wide range of metrics to help you evaluate and iterate on your LLM applications.
This tutorial is organized into the following stages:
1. Build your summarizer
- Use OpenAI to build a summarizer
- Learn modular coding techniques to improve your summarizer
- Learn parsing techniques to build production-grade LLM applications
2. Evaluate your summarizer
- Learn how to define your evaluation criteria
- Create test cases using your summarizer
- Run your first eval
- Create datasets for future evaluations
3. Changing your model and prompts
- Use evaluation scores to improve your summarizer
- Iterate over different models to find the best one for your use case
- Change your system prompts and check for regressions
4. Set up evals in production
- Trace your entire application workflow
- Choose your metrics and evaluate your summarizer in production
- Set up CI/CD workflows to keep your summaries reliable
What You Will Evaluate
In this tutorial, you will build and evaluate a meeting summarization agent, similar to the ones that tools like Otter.ai and Circleback use to generate summaries and action items from meeting transcripts. You will use deepeval to evaluate the summarization agent's ability to generate:
- A concise summary of the discussion
- A clear list of action items
Below is an example of what a deliverable from a meeting summarization platform might look like:
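As a rough illustration of the two deliverables listed above, here is a minimal Python sketch of how a summarizer's output could be represented and parsed. The `MeetingSummary` class, the `parse_summary` helper, the JSON field names, and the sample output are all hypothetical and chosen for illustration; they are not part of DeepEval or the OpenAI API, and the summarizer built in this tutorial may structure its output differently.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeetingSummary:
    """Hypothetical container for the two deliverables: summary and action items."""
    summary: str
    action_items: List[str] = field(default_factory=list)


def parse_summary(raw: str) -> MeetingSummary:
    """Parse a JSON string into a MeetingSummary.

    Assumes (for illustration only) that the model returns an object with
    a "summary" string and an "action_items" list.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    summary = str(data.get("summary", "")).strip()
    if not summary:
        raise ValueError("missing or empty 'summary' field")

    items = data.get("action_items", [])
    if not isinstance(items, list):
        raise ValueError("'action_items' must be a list")

    # Drop blank entries so the action list stays clean.
    cleaned = [str(item).strip() for item in items if str(item).strip()]
    return MeetingSummary(summary=summary, action_items=cleaned)


# Made-up model output, for illustration only.
raw_output = json.dumps({
    "summary": "The team agreed to ship the beta on Friday.",
    "action_items": ["Maya: finalize release notes", "Arjun: run regression tests"],
})

result = parse_summary(raw_output)
print(result.summary)
for item in result.action_items:
    print(f"- {item}")
```

Validating the structure up front like this means a malformed response fails at parse time rather than surfacing later as a confusing evaluation result.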
In the next section, we'll build this summarization agent from scratch using the OpenAI API.
If you already have an LLM agent to evaluate, you can skip to the evaluation section of this tutorial.