OpenAI Agents
OpenAI Agents is OpenAI's lightweight framework for building agentic workflows in which LLM-powered agents use tools and hand tasks off to one another.
End-to-End Evals
deepeval allows you to evaluate OpenAI Agents end-to-end in under a minute.
Configure OpenAI Agents
main.py
from agents import Runner, add_trace_processor
from deepeval.openai_agents import Agent, DeepEvalTracingProcessor
from deepeval.metrics import AnswerRelevancyMetric

# Register deepeval's tracing processor so every agent run produces an evaluation trace
add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    agent_metrics=[AnswerRelevancyMetric()],
)
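To confirm the processor is wired up, you can invoke the agent once outside of an evaluation. This is a minimal sanity check, assuming your OPENAI_API_KEY is set; the question string is purely illustrative.
# One-off traced run (not part of a test run)
result = Runner.run_sync(weather_agent, "What's the weather in Tokyo?")
print(result.final_output)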
info
Evaluations are supported for OpenAI Agents. Only metrics whose required parameters are input and output are eligible for evaluation.
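For example, a custom GEval metric scoped to the input and actual output would also qualify. The sketch below is illustrative; the metric name and criteria are assumptions, not part of the integration.
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# A custom metric that only needs input and actual output, so it can be passed via agent_metrics
conciseness_metric = GEval(
    name="Conciseness",  # illustrative name
    criteria="Determine whether the answer is concise and directly addresses the weather question.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

# e.g. agent_metrics=[AnswerRelevancyMetric(), conciseness_metric]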
Run evaluations
Create an EvaluationDataset and invoke your OpenAI Agent for each golden within the evals_iterator() loop to run end-to-end evaluations.
- Synchronous
- Asynchronous
main.py
from deepeval.dataset import EvaluationDataset, Golden
dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)
for golden in dataset.evals_iterator():
    Runner.run_sync(weather_agent, golden.input)
main.py
import asyncio
from deepeval.dataset import EvaluationDataset, Golden
dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)
for golden in dataset.evals_iterator():
    task = asyncio.create_task(Runner.run(weather_agent, golden.input))
    dataset.evaluate(task)
✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.