OpenAI Agents
OpenAI Agents is OpenAI's lightweight framework for building agentic workflows in which LLM-powered agents use tools and hand tasks off to one another.
End-to-End Evals
deepeval allows you to evaluate OpenAI Agents end-to-end in under a minute.
Configure OpenAI Agents
main.py
from agents import Runner, add_trace_processor
from deepeval.openai_agents import Agent, DeepEvalTracingProcessor
from deepeval.metrics import AnswerRelevancyMetric

# Register deepeval's tracing processor so every agent run produces an evaluation trace
add_trace_processor(DeepEvalTracingProcessor())

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a weather agent. You are given a question about the weather and you need to answer it.",
    agent_metrics=[AnswerRelevancyMetric()],
)
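To confirm the processor is wired up, you can invoke the agent once outside of an evaluation. This is a minimal sanity check, assuming your OPENAI_API_KEY is set; the question string is purely illustrative.
# One-off traced run (not part of a test run)
result = Runner.run_sync(weather_agent, "What's the weather in Tokyo?")
print(result.final_output)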
info
Evaluations are supported for OpenAI Agents. Only metrics whose required parameters are input and output are eligible for evaluation.
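For example, a custom GEval metric scoped to the input and actual output would also qualify. The sketch below is illustrative; the metric name and criteria are assumptions, not part of the integration.
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# A custom metric that only needs input and actual output, so it can be passed via agent_metrics
conciseness_metric = GEval(
    name="Conciseness",  # illustrative name
    criteria="Determine whether the answer is concise and directly addresses the weather question.",
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
)

# e.g. agent_metrics=[AnswerRelevancyMetric(), conciseness_metric]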
Run evaluations
Create an EvaluationDataset and invoke your OpenAI Agent for each golden within the evals_iterator() loop to run end-to-end evaluations.
- Synchronous
- Asynchronous
main.py
from deepeval.dataset import EvaluationDataset, Golden
dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)
for golden in dataset.evals_iterator():
    Runner.run_sync(weather_agent, golden.input)
main.py
import asyncio
from deepeval.dataset import EvaluationDataset, Golden
dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in UK?"),
        Golden(input="What's the weather in France?"),
    ]
)
for golden in dataset.evals_iterator():
    task = asyncio.create_task(Runner.run(weather_agent, golden.input))
    dataset.evaluate(task)
✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.