
Pydantic AI

Pydantic AI is a Python framework for building reliable, production-grade applications with Generative AI, providing type safety and validation for agent outputs and LLM interactions.

tip

We recommend logging in to Confident AI to view your Pydantic AI evaluations.

deepeval login

End-to-End Evals

deepeval allows you to evaluate Pydantic AI agents in under a minute.
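If you haven't installed the two libraries yet, grab them from PyPI first (assuming the standard package names):

pip install -U deepeval pydantic-ai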

Configure Pydantic AI

Pass agent_metrics to the ConfidentInstrumentationSettings constructor.

main.py
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai.instrumentator import (
    ConfidentInstrumentationSettings,
)
from deepeval.metrics import AnswerRelevancyMetric

agent = Agent(
    "openai:gpt-5",
    instructions="You are a helpful assistant.",
    instrument=ConfidentInstrumentationSettings(
        is_test_mode=True,
        agent_metrics=[AnswerRelevancyMetric()],
    ),
)
info

Evaluations are supported for the Pydantic AI Agent. Only metrics that use the input, output, and tools_called parameters are eligible for evaluation.
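Roughly speaking, each eligible metric is scored against a test case built from the agent's trace. A minimal sketch of what such a test case looks like in deepeval (the example values are illustrative, and measure() requires an evaluation model, e.g. an OpenAI API key, to be configured):

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Illustrative only: a test case like this is built from the agent's
# input, output, and tool calls at evaluation time.
test_case = LLMTestCase(
    input="What's the weather in Paris?",
    actual_output="It's sunny and 24°C in Paris today.",
)
AnswerRelevancyMetric().measure(test_case)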

Run evaluations

Create an EvaluationDataset and invoke your Pydantic AI application for each golden within the evals_iterator() loop to run end-to-end evaluations.

main.py
import asyncio

from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in Paris?"),
        Golden(input="What's the weather in London?"),
    ]
)

for golden in dataset.evals_iterator():
    # run_agent should invoke the instrumented agent defined above
    task = asyncio.create_task(run_agent(golden.input))
    dataset.evaluate(task)
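The loop above assumes a run_agent coroutine that wraps the instrumented agent from the configuration step. A minimal sketch, assuming a recent Pydantic AI version where the run result exposes .output:

async def run_agent(user_input: str) -> str:
    # Hypothetical helper: awaits the instrumented `agent` defined earlier
    result = await agent.run(user_input)
    return result.output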

✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.


note

If you need to evaluate individual components of your Pydantic AI application, set up tracing instead.

Evals in Production

To run online evaluations in production, replace agent_metrics with agent_metric_collection, a metric collection string from Confident AI, and push your Pydantic AI agent to production.

from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(
        agent_metric_collection="test_collection_1",
    ),
)

result = agent.run_sync(
    "What are the LLMs?"
)