Pydantic AI
Pydantic AI is a Python framework for building reliable, production-grade applications with Generative AI, providing type safety and validation for agent outputs and LLM interactions.
We recommend logging in to Confident AI to view your Pydantic AI evaluations.
deepeval login
End-to-End Evals
deepeval allows you to evaluate Pydantic AI agents in under a minute.
Configure Pydantic AI
Pass agent_metrics to the ConfidentInstrumentationSettings constructor.
from pydantic_ai import Agent

from deepeval.integrations.pydantic_ai.instrumentator import (
    ConfidentInstrumentationSettings,
)
from deepeval.metrics import AnswerRelevancyMetric

agent = Agent(
    "openai:gpt-5",
    instructions="You are a helpful assistant.",
    instrument=ConfidentInstrumentationSettings(
        is_test_mode=True,
        agent_metrics=[AnswerRelevancyMetric()],
    ),
)
Evaluations are supported for the Pydantic AI Agent. Only metrics that use the input, output, and tools_called parameters are eligible for evaluation.
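For example, a custom metric also qualifies as long as it relies only on those parameters. Below is a minimal sketch using deepeval's GEval, where the metric name and criteria are placeholders rather than part of this integration:

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# Hypothetical custom metric that only uses input, actual_output, and
# tools_called, so it is eligible for Pydantic AI agent evaluation
tool_use_metric = GEval(
    name="Tool Use",
    criteria="Check whether the agent called appropriate tools to answer the input.",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.TOOLS_CALLED,
    ],
)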
Run evaluations
Create an EvaluationDataset and invoke your Pydantic AI application for each golden within the evals_iterator() loop to run end-to-end evaluations.
- Asynchronous
import asyncio

from deepeval.dataset import EvaluationDataset, Golden

# Assumed helper: wraps the instrumented `agent` configured above
# (use result.data instead of result.output on older pydantic_ai versions)
async def run_agent(prompt: str) -> str:
    result = await agent.run(prompt)
    return result.output

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in Paris?"),
        Golden(input="What's the weather in London?"),
    ]
)

for golden in dataset.evals_iterator():
    task = asyncio.create_task(run_agent(golden.input))
    dataset.evaluate(task)
✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.
Optionally, log in to Confident AI to view the test run and its evaluation traces.
If you need to evaluate individual components of your Pydantic AI application, set up tracing instead.
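For instance, metrics can be attached to a single component with deepeval's tracing decorators. A minimal sketch, assuming a hypothetical get_weather tool and deepeval's @observe and update_current_span tracing API:

from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Hypothetical tool traced as its own component; the metric attached here is
# evaluated on this span rather than on the whole agent run
@observe(metrics=[AnswerRelevancyMetric()])
def get_weather(city: str) -> str:
    forecast = f"Sunny in {city}"  # placeholder for a real weather lookup
    update_current_span(
        test_case=LLMTestCase(input=city, actual_output=forecast)
    )
    return forecast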
Evals in Production
To run online evaluations in production, replace agent_metrics with agent_metric_collection, passing a metric collection name from Confident AI, before deploying your Pydantic AI agent.
from pydantic_ai import Agent

from deepeval.integrations.pydantic_ai import ConfidentInstrumentationSettings

agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
    instrument=ConfidentInstrumentationSettings(
        agent_metric_collection="test_collection_1",
    ),
)

result = agent.run_sync("What are the LLMs?")
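With agent_metric_collection set, each production invocation is traced and evaluated online on Confident AI against the metrics in that collection, so no local evaluation loop is required.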