Pydantic AI
Pydantic AI is a Python framework for building reliable, production-grade applications with Generative AI, providing type safety and validation for agent outputs and LLM interactions.
We recommend logging in to Confident AI to view your Pydantic AI evaluations.
deepeval login
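If you prefer to authenticate from Python instead of the CLI, deepeval also exposes a login helper. A minimal sketch, assuming a recent deepeval release that includes login_with_confident_api_key:

import deepeval

# Equivalent to running `deepeval login` and pasting your Confident AI API key at the prompt.
deepeval.login_with_confident_api_key("<your-confident-api-key>")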
End-to-End Evals
deepeval allows you to evaluate Pydantic AI applications end-to-end in under a minute.
Configure Pydantic AI
Create an agent and pass metrics to deepeval's Agent wrapper (passing a metric at invocation time is sketched after the configuration snippet below).
import time
from pydantic_ai import Agent
from deepeval.integrations.pydantic_ai import instrument_pydantic_ai
instrument_pydantic_ai(api_key="<your-confident-api-key>")
agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
)
result = agent.run_sync("What are the LLMs?")
print(result)
time.sleep(10) # wait for the trace to be posted
# Running the agent in async mode:
# import asyncio
#
# async def main():
#     result = await agent.run("What are the LLMs?")
#     print(result)
#
# if __name__ == "__main__":
#     asyncio.run(main())
#     time.sleep(10)  # wait for the trace to be posted
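Putting the pieces together, deepeval's Agent wrapper accepts metrics at invocation time. A minimal sketch of a single metric-scored run, reusing the same model and prompt as above (end-to-end scoring over a dataset is shown in the next step):

import asyncio
import time

from deepeval.integrations.pydantic_ai import Agent, instrument_pydantic_ai
from deepeval.metrics import AnswerRelevancyMetric

instrument_pydantic_ai(api_key="<your-confident-api-key>")

# Use deepeval's Agent wrapper instead of pydantic_ai.Agent so metrics can be passed to run().
agent = Agent("openai:gpt-4o-mini", system_prompt="Be concise, reply with one sentence.")
answer_relevancy_metric = AnswerRelevancyMetric()

async def main():
    # Attach an eligible metric to this single run.
    result = await agent.run("What are the LLMs?", metrics=[answer_relevancy_metric])
    print(result)

asyncio.run(main())
time.sleep(10)  # wait for the trace to be posted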
Evaluations are supported for the Pydantic AI Agent. Only metrics with the input and output parameters are eligible for evaluation.
Run evaluations
Create an EvaluationDataset and invoke your Pydantic AI application for each golden within the evals_iterator() loop to run end-to-end evaluations. The example below invokes the agent asynchronously.
import asyncio

from deepeval.dataset import EvaluationDataset, Golden
from deepeval.integrations.pydantic_ai import Agent, instrument_pydantic_ai
from deepeval.metrics import AnswerRelevancyMetric

instrument_pydantic_ai(api_key="<your-confident-api-key>")

agent = Agent("openai:gpt-4o-mini", system_prompt="Be concise, reply with one sentence.")
answer_relevancy_metric = AnswerRelevancyMetric()

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's 7 * 8?"),
        Golden(input="What's 7 * 6?"),
    ]
)

for golden in dataset.evals_iterator():
    task = asyncio.create_task(
        agent.run(
            golden.input,
            metrics=[answer_relevancy_metric],
        )
    )
    dataset.evaluate(task)
✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.
If you need to evaluate individual components of your Pydantic AI application, set up tracing instead.
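As a rough sketch of what this can look like, deepeval's @observe tracing decorator can wrap an individual tool and attach a metric to just that component's span. The get_weather tool, its stubbed body, and the metric choice below are illustrative assumptions, not part of this integration's required setup:

from pydantic_ai import Agent

from deepeval.integrations.pydantic_ai import instrument_pydantic_ai
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.tracing import observe, update_current_span

instrument_pydantic_ai(api_key="<your-confident-api-key>")

agent = Agent("openai:gpt-4o-mini", system_prompt="Be concise, reply with one sentence.")

@agent.tool_plain
@observe(metrics=[AnswerRelevancyMetric()])
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    # Hypothetical tool body; a real tool would call a weather API here.
    weather = f"It is always sunny in {city}."
    # Attach a test case to this component's span so the metric can score it.
    update_current_span(test_case=LLMTestCase(input=city, actual_output=weather))
    return weather

result = agent.run_sync("What's the weather in Paris today?")
print(result)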
Evals in Production
To run online evaluations in production, replace metrics with a metric collection string from Confident AI, and push your Pydantic AI agent to production.
import time
from deepeval.integrations.pydantic_ai import instrument_pydantic_ai, Agent
instrument_pydantic_ai(api_key="<your-confident-api-key>")
agent = Agent(
    "openai:gpt-4o-mini",
    system_prompt="Be concise, reply with one sentence.",
)

result = agent.run_sync(
    "What are the LLMs?",
    metric_collection="test_collection_1",
)
print(result)
time.sleep(10) # wait for the trace to be posted