Orchestration Frameworks

AWS AgentCore

Amazon Bedrock AgentCore is AWS's managed runtime for deploying and scaling AI agents.

End-to-End Evals

deepeval lets you evaluate Strands agents running on AgentCore in under a minute.

Configure AgentCore

Pass agent_metrics to the instrument_agentcore method.

main.py
import nest_asyncio

nest_asyncio.apply()

from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore
from deepeval.metrics import AnswerRelevancyMetric

instrument_agentcore(
    name="AgentCore Tracing",
    environment="development",
    agent_metrics=[AnswerRelevancyMetric()],
)

app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")

@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt")
    result = agent(user_message)
    return {"result": result.message}

response = invoke({"prompt": "Make a funny joke"})

Run evaluations

Create an EvaluationDataset and invoke your agentcore application for each golden within the evals_iterator() loop to run end-to-end evaluations.

main.py
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.evaluate.configs import AsyncConfig

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in Paris?"),
        Golden(input="What's the weather in London?"),
    ]
)

for golden in dataset.evals_iterator(async_config=AsyncConfig(run_async=False)):
    response = invoke({"prompt": golden.input})

✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.

Evals in Production

To run online evaluations in production, replace the agent_metrics list with metric collection strings from Confident AI, and run your Strands agent on AgentCore as usual:

from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore

instrument_agentcore(
    name="AgentCore Tracing",
    environment="production",
    trace_metric_collection="my-trace-collection",
    agent_metric_collection="my-agent-collection",
    llm_metric_collection="my-llm-collection",
    tool_metric_collection_map={
        "get_weather": "my-tool-collection",
    },
)

app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")

@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt")
    result = agent(user_message)
    return {"result": result.message}

response = invoke({"prompt": "Make a funny joke"})

deepeval lets you run component-level evals at different levels, such as Trace, Agent, LLM, and Tool spans. You can pass a metric collection for any of these spans using the instrument_agentcore method.
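The tool_metric_collection_map above keys on tool names, so each tool's spans can report to their own collection. As a minimal sketch, a matching get_weather tool might look like the function below (the tool body and canned return values are illustrative, not part of the integration; in a real Strands app you would register it with the @tool decorator from strands and pass it to Agent(tools=[get_weather])):

```python
# Illustrative tool matching the "get_weather" key in tool_metric_collection_map.
# In a Strands app, decorate this with @tool (from strands import tool) and
# pass it to Agent(tools=[get_weather]) so its spans are traced per tool name.
def get_weather(city: str) -> str:
    """Return a canned weather report for the given city (stubbed data)."""
    canned = {"Paris": "18°C, partly cloudy", "London": "15°C, light rain"}
    return canned.get(city, "No forecast available")
```

Because spans are keyed by the tool's name, the string in tool_metric_collection_map must match the registered tool name exactly for its metric collection to apply.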
