AWS AgentCore
Amazon Bedrock AgentCore is AWS's managed runtime for deploying and scaling AI agents.
End-to-End Evals
deepeval allows you to evaluate Strands agents running on AgentCore in under a minute.
Configure AgentCore
Pass agent_metrics to the instrument_agentcore function.
import nest_asyncio
nest_asyncio.apply()  # allow nested event loops (e.g. when running in a notebook)
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore
from deepeval.metrics import AnswerRelevancyMetric
instrument_agentcore(
    name="AgentCore Tracing",
    environment="development",
    agent_metrics=[AnswerRelevancyMetric()],
)
app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")
@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt")
    result = agent(user_message)
    return {"result": result.message}
response = invoke({"prompt": "Make a funny joke"})
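agent_metrics accepts any list of deepeval metrics, not just AnswerRelevancyMetric. A minimal sketch, assuming you also want task completion scoring from deepeval's metric suite (the threshold value is illustrative):

from deepeval.integrations.agentcore import instrument_agentcore
from deepeval.metrics import AnswerRelevancyMetric, TaskCompletionMetric

instrument_agentcore(
    name="AgentCore Tracing",
    environment="development",
    agent_metrics=[
        AnswerRelevancyMetric(threshold=0.7),  # illustrative threshold
        TaskCompletionMetric(),
    ],
)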
Run evaluations
Create an EvaluationDataset and invoke your AgentCore application for each golden inside the evals_iterator() loop to run end-to-end evaluations.
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.evaluate.configs import AsyncConfig

dataset = EvaluationDataset(
    goldens=[
        Golden(input="What's the weather in Paris?"),
        Golden(input="What's the weather in London?"),
    ]
)
for golden in dataset.evals_iterator(async_config=AsyncConfig(run_async=False)):
    response = invoke({"prompt": golden.input})
✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.
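If your goldens already live on Confident AI, you can pull them instead of defining them inline. A sketch using deepeval's dataset pull, assuming a dataset with the alias "My Evals Dataset" exists in your project (the alias is hypothetical):

from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.pull(alias="My Evals Dataset")  # hypothetical alias; replace with your own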
Evals in Production
To run online evaluations in production, replace metrics with metric collection strings from Confident AI, and run your Strands agent on AgentCore as usual:
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent
from deepeval.integrations.agentcore import instrument_agentcore
instrument_agentcore(
    name="AgentCore Tracing",
    environment="production",
    trace_metric_collection="my-trace-collection",
    agent_metric_collection="my-agent-collection",
    llm_metric_collection="my-llm-collection",
    tool_metric_collection_map={
        "get_weather": "my-tool-collection",
    },
)
app = BedrockAgentCoreApp()
agent = Agent(model="amazon.nova-lite-v1:0")
@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt")
    result = agent(user_message)
    return {"result": result.message}
response = invoke({"prompt": "Make a funny joke"})
deepeval lets you run component-level evals at different span levels: Trace, Agent, LLM, and Tool. You can pass a metric collection for any of these spans using the instrument_agentcore function.
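The keys in tool_metric_collection_map match tool names registered on your agent, so Tool spans are only emitted if the agent actually has tools. A minimal sketch of a Strands tool named get_weather, assuming the @tool decorator from the Strands SDK (the weather logic is a stub for illustration):

from strands import Agent, tool

@tool
def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""
    return f"It's sunny in {city}."  # stubbed response for illustration

agent = Agent(model="amazon.nova-lite-v1:0", tools=[get_weather])

With this in place, any get_weather tool call in a trace is evaluated against the "my-tool-collection" metric collection configured above.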