Multi-Turn
Goal Accuracy
LLM-as-a-judge
Multi-turn
Referenceless
Agent
Multimodal
The Goal Accuracy metric is a multi-turn agentic metric that evaluates your LLM agent's abilities on planning and executing the plan to finish a task or reach a goal. It is a self-explaining eval, which means it outputs a reason for its metric score.
Required Arguments
To use the GoalAccuracyMetric, you'll have to provide the following arguments when creating a ConversationalTestCase:
turns
You can learn more about how it is calculated here.
Usage
The GoalAccuracyMetric() can be used for end-to-end multi-turn evaluations of agents.
from deepeval import evaluate
from deepeval.metrics import GoalAccuracyMetric
from deepeval.test_case import Turn, ConversationalTestCase, ToolCall
convo_test_case = ConversationalTestCase(
turns=[
Turn(role="...", content="..."),
Turn(role="...", content="...", tools_called=[...])
],
)
metric = GoalAccuracyMetric(threshold=0.5)
# To run metric as a standalone
# metric.measure(convo_test_case)
# print(metric.score, metric.reason)
evaluate(test_cases=[convo_test_case], metrics=[metric])There are SIX optional parameters when creating a GoalAccuracyMetric:
- [Optional]
threshold: a float representing the minimum passing threshold, defaulted to 0.5. - [Optional]
model: a string specifying which of OpenAI's GPT models to use, OR any custom LLM model of typeDeepEvalBaseLLM. Defaulted togpt-5.4. - [Optional]
include_reason: a boolean which when set toTrue, will include a reason for its evaluation score. Defaulted toTrue. - [Optional]
strict_mode: a boolean which when set toTrue, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted toFalse. - [Optional]
async_mode: a boolean which when set toTrue, enables concurrent execution within themeasure()method. Defaulted toTrue. - [Optional]
verbose_mode: a boolean which when set toTrue, prints the intermediate steps used to calculate said metric to the console, as outlined in the How Is It Calculated section. Defaulted toFalse.
As a standalone
You can also run the GoalAccuracyMetric on a single test case as a standalone, one-off execution.
...
metric.measure(convo_test_case)
print(metric.score, metric.reason)How Is It Calculated
The GoalAccuracyMetric score is calculated using the following steps:
- Find individual goals and steps taken by your LLM agent for each user-assistat interactions.
- Find goal accuracy scores for each of the goal-steps pairs using the evaluation model.
- Find plan quality and plan adherence scores for each of the goal-step pairs using the evaluation model.