Non-LLM

Exact Match

Single-turn

The Exact Match metric measures whether your LLM application's actual_output matches the expected_output exactly.

Required Arguments

To use the ExactMatchMetric, you'll have to provide the following arguments when creating an LLMTestCase:

  • input
  • actual_output
  • expected_output

Read the How Is It Calculated section below to learn how test case parameters are used for metric calculation.

Usage

from deepeval import evaluate
from deepeval.metrics import ExactMatchMetric
from deepeval.test_case import LLMTestCase

metric = ExactMatchMetric(
    threshold=1.0,
    verbose_mode=True,
)

test_case = LLMTestCase(
    input="Translate 'Hello, how are you?' into French",
    actual_output="Bonjour, comment ça va ?",
    expected_output="Bonjour, comment allez-vous ?"
)

# To run metric as a standalone
# metric.measure(test_case)
# print(metric.score, metric.reason)

evaluate(test_cases=[test_case], metrics=[metric])

There are TWO optional parameters when creating an ExactMatchMetric:

  • [Optional] threshold: a float representing the minimum passing threshold. Defaults to 1.0.
  • [Optional] verbose_mode: a boolean which, when set to True, prints the intermediate steps used to calculate the metric to the console, as outlined in the How Is It Calculated section. Defaults to False.

As a Standalone

You can also run the ExactMatchMetric on a single test case as a standalone, one-off execution.

...

metric.measure(test_case)
print(metric.score, metric.reason)

How Is It Calculated?

The ExactMatchMetric score is calculated according to the following equation:

$$\text{Exact Match Score} = \begin{cases} 1 & \text{if actual\_output = expected\_output}, \\ 0 & \text{otherwise} \end{cases}$$

The ExactMatchMetric performs a strict equality check to determine if the actual_output matches the expected_output.
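The equation amounts to a single comparison. As a minimal sketch in plain Python, it could look like the following. The helper name exact_match_score is hypothetical and not part of DeepEval, and this illustrates the equation as written rather than the library's own implementation.

```python
def exact_match_score(actual_output: str, expected_output: str) -> float:
    """Return 1.0 on strict string equality, 0.0 otherwise.

    Hypothetical helper that mirrors the equation; not DeepEval's own code.
    """
    return 1.0 if actual_output == expected_output else 0.0


# The two strings from the Usage example differ, so they do not match.
usage_score = exact_match_score(
    "Bonjour, comment ça va ?",
    "Bonjour, comment allez-vous ?",
)

# Identical strings are the only way to reach the default threshold of 1.0.
identical_score = exact_match_score("positive", "positive")

print(usage_score, identical_score)
```

Because the score can only be 0 or 1, any threshold above 0 and up to 1.0 behaves the same way in this sketch: the test case passes only on an exact match.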

FAQs

Does the Exact Match metric call an LLM or cost money?
No. The ExactMatchMetric is a plain string equality check between actual_output and expected_output — no model, no API key, zero token cost, fully deterministic.
Is the comparison case-sensitive and whitespace-sensitive?
Yes. Any difference in casing, whitespace, punctuation, or accents scores 0. Different wording fails too: "Bonjour, comment ça va ?" won't match "Bonjour, comment allez-vous ?", even though both are acceptable translations.
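To make these cases concrete, here is a small sketch that uses Python's own == operator as a stand-in for the strict check. That stand-in is an assumption, since the metric's internals are not shown on this page, and the variant strings are made up for illustration.

```python
import unicodedata

# "Ça va" with a precomposed "Ç" (U+00C7).
expected = "\u00c7a va"

# Illustrative variants; each differs from `expected` in one way.
variants = {
    "identical": "\u00c7a va",
    "casing": "\u00e7a va",
    "trailing whitespace": "\u00c7a va ",
    "punctuation": "\u00c7a va.",
    "missing accent": "Ca va",
    # Renders the same, but "Ç" is written as "C" + combining cedilla (U+0327).
    "decomposed accent": "C\u0327a va",
}

# Strict equality: only the identical string matches.
results = {name: (text == expected) for name, text in variants.items()}
for name, matched in results.items():
    print(f"{name:20} -> {'match' if matched else 'no match'}")

# Unicode NFC normalization recombines the decomposed form.
recomposed = unicodedata.normalize("NFC", variants["decomposed accent"])
print(recomposed == expected)
```

If such differences should be tolerated, one option is to normalize both strings the same way before building the test case. Whether that is appropriate depends on the task.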
When should I use Exact Match instead of an LLM-judge metric?
When there's exactly one acceptable answer — classification labels, enum values, canned responses. For open-ended outputs with many valid phrasings, use a semantic metric like Answer Relevancy.
Why does my output fail even though it looks correct?
It requires a character-for-character match. Set verbose_mode=True to print the compared strings and spot invisible differences like trailing newlines, smart quotes, or leading whitespace.
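Alongside verbose_mode, a short plain-Python check can locate the first mismatching character. This is only a sketch: first_difference is a hypothetical helper, not a DeepEval function.

```python
from typing import Optional, Tuple


def first_difference(actual: str, expected: str) -> Optional[Tuple[int, str, str]]:
    """Return (index, actual_char, expected_char) at the first mismatch.

    An empty string in the tuple means that side ended first.
    Returns None when the strings are equal.
    """
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i, a, e
    if len(actual) != len(expected):
        # One string is a prefix of the other; report where the shorter one ends.
        i = min(len(actual), len(expected))
        return i, actual[i:i + 1], expected[i:i + 1]
    return None


# repr() makes trailing newlines and other invisible characters visible.
actual, expected = "Paris\n", "Paris"
print(repr(actual), repr(expected))
print(first_difference(actual, expected))

# A smart quote (U+2019) versus a straight apostrophe.
print(first_difference("it\u2019s", "it's"))
```

Printing with repr() is often enough on its own. The helper is mainly useful for long outputs where the difference is hard to spot by eye.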
