LangChain

DeepEval makes it easy to evaluate LangChain applications in both development and production environments.

tip

We recommend logging in to Confident AI to view your LangChain evaluation traces.

deepeval login
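If the interactive CLI isn't an option (for example in CI), you can also provide your Confident AI API key programmatically before running your script. The environment variable name below is an assumption, so confirm it against your DeepEval version.

import os

# Assumed variable name; confirm against your DeepEval / Confident AI setup
os.environ["CONFIDENT_API_KEY"] = "<your-confident-api-key>"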

End-to-End Evals

DeepEval allows you to evaluate LangChain applications end-to-end in under a minute.

Configure LangChain

Create a CallbackHandler with the list of task completion metrics you wish to use, and pass it as a callback in the config of your LangChain application's invoke method.

main.py
from langchain.chat_models import init_chat_model
from deepeval.integrations.langchain import CallbackHandler
from deepeval.metrics import TaskCompletionMetric

def multiply(a: int, b: int) -> int:
    """Returns the product of two numbers"""
    return a * b

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
llm_with_tools = llm.bind_tools([multiply])

# Invoke your LangChain application with the callback handler
llm_with_tools.invoke(
    "What is 3 * 12?",
    config={"callbacks": [CallbackHandler(metrics=[TaskCompletionMetric(task="multiplication")])]}
)
info

Only Task Completion is supported for the LangChain integration. To use other metrics, manually set up tracing instead.
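For reference, component-level tracing looks roughly like the sketch below. It assumes DeepEval's observe decorator and update_current_span helper from deepeval.tracing, and uses AnswerRelevancyMetric purely as an example of a non-task-completion metric; treat the exact names as assumptions and check the tracing docs for your version.

from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric
...

# Hypothetical wrapper around the LangChain app from above
@observe(metrics=[AnswerRelevancyMetric()])
def answer_question(question: str) -> str:
    response = llm_with_tools.invoke(question)
    # Attach a test case to this span so the metric can evaluate it
    update_current_span(
        test_case=LLMTestCase(input=question, actual_output=str(response.content))
    )
    return str(response.content)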

Run evaluations

Create an EvaluationDataset and invoke your LangChain application for each golden within the evals_iterator() loop to run end-to-end evaluations.

main.py
from deepeval.dataset import EvaluationDataset, Golden
...

dataset = EvaluationDataset(goldens=[Golden(input="What is 3 * 12?")])

for golden in dataset.evals_iterator():
    llm_with_tools.invoke(
        golden.input,
        config={"callbacks": [CallbackHandler(metrics=[TaskCompletionMetric()])]}
    )

✅ Done. The evals_iterator will automatically generate a test run with individual evaluation traces for each golden.
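If your goldens already live on Confident AI, you can also pull them into an EvaluationDataset instead of defining them inline. The alias below is hypothetical, and dataset.pull is assumed to be available in your DeepEval version.

from deepeval.dataset import EvaluationDataset
...

dataset = EvaluationDataset()
# "langchain-goldens" is a hypothetical dataset alias on Confident AI
dataset.pull(alias="langchain-goldens")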

note

If you need to evaluate individual components of your LangChain application, set up tracing instead.

Evals in Production

To run online evaluations in production, replace the metrics argument in CallbackHandler with metric_collection, the name of a metric collection you've created on Confident AI, and push your LangChain agent to production.

info

This will automatically evaluate all incoming traces in production with the task completion metrics defined in your metric collection.

from deepeval.integrations.langchain import CallbackHandler
...

# Invoke your agent with the metric collection name
llm_with_tools.invoke(
    "What is 3 * 12?",
    config={"callbacks": [
        CallbackHandler(metric_collection="<metric-collection-name-with-task-completion>")
    ]}
)