Conversation Simulator
deepeval
's ConversationSimulator
allows you to simulate full conversations between a fake user and your chatbot, unlike the synthesizer which generates regular goldens representing single, atomic LLM interactions.
from deepeval.conversation_simulator import ConversationSimulator
# Create ConversationalGolden
conversation_golden = ConversationalGolden(
scenario="Andy Byron wants to purchase a VIP ticket to a cold play concert.",
expected_outcome="Successful purchase of a ticket.",
user_description="Andy Byron is the CEO of Astronomer.",
)
# Define chatbot callback
async def chatbot_callback(input):
return "Chatbot response to: " + input
# Run Simulation
simulator = ConversationSimulator(model_callback=chatbot_callback)
conversational_test_cases = simulator.simulate(goldens=[conversation_golden])
print(conversational_test_cases)
The ConversationSimulator
uses the scenario and user description from a ConversationalGolden
to simulate back-and-forth exchanges with your chatbot. The resulting dialogue is used to create ConversationalTestCase
s for evaluation using deepeval
's multi-turn metrics.
Create Your First Simulator
To create a ConversationSimulator
, you'll need to define a callback that wraps around your LLM chatbot.
from deepeval.test_case import Turn
from deepeval.conversation_simulator import ConversationSimulator
async def model_callback(input: str, turns: List[Turn], thread_id: str) -> Turn:
return Turn(role="assistant", content=f"I don't know how to answer this: {input}")
simulator = ConversationSimulator(model_callback=model_callback)
There are ONE mandatory and FOUR optional parameters when creating a ConversationSimulator
:
model_callback
: a callback that wraps around your conversational agent.- [Optional]
simulator_model
: a string specifying which of OpenAI's GPT models to use for generation, OR any custom LLM model of typeDeepEvalBaseLLM
. Defaulted togpt-4.1
. - [Optional]
opening_message
: a string that specifies your LLM chatbot's opening message. You should only provide this IF your chatbot is designed to talk before a user does. Defaulted toNone
. - [Optional]
async_mode
: a boolean which when set toTrue
, enables concurrent generation of goldens. Defaulted toTrue
. - [Optional]
max_concurrent
: an integer that determines the maximum number of goldens that can be generated in parallel at any point in time. You can decrease this value if you're running into rate limit errors. Defaulted to100
.
Model callback
Only the input
argument is required when defining your model_callback
, but you may also define these optional arguments:
- [Optional]
turns
: a list ofTurn
s, which include the role and content of each message in the conversation. - [Optional]
thread_id
: a unique identifier for each conversation.
While turns captures the dialogue context for each turn, some applications must persist additional state across turns — for example, when invoking external APIs or tracking user-specific data. In these cases, you'll want to take advantage of the thread_id
.
from deepeval.test_case import Turn
async def model_callback(input: str, turns: List[Turn], thread_id: str) -> Turn:
# Inspect the turns and thread_id
print(turns)
print(thread_id)
# Replace with your chatbot
res = await your_llm_app(input, turns, thread_id)
return Turn(role="assistant", content=res)
Simulate A Conversation
To simulate your first conversation, simply pass in a list of ConversationalGolden
s to the simulate
method:
from deepeval.dataset import ConversationalGolden
...
conversation_golden = ConversationalGolden(
scenario="Andy Byron wants to purchase a VIP ticket to a cold play concert.",
expected_outcome="Successful purchase of a ticket.",
user_description="Andy Byron is the CEO of Astronomer.",
)
conversational_test_cases = simulator.simulate(conversational_goldens=[conversation_golden])
There are ONE mandatory and ONE optional parameter when calling the simulate
method:
conversational_goldens
: a list ofConversationalGolden
s that specify the scenario and user description.- [Optional]
max_turns
: an integer that specifies the maximum number of turns to simulate per conversation. Defaulted to10
.
A simulation ends either when the converaiton achieves the expected outcome outlined in a ConversationalGolden
, or when the max_turns
has been reached.
Evaluate Simulated Conversations
The simulate
function returns a list of ConversationalTestCase
s, which can be used to evaluate your LLM chatbot using deepeval
's conversational metrics. Use simulated conversations to run end-to-end evaluations:
from deepeval import evaluate
from deepeval.metrics import ConversationRelevancyMetric
...
evaluate(test_cases=conversational_test_cases, metrics=[ConversationRelevancyMetric()])