Skip to main content

Multi-Turn

Quick Summary

A multi-turn test case is a blueprint provided by deepeval to unit test a series of LLM interactions. A multi-turn test case in deepeval is represented by a ConversationalTestCase, and has TWO parameters:

  • turns
  • chatbot_role
note

deepeval makes the assumption that a multi-turn use case are mainly conversational chatbots. Agents on the other hand, should be evaluated via component-level evaluation instead, where each component in your agentic workflow is assessed individually.

Here's an example implementation of an ConversationalTestCase:

from deepeval.test_case import ConversationalTestCase, Turn

test_case = ConversationalTestCase(
turns=[
Turn(role="user", content="How are you doing?"),
Turn(role="assistant", content="Why do you care?)
]
)

Conversational Test Case

While a single-turn test case represents an individual LLM system interaction, a ConversationalTestCase encapsulates a series of Turns that make up an LLM-based conversation. This is particular useful if you're looking to for example evaluate a conversation between a user and an LLM-based chatbot.

A ConversationalTestCase can only be evaluated using conversational metrics.

main.py
from deepeval.test_case import Turn, ConversationalTestCase

turns = [
Turn(
role="assistant",
content="Why did the chicken cross the road?",
),
Turn(
role="user",
content="Are you trying to be funny?",
),
]

test_case = ConversationalTestCase(turns=turns)
note

Similar to how the term 'test case' refers to an LLMTestCase if not explicitly specified, the term 'metrics' also refer to non-conversational metrics throughout deepeval.

Turns

The turns parameter is a list of Turns and is basically a list of messages/exchanges in a user-LLM conversation. If you're using ConversationalGEval, you might also want to supply different parameteres to a Turn. A Turn is made up of the following parameters:

class Turn:
role: Literal["user", "assistant"]
content: str
user_id: Optional[str] = None
retrieval_context: Optional[List[str]] = None
tools_called: Optional[List[ToolCall]] = None
additional_metadata: Optional[Dict] = None
info

You should only provide the retrieval_context and tools_called parameter if the role is "assistant".

The role parameter specifies whether a particular turn is by the "user" (end user) or "assistant" (LLM). This is similar to OpenAI's API.

Chatbot Role

The chatbot_role parameter is an optional parameter that specifies what role the chatbot is supposed to play. This is currently only required for the RoleAdherenceMetric, where it is particularly useful for a role-playing evaluation use case.

from deepeval.test_case import Turn, ConversationalTestCase

test_case = ConversationalTestCase(chatbot_role="A happy jolly wizard.", turns=[Turn(...)])