Multi-Turn

Quick Summary

A multi-turn test case is a blueprint provided by deepeval to unit test a series of LLM interactions. A multi-turn test case in deepeval is represented by a ConversationalTestCase, and has TWO parameters:

turns
chatbot_role

note

deepeval makes the assumption that a multi-turn use case are mainly conversational chatbots. Agents on the other hand, should be evaluated via component-level evaluation instead, where each component in your agentic workflow is assessed individually.

Here's an example implementation of an ConversationalTestCase:

from deepeval.test_case import ConversationalTestCase, Turn

test_case = ConversationalTestCase(
    turns=[
        Turn(role="user", content="How are you doing?"),
        Turn(role="assistant", content="Why do you care?)
    ]
)

Conversational Test Case

While a single-turn test case represents an individual LLM system interaction, a ConversationalTestCase encapsulates a series of Turns that make up an LLM-based conversation. This is particular useful if you're looking to for example evaluate a conversation between a user and an LLM-based chatbot.

A ConversationalTestCase can only be evaluated using conversational metrics.

main.py
from deepeval.test_case import Turn, ConversationalTestCase

turns = [
    Turn(
        role="assistant",
        content="Why did the chicken cross the road?",
    ),
    Turn(
        role="user",
        content="Are you trying to be funny?",
    ),
]

test_case = ConversationalTestCase(turns=turns)

note

Similar to how the term 'test case' refers to an LLMTestCase if not explicitly specified, the term 'metrics' also refer to non-conversational metrics throughout deepeval.

Turns

The turns parameter is a list of Turns and is basically a list of messages/exchanges in a user-LLM conversation. If you're using ConversationalGEval, you might also want to supply different parameteres to a Turn. A Turn is made up of the following parameters:

class Turn:
    role: Literal["user", "assistant"]
    content: str
    user_id: Optional[str] = None
    retrieval_context: Optional[List[str]] = None
    tools_called: Optional[List[ToolCall]] = None
    additional_metadata: Optional[Dict] = None

info

You should only provide the retrieval_context and tools_called parameter if the role is "assistant".

The role parameter specifies whether a particular turn is by the "user" (end user) or "assistant" (LLM). This is similar to OpenAI's API.

Chatbot Role

The chatbot_role parameter is an optional parameter that specifies what role the chatbot is supposed to play. This is currently only required for the RoleAdherenceMetric, where it is particularly useful for a role-playing evaluation use case.

from deepeval.test_case import Turn, ConversationalTestCase

test_case = ConversationalTestCase(chatbot_role="A happy jolly wizard.", turns=[Turn(...)])

Quick Summary​

Conversational Test Case​

Turns​

Chatbot Role​

Quick Summary

Conversational Test Case

Turns

Chatbot Role