Multi-Turn
Quick Summary
A multi-turn test case is a blueprint provided by deepeval
to unit test a series of LLM interactions. A multi-turn test case in deepeval
is represented by a ConversationalTestCase
, and has TWO parameters:
turns
chatbot_role
deepeval
makes the assumption that a multi-turn use case are mainly conversational chatbots. Agents on the other hand, should be evaluated via component-level evaluation instead, where each component in your agentic workflow is assessed individually.
Here's an example implementation of an ConversationalTestCase
:
from deepeval.test_case import ConversationalTestCase, Turn
test_case = ConversationalTestCase(
turns=[
Turn(role="user", content="How are you doing?"),
Turn(role="assistant", content="Why do you care?)
]
)
Conversational Test Case
While a single-turn test case represents an individual LLM system interaction, a ConversationalTestCase
encapsulates a series of Turn
s that make up an LLM-based conversation. This is particular useful if you're looking to for example evaluate a conversation between a user and an LLM-based chatbot.
A ConversationalTestCase
can only be evaluated using conversational metrics.
from deepeval.test_case import Turn, ConversationalTestCase
turns = [
Turn(
role="assistant",
content="Why did the chicken cross the road?",
),
Turn(
role="user",
content="Are you trying to be funny?",
),
]
test_case = ConversationalTestCase(turns=turns)
Similar to how the term 'test case' refers to an LLMTestCase
if not explicitly specified, the term 'metrics' also refer to non-conversational metrics throughout deepeval
.
Turns
The turns
parameter is a list of Turn
s and is basically a list of messages/exchanges in a user-LLM conversation. If you're using ConversationalGEval
, you might also want to supply different parameteres to a Turn
. A Turn
is made up of the following parameters:
class Turn:
role: Literal["user", "assistant"]
content: str
user_id: Optional[str] = None
retrieval_context: Optional[List[str]] = None
tools_called: Optional[List[ToolCall]] = None
additional_metadata: Optional[Dict] = None
You should only provide the retrieval_context
and tools_called
parameter if the role
is "assistant"
.
The role
parameter specifies whether a particular turn is by the "user"
(end user) or "assistant"
(LLM). This is similar to OpenAI's API.
Chatbot Role
The chatbot_role
parameter is an optional parameter that specifies what role the chatbot is supposed to play. This is currently only required for the RoleAdherenceMetric
, where it is particularly useful for a role-playing evaluation use case.
from deepeval.test_case import Turn, ConversationalTestCase
test_case = ConversationalTestCase(chatbot_role="A happy jolly wizard.", turns=[Turn(...)])