Introduction to Synthetic Data Generation
Synthetic data generation helps you bootstrap evaluation datasets when you do not yet have enough representative examples, but it should complement—not replace—real data.
Recommended Priority
The best evaluation datasets are grounded in real product behavior. We recommend choosing data sources in this order:
- Use a reasonably curated dataset. Start with human-reviewed examples when you have them, especially examples that reflect important user journeys, failures, and edge cases.
- Use production traffic. If you do not have a curated dataset, sample real conversations or requests from production, then review and clean them before using them for evals.
- Use synthetic data. If you do not have enough curated or production data, generate synthetic examples to create initial coverage and uncover obvious regressions.
Synthetic data is most useful when it gives you a starting point faster. For high-stakes workflows, you should still review, edit, and enrich generated examples before treating them as ground truth.
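That review step can start as a simple filtering pass before anything enters your dataset. Here is a minimal sketch in plain Python; the field names (`input`, `expected_output`) and thresholds are illustrative assumptions, not a deepeval API:

```python
# Minimal review pass over synthetic examples (illustrative only;
# field names like "input"/"expected_output" are assumptions).
def review_examples(examples, min_input_words=3):
    seen = set()
    kept = []
    for ex in examples:
        key = ex["input"].strip().lower()
        if key in seen:
            continue  # drop exact duplicates
        if len(key.split()) < min_input_words:
            continue  # drop inputs too short to be meaningful
        seen.add(key)
        kept.append(ex)
    return kept

raw = [
    {"input": "How do I reset my password?", "expected_output": "..."},
    {"input": "How do I reset my password?", "expected_output": "..."},  # duplicate
    {"input": "Help", "expected_output": "..."},  # too short
]
print(len(review_examples(raw)))  # 1 example survives
```

In practice you would add human review on top of mechanical filters like these, especially for high-stakes workflows.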
Best Practices for Synthetic Data Quality
Not all synthetic data is equally reliable. Prefer grounded and reviewed sources before fully open-ended generation:
- Generate from documents. This is the strongest default because generated goldens are grounded in your knowledge base.
- Generate from existing goldens. This works well when the seed goldens are already reasonably curated and human-reviewed.
- Generate from scratch. This is the least grounded option, and is not recommended unless the use case is simple or you only need rough initial coverage.
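Generating from documents is grounded because each golden can be traced back to a concrete passage. A rough sketch of the idea (not deepeval's implementation) is to chunk the source text and attach each chunk as context to the golden generated from it; `generate_question` below is a stand-in for an LLM call:

```python
# Sketch of grounding: pair each generated question with the source
# chunk it came from. generate_question stands in for an LLM call.
def chunk_text(text, chunk_size=50):
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def generate_question(chunk):
    # A real implementation would prompt an LLM, e.g.
    # "Write a user question answerable from this passage."
    return f"Question about: {chunk[:30]}..."

doc = "Password resets are handled from the account settings page. " * 20
goldens = [{"input": generate_question(c), "context": [c]} for c in chunk_text(doc)]
```

Keeping the source chunk alongside each golden makes it easy to verify later that the expected output is actually supported by your knowledge base.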
What You Can Synthesize
deepeval supports two related synthetic-data workflows:
- Generate goldens: Use the Golden Synthesizer to create single-turn or conversational goldens for your evaluation dataset.
- Simulate turns: Use the Conversation Simulator to generate realistic back-and-forth turns between a simulated user and your chatbot.
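The two golden shapes can be pictured roughly as follows; the field names here are simplified assumptions for illustration, not deepeval's exact schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Golden:
    # Single-turn: one input, and optionally what a correct answer looks like.
    input: str
    expected_output: Optional[str] = None

@dataclass
class ConversationalGolden:
    # Multi-turn: a scenario to play out and the outcome the conversation
    # should reach, rather than a fixed answer.
    scenario: str
    expected_outcome: Optional[str] = None

g = Golden(
    input="How do I reset my password?",
    expected_output="Go to account settings and choose 'Reset password'.",
)
cg = ConversationalGolden(
    scenario="A user locked out of their account asks for help.",
    expected_outcome="The user is guided through a password reset.",
)
```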
Generate Goldens
Goldens define what you want to test. They can be single-turn examples for regular LLM interactions, or conversational goldens that define a multi-turn scenario and expected outcome.
```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["support_docs.md"],
    include_expected_output=True,
)
```

For multi-turn use cases, generate conversational goldens instead:
```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
conversational_goldens = synthesizer.generate_conversational_goldens_from_docs(
    document_paths=["support_docs.md"],
    include_expected_outcome=True,
)
```

Learn more in the Golden Synthesizer docs.
Simulate Turns
Turn simulation applies only to multi-turn use cases, and it comes after golden generation: first create conversational goldens with a scenario and expected outcome, then use the Conversation Simulator to produce the actual back-and-forth turns.
```python
from deepeval.simulator import ConversationSimulator

simulator = ConversationSimulator(model_callback=model_callback)
test_cases = simulator.simulate(
    conversational_goldens=conversational_goldens,
    max_user_simulations=10,
)
```

Learn more in the Conversation Simulator docs.
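Conceptually, the simulator plays both sides of the conversation: a simulated user produces a message from the golden's scenario, your chatbot replies, and the loop repeats up to a turn limit. A stripped-down sketch of that loop, with stub functions standing in for the real LLM calls (this illustrates the mechanic, not deepeval's internals):

```python
# Sketch of a simulation loop; simulated_user and chatbot are stubs
# standing in for real LLM calls.
def simulated_user(scenario, history):
    turn_no = len(history) // 2 + 1
    return f"(user turn {turn_no} about: {scenario})"

def chatbot(history):
    return f"(assistant reply to: {history[-1]['content']})"

def simulate(scenario, max_user_turns=3):
    history = []
    for _ in range(max_user_turns):
        history.append({"role": "user", "content": simulated_user(scenario, history)})
        history.append({"role": "assistant", "content": chatbot(history)})
    return history

turns = simulate("locked-out user needs a password reset")
print(len(turns))  # 6 messages: 3 user + 3 assistant
```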
For single-turn use cases, generated goldens may be enough. For multi-turn use cases, you typically need both: use the Golden Synthesizer to define the scenario and expected outcome, then use the Conversation Simulator to generate the actual turns for evaluation.
Next Steps
Start with goldens to define what should be tested, then add turn simulation when you need realistic multi-turn conversations.