Introduction to Synthetic Data Generation
Synthetic data generation helps you bootstrap evaluation datasets when you do not yet have enough representative examples, but it should complement—not replace—real data.
Recommended Priority
The best evaluation datasets are grounded in real product behavior. We recommend choosing data sources in this order:
- Use a reasonably curated dataset. Start with human-reviewed examples when you have them, especially examples that reflect important user journeys, failures, and edge cases.
- Use production traffic. If you do not have a curated dataset, sample real conversations or requests from production, then review and clean them before using them for evals.
- Use synthetic data. If you do not have enough curated or production data, generate synthetic examples to create initial coverage and uncover obvious regressions.
Synthetic data is most useful when it gives you a starting point faster. For high-stakes workflows, you should still review, edit, and enrich generated examples before treating them as ground truth.
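That review step can start as a simple filtering pass before anything enters your dataset. Here is a minimal sketch in plain Python; the field names (`input`, `expected_output`) and thresholds are illustrative assumptions, not a deepeval API:

```python
# Minimal review pass over synthetic examples (illustrative only;
# field names like "input"/"expected_output" are assumptions).
def review_examples(examples, min_input_words=3):
    seen = set()
    kept = []
    for ex in examples:
        key = ex["input"].strip().lower()
        if key in seen:
            continue  # drop exact duplicates
        if len(key.split()) < min_input_words:
            continue  # drop inputs too short to be meaningful
        seen.add(key)
        kept.append(ex)
    return kept

raw = [
    {"input": "How do I reset my password?", "expected_output": "..."},
    {"input": "How do I reset my password?", "expected_output": "..."},  # duplicate
    {"input": "Help", "expected_output": "..."},  # too short
]
print(len(review_examples(raw)))  # 1 example survives
```

In practice you would add human review on top of mechanical filters like these, especially for high-stakes workflows.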
Best Practices for Synthetic Data Quality
Not all synthetic data is equally reliable. Prefer grounded and reviewed sources before fully open-ended generation:
- Generate from documents. This is the strongest default because generated goldens are grounded in your knowledge base.
- Generate from existing goldens. This works well when the seed goldens are already reasonably curated and human-reviewed.
- Generate from scratch. This is the least grounded option, and is not recommended unless the use case is simple or you only need rough initial coverage.
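Generating from documents is grounded because each golden can be traced back to a concrete passage. A rough sketch of the idea (not deepeval's implementation) is to chunk the source text and attach each chunk as context to the golden generated from it; `generate_question` below is a stand-in for an LLM call:

```python
# Sketch of grounding: pair each generated question with the source
# chunk it came from. generate_question stands in for an LLM call.
def chunk_text(text, chunk_size=50):
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def generate_question(chunk):
    # A real implementation would prompt an LLM, e.g.
    # "Write a user question answerable from this passage."
    return f"Question about: {chunk[:30]}..."

doc = "Password resets are handled from the account settings page. " * 20
goldens = [{"input": generate_question(c), "context": [c]} for c in chunk_text(doc)]
```

Keeping the source chunk alongside each golden makes it easy to verify later that the expected output is actually supported by your knowledge base.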
What You Can Synthesize
deepeval supports two related synthetic-data workflows:
- Generate goldens: Use the Golden Synthesizer to create single-turn or conversational goldens for your evaluation dataset.
- Simulate turns: Use the Conversation Simulator to generate realistic back-and-forth turns between a simulated user and your chatbot.
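The two golden shapes can be pictured roughly as follows; the field names here are simplified assumptions for illustration, not deepeval's exact schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Golden:
    # Single-turn: one input, and optionally what a correct answer looks like.
    input: str
    expected_output: Optional[str] = None

@dataclass
class ConversationalGolden:
    # Multi-turn: a scenario to play out and the outcome the conversation
    # should reach, rather than a fixed answer.
    scenario: str
    expected_outcome: Optional[str] = None

g = Golden(
    input="How do I reset my password?",
    expected_output="Go to account settings and choose 'Reset password'.",
)
cg = ConversationalGolden(
    scenario="A user locked out of their account asks for help.",
    expected_outcome="The user is guided through a password reset.",
)
```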
Generate Goldens
Goldens define what you want to test. They can be single-turn examples for regular LLM interactions, or conversational goldens that define a multi-turn scenario and expected outcome.
```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["support_docs.md"],
    include_expected_output=True,
)
```

For multi-turn use cases, generate conversational goldens instead:
```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
conversational_goldens = synthesizer.generate_conversational_goldens_from_docs(
    document_paths=["support_docs.md"],
    include_expected_outcome=True,
)
```

Learn more in the Golden Synthesizer docs.
Simulate Turns
Turn simulation applies only to multi-turn use cases, and it comes after golden generation: first create conversational goldens with a scenario and expected outcome, then use the Conversation Simulator to produce the actual back-and-forth turns.
```python
from deepeval.simulator import ConversationSimulator

simulator = ConversationSimulator(model_callback=model_callback)
test_cases = simulator.simulate(
    conversational_goldens=conversational_goldens,
    max_user_simulations=10,
)
```

Learn more in the Conversation Simulator docs.
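Conceptually, the simulator plays both sides of the conversation: a simulated user produces a message from the golden's scenario, your chatbot replies, and the loop repeats up to a turn limit. A stripped-down sketch of that loop, with stub functions standing in for the real LLM calls (this illustrates the mechanic, not deepeval's internals):

```python
# Sketch of a simulation loop; simulated_user and chatbot are stubs
# standing in for real LLM calls.
def simulated_user(scenario, history):
    turn_no = len(history) // 2 + 1
    return f"(user turn {turn_no} about: {scenario})"

def chatbot(history):
    return f"(assistant reply to: {history[-1]['content']})"

def simulate(scenario, max_user_turns=3):
    history = []
    for _ in range(max_user_turns):
        history.append({"role": "user", "content": simulated_user(scenario, history)})
        history.append({"role": "assistant", "content": chatbot(history)})
    return history

turns = simulate("locked-out user needs a password reset")
print(len(turns))  # 6 messages: 3 user + 3 assistant
```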
For single-turn use cases, generated goldens may be enough. For multi-turn use cases, you typically need both: use the Golden Synthesizer to define the scenario and expected outcome, then use the Conversation Simulator to generate the actual turns for evaluation.
Next Steps
Start with goldens to define what should be tested, then add turn simulation when you need realistic multi-turn conversations.