CLI Settings

Quick Summary

deepeval provides a CLI for managing common tasks directly from the terminal. You can use it for:

  • Logging in/out and viewing test runs
  • Running evaluations from test files
  • Inspecting saved test runs in a terminal TUI (deepeval inspect)
  • Generating synthetic goldens from docs, contexts, scratch, or existing goldens
  • Enabling/disabling debug
  • Selecting an LLM/embeddings provider (OpenAI, Azure OpenAI, Gemini, Grok, DeepSeek, LiteLLM, local/Ollama)
  • Setting/unsetting provider-specific options (model, endpoint, deployment, etc.)
  • Listing and updating any deepeval setting (deepeval settings -l, deepeval settings --set KEY=VALUE)
  • Saving settings and secrets persistently to .env files

Install & Update

pip install -U deepeval

To review the available commands, consult the CLI's built-in help:

deepeval --help

Read & Write Settings

deepeval reads settings from dotenv files in the current working directory (or ENV_DIR_PATH=/path/to/project), without overriding existing process environment variables. Dotenv precedence (lowest → highest) is: .env → .env.<APP_ENV> → .env.local.

deepeval also uses a legacy JSON keystore at .deepeval/.deepeval for non-secret keys. This keystore is treated as a fallback (dotenv/process env take precedence). Secrets are never written to the JSON keystore.
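
The lookup order described above can be sketched in a few lines of Python. This is only an illustrative model of the precedence rules, not deepeval's actual loader: resolve_settings and its arguments are hypothetical names, and the dictionaries stand in for files that have already been parsed.

```python
def resolve_settings(process_env, dotenv_layers, keystore):
    """Merge settings from lowest to highest precedence.

    dotenv_layers must be ordered lowest -> highest:
    [.env, .env.<APP_ENV>, .env.local]
    """
    merged = dict(keystore)        # legacy JSON keystore: fallback only
    for layer in dotenv_layers:    # later dotenv files win over earlier ones
        merged.update(layer)
    merged.update(process_env)     # existing process env is never overridden
    return merged


# Illustrative values only; the strings name where each value came from.
settings = resolve_settings(
    process_env={"OPENAI_API_KEY": "from-process-env"},
    dotenv_layers=[
        {"OPENAI_API_KEY": "from-.env", "ANTHROPIC_MODEL_NAME": "from-.env"},
        {"ANTHROPIC_MODEL_NAME": "from-.env.staging"},   # .env.<APP_ENV>
        {"USE_ANTHROPIC_MODEL": "True"},                 # .env.local
    ],
    keystore={"ANTHROPIC_MODEL_NAME": "from-keystore",
              "USE_ANTHROPIC_MODEL": "False"},
)
```

In this sketch the process environment wins for OPENAI_API_KEY, the APP_ENV-specific file wins over .env for ANTHROPIC_MODEL_NAME, and the keystore value is used only when no other source defines the key.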

Core Commands

generate

Use deepeval generate to generate synthetic goldens from the terminal with the Golden Synthesizer. The command requires two selectors:

  • --method: where goldens come from: docs, contexts, scratch, or goldens
  • --variation: what to generate: single-turn or multi-turn

Generate single-turn goldens from documents:

deepeval generate \
  --method docs \
  --variation single-turn \
  --documents example.txt \
  --documents another.pdf \
  --output-dir ./synthetic_data

Generate multi-turn goldens from scratch:

deepeval generate \
  --method scratch \
  --variation multi-turn \
  --num-goldens 25 \
  --scenario-context "Users asking support questions" \
  --conversational-task "Help users solve product issues" \
  --participant-roles "User and assistant"

Common options:

Option                                      Description
--method docs|contexts|scratch|goldens      Select the generation method.
--variation single-turn|multi-turn          Select whether to generate Goldens or ConversationalGoldens.
--output-dir                                Directory where generated goldens are saved. Defaults to ./synthetic_data.
--file-type json|csv|jsonl                  Output file type. Defaults to json.
--file-name                                 Optional output filename without extension.
--model                                     Model to use for generation.
--async-mode / --sync-mode                  Enable or disable concurrent generation.
--max-concurrent                            Maximum number of concurrent generation tasks.
--include-expected / --no-include-expected  Generate or skip expected outputs/outcomes.
--cost-tracking                             Print generation cost when supported by the model.

Method-specific options:

Method      Required Options                     Useful Optional Options
docs        --documents                          --max-goldens-per-context, --max-contexts-per-document, --min-contexts-per-document, --chunk-size, --chunk-overlap, --context-quality-threshold, --context-similarity-threshold, --max-retries
contexts    --contexts-file                      --max-goldens-per-context
scratch     --num-goldens plus styling options   Single-turn: --scenario, --task, --input-format, --expected-output-format. Multi-turn: --scenario-context, --conversational-task, --participant-roles, --scenario-format, --expected-outcome-format
goldens     --goldens-file                       --max-goldens-per-golden
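
As a rough illustration of the required-options column, the sketch below checks an argument list against the option each method needs. It is a hypothetical pre-flight helper (REQUIRED_OPTIONS and missing_options are names invented here), not part of deepeval, and it only looks at the flags named in the table; the scratch styling options are left unchecked.

```python
# Required flag per --method value, taken from the table above.
REQUIRED_OPTIONS = {
    "docs": ["--documents"],
    "contexts": ["--contexts-file"],
    "scratch": ["--num-goldens"],   # styling options are not checked here
    "goldens": ["--goldens-file"],
}


def missing_options(method, argv):
    """Return the required flags for `method` that are absent from argv."""
    if method not in REQUIRED_OPTIONS:
        raise ValueError(f"unknown method: {method}")
    return [opt for opt in REQUIRED_OPTIONS[method] if opt not in argv]


ok = missing_options("docs", ["--documents", "example.txt"])
bad = missing_options("contexts", ["--max-goldens-per-context", "2"])
```

Here ok is an empty list, while bad reports that --contexts-file is missing.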

For a deeper walkthrough, see the Golden Synthesizer docs.

test

Use deepeval test run to run evaluation test files through pytest with the deepeval pytest plugin enabled.

deepeval test --help
deepeval test run --help

Run a single test file:

deepeval test run test_chatbot.py

Run a test directory:

deepeval test run tests/evals

Run a specific test:

deepeval test run test_chatbot.py::test_answer_relevancy

Useful options:

Option                        Description
--verbose, -v                 Show verbose pytest output and turn on deepeval verbose mode.
--exit-on-first-failure, -x   Stop after the first failed test.
--show-warnings, -w           Show pytest warnings instead of disabling them.
--identifier, -id             Attach an identifier to the test run.
--num-processes, -n           Run tests with multiple pytest-xdist processes.
--repeat, -r                  Rerun each test case the specified number of times.
--use-cache, -c               Use cached evaluation results when --repeat is not set.
--ignore-errors, -i           Continue when deepeval evaluation errors occur.
--skip-on-missing-params, -s  Skip test cases with missing metric parameters.
--display, -d                 Control final result display. Defaults to showing all results.
--mark, -m                    Run tests matching a pytest marker expression.

You can pass additional pytest flags after the deepeval options. For example:

deepeval test run tests/evals \
  --mark "not slow" \
  --exit-on-first-failure \
  -- --tb=short
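
If you launch test runs from a script, the same command can be assembled as an argument list. The helper below is a minimal sketch (build_test_run_command is a hypothetical name, not a deepeval API) that places deepeval options first and appends extra pytest flags after the -- separator, mirroring the example above.

```python
import shlex


def build_test_run_command(target, deepeval_options=(), pytest_args=()):
    """Build argv for `deepeval test run`, passing pytest flags after `--`."""
    argv = ["deepeval", "test", "run", target, *deepeval_options]
    if pytest_args:
        argv += ["--", *pytest_args]
    return argv


argv = build_test_run_command(
    "tests/evals",
    deepeval_options=["--mark", "not slow", "--exit-on-first-failure"],
    pytest_args=["--tb=short"],
)

# shlex.join quotes arguments that contain spaces, such as the marker expression.
command_line = shlex.join(argv)
```

Passing argv as a list (for example to subprocess.run) avoids shell quoting problems with marker expressions like "not slow".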

inspect

Use deepeval inspect to open a saved test run inside a terminal TUI, a trace-tree viewer for metric scores, reasons, inputs/outputs, tool calls, and retriever context, all without leaving the terminal.

The TUI is a trace viewer, so it's only useful for runs produced by evals_iterator() with an instrumented agent (see single-turn end-to-end evals). Each such call writes a rolling snapshot to .deepeval/.latest_run_full.json (gitignored alongside the rest of the hidden cache dir), so the zero-arg form picks up the most recent run automatically:

deepeval inspect

You can also point it at a specific file or folder:

deepeval inspect ./experiments/test_run_20260512_174200.json
deepeval inspect ./experiments          # latest test_run_*.json inside
deepeval inspect --folder ./experiments # same, via explicit flag

This allows you to inspect traces and spans locally on your machine:

[Figure: deepeval inspect TUI showing a trace tree with per-span scores and metric reasons]

Resolution order when no path is passed: --folder → DEEPEVAL_RESULTS_FOLDER → .deepeval/.latest_run_full.json → ./experiments (legacy fallback).
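
The resolution order can be read as a first-match chain. The following sketch models it under stated assumptions: resolve_inspect_source is a hypothetical function, the environment is passed in as a plain dict, and the existence of the rolling snapshot is supplied as a boolean rather than checked on disk. It is not deepeval's implementation.

```python
def resolve_inspect_source(folder_flag=None, env=None, latest_snapshot_exists=False):
    """Return the first available source, following the documented order."""
    env = {} if env is None else env
    if folder_flag:                                  # 1. explicit --folder
        return folder_flag
    if env.get("DEEPEVAL_RESULTS_FOLDER"):           # 2. environment variable
        return env["DEEPEVAL_RESULTS_FOLDER"]
    if latest_snapshot_exists:                       # 3. rolling snapshot
        return ".deepeval/.latest_run_full.json"
    return "./experiments"                           # 4. legacy fallback


from_flag = resolve_inspect_source(
    folder_flag="./runs", env={"DEEPEVAL_RESULTS_FOLDER": "./results"}
)
from_env = resolve_inspect_source(env={"DEEPEVAL_RESULTS_FOLDER": "./results"})
from_snapshot = resolve_inspect_source(latest_snapshot_exists=True)
fallback = resolve_inspect_source()
```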

The TUI needs an optional extras bundle (Textual + clipboard support):

pip install 'deepeval[inspect]'

Confident AI Commands

Use these commands to connect deepeval to Confident AI (deepeval Cloud) so your local evaluations can be uploaded, organized, and viewed as rich test run reports on the cloud. If you don't have an account yet, sign up here.

login & logout

  • deepeval login [--confident-api-key ...] [--save=dotenv[:path]]: Log in to Confident AI by saving your CONFIDENT_API_KEY. Once logged in, deepeval can automatically upload test runs so you can browse results, share reports, and track evaluation performance over time on Confident AI.
  • deepeval logout [--save=dotenv[:path]]: Remove your Confident AI credentials from local persistence (JSON keystore and the chosen dotenv file).

view

  • deepeval view: Opens the latest test run on Confident AI in your browser. If needed, it uploads the cached run artifacts first.

Persistence & Secrets

All set-* / unset-* commands follow the same rules:

  • Non-secrets (model name, endpoint, deployment, etc.) may be mirrored into .deepeval/.deepeval.
  • Secrets (API keys) are never written to .deepeval/.deepeval.
  • Pass --save=dotenv[:path] to write settings (including secrets) to a dotenv file (default: .env.local).
  • If --save is omitted, deepeval will use DEEPEVAL_DEFAULT_SAVE if set; otherwise it won't write a dotenv file (some commands like login still default to .env.local).
  • Unsetting one provider only removes that provider's keys. If other provider credentials remain (e.g. OPENAI_API_KEY), they may still be selected by default.
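
To make the --save rules concrete, here is a small sketch of how a dotenv[:path] value could be resolved to a target file. It assumes that DEEPEVAL_DEFAULT_SAVE uses the same dotenv[:path] syntax as the flag, which the text above does not state explicitly; resolve_save_target is a hypothetical name and the function does not reflect deepeval's internals.

```python
def resolve_save_target(save_flag=None, env=None):
    """Return the dotenv path to write to, or None if nothing is persisted."""
    env = {} if env is None else env
    # The explicit flag wins; otherwise fall back to DEEPEVAL_DEFAULT_SAVE.
    spec = save_flag or env.get("DEEPEVAL_DEFAULT_SAVE")
    if not spec:
        return None                       # no dotenv file is written
    kind, _, path = spec.partition(":")
    if kind != "dotenv":
        raise ValueError(f"unsupported save target: {spec}")
    return path or ".env.local"           # documented default path


default_path = resolve_save_target("dotenv")
custom_path = resolve_save_target("dotenv:.env.staging")
from_env = resolve_save_target(env={"DEEPEVAL_DEFAULT_SAVE": "dotenv"})
not_saved = resolve_save_target()
```

Commands such as login, which default to .env.local even without --save, are outside the scope of this sketch.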

To set the model and token cost for Anthropic you would run:

deepeval set-anthropic -m claude-3-7-sonnet-latest -i 0.000003 -o 0.000015 --save=dotenv
Saved environment variables to .env.local (ensure it's git-ignored).
🙌 Congratulations! You're now using Anthropic `claude-3-7-sonnet-latest` for all evals that require an LLM.

To view your settings for Anthropic you would run:

deepeval settings -l anthropic
                                                                                Settings
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                            ┃ Value                    ┃ Description                                                                                      ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ANTHROPIC_API_KEY               │ ********                 │ Anthropic API key.                                                                               │
│ ANTHROPIC_COST_PER_INPUT_TOKEN  │ 3e-06                    │ Anthropic input token cost (used for cost reporting).                                            │
│ ANTHROPIC_COST_PER_OUTPUT_TOKEN │ 1.5e-05                  │ Anthropic output token cost (used for cost reporting).                                           │
│ ANTHROPIC_MODEL_NAME            │ claude-3-7-sonnet-latest │ Anthropic model name (e.g. 'claude-3-...').                                                      │
│ USE_ANTHROPIC_MODEL             │ True                     │ Select Anthropic as the active LLM provider (USE_* flags are mutually exclusive in CLI helpers). │
└─────────────────────────────────┴──────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────┘
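
The values 3e-06 and 1.5e-05 in the listing are the same numbers passed with -i and -o, shown in Python's default float notation. As a worked example of how per-token costs translate into a run cost, the sketch below assumes the cost is a simple linear sum over input and output tokens; the token counts are made up for illustration and estimate_cost is a hypothetical helper.

```python
COST_PER_INPUT_TOKEN = 0.000003    # value passed with -i
COST_PER_OUTPUT_TOKEN = 0.000015   # value passed with -o


def estimate_cost(input_tokens, output_tokens):
    """Linear cost estimate: tokens multiplied by the per-token price."""
    return (input_tokens * COST_PER_INPUT_TOKEN
            + output_tokens * COST_PER_OUTPUT_TOKEN)


# Python renders small floats in scientific notation, matching the table.
shown_input = str(COST_PER_INPUT_TOKEN)     # '3e-06'
shown_output = str(COST_PER_OUTPUT_TOKEN)   # '1.5e-05'

# 1,000 input tokens and 500 output tokens: 0.003 + 0.0075, about 0.0105
cost = estimate_cost(input_tokens=1_000, output_tokens=500)
```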

Debug Controls

Use these to turn on structured logs, gRPC wire tracing, and Confident tracing (all optional).

deepeval set-debug \
  --log-level DEBUG \
  --debug-async \
  --retry-before-level INFO \
  --retry-after-level ERROR \
  --grpc --grpc-verbosity DEBUG --grpc-trace list_tracers \
  --trace-verbose --trace-env staging --trace-flush \
  --save=dotenv

  • Immediate effect in the current process
  • Optional persistence via --save=dotenv[:path]
  • No-op guard: If nothing would change, you'll see No changes to save … (and nothing is written).

To restore defaults and clean persisted values:

deepeval unset-debug --save=dotenv

Model Provider Configs

All provider commands come in pairs:

  • deepeval set-<provider> [provider-specific flags] [--save=dotenv[:path]] [--quiet]
  • deepeval unset-<provider> [--save=dotenv[:path]] [--quiet]

This switches the active provider:

  • It sets USE_<PROVIDER>_MODEL = True for the chosen provider, and
  • Turns all other USE_* flags off so that only one provider is enabled at a time.

Setting a provider enables its USE_<PROVIDER>_MODEL flag and disables every other USE_* flag. Unsetting a provider disables only that provider's USE_* flag and leaves the others untouched. If you set env vars manually (or edit dotenv files), you can end up with multiple USE_* flags enabled.
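
The asymmetry between set and unset can be modelled with two small functions. This is a sketch of the described behavior only: set_provider and unset_provider are hypothetical names, and flag names other than USE_ANTHROPIC_MODEL are assumed from the USE_<PROVIDER>_MODEL pattern.

```python
def set_provider(flags, provider):
    """Enable one provider and switch every other USE_* flag off."""
    updated = {name: False for name in flags}
    updated[f"USE_{provider}_MODEL"] = True
    return updated


def unset_provider(flags, provider):
    """Disable only the given provider; leave the other flags untouched."""
    updated = dict(flags)
    updated[f"USE_{provider}_MODEL"] = False
    return updated


# Two flags enabled at once, e.g. after manual edits to a dotenv file.
flags = {"USE_OPENAI_MODEL": True, "USE_ANTHROPIC_MODEL": False, "USE_GEMINI_MODEL": True}

after_set = set_provider(flags, "ANTHROPIC")   # only Anthropic stays enabled
after_unset = unset_provider(flags, "OPENAI")  # Gemini is still enabled
```

Note that after_unset still has USE_GEMINI_MODEL enabled, which is why unsetting one provider does not guarantee that no provider is active.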

Full model list

Provider (LLM)      Set                 Unset
OpenAI              set-openai          unset-openai
Azure OpenAI        set-azure-openai    unset-azure-openai
Anthropic           set-anthropic       unset-anthropic
AWS Bedrock         set-bedrock         unset-bedrock
Ollama (local)      set-ollama          unset-ollama
Local HTTP model    set-local-model     unset-local-model
Grok                set-grok            unset-grok
Moonshot (Kimi)     set-moonshot        unset-moonshot
DeepSeek            set-deepseek        unset-deepseek
Gemini              set-gemini          unset-gemini
LiteLLM             set-litellm         unset-litellm
Portkey             set-portkey         unset-portkey

Embeddings:

Provider (Embeddings)   Set                          Unset
Azure OpenAI            set-azure-openai-embedding   unset-azure-openai-embedding
Local (HTTP)            set-local-embeddings         unset-local-embeddings
Ollama                  set-ollama-embeddings        unset-ollama-embeddings

Common Issues

  • Nothing printed? For set-* / unset-* / set-debug, a clean exit with no output often means you are passing the --quiet / -q flag.
  • Provider still active after unsetting? Unsetting turns off only the target provider's USE_* flag; if another provider remains enabled and properly configured, it becomes the active provider. If no provider is enabled but OpenAI credentials are present, OpenAI may be used as a fallback. To force a provider, run the corresponding set-<provider> command.
  • Dotenv edits not picked up? deepeval loads dotenv files from the current working directory by default, or ENV_DIR_PATH if set. Ensure your Python process runs in that context.
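
The "provider still active" case can be reasoned about with a simplified selection model. The sketch below is an assumption-heavy illustration, not deepeval's logic: pick_active_provider is a hypothetical function, flag values are plain booleans, the alphabetical tie-break between multiple enabled flags is arbitrary, and the "properly configured" check is omitted.

```python
def pick_active_provider(env):
    """Pick an enabled provider, else fall back to OpenAI if a key exists."""
    enabled = sorted(
        name for name, value in env.items()
        if name.startswith("USE_") and name.endswith("_MODEL") and value is True
    )
    if enabled:
        # Strip the USE_ prefix and _MODEL suffix to get the provider name.
        return enabled[0][len("USE_"):-len("_MODEL")]
    if env.get("OPENAI_API_KEY"):
        return "OPENAI"          # fallback when no USE_* flag is enabled
    return None


still_enabled = pick_active_provider(
    {"USE_ANTHROPIC_MODEL": False, "USE_GEMINI_MODEL": True}
)
openai_fallback = pick_active_provider(
    {"USE_ANTHROPIC_MODEL": False, "OPENAI_API_KEY": "placeholder-key"}
)
nothing = pick_active_provider({})
```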

If you're still stuck, the dedicated Troubleshooting page covers deeper debugging (TLS errors, logging, timeouts, dotenv loading, and config caching).
