Grok

DeepEval allows you to run evals with Grok models via xAI’s SDK, either through the CLI or directly in Python. DeepEval currently validates model names against a supported list—see Available Grok Models.

Command Line

To configure Grok through the CLI, run the following command:

deepeval set-grok --model grok-4.1 \
    --temperature=0

The command above sets the specified Grok model as the default LLM judge for all metrics, unless a metric overrides it in Python code. To switch to a different default model provider, you must first unset Grok:

deepeval unset-grok

Python

Alternatively, you can specify your model directly in code using GrokModel from DeepEval's model collection.

from deepeval.models import GrokModel
from deepeval.metrics import AnswerRelevancyMetric

model = GrokModel(
    model="grok-4.1",
    api_key="your-api-key",
    temperature=0
)

answer_relevancy = AnswerRelevancyMetric(model=model)

To use any Grok model directly in deepeval, set USE_GROK_MODEL=1 in your environment and pass the name of the desired model when initializing your metric:

from deepeval.metrics import AnswerRelevancyMetric

answer_relevancy = AnswerRelevancyMetric(
    model="grok-4.1",
)

You must also set the other required environment variables, such as GROK_API_KEY, for the Grok models to work as shown above.
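Because a missing variable only surfaces once a metric runs, it can help to check the environment up front. The following is a minimal sketch, not part of DeepEval: the helper `missing_grok_vars` is hypothetical, and it only checks the two variable names mentioned above (USE_GROK_MODEL and GROK_API_KEY).

```python
import os

# Variable names come from the text above; this helper is a hypothetical
# convenience and is not provided by DeepEval.
REQUIRED_GROK_VARS = ("USE_GROK_MODEL", "GROK_API_KEY")


def missing_grok_vars(env=None):
    """Return the required Grok variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_GROK_VARS if not env.get(name)]


# Report anything that still needs to be set before running evals.
missing = missing_grok_vars()
if missing:
    print("Set these before running evals: " + ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.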

There are ZERO mandatory and SIX optional parameters when creating a GrokModel:

  • [Optional] model: A string specifying the name of the Grok model to use. Defaults to GROK_MODEL_NAME if not passed; raises an error at runtime if unset.
  • [Optional] api_key: A string specifying your Grok API key for authentication. Defaults to GROK_API_KEY if not passed; raises an error at runtime if unset.
  • [Optional] temperature: A float specifying the model temperature. Defaults to TEMPERATURE if not passed; falls back to 0.0 if unset.
  • [Optional] cost_per_input_token: A float specifying the cost for each input token for the provided model. Defaults to GROK_COST_PER_INPUT_TOKEN if available in deepeval's model cost registry, else None.
  • [Optional] cost_per_output_token: A float specifying the cost for each output token for the provided model. Defaults to GROK_COST_PER_OUTPUT_TOKEN if available in deepeval's model cost registry, else None.
  • [Optional] generation_kwargs: A dictionary of additional generation parameters forwarded to the xAI SDK client.chat.create(...) call.
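The fallback order described above (explicit argument, then a configured setting, then a built-in default or an error) can be illustrated with a small sketch. This is not DeepEval's implementation: the helper names are hypothetical, the settings are assumed to be read from environment variables, and `ValueError` is a stand-in for whatever error DeepEval actually raises.

```python
import os


def resolve_temperature(temperature=None, env=None):
    """Explicit argument wins, then TEMPERATURE, then 0.0."""
    env = os.environ if env is None else env
    if temperature is not None:
        return float(temperature)
    raw = env.get("TEMPERATURE")
    return float(raw) if raw else 0.0


def resolve_required(value, env_name, env=None):
    """Explicit argument wins, then the named variable; otherwise fail."""
    env = os.environ if env is None else env
    resolved = value if value is not None else env.get(env_name)
    if not resolved:
        # Stand-in error type; the real exception may differ.
        raise ValueError(f"{env_name} is not set and no value was passed")
    return resolved


# Example: the model name comes from the (simulated) environment,
# while the temperature falls back to the default.
settings = {"GROK_MODEL_NAME": "grok-4.1"}
print(resolve_required(None, "GROK_MODEL_NAME", settings))
print(resolve_temperature(None, settings))
```

The point of the sketch is the precedence: a value passed to the constructor always overrides the configured one, and only model and api_key are treated as hard requirements.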

Available Grok Models

Below is the comprehensive list of available Grok models in DeepEval:

  • grok-4.1
  • grok-4
  • grok-4-heavy
  • grok-4-fast
  • grok-beta
  • grok-3
  • grok-2
  • grok-2-mini
  • grok-code-fast-1
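Since model names are validated against this list, a typo in the name fails rather than silently falling back. A rough sketch of such a check follows; the set mirrors the list above, while the function `check_grok_model` and its error message are hypothetical and may not match what DeepEval does internally.

```python
# Names copied from the list above.
SUPPORTED_GROK_MODELS = frozenset({
    "grok-4.1",
    "grok-4",
    "grok-4-heavy",
    "grok-4-fast",
    "grok-beta",
    "grok-3",
    "grok-2",
    "grok-2-mini",
    "grok-code-fast-1",
})


def check_grok_model(name):
    """Return the name unchanged if supported, else raise ValueError."""
    if name not in SUPPORTED_GROK_MODELS:
        supported = ", ".join(sorted(SUPPORTED_GROK_MODELS))
        raise ValueError(f"Unsupported Grok model {name!r}; expected one of: {supported}")
    return name


print(check_grok_model("grok-4.1"))
```

Running a check like this before constructing a metric gives an early, readable error that lists the accepted names.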
