Grok
DeepEval allows you to run evals with Grok models via xAI’s SDK, either through the CLI or directly in Python. DeepEval currently validates model names against a supported list—see Available Grok Models.
Command Line
To configure Grok through the CLI, run the following command:
deepeval set-grok --model grok-4.1 \
    --temperature=0
The CLI command above sets the specified Grok model as the default llm-judge for all metrics, unless overridden in Python code. To use a different default model provider, you must first unset Grok:
deepeval unset-grok
Python
Alternatively, you can specify your model directly in code using GrokModel from DeepEval's model collection.
from deepeval.models import GrokModel
from deepeval.metrics import AnswerRelevancyMetric
model = GrokModel(
    model="grok-4.1",
    api_key="your-api-key",
    temperature=0
)
answer_relevancy = AnswerRelevancyMetric(model=model)
To use any Grok model directly in deepeval, set USE_GROK_MODEL=1 in your environment and pass the name of your desired model when initializing the metric:
from deepeval.metrics import AnswerRelevancyMetric
answer_relevancy = AnswerRelevancyMetric(
    model="grok-4.1",
)
You should also set the other necessary environment variables, such as GROK_API_KEY, to use the Grok models as shown above.
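As a minimal sketch of the environment setup described above, the snippet below checks that the variables are present before any metric is built. The helper `missing_grok_env` and the `REQUIRED_VARS` tuple are hypothetical and not part of DeepEval; only the variable names `USE_GROK_MODEL` and `GROK_API_KEY` come from the text above.

```python
import os

# Variable names taken from the text above; the tuple itself is illustrative.
REQUIRED_VARS = ("USE_GROK_MODEL", "GROK_API_KEY")


def missing_grok_env(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: `env` defaults to the process environment and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_grok_env()
    if missing:
        print("Set these before running evals:", ", ".join(missing))
    else:
        print("Grok environment looks complete.")
```

Running a check like this at the start of an evaluation script surfaces a missing key early rather than at the first metric call.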
There are ZERO mandatory and SIX optional parameters when creating a GrokModel:
- [Optional] model: A string specifying the name of the Grok model to use. Defaults to GROK_MODEL_NAME if not passed; raises an error at runtime if unset.
- [Optional] api_key: A string specifying your Grok API key for authentication. Defaults to GROK_API_KEY if not passed; raises an error at runtime if unset.
- [Optional] temperature: A float specifying the model temperature. Defaults to TEMPERATURE if not passed; falls back to 0.0 if unset.
- [Optional] cost_per_input_token: A float specifying the cost for each input token for the provided model. Defaults to GROK_COST_PER_INPUT_TOKEN if available in deepeval's model cost registry, else None.
- [Optional] cost_per_output_token: A float specifying the cost for each output token for the provided model. Defaults to GROK_COST_PER_OUTPUT_TOKEN if available in deepeval's model cost registry, else None.
- [Optional] generation_kwargs: A dictionary of additional generation parameters forwarded to the xAI SDK client.chat.create(...) call.
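The fallback order described in the list above (explicit argument, then environment variable, then a default or an error) can be sketched as follows. This is an illustration of the documented behavior, not DeepEval's implementation: the function names `resolve_temperature` and `resolve_api_key` are hypothetical, and raising `ValueError` is an assumption, since the text only says an error is raised at runtime.

```python
import os


def resolve_temperature(temperature=None, env=None):
    """Explicit argument first, then TEMPERATURE from the environment, then 0.0."""
    env = os.environ if env is None else env
    if temperature is not None:
        return float(temperature)
    raw = env.get("TEMPERATURE")
    # An unset or empty variable falls back to the documented default of 0.0.
    return float(raw) if raw not in (None, "") else 0.0


def resolve_api_key(api_key=None, env=None):
    """Explicit argument first, then GROK_API_KEY; raise if neither is set."""
    env = os.environ if env is None else env
    value = api_key or env.get("GROK_API_KEY")
    if not value:
        # The exception type is an assumption made for this sketch.
        raise ValueError("No API key: pass api_key or set GROK_API_KEY")
    return value
```

Note that an explicit `temperature=0` is treated as a real value rather than as "not passed", which is why the check compares against `None` instead of relying on truthiness.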
Available Grok Models
Below is the comprehensive list of available Grok models in DeepEval:
- grok-4.1
- grok-4
- grok-4-heavy
- grok-4-fast
- grok-beta
- grok-3
- grok-2
- grok-2-mini
- grok-code-fast-1
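Because model names are validated against this list, a check of roughly the following shape can catch a typo before an evaluation run starts. This is a sketch only: `SUPPORTED_GROK_MODELS` and `check_grok_model` are hypothetical names, the set is copied from the list above, and the way DeepEval performs its own validation may differ.

```python
# Names copied from the list above; the constant is illustrative, not DeepEval's.
SUPPORTED_GROK_MODELS = frozenset({
    "grok-4.1", "grok-4", "grok-4-heavy", "grok-4-fast", "grok-beta",
    "grok-3", "grok-2", "grok-2-mini", "grok-code-fast-1",
})


def check_grok_model(name):
    """Return `name` unchanged if it is in the list, else raise ValueError."""
    if name not in SUPPORTED_GROK_MODELS:
        supported = ", ".join(sorted(SUPPORTED_GROK_MODELS))
        raise ValueError(
            f"Unsupported Grok model {name!r}; expected one of: {supported}"
        )
    return name
```

For example, `check_grok_model("grok-4.1")` returns the name unchanged, while a misspelled name raises an error that lists the accepted values.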