
vLLM

vLLM is a high-performance inference and serving engine for LLMs that exposes an OpenAI-compatible API. deepeval can connect to a running vLLM server to run evaluations with locally served models.

Command Line

  1. Launch your vLLM server and make sure it exposes the OpenAI-compatible API. The typical base URL for a local vLLM server is http://localhost:8000/v1/ (a full launch example is sketched after these steps).
  2. Then run the following command to configure deepeval:
deepeval set-local-model --model-name=<model_name> \
--base-url="http://localhost:8000/v1/" \
--api-key=<api-key>
tip

You can use any value for --api-key if authentication is not enforced.
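
For reference, here is a minimal end-to-end sketch. The model name, port, and API key value are placeholders, and the vllm serve entrypoint assumes a recent vLLM release; adjust them to match your deployment.

# Start a vLLM server that exposes the OpenAI-compatible API
# (the model name and port are example values)
vllm serve mistralai/Mistral-7B-Instruct-v0.2 --port 8000

# Point deepeval at the running server; the API key is an arbitrary
# placeholder since this local server does not enforce authentication
deepeval set-local-model --model-name="mistralai/Mistral-7B-Instruct-v0.2" \
--base-url="http://localhost:8000/v1/" \
--api-key="not-needed"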

Persisting settings

You can persist CLI settings with the optional --save flag. See Flags and Configs -> Persisting CLI settings.
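
For example, the configuration command above can be re-run with --save appended. This is only a sketch: the exact form of the flag (including any value it accepts, such as a target file) is described on the linked page.

deepeval set-local-model --model-name=<model_name> \
--base-url="http://localhost:8000/v1/" \
--api-key=<api-key> \
--save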

Reverting to OpenAI

To disable the local model and return to OpenAI:

deepeval unset-local-model
info

For advanced setup and deployment options (e.g. multi-GPU serving, loading Hugging Face models), see the vLLM documentation.
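
As one illustration, multi-GPU serving is typically enabled through vLLM's tensor-parallel setting. This is a sketch with an example model name, and the flag reflects current vLLM releases; consult the vLLM documentation for your version.

# Example: shard an example model across 2 GPUs when launching the server
vllm serve mistralai/Mistral-7B-Instruct-v0.2 --tensor-parallel-size 2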
