vLLM

vLLM is a high-performance inference engine for LLMs that serves an OpenAI-compatible API. deepeval can connect to a running vLLM server to run evaluations locally.

Command Line

  1. Launch your vLLM server and make sure it is serving the OpenAI-compatible API (a quick connectivity check is sketched after the command in step 2). The typical base URL for a local vLLM server is http://localhost:8000/v1/.
  2. Then run the following command to configure deepeval:
deepeval set-local-model --model-name=<model_name> \
    --base-url="http://localhost:8000/v1/" \
    --api-key=<api-key>
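
To confirm the server is reachable and actually serving the OpenAI-compatible API (step 1), you can query it with the openai Python client. The snippet below is a minimal sketch: the base URL matches the default above, and the API key is a placeholder.

from openai import OpenAI

# Point the client at the local vLLM server (default base URL from step 1).
client = OpenAI(
    base_url="http://localhost:8000/v1/",
    api_key="not-needed",  # placeholder; only checked if the server was launched with an API key
)

# List the models the server is hosting; a returned ID is what you pass
# to deepeval as <model_name>.
for model in client.models.list():
    print(model.id)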
tip

You can use any value for --api-key if authentication is not enforced.
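
Once the local model is set, deepeval metrics use it as the evaluation model. Below is a minimal sketch of an evaluation run; the metric choice and test case contents are illustrative and not part of the setup above.

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# An illustrative test case; replace with your own inputs and outputs.
test_case = LLMTestCase(
    input="What is vLLM?",
    actual_output="vLLM is a high-throughput inference engine for serving LLMs.",
)

# The metric is scored by the locally configured vLLM model.
metric = AnswerRelevancyMetric(threshold=0.7)

evaluate(test_cases=[test_case], metrics=[metric])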

Reverting to OpenAI

To disable the local model and return to OpenAI:

deepeval unset-local-model
info

For advanced setup or deployment options (e.g. multi-GPU, HuggingFace models), see the vLLM documentation.