
vLLM

vLLM is a high-performance inference and serving engine for LLMs that exposes an OpenAI-compatible API. deepeval can connect to a running vLLM server to run evaluations with locally served models.

Command Line

  1. Launch your vLLM server and make sure it exposes the OpenAI-compatible API. The typical base URL for a local vLLM server is http://localhost:8000/v1/ (a full launch example is sketched after these steps).
  2. Then run the following command to configure deepeval:
deepeval set-local-model --model-name=<model_name> \
--base-url="http://localhost:8000/v1/" \
--api-key=<api-key>
tip

You can use any value for --api-key if authentication is not enforced.
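
For reference, here is a minimal end-to-end sketch. The model name, port, and API key value are placeholders, and the vllm serve entrypoint assumes a recent vLLM release; adjust them to match your deployment.

# Start a vLLM server that exposes the OpenAI-compatible API
# (the model name and port are example values)
vllm serve mistralai/Mistral-7B-Instruct-v0.2 --port 8000

# Point deepeval at the running server; the API key is an arbitrary
# placeholder since this local server does not enforce authentication
deepeval set-local-model --model-name="mistralai/Mistral-7B-Instruct-v0.2" \
--base-url="http://localhost:8000/v1/" \
--api-key="not-needed"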

Persisting settings

You can persist CLI settings with the optional --save flag. See Flags and Configs -> Persisting CLI settings.
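
For example, the configuration command above can be re-run with --save appended. This is only a sketch: the exact form of the flag (including any value it accepts, such as a target file) is described on the linked page.

deepeval set-local-model --model-name=<model_name> \
--base-url="http://localhost:8000/v1/" \
--api-key=<api-key> \
--save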

Reverting to OpenAI

To disable the local model and return to OpenAI:

deepeval unset-local-model
info

For advanced setup and deployment options (e.g. multi-GPU serving, loading Hugging Face models), see the vLLM documentation.
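
As one illustration, multi-GPU serving is typically enabled through vLLM's tensor-parallel setting. This is a sketch with an example model name, and the flag reflects current vLLM releases; consult the vLLM documentation for your version.

# Example: shard an example model across 2 GPUs when launching the server
vllm serve mistralai/Mistral-7B-Instruct-v0.2 --tensor-parallel-size 2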
