vLLM
vLLM is a high-performance inference engine for LLMs that exposes an OpenAI-compatible API. deepeval can connect to a running vLLM server to run evaluations locally.
Command Line
- Launch your vLLM server and ensure it exposes the OpenAI-compatible API. The typical base URL for a local vLLM server is http://localhost:8000/v1/. (An example launch command is shown after the tip below.)
- Then run the following command to configure deepeval:
deepeval set-local-model --model-name=<model_name> \
    --base-url="http://localhost:8000/v1/" \
    --api-key=<api-key>
tip
You can use any value for --api-key if authentication is not enforced on your vLLM server.
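For reference, a minimal end-to-end setup might look like the following sketch. The model name meta-llama/Llama-3.1-8B-Instruct and the key token-abc123 are placeholders, not values required by vLLM or deepeval; substitute the model your server actually hosts and whatever key (if any) your deployment expects.

# 1. Start a vLLM server exposing the OpenAI-compatible API on port 8000
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# 2. Point deepeval at the local server
deepeval set-local-model --model-name=meta-llama/Llama-3.1-8B-Instruct \
    --base-url="http://localhost:8000/v1/" \
    --api-key=token-abc123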
Reverting to OpenAI
To disable the local model and return to OpenAI:
deepeval unset-local-model
info
For advanced setup or deployment options (e.g. multi-GPU, HuggingFace models), see the vLLM documentation.
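As one illustration of such an option, vLLM can shard a model across multiple GPUs with tensor parallelism. The command below is a sketch that assumes two GPUs are available and uses an illustrative model name; consult the vLLM documentation for the full set of deployment flags.

# Serve a model sharded across 2 GPUs via tensor parallelism
vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 2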