
vLLM

vLLM is a high-performance inference engine for LLMs that exposes an OpenAI-compatible API. deepeval can connect to a running vLLM server so you can run evaluations with a locally hosted model.

Command Line

  1. Launch your vLLM server and make sure it exposes the OpenAI-compatible API (for example, vllm serve <model_name> starts one on port 8000 by default). For a local vLLM server, the base URL is typically http://localhost:8000/v1/.
  2. Then run the following command to configure deepeval:
deepeval set-local-model \
    --model=<model_name> \
    --base-url="http://localhost:8000/v1/"
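Before configuring deepeval, it can help to confirm that the server is reachable and to look up the exact model name it serves, since that is the value to pass as --model. The following is a minimal standard-library sketch, assuming the server implements the OpenAI-compatible GET /models endpoint under the base URL (vLLM's server does). The function names models_endpoint, parse_model_ids, and list_served_models are hypothetical helpers, not part of deepeval.

```python
import json
import urllib.request
from typing import List, Optional


def models_endpoint(base_url: str) -> str:
    # Normalize the trailing slash so ".../v1" and ".../v1/" both map to ".../v1/models".
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> List[str]:
    # OpenAI-compatible servers answer with {"object": "list", "data": [{"id": ...}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(
    base_url: str, api_key: str = "vllm", timeout: float = 5.0
) -> Optional[List[str]]:
    # Hypothetical helper: returns the served model ids, or None if the server
    # cannot be reached or does not return valid JSON.
    request = urllib.request.Request(
        models_endpoint(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_model_ids(json.load(response))
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers bad JSON.
        return None


if __name__ == "__main__":
    print(list_served_models("http://localhost:8000/v1/"))
```

If the call returns None, the server is not up at that address (or is not speaking the OpenAI-compatible API), and deepeval will not be able to reach it either.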

Python

Alternatively, you can define LocalModel directly in Python code:

from deepeval.models import LocalModel
from deepeval.metrics import AnswerRelevancyMetric

model = LocalModel(
    model="<model_name>",
    base_url="http://localhost:8000/v1/",
    api_key="vllm",  # any placeholder works if your server has no auth
    temperature=0
)

answer_relevancy = AnswerRelevancyMetric(model=model)

To use a local model without creating a LocalModel instance, set USE_LOCAL_MODEL=1 in your environment and pass the model name as a string when initializing your metric:

from deepeval.metrics import AnswerRelevancyMetric

answer_relevancy = AnswerRelevancyMetric(
    model="<model_name>",
)

For this to work, you also need to set the other required environment variables, such as LOCAL_MODEL_BASE_URL and LOCAL_MODEL_API_KEY.
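If you prefer to set these variables from Python rather than your shell, a minimal sketch follows. configure_local_model_env is a hypothetical helper, and it assumes deepeval reads these variables from the process environment when it resolves the model, so it should be called before importing or initializing any deepeval metric. It only affects the current process.

```python
import os


def configure_local_model_env(model_name: str, base_url: str, api_key: str = "vllm") -> None:
    # Hypothetical helper: sets the environment variables named in this section
    # for the current process only. Assumption: deepeval picks them up from
    # os.environ, so call this before creating your metrics.
    os.environ["USE_LOCAL_MODEL"] = "1"
    os.environ["LOCAL_MODEL_NAME"] = model_name
    os.environ["LOCAL_MODEL_BASE_URL"] = base_url
    # Servers without authentication accept any placeholder string here.
    os.environ["LOCAL_MODEL_API_KEY"] = api_key


if __name__ == "__main__":
    configure_local_model_env("<model_name>", "http://localhost:8000/v1/")
    print(os.environ["LOCAL_MODEL_BASE_URL"])
```

After this call, AnswerRelevancyMetric(model="<model_name>") would be expected to route requests to the configured server.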

There are ZERO mandatory and SIX optional parameters when creating a LocalModel:

  • [Optional] model: A string specifying the local model to use. Defaults to LOCAL_MODEL_NAME if not passed; raises an error at runtime if unset.
  • [Optional] api_key: A string specifying the API key for your local server. Defaults to LOCAL_MODEL_API_KEY if not passed; raises an error at runtime if unset. Local servers without authentication accept any placeholder string.
  • [Optional] base_url: A string specifying the base URL of your local server. Defaults to LOCAL_MODEL_BASE_URL if not passed.
  • [Optional] temperature: A float specifying the model temperature. Defaults to TEMPERATURE if not passed; falls back to 0.0 if unset.
  • [Optional] format: A string specifying the structured-output response format. Defaults to LOCAL_MODEL_FORMAT if not passed; falls back to "json" if unset.
  • [Optional] generation_kwargs: A dictionary of additional generation parameters (for example, max_tokens or top_p) forwarded to the chat.completions.create(...) call made against your local server.
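The fallback rules in the list above follow one precedence order: an explicitly passed argument wins, then the named environment variable, then either a hard-coded default (temperature, format) or an error (model, api_key). The sketch below is a hypothetical illustration of that order, not deepeval's actual implementation; resolve_setting is a made-up name, and treating an empty environment variable as unset is an assumption.

```python
import os
from typing import Optional


def resolve_setting(
    passed: Optional[str],
    env_var: str,
    fallback: Optional[str] = None,
    required: bool = False,
) -> Optional[str]:
    # Hypothetical illustration of the documented precedence, not deepeval code.
    # 1. An explicitly passed argument always wins.
    if passed is not None:
        return passed
    # 2. Otherwise read the environment variable (empty strings count as unset).
    value = os.environ.get(env_var)
    if value:
        return value
    # 3. Settings such as model and api_key have no default and fail instead.
    if required:
        raise ValueError(f"{env_var} is not set and no value was passed")
    # 4. Settings such as format fall back to a fixed default.
    return fallback


if __name__ == "__main__":
    print(resolve_setting(None, "LOCAL_MODEL_FORMAT", fallback="json"))
    print(float(resolve_setting(None, "TEMPERATURE", fallback="0.0")))
```

Under this reading, passing format="json" to LocalModel overrides LOCAL_MODEL_FORMAT, and leaving both unset yields "json".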

Reverting to OpenAI

To disable the local model and return to OpenAI:

deepeval unset-local-model
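If you enabled the local model through environment variables instead of the CLI, you may also need to remove those variables. The sketch below assumes that clearing USE_LOCAL_MODEL and the related LOCAL_MODEL_* variables is enough to stop the environment-variable route; clear_local_model_env is a hypothetical helper. It only changes the current process and does not undo whatever deepeval set-local-model stored, so still run the CLI command for that.

```python
import os
from typing import List

# Variables named on this page for the environment-variable route.
LOCAL_MODEL_ENV_VARS = (
    "USE_LOCAL_MODEL",
    "LOCAL_MODEL_NAME",
    "LOCAL_MODEL_BASE_URL",
    "LOCAL_MODEL_API_KEY",
    "LOCAL_MODEL_FORMAT",
)


def clear_local_model_env() -> List[str]:
    # Hypothetical helper: removes the local-model variables from this
    # process's environment and returns the names that were actually set.
    removed = [name for name in LOCAL_MODEL_ENV_VARS if name in os.environ]
    for name in removed:
        del os.environ[name]
    return removed


if __name__ == "__main__":
    print(clear_local_model_env())
```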
