Evaluation Models
vLLM
vLLM is a high-performance inference engine for LLMs that exposes an OpenAI-compatible API. deepeval can connect to a running vLLM server to run evaluations locally.
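Because deepeval talks to vLLM through the OpenAI-compatible API, one quick way to confirm that a server is reachable is to query its model-listing endpoint. The sketch below is a minimal illustration using only the Python standard library. The helper names (`models_url`, `parse_model_ids`, `list_served_models`) are hypothetical and not part of deepeval or vLLM, and the response shape assumed is the standard OpenAI-style model list.

```python
import json
import urllib.request
from typing import Optional


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(
    base_url: str, api_key: Optional[str] = None, timeout: float = 5.0
) -> list[str]:
    """Ask the server which models it serves.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    request = urllib.request.Request(models_url(base_url))
    if api_key:
        # Only needed when the server was started with authentication enabled.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))


# Example call, assuming a server is running on the default local address:
# print(list_served_models("http://localhost:8000/v1/"))
```

If the call succeeds, one of the returned ids is most likely the value to use wherever `<model_name>` appears on this page.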
Command Line
- Launch your vLLM server and ensure it’s exposing the OpenAI-compatible API. The typical base URL for a local vLLM server is: http://localhost:8000/v1/.
- Then run the following command to configure deepeval:
deepeval set-local-model \
    --model=<model_name> \
    --base-url="http://localhost:8000/v1/"
Python
Alternatively, you can define LocalModel directly in Python code:
from deepeval.models import LocalModel
from deepeval.metrics import AnswerRelevancyMetric
model = LocalModel(
    model="<model_name>",
    base_url="http://localhost:8000/v1/",
    api_key="vllm",  # any placeholder works if your server has no auth
    temperature=0,
)
answer_relevancy = AnswerRelevancyMetric(model=model)
To use a local model directly in deepeval, set USE_LOCAL_MODEL=1 in your env and simply pass the name of your desired model in your metric initialization:
from deepeval.metrics import AnswerRelevancyMetric
answer_relevancy = AnswerRelevancyMetric(
    model="<model_name>",
)
You should also set the other required variables, such as LOCAL_MODEL_BASE_URL and LOCAL_MODEL_API_KEY, to use your local model as shown above.
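As a rough illustration of the environment-based setup, the variables named above can be collected in one place and exported before deepeval is used. This is only a sketch: `local_model_env` is a hypothetical helper, the `"vllm"` placeholder key assumes a server without authentication, and setting the variables before importing deepeval (or in your shell) is assumed to be the safe order, since it is not confirmed here when deepeval reads them.

```python
import os


def local_model_env(base_url: str, api_key: str = "vllm") -> dict[str, str]:
    """Collect the environment variables this page names for local models."""
    return {
        "USE_LOCAL_MODEL": "1",
        "LOCAL_MODEL_BASE_URL": base_url,
        "LOCAL_MODEL_API_KEY": api_key,  # placeholder; assumes no server auth
    }


# Export the variables for the current process before importing deepeval.
os.environ.update(local_model_env("http://localhost:8000/v1/"))
```

The same three assignments can equally be placed in a shell profile or a `.env` file; the helper only keeps them together for scripts and notebooks.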
There are ZERO mandatory and SIX optional parameters when creating a LocalModel:
- [Optional] model: A string specifying the local model to use. Defaults to LOCAL_MODEL_NAME if not passed; raises an error at runtime if unset.
- [Optional] api_key: A string specifying the API key for your local server. Defaults to LOCAL_MODEL_API_KEY if not passed; raises an error at runtime if unset. Local servers without authentication accept any placeholder string.
- [Optional] base_url: A string specifying the base URL of your local server. Defaults to LOCAL_MODEL_BASE_URL if not passed.
- [Optional] temperature: A float specifying the model temperature. Defaults to TEMPERATURE if not passed; falls back to 0.0 if unset.
- [Optional] format: A string specifying the structured-output response format. Defaults to LOCAL_MODEL_FORMAT if not passed; falls back to "json" if unset.
- [Optional] generation_kwargs: A dictionary of additional generation parameters forwarded to the local server's chat.completions.create(...) call.
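The precedence described in the list above (explicit argument first, then environment variable, then either a fallback or an error) can be sketched in plain Python. This is an illustration of the documented behavior only, not deepeval's actual implementation; `resolve_setting` is a hypothetical helper, and the choice of `ValueError` is an assumption.

```python
import os
from typing import Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    fallback: Optional[str] = None,
    required: bool = False,
) -> Optional[str]:
    """Pick a value: explicit argument, then environment variable, then fallback."""
    if explicit is not None:
        return explicit
    from_env = os.environ.get(env_var)
    if from_env is not None:
        return from_env
    if required:
        # Mirrors "raises an error at runtime if unset" (error type is assumed).
        raise ValueError(f"{env_var} is not set and no value was passed")
    return fallback


# format: falls back to "json" when neither the argument nor the variable is set.
response_format = resolve_setting(None, "LOCAL_MODEL_FORMAT", fallback="json")
```

Under this reading, `model` and `api_key` behave like `required=True` settings, while `temperature` and `format` behave like settings with a fallback.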
Reverting to OpenAI
To disable the local model and return to OpenAI:
deepeval unset-local-model