LiteLLM
DeepEval allows you to use any model supported by LiteLLM to run evals, either through the CLI or directly in Python.
Before getting started, make sure you have LiteLLM installed. It is not installed automatically with DeepEval, so you need to install it separately:
pip install litellm
Command Line
To configure your LiteLLM model through the CLI, run the following command. You must specify the provider in the model name:
# OpenAI
deepeval set-litellm openai/gpt-3.5-turbo
# Anthropic
deepeval set-litellm anthropic/claude-3-opus
# Google
deepeval set-litellm google/gemini-pro
You can also specify additional parameters:
# With API key
deepeval set-litellm openai/gpt-3.5-turbo --api-key="your-api-key"
# With custom API base
deepeval set-litellm openai/gpt-3.5-turbo --api-base="https://your-custom-endpoint.com"
# With both API key and custom base
deepeval set-litellm openai/gpt-3.5-turbo \
--api-key="your-api-key" \
--api-base="https://your-custom-endpoint.com"
The CLI command above sets LiteLLM as the default provider for all metrics, unless overridden in Python code. To use a different default model provider, you must first unset LiteLLM:
deepeval unset-litellm
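If you set LiteLLM through the CLI, any metric created in Python without an explicit model argument will use that default, while passing a model overrides it for that metric only. A minimal sketch, assuming the set-litellm command above has been run:
from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric

# Uses the default model configured via `deepeval set-litellm ...`
default_metric = AnswerRelevancyMetric()

# Overrides the default for this metric only
override_metric = AnswerRelevancyMetric(
    model=LiteLLMModel(model="anthropic/claude-3-opus")
)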
Python
When using LiteLLM in Python, you must always specify the provider in the model name. Here's how to use LiteLLMModel from DeepEval's model collection:
from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric
# OpenAI model
model = LiteLLMModel(
    model="openai/gpt-3.5-turbo",  # provider must be specified
    api_key="your-api-key",        # optional, can be set via environment variable
    api_base="your-api-base",      # optional, for custom endpoints
    temperature=0
)
answer_relevancy = AnswerRelevancyMetric(model=model)
The LiteLLMModel class accepts the following parameters:
- model (required): A string specifying the provider and model name (e.g., "openai/gpt-3.5-turbo", "anthropic/claude-3-opus")
- api_key (optional): A string specifying the API key for the model
- api_base (optional): A string specifying the base URL for the model API
- temperature (optional): A float specifying the model temperature. Defaults to 0
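Once constructed, the model plugs into an evaluation run like any other DeepEval metric model. Below is a minimal end-to-end sketch, assuming DeepEval's standard LLMTestCase and evaluate APIs and a valid API key in your environment; the input and output strings are placeholders:
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.models import LiteLLMModel

model = LiteLLMModel(model="openai/gpt-3.5-turbo")
metric = AnswerRelevancyMetric(model=model)

# Placeholder test case for illustration
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
)

evaluate(test_cases=[test_case], metrics=[metric])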
Environment Variables
You can also configure LiteLLM using environment variables:
# OpenAI
export OPENAI_API_KEY="your-api-key"
# Anthropic
export ANTHROPIC_API_KEY="your-api-key"
# Google
export GOOGLE_API_KEY="your-api-key"
# Custom endpoint
export LITELLM_API_BASE="https://your-custom-endpoint.com"
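If you prefer to keep this configuration in code (for example in a local script), the same variables can be set via os.environ before the model is constructed. A small sketch, using the variable names from the shell example above:
import os
from deepeval.models import LiteLLMModel

# Equivalent to `export OPENAI_API_KEY=...`; must be set before the model makes a request
os.environ["OPENAI_API_KEY"] = "your-api-key"

model = LiteLLMModel(model="openai/gpt-3.5-turbo")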
Available Models
This list only displays some of the available models. For a complete list of supported models and their capabilities, refer to the LiteLLM documentation.
OpenAI Models
openai/gpt-3.5-turbo
openai/gpt-4
openai/gpt-4-turbo-preview
Anthropic Models
anthropic/claude-3-opus
anthropic/claude-3-sonnet
anthropic/claude-3-haiku
Google Models
google/gemini-pro
google/gemini-ultra
Mistral Models
mistral/mistral-small
mistral/mistral-medium
mistral/mistral-large
LM Studio Models
lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF
lm-studio/Mistral-7B-Instruct-v0.2-GGUF
lm-studio/Phi-2-GGUF
Ollama Models
ollama/llama2
ollama/mistral
ollama/codellama
ollama/neural-chat
ollama/starling-lm
When using LM Studio, you need to specify the API base URL. By default, LM Studio runs on http://localhost:1234/v1.
When using Ollama, you need to specify the API base URL. By default, Ollama runs on http://localhost:11434/v1.
Examples
Basic Usage with Different Providers
from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric
# OpenAI
model = LiteLLMModel(model="openai/gpt-3.5-turbo")
metric = AnswerRelevancyMetric(model=model)
# Anthropic
model = LiteLLMModel(model="anthropic/claude-3-opus")
metric = AnswerRelevancyMetric(model=model)
# Google
model = LiteLLMModel(model="google/gemini-pro")
metric = AnswerRelevancyMetric(model=model)
# LM Studio
model = LiteLLMModel(
    model="lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF",
    api_base="http://localhost:1234/v1",  # LM Studio default URL
    api_key="lm-studio"                   # LM Studio uses a fixed API key
)
metric = AnswerRelevancyMetric(model=model)
# Ollama
model = LiteLLMModel(
    model="ollama/llama2",
    api_base="http://localhost:11434/v1",  # Ollama default URL
    api_key="ollama"                        # Ollama uses a fixed API key
)
metric = AnswerRelevancyMetric(model=model)
Using Custom Endpoint
model = LiteLLMModel(
    model="custom/your-model-name",  # provider must be specified
    api_base="https://your-custom-endpoint.com",
    api_key="your-api-key"
)
Using with Schema Validation
from pydantic import BaseModel
class ResponseSchema(BaseModel):
    score: float
    reason: str
# OpenAI
model = LiteLLMModel(model="openai/gpt-3.5-turbo")
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)
# LM Studio
model = LiteLLMModel(
    model="lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF",
    api_base="http://localhost:1234/v1",
    api_key="lm-studio"
)
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)
# Ollama
model = LiteLLMModel(
    model="ollama/llama2",
    api_base="http://localhost:11434/v1",
    api_key="ollama"
)
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)
Best Practices
Provider Specification: Always specify the provider in the model name (e.g., "openai/gpt-3.5-turbo", "anthropic/claude-3-opus", "lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF", "ollama/llama2")
API Key Security: Store your API keys in environment variables rather than hardcoding them in your scripts.
Model Selection: Choose the appropriate model based on your needs:
- For simple tasks: use smaller models like openai/gpt-3.5-turbo
- For complex reasoning: use larger models like openai/gpt-4 or anthropic/claude-3-opus
- For cost-sensitive applications: use models like mistral/mistral-small or anthropic/claude-3-haiku
- For local development: use LM Studio models like lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF, or Ollama models like ollama/llama2 or ollama/mistral
Error Handling: Implement proper error handling for API rate limits and connection issues (see the retry sketch at the end of this section).
Cost Management: Monitor your usage and costs, especially when using larger models.
Local Model Setup:
LM Studio:
- Make sure LM Studio is running and the model is loaded
- Use the correct API base URL (default: http://localhost:1234/v1)
- Use the fixed API key "lm-studio"
- Ensure the model is properly downloaded and loaded in LM Studio
Ollama:
- Make sure Ollama is running and the model is pulled
- Use the correct API base URL (default: http://localhost:11434/v1)
- Use the fixed API key "ollama"
- Pull the model first using ollama pull llama2 (or your chosen model)
- Ensure you have enough system resources for the model
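To illustrate the error-handling point from the best practices above, here is a minimal retry sketch around a metric call. The broad except clause and backoff policy are illustrative assumptions, not DeepEval APIs; narrow the exception types to the rate-limit and connection errors your provider actually raises:
import time
from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

model = LiteLLMModel(model="openai/gpt-3.5-turbo")
metric = AnswerRelevancyMetric(model=model)
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris.",
)

# Simple retry loop with exponential backoff for transient failures such as rate limits
for attempt in range(3):
    try:
        metric.measure(test_case)
        print(metric.score, metric.reason)
        break
    except Exception:  # replace with the specific provider/LiteLLM exceptions you expect
        if attempt == 2:
            raise
        time.sleep(2 ** attempt)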