LiteLLM

DeepEval allows you to use any model supported by LiteLLM to run evals, either through the CLI or directly in Python.

note

Before getting started, make sure you have LiteLLM installed. It is not installed automatically with DeepEval, so you need to install it separately:

pip install litellm

Command Line

To configure your LiteLLM model through the CLI, run one of the following commands. You must specify the provider in the model name:

# OpenAI
deepeval set-litellm openai/gpt-3.5-turbo

# Anthropic
deepeval set-litellm anthropic/claude-3-opus

# Google
deepeval set-litellm google/gemini-pro

You can also specify additional parameters:

# With API key
deepeval set-litellm openai/gpt-3.5-turbo --api-key="your-api-key"

# With custom API base
deepeval set-litellm openai/gpt-3.5-turbo --api-base="https://your-custom-endpoint.com"

# With both API key and custom base
deepeval set-litellm openai/gpt-3.5-turbo \
  --api-key="your-api-key" \
  --api-base="https://your-custom-endpoint.com"

info

The CLI command above sets LiteLLM as the default provider for all metrics, unless overridden in Python code. To use a different default model provider, you must first unset LiteLLM:

deepeval unset-litellm

Python

When using LiteLLM in Python, you must always specify the provider in the model name. Here's how to use LiteLLMModel from DeepEval's model collection:

from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric

# OpenAI model
model = LiteLLMModel(
    model="openai/gpt-3.5-turbo",  # Provider must be specified
    api_key="your-api-key",        # optional, can be set via environment variable
    api_base="your-api-base",      # optional, for custom endpoints
    temperature=0
)

answer_relevancy = AnswerRelevancyMetric(model=model)

The LiteLLMModel class accepts the following parameters:

  • model (required): A string specifying the provider and model name (e.g., "openai/gpt-3.5-turbo", "anthropic/claude-3-opus")
  • api_key (optional): A string specifying the API key for the model
  • api_base (optional): A string specifying the base URL for the model API
  • temperature (optional): A float specifying the model temperature. Defaults to 0

Environment Variables

You can also configure LiteLLM using environment variables:

# OpenAI
export OPENAI_API_KEY="your-api-key"

# Anthropic
export ANTHROPIC_API_KEY="your-api-key"

# Google
export GOOGLE_API_KEY="your-api-key"

# Custom endpoint
export LITELLM_API_BASE="https://your-custom-endpoint.com"
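
Once the relevant environment variable is set, you can omit api_key when constructing the model, since LiteLLM reads provider credentials from the environment. A minimal sketch, assuming OPENAI_API_KEY has already been exported:

from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric

# No api_key argument: LiteLLM picks up OPENAI_API_KEY from the environment
model = LiteLLMModel(model="openai/gpt-3.5-turbo", temperature=0)
metric = AnswerRelevancyMetric(model=model)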

Available Models

note

This list only displays some of the available models. For a complete list of supported models and their capabilities, refer to the LiteLLM documentation.

OpenAI Models

  • openai/gpt-3.5-turbo
  • openai/gpt-4
  • openai/gpt-4-turbo-preview

Anthropic Models

  • anthropic/claude-3-opus
  • anthropic/claude-3-sonnet
  • anthropic/claude-3-haiku

Google Models

  • google/gemini-pro
  • google/gemini-ultra

Mistral Models

  • mistral/mistral-small
  • mistral/mistral-medium
  • mistral/mistral-large

LM Studio Models

  • lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF
  • lm-studio/Mistral-7B-Instruct-v0.2-GGUF
  • lm-studio/Phi-2-GGUF

Ollama Models

  • ollama/llama2
  • ollama/mistral
  • ollama/codellama
  • ollama/neural-chat
  • ollama/starling-lm

note

When using LM Studio, you need to specify the API base URL. By default, LM Studio runs on http://localhost:1234/v1.

When using Ollama, you need to specify the API base URL. By default, Ollama runs on http://localhost:11434/v1.

Examples

Basic Usage with Different Providers

from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric

# OpenAI
model = LiteLLMModel(model="openai/gpt-3.5-turbo")
metric = AnswerRelevancyMetric(model=model)

# Anthropic
model = LiteLLMModel(model="anthropic/claude-3-opus")
metric = AnswerRelevancyMetric(model=model)

# Google
model = LiteLLMModel(model="google/gemini-pro")
metric = AnswerRelevancyMetric(model=model)

# LM Studio
model = LiteLLMModel(
    model="lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF",
    api_base="http://localhost:1234/v1",  # LM Studio default URL
    api_key="lm-studio"  # LM Studio uses a fixed API key
)
metric = AnswerRelevancyMetric(model=model)

# Ollama
model = LiteLLMModel(
    model="ollama/llama2",
    api_base="http://localhost:11434/v1",  # Ollama default URL
    api_key="ollama"  # Ollama uses a fixed API key
)
metric = AnswerRelevancyMetric(model=model)
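
The snippets above only configure which model judges the metric. To actually run an evaluation, pass a test case to the metric. Below is a minimal end-to-end sketch using DeepEval's LLMTestCase; the input and actual_output values are placeholders:

from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

model = LiteLLMModel(model="openai/gpt-3.5-turbo")
metric = AnswerRelevancyMetric(model=model, threshold=0.7)

# Placeholder test case: replace with your own input/output pair
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="The capital of France is Paris."
)

metric.measure(test_case)
print(metric.score, metric.reason)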

Using Custom Endpoint

model = LiteLLMModel(
    model="custom/your-model-name",  # Provider must be specified
    api_base="https://your-custom-endpoint.com",
    api_key="your-api-key"
)

Using with Schema Validation

from pydantic import BaseModel

class ResponseSchema(BaseModel):
    score: float
    reason: str

# OpenAI
model = LiteLLMModel(model="openai/gpt-3.5-turbo")
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)

# LM Studio
model = LiteLLMModel(
    model="lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF",
    api_base="http://localhost:1234/v1",
    api_key="lm-studio"
)
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)

# Ollama
model = LiteLLMModel(
    model="ollama/llama2",
    api_base="http://localhost:11434/v1",
    api_key="ollama"
)
response, cost = model.generate(
    "Rate this answer: 'The capital of France is Paris'",
    schema=ResponseSchema
)
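
Assuming generate returns an instance of the schema you passed in (as the examples above suggest), you can read the structured fields directly:

print(response.score)   # float, as declared on ResponseSchema
print(response.reason)  # the model's justification string
print(cost)             # cost reported for the call, if available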

Best Practices

  1. Provider Specification: Always specify the provider in the model name (e.g., "openai/gpt-3.5-turbo", "anthropic/claude-3-opus", "lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF", "ollama/llama2")

  2. API Key Security: Store your API keys in environment variables rather than hardcoding them in your scripts.

  3. Model Selection: Choose the appropriate model based on your needs:

    • For simple tasks: Use smaller models like openai/gpt-3.5-turbo
    • For complex reasoning: Use larger models like openai/gpt-4 or anthropic/claude-3-opus
    • For cost-sensitive applications: Use models like mistral/mistral-small or anthropic/claude-3-haiku
    • For local development:
      • Use LM Studio models like lm-studio/Meta-Llama-3.1-8B-Instruct-GGUF
      • Use Ollama models like ollama/llama2 or ollama/mistral
  4. Error Handling: Implement proper error handling for API rate limits and connection issues (see the retry sketch after this list).

  5. Cost Management: Monitor your usage and costs, especially when using larger models.

  6. Local Model Setup:

    • LM Studio:

      • Make sure LM Studio is running and the model is loaded
      • Use the correct API base URL (default: http://localhost:1234/v1)
      • Use the fixed API key "lm-studio"
      • Ensure the model is properly downloaded and loaded in LM Studio
    • Ollama:

      • Make sure Ollama is running and the model is pulled
      • Use the correct API base URL (default: http://localhost:11434/v1)
      • Use the fixed API key "ollama"
      • Pull the model first using ollama pull llama2 (or your chosen model)
      • Ensure you have enough system resources for the model
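
For best practice 4, here is a minimal retry sketch. The exact exception types depend on the provider and on how LiteLLM surfaces them, so the broad except below is only illustrative and should be narrowed in real code:

import time

from deepeval.models import LiteLLMModel
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

model = LiteLLMModel(model="openai/gpt-3.5-turbo")
metric = AnswerRelevancyMetric(model=model)
test_case = LLMTestCase(input="...", actual_output="...")

# Retry with exponential backoff on rate limits or transient connection errors
for attempt in range(3):
    try:
        metric.measure(test_case)
        break
    except Exception:  # narrow this to the provider-specific errors you expect
        if attempt == 2:
            raise
        time.sleep(2 ** attempt)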