
Set Up

Installing DeepEval

DeepEval is a powerful LLM evaluation framework. Here's how to get started by installing DeepEval and running your first evaluation.

Start by installing DeepEval using pip:

pip install -U deepeval
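
To confirm the installation, you can check the installed package and version with pip:

pip show deepeval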

Write your first test

Let's evaluate the correctness of an LLM output using GEval, a powerful metric based on LLM-as-a-judge evaluation.

note

Your test file must be named with a test_ prefix (like test_app.py) for DeepEval to recognize and run it.

test_app.py
from deepeval import evaluate
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.metrics import GEval

correctness_metric = GEval(
    name="Correctness",
    criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
    threshold=0.5
)

test_case = LLMTestCase(
    input="I have a persistent cough and fever. Should I be worried?",
    # Replace this with the actual output from your LLM application
    actual_output="A persistent cough and fever could signal various illnesses, from minor infections to more serious conditions like pneumonia or COVID-19. It's advisable to seek medical attention if symptoms worsen, persist beyond a few days, or if you experience difficulty breathing, chest pain, or other concerning signs.",
    expected_output="A persistent cough and fever could indicate a range of illnesses, from a mild viral infection to more serious conditions like pneumonia or COVID-19. You should seek medical attention if your symptoms worsen, persist for more than a few days, or are accompanied by difficulty breathing, chest pain, or other concerning signs."
)

evaluate([test_case], [correctness_metric])

To run your first evaluation, enter the following command in your terminal:

deepeval test run test_app.py
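
Since test_app.py calls evaluate() at module level, you should also be able to execute it as a plain Python script:

python test_app.py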

note

DeepEval's powerful LLM-as-a-judge metrics (like GEval used in this example) rely on an underlying LLM called the Evaluation Model to perform evaluations. By default, DeepEval uses OpenAI's models for this purpose.

This means you'll need to set your OPENAI_API_KEY as an environment variable, as shown below.

export OPENAI_API_KEY="your_api_key"

To use any custom LLM of your choice, check out our docs on custom evaluation models.
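
For instance, most DeepEval metrics accept a model argument, so you can pin the evaluation model explicitly rather than relying on the default. Here's a minimal sketch, assuming the model parameter accepts an OpenAI model name string ("gpt-4o" below is just a placeholder):

from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

# Sketch: pin the evaluation model explicitly instead of using the default.
# "gpt-4o" is a placeholder; any OpenAI model name (or a custom DeepEvalBaseLLM
# wrapper, per the custom evaluation models docs) should work here.
correctness_metric = GEval(
    name="Correctness",
    criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
    threshold=0.5,
    model="gpt-4o",
)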

Congratulations! You've successfully run your first LLM evaluation with DeepEval.

Setting Up Confident AI

While DeepEval works great standalone, you can connect it to Confident AI, our cloud platform for LLM evaluation with dashboards, logging, collaboration, and more. Best of all, it's free to get started (no credit card required).

You can sign up here, or run the following command:

deepeval login

After creating an account, navigate to your Settings page and copy your Confident AI API Key from the Project API Key box. If you logged in with the deepeval login command, you'll be prompted to paste this key in your terminal.

Alternatively, if you already have an account, you can log in directly using Python:

main.py
import deepeval

deepeval.login_with_confident_api_key("your-confident-api-key")

Or through the CLI:

deepeval login --confident-api-key "your-confident-api-key"

You're all set! You can now evaluate LLMs locally and monitor them in Confident AI.
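
For example, now that you're logged in, re-running the evaluation from earlier should also publish the results to your Confident AI project, where you can inspect them on the dashboard:

deepeval test run test_app.py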