Set Up
Installing DeepEval
DeepEval is a powerful LLM evaluation framework. Here's how to get started by installing it and running your first evaluation.
Start by installing DeepEval using pip:
pip install -U deepeval
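To confirm the install worked, you can print the installed version using the standard library's importlib.metadata (this check is optional and not DeepEval-specific):

from importlib.metadata import version

# Prints the installed DeepEval version, e.g. to verify the upgrade took effect
print(version("deepeval"))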
Write your first test
Let's evaluate the correctness of an LLM output using GEval, a powerful metric based on LLM-as-a-judge evaluation.
Your test file must be named with a test_ prefix (like test_app.py) for DeepEval to recognize and run it.
from deepeval import evaluate
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.metrics import GEval

correctness_metric = GEval(
    name="Correctness",
    criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
    threshold=0.5
)

test_case = LLMTestCase(
    input="I have a persistent cough and fever. Should I be worried?",
    # Replace this with the actual output from your LLM application
    actual_output="A persistent cough and fever could signal various illnesses, from minor infections to more serious conditions like pneumonia or COVID-19. It's advisable to seek medical attention if symptoms worsen, persist beyond a few days, or if you experience difficulty breathing, chest pain, or other concerning signs.",
    expected_output="A persistent cough and fever could indicate a range of illnesses, from a mild viral infection to more serious conditions like pneumonia or COVID-19. You should seek medical attention if your symptoms worsen, persist for more than a few days, or are accompanied by difficulty breathing, chest pain, or other concerning signs."
)

evaluate([test_case], [correctness_metric])
To run your first evaluation, enter the following command in your terminal:
deepeval test run test_app.py
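Since deepeval test run is built on pytest, you can also express the same check as a pytest-style test function using DeepEval's assert_test helper. A minimal sketch, reusing the correctness_metric and test_case defined above:

from deepeval import assert_test

def test_correctness():
    # Fails the test if the metric score falls below the 0.5 threshold
    assert_test(test_case, [correctness_metric])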
DeepEval's powerful LLM-as-a-judge metrics (like GEval used in this example) rely on an underlying LLM, called the Evaluation Model, to perform evaluations. By default, DeepEval uses OpenAI's models for this purpose, so you'll need to set your OPENAI_API_KEY as an environment variable as shown below.
export OPENAI_API_KEY="your_api_key"
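If you'd rather set the key from within Python (for example, in a notebook), the standard-library equivalent is:

import os

# Avoid hardcoding real keys in committed code; load them from a secrets store instead
os.environ["OPENAI_API_KEY"] = "your_api_key"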
To use ANY custom LLM of your choice, check out our docs on custom evaluation models.
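As a rough sketch of what a custom evaluation model looks like (the DeepEvalBaseLLM interface below follows the custom-models docs; the client and its complete() call are hypothetical placeholders for whatever LLM you wrap):

from deepeval.models import DeepEvalBaseLLM

class MyCustomLLM(DeepEvalBaseLLM):
    def __init__(self, client):
        self.client = client  # any LLM client or local model you like

    def load_model(self):
        return self.client

    def generate(self, prompt: str) -> str:
        # Call your model here and return its text response
        return self.client.complete(prompt)  # hypothetical client method

    async def a_generate(self, prompt: str) -> str:
        return self.generate(prompt)

    def get_model_name(self) -> str:
        return "My Custom LLM"

You can then pass it to any LLM-as-a-judge metric, e.g. GEval(..., model=MyCustomLLM(client)).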
Congratulations! You've successfully run your first LLM evaluation with DeepEval.
Setting Up Confident AI
While DeepEval works great standalone, you can connect it to Confident AI — our cloud platform for dashboards, logging, collaboration, and more — built for LLM evaluation. Best of all, it's free to get started! (No credit card required.)
You can sign up here, or run the following command:
deepeval login
Navigate to your Settings page and copy your Confident AI API Key from the Project API Key box. If you used the deepeval login command to log in, you'll be prompted to paste your Confident AI API Key after creating an account.
Alternatively, if you already have an account, you can log in directly using Python:
import deepeval

deepeval.login_with_confident_api_key("your-confident-api-key")
Or through the CLI:
deepeval login --confident-api-key "your-confident-api-key"
You're all set! You can now evaluate LLMs locally and monitor them in Confident AI.