Regression Testing LLM Systems in CI/CD
Regression testing ensures your LLM system doesn't degrade in performance over time, and CI/CD environments are a natural place to run it. deepeval lets you regression test the outputs of LLM systems (RAG pipelines, or even just an LLM itself) from the CLI through its integration with Pytest, via the deepeval test run command.
Creating Your Test File
deepeval treats rows in an evaluation dataset as unit test cases and provides a wide range of research-backed LLM evaluation metrics. You combine the two in a test_<name>.py file to implement your regression test.
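To make the "rows as unit test cases" idea concrete before the deepeval test file itself, here is a minimal standard-library sketch. It assumes the dataset is stored as JSON rows with input and actual_output keys, mirroring the two LLMTestCase arguments used in this guide. The Row class, the load_rows helper, and the sample data are hypothetical stand-ins and are not part of deepeval.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for one dataset row (not a deepeval class)."""
    input: str
    actual_output: str


def load_rows(raw_json: str) -> List[Row]:
    """Parse a JSON array of objects into rows, one per future test case."""
    return [
        Row(input=item["input"], actual_output=item["actual_output"])
        for item in json.loads(raw_json)
    ]


# Made-up sample data standing in for a real evaluation dataset.
SAMPLE = """
[
  {"input": "What is the refund window?", "actual_output": "30 days."},
  {"input": "Do you ship abroad?", "actual_output": "Yes, to most countries."}
]
"""

rows = load_rows(SAMPLE)

# Each row would become one parametrized test case, for example
# LLMTestCase(input=row.input, actual_output=row.actual_output).
case_kwargs = [{"input": r.input, "actual_output": r.actual_output} for r in rows]
```

In the test file that follows, the same two fields are passed to LLMTestCase directly, and the resulting list of test cases is handed to Pytest.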
import pytest
from deepeval import assert_test
from deepeval.metrics import HallucinationMetric, AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
from deepeval.dataset import EvaluationDataset

# Each test case pairs an input with the output your LLM system produced for it.
first_test_case = LLMTestCase(input="...", actual_output="...")
second_test_case = LLMTestCase(input="...", actual_output="...")

dataset = EvaluationDataset(
    test_cases=[first_test_case, second_test_case]
)


# Run the test function once per test case in the dataset.
@pytest.mark.parametrize(
    "test_case",
    dataset.test_cases,
)
def test_example(test_case: LLMTestCase):
    metric = AnswerRelevancyMetric(threshold=0.5)
    assert_test(test_case, [metric])

To check that your test file is working, run deepeval test run:
deepeval test run test_file.py

Setting Up Your YAML File
To set up a GitHub workflow that triggers deepeval test run on every push or pull request, define a .yaml file in your repository's .github/workflows directory:
name: LLM Regression Test

on:
  push:
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Install Poetry
        run: |
          curl -sSL https://install.python-poetry.org | python3 -
          echo "$HOME/.local/bin" >> $GITHUB_PATH

      - name: Install Dependencies
        run: poetry install --no-root

      - name: Run DeepEval Unit Tests
        run: poetry run deepeval test run test_file.py

Congratulations 🎉! You've now set up an automated regression testing suite in under 30 lines of code.
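The workflow above works as a regression gate because a CI step fails when its command exits with a non-zero status. The sketch below illustrates that idea in plain Python. It assumes that a metric passes when its score is at least the threshold (0.5 in the test file) and that deepeval test run, like Pytest, exits non-zero when a test fails. The failing_cases and exit_code functions and all scores are hypothetical, and this is not a description of how deepeval is implemented internally.

```python
from typing import Dict, List


def failing_cases(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the names of test cases whose score falls below the threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


def exit_code(scores: Dict[str, float], threshold: float = 0.5) -> int:
    """Map results to a process exit code: 0 passes the CI step, 1 fails it."""
    return 1 if failing_cases(scores, threshold) else 0


# Made-up scores for illustration only.
baseline = {"refund_policy": 0.82, "shipping": 0.71}
regressed = {"refund_policy": 0.82, "shipping": 0.34}

print(exit_code(baseline))       # 0
print(failing_cases(regressed))  # ['shipping']
print(exit_code(regressed))      # 1
```

Under these assumptions, a single test case dropping below the threshold is enough to fail the job, which is what blocks a regressing push or pull request.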