An open-source framework for evaluating LLMs, RAG systems & chatbots.
ContextCheck is an open-source framework for evaluating, testing, and validating large language models (LLMs), Retrieval-Augmented Generation (RAG) systems, and chatbots. It provides tools to automatically generate queries, request completions, detect regressions, run penetration tests, and assess hallucinations, helping to ensure the robustness and reliability of these systems. ContextCheck is configured via YAML and can be integrated into continuous integration (CI) pipelines for automated testing.
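To give a sense of what configuration looks like, here is a minimal test-scenario sketch. The exact schema is defined by the project; the field names used here (`endpoint_under_test`, `steps`, `asserts`, and the assert kinds) are illustrative assumptions, not the documented format:

```yaml
# Illustrative scenario sketch; field names are assumptions, not the documented schema.
endpoint_under_test:
  kind: openai          # flexible endpoint configuration (OpenAI, HTTP, ...)
  model: gpt-4o-mini

variables:
  product: "ContextCheck"

steps:
  # Jinja2 templating lets scenarios reuse variables across queries.
  - message: "What is {{ product }} used for?"
    asserts:
      # Heuristic check: the response must contain an expected keyword.
      - kind: contains
        value: "evaluation"
      # LLM-based judgment: another model rates the response against a criterion.
      - kind: llm_judge
        criteria: "The answer accurately describes a testing framework."
```

Because scenarios are declarative, the same file can be re-run against a new model or prompt version and compared with earlier results, which is where regression detection fits in.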
Features
➡️ Simple test scenario definition using human-readable .yaml files
➡️ Flexible endpoint configuration for OpenAI, HTTP, and more
➡️ Customizable JSON request/response models
➡️ Support for variables and Jinja2 templating in YAML files
➡️ Response validation options, including heuristics, LLM-based judgment, and human labeling
➡️ Enhanced output formatting with the rich package for clear, readable displays
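Since scenarios are plain YAML files checked into the repository, ContextCheck fits naturally into CI. Below is a hedged sketch of a GitHub Actions job; the package name, command-line runner, and flags are illustrative assumptions rather than the project's documented interface:

```yaml
# Illustrative CI workflow; install command and CLI name are assumptions.
name: contextcheck

on: [push, pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Hypothetical install step; replace with the project's documented method.
      - run: pip install contextcheck
      # Hypothetical runner invocation over the scenario files kept in-repo.
      - run: contextcheck run tests/scenarios/*.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Supplying the API key via repository secrets keeps credentials out of the scenario files themselves.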