
Stanford HELM
The industry-standard framework for holistic, multi-metric evaluation of large language models.
Best for: LLMOps
Has API
Pricing: Free
Automated Model Benchmarking
Bias and Toxicity Detection
Robustness Testing
