
Stanford HELM
Holistic Evaluation of Language Models (HELM): an open-source framework from Stanford CRFM for holistic, multi-metric evaluation of large language models.
Has API
Pricing: Free
Automated Model Benchmarking
Bias and Toxicity Detection
Robustness Testing

A Suite of Statistical Tools for Analyzing and Generating Text