Overview
HellaSwag is a dataset designed to evaluate and challenge the commonsense reasoning capabilities of Natural Language Processing (NLP) models. It focuses on the task of adversarial commonsense inference, where models must select the most plausible ending to a given sentence context. The dataset is constructed using an adversarial filtering approach, which iteratively generates and filters incorrect answers to create challenging examples. HellaSwag aims to expose the limitations of current state-of-the-art NLP models, which often struggle with tasks that are trivial for humans. By providing a benchmark that co-evolves with advancing NLP techniques, HellaSwag encourages the development of more robust and human-like language understanding systems. It is primarily used by NLP researchers and developers to evaluate and improve the commonsense reasoning abilities of their models.
Common tasks