Overview
The Stanford Natural Language Inference (SNLI) Corpus is a collection of 570k human-written English sentence pairs, manually labeled for balanced classification with the labels entailment, contradiction, and neutral. It serves as a benchmark for evaluating representational systems for text, including those induced by representation-learning methods, and as a resource for developing NLP models. The corpus targets Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE): the task of determining the inference relation between two texts (a premise and a hypothesis). SNLI is distributed as both JSON Lines and tab-separated value (TSV) files. Researchers and developers in natural language processing and machine learning use it to train and evaluate models for tasks such as text understanding and semantic reasoning. The corpus includes content from the Flickr 30k and VisualGenome corpora.
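As a rough illustration of working with the JSON Lines distribution, the sketch below reads sentence pairs and counts their labels. It assumes each record carries the fields `sentence1` (premise), `sentence2` (hypothesis), and `gold_label`, and that pairs without a consensus label use a placeholder such as `-`; these names are not stated in this overview and should be checked against the downloaded files. The helper `read_pairs` is hypothetical, and the sample records are invented for illustration rather than taken from the corpus.

```python
import json
from collections import Counter

# The three labels named in the overview.
VALID_LABELS = {"entailment", "contradiction", "neutral"}


def read_pairs(lines):
    """Yield (premise, hypothesis, label) tuples from JSON Lines records.

    `lines` is any iterable of strings, e.g. an open file handle.
    Field names are assumptions about the distributed files.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        label = record.get("gold_label")
        if label not in VALID_LABELS:
            continue  # skip pairs without one of the three labels
        yield record["sentence1"], record["sentence2"], label


# Invented sample records, one JSON object per line.
sample = [
    '{"gold_label": "entailment", "sentence1": "A man plays a guitar on stage.", "sentence2": "A person is making music."}',
    '{"gold_label": "contradiction", "sentence1": "A man plays a guitar on stage.", "sentence2": "The man is asleep at home."}',
    '{"gold_label": "neutral", "sentence1": "A man plays a guitar on stage.", "sentence2": "The man is a famous musician."}',
    '{"gold_label": "-", "sentence1": "Two dogs run in a field.", "sentence2": "The dogs are chasing a ball."}',
]

pairs = list(read_pairs(sample))
label_counts = Counter(label for _, _, label in pairs)
```

To process a real file, pass an open file handle instead of `sample`, for example `with open(path, encoding="utf-8") as f: pairs = list(read_pairs(f))`, where `path` points at whichever JSON Lines file was downloaded.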