Overview
Switchboard-1 Release 2 is a corpus of approximately 260 hours of conversational telephone speech, developed by Texas Instruments and distributed by the Linguistic Data Consortium (LDC). It comprises about 2,400 two-sided telephone conversations among 543 speakers from across the United States. The calls were handled by a computer-driven "robot operator" system, which introduced a topic for discussion and recorded each participant's speech on a separate channel. This release corrects errors in the original NIST publication (Release 1) and modifies the NIST SPHERE headers for consistency. An update of the phonetic transcriptions developed by the International Computer Science Institute (ICSI), together with corrected word alignments, is available from ISIP. Researchers have used the corpus as the basis for annotation projects, including discourse analysis, part-of-speech tagging, and phonetic transcription. The release also includes speaker attribution tables, updated file lists, and documentation.
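Because the audio is distributed with NIST SPHERE headers, a quick way to check a file's properties (for example, that both sides of a call are present as separate channels) is to read its ASCII header. The sketch below assumes the standard SPHERE layout: a `NIST_1A` magic line, a line giving the header length in bytes, then `name -type value` fields terminated by `end_head`. The function name `parse_sphere_header` is hypothetical. The code reads only the header and does not decode any audio samples.

```python
from typing import Dict, Union

HeaderValue = Union[int, float, str]


def parse_sphere_header(data: bytes) -> Dict[str, HeaderValue]:
    """Parse the ASCII header at the start of a NIST SPHERE file (sketch)."""
    # Line 1 is the magic string "NIST_1A"; line 2 gives the header size in bytes.
    lines = data.split(b"\n", 2)
    if len(lines) < 3 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(lines[1].decode("ascii").strip())
    header_text = data[:header_size].decode("ascii", errors="replace")

    fields: Dict[str, HeaderValue] = {}
    # Skip the two preamble lines; each remaining line is "name -type value".
    for line in header_text.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, type_code, value = line.split(" ", 2)
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        elif type_code.startswith("-s"):
            # "-sN" marks a string value of N characters.
            fields[name] = value[: int(type_code[2:])]
        else:
            raise ValueError(f"unknown field type {type_code!r}")
    return fields


# Usage with a synthetic 1024-byte header (illustrative values, not read from the corpus).
demo = (
    "NIST_1A\n   1024\n"
    "channel_count -i 2\n"
    "sample_rate -i 8000\n"
    "sample_coding -s4 ulaw\n"
    "end_head\n"
).encode("ascii").ljust(1024, b" ")
print(parse_sphere_header(demo))
```

Against a real file, one would pass the first bytes of the file (for example, `open(path, "rb").read(1024)`) and check fields such as `channel_count`.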
