The gold-standard conversational telephone speech corpus for enterprise-grade ASR and NLU development.
Fisher English Training Speech Part 1 (LDC catalog number LDC2004S07) is a widely used dataset for Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). Developed by the Linguistic Data Consortium (LDC), it contains 5,850 telephone conversations totaling approximately 975 hours of audio. The collection was designed to address the 'sparse data' problem in conversational speech by gathering a large number of short conversations, of up to 10 minutes each, between strangers.

The audio is distributed in NIST SPHERE format as 2-channel, 8-bit, 8 kHz μ-law data. As of 2026 the corpus remains a common benchmark for training models that must handle 8 kHz narrowband telephony audio, which is still widespread in telecommunications infrastructure.

Its value lies in its demographic diversity and its speaker metadata, which help in building models that stay accurate across dialects and acoustic environments. While newer wideband datasets exist, the scale of the corpus and the accompanying Part 1 Transcripts (LDC2004T19) keep it useful for cross-entropy training and for fine-tuning transformer-based models for call center and other telephony applications.
Includes audio from 11,699 unique speakers across various US dialects.
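The size figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an estimate: it assumes every conversation runs the full 10 minutes (so the results are upper bounds), it ignores SPHERE header bytes, and the function names are illustrative rather than part of any LDC tooling.

```python
# Back-of-the-envelope sizing for the corpus, using only the figures
# stated in the description. All names here are illustrative.

CONVERSATIONS = 5850      # conversations in Part 1
MAX_MINUTES = 10          # nominal (maximum) conversation length
CHANNELS = 2              # one channel per caller
SAMPLE_RATE_HZ = 8000     # narrowband telephony rate
BYTES_PER_SAMPLE = 1      # 8-bit mu-law


def corpus_hours(conversations: int = CONVERSATIONS,
                 minutes: int = MAX_MINUTES) -> float:
    """Total audio duration in hours if every call runs the full length."""
    return conversations * minutes / 60


def bytes_per_conversation(minutes: int = MAX_MINUTES) -> int:
    """Raw sample bytes for one full-length, two-channel conversation."""
    seconds = minutes * 60
    return CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * seconds


def corpus_bytes() -> int:
    """Upper bound on raw sample bytes for the whole corpus."""
    return CONVERSATIONS * bytes_per_conversation()


if __name__ == "__main__":
    print(f"hours:          {corpus_hours():.0f}")
    print(f"per call (MB):  {bytes_per_conversation() / 1e6:.1f}")
    print(f"corpus (GB):    {corpus_bytes() / 1e9:.2f}")
```

Under these assumptions the 975-hour figure follows directly from 5,850 conversations of 10 minutes each, and the raw samples come to roughly 56 GB before any compression.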
Each caller is recorded on a separate channel in the SPHERE file.
Conversations are initiated based on 40 distinct assigned topics.
Native sampling at 8000 Hz using μ-law encoding.
Includes age, gender, and education level of speakers.
Contains metadata directly in the file headers including sample rate and encoding.
Designed for use with LDC2004T19 transcripts.
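The header metadata, per-caller channels, and μ-law encoding described above can be illustrated with a small reader. This is a minimal sketch, not LDC's tooling: it assumes the general NIST SPHERE layout (an ASCII header beginning with `NIST_1A`, a header-size line, then `name -type value` fields up to `end_head`), sample-interleaved channels, and uncompressed μ-law samples. If a file's `sample_coding` field reports a compressed coding such as shorten, a converter like sph2pipe would be needed instead. The function names are hypothetical.

```python
# Minimal reader for an uncompressed 8-bit mu-law SPHERE file (a sketch).

def parse_sphere_header(raw: bytes):
    """Return (fields, header_size) from the ASCII SPHERE header."""
    if not raw.startswith(b"NIST_1A\n"):
        raise ValueError("not a NIST SPHERE file")
    # The second 8-byte line holds the total header size, e.g. "   1024".
    header_size = int(raw[8:16].decode("ascii").strip())
    fields = {}
    text = raw[16:header_size].decode("ascii", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3 or line.startswith(";"):
            continue  # skip blank, comment, or malformed lines
        name, type_tag, value = parts
        if type_tag == "-i":
            fields[name] = int(value)
        elif type_tag == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields, header_size


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit-range sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if (u & 0x80) else magnitude


# Precompute all 256 decodings once; decoding then becomes a table lookup.
ULAW_TABLE = [ulaw_to_linear(b) for b in range(256)]


def read_sphere_ulaw(raw: bytes):
    """Return (header fields, list of per-channel sample lists)."""
    fields, header_size = parse_sphere_header(raw)
    if fields.get("sample_coding") != "ulaw":
        # Assumption: only plain mu-law is handled by this sketch.
        raise ValueError(f"unsupported coding: {fields.get('sample_coding')!r}")
    n_channels = fields.get("channel_count", 1)
    payload = raw[header_size:]
    if "sample_count" in fields:
        payload = payload[: fields["sample_count"] * n_channels]
    # Samples are interleaved, so every n-th byte belongs to one caller.
    channels = [
        [ULAW_TABLE[b] for b in payload[ch::n_channels]]
        for ch in range(n_channels)
    ]
    return fields, channels
```

In use, something like `fields, (caller_a, caller_b) = read_sphere_ulaw(open(path, "rb").read())` would give one sample list per caller, with `fields["sample_rate"]` and `fields["sample_coding"]` taken straight from the file header. Reading a whole file into memory is acceptable here because a full-length conversation is under 10 MB of samples.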
Poor recognition of customers over low-bandwidth telephone lines.
Registry Updated: 2/7/2026
Deploy to IVR system
Need for robust biometric voice verification over phone lines.
Manually categorizing thousands of support calls.