The premier open-source multi-language corpus for training high-fidelity TTS and STT models.
The M-AILABS Speech Dataset is an open corpus built for developing Speech-to-Text (STT) and Text-to-Speech (TTS) systems. Originating from the Munich Artificial Intelligence Laboratories, it provides close to 1,000 hours of 16-bit PCM audio sampled at 16 kHz or 22.05 kHz, primarily sourced from LibriVox audiobooks. Its directory structure is similar to LibriSpeech, so data loaders written for PyTorch or TensorFlow pipelines typically need little adaptation. In 2026, it remains a useful asset for fine-tuning large pretrained speech models such as OpenAI's Whisper or Meta's SeamlessM4T, particularly for European languages including German, French, Spanish, Italian, and Polish. Audio segments are aligned with clean transcripts, which reduces the preprocessing needed before training. Unlike many commercially licensed datasets, M-AILABS is released under a permissive BSD 3-Clause license that allows commercial use as long as the license's attribution conditions are met, which makes it a common choice for startups building sovereign voice AI solutions or localized LLM voice interfaces.
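The audio format described above (16-bit PCM at 16 kHz or 22.05 kHz) is easy to check before training. The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `matches_expected_format` are illustrative and not part of any dataset tooling, and the expected rates are taken from this listing rather than verified against every file.

```python
import wave

# Sample rates this listing mentions; treat them as expectations to verify,
# not as guarantees about every clip.
EXPECTED_RATES = {16000, 22050}
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def describe_wav(path):
    """Return basic format facts for one PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
            "duration_s": frames / rate if rate else 0.0,
        }


def matches_expected_format(info):
    """True if the clip is 16-bit PCM at one of the expected sample rates."""
    return (
        info["sample_rate"] in EXPECTED_RATES
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH
    )
```

Running `describe_wav` over a sample of clips and filtering with `matches_expected_format` is a cheap way to catch files that would need resampling before they reach a training pipeline.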
Includes localized dialects and phonetic structures for English, German, French, Italian, Spanish, Russian, Ukrainian, and Polish.
Directories are organized by Speaker ID > Book ID > Audio/Text segments, a layout similar to LibriSpeech.
Recorded at 16,000 Hz or 22,050 Hz in 16-bit PCM format to capture nuances in human speech.
Each language pack includes a centralized metadata.csv file for efficient data batching and lookup.
Contains a distribution of both male and female narrators across various age groups and tones.
Transcripts are pre-cleaned of non-verbal markers and standardized for easier tokenization.
Permissive license that allows modification and redistribution without requiring source code disclosure.
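As a rough illustration of how the metadata file and directory layout described above might be consumed, here is a minimal sketch that pairs audio clips with their transcripts. It assumes a pipe-delimited `metadata.csv` (clip ID, raw text, optional cleaned text) sitting next to a `wavs` folder; the delimiter, column order, folder name, and the function names `read_metadata` and `pair_audio_with_text` are assumptions for illustration, so check them against the files you actually download.

```python
import csv
from pathlib import Path


def read_metadata(metadata_path, delimiter="|"):
    """Yield (clip_id, transcript) pairs from one metadata file.

    Assumed row layout: clip id, raw text, and optionally a cleaned text
    column. The cleaned column is preferred when it is present and non-empty.
    """
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed rows
            clip_id = row[0].strip()
            has_clean = len(row) > 2 and row[2].strip()
            transcript = (row[2] if has_clean else row[1]).strip()
            yield clip_id, transcript


def pair_audio_with_text(pack_dir, audio_subdir="wavs"):
    """Return (wav_path, transcript) pairs for clips that exist on disk."""
    pack_dir = Path(pack_dir)
    pairs = []
    for clip_id, transcript in read_metadata(pack_dir / "metadata.csv"):
        name = clip_id if clip_id.endswith(".wav") else clip_id + ".wav"
        wav_path = pack_dir / audio_subdir / name
        if wav_path.is_file():  # ignore rows whose audio is missing
            pairs.append((wav_path, transcript))
    return pairs
```

The resulting list of pairs can be wrapped in whatever dataset abstraction a training framework expects; keeping the parsing separate from the framework code makes it easier to adjust if the real column layout differs from the assumption here.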
Lack of high-quality training data for German-language synthetic voices.
Registry Updated: 2/7/2026
Reducing costs of human narration for long-form content.
Scarcity of available data for Slavic-language speech recognition.