Rhasspy Larynx
High-quality, privacy-first neural text-to-speech for local edge computing.
High-quality, low-complexity neural vocoder combining DSP and Deep Learning for real-time speech synthesis.
LPCNet is a pioneering hybrid neural vocoder that integrates traditional Digital Signal Processing (DSP) techniques, specifically Linear Predictive Coding (LPC), with deep recurrent neural networks (RNNs). Developed primarily by Jean-Marc Valin at Mozilla, it represents a significant leap in audio synthesis efficiency, enabling high-quality speech generation at a fraction of the computational load of pure-neural models like WaveNet. Because the LPC coefficients capture the spectral envelope, the neural network only has to model the residual excitation signal, which is much easier to learn and requires far fewer parameters.

As of 2026, LPCNet has become a foundational architecture for low-bitrate speech codecs and real-time Text-to-Speech (TTS) applications on edge devices. It uses sparse GRU (Gated Recurrent Unit) layers and 8-bit quantization to achieve real-time performance on high-end mobile CPUs without requiring dedicated GPU acceleration. This makes it ideal for privacy-focused, on-device voice synthesis and low-latency communication protocols where bandwidth and power are constrained.
Combines a linear prediction filter with a gated recurrent unit to reduce the complexity of the neural synthesis task.
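To make that split concrete, here is a minimal sketch (illustrative only; the order-16 filter and variable names are assumptions, not LPCNet's actual code) of how a linear predictor estimates each sample from the previous ones so that the network only has to model the leftover excitation:

    #define LPC_ORDER 16   /* assumed filter order for illustration */

    /* Predict sample n as a weighted sum of the LPC_ORDER previous samples,
       then return the residual (excitation) the neural network must model.
       Requires n >= LPC_ORDER. */
    float lpc_residual(const float *x, int n, const float lpc[LPC_ORDER])
    {
        float pred = 0.0f;
        for (int k = 0; k < LPC_ORDER; k++)
            pred += lpc[k] * x[n - 1 - k];   /* spectral envelope handled by DSP */
        return x[n] - pred;                  /* only this residual is left for the GRU */
    }

Because the predictor already accounts for the spectral envelope, the residual is much closer to white noise, which is why a small recurrent network can model it with so few parameters.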
A high-speed, fully convolutional neural architecture for multi-speaker text-to-speech synthesis.
Real-time neural text-to-speech architecture for massive-scale multi-speaker synthesis.
A Multilingual Single-Speaker Speech Corpus for High-Fidelity Text-to-Speech Synthesis.
Uses structured sparsity in the GRU layers to skip redundant computations during inference.
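For illustration, a block-sparse matrix-vector product of the kind this implies might look like the sketch below (block size, layout, and names are assumptions, not LPCNet's internal format); each row only pays for the weight blocks that survived pruning:

    #define BLOCK 16   /* assumed sparsity block size */

    /* Block-sparse matrix-vector product: only the blocks listed for each row
       are visited, so zeroed-out GRU weights cost no multiply-adds at all. */
    void sparse_matvec(float *out, int rows,
                       const int *row_start,    /* blocks kept in row r: row_start[r] .. row_start[r+1]-1 */
                       const int *block_col,    /* column (in units of BLOCK) of each kept block */
                       const float *block_vals, /* BLOCK weights per kept block, stored contiguously */
                       const float *in)
    {
        for (int r = 0; r < rows; r++) {
            float acc = 0.0f;
            for (int b = row_start[r]; b < row_start[r + 1]; b++) {
                const float *w = &block_vals[b * BLOCK];
                const float *v = &in[block_col[b] * BLOCK];
                for (int k = 0; k < BLOCK; k++)
                    acc += w[k] * v[k];        /* pruned blocks are never touched */
            }
            out[r] = acc;
        }
    }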
Processes coarse-grained spectral features at a lower rate than the sample-level excitation.
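The two-rate structure can be sketched as a pair of nested loops (sizes and the stub "networks" below are placeholders, not LPCNet's real API): the expensive conditioning step runs once per frame, while the cheap per-sample step runs for every sample inside it.

    #define FRAME_SIZE  160   /* 10 ms at 16 kHz (assumed) */
    #define NB_FEATURES 20    /* spectral + pitch features per frame (assumed) */
    #define COND_DIM    128   /* conditioning vector width (assumed) */

    /* Stand-in for the frame-rate network: runs once per 10 ms frame. */
    static void frame_network(const float *features, float *cond)
    {
        for (int i = 0; i < COND_DIM; i++)
            cond[i] = features[i % NB_FEATURES];   /* placeholder transform */
    }

    /* Stand-in for the sample-rate network: runs once per output sample. */
    static float sample_network(const float *cond, float prev)
    {
        return 0.5f * prev + 0.001f * cond[0];     /* placeholder recurrence */
    }

    void synthesize(int nframes, const float *features, float *pcm)
    {
        float cond[COND_DIM];
        float prev = 0.0f;
        for (int f = 0; f < nframes; f++) {
            frame_network(&features[f * NB_FEATURES], cond);  /* coarse features, low rate */
            for (int i = 0; i < FRAME_SIZE; i++) {
                prev = sample_network(cond, prev);            /* excitation modeling, sample rate */
                pcm[f * FRAME_SIZE + i] = prev;
            }
        }
    }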
Outputs audio in 8-bit u-law format internally to simplify the probability distribution modeling.
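A standard u-law (μ-law) companding pair, sketched below, shows why this helps: every sample collapses onto one of 256 levels, so the output layer only has to predict a 256-way categorical distribution (generic companding code, not LPCNet's exact quantizer):

    #include <math.h>

    /* Compress a float sample in [-1, 1] to one of 256 mu-law levels (mu = 255). */
    static unsigned char ulaw_encode(float x)
    {
        const float mu = 255.0f;
        float sign = (x < 0.0f) ? -1.0f : 1.0f;
        float mag  = fabsf(x) > 1.0f ? 1.0f : fabsf(x);           /* clip to full scale */
        float y = sign * logf(1.0f + mu * mag) / logf(1.0f + mu); /* companded value in [-1, 1] */
        return (unsigned char)lrintf((y + 1.0f) * 127.5f);        /* map [-1, 1] -> [0, 255] */
    }

    /* Expand an 8-bit mu-law code back to a float sample in [-1, 1]. */
    static float ulaw_decode(unsigned char u)
    {
        const float mu = 255.0f;
        float y = (float)u / 127.5f - 1.0f;
        float sign = (y < 0.0f) ? -1.0f : 1.0f;
        return sign * (powf(1.0f + mu, fabsf(y)) - 1.0f) / mu;
    }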
Adjusts neural processing based on the fundamental frequency of the input speech.
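The fundamental frequency itself is typically obtained with a simple autocorrelation search; the sketch below shows the kind of pitch-period estimate such conditioning relies on (search bounds assume 16 kHz audio and are illustrative, not LPCNet's actual pitch tracker):

    /* Toy normalized-autocorrelation pitch search over roughly 62-400 Hz at 16 kHz. */
    static int pitch_period(const float *x, int len)
    {
        int best_lag = 40;                     /* 400 Hz upper bound */
        float best_corr = -1.0f;
        for (int lag = 40; lag <= 256 && lag < len; lag++) {
            float corr = 0.0f, energy = 1e-9f;
            for (int i = lag; i < len; i++) {
                corr   += x[i] * x[i - lag];
                energy += x[i - lag] * x[i - lag];
            }
            float norm = corr / energy;        /* peaks at the dominant period */
            if (norm > best_corr) {
                best_corr = norm;
                best_lag  = lag;
            }
        }
        return best_lag;                       /* period in samples */
    }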
Predicts missing audio frames using the neural network's stateful memory.
Uses hand-written SIMD intrinsics for Intel AVX2 and ARM NEON architectures.
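As a flavor of what those kernels do, here is a minimal AVX2/FMA dot product (a sketch only; the real kernels also cover quantized weights and the NEON equivalents, and this version assumes n is a multiple of 8):

    #include <immintrin.h>

    /* Dot product using 256-bit fused multiply-add: eight lanes per instruction. */
    static float dot_avx2(const float *a, const float *b, int n)
    {
        __m256 acc = _mm256_setzero_ps();
        for (int i = 0; i < n; i += 8)
            acc = _mm256_fmadd_ps(_mm256_loadu_ps(&a[i]),
                                  _mm256_loadu_ps(&b[i]), acc);

        /* horizontal sum of the eight partial results */
        __m128 lo = _mm256_castps256_ps128(acc);
        __m128 hi = _mm256_extractf128_ps(acc, 1);
        __m128 s  = _mm_add_ps(lo, hi);
        s = _mm_hadd_ps(s, s);
        s = _mm_hadd_ps(s, s);
        return _mm_cvtss_f32(s);
    }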
Maintaining voice clarity over extremely congested networks (sub-3 kbps bandwidth).
High-quality TTS without the latency or privacy concerns of cloud-based APIs.
Removing background noise while maintaining low latency (under 10 ms).