Overview
SoftVC VITS Singing Voice Conversion (so-vits-svc) is an open-source project for AI-based singing voice conversion (SVC). It uses a SoftVC content encoder to extract speech features from the source audio and feeds them directly into a VITS model, with no text-based intermediate representation, so the pitch and intonation of the original performance are preserved. Unlike VITS itself, which targets text-to-speech (TTS), so-vits-svc targets SVC, and it replaces the vocoder with NSF HiFiGAN to reduce sound interruption. The architecture also supports shallow diffusion for improved sound quality, a Whisper-PPG encoder, and static and dynamic sound fusion. Users train their own models on their own datasets and are responsible for making sure those datasets are properly authorized. The project is intended to let developers have characters perform singing tasks, with a focus on fictional characters. It is designed to run offline and does not collect user data.
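
As a rough illustration of static sound fusion, the sketch below assumes that it means combining the parameters of several trained models into one through a weighted average (a convex combination). The helper `mix_static` and the toy parameter dictionaries are hypothetical and are not part of the project's API. Real checkpoints hold tensors rather than plain lists, so this shows only the arithmetic.

```python
def mix_static(models, weights):
    """Blend parameter dicts with a convex combination (illustrative only).

    models  -- list of dicts mapping a parameter name to a list of floats
    weights -- one non-negative number per model; normalized to sum to 1
    """
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    # Every model must expose the same parameter names and sizes.
    keys = set(models[0])
    for m in models[1:]:
        if set(m) != keys:
            raise ValueError("models have different parameter names")

    mixed = {}
    for k in models[0]:
        size = len(models[0][k])
        if any(len(m[k]) != size for m in models):
            raise ValueError(f"parameter {k!r} has mismatched sizes")
        # Element-wise weighted average across all models.
        mixed[k] = [
            sum(w * m[k][i] for w, m in zip(norm, models))
            for i in range(size)
        ]
    return mixed


# Toy usage: blend two hypothetical voices at a 1:3 ratio.
voice_a = {"w": [0.0, 2.0]}
voice_b = {"w": [4.0, 6.0]}
blended = mix_static([voice_a, voice_b], [1, 3])
```

Normalizing the weights keeps every blended parameter within the range spanned by the source models. Dynamic fusion, which varies the mix over time, is not covered by this sketch.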
