Overview
Onsets and Frames is an automatic music transcription (AMT) model developed by the Google Magenta team. It addresses a central weakness of purely frame-based classifiers in polyphonic transcription: by using separate prediction heads for the beginning of each note (onsets) and for the frames during which the note is sustained (frames), and by conditioning the frame predictions on detected onsets, it achieves significantly higher note-level precision than traditional frame-based approaches. This design also mitigates the common error in which a single long note is fragmented into several short ones. The model additionally regresses note velocity, allowing it to capture the expressive dynamics of a performance.

Architecturally, it combines Convolutional Neural Networks (CNNs) for acoustic feature extraction with recurrent networks (LSTMs, or Transformers in newer iterations) for temporal modeling. As of 2026 it remains a standard benchmark for piano transcription. The model is distributed primarily via the Magenta library and TensorFlow, making it a popular choice for developers building DAW plugins, music education platforms, and digital archival tools that require high-accuracy conversion of acoustic audio into editable MIDI data.
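To illustrate how the two heads work together at inference time, the sketch below shows a simplified decoder: a note may only begin where the onset head fires, and it is then sustained for as long as the frame head stays active. This is a minimal illustration of the onset-gated decoding idea, not Magenta's actual implementation; the function and class names (`decode_notes`, `Note`) and the 0.5 threshold are assumptions chosen for clarity.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int          # pitch index (e.g. piano key number)
    onset_frame: int    # frame where the note starts
    offset_frame: int   # frame where the note ends (exclusive)
    velocity: float     # regressed velocity sampled at the onset

def decode_notes(onset_probs, frame_probs, velocities, threshold=0.5):
    """Toy onset-gated decoder.

    onset_probs, frame_probs, velocities: lists of shape
    [num_frames][num_pitches]. A note starts only where BOTH the
    onset and frame heads exceed the threshold, and continues while
    the frame head stays above it. Frame activity without a
    preceding onset is ignored, which is what suppresses spurious
    and fragmented notes.
    """
    num_frames = len(frame_probs)
    num_pitches = len(frame_probs[0]) if num_frames else 0
    notes = []
    active = {}  # pitch -> (start_frame, velocity)
    for t in range(num_frames):
        for p in range(num_pitches):
            frame_on = frame_probs[t][p] >= threshold
            onset_on = onset_probs[t][p] >= threshold
            if p not in active:
                # Gate note starts on the onset head.
                if onset_on and frame_on:
                    active[p] = (t, velocities[t][p])
            elif not frame_on:
                # Frame head went quiet: close the note.
                start, vel = active.pop(p)
                notes.append(Note(p, start, t, vel))
    # Close any notes still sounding at the end of the clip.
    for p, (start, vel) in sorted(active.items()):
        notes.append(Note(p, start, num_frames, vel))
    return notes
```

For example, a single pitch whose frame head is active for four frames with an onset at frame 0 decodes to one note spanning those frames, while the same frame activity with no onset produces no note at all.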
