Overview
Onsets and Frames is an automatic music transcription (AMT) model developed by the Google Magenta team. It addresses a central weakness of purely frame-based classifiers in polyphonic transcription: by using separate prediction heads for the beginning of each note (onsets) and for the frames during which the note is sustained (frames), and by conditioning the frame predictions on detected onsets, it achieves significantly higher note-level precision than traditional frame-based approaches. This design also mitigates the common error in which a single long note is fragmented into several short ones. The model additionally regresses note velocity, allowing it to capture the expressive dynamics of a performance.

Architecturally, it combines Convolutional Neural Networks (CNNs) for acoustic feature extraction with recurrent networks (LSTMs, or Transformers in newer iterations) for temporal modeling. As of 2026 it remains a standard benchmark for piano transcription. The model is distributed primarily via the Magenta library and TensorFlow, making it a popular choice for developers building DAW plugins, music education platforms, and digital archival tools that require high-accuracy conversion of acoustic audio into editable MIDI data.
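To illustrate how the two heads work together at inference time, the sketch below shows a simplified decoder: a note may only begin where the onset head fires, and it is then sustained for as long as the frame head stays active. This is a minimal illustration of the onset-gated decoding idea, not Magenta's actual implementation; the function and class names (`decode_notes`, `Note`) and the 0.5 threshold are assumptions chosen for clarity.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int          # pitch index (e.g. piano key number)
    onset_frame: int    # frame where the note starts
    offset_frame: int   # frame where the note ends (exclusive)
    velocity: float     # regressed velocity sampled at the onset

def decode_notes(onset_probs, frame_probs, velocities, threshold=0.5):
    """Toy onset-gated decoder.

    onset_probs, frame_probs, velocities: lists of shape
    [num_frames][num_pitches]. A note starts only where BOTH the
    onset and frame heads exceed the threshold, and continues while
    the frame head stays above it. Frame activity without a
    preceding onset is ignored, which is what suppresses spurious
    and fragmented notes.
    """
    num_frames = len(frame_probs)
    num_pitches = len(frame_probs[0]) if num_frames else 0
    notes = []
    active = {}  # pitch -> (start_frame, velocity)
    for t in range(num_frames):
        for p in range(num_pitches):
            frame_on = frame_probs[t][p] >= threshold
            onset_on = onset_probs[t][p] >= threshold
            if p not in active:
                # Gate note starts on the onset head.
                if onset_on and frame_on:
                    active[p] = (t, velocities[t][p])
            elif not frame_on:
                # Frame head went quiet: close the note.
                start, vel = active.pop(p)
                notes.append(Note(p, start, t, vel))
    # Close any notes still sounding at the end of the clip.
    for p, (start, vel) in sorted(active.items()):
        notes.append(Note(p, start, num_frames, vel))
    return notes
```

For example, a single pitch whose frame head is active for four frames with an onset at frame 0 decodes to one note spanning those frames, while the same frame activity with no onset produces no note at all.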
