Steering Autoregressive Music Generation with Recursive Feature Machines

Daniel Zhao, Daniel Beaglehole, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack

Audio Samples

Organized by control class

Note Controls

Control: 0.6 | Modification: Note C
Trance EDM song with synths and reverb

Control: 0.5 | Modification: Note G
Dreamy future bass with chopped synth chords.

Control: 0.5 | Modification: Note D
A funky groove with slap bass, wah guitar, tight drums, and playful brass stabs, evoking a 1970s dance floor.

Chord Controls

Control: 0.6 | Modification: Augmented Chord
Trance EDM song with synths and reverb

Control: 0.4 | Modification: Minor Chord
Trance EDM song with synths and reverb

Control: 0.5 | Modification: Augmented Chord
Dreamy future bass with chopped synth chords.

Control: 0.5 | Modification: Major Chord
A funky groove with slap bass, wah guitar, tight drums, and playful brass stabs, evoking a 1970s dance floor.

Interval Controls

Control: 0.7 | Modification: Interval 2
Dreamy future bass with chopped synth chords.

Control: 0.7 | Modification: Interval 6 (tritone)
A cinematic orchestral piece with soaring strings, brass, and percussion, building tension and drama.

Control: 0.7 | Modification: Interval 1
A funky groove with slap bass, wah guitar, tight drums, and playful brass stabs, evoking a 1970s dance floor.

Tempo Controls

Control: -0.2 | Modification: Slow Tempo
A cinematic orchestral piece with soaring strings, brass, and percussion, building tension and drama.

Control: 0.2 | Modification: Fast Tempo
A cinematic orchestral piece with soaring strings, brass, and percussion, building tension and drama.

Multi-Direction Controls

Double Control: 0.5, 0.3 | Modification: Note F# and Fast Tempo
A cinematic orchestral piece with soaring strings, brass, and percussion, building tension and drama.

Double Control: 0.5, -0.3 | Modification: Note F# and Slow Tempo
A cinematic orchestral piece with soaring strings, brass, and percussion, building tension and drama.

Double Control: 0.5, 0.3 | Modification: Minor Chord and Fast Tempo
A slow, melancholic classical piano solo with expressive dynamics and a gentle, flowing melody.

Double Control: 0.5, -0.3 | Modification: Minor Chord and Slow Tempo
A slow, melancholic classical piano solo with expressive dynamics and a gentle, flowing melody.

Double Control: 0.5, 0.5 | Modification: Note D and Minor Chord
Fast beat, hip hop, upbeat that has a positive vibe

Temporal Controls

Double Control: 0.5, 0.5 | Modification: Interval 2 -> 7 over time
A high-energy electronic dance track with pulsating synths, sidechained kick, and a euphoric drop.

Double Control: 0.5, 0.5 | Modification: Note A -> E over time
A fast-paced rock song featuring electric guitars, energetic drums, and a driving bassline, evoking excitement.

Baseline (No Control)

Trance EDM song with synths and reverb

Fast beat, hip hop, upbeat that has a positive vibe

Dreamy future bass with chopped synth chords.

A fast-paced rock song featuring electric guitars, energetic drums, and a driving bassline, evoking excitement.

A slow, melancholic classical piano solo with expressive dynamics and a gentle, flowing melody.

A cinematic orchestral piece with soaring strings, brass, and percussion, building tension and drama.

A funky groove with slap bass, wah guitar, tight drums, and playful brass stabs, evoking a 1970s dance floor.

A high-energy electronic dance track with pulsating synths, sidechained kick, and a euphoric drop.

Abstract

Controllable music generation remains a significant challenge, with existing methods often requiring model retraining or introducing audible artifacts. We introduce MusicRFM, a framework that adapts Recursive Feature Machines (RFMs) to enable fine-grained, interpretable control over frozen, pre-trained music models by directly steering their internal activations.

RFMs analyze a model's internal gradients to produce interpretable "concept directions": specific axes in activation space that correspond to musical attributes such as notes or chords. We first train lightweight RFM probes to discover these directions within MusicGen's hidden states; then, during inference, we inject them back into the model to guide generation in real time, without per-step optimization.
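The two stages described above can be illustrated with a toy sketch. It assumes the concept direction is taken as the leading eigenvector of an average gradient outer product (AGOP), which is the usual RFM construction but is not spelled out in this abstract, and that steering is a plain additive shift of a hidden state. The probe weights `PROBE_W`, the sample activations, and all function names are hypothetical stand-ins; nothing here touches MusicGen itself.

```python
import math

def agop(grad_fn, points):
    """Average gradient outer product: mean over points of g(x) g(x)^T."""
    dim = len(points[0])
    matrix = [[0.0] * dim for _ in range(dim)]
    for x in points:
        g = grad_fn(x)
        for i in range(dim):
            for j in range(dim):
                matrix[i][j] += g[i] * g[j] / len(points)
    return matrix

def top_direction(matrix, iters=50):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix.

    Starts from a uniform vector, so it assumes that vector is not
    orthogonal to the leading eigenvector.
    """
    dim = len(matrix)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def steer(hidden, direction, strength):
    """Additive steering: shift a hidden state along a unit concept direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]

# Hypothetical probe f(x) = tanh(w . x); its gradient is (1 - tanh^2(w . x)) * w.
PROBE_W = [3.0, 0.0, 4.0]

def toy_probe_grad(x):
    z = sum(w * xi for w, xi in zip(PROBE_W, x))
    scale = 1.0 - math.tanh(z) ** 2
    return [scale * w for w in PROBE_W]

# Made-up "hidden states" standing in for activations collected from a model.
activations = [[0.1, 0.2, 0.3], [-0.2, 0.5, 0.0], [0.0, -0.1, 0.1]]

direction = top_direction(agop(toy_probe_grad, activations))
steered = steer([1.0, 1.0, 1.0], direction, strength=0.5)
```

Because every gradient of this toy probe is parallel to `PROBE_W`, the recovered direction is the normalized weight vector, and the `strength` argument plays the role of the "Control" coefficient listed next to each audio sample. In a real setup the shift would be applied inside the model at chosen layers, which this sketch does not attempt.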

We present advanced mechanisms for this control, including dynamic, time-varying schedules and methods for the simultaneous enforcement of multiple musical properties. Our method successfully navigates the trade-off between control and generation quality: we can increase the accuracy of generating a target musical note from 0.23 to 0.82, while text prompt adherence remains within approximately 0.02 of the unsteered baseline, demonstrating effective control with minimal impact on prompt fidelity.
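As a rough illustration of time-varying and multi-direction control, the sketch below sums several scaled directions into one hidden state and crossfades linearly from one direction to another across generation steps, in the spirit of the "Note A -> E over time" samples. The linear schedule, the function names, and the default `peak` value are assumptions made for illustration; the schedules actually used by MusicRFM are not specified in this abstract.

```python
def linear_fade(step, total_steps, start, end):
    """Linearly interpolate a coefficient from `start` to `end` over the steps."""
    if total_steps <= 1:
        return end
    t = step / (total_steps - 1)
    return start + t * (end - start)

def multi_steer(hidden, directions, strengths):
    """Apply several concept directions at once by summing their scaled shifts."""
    out = list(hidden)
    for direction, strength in zip(directions, strengths):
        out = [h + strength * d for h, d in zip(out, direction)]
    return out

def crossfade_steer(hidden, dir_a, dir_b, step, total_steps, peak=0.5):
    """Fade direction A out and direction B in as generation proceeds."""
    coeff_a = linear_fade(step, total_steps, peak, 0.0)
    coeff_b = linear_fade(step, total_steps, 0.0, peak)
    return multi_steer(hidden, [dir_a, dir_b], [coeff_a, coeff_b])

# Hypothetical two-dimensional directions, e.g. "Note A" and "Note E".
note_a = [1.0, 0.0]
note_e = [0.0, 1.0]

start_state = crossfade_steer([0.0, 0.0], note_a, note_e, step=0, total_steps=5)
mid_state = crossfade_steer([0.0, 0.0], note_a, note_e, step=2, total_steps=5)
end_state = crossfade_steer([0.0, 0.0], note_a, note_e, step=4, total_steps=5)
```

With a zero starting state, the first step is shifted only along the first direction, the last step only along the second, and the midpoint equally along both. A fixed pair of coefficients passed to `multi_steer` corresponds to the "Double Control" samples, where two properties are enforced for the whole clip.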
