Can AI Understand Music Theory? An Experiment in LLM-Composed Techno
Testing whether Claude can compose from first principles rather than pattern matching.
chasewughes.com · Jan 2026
There’s a fascinating paradox at the heart of AI-generated music right now.
Tools like Suno and Udio can produce remarkably polished tracks—songs that genuinely sound like music. But they work through sequence prediction: patterns learned from millions of existing songs, reassembled into statistically likely combinations. They don’t understand what a minor seventh chord does emotionally, or why a four-bar phrase followed by a two-bar variation creates tension. They pattern-match.
The other limitation is practical: these systems output finished audio. No stems. No edits. No remixing. If you want to change the bass line, you regenerate everything.
I wanted to test a different approach: What if an LLM composed music from first principles? Could a model that actually understands music theory—scales, harmonic relationships, arrangement conventions—create something genuinely new?
View the full project on GitHub: AbletonComposer
Why Start with Techno?
Techno is an interesting test case for several reasons:
Programmatic instrumentation: Most sounds in techno can be synthesized electronically. You’re not trying to replicate the nuance of a violin or the breath of a vocalist—you’re programming oscillators, filters, and envelopes. This maps naturally to code.
Mathematical patterns: The genre has deep structure. Four-on-the-floor kick patterns. Polymetric relationships between elements. Tension curves that build over 8 or 16 bars. These can be described formally.
Sample-based culture: Techno producers often work with sample libraries rather than recording live instruments. This creates an interesting challenge—how does an AI “choose” samples when it can’t hear them?
Clear quality signals: A bad techno track reveals itself quickly. The kick doesn’t sit right. The energy doesn’t build. Elements clash. There’s less subjectivity to hide behind than in some genres.
The Architecture
The system I built combines several components:
Ableton Live 12 as the Canvas
Ableton’s session view—a grid of clips that can be triggered independently—provides the perfect foundation for programmatic composition. Each clip is a discrete musical idea. Each track is an instrument or element. The arrangement emerges from which clips play when.
I forked an existing Ableton MCP (Model Context Protocol) server and extended it significantly, adding functionality for:
- Creating and populating clips with MIDI data
- Manipulating device parameters (filters, effects, envelopes)
- Managing track routing and sends
- Querying session state
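To give a feel for the shape of the data, here's a minimal sketch of building the MIDI payload for one of those clip calls (the helper and field names are illustrative, not the server's actual tool schema):

```python
def make_pulse(pitch, steps, step_len=0.5, gate=0.5, velocity=100):
    """Build a note list in roughly the shape a clip-population tool expects.

    step_len is in beats; gate is the fraction of each step the note sounds.
    """
    return [
        {"pitch": pitch, "start": i * step_len,
         "duration": step_len * gate, "velocity": velocity}
        for i in range(steps)
    ]

# One bar of eighth-note pulses on G0 (MIDI 31), ready to hand to a
# hypothetical add-notes-to-clip tool.
notes = make_pulse(31, steps=8)
```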
Claude Skills: Encoding Music Knowledge
Here’s where it gets interesting. I used Anthropic’s Claude Skills feature to create domain-specific instruction sets—essentially encoding the knowledge a music producer would have.
Each skill covers a different aspect of production:
- Arrangement: Energy curves, tension/release, section lengths
- Sound Design: Synthesis techniques, effect chains, timbre shaping
- Mixing: Frequency relationships, gain staging, spatial placement
- Groove/Timing: Humanization, swing, polymetric relationships
- Composition: Harmonic progressions, melodic contour, rhythmic variation
These were built from deep research reports on music production theory, distilled into actionable instructions the model could apply.
The Sample Library Problem
One challenge I hadn’t fully anticipated: techno producers typically don’t synthesize every sound from scratch. They curate sample libraries—kicks, hats, textures, effects—that define their sonic palette. But an AI can’t “listen” to a kick drum sample and know if it’s punchy, warm, tight, or boomy.
My solution was building a sample analysis system. It processes audio files and generates JSON profiles capturing:
```json
{
  "file_path": "/samples/001_Stab_Low__FX_Trail__126bpm_Am.wav",
  "duration_seconds": 19.048,
  "bpm": 126.0,
  "bpm_confidence": 0.8,
  "key": "N/A (atonal)",
  "is_tonal": false,
  "spectral_centroid_mean": 297.2,
  "brightness": 0.04,
  "warmth": 1.0,
  "roughness": 1.0,
  "energy_level": 5,
  "texture_tags": ["dark", "warm", "soft-attack", "distorted", "tonal", "dynamic"],
  "category": "bass",
  "subcategory": "dark warm soft"
}
```
This lets Claude “hear” through metadata. When selecting samples for a dark industrial track, it can filter for high roughness, low brightness, and tags like “distorted” and “warm”—maintaining sonic coherence without actually processing audio.
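As a sketch of that filtering step (the thresholds and function name are my own, not the project's API), selecting dark, rough samples from the JSON profiles might look like:

```python
def match_samples(profiles, category, max_brightness=0.2,
                  min_roughness=0.7, required_tags=("dark",)):
    """Return paths of analyzed samples that fit a dark industrial palette."""
    hits = []
    for p in profiles:
        if p["category"] != category:
            continue
        # Reject anything too bright or too clean for the target aesthetic.
        if p["brightness"] > max_brightness or p["roughness"] < min_roughness:
            continue
        # Every required tag must appear in the sample's texture tags.
        if not set(required_tags) <= set(p["texture_tags"]):
            continue
        hits.append(p["file_path"])
    return hits
```

The profile shown above (brightness 0.04, roughness 1.0, tagged "dark") would pass these filters, so it stays in the candidate pool for a dark industrial track.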
The Composition Pipeline
Phase 1: Planning
Claude analyzes the target style—in this case, dark industrial techno in the vein of Ben Klock, Surgeon, and Regis—and creates an arrangement map:
- 8 scenes (intro → build → breakdown → drop → peak → evolve → outro)
- 19+ tracks (kick, bass, hats, percussion, synths, textures, FX, vocals)
- Energy curves for each section
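A minimal sketch of that arrangement map as data (the energy numbers are illustrative, not values from the project):

```python
SCENES = ["intro", "build", "breakdown", "drop", "peak", "evolve", "outro"]

# Target energy per scene on a 0.0-1.0 scale (illustrative values):
# the breakdown dips below the build so the drop lands with contrast.
ENERGY = {
    "intro": 0.2, "build": 0.5, "breakdown": 0.3,
    "drop": 0.9, "peak": 1.0, "evolve": 0.7, "outro": 0.2,
}
```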
Phase 2: Track Setup
The MCP creates Ableton tracks with appropriate instruments and effect chains. For example, the bass track uses:
Operator (FM synth) → Auto Filter → Saturator → Phaser-Flanger → Compressor → EQ Eight
Each device is configured with parameters matching the target aesthetic—heavy saturation, aggressive filtering, controlled dynamics.
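As a sketch (the device names match the chain above, but the parameter names and values here are illustrative), that configuration can be expressed as a list the setup phase walks through:

```python
# Parameter targets for the bass chain; values are illustrative,
# not the project's actual settings.
BASS_CHAIN = [
    ("Saturator",   {"Drive": 18.0}),        # heavy saturation
    ("Auto Filter", {"Frequency": 420.0,
                     "Resonance": 0.3}),     # aggressive filtering
    ("Compressor",  {"Threshold": -18.0,
                     "Ratio": 4.0}),         # controlled dynamics
]
```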
Phase 3: Note Generation
This is where the music theory knowledge manifests. Claude generates Python scripts that create MIDI patterns following specific rules.
The bass line script (generate_bass.py) demonstrates the approach:
- Key: G minor (MIDI 31 = G0 root)
- Pitches: Scale-appropriate notes (G, A, B♭, C, D, E♭, F)
- Pattern evolution: Sparse in first half, dense in second
- Velocity curves: 90-105 early, 105-120 late
- Fills: Ascending chromatic runs every 16 bars
```python
patterns_first_half = [
    [31, 0, 38, 0, 31, 0, 41, 0],    # Sparse, root-focused
    [31, 38, 31, 0, 38, 41, 31, 0],  # Building movement
]
patterns_second_half = [
    [31, 38, 31, 38, 41, 31, 38, 41],  # Full movement
    [38, 31, 38, 31, 38, 41, 38, 31],  # Fifth-focused
]
```
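A sketch of how those rules combine (simplified; the helper below is my own reconstruction, not the actual generate_bass.py). A 0 in a pattern means a rest, and every pitch stays within G minor:

```python
# G natural minor from G0 (MIDI 31): G A Bb C D Eb F
G_MINOR = {31, 33, 34, 36, 38, 39, 41}

def pattern_to_notes(pattern, bar, step_len=0.5):
    """Expand one 8-step pattern into note dicts, ramping velocity
    from the ~90-105 range early in the track toward ~105-120 late."""
    base_vel = 90 if bar < 32 else 105
    notes = []
    for i, pitch in enumerate(pattern):
        if pitch == 0 or pitch not in G_MINOR:  # 0 = rest; stay in key
            continue
        notes.append({
            "pitch": pitch,
            "start": bar * 4.0 + i * step_len,  # beats from track start
            "duration": step_len * 0.9,
            "velocity": min(127, base_vel + i * 2),
        })
    return notes
```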
Phase 4: Performance
This is the crucial innovation. Having clips in Ableton isn’t the same as having a track. A song is defined by transitions—how elements fade in, how energy builds, how textures layer.
The performance script (perform_subterranean_v13.py) is 1,700 lines of Python that “plays” the track in real-time via OSC:
Volume automation: Each instrument has carefully planned dynamics across the arrangement.
```python
# HARDER DROP (v13 innovation)
# Brief silence before impact (maximum contrast)
# DROP_CRASH @ -4dB (was -6dB)
# STAB + PEAK_LEAD both fire at drop
# KICK @ -6dB (was -7dB)
```
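The automation itself reduces to scheduled gain ramps. A minimal sketch (the real script sends these values over OSC against Live's transport; fade_db is my own helper, and the dB figures are illustrative):

```python
def fade_db(start_db, end_db, steps):
    """Linear dB ramp across `steps` scheduled automation points."""
    span = end_db - start_db
    return [start_db + span * i / (steps - 1) for i in range(steps)]

# e.g. ride the kick from near-silence up to -6 dB across 8 points
# leading into the drop.
kick_fade = fade_db(-60.0, -6.0, 8)
```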
Polymetric timing: Different elements cycle on mutually co-prime bar intervals, so their accents drift in and out of alignment, creating hypnotic phase relationships.
```python
POLY = {
    "STAB": 7,
    "RIDE": 5,
    "DING": 11,
    "ARPEGGIO": 13,
    "TEXTURE": 9,
}
```
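Because those cycle lengths are pairwise co-prime, the full pattern only realigns every 5 × 7 × 9 × 11 × 13 = 45,045 bars. A sketch of the retrigger check (my own simplification; the performance script's actual trigger logic may differ):

```python
POLY = {"STAB": 7, "RIDE": 5, "DING": 11, "ARPEGGIO": 13, "TEXTURE": 9}

def fires_on(bar):
    """Elements retrigger whenever the bar count hits their cycle length."""
    return [name for name, cycle in POLY.items() if bar % cycle == 0]
```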
Spectral carving: Dynamic EQ to prevent frequency collisions. When the vocal plays, the texture track ducks in the 300Hz range. When the peak lead fires, competing elements are carved out at 400Hz.
```python
class SpectralCarver:
    """Alpha/Beta frequency collision management."""

    def process_spectral_carving(self, ctl, bar=0):
        if self._peak_lead_active and self._texture_active:
            # Cut TEXTURE 400Hz when PEAK_LEAD is active
            ctl.set_device_param(
                T["DARK_TEXTURE"],
                DEVICE["DARK_TEXTURE_EQ"],
                EQ8["BAND2_GAIN"],
                -5.0,  # dB cut
            )
```
Humanization: Not everything should be perfectly on-grid. The script applies track-specific timing variations:
- Rigid tracks (kick, bass, sub-bass): Zero timing deviation
- Tight tracks (stabs, leads, bells): ±2ms variation
- Loose tracks (hats, percussion): Up to ±12ms slop
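A sketch of that per-track jitter (the table and helper are my own illustration of the idea; the real script applies this when scheduling notes):

```python
import random

# Max timing deviation per track, in milliseconds.
TIMING_SLOP_MS = {"KICK": 0.0, "SUB_BASS": 0.0, "STAB": 2.0,
                  "LEAD": 2.0, "HATS": 12.0, "PERC": 12.0}

def humanize(start_beats, track, bpm=126.0, rng=random):
    """Nudge a note's start by up to the track's slop, converted to beats."""
    slop_ms = TIMING_SLOP_MS.get(track, 0.0)
    offset_beats = rng.uniform(-slop_ms, slop_ms) / 1000.0 * (bpm / 60.0)
    return start_beats + offset_beats
```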
Phase 5: Feedback Loop
This is where Gemini enters the picture. It can analyze audio natively, providing feedback on the generated track. Combined with stem separation libraries, I could evaluate individual instruments and iterate:
- Is the kick cutting through?
- Does the bass line feel static?
- Is the arrangement building energy appropriately?
Results
What works:
- Structural logic: The track follows proper arrangement conventions. Energy builds, releases, builds again. The breakdown creates genuine tension before the drop.
- Technical correctness: Everything is in key. The mix is reasonably balanced. Frequency conflicts are managed.
- Coherence: Elements feel like they belong together—the sample selection and synthesis parameters create a consistent sonic world.
What’s missing:
- Surprise: The model follows music theory correctly, but doesn’t break rules in interesting ways. There’s no unexpected chord change, no moment where you think “I didn’t see that coming.”
- Intuition: Human producers make decisions that can’t be derived from first principles. They know when a track needs “more space” or “something weird” without being able to articulate why.
- Taste: The hardest thing to encode. The model can tell you what’s correct; it can’t tell you what’s cool.
The Bigger Picture
This experiment highlights an interesting tension in AI creativity.
Current generative music AI (Suno, Udio) skips understanding entirely. It’s sophisticated interpolation—pattern-matching without comprehension. The output can be impressive, but it can’t explain its choices or produce editable stems.
This approach inverts that. Claude genuinely understands music theory. It can explain why a minor seventh chord creates tension, why four-bar phrases work, why polymetric relationships create hypnotic effects. But it lacks the intuition to transcend that understanding.
Both approaches hit a wall, just from different directions.
I suspect the breakthrough will come from systems that understand the rules deeply enough to know which ones to break—and when breaking them will feel good rather than just wrong. That requires something we don’t quite have yet: taste as a learnable skill rather than an innate property.
I have no formal music training. Everything in this project was learned through building it—which is maybe the most interesting finding. You can get pretty far by understanding theory even without intuition.