
Every song tells a story through its structure. The way a track transitions from an atmospheric INTRO into the first VERSE, builds tension toward the CHORUS, and eventually resolves in an OUTRO — this is the invisible architecture that shapes our emotional journey as listeners.
Until recently, analyzing song structure required trained ears and manual annotation. Today, AI-powered tools can automatically detect these sections in seconds, opening up new possibilities for music production, education, remixing, and content creation.
Song structure refers to the arrangement of different sections within a piece of music. The most common sections include:
| Section | Purpose |
|---|---|
| INTRO | Sets the mood, introduces musical elements |
| VERSE | Tells the story, builds narrative |
| PRE-CHORUS | Creates tension before the hook |
| CHORUS | The memorable, repeating hook |
| BRIDGE | Provides contrast, breaks repetition |
| OUTRO | Concludes the song, fades out |
Here's what an AI-generated structure analysis looks like in practice:
```json
{
"track_id": "untitled_project",
"bpm": 120,
"sections": [
{ "label": "INTRO", "time_s": 6.71 },
{ "label": "VERSE", "time_s": 27.79 },
{ "label": "CHORUS", "time_s": 49.13 },
{ "label": "VERSE", "time_s": 69.11 },
{ "label": "CHORUS", "time_s": 90.13 },
{ "label": "BRIDGE", "time_s": 110.09 },
{ "label": "CHORUS", "time_s": 151.62 },
{ "label": "BRIDGE", "time_s": 171.19 },
{ "label": "CHORUS", "time_s": 191.48 },
{ "label": "OUTRO", "time_s": 211.65 }
]
}
```
This JSON output provides the track identifier, the estimated tempo in BPM, and an ordered list of sections, each with a label and a start time in seconds.
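If you want to consume this format in your own tooling, loading it is straightforward. The sketch below (a hypothetical helper, not part of any particular analyzer) reads the file and prints a timeline:

```python
import json

def load_structure(path: str) -> dict:
    """Load a structure JSON file and do a minimal sanity check."""
    with open(path) as f:
        data = json.load(f)
    assert isinstance(data.get("sections"), list), "expected a 'sections' list"
    return data

structure = load_structure("structure.json")
print(f"{structure['track_id']} @ {structure['bpm']} BPM")
for section in structure["sections"]:
    print(f"{section['time_s']:>7.2f}s  {section['label']}")
```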
Modern song structure analysis relies on several machine learning techniques:
**Spectral Analysis**
The AI examines the frequency content over time, looking for changes in harmonic patterns, instrument density, and frequency range usage.
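As a quick illustration (not the exact pipeline any particular tool uses), librosa can render this frequency-over-time view as a log-scaled mel spectrogram:

```python
import librosa
import numpy as np

# Load the track and compute a mel spectrogram: rows are frequency bands, columns are time frames
y, sr = librosa.load("track.mp3")
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)  # convert power to decibels for readability
print(S_db.shape)  # (128, n_frames)
```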
**Self-Similarity Matrices**
By comparing every moment of a song to every other moment, the algorithm identifies repeating patterns (choruses), unique sections (bridges), and transitional moments.
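A minimal sketch of that idea, assuming chroma features as the basis for comparison:

```python
import librosa

y, sr = librosa.load("track.mp3")
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

# Compare every frame to every other frame; repeated sections show up as
# diagonal stripes away from the main diagonal
R = librosa.segment.recurrence_matrix(chroma, mode="affinity")
print(R.shape)  # (n_frames, n_frames)
```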
**Feature Extraction**
Key audio features analyzed include chroma features (harmonic content), MFCCs (timbral texture), onset strength (rhythmic patterns), and energy (loudness changes).
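Here's how those four feature families might be computed with librosa (one possible choice of extractors, not the only one):

```python
import librosa

y, sr = librosa.load("track.mp3")

chroma = librosa.feature.chroma_cqt(y=y, sr=sr)       # harmonic content
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # timbral texture
onset_env = librosa.onset.onset_strength(y=y, sr=sr)  # rhythmic patterns
rms = librosa.feature.rms(y=y)                        # loudness / energy changes
```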
**Boundary Detection**
Sudden changes in these features indicate section transitions. The AI marks these boundaries and classifies each segment.
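One simple way to see this in code is to measure how much the feature vector changes from frame to frame and treat the strongest peaks as candidate boundaries. This is an illustrative sketch, not what any specific tool does internally; the distance and prominence thresholds are arbitrary:

```python
import librosa
import numpy as np
from scipy.signal import find_peaks

y, sr = librosa.load("track.mp3")
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Novelty curve: how different is each frame from the one before it?
novelty = np.linalg.norm(np.diff(mfcc, axis=1), axis=0)

# Strong, well-separated peaks are candidate section boundaries
peaks, _ = find_peaks(novelty, distance=200, prominence=np.std(novelty))
print(librosa.frames_to_time(peaks, sr=sr))
```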
Several tools and libraries can perform this analysis:
| Tool | Type | Best For |
|---|---|---|
| librosa | Python library | Research, custom pipelines |
| Essentia | C++/Python | High-performance analysis |
| MSAF | Python library | Academic structure segmentation |
| Spleeter | Source separation | Pre-processing stems |
| allin1 | Python package | End-to-end structure analysis |
For example, here's a minimal librosa snippet that computes a couple of these features and detects section boundaries:

```python
import librosa
import numpy as np
# Load audio
y, sr = librosa.load('track.mp3')
# Compute features for segmentation
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
features = np.vstack([chroma, mfcc])

# Detect segment boundaries on the combined feature matrix
bounds = librosa.segment.agglomerative(features, k=10)
bound_times = librosa.frames_to_time(bounds, sr=sr)
print("Section boundaries (seconds):", bound_times)
```
One of the most practical applications is importing structure data directly into your DAW. Here's a complete Lua script for REAPER that reads our JSON format and creates colored markers on the timeline:
```lua
-- Import Song Structure Markers from JSON
-- Creates colored markers for each section in REAPER
-- Color palette for different section types
local section_colors = {
INTRO = 0x1E88E5, -- Blue
VERSE = 0x43A047, -- Green
CHORUS = 0xE53935, -- Red
BRIDGE = 0xFB8C00, -- Orange
OUTRO = 0x8E24AA, -- Purple
DEFAULT = 0x757575 -- Gray
}
-- Convert RGB to REAPER native color format
local function rgb_to_native(rgb)
local r = (rgb >> 16) & 0xFF
local g = (rgb >> 8) & 0xFF
local b = rgb & 0xFF
return reaper.ColorToNative(r, g, b) | 0x1000000
end
-- Simple JSON parser for our structure format
local function parse_json(json_str)
local data = {sections = {}}
-- Extract BPM
local bpm = json_str:match('"bpm"%s*:%s*(%d+%.?%d*)')
data.bpm = tonumber(bpm)
-- Extract sections
for label, time_s in json_str:gmatch(
'"label"%s*:%s*"([^"]*)"%s*,%s*"time_s"%s*:%s*(%d+%.?%d*)'
) do
table.insert(data.sections, {
label = label,
time_s = tonumber(time_s)
})
end
return data
end
-- Main function
local function main()
-- Open file dialog
local retval, filename = reaper.GetUserFileNameForRead(
"", "Select Structure JSON", "json"
)
if not retval then return end
-- Read JSON file
local file = io.open(filename, "r")
if not file then return end
local json_content = file:read("*all")
file:close()
local data = parse_json(json_content)
-- Set project BPM if available
if data.bpm then
reaper.SetCurrentBPM(0, data.bpm, false)
end
-- Create markers
reaper.Undo_BeginBlock()
for i, section in ipairs(data.sections) do
local color = rgb_to_native(
section_colors[section.label] or section_colors.DEFAULT
)
reaper.AddProjectMarker2(
0, false, section.time_s, 0, section.label, -1, color
)
end
reaper.Undo_EndBlock("Import Structure Markers", -1)
end
main()
```
1. Save the code above as import_structure_markers.lua
2. In REAPER: Actions → Show action list → New action → Load ReaScript...
3. Select your saved .lua file
4. Run the action and choose your JSON structure file
5. Colored markers appear instantly on your timeline
The result gives you a visual map of your song structure directly in your DAW:
| Color | Section |
|---|---|
| 🔵 Blue | INTRO |
| 🟢 Green | VERSE |
| 🔴 Red | CHORUS |
| 🟠 Orange | BRIDGE |
| 🟣 Purple | OUTRO |
The same JSON format can be imported into other DAWs:
| DAW | Method |
|---|---|
| Ableton Live | Python script via Max for Live |
| Logic Pro | AppleScript or JavaScript via Scripter |
| Pro Tools | AAF/XML marker import |
| FL Studio | Python scripting |
| Cubase | Track preset with markers |
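If your DAW isn't listed, one portable route is to convert the JSON into a standard MIDI file whose marker meta-events most DAWs can import. The sketch below assumes the mido package is installed and keeps the tempo math deliberately simple (a single fixed BPM):

```python
import json

import mido

def structure_to_midi(json_path: str, midi_path: str, ppq: int = 480) -> None:
    """Write each section as a MIDI marker meta-event at its start time."""
    with open(json_path) as f:
        data = json.load(f)

    bpm = data.get("bpm", 120)
    mid = mido.MidiFile(ticks_per_beat=ppq)
    track = mido.MidiTrack()
    mid.tracks.append(track)
    track.append(mido.MetaMessage("set_tempo", tempo=mido.bpm2tempo(bpm), time=0))

    last_tick = 0
    for section in data["sections"]:
        # seconds -> beats -> ticks, assuming a constant tempo
        tick = int(section["time_s"] * bpm / 60 * ppq)
        track.append(mido.MetaMessage("marker", text=section["label"], time=tick - last_tick))
        last_tick = tick

    mid.save(midi_path)

structure_to_midi("structure.json", "structure_markers.mid")
```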
Now let's look at the complete pipeline: analyzing an audio file and generating the JSON structure automatically. We'll use the allin1 library, which provides state-of-the-art structure analysis. First, install the dependencies:
```bash
# Core analysis and feature-extraction dependencies
pip install allin1 librosa

# PyTorch (CUDA 11.8 build shown; pick the build that matches your hardware)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
```python
#!/usr/bin/env python3
"""
Song Structure Analyzer
Analyzes audio files and exports structure as JSON for DAW import.
"""
import json
import argparse
from pathlib import Path
import allin1
def analyze_track(audio_path: str, output_path: str = None) -> dict:
"""
Analyze audio file and return structure data.
Args:
audio_path: Path to audio file (mp3, wav, flac, etc.)
output_path: Optional path for JSON output
Returns:
Dictionary with track_id, bpm, and sections
"""
# Run analysis
result = allin1.analyze(audio_path)
# Build structure data
track_id = Path(audio_path).stem
structure = {
"track_id": track_id,
"bpm": round(result.bpm),
"sections": []
}
# Extract sections with timestamps
for segment in result.segments:
structure["sections"].append({
"label": segment.label.upper(),
"time_s": round(segment.start, 2)
})
# Save to JSON if output path provided
if output_path:
with open(output_path, 'w') as f:
json.dump(structure, f, indent=2)
print(f"Structure saved to: {output_path}")
return structure
def main():
parser = argparse.ArgumentParser(
description='Analyze song structure and export to JSON'
)
parser.add_argument('audio', help='Path to audio file')
parser.add_argument(
'-o', '--output',
help='Output JSON path (default: same name as audio)'
)
args = parser.parse_args()
# Default output path
if not args.output:
args.output = Path(args.audio).with_suffix('.json')
# Analyze
result = analyze_track(args.audio, args.output)
# Print summary
print(f"\nTrack: {result['track_id']}")
print(f"BPM: {result['bpm']}")
print(f"Sections found: {len(result['sections'])}")
print("\nStructure:")
for section in result['sections']:
minutes = int(section['time_s'] // 60)
seconds = section['time_s'] % 60
print(f" {minutes}:{seconds:05.2f} - {section['label']}")
if __name__ == '__main__':
    main()
```
```bash
# Basic usage (output defaults to the audio file name with a .json extension)
python analyze_structure.py "Back in Black.mp3"

# Explicit output path
python analyze_structure.py track.wav -o structure.json

# Batch-process every MP3 in the current directory
for f in *.mp3; do python analyze_structure.py "$f"; done
```
Running the script on AC/DC's "Back in Black":
```json
{
"track_id": "MTL_MADEBYIKIGAI__ACDC_BACK_IN_BLACK",
"bpm": 120,
"sections": [
{ "label": "INTRO", "time_s": 6.71 },
{ "label": "VERSE", "time_s": 27.79 },
{ "label": "CHORUS", "time_s": 49.13 },
{ "label": "VERSE", "time_s": 69.11 },
{ "label": "CHORUS", "time_s": 90.13 },
{ "label": "BRIDGE", "time_s": 110.09 },
{ "label": "CHORUS", "time_s": 151.62 },
{ "label": "BRIDGE", "time_s": 171.19 },
{ "label": "CHORUS", "time_s": 191.48 },
{ "label": "OUTRO", "time_s": 211.65 }
]
}
```
If you need more control over the segmentation algorithm:
```python
import librosa
import numpy as np
from pathlib import Path
from sklearn.cluster import KMeans
def analyze_with_librosa(audio_path: str, n_sections: int = 10) -> dict:
"""
Custom structure analysis using librosa.
"""
# Load audio
y, sr = librosa.load(audio_path)
# Estimate BPM
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
# Compute features
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
# Combine features
features = np.vstack([chroma, mfcc])
# Find boundaries using structural segmentation
bounds = librosa.segment.agglomerative(features, k=n_sections)
bound_times = librosa.frames_to_time(bounds, sr=sr)
# Cluster segments to find similar sections
segment_features = []
for i in range(len(bounds) - 1):
start_frame = bounds[i]
end_frame = bounds[i + 1]
segment_feat = np.mean(features[:, start_frame:end_frame], axis=1)
segment_features.append(segment_feat)
# Cluster into section types
if len(segment_features) > 3:
kmeans = KMeans(n_clusters=min(4, len(segment_features)))
labels = kmeans.fit_predict(segment_features)
label_map = {0: 'VERSE', 1: 'CHORUS', 2: 'BRIDGE', 3: 'OTHER'}
else:
labels = range(len(segment_features))
label_map = {i: 'SECTION' for i in labels}
# Build structure
sections = []
# First section is usually INTRO
sections.append({
"label": "INTRO",
"time_s": round(bound_times[0], 2)
})
for i, (time, label) in enumerate(zip(bound_times[1:-1], labels)):
sections.append({
"label": label_map.get(label, 'SECTION'),
"time_s": round(time, 2)
})
# Last section is usually OUTRO
if len(bound_times) > 1:
sections.append({
"label": "OUTRO",
"time_s": round(bound_times[-1], 2)
})
    return {
        "track_id": Path(audio_path).stem,
        "bpm": int(tempo),
        "sections": sections
    }
```
```bash
# 1. Analyze the audio
python analyze_structure.py ~/Music/track.mp3 -o /tmp/structure.json

# 2. Open REAPER and run the import script:
#    the Lua script will read /tmp/structure.json and create markers
```
```python
import subprocess

from analyze_structure import analyze_track  # the analyzer module defined above
def analyze_and_import(audio_path: str, reaper_script: str):
"""
Analyze audio and automatically import to REAPER.
"""
# Generate JSON
json_path = audio_path.replace('.mp3', '_structure.json')
analyze_track(audio_path, json_path)
# Call REAPER with the script
subprocess.run([
'/Applications/REAPER.app/Contents/MacOS/REAPER',
'-nosplash',
f'-script:{reaper_script}',
json_path
    ])
```
As models improve, we can expect structure detection to become faster, more accurate, and more finely grained in the section types it can distinguish.
Automatic song structure analysis transforms how we interact with music. By understanding the architecture of a track — its verses, choruses, bridges, and transitions — we unlock new creative possibilities.
Whether you're a producer seeking arrangement inspiration, an educator teaching music theory, or a developer building the next music app, structure analysis provides a foundation for deeper musical understanding.
At Music Tech Lab, we explore the intersection of music and technology. Have questions about implementing song structure analysis in your project? Get in touch.