
AI Song Structure Analysis: Intro, Verse, Chorus

A technical look at automatic song structure detection using AI and signal processing. Identify song sections with MTL Audio Locators.

When working with music at scale – whether for DJ software, music education, or content creation – one of the most valuable pieces of metadata is the song structure. Where does the chorus start? How long is the intro? When does the bridge appear? Traditionally, this information required manual tagging or expensive commercial solutions. With MTL Audio Locators, we've built an open-source tool that detects song sections automatically using AI and signal processing.

In this article, we'll explore how automatic structure analysis works and how you can use it in your projects.

The Challenge: Why Song Structure Matters

Song structure – the arrangement of sections like INTRO, VERSE, CHORUS, BRIDGE, and OUTRO – is fundamental to how we experience music. For technical applications, this data enables:

| Use Case | Application |
|---|---|
| DJ Software | Auto-sync to chorus, smart mixing |
| Music Education | Visual learning aids, section practice |
| Content Creation | Quick navigation, highlight extraction |
| DAW Integration | Import markers for editing |
| Recommendation Systems | Structure-aware similarity matching |

The challenge is that this information rarely exists in metadata. Even when producers add markers in their DAW, these don't export with the audio file.

Available Analysis Engines

MTL Audio Locators provides three analysis backends, automatically selecting the best available option:

| Engine | Type | Accuracy | Speed | Use Case |
|---|---|---|---|---|
| allin1 | Deep Learning | High | Slower | Production-quality analysis |
| MSAF | Traditional MIR | Medium | Medium | Research, comparison |
| librosa | Spectral | Basic | Fast | Fallback, quick scans |
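The "best available" fallback can be sketched as a simple priority check. The helper below is a hypothetical illustration of that idea, not the project's actual selection code; it takes an availability predicate so the preference order is explicit and testable:

```python
from typing import Callable

# Preference order: best quality first, most portable last
ENGINE_PRIORITY = ["allin1", "msaf", "librosa"]

def select_engine(is_available: Callable[[str], bool]) -> str:
    """Return the first available engine in priority order."""
    for engine in ENGINE_PRIORITY:
        if is_available(engine):
            return engine
    raise RuntimeError("no analysis backend available")

# Example: on a machine where only librosa is importable
print(select_engine(lambda name: name == "librosa"))  # librosa
```

In practice the predicate would attempt an import (e.g. via `importlib.util.find_spec`); passing it in keeps the ordering logic separate from environment probing.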

allin1 – AI-Powered Analysis

The allin1 model represents the state of the art in music structure analysis. It's a deep learning model trained specifically on song segmentation tasks. The analysis pipeline:

  1. Stem separation – Isolates vocals, drums, bass, and other instruments using Demucs
  2. Spectrogram extraction – Converts each stem to time-frequency representation
  3. Structure prediction – Neural network predicts section boundaries and labels

# Using allin1 directly
import allin1

result = allin1.analyze("track.mp3")
print(f"BPM: {result.bpm}")
for segment in result.segments:
    print(f"{segment.start:.2f}s - {segment.label}")

First Run: allin1 downloads ~1.5GB of models on first use (Demucs for demixing + Harmonix for structure).

MSAF – Music Structure Analysis Framework

MSAF provides traditional Music Information Retrieval (MIR) algorithms. We use spectral flux for boundary detection combined with Fourier Magnitude Coefficients for labeling:

import msaf

boundaries, labels = msaf.process(
    "track.mp3",
    boundaries_id='sf',   # Spectral flux
    labels_id='fmc2d'     # 2D Fourier Magnitude Coefficients
)

librosa – Spectral Fallback

When neither allin1 nor MSAF is available, we use a custom algorithm based on librosa that combines multiple features:

| Feature | Purpose |
|---|---|
| Chroma CQT | Harmonic content (chord changes) |
| MFCC | Timbral characteristics (texture changes) |
| Spectral Contrast | Brightness/darkness patterns |

The algorithm:

  1. Extracts all features and stacks them
  2. Uses agglomerative clustering to find segment boundaries
  3. Snaps boundaries to nearest beats for musical alignment
  4. Labels sections using position heuristics and K-means clustering
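Step 3, beat snapping, is easy to illustrate in isolation. The sketch below is a stand-alone version of the idea, not the project's actual implementation: given candidate boundary times and detected beat times, each boundary moves to its nearest beat.

```python
import bisect

def snap_to_beats(boundaries: list[float], beats: list[float]) -> list[float]:
    """Move each boundary time to the nearest beat time (beats must be sorted)."""
    snapped = []
    for t in boundaries:
        i = bisect.bisect_left(beats, t)
        # Candidates: the beat at/after t and the beat just before it
        candidates = beats[max(i - 1, 0):i + 1] or [beats[-1]]
        snapped.append(min(candidates, key=lambda b: abs(b - t)))
    return snapped

beats = [0.0, 0.5, 1.0, 1.5, 2.0]
print(snap_to_beats([0.62, 1.9], beats))  # [0.5, 2.0]
```

The real algorithm works with beat times detected from the audio; here they are hard-coded for illustration.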

Section Labeling Convention

All backends output sections with a consistent naming convention designed for DAW compatibility:

| Section | Prefix | Color | Example |
|---|---|---|---|
| Intro | INT | #808080 | INT1 |
| Verse | VRS | #4CAF50 | VRS1, VRS2 |
| Pre-Chorus | PRE | #8BC34A | PRE1 |
| Chorus | CHO | #FF5722 | CHO1, CHO2 |
| Bridge | BRG | #00BCD4 | BRG1 |
| Breakdown | BRD | #2196F3 | BRD1 |
| Build | BUI | #9C27B0 | BUI1 |
| Outro | OUT | #607D8B | OUT1 |

The numbered suffix (VRS1, VRS2) indicates occurrence order – useful for identifying repeating sections.
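Applying this convention is a small bookkeeping exercise: count each section type as it appears and append the occurrence index. A minimal sketch (the helper name and the partial prefix map are ours, not part of the library):

```python
from collections import Counter

# Subset of the prefix convention from the table above
PREFIXES = {"intro": "INT", "verse": "VRS", "chorus": "CHO",
            "bridge": "BRG", "outro": "OUT"}

def number_sections(section_types: list[str]) -> list[str]:
    """Turn raw section types into numbered labels like VRS1, VRS2."""
    counts = Counter()
    labels = []
    for kind in section_types:
        prefix = PREFIXES[kind]
        counts[prefix] += 1
        labels.append(f"{prefix}{counts[prefix]}")
    return labels

print(number_sections(["intro", "verse", "chorus", "verse", "chorus"]))
# ['INT1', 'VRS1', 'CHO1', 'VRS2', 'CHO2']
```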

Output Format

Analysis results follow a JSON schema designed for easy integration:

{
  "track_id": "my_song",
  "bpm": 128,
  "sections": [
    {"label": "INT1", "time_s": 0.0, "color": "#808080"},
    {"label": "VRS1", "time_s": 15.5, "color": "#4CAF50"},
    {"label": "CHO1", "time_s": 45.2, "color": "#FF5722"},
    {"label": "VRS2", "time_s": 75.0, "color": "#4CAF50"},
    {"label": "CHO2", "time_s": 105.3, "color": "#FF5722"},
    {"label": "OUT1", "time_s": 135.8, "color": "#607D8B"}
  ]
}
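Because section entries carry only start times, each section's duration falls out of the next section's start (and the track end for the last one). A stdlib-only sketch using a shortened version of the schema above, with an assumed total duration:

```python
import json

doc = json.loads("""{
  "track_id": "my_song",
  "bpm": 128,
  "sections": [
    {"label": "INT1", "time_s": 0.0},
    {"label": "VRS1", "time_s": 15.5},
    {"label": "CHO1", "time_s": 45.2}
  ]
}""")

track_end = 75.0  # assumed total duration in seconds (not part of the schema shown)
sections = doc["sections"]
# Pair each section with its successor; a sentinel marks the track end
for cur, nxt in zip(sections, sections[1:] + [{"time_s": track_end}]):
    print(f"{cur['label']}: {nxt['time_s'] - cur['time_s']:.1f}s")
# INT1: 15.5s
# VRS1: 29.7s
# CHO1: 29.8s
```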

Using MTL Audio Locators

Installation

git clone https://github.com/musictechlab/mtl-audiolocators
cd mtl-audiolocators
poetry install

CLI Usage

poetry run analyze track.mp3
poetry run analyze track.mp3 -o structure.json

Sample CLI output:

Analyzing: track.mp3
Using allin1 (AI-based analysis)

========================================
Track: track
BPM: 128
Sections: 8
========================================

  0:00.00  INT1
  0:15.50  VRS1
  0:45.20  CHO1
  1:15.00  VRS2
  1:45.30  CHO2
  2:15.80  BRG1
  2:45.00  CHO3
  3:15.80  OUT1

-> Import to REAPER: Run import_structure_markers.lua
   and select: track_structure.json

REST API

For integration into web applications, MTL Audio Locators includes a FastAPI server:

poetry run serve

Available endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /analyze | POST | Full structure analysis |
| /analyze/features | POST | Audio features only (BPM, energy) |
| /waveform | POST | Waveform data for visualization |
| /engines | GET | List available backends |
| /logs/stream | GET | SSE stream for progress updates |
| /health | GET | Service health check |

Example API call:

curl -X POST "http://localhost:8000/analyze" \
  -F "file=@track.mp3"

Response:

{
  "track_id": "track",
  "bpm": 128,
  "duration_s": 210.5,
  "sections": [...],
  "analysis_method": "allin1"
}

Python API

from audiolocators import analyze_track, extract_audio_features

# Full structure analysis
result = analyze_track("track.mp3")
print(f"BPM: {result['bpm']}")
for section in result['sections']:
    print(f"{section['time_s']}s - {section['label']}")

# Features only (faster)
features = extract_audio_features("track.mp3")
print(f"Duration: {features['duration_s']}s")
print(f"Energy segments: {len(features['energy_segments'])}")

Real-Time Progress Streaming

For long-running analyses, the API provides Server-Sent Events (SSE) for real-time progress updates:

const eventSource = new EventSource('http://localhost:8000/logs/stream');

eventSource.onmessage = (event) => {
  const log = JSON.parse(event.data);
  console.log(`[${log.source}] ${log.message}`);
  // Example: [allin1] Separating stems (vocals, drums, bass, other)...
};
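The same stream can be consumed from Python. SSE is a plain-text protocol: each event's payload arrives on a `data:` line, and events are separated by blank lines. A minimal stdlib parser, shown as a sketch of the protocol rather than the project's client code:

```python
import json

def parse_sse(stream_text: str) -> list[dict]:
    """Extract JSON payloads from the `data:` lines of an SSE stream."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Log fields mirror the JS example above (source, message)
raw = ('data: {"source": "allin1", "message": "Demixing 40%"}\n\n'
       'data: {"source": "allin1", "message": "Analyzing"}\n\n')
print([e["message"] for e in parse_sse(raw)])  # ['Demixing 40%', 'Analyzing']
```

A real client would read the HTTP response incrementally rather than splitting a complete string, but the line format is the same.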

Progress stages for allin1 analysis:

  1. Starting – Pipeline initialization
  2. Demixing – Stem separation (0-100%)
  3. Spectrograms – Feature extraction
  4. Analyzing – Neural network inference
  5. Complete – Results ready

Practical Applications

Integration with WaveSurfer.js

Combine structure data with waveform visualization:

import WaveSurfer from 'wavesurfer.js';
import RegionsPlugin from 'wavesurfer.js/dist/plugins/regions.js';

const regions = RegionsPlugin.create();
const wavesurfer = WaveSurfer.create({
  container: '#waveform',
  plugins: [regions]
});

// After analysis (v7 API: regions are added via the plugin instance)
structure.sections.forEach((section, i) => {
  const nextSection = structure.sections[i + 1];
  regions.addRegion({
    start: section.time_s,
    end: nextSection ? nextSection.time_s : wavesurfer.getDuration(),
    color: section.color + '40', // Append alpha hex for transparency
    content: section.label
  });
});

DAW Import (REAPER)

The JSON format is designed for easy DAW import. A companion Lua script for REAPER can read the output and create project markers:

-- In REAPER: Actions > Run ReaScript
-- Select: import_structure_markers.lua

Combined with MTL Ableton Analyser

Use both tools together for comprehensive music analysis:

  1. MTL Audio Locators – Analyze raw audio files
  2. MTL Ableton Analyser – Extract data from Ableton projects
  3. Compare – Validate AI detection against manual markers

Accuracy Considerations

Structure detection accuracy depends on several factors:

| Factor | Impact | Mitigation |
|---|---|---|
| Genre | Pop/EDM highest accuracy | Use genre-appropriate backend |
| Mixing | Clear separations help | allin1 handles complex mixes well |
| Length | Very long tracks may segment oddly | Consider splitting before analysis |
| Tempo changes | Can confuse boundary detection | Manual review recommended |

For production use, we recommend:

  • allin1 for final output (highest accuracy)
  • librosa for quick previews or batch processing
  • Manual review for critical applications

Platform-Specific Installation Notes

The allin1 Dependency Challenge

The allin1 engine provides the highest quality analysis, but it has a complex dependency chain that currently causes compatibility issues across platforms:

allin1 1.1.0
├── natten (Neighborhood Attention)
│   └── API changed in 0.21.x, allin1 expects older API
├── madmom (Music Beat Detection)
│   └── Uses collections.MutableSequence, removed in Python 3.10+
└── demucs (stem separation)
    └── Works fine

| Dependency | Problem | Impact |
|---|---|---|
| natten 0.21.x | New API incompatible with allin1 | Import fails on all platforms |
| madmom 0.16.1 | Uses deprecated Python API | Breaks on Python 3.10+ |
| shi-labs.com wheels | SSL certificate expired | Can't download pre-built binaries |

Current recommendation: Use the librosa fallback, which works reliably across all platforms. The allin1 ecosystem requires updates from upstream maintainers.

# What you'll see when running analysis:
$ poetry run analyze track.mp3
Using librosa enhanced analysis  # allin1 not available, fallback works fine

Running with Docker

We provide a Dockerfile for containerized deployment:

FROM python:3.10-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg libsndfile1 git cmake build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip Cython numpy poetry

COPY pyproject.toml poetry.lock README.md ./
RUN poetry config virtualenvs.create false \
    && poetry install --no-interaction --no-ansi --no-root

COPY src/ ./src/
RUN poetry install --no-interaction --no-ansi --only-root

EXPOSE 8000
CMD ["python", "-m", "uvicorn", "audiolocators.server:app", "--host", "0.0.0.0", "--port", "8000"]

# Build the image
docker build -t mtl-audiolocators .

# Run analysis (uses librosa)
docker run -v $(pwd)/tracks:/app/tracks mtl-audiolocators \
    python -c "from audiolocators import analyze_track; print(analyze_track('/app/tracks/song.mp3'))"

# Start the API server
docker run -p 8000:8000 mtl-audiolocators
curl http://localhost:8000/health
# {"status":"ok","allin1_available":false}

Version Compatibility Matrix

| Python | madmom | natten | allin1 | Status |
|---|---|---|---|---|
| 3.9 | 0.16.1 | 0.17.x | 1.1.0 | Best chance for allin1 |
| 3.10+ | 0.16.1 | any | 1.1.0 | madmom fails (deprecated API) |
| 3.11 | – | 0.21.x | 1.1.0 | natten API mismatch |
| any | – | – | – | librosa fallback always works |

Contributing: If you get allin1 working reliably, please share your setup! Open a PR or issue on GitHub.

Limitations

Current limitations to be aware of:

  • allin1 availability – Due to dependency conflicts (natten API changes, madmom Python 3.10+ incompatibility), allin1 currently doesn't work on most setups. Use librosa fallback instead.
  • Label accuracy – Section types (verse vs. chorus) are heuristic-based; boundaries are more reliable than labels
  • Genre dependency – Works best on pop/EDM with clear structure; experimental/ambient music may produce inconsistent results
  • Processing time – If allin1 were working, it would take 2-5x real-time on CPU
  • Model downloads – allin1 requires ~1.5GB model download on first run

Alternative MIR Tools

If you're exploring music structure analysis beyond MTL Audio Locators, here are other tools worth considering:

Open Source Libraries for Structure Analysis

| Tool | Focus | Structure Detection | Notes |
|---|---|---|---|
| allin1 | Full structure | Yes (verse, chorus, bridge) | Best quality, but dependency issues |
| MSAF | Segmentation | Abstract labels (A, B, C) | Academic, multiple algorithms |
| Essentia | Comprehensive MIR | Basic segmentation | MTG Barcelona, well maintained |
| librosa | Audio analysis | Heuristic-based | Always works, lower accuracy |

Beat/Tempo Detection (No Structure Labels)

| Tool | Focus | Notes |
|---|---|---|
| madmom | Beat/onset detection | Strong beat tracking, Python 3.9 only |
| Aubio | Lightweight MIR | Fast, C library with Python bindings |
| BeatNet | Beat/downbeat | Real-time, deep learning |

Commercial APIs (No Structure Detection)

Important: No major commercial API currently offers song structure detection (verse/chorus/bridge). The services below provide other audio analysis features:
| Service | Features | What It Does NOT Do |
|---|---|---|
| Cyanite.ai | Mood, genre, instruments (15s segments) | No verse/chorus labels |
| Musiio (Beatport) | Genre, mood, BPM | No structure detection |
| ACRCloud | Audio fingerprinting, recognition | No structure detection |

Spotify's Audio Analysis API was the only major commercial option for structure detection (sections, segments, tempo, key). It was deprecated in November 2024 and is no longer available.

Why No Commercial Structure APIs?

Song structure detection is a niche problem with limited commercial demand:

  • Music streaming – services use internal solutions (not exposed via API)
  • DJ software – built-in proprietary algorithms
  • Academic – research papers, not production services

This leaves open source as the primary option for structure analysis.

The Open Source MIR Maintenance Problem

Many MIR libraries suffer from a common pattern: written once, then abandoned. Here's why:

| Factor | Impact |
|---|---|
| Academic origins | Most MIR tools (madmom, MSAF, allin1) come from PhD research. When the researcher graduates or changes projects, development stops. |
| Publication vs. maintenance | In academia, publishing a paper = success. Maintaining software ≠ academic credit. |
| Niche community | Small user base means few contributors and little pressure to fix bugs. |
| Rapid Python/ML changes | Python 3.10 removed collections.MutableSequence; PyTorch changes APIs constantly. Keeping up requires active maintenance. |
| No commercial sponsorship | Unlike web frameworks (React/Meta, Angular/Google), MIR tools have no corporate backing. |

Real examples:

| Project | Last PyPI Release | Issue |
|---|---|---|
| madmom | 2019 | Uses deprecated Python APIs |
| MSAF | 2020 | Minimal maintenance |
| allin1 | 2024 | Active, but blocked by natten/madmom issues |

This is why MTL Audio Locators defaults to librosa – it's the only dependency that's actively maintained and works across all Python versions.

Source Separation (Preprocessing)

| Model | Source | Notes |
|---|---|---|
| Demucs (Meta) | GitHub | Best quality, used by allin1 |
| Spleeter (Deezer) | GitHub | Faster, good for batch processing |


Open Source: MTL Audio Locators is available on GitHub. Contributions welcome!

Interested in music analysis, AI-powered audio tools, or DAW integration? Get in touch – we're always happy to discuss music technology.

Let's Build Something Together

Have a similar project in mind? We'd love to hear about it.

Get in touch to discuss how we can help bring your vision to life.