
AI Song Structure Analysis: Intro, Verse, Chorus

A technical look at automatic song structure detection using AI and signal processing. Identify song sections with MTL Audio Locators.

When working with music at scale – whether for DJ software, music education, or content creation – one of the most valuable pieces of metadata is the song structure. Where does the chorus start? How long is the intro? When does the bridge appear? Traditionally, this information required manual tagging or expensive commercial solutions. With MTL Audio Locators, we've built an open-source tool that detects song sections automatically using AI and signal processing.

In this article, we'll explore how automatic structure analysis works and how you can use it in your projects.

The Challenge: Why Song Structure Matters

Song structure – the arrangement of sections like INTRO, VERSE, CHORUS, BRIDGE, and OUTRO – is fundamental to how we experience music. For technical applications, this data enables:

| Use Case | Application |
|---|---|
| DJ Software | Auto-sync to chorus, smart mixing |
| Music Education | Visual learning aids, section practice |
| Content Creation | Quick navigation, highlight extraction |
| DAW Integration | Import markers for editing |
| Recommendation Systems | Structure-aware similarity matching |

The challenge is that this information rarely exists in metadata. Even when producers add markers in their DAW, these don't export with the audio file.

Available Analysis Engines

MTL Audio Locators provides three analysis backends, automatically selecting the best available option:

| Engine | Type | Accuracy | Speed | Use Case |
|---|---|---|---|---|
| allin1 | Deep Learning | High | Slower | Production-quality analysis |
| MSAF | Traditional MIR | Medium | Medium | Research, comparison |
| librosa | Spectral | Basic | Fast | Fallback, quick scans |
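The "best available" fallback can be sketched as a simple priority check. The helper below is a hypothetical illustration of that idea, not the project's actual selection code; it takes an availability predicate so the preference order is explicit and testable:

```python
from typing import Callable

# Preference order: best quality first, most portable last
ENGINE_PRIORITY = ["allin1", "msaf", "librosa"]

def select_engine(is_available: Callable[[str], bool]) -> str:
    """Return the first available engine in priority order."""
    for engine in ENGINE_PRIORITY:
        if is_available(engine):
            return engine
    raise RuntimeError("no analysis backend available")

# Example: on a machine where only librosa is importable
print(select_engine(lambda name: name == "librosa"))  # librosa
```

In practice the predicate would attempt an import (e.g. via `importlib.util.find_spec`); passing it in keeps the ordering logic separate from environment probing.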

allin1 – AI-Powered Analysis

The allin1 model represents the state of the art in music structure analysis. It's a deep learning model trained specifically on song segmentation tasks. The analysis pipeline:

  1. Stem separation – Isolates vocals, drums, bass, and other instruments using Demucs
  2. Spectrogram extraction – Converts each stem to time-frequency representation
  3. Structure prediction – Neural network predicts section boundaries and labels

# Using allin1 directly
import allin1

result = allin1.analyze("track.mp3")
print(f"BPM: {result.bpm}")
for segment in result.segments:
    print(f"{segment.start:.2f}s - {segment.label}")

First Run: allin1 downloads ~1.5GB of models on first use (Demucs for demixing + Harmonix for structure).

MSAF – Music Structure Analysis Framework

MSAF provides traditional Music Information Retrieval (MIR) algorithms. We use spectral flux for boundary detection combined with Fourier Magnitude Coefficients for labeling:

import msaf

boundaries, labels = msaf.process(
    "track.mp3",
    boundaries_id='sf',   # Spectral flux
    labels_id='fmc2d'     # 2D Fourier Magnitude Coefficients
)

librosa – Spectral Fallback

When neither allin1 nor MSAF is available, we use a custom algorithm based on librosa that combines multiple features:

| Feature | Purpose |
|---|---|
| Chroma CQT | Harmonic content (chord changes) |
| MFCC | Timbral characteristics (texture changes) |
| Spectral Contrast | Brightness/darkness patterns |

The algorithm:

  1. Extracts all features and stacks them
  2. Uses agglomerative clustering to find segment boundaries
  3. Snaps boundaries to nearest beats for musical alignment
  4. Labels sections using position heuristics and K-means clustering
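Step 3, beat snapping, is easy to illustrate in isolation. The sketch below is a stand-alone version of the idea, not the project's actual implementation: given candidate boundary times and detected beat times, each boundary moves to its nearest beat.

```python
import bisect

def snap_to_beats(boundaries: list[float], beats: list[float]) -> list[float]:
    """Move each boundary time to the nearest beat time (beats must be sorted)."""
    snapped = []
    for t in boundaries:
        i = bisect.bisect_left(beats, t)
        # Candidates: the beat at/after t and the beat just before it
        candidates = beats[max(i - 1, 0):i + 1] or [beats[-1]]
        snapped.append(min(candidates, key=lambda b: abs(b - t)))
    return snapped

beats = [0.0, 0.5, 1.0, 1.5, 2.0]
print(snap_to_beats([0.62, 1.9], beats))  # [0.5, 2.0]
```

The real algorithm works with beat times detected from the audio; here they are hard-coded for illustration.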

Section Labeling Convention

All backends output sections with a consistent naming convention designed for DAW compatibility:

| Section | Prefix | Color | Example |
|---|---|---|---|
| Intro | INT | #808080 | INT1 |
| Verse | VRS | #4CAF50 | VRS1, VRS2 |
| Pre-Chorus | PRE | #8BC34A | PRE1 |
| Chorus | CHO | #FF5722 | CHO1, CHO2 |
| Bridge | BRG | #00BCD4 | BRG1 |
| Breakdown | BRD | #2196F3 | BRD1 |
| Build | BUI | #9C27B0 | BUI1 |
| Outro | OUT | #607D8B | OUT1 |

The numbered suffix (VRS1, VRS2) indicates occurrence order – useful for identifying repeating sections.
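Applying this convention is a small bookkeeping exercise: count each section type as it appears and append the occurrence index. A minimal sketch (the helper name and the partial prefix map are ours, not part of the library):

```python
from collections import Counter

# Subset of the prefix convention from the table above
PREFIXES = {"intro": "INT", "verse": "VRS", "chorus": "CHO",
            "bridge": "BRG", "outro": "OUT"}

def number_sections(section_types: list[str]) -> list[str]:
    """Turn raw section types into numbered labels like VRS1, VRS2."""
    counts = Counter()
    labels = []
    for kind in section_types:
        prefix = PREFIXES[kind]
        counts[prefix] += 1
        labels.append(f"{prefix}{counts[prefix]}")
    return labels

print(number_sections(["intro", "verse", "chorus", "verse", "chorus"]))
# ['INT1', 'VRS1', 'CHO1', 'VRS2', 'CHO2']
```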

Output Format

Analysis results follow a JSON schema designed for easy integration:

{
  "track_id": "my_song",
  "bpm": 128,
  "sections": [
    {"label": "INT1", "time_s": 0.0, "color": "#808080"},
    {"label": "VRS1", "time_s": 15.5, "color": "#4CAF50"},
    {"label": "CHO1", "time_s": 45.2, "color": "#FF5722"},
    {"label": "VRS2", "time_s": 75.0, "color": "#4CAF50"},
    {"label": "CHO2", "time_s": 105.3, "color": "#FF5722"},
    {"label": "OUT1", "time_s": 135.8, "color": "#607D8B"}
  ]
}
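Because section entries carry only start times, each section's duration falls out of the next section's start (and the track end for the last one). A stdlib-only sketch using a shortened version of the schema above, with an assumed total duration:

```python
import json

doc = json.loads("""{
  "track_id": "my_song",
  "bpm": 128,
  "sections": [
    {"label": "INT1", "time_s": 0.0},
    {"label": "VRS1", "time_s": 15.5},
    {"label": "CHO1", "time_s": 45.2}
  ]
}""")

track_end = 75.0  # assumed total duration in seconds (not part of the schema shown)
sections = doc["sections"]
# Pair each section with its successor; a sentinel marks the track end
for cur, nxt in zip(sections, sections[1:] + [{"time_s": track_end}]):
    print(f"{cur['label']}: {nxt['time_s'] - cur['time_s']:.1f}s")
# INT1: 15.5s
# VRS1: 29.7s
# CHO1: 29.8s
```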

Using MTL Audio Locators

Installation

git clone https://github.com/musictechlab/mtl-audiolocators
cd mtl-audiolocators
poetry install

CLI Usage

poetry run analyze track.mp3
poetry run analyze track.mp3 -o structure.json

Sample CLI output:

Analyzing: track.mp3
Using allin1 (AI-based analysis)

========================================
Track: track
BPM: 128
Sections: 8
========================================

  0:00.00  INT1
  0:15.50  VRS1
  0:45.20  CHO1
  1:15.00  VRS2
  1:45.30  CHO2
  2:15.80  BRG1
  2:45.00  CHO3
  3:15.80  OUT1

-> Import to REAPER: Run import_structure_markers.lua
   and select: track_structure.json

REST API

For integration into web applications, MTL Audio Locators includes a FastAPI server:

poetry run serve

Available endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /analyze | POST | Full structure analysis |
| /analyze/features | POST | Audio features only (BPM, energy) |
| /waveform | POST | Waveform data for visualization |
| /engines | GET | List available backends |
| /logs/stream | GET | SSE stream for progress updates |
| /health | GET | Service health check |

Example API call:

curl -X POST "http://localhost:8000/analyze" \
  -F "file=@track.mp3"

Response:

{
  "track_id": "track",
  "bpm": 128,
  "duration_s": 210.5,
  "sections": [...],
  "analysis_method": "allin1"
}

Python API

from audiolocators import analyze_track, extract_audio_features

# Full structure analysis
result = analyze_track("track.mp3")
print(f"BPM: {result['bpm']}")
for section in result['sections']:
    print(f"{section['time_s']}s - {section['label']}")

# Features only (faster)
features = extract_audio_features("track.mp3")
print(f"Duration: {features['duration_s']}s")
print(f"Energy segments: {len(features['energy_segments'])}")

Real-Time Progress Streaming

For long-running analyses, the API provides Server-Sent Events (SSE) for real-time progress updates:

const eventSource = new EventSource('http://localhost:8000/logs/stream');

eventSource.onmessage = (event) => {
  const log = JSON.parse(event.data);
  console.log(`[${log.source}] ${log.message}`);
  // Example: [allin1] Separating stems (vocals, drums, bass, other)...
};
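The same stream can be consumed from Python. SSE is a plain-text protocol: each event's payload arrives on a `data:` line, and events are separated by blank lines. A minimal stdlib parser, shown as a sketch of the protocol rather than the project's client code:

```python
import json

def parse_sse(stream_text: str) -> list[dict]:
    """Extract JSON payloads from the `data:` lines of an SSE stream."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Log fields mirror the JS example above (source, message)
raw = ('data: {"source": "allin1", "message": "Demixing 40%"}\n\n'
       'data: {"source": "allin1", "message": "Analyzing"}\n\n')
print([e["message"] for e in parse_sse(raw)])  # ['Demixing 40%', 'Analyzing']
```

A real client would read the HTTP response incrementally rather than splitting a complete string, but the line format is the same.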

Progress stages for allin1 analysis:

  1. Starting – Pipeline initialization
  2. Demixing – Stem separation (0-100%)
  3. Spectrograms – Feature extraction
  4. Analyzing – Neural network inference
  5. Complete – Results ready

Practical Applications

Integration with WaveSurfer.js

Combine structure data with waveform visualization:

import WaveSurfer from 'wavesurfer.js';
import RegionsPlugin from 'wavesurfer.js/dist/plugins/regions.js';

const regions = RegionsPlugin.create();
const wavesurfer = WaveSurfer.create({
  container: '#waveform',
  plugins: [regions]
});

// After analysis (v7 API: regions are added via the plugin instance)
structure.sections.forEach((section, i) => {
  const nextSection = structure.sections[i + 1];
  regions.addRegion({
    start: section.time_s,
    end: nextSection ? nextSection.time_s : wavesurfer.getDuration(),
    color: section.color + '40', // Append alpha hex for transparency
    content: section.label
  });
});

DAW Import (REAPER)

The JSON format is designed for easy DAW import. A companion Lua script for REAPER can read the output and create project markers:

-- In REAPER: Actions > Run ReaScript
-- Select: import_structure_markers.lua

Combined with MTL Ableton Analyser

Use both tools together for comprehensive music analysis:

  1. MTL Audio Locators – Analyze raw audio files
  2. MTL Ableton Analyser – Extract data from Ableton projects
  3. Compare – Validate AI detection against manual markers

Accuracy Considerations

Structure detection accuracy depends on several factors:

| Factor | Impact | Mitigation |
|---|---|---|
| Genre | Pop/EDM highest accuracy | Use genre-appropriate backend |
| Mixing | Clear separations help | allin1 handles complex mixes well |
| Length | Very long tracks may segment oddly | Consider splitting before analysis |
| Tempo changes | Can confuse boundary detection | Manual review recommended |

For production use, we recommend:

  • allin1 for final output (highest accuracy)
  • librosa for quick previews or batch processing
  • Manual review for critical applications

Platform-Specific Installation Notes

The allin1 Dependency Challenge

The allin1 engine provides the highest quality analysis, but it has a complex dependency chain that currently causes compatibility issues across platforms:

allin1 1.1.0
├── natten (Neighborhood Attention)
│   └── API changed in 0.21.x, allin1 expects older API
├── madmom (Music Beat Detection)
│   └── Uses collections.MutableSequence, removed in Python 3.10+
└── demucs (stem separation)
    └── Works fine

| Dependency | Problem | Impact |
|---|---|---|
| natten 0.21.x | New API incompatible with allin1 | Import fails on all platforms |
| madmom 0.16.1 | Uses deprecated Python API | Breaks on Python 3.10+ |
| shi-labs.com wheels | SSL certificate expired | Can't download pre-built binaries |

Current recommendation: Use the librosa fallback, which works reliably across all platforms. The allin1 ecosystem requires updates from upstream maintainers.

# What you'll see when running analysis:
$ poetry run analyze track.mp3
Using librosa enhanced analysis  # allin1 not available, fallback works fine

Running with Docker

We provide a Dockerfile for containerized deployment:

FROM python:3.10-slim

WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg libsndfile1 git cmake build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip Cython numpy poetry

COPY pyproject.toml poetry.lock README.md ./
RUN poetry config virtualenvs.create false \
    && poetry install --no-interaction --no-ansi --no-root

COPY src/ ./src/
RUN poetry install --no-interaction --no-ansi --only-root

EXPOSE 8000
CMD ["python", "-m", "uvicorn", "audiolocators.server:app", "--host", "0.0.0.0", "--port", "8000"]

# Build the image
docker build -t mtl-audiolocators .

# Run analysis (uses librosa)
docker run -v $(pwd)/tracks:/app/tracks mtl-audiolocators \
    python -c "from audiolocators import analyze_track; print(analyze_track('/app/tracks/song.mp3'))"

# Start the API server
docker run -p 8000:8000 mtl-audiolocators
curl http://localhost:8000/health
# {"status":"ok","allin1_available":false}

Version Compatibility Matrix

| Python | madmom | natten | allin1 | Status |
|---|---|---|---|---|
| 3.9 | 0.16.1 | 0.17.x | 1.1.0 | Best chance for allin1 |
| 3.10+ | 0.16.1 | any | 1.1.0 | madmom fails (deprecated API) |
| 3.11 | – | 0.21.x | 1.1.0 | natten API mismatch |
| any | – | – | – | librosa fallback always works |

Contributing: If you get allin1 working reliably, please share your setup! Open a PR or issue on GitHub.

Limitations

Current limitations to be aware of:

  • allin1 availability – Due to dependency conflicts (natten API changes, madmom Python 3.10+ incompatibility), allin1 currently doesn't work on most setups. Use librosa fallback instead.
  • Label accuracy – Section types (verse vs. chorus) are heuristic-based; boundaries are more reliable than labels
  • Genre dependency – Works best on pop/EDM with clear structure; experimental/ambient music may produce inconsistent results
  • Processing time – If allin1 were working, it would take 2-5x real-time on CPU
  • Model downloads – allin1 requires ~1.5GB model download on first run

Alternative MIR Tools

If you're exploring music structure analysis beyond MTL Audio Locators, here are other tools worth considering:

Open Source Libraries for Structure Analysis

| Tool | Focus | Structure Detection | Notes |
|---|---|---|---|
| allin1 | Full structure | Yes (verse, chorus, bridge) | Best quality, but dependency issues |
| MSAF | Segmentation | Abstract labels (A, B, C) | Academic, multiple algorithms |
| Essentia | Comprehensive MIR | Basic segmentation | MTG Barcelona, well maintained |
| librosa | Audio analysis | Heuristic-based | Always works, lower accuracy |

Beat/Tempo Detection (No Structure Labels)

| Tool | Focus | Notes |
|---|---|---|
| madmom | Beat/onset detection | Strong beat tracking, Python 3.9 only |
| Aubio | Lightweight MIR | Fast, C library with Python bindings |
| BeatNet | Beat/downbeat | Real-time, deep learning |

Commercial APIs (No Structure Detection)

Important: No major commercial API currently offers song structure detection (verse/chorus/bridge). The services below provide other audio analysis features:
| Service | Features | What It Does NOT Do |
|---|---|---|
| Cyanite.ai | Mood, genre, instruments (15s segments) | No verse/chorus labels |
| Musiio (Beatport) | Genre, mood, BPM | No structure detection |
| ACRCloud | Audio fingerprinting, recognition | No structure detection |

Spotify's Audio Analysis API was the only major commercial option for structure detection (sections, segments, tempo, key). It was deprecated in November 2024 and is no longer available.

Why No Commercial Structure APIs?

Song structure detection is a niche problem with limited commercial demand:

  • Music streaming – services use internal solutions (not exposed via API)
  • DJ software – built-in proprietary algorithms
  • Academic – research papers, not production services

This leaves open source as the primary option for structure analysis.

The Open Source MIR Maintenance Problem

Many MIR libraries suffer from a common pattern: written once, then abandoned. Here's why:

| Factor | Impact |
|---|---|
| Academic origins | Most MIR tools (madmom, MSAF, allin1) come from PhD research. When the researcher graduates or changes projects, development stops. |
| Publication vs. maintenance | In academia, publishing a paper = success. Maintaining software ≠ academic credit. |
| Niche community | Small user base means few contributors and little pressure to fix bugs. |
| Rapid Python/ML changes | Python 3.10 removed collections.MutableSequence; PyTorch changes APIs constantly. Keeping up requires active maintenance. |
| No commercial sponsorship | Unlike web frameworks (React/Meta, Angular/Google), MIR tools have no corporate backing. |

Real examples:

| Project | Last PyPI Release | Issue |
|---|---|---|
| madmom | 2019 | Uses deprecated Python APIs |
| MSAF | 2020 | Minimal maintenance |
| allin1 | 2024 | Active, but blocked by natten/madmom issues |

This is why MTL Audio Locators defaults to librosa – it's the only dependency that's actively maintained and works across all Python versions.

Source Separation (Preprocessing)

| Model | Source | Notes |
|---|---|---|
| Demucs (Meta) | GitHub | Best quality, used by allin1 |
| Spleeter (Deezer) | GitHub | Faster, good for batch processing |


Open Source: MTL Audio Locators is available on GitHub. Contributions welcome!

Interested in music analysis, AI-powered audio tools, or DAW integration? Get in touch – we're always happy to discuss music technology.

Let's Build Something Together

Have a similar project in mind? We'd love to hear about it.

Get in touch to discuss how we can help bring your vision to life.