
When working with music at scale – whether for DJ software, music education, or content creation – one of the most valuable pieces of metadata is the song structure. Where does the chorus start? How long is the intro? When does the bridge appear? Traditionally, this information required manual tagging or expensive commercial solutions. With MTL Audio Locators, we've built an open-source tool that detects song sections automatically using AI and signal processing.
In this article, we'll explore how automatic structure analysis works and how you can use it in your projects.
Song structure – the arrangement of sections like INTRO, VERSE, CHORUS, BRIDGE, and OUTRO – is fundamental to how we experience music. For technical applications, this data enables:
| Use Case | Application |
|---|---|
| DJ Software | Auto-sync to chorus, smart mixing |
| Music Education | Visual learning aids, section practice |
| Content Creation | Quick navigation, highlight extraction |
| DAW Integration | Import markers for editing |
| Recommendation Systems | Structure-aware similarity matching |
The challenge is that this information rarely exists in metadata. Even when producers add markers in their DAW, these don't export with the audio file.
MTL Audio Locators provides three analysis backends, automatically selecting the best available option:
| Engine | Type | Accuracy | Speed | Use Case |
|---|---|---|---|---|
| allin1 | Deep Learning | High | Slower | Production-quality analysis |
| MSAF | Traditional MIR | Medium | Medium | Research, comparison |
| librosa | Spectral | Basic | Fast | Fallback, quick scans |
The allin1 model is the current state of the art in music structure analysis: a deep learning model trained specifically on song segmentation. Using it directly looks like this:

```python
# Using allin1 directly
import allin1

result = allin1.analyze("track.mp3")
print(f"BPM: {result.bpm}")
for segment in result.segments:
    print(f"{segment.start:.2f}s - {segment.label}")
```
MSAF provides traditional Music Information Retrieval (MIR) algorithms. We use spectral flux for boundary detection combined with Fourier Magnitude Coefficients for labeling:
```python
import msaf

boundaries, labels = msaf.process(
    "track.mp3",
    boundaries_id="sf",    # Spectral flux
    labels_id="fmc2d"      # 2D Fourier Magnitude Coefficients
)
```
When neither allin1 nor MSAF is available, we use a custom algorithm based on librosa that combines multiple features:
| Feature | Purpose |
|---|---|
| Chroma CQT | Harmonic content (chord changes) |
| MFCC | Timbral characteristics (texture changes) |
| Spectral Contrast | Brightness/darkness patterns |
The algorithm stacks these features frame by frame and marks section boundaries where the combined representation changes sharply.
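The repository's actual fallback differs in detail; as an illustration of the general idea, here is a minimal NumPy sketch of boundary detection via a checkerboard-kernel novelty curve (Foote's method) over a stacked feature matrix. All names and parameters here are ours, not the library's:

```python
import numpy as np

def novelty_boundaries(features, kernel_size=8, threshold=0.5):
    """Detect section boundaries from a (n_features, n_frames) matrix
    using a checkerboard-kernel novelty curve (Foote's method)."""
    # Self-similarity matrix from cosine similarity between frames
    norm = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-9)
    ssm = norm.T @ norm                      # (n_frames, n_frames)

    # Checkerboard kernel: +1 within a section, -1 across a boundary
    k = kernel_size
    kernel = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((k, k)))

    # Slide the kernel along the diagonal to get a novelty curve
    n = ssm.shape[0]
    novelty = np.zeros(n)
    for i in range(k, n - k):
        novelty[i] = np.sum(kernel * ssm[i - k:i + k, i - k:i + k])
    novelty = np.maximum(novelty, 0)
    if novelty.max() > 0:
        novelty /= novelty.max()

    # Local maxima above the threshold are candidate boundaries
    return [i for i in range(1, n - 1)
            if novelty[i] > threshold
            and novelty[i] >= novelty[i - 1]
            and novelty[i] >= novelty[i + 1]]

# Toy example: two blocks of frames with different "timbre"
frames = np.hstack([np.tile([[1], [0]], 50), np.tile([[0], [1]], 50)]).astype(float)
print(novelty_boundaries(frames))  # → [50]
```

In the real fallback, `features` would be the stacked chroma, MFCC, and spectral-contrast matrices from the table above rather than a toy signal.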
All backends output sections with a consistent naming convention designed for DAW compatibility:
| Section | Prefix | Color | Example |
|---|---|---|---|
| Intro | INT | #808080 | INT1 |
| Verse | VRS | #4CAF50 | VRS1, VRS2 |
| Pre-Chorus | PRE | #8BC34A | PRE1 |
| Chorus | CHO | #FF5722 | CHO1, CHO2 |
| Bridge | BRG | #00BCD4 | BRG1 |
| Breakdown | BRD | #2196F3 | BRD1 |
| Build | BUI | #9C27B0 | BUI1 |
| Outro | OUT | #607D8B | OUT1 |
The numbered suffix (VRS1, VRS2) indicates occurrence order – useful for identifying repeating sections.
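The tool's internal labeling code may differ; as a minimal sketch, the convention can be applied like this (the `label_sections` helper and `SECTION_STYLE` map are illustrative, with prefixes and colors taken from the table above):

```python
from collections import Counter

# Prefixes and colors from the naming convention above
SECTION_STYLE = {
    "intro": ("INT", "#808080"), "verse": ("VRS", "#4CAF50"),
    "pre-chorus": ("PRE", "#8BC34A"), "chorus": ("CHO", "#FF5722"),
    "bridge": ("BRG", "#00BCD4"), "breakdown": ("BRD", "#2196F3"),
    "build": ("BUI", "#9C27B0"), "outro": ("OUT", "#607D8B"),
}

def label_sections(raw_sections):
    """Turn (start_time, section_type) pairs into numbered, colored labels."""
    counts = Counter()
    out = []
    for time_s, kind in raw_sections:
        prefix, color = SECTION_STYLE[kind]
        counts[kind] += 1  # occurrence order within the track
        out.append({"label": f"{prefix}{counts[kind]}", "time_s": time_s, "color": color})
    return out

sections = label_sections([(0.0, "intro"), (15.5, "verse"), (45.2, "chorus"), (75.0, "verse")])
print([s["label"] for s in sections])  # → ['INT1', 'VRS1', 'CHO1', 'VRS2']
```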
Analysis results follow a JSON schema designed for easy integration:
```json
{
  "track_id": "my_song",
  "bpm": 128,
  "sections": [
    {"label": "INT1", "time_s": 0.0, "color": "#808080"},
    {"label": "VRS1", "time_s": 15.5, "color": "#4CAF50"},
    {"label": "CHO1", "time_s": 45.2, "color": "#FF5722"},
    {"label": "VRS2", "time_s": 75.0, "color": "#4CAF50"},
    {"label": "CHO2", "time_s": 105.3, "color": "#FF5722"},
    {"label": "OUT1", "time_s": 135.8, "color": "#607D8B"}
  ]
}
```
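Consuming this schema takes only a few lines. As a hypothetical example (the `section_spans` helper is ours, not part of the library), each section's end time is simply the next section's start, or the track end for the last one:

```python
import json

doc = json.loads("""
{ "track_id": "my_song", "bpm": 128,
  "sections": [
    {"label": "INT1", "time_s": 0.0},
    {"label": "VRS1", "time_s": 15.5},
    {"label": "CHO1", "time_s": 45.2}
  ]
}""")

def section_spans(sections, track_end):
    """Pair each section with its end time (next section's start, or track end)."""
    starts = [s["time_s"] for s in sections]
    ends = starts[1:] + [track_end]
    return [(s["label"], start, end) for s, start, end in zip(sections, starts, ends)]

for label, start, end in section_spans(doc["sections"], track_end=75.0):
    print(f"{label}: {start:.1f}s ({end - start:.1f}s long)")
```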

Getting started is straightforward:

```shell
git clone https://github.com/musictechlab/mtl-audiolocators
cd mtl-audiolocators
poetry install

# Analyze a track, optionally writing the structure to JSON
poetry run analyze track.mp3
poetry run analyze track.mp3 -o structure.json
```
Sample CLI output:

```text
Analyzing: track.mp3
Using allin1 (AI-based analysis)
========================================
Track: track
BPM: 128
Sections: 8
========================================
0:00.00  INT1
0:15.50  VRS1
0:45.20  CHO1
1:15.00  VRS2
1:45.30  CHO2
2:15.80  BRG1
2:45.00  CHO3
3:15.80  OUT1

-> Import to REAPER: Run import_structure_markers.lua
   and select: track_structure.json
```
For integration into web applications, MTL Audio Locators includes a FastAPI server:
```shell
poetry run serve
```

Available endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /analyze | POST | Full structure analysis |
| /analyze/features | POST | Audio features only (BPM, energy) |
| /waveform | POST | Waveform data for visualization |
| /engines | GET | List available backends |
| /logs/stream | GET | SSE stream for progress updates |
| /health | GET | Service health check |
Example API call:
```shell
curl -X POST "http://localhost:8000/analyze" \
  -F "file=@track.mp3"
```

Response:

```json
{
  "track_id": "track",
  "bpm": 128,
  "duration_s": 210.5,
  "sections": [...],
  "analysis_method": "allin1"
}
```
The same functionality is available directly from Python:

```python
from audiolocators import analyze_track, extract_audio_features

# Full structure analysis
result = analyze_track("track.mp3")
print(f"BPM: {result['bpm']}")
for section in result['sections']:
    print(f"{section['time_s']}s - {section['label']}")

# Features only (faster)
features = extract_audio_features("track.mp3")
print(f"Duration: {features['duration_s']}s")
print(f"Energy segments: {len(features['energy_segments'])}")
```
For long-running analyses, the API provides Server-Sent Events (SSE) for real-time progress updates:
```javascript
const eventSource = new EventSource('http://localhost:8000/logs/stream');

eventSource.onmessage = (event) => {
  const log = JSON.parse(event.data);
  console.log(`[${log.source}] ${log.message}`);
  // Example: [allin1] Separating stems (vocals, drums, bass, other)...
};
Each allin1 stage (such as the stem separation shown in the example) emits its own progress messages, so a frontend can show what the analysis is currently doing.
Combine structure data with waveform visualization (the wavesurfer.js v7 Regions plugin is addressed through the plugin instance returned by `registerPlugin`):

```javascript
import WaveSurfer from 'wavesurfer.js';
import RegionsPlugin from 'wavesurfer.js/dist/plugins/regions.js';

const wavesurfer = WaveSurfer.create({ container: '#waveform' });
const regions = wavesurfer.registerPlugin(RegionsPlugin.create());

// After analysis: one colored region per detected section
structure.sections.forEach((section, i) => {
  const nextSection = structure.sections[i + 1];
  regions.addRegion({
    start: section.time_s,
    end: nextSection ? nextSection.time_s : wavesurfer.getDuration(),
    color: section.color + '40', // Append alpha for transparency
    content: section.label
  });
});
```
The JSON format is designed for easy DAW import. A companion Lua script for REAPER can read the output and create project markers:
```lua
-- In REAPER: Actions > Run ReaScript
-- Select: import_structure_markers.lua
```
Used together, the analyzer and the REAPER script take you from raw audio to a fully marked-up editing session.
Structure detection accuracy depends on several factors:
| Factor | Impact | Mitigation |
|---|---|---|
| Genre | Pop/EDM highest accuracy | Use genre-appropriate backend |
| Mixing | Clear separations help | allin1 handles complex mixes well |
| Length | Very long tracks may segment oddly | Consider splitting before analysis |
| Tempo changes | Can confuse boundary detection | Manual review recommended |
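For the splitting mitigation, the window and overlap sizes below are arbitrary illustrative choices, not tool defaults; a minimal sketch of computing overlapping analysis windows for a very long track:

```python
def analysis_windows(duration_s, window_s=240.0, overlap_s=10.0):
    """Split a long track into overlapping windows for separate analysis."""
    if duration_s <= window_s:
        return [(0.0, duration_s)]
    windows, start = [], 0.0
    step = window_s - overlap_s
    while start + window_s < duration_s:
        windows.append((start, start + window_s))
        start += step
    windows.append((start, duration_s))  # final, possibly shorter window
    return windows

print(analysis_windows(600.0))  # → [(0.0, 240.0), (230.0, 470.0), (460.0, 600.0)]
```

The overlap lets you deduplicate boundaries detected near window edges when merging the per-window results back together.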
For production use, we recommend reviewing detected boundaries by hand, especially on tracks with tempo changes or unusual structure.
The allin1 engine provides the highest quality analysis, but it has a complex dependency chain that currently causes compatibility issues across platforms:
```text
allin1 1.1.0
├── natten (Neighborhood Attention)
│   └── API changed in 0.21.x; allin1 expects the older API
├── madmom (beat detection)
│   └── Uses collections.MutableSequence, removed in Python 3.10+
└── demucs (stem separation)
    └── Works fine
```
| Dependency | Problem | Impact |
|---|---|---|
| natten 0.21.x | New API incompatible with allin1 | Import fails on all platforms |
| madmom 0.16.1 | Uses deprecated Python API | Breaks on Python 3.10+ |
| shi-labs.com wheels | SSL certificate expired | Can't download pre-built binaries |
Current recommendation: use the librosa fallback, which works reliably across all platforms. The allin1 ecosystem requires updates from its upstream maintainers.
```shell
# What you'll see when running analysis:
$ poetry run analyze track.mp3
Using librosa enhanced analysis   # allin1 not available, fallback works fine
```
We provide a Dockerfile for containerized deployment:
```dockerfile
FROM python:3.10-slim
WORKDIR /app

# System dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        ffmpeg libsndfile1 git cmake build-essential \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip Cython numpy poetry

COPY pyproject.toml poetry.lock README.md ./
RUN poetry config virtualenvs.create false \
    && poetry install --no-interaction --no-ansi --no-root

COPY src/ ./src/
RUN poetry install --no-interaction --no-ansi --only-root

EXPOSE 8000
CMD ["python", "-m", "uvicorn", "audiolocators.server:app", "--host", "0.0.0.0", "--port", "8000"]
```
```shell
# Build the image
docker build -t mtl-audiolocators .

# Run analysis (uses librosa)
docker run -v $(pwd)/tracks:/app/tracks mtl-audiolocators \
  python -c "from audiolocators import analyze_track; print(analyze_track('/app/tracks/song.mp3'))"

# Start the API server
docker run -p 8000:8000 mtl-audiolocators
curl http://localhost:8000/health
# {"status":"ok","allin1_available":false}
```
Compatibility matrix:

| Python | madmom | natten | allin1 | Status |
|---|---|---|---|---|
| 3.9 | 0.16.1 | 0.17.x | 1.1.0 | Best chance for allin1 |
| 3.10+ | 0.16.1 | any | 1.1.0 | madmom fails (deprecated API) |
| 3.11 | – | 0.21.x | 1.1.0 | natten API mismatch |
| any | – | – | – | librosa fallback always works |
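The tool's actual backend selection logic lives in the repo; here is a minimal sketch of the idea. Note that attempting the import (rather than just checking whether the package is installed) matters, because madmom installs fine but fails at import time on Python 3.10+:

```python
def pick_backend(candidates=("allin1", "msaf", "librosa")):
    """Return the first backend whose import actually succeeds."""
    for name in candidates:
        try:
            __import__(name)  # a broken dependency chain raises here
            return name
        except Exception:
            continue
    return None

# Demonstrate with stdlib names: the first importable candidate wins
print(pick_backend(("nonexistent_pkg", "json")))  # → json
```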
Current limitations to be aware of: the allin1 backend is effectively blocked by its upstream dependencies, the librosa fallback trades accuracy for portability, and abrupt tempo changes can still confuse boundary detection.
If you're exploring music structure analysis beyond MTL Audio Locators, here are other tools worth considering:
| Tool | Focus | Structure Detection | Notes |
|---|---|---|---|
| allin1 | Full structure | Yes (verse, chorus, bridge) | Best quality, but dependency issues |
| MSAF | Segmentation | Abstract labels (A, B, C) | Academic, multiple algorithms |
| Essentia | Comprehensive MIR | Basic segmentation | MTG Barcelona, well maintained |
| librosa | Audio analysis | Heuristic-based | Always works, lower accuracy |
Beat- and rhythm-focused tools:

| Tool | Focus | Notes |
|---|---|---|
| madmom | Beat/onset detection | Strong beat tracking, Python 3.9 only |
| Aubio | Lightweight MIR | Fast, C library with Python bindings |
| BeatNet | Beat/downbeat | Real-time, deep learning |
Commercial services:

| Service | Features | What It Does NOT Do |
|---|---|---|
| Cyanite.ai | Mood, genre, instruments (15s segments) | No verse/chorus labels |
| Musiio (Beatport) | Genre, mood, BPM | No structure detection |
| ACRCloud | Audio fingerprinting, recognition | No structure detection |
Song structure detection is a niche problem with limited commercial demand: as the comparison above shows, commercial services focus on tagging, mood, and fingerprinting rather than section labels. This leaves open source as the primary option for structure analysis.
Many MIR libraries suffer from a common pattern: written once, then abandoned. Here's why:
| Factor | Impact |
|---|---|
| Academic origins | Most MIR tools (madmom, MSAF, allin1) come from PhD research. When the researcher graduates or changes projects, development stops. |
| Publication vs. maintenance | In academia, publishing a paper = success. Maintaining software ≠ academic credit. |
| Niche community | Small user base means few contributors and little pressure to fix bugs. |
| Rapid Python/ML changes | Python 3.10 removed collections.MutableSequence, PyTorch changes API constantly. Keeping up requires active maintenance. |
| No commercial sponsorship | Unlike web frameworks (React/Meta, Angular/Google), MIR tools have no corporate backing. |
Real examples:
| Project | Last PyPI Release | Issue |
|---|---|---|
| madmom | 2019 | Uses deprecated Python APIs |
| MSAF | 2020 | Minimal maintenance |
| allin1 | 2024 | Active, but blocked by natten/madmom issues |
This is why MTL Audio Locators defaults to librosa – it's the only dependency that's actively maintained and works across all Python versions.
If you also need stem separation – the first stage of allin1's pipeline – two open-source models dominate:

| Model | Source | Notes |
|---|---|---|
| Demucs (Meta) | GitHub | Best quality, used by allin1 |
| Spleeter (Deezer) | GitHub | Faster, good for batch processing |
Interested in music analysis, AI-powered audio tools, or DAW integration? Have a similar project in mind? Get in touch – we'd love to discuss how we can help bring your vision to life.