Case Study

Batch ISRC Enrichment That Turns Messy Catalogs Into Clean Data

How we built Scout to batch-enrich music metadata via Spotify and MusicBrainz APIs, flag ISRC mismatches, and export clean CSVs for catalog evaluation and royalty reconciliation.

Key Takeaways

Scout batch-enriches 40,000+ tracks from Spotify and MusicBrainz APIs in minutes, not days.
Confidence scores range from 0.00 to 1.00: Spotify +0.50, MusicBrainz +0.30, resolved ISRC +0.20.
ISRC mismatches between sources usually mean different release editions, not data errors.
Exports a clean CSV with all enriched fields ready for rights management or investor decks.

If you have ever opened a 40,000-row CSV and found half the ISRC codes missing, inconsistent, or flat-out wrong, you know the pain. Manually looking up each track via Spotify, MusicBrainz, or ISRC lookup tools takes days. We built Scout to do it in minutes.

The Problem Every Catalog Manager Knows

Track data arrives as CSV or Excel files. They contain track names, artist names, and sometimes ISRCs. The problem is that "sometimes" is doing a lot of heavy lifting in that sentence.

  • Missing ISRCs - Tracks without ISRC codes can't be matched to rights holders, leading to unclaimed royalties.
  • Inconsistent ISRCs - The same recording may have different ISRCs across distributors, DSPs, and catalog systems.
  • Duplicate Entries - Multiple rows resolve to the same ISRC, inflating track counts and skewing analytics.
  • Manual Lookup - Checking each track one-by-one against web tools or APIs takes 2+ days for a large catalog.

This is not a niche issue. Anyone doing soundtrack acquisition, catalog evaluation, royalty reconciliation, or building investor decks hits this wall. The data exists across multiple sources, but nobody connects it at scale.

A Real-World Example: Santana's "Smooth"

To show why this matters, take a concrete case. Santana's "Smooth" appears in a royalty report with ISRC USAT29900471. When Scout looks up the same track on Spotify, it comes back as USAR19900033.

Both are valid ISRCs for the same recording. The first is from the original 1999 Arista release, the second from a later digital distribution. This happens constantly in the music industry because every release edition (original, remaster, compilation, deluxe, single) can get its own ISRC.

ISRC mismatches between sources do not mean the data is wrong. They mean the same recording exists in multiple editions. Scout flags these for review rather than treating them as errors.

Without automated enrichment, you would either miss this mismatch entirely or spend hours tracking it down manually. Multiply that by 40,000 rows and you understand why catalog managers describe this work as "soul-crushing."

What Scout Does

Scout is a feature inside MusicData Lab that batch-enriches track metadata from two authoritative sources: the Spotify API and the MusicBrainz API.

Upload and Map

Drop a CSV or Excel file. Scout auto-detects common column names (track, artist, ISRC, URL) and lets you adjust the mapping before processing.
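
As a rough illustration of how header auto-detection like this can work, here is a minimal sketch. The alias table and the function names (`COLUMN_ALIASES`, `normalize_header`, `detect_columns`) are hypothetical and chosen for this example; the headers Scout actually recognizes are not listed in this article.

```python
import re
from typing import Dict, List, Optional

# Hypothetical alias table: an assumption for illustration, not Scout's real list.
COLUMN_ALIASES = {
    "track": {"track", "track name", "title", "song"},
    "artist": {"artist", "artist name", "performer"},
    "isrc": {"isrc", "isrc code"},
    "url": {"url", "spotify url", "link"},
}


def normalize_header(header: str) -> str:
    """Lowercase a header and collapse punctuation/underscores into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def detect_columns(headers: List[str]) -> Dict[str, Optional[str]]:
    """Map each logical field to the first matching header, or None if absent."""
    mapping: Dict[str, Optional[str]] = {field: None for field in COLUMN_ALIASES}
    for header in headers:
        key = normalize_header(header)
        for field, aliases in COLUMN_ALIASES.items():
            if mapping[field] is None and key in aliases:
                mapping[field] = header
                break
    return mapping


print(detect_columns(["Track_Name", "ARTIST", "ISRC Code", "Notes"]))
```

Fields that come back as `None` are the ones a user would map by hand before processing starts.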

Enrich Every Track

For each row, Scout runs a four-stage lookup pipeline:

  • Spotify Lookup - Searches by track URL (if present), then by track name + artist name. Returns track name, artist, album, ISRC, release date, popularity, and a direct Spotify link.
  • MusicBrainz Lookup - Searches by ISRC first (using the Spotify ISRC or input ISRC), then falls back to recording name + artist. Returns artist, release, ISRC, label, and country.
  • ISRC Resolution - Picks the best ISRC from all available sources (Spotify, MusicBrainz, input) and assigns it as the resolved identifier for the track.
  • Flagging - Detects ISRC mismatches between sources, artist name mismatches (using fuzzy matching), and duplicate ISRCs across rows.
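
The fallback order in the first three stages can be sketched roughly as follows. This is not Scout's code: the two lookups are passed in as plain callables (the `fake_*` stubs below stand in for real API clients), and the priority used to pick the resolved ISRC (Spotify, then MusicBrainz, then input) is an assumption, since the article only says the "best" ISRC is chosen. Flagging is left out of this sketch.

```python
from typing import Callable, Optional

# A lookup takes keyword search terms and returns a metadata dict, or None on no match.
Lookup = Callable[..., Optional[dict]]


def enrich_row(row: dict, spotify: Lookup, musicbrainz: Lookup) -> dict:
    """Run the lookups for one row in the fallback order described above."""
    # Stage 1: Spotify by URL when present, otherwise by track name + artist.
    sp = spotify(url=row["url"]) if row.get("url") else None
    if sp is None:
        sp = spotify(track=row["track"], artist=row["artist"])

    # Stage 2: MusicBrainz by ISRC first (Spotify's ISRC, else the input ISRC),
    # then fall back to recording name + artist.
    lookup_isrc = (sp or {}).get("isrc") or row.get("isrc")
    mb = musicbrainz(isrc=lookup_isrc) if lookup_isrc else None
    if mb is None:
        mb = musicbrainz(track=row["track"], artist=row["artist"])

    # Stage 3: resolve one ISRC. The priority order here is an assumption.
    candidates = [(sp or {}).get("isrc"), (mb or {}).get("isrc"), row.get("isrc")]
    resolved = next((c for c in candidates if c), None)

    return {"spotify": sp, "musicbrainz": mb, "resolved_isrc": resolved}


# Stub lookups so the sketch runs without network access (hypothetical data sources).
def fake_spotify(**terms):
    if terms.get("track") == "Smooth":
        return {"isrc": "USAR19900033", "artist": "Santana"}
    return None


def fake_musicbrainz(**terms):
    if terms.get("isrc"):
        return {"isrc": terms["isrc"], "artist": "Santana"}
    return None


row = {"track": "Smooth", "artist": "Santana", "isrc": "USAT29900471"}
result = enrich_row(row, fake_spotify, fake_musicbrainz)
print(result["resolved_isrc"])
```

Injecting the lookups keeps the ordering logic testable on its own, separate from API clients, retries, and rate limiting.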

Confidence Scoring

Each track gets a confidence score from 0.00 to 1.00:

Source               Score
Spotify match        +0.50
MusicBrainz match    +0.30
Resolved ISRC        +0.20

A track matched by both Spotify and MusicBrainz with a resolved ISRC scores a perfect 1.00. Flags are informational only: they tell you something needs attention, but they do not lower the confidence score.
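
The arithmetic is simple enough to write down directly. The sketch below uses the three weights from the table; the function name and boolean parameters are illustrative, not Scout's actual interface.

```python
def confidence_score(spotify_match: bool, musicbrainz_match: bool, isrc_resolved: bool) -> float:
    """Add up the per-source weights from the table; flags play no part in the score."""
    score = 0.0
    if spotify_match:
        score += 0.50
    if musicbrainz_match:
        score += 0.30
    if isrc_resolved:
        score += 0.20
    # Round to two decimals to match the 0.00 to 1.00 scale.
    return round(score, 2)


print(confidence_score(True, True, True))
print(confidence_score(True, False, True))
```

Under these weights, a track found only on Spotify with a resolved ISRC lands at 0.70, and a track with no match anywhere stays at 0.00.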

Export

When processing completes, download a clean CSV with all original data plus every enriched field: Spotify metadata, MusicBrainz metadata, resolved ISRCs, status, flags, and confidence scores. Ready for analysis, investor decks, or import into your rights management system.
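
One way to lay out an export like this is to prefix each column with the group it came from. The sketch below uses Python's standard `csv` module; the column names, the group prefixes, and the nested shape of each track record are all assumptions made for the example, not the real export schema.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column groups; the real export's column names are not documented here.
COLUMN_GROUPS: Dict[str, List[str]] = {
    "input": ["track", "artist", "isrc"],
    "spotify": ["track", "artist", "album", "isrc", "release_date", "popularity", "url"],
    "musicbrainz": ["artist", "release", "isrc", "label", "country"],
    "result": ["resolved_isrc", "status", "flags", "confidence"],
}


def export_csv(tracks: List[dict]) -> str:
    """Flatten nested per-source dicts into one CSV row per track."""
    fieldnames = [f"{group}_{name}" for group, names in COLUMN_GROUPS.items() for name in names]
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for track in tracks:
        flat = {}
        for group, names in COLUMN_GROUPS.items():
            source = track.get(group) or {}
            for name in names:
                value = source.get(name, "")
                # Lists (such as flags) are joined so they fit in one cell.
                if isinstance(value, list):
                    value = ";".join(value)
                flat[f"{group}_{name}"] = value
        writer.writerow(flat)
    return buffer.getvalue()


example = {
    "input": {"track": "Smooth", "artist": "Santana", "isrc": "USAT29900471"},
    "spotify": {"track": "Smooth", "artist": "Santana", "isrc": "USAR19900033"},
    "musicbrainz": {"artist": "Santana", "isrc": "USAR19900033"},
    "result": {"resolved_isrc": "USAR19900033", "flags": ["isrc_mismatch"], "confidence": 1.0},
}
print(export_csv([example]))
```

Missing fields become empty cells, so every row keeps the same column count regardless of which sources matched.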

Processing at Scale

Scout processes files asynchronously using Celery workers. A 40,000-row file runs in the background while you continue working. The job detail page shows real-time progress without page refreshes:

  • Live Progress - Progress bar, track counts, and status badges update via AJAX polling every 3 seconds.
  • Streaming Logs - Watch Spotify and MusicBrainz lookups happen in real time in the Logs tab.
  • Column Filters - Search and filter results by track, artist, ISRC, status, confidence range, or flags.
  • CSV Export - Download enriched data as a clean, grouped CSV when processing completes.

The Flags That Matter

Scout raises four types of flags:

  • isrc_mismatch - the input ISRC, Spotify ISRC, and MusicBrainz ISRC do not all agree. This is the most common flag and usually means different release editions.
  • artist_mismatch - the artist names from Spotify and MusicBrainz have a fuzzy match score below 75%. This can indicate featuring artists, name variations, or genuine data issues.
  • duplicate_isrc - multiple rows in your file resolve to the same ISRC. Important for catching double-counted tracks.
  • enrichment_error - the lookup failed for technical reasons (API timeout, rate limit, etc.).

Filter by isrc_mismatch to quickly review every track where the input, Spotify, and MusicBrainz ISRCs do not all agree. These are your highest-priority reconciliation items.
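
To make the first three flag rules concrete, here is a minimal sketch using only the standard library. It makes several assumptions: `difflib.SequenceMatcher` stands in for whatever fuzzy matcher Scout actually uses, missing ISRCs are ignored rather than counted as disagreement, and the dictionary keys (`input_isrc`, `spotify_artist`, and so on) are hypothetical field names. The enrichment_error flag is not covered because it depends on how API failures are caught.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Set

ARTIST_THRESHOLD = 0.75  # the 75% cutoff described above


def artist_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; difflib is a stand-in matcher."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def row_flags(row: dict) -> List[str]:
    """Return the per-row flags: isrc_mismatch and artist_mismatch."""
    flags = []
    # Collect the ISRCs that are actually present; blanks are skipped (assumption).
    isrcs = {row.get(k) for k in ("input_isrc", "spotify_isrc", "musicbrainz_isrc")}
    isrcs = {value for value in isrcs if value}
    if len(isrcs) > 1:
        flags.append("isrc_mismatch")
    sp, mb = row.get("spotify_artist"), row.get("musicbrainz_artist")
    if sp and mb and artist_similarity(sp, mb) < ARTIST_THRESHOLD:
        flags.append("artist_mismatch")
    return flags


def duplicate_isrcs(rows: List[dict]) -> Set[str]:
    """Return every resolved ISRC that appears on more than one row."""
    counts = Counter(r["resolved_isrc"] for r in rows if r.get("resolved_isrc"))
    return {isrc for isrc, n in counts.items() if n > 1}


smooth = {
    "input_isrc": "USAT29900471",
    "spotify_isrc": "USAR19900033",
    "musicbrainz_isrc": "USAR19900033",
    "spotify_artist": "Santana",
    "musicbrainz_artist": "Santana",
}
print(row_flags(smooth))
```

Duplicate detection needs the whole file, which is why it is a separate pass over all rows rather than part of the per-row check.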

From Script to Product

The inspiration for Scout came from a real conversation with a music professional doing soundtrack acquisition work. They had built a Python script that batch-processed metadata from Spotify and MusicBrainz APIs using pandas. No AI needed, just API calls and data wrangling.

It cut a 2-day manual task down to 30 minutes.

We took that exact workflow and productized it inside MusicData Lab:

  • No Coding Required - Upload a file, map columns, click run.
  • CSV and Excel Support - Works with any CSV or Excel file and auto-detects common column layouts.
  • Persistent Results - Jobs, tracks, and logs are stored in the database. Come back to any job weeks later.
  • Built for Large Catalogs - Background processing with live progress, not a script that blocks your terminal.

Who This Is For

Scout is built for anyone who deals with music metadata at scale:

  • Rights managers reconciling royalty statements across distributors
  • A&R teams evaluating catalogs for acquisition
  • Independent labels cleaning up their metadata before pitching to sync agents
  • Royalty analysts spotting unclaimed revenue from ISRC gaps
  • Catalog managers preparing clean data for investor decks or audits

If you are spending days in spreadsheets manually looking up ISRCs, Scout does that work for you and flags the problems worth your attention.

Scout is part of MusicData Lab, our music distribution analytics platform. If you are dealing with messy catalog data and want to see Scout in action, get in touch.
