Music Royalty Data Chaos: 13 Distributors, 5 Formats

Every month, an independent label receives royalty reports from over a dozen distributors. No two of them look the same.

This isn't a hypothetical. This is what a real download folder looks like when you work with music royalty data at scale.

The wall of files

Here's a small sample of actual filenames from a single label's monthly data intake - anonymised, but otherwise untouched:

Distributor	Example Filename	Format	File Size
FUGA	`FUGA_Statement_June_2024.xlsx`	.xlsx	~1 MB
ADA	`SR1_Distribution_Aug_24_-_00054061_-_2024-8.xlsb`	.xlsb	~70 KB
Ingrooves	`20240801-1496-DS-GBP_Digital_Sales.csv`	.csv	up to 226 MB
The Orchard	`The_Orchard20240821_Jun2024_fullreport_catalogue_US.xls`	.xls	up to 700 MB
Bandcamp	`bandcamp_rev_report_20240801-20240831.csv`	.csv	~6 KB
MVD	`MVD_Statement_DigitalSales_2024-07.xls`	.xls	~12 KB
MVD	`MVD_Statement_DigitalSales_2024-07.xlsx`	.xlsx	~12 KB
Emerald	`Emerald_202408_DSR.csv`	.csv	~46 MB
Safari Records	`Safari_Records_202408_DSR.xlsx`	.xlsx	~2 MB
ADA (legacy)	`ADAOCT1.XLS`	.XLS	up to 150 MB
MAC	`MAC_Developments_iTunes_August_2024.xlsx`	.xlsx	~49 KB
Absolute	`Absolute_2024021.CSV`	.CSV	up to 226 MB
Qello	`DetailedSheet_Records_Ltd_20240801_20240831.xlsx`	.xlsx	~10 KB
SFM	`sfmaug2024.xlsx`	.xlsx	~2.5 MB
BOFM	`BOFM_Aug2024.xlsx`	.xlsx	~2.5 MB
Dome Records	`Dome_Records_202408_DSR.csv`	.csv	~1 MB
MDR	`MDR_May-2024_65634.92_Records.xlsx`	.xlsx	~500 KB
Merlin	`Merlin_Nov24_eg.for.jack.xlsx`	.xlsx	~703 KB

That's 18 adapters across 500+ files per year - each with its own naming convention, file format, and internal structure. File sizes range from a 6 KB Bandcamp CSV to a single Orchard report that can reach 700 MB.

Spot the pattern

Go ahead, try. You won't find one.

5 file formats

`.xlsx` · `.xlsb` · `.xls` · `.XLS` · `.csv` · `.CSV`

6 date conventions in filenames

`2024-07` · `202408` · `Aug_24` · `August_2024` · `20240801-20240831` · `2024021`
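
To make the date problem concrete, here is a minimal Python sketch that pulls a (year, month) statement period out of the conventions listed above. The function name `statement_period` and the regular expressions are assumptions written for this illustration, not the parser any distributor or pipeline actually uses. Note that `2024021` is deliberately left unparsed: seven digits cannot be split into a year and month without guessing.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Numeric conventions, tried in order from most to least specific.
NUMERIC_PATTERNS = [
    re.compile(r"(?<!\d)(\d{4})(\d{2})\d{2}[-_]\d{8}(?!\d)"),  # 20240801-20240831
    re.compile(r"(?<!\d)(\d{4})-(\d{2})(?!\d)"),               # 2024-07
    re.compile(r"(?<!\d)(\d{4})(\d{2})(?!\d)"),                # 202408
]
# August_2024 and Aug_24: month name, underscore, 4- or 2-digit year.
NAMED_PATTERN = re.compile(
    r"(?<![A-Za-z])([A-Za-z]{3})[A-Za-z]*_(\d{4}|\d{2})(?!\d)")


def statement_period(filename):
    """Return (year, month) parsed from a filename, or None if unrecognised."""
    for pattern in NUMERIC_PATTERNS:
        match = pattern.search(filename)
        if match and 1 <= int(match[2]) <= 12:
            return int(match[1]), int(match[2])
    for match in NAMED_PATTERN.finditer(filename):
        month = MONTHS.get(match[1].lower())
        if month:
            year = int(match[2])
            # Assumption: two-digit years belong to the 2000s.
            return (2000 + year if year < 100 else year), month
    return None  # ambiguous or unknown: leave it for a human to decide
```

Returning `None` instead of a best guess is the design choice that matters here: a wrong period silently attributes revenue to the wrong month, while a missing one is easy to spot and fix.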

Same report, multiple formats

Some distributors send both `.xls` and `.xlsx` versions of the exact same data.
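
One way to avoid ingesting the same report twice is to group files by name and keep a single preferred format. The sketch below is illustrative only: the preference order and the helper name `pick_one_per_report` are assumptions, and it also assumes that two files with the same base name really do carry the same data.

```python
from pathlib import Path

# Assumed preference order; extensions are compared case-insensitively,
# so .XLS and .xls count as the same format.
PREFERENCE = [".xlsx", ".xlsb", ".xls", ".csv"]


def pick_one_per_report(filenames):
    """Keep one file per base name, choosing the most preferred format."""
    best = {}  # base name -> (rank, original filename)
    for name in filenames:
        path = Path(name)
        extension = path.suffix.lower()
        if extension not in PREFERENCE:
            continue  # not a report format we know how to read
        rank = PREFERENCE.index(extension)
        key = path.stem.lower()
        if key not in best or rank < best[key][0]:
            best[key] = (rank, name)
    return sorted(name for _, name in best.values())
```

Given the two MVD files from the table, this keeps only the `.xlsx` version, while `ADAOCT1.XLS` passes through untouched.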

No naming standard

camelCase, ALLCAPS, underscores, hyphens, internal reference numbers, random hash suffixes.

And that's just the filenames. Open these files and you'll find different column names for the same data, different date formats inside the cells, different encodings, and multi-sheet workbooks where each sheet follows its own rules.
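
Column-name drift is usually handled with an explicit lookup table. The following is a minimal sketch; every header and canonical name in it is invented for illustration and does not come from any real distributor report.

```python
# Hypothetical raw headers mapped to hypothetical canonical column names.
CANONICAL_COLUMNS = {
    "track title": "track_title",
    "song name": "track_title",
    "isrc": "isrc",
    "isrc code": "isrc",
    "net revenue": "net_revenue",
    "royalty amount": "net_revenue",
}


def normalise_header(raw):
    """Lower-case a header and collapse underscores and extra whitespace."""
    return " ".join(raw.replace("_", " ").split()).lower()


def rename_columns(row):
    """Return the row with canonical column names; fail on unknown headers."""
    renamed = {}
    for raw_name, value in row.items():
        key = normalise_header(raw_name)
        if key not in CANONICAL_COLUMNS:
            raise KeyError(f"unmapped column: {raw_name!r}")
        renamed[CANONICAL_COLUMNS[key]] = value
    return renamed
```

Raising on an unknown header, rather than dropping the column, means a new or renamed column is noticed the first time it appears.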

Why this matters

Someone has to make sense of all this. Every month.

For most independent labels, that means hours of manual work - copying data between spreadsheets, reformatting dates, matching column names, fixing encoding issues that turn artist names into garbled text.

The cost isn't just time. It's delayed royalty payments to artists. It's reporting errors that erode trust. It's the finance team spending their week on data cleanup instead of analysis.

One distributor changed their report format mid-year without notice. The same filename pattern, but completely different column structure inside. Manual processes break silently when this happens.
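
A cheap guard against this kind of silent break is to compare each file's header row with the columns last seen from that source and refuse to continue on a mismatch. The sketch below assumes a hypothetical source called `example_distributor` with made-up column names.

```python
# Hypothetical expected header per source; a real table would be built
# from reports that are known to be good.
EXPECTED_COLUMNS = {
    "example_distributor": {"ISRC", "Track Title", "Sale Month", "Net Revenue"},
}


class SchemaChanged(Exception):
    """Raised when a report's header no longer matches what we expect."""


def check_header(distributor, header):
    """Raise SchemaChanged if columns were added, removed, or renamed."""
    expected = EXPECTED_COLUMNS[distributor]
    found = {column.strip() for column in header}
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    if missing or unexpected:
        raise SchemaChanged(
            f"{distributor}: missing {missing}, unexpected {unexpected}")
```

The check runs before any rows are read, so a changed layout stops the import with a clear message instead of producing plausible-looking but wrong numbers.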

How teams try to solve this

There's more than one way to tackle this problem. Here's how the most common approaches compare:

Approach	Setup effort	Maintenance	Handles format changes	Scales with new sources
Manual spreadsheets	None	Hours every month	Breaks silently	Every new source = more hours
Generic ETL tools (Fivetran, Airbyte)	Medium	Low	Limited - connectors are generic	Only if a connector exists
Custom Python scripts	High	High - fragile, hard to maintain	Depends on the developer	Every new source = new script
Adapter-based pipeline	High upfront	Low - each adapter is isolated	Adapter update, no side effects	Add an adapter, done

Generic ETL tools work well for standardised APIs and databases. But music royalty data doesn't come from APIs - it comes from email attachments, FTP servers, and distributor portals. Each source is its own special case. That's why an adapter-based approach wins here: each distributor gets its own parser, isolated from the rest, easy to update when formats change.
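
The adapter idea can be sketched in a few lines of Python: a registry maps each source to its own parser, and every parser emits rows in one shared shape. This is an illustration under stated assumptions, not the production design; the source names `distributor_a` and `distributor_b`, their column headers, and the `adapter` decorator are all hypothetical.

```python
import csv
import io

ADAPTERS = {}  # source name -> parser function


def adapter(source):
    """Decorator that registers a parser for one source."""
    def register(func):
        ADAPTERS[source] = func
        return func
    return register


@adapter("distributor_a")
def parse_distributor_a(payload):
    # Hypothetical layout: comma-separated, lower-case headers.
    text = payload.decode("utf-8-sig")
    for row in csv.DictReader(io.StringIO(text)):
        yield {"track_title": row["item name"], "net_revenue": row["net amount"]}


@adapter("distributor_b")
def parse_distributor_b(payload):
    # Hypothetical layout: semicolon-separated, different headers.
    text = payload.decode("utf-8-sig")
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        yield {"track_title": row["Title"], "net_revenue": row["Royalty"]}


def parse(source, payload):
    """Run the adapter registered for a source and collect its rows."""
    try:
        handler = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for {source!r}") from None
    return list(handler(payload))
```

Because each parser touches only its own source, a format change at one distributor means editing one function, and supporting a new distributor means adding one.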

One clean dataset

Here's what the pipeline looks like in practice:

Every file goes through its format-specific adapter - handling encoding, column mapping, date parsing, and multi-sheet logic. What comes out the other side is one consistent dataset: same columns, same date format, same encoding. Ready for analysis, reporting, and artist payouts.
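
Two of those steps, encoding and date parsing, are small enough to sketch here. The helpers below are assumptions for illustration: the Windows-1252 fallback is a common choice for spreadsheet exports but not a universal one, and the list of date formats has to be supplied per adapter because a value like `01/08/2024` cannot be interpreted without knowing the source.

```python
from datetime import datetime


def decode_text(raw):
    """Decode bytes as UTF-8 (dropping a BOM), else fall back to Windows-1252."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback; undecodable bytes become U+FFFD rather than crash.
        return raw.decode("cp1252", errors="replace")


def to_iso_date(value, formats):
    """Parse a date using the formats one adapter expects; return YYYY-MM-DD."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date {value!r}")
```

An adapter for a source that writes day-first dates would call `to_iso_date(cell, ["%d/%m/%Y"])`, while a month-first source would pass `["%m/%d/%Y"]`; both end up with the same ISO string in the output.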

But consistent columns are only the beginning. The data inside those files is just as messy - 830 raw values that need mapping to 19 canonical names before you can run a single meaningful query. And once the data is truly clean, it opens the door to AI-powered analytics where business users ask questions in plain English and get charts back in seconds.
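
Value mapping follows the same pattern as column mapping, just at the cell level. The article does not say which field those raw values belong to, so the sketch below uses store names purely as an illustration; the table entries and function names are hypothetical.

```python
# Hypothetical raw spellings mapped to canonical names.
CANONICAL_STORES = {
    "spotify": "Spotify",
    "spotify ab": "Spotify",
    "apple music": "Apple Music",
    "itunes/apple music": "Apple Music",
}


def canonical_store(raw):
    """Return the canonical name for a raw value, or None if unmapped."""
    key = " ".join(raw.split()).casefold()
    return CANONICAL_STORES.get(key)


def map_values(raw_values):
    """Split raw values into a mapping and a list still needing review."""
    mapped, unmapped = {}, []
    for raw in raw_values:
        name = canonical_store(raw)
        if name is None:
            unmapped.append(raw)
        else:
            mapped[raw] = name
    return mapped, unmapped
```

Keeping the unmapped values in a separate list gives whoever maintains the table a short review queue instead of a silent gap in the reports.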

That's what we build at MusicTech Lab. Not another dashboard on top of messy data - but the data layer underneath that turns chaos into clarity.

Looks familiar?

If your monthly royalty workflow involves more spreadsheet wrangling than actual analysis, we should talk. We've built data pipelines for independent labels handling exactly this kind of complexity - and we can do the same for you.
