13 Distributors, 5 File Formats, Zero Standards - The Reality of Music Royalty Data

Every month, independent labels receive royalty reports from over a dozen distributors. No two look the same. Here's what that actually looks like.

Every month, an independent label receives royalty reports from over a dozen distributors. Not a single one looks the same.

This isn't a hypothetical. This is what a real download folder looks like when you work with music royalty data at scale.

The wall of files

Here's a small sample of actual filenames from a single label's monthly data intake - anonymised, but otherwise untouched:

| Distributor | Example Filename | Format | File Size |
| --- | --- | --- | --- |
| FUGA | FUGA_Statement_June_2024.xlsx | .xlsx | ~1 MB |
| ADA | SR1_Distribution_Aug_24_-_00054061_-_2024-8.xlsb | .xlsb | ~70 KB |
| Ingrooves | 20240801-1496-DS-GBP_Digital_Sales.csv | .csv | up to 226 MB |
| The Orchard | The_Orchard20240821_Jun2024_fullreport_catalogue_US.xls | .xls | up to 700 MB |
| Bandcamp | bandcamp_rev_report_20240801-20240831.csv | .csv | ~6 KB |
| MVD | MVD_Statement_DigitalSales_2024-07.xls | .xls | ~12 KB |
| MVD | MVD_Statement_DigitalSales_2024-07.xlsx | .xlsx | ~12 KB |
| Emerald | Emerald_202408_DSR.csv | .csv | ~46 MB |
| Safari Records | Safari_Records_202408_DSR.xlsx | .xlsx | ~2 MB |
| ADA (legacy) | ADAOCT1.XLS | .XLS | up to 150 MB |
| MAC | MAC_Developments_iTunes_August_2024.xlsx | .xlsx | ~49 KB |
| Absolute | Absolute_2024021.CSV | .CSV | up to 226 MB |
| Qello | DetailedSheet_Records_Ltd_20240801_20240831.xlsx | .xlsx | ~10 KB |
| SFM | sfmaug2024.xlsx | .xlsx | ~2.5 MB |
| BOFM | BOFM_Aug2024.xlsx | .xlsx | ~2.5 MB |
| Dome Records | Dome_Records_202408_DSR.csv | .csv | ~1 MB |
| MDR | MDR_May-2024_65634.92_Records.xlsx | .xlsx | ~500 KB |
| Merlin | Merlin_Nov24_eg.for.jack.xlsx | .xlsx | ~703 KB |

That's 18 adapters across 500+ files per year - each with its own naming convention, file format, and internal structure. Sizes range from a 6 KB Bandcamp CSV to a single Orchard report that can reach 700 MB.

Spot the pattern

Go ahead, try. You won't find one.

5 file formats

.xlsx · .xlsb · .xls · .XLS · .csv · .CSV

6 date conventions in filenames

2024-07 · 202408 · Aug_24 · August_2024 · 20240801-20240831 · 2024021

Same report, multiple formats

Some distributors send both .xls and .xlsx versions of the exact same data.

No naming standard

camelCase, ALLCAPS, underscores, hyphens, internal reference numbers, random hash suffixes.
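
To make the filename problem concrete, here is a minimal Python sketch that pulls a reporting month out of the date conventions listed above. The function name `reporting_period`, the pattern list, and the rule that a date range counts as its starting month are all assumptions made for illustration, not anything a distributor documents. The `2024021` style is deliberately left unresolved, because nothing in the filename says how to read it.

```python
import re
from typing import Optional, Tuple

MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_NUM = {abbr: number for number, abbr in enumerate(MONTH_ABBR, start=1)}

YEAR = r"(?P<y>20\d{2})"
MONTH = r"(?P<m>0[1-9]|1[0-2])"
MONTH_NAME = "|".join(MONTH_ABBR)

# Tried in order, most specific first (an assumed priority, not a standard).
PERIOD_PATTERNS = [
    # 20240801-20240831 or 20240801_20240831: a date range, keyed on its start.
    re.compile(rf"(?<!\d){YEAR}{MONTH}\d{{2}}[-_]20\d{{6}}(?!\d)"),
    # 2024-07
    re.compile(rf"(?<!\d){YEAR}-{MONTH}(?!\d)"),
    # 202408 (exactly six digits, so 2024021 does not qualify)
    re.compile(rf"(?<!\d){YEAR}{MONTH}(?!\d)"),
    # Aug_24, August_2024, Jun2024, May-2024
    re.compile(rf"(?P<mon>{MONTH_NAME})[a-z]*[_-]?(?P<y>(?:20)?\d{{2}})(?!\d)",
               re.IGNORECASE),
]


def reporting_period(filename: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) guessed from a filename, or None if unclear."""
    for pattern in PERIOD_PATTERNS:
        match = pattern.search(filename)
        if match is None:
            continue
        groups = match.groupdict()
        year = int(groups["y"])
        if year < 100:  # two-digit year such as "24"
            year += 2000
        if "m" in groups:
            month = int(groups["m"])
        else:
            month = MONTH_NUM[groups["mon"].lower()]
        return year, month
    return None  # ambiguous or unknown convention: leave it to a human
```

Even this small sketch needs an ordering rule and an explicit "give up" branch, and it only sees the filename, not what is inside the file.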

And that's just the filenames. Open these files and you'll find different column names for the same data, different date formats inside the cells, different encodings, and multi-sheet workbooks where each sheet follows its own rules.
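
As a rough illustration of the column-name and encoding problems, the sketch below reads a CSV export, tries UTF-8 first, falls back to Windows-1252, and renames headers to one shared vocabulary. The alias table is hypothetical: the header names in it are invented for the example and are not taken from any distributor's real report.

```python
import csv
import io
from typing import Dict, Iterator

# Hypothetical header aliases, for illustration only.
COLUMN_ALIASES = {
    "track title": "track_title",
    "song name": "track_title",
    "artist": "artist",
    "artist name": "artist",
    "net revenue": "net_revenue",
    "earnings": "net_revenue",
}


def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8 (with or without BOM), else fall back to Windows-1252."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback; a real source may need a different code page.
        return raw.decode("cp1252", errors="replace")


def read_rows(raw: bytes) -> Iterator[Dict[str, str]]:
    """Yield rows keyed by canonical column names; unknown columns are dropped."""
    reader = csv.DictReader(io.StringIO(decode_bytes(raw), newline=""))
    for row in reader:
        clean: Dict[str, str] = {}
        for header, value in row.items():
            if header is None or value is None:
                continue  # ragged row: more or fewer cells than headers
            canonical = COLUMN_ALIASES.get(header.strip().lower())
            if canonical is not None:
                clean[canonical] = value.strip()
        yield clean
```

The same logical row then comes out identically whether the source file was saved as UTF-8 or as a legacy Windows export, and accented artist names survive either way.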

Why this matters

Someone has to make sense of all this. Every month.

For most independent labels, that means hours of manual work - copying data between spreadsheets, reformatting dates, matching column names, fixing encoding issues that turn artist names into garbled text.

The cost isn't just time. It's delayed royalty payments to artists. It's reporting errors that erode trust. It's the finance team spending their week on data cleanup instead of analysis.

One distributor changed their report format mid-year without notice. The filename pattern stayed the same, but the column structure inside was completely different. Manual processes break silently when this happens.
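
One way to turn a silent break into a loud one is to compare each file's header row against the columns its parser expects before reading any data. The sketch below is a minimal version of that idea. The names `FormatChangedError` and `check_header` are made up for this example, and treating extra columns as harmless is an assumption that may not suit every report.

```python
from typing import Iterable, Set


class FormatChangedError(ValueError):
    """Raised when a report no longer has the columns its parser expects."""


def _normalise(names: Iterable[str]) -> Set[str]:
    # Compare headers case-insensitively and ignore stray whitespace.
    return {name.strip().lower() for name in names}


def check_header(source: str, expected: Iterable[str], actual: Iterable[str]) -> None:
    """Fail fast if any expected column is missing from the file's header."""
    expected_set = _normalise(expected)
    actual_set = _normalise(actual)
    missing = sorted(expected_set - actual_set)
    if missing:
        extra = sorted(actual_set - expected_set)
        raise FormatChangedError(
            f"{source}: missing columns {missing}; unrecognised columns {extra}"
        )
    # Extra columns alone are tolerated here (an assumption of this sketch).
```

Calling `check_header` at the start of every import means a changed layout stops the run with a message naming the source, instead of producing a month of wrong numbers.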

How teams try to solve this

There's more than one way to tackle this problem. Here's how the most common approaches compare:

| Approach | Setup effort | Maintenance | Handles format changes | Scales with new sources |
| --- | --- | --- | --- | --- |
| Manual spreadsheets | None | Hours every month | Breaks silently | Every new source = more hours |
| Generic ETL tools (Fivetran, Airbyte) | Medium | Low | Limited - connectors are generic | Only if a connector exists |
| Custom Python scripts | High | High - fragile, hard to maintain | Depends on the developer | Every new source = new script |
| Adapter-based pipeline | High upfront | Low - each adapter is isolated | Adapter update, no side effects | Add an adapter, done |

Generic ETL tools work well for standardised APIs and databases. But music royalty data doesn't come from APIs - it comes from email attachments, FTP servers, and distributor portals. Each source is its own special case. That's why an adapter-based approach wins here: each distributor gets its own parser, isolated from the rest, easy to update when formats change.
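
For readers who want to picture the adapter idea, here is a small Python sketch under stated assumptions. It is not the production design: the class names, the `adapter_for` helper, and every column name in the maps are hypothetical, and only CSV sources are covered (an Excel adapter would override `parse` with a spreadsheet library). The filename patterns are modelled on the Bandcamp and Emerald examples in the table above.

```python
import csv
import io
import re
from typing import Dict, List


class CsvAdapter:
    """One distributor's parsing rules, kept separate from every other source.

    Subclasses set `name`, `filename_pattern` and `column_map`.
    """

    encoding = "utf-8-sig"

    def matches(self, filename: str) -> bool:
        return bool(self.filename_pattern.search(filename))

    def parse(self, raw: bytes) -> List[Dict[str, str]]:
        reader = csv.DictReader(io.StringIO(raw.decode(self.encoding), newline=""))
        # A missing source column raises KeyError, so format changes fail loudly.
        return [
            {canonical: row[source] for source, canonical in self.column_map.items()}
            for row in reader
        ]


class BandcampAdapter(CsvAdapter):
    name = "bandcamp"
    filename_pattern = re.compile(r"^bandcamp_rev_report_\d{8}-\d{8}\.csv$")
    # Hypothetical column names, for illustration only.
    column_map = {"item name": "track_title", "artist": "artist",
                  "net amount": "net_revenue"}


class EmeraldAdapter(CsvAdapter):
    name = "emerald"
    filename_pattern = re.compile(r"^Emerald_\d{6}_DSR\.csv$")
    # Hypothetical column names, for illustration only.
    column_map = {"Track": "track_title", "Artist Name": "artist",
                  "Royalty": "net_revenue"}


ADAPTERS = [BandcampAdapter(), EmeraldAdapter()]


def adapter_for(filename: str) -> CsvAdapter:
    """Pick the single adapter that claims this filename, or refuse."""
    candidates = [adapter for adapter in ADAPTERS if adapter.matches(filename)]
    if len(candidates) != 1:
        raise LookupError(
            f"expected exactly one adapter for {filename!r}, found {len(candidates)}"
        )
    return candidates[0]
```

Supporting a new distributor means adding one class and one entry in `ADAPTERS`. Updating a changed format touches one `column_map` and nothing else.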

One clean dataset

Here's what the pipeline looks like in practice:

[Diagram: distributor files → format-specific adapters → one consistent dataset]

Every file goes through its format-specific adapter - handling encoding, column mapping, date parsing, and multi-sheet logic. What comes out the other side is one consistent dataset: same columns, same date format, same encoding. Ready for analysis, reporting, and artist payouts.
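
What "one consistent dataset" could mean in code is sketched below: a single record type that every adapter's output is converted into, with each source declaring its own date format. The `RoyaltyLine` fields and the `to_canonical` helper are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Dict


@dataclass(frozen=True)
class RoyaltyLine:
    """Hypothetical canonical row: same fields whatever the source file was."""
    source: str
    sale_date: date
    artist: str
    track_title: str
    net_revenue: Decimal
    currency: str


def to_canonical(row: Dict[str, str], *, source: str,
                 date_format: str, currency: str) -> RoyaltyLine:
    """Convert one already-renamed row into the shared record type.

    `date_format` is supplied per source, because day/month order cannot be
    guessed reliably from the cell text alone.
    """
    return RoyaltyLine(
        source=source,
        sale_date=datetime.strptime(row["sale_date"].strip(), date_format).date(),
        artist=row["artist"].strip(),
        track_title=row["track_title"].strip(),
        # Decimal avoids float rounding on money; assumes "1234.50"-style text.
        net_revenue=Decimal(row["net_revenue"].strip()),
        currency=currency,
    )
```

Two rows that arrive as `31/08/2024` and `2024-08-31` both end up holding the same `date` value, so downstream reporting never has to know which distributor they came from.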

That's what we build at MusicTech Lab. Not another dashboard on top of messy data - but the data layer underneath that turns chaos into clarity.

Look familiar?

If your monthly royalty workflow involves more spreadsheet wrangling than actual analysis, we should talk. We've built data pipelines for independent labels handling exactly this kind of complexity - and we can do the same for you.
