
Every month, an independent label receives royalty reports from over a dozen distributors. No two look the same.
This isn't a hypothetical. This is what a real download folder looks like when you work with music royalty data at scale.
Here's a small sample of actual filenames from a single label's monthly data intake - anonymised, but otherwise untouched:
| Distributor | Example Filename | Format | File Size |
|---|---|---|---|
| FUGA | FUGA_Statement_June_2024.xlsx | .xlsx | ~1 MB |
| ADA | SR1_Distribution_Aug_24_-_00054061_-_2024-8.xlsb | .xlsb | ~70 KB |
| Ingrooves | 20240801-1496-DS-GBP_Digital_Sales.csv | .csv | up to 226 MB |
| The Orchard | The_Orchard20240821_Jun2024_fullreport_catalogue_US.xls | .xls | up to 700 MB |
| Bandcamp | bandcamp_rev_report_20240801-20240831.csv | .csv | ~6 KB |
| MVD | MVD_Statement_DigitalSales_2024-07.xls | .xls | ~12 KB |
| MVD | MVD_Statement_DigitalSales_2024-07.xlsx | .xlsx | ~12 KB |
| Emerald | Emerald_202408_DSR.csv | .csv | ~46 MB |
| Safari Records | Safari_Records_202408_DSR.xlsx | .xlsx | ~2 MB |
| ADA (legacy) | ADAOCT1.XLS | .XLS | up to 150 MB |
| MAC | MAC_Developments_iTunes_August_2024.xlsx | .xlsx | ~49 KB |
| Absolute | Absolute_2024021.CSV | .CSV | up to 226 MB |
| Qello | DetailedSheet_Records_Ltd_20240801_20240831.xlsx | .xlsx | ~10 KB |
| SFM | sfmaug2024.xlsx | .xlsx | ~2.5 MB |
| BOFM | BOFM_Aug2024.xlsx | .xlsx | ~2.5 MB |
| Dome Records | Dome_Records_202408_DSR.csv | .csv | ~1 MB |
| MDR | MDR_May-2024_65634.92_Records.xlsx | .xlsx | ~500 KB |
| Merlin | Merlin_Nov24_eg.for.jack.xlsx | .xlsx | ~703 KB |
That's 18 adapters across 500+ files per year - each with its own naming convention, file format, and internal structure. The files range from a 6 KB Bandcamp CSV to a single Orchard report that can reach 700 MB.
Go ahead, try to spot a pattern in those filenames. You won't find one.
- 4 file formats, 6 extension spellings: .xlsx · .xlsb · .xls · .XLS · .csv · .CSV
- 6 date conventions in filenames: 2024-07 · 202408 · Aug_24 · August_2024 · 20240801-20240831 · 2024021
- Same report, multiple formats: some distributors send both .xls and .xlsx versions of the exact same data.
- No naming standard: camelCase, ALLCAPS, underscores, hyphens, internal reference numbers, random hash suffixes.
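
To make the date problem concrete, here is a minimal Python sketch that pulls a reporting period out of the filename conventions listed above. The function name `extract_period`, the pattern order, and the rule that a two-digit year means 20xx are all assumptions made for illustration, not a description of any production parser. Note that `2024021` is deliberately left unparsed: from the filename alone there is no safe way to tell what it encodes.

```python
import re
from typing import Optional, Tuple

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Numeric conventions, tried in order. Each captures (year, month).
_RANGE = re.compile(r"(?<!\d)(\d{4})(\d{2})\d{2}-\d{8}(?!\d)")  # 20240801-20240831
_ISO = re.compile(r"(?<!\d)(\d{4})-(\d{2})(?!\d)")              # 2024-07
_COMPACT = re.compile(r"(?<!\d)(\d{4})(\d{2})(?!\d)")           # 202408

# Month name or abbreviation followed by a 4- or 2-digit year: August_2024, Aug_24.
_NAMED = re.compile(
    r"(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
    r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
    r"[_-]?(\d{4}|\d{2})(?!\d)",
    re.IGNORECASE,
)


def extract_period(filename: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) guessed from a filename, or None if unsure."""
    for pattern in (_RANGE, _ISO, _COMPACT):
        match = pattern.search(filename)
        if match and 1 <= int(match.group(2)) <= 12:
            return int(match.group(1)), int(match.group(2))

    match = _NAMED.search(filename)
    if match:
        year = int(match.group(2))
        if year < 100:
            year += 2000  # assumption: two-digit years belong to this century
        month = MONTHS.index(match.group(1)[:3].lower()) + 1
        return year, month

    return None  # ambiguous or unknown convention: leave it to a human


print(extract_period("MVD_Statement_DigitalSales_2024-07.xls"))
print(extract_period("Absolute_2024021.CSV"))
```

Even this small sketch needs four patterns and an explicit "don't know" branch, and a generic month-name match can still misfire on unrelated words in a filename. That is a large part of why per-distributor rules tend to be safer than one global guesser.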
And that's just the filenames. Open these files and you'll find different column names for the same data, different date formats inside the cells, different encodings, and multi-sheet workbooks where each sheet follows its own rules.
Someone has to make sense of all this. Every month.
For most independent labels, that means hours of manual work - copying data between spreadsheets, reformatting dates, matching column names, fixing encoding issues that turn artist names into garbled text.
The cost isn't just time. It's delayed royalty payments to artists. It's reporting errors that erode trust. It's the finance team spending their week on data cleanup instead of analysis.

There's more than one way to tackle this problem. Here's how the most common approaches compare:
| Approach | Setup effort | Maintenance | Handles format changes | Scales with new sources |
|---|---|---|---|---|
| Manual spreadsheets | None | Hours every month | Breaks silently | Every new source = more hours |
| Generic ETL tools (Fivetran, Airbyte) | Medium | Low | Limited - connectors are generic | Only if a connector exists |
| Custom Python scripts | High | High - fragile, hard to maintain | Depends on the developer | Every new source = new script |
| Adapter-based pipeline | High upfront | Low - each adapter is isolated | Adapter update, no side effects | Add an adapter, done |
Generic ETL tools work well for standardised APIs and databases. But music royalty data doesn't come from APIs - it comes from email attachments, FTP servers, and distributor portals. Each source is its own special case. That's why an adapter-based approach wins here: each distributor gets its own parser, isolated from the rest, easy to update when formats change.
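
As a rough illustration of what "each distributor gets its own parser" can look like, here is a minimal Python sketch of an adapter registry. Everything in it is hypothetical: the `adapter` decorator, the `parse_file` dispatcher, routing by filename prefix, and the placeholder parser bodies are assumptions chosen to keep the example short, and a real pipeline might route by mailbox, folder, or portal instead.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Adapter = Callable[[bytes], List[Row]]  # raw file bytes in, normalised rows out

# One entry per distributor; adapters never call each other.
ADAPTERS: Dict[str, Adapter] = {}


def adapter(prefix: str) -> Callable[[Adapter], Adapter]:
    """Register a parser for filenames starting with `prefix` (case-insensitive)."""
    def decorator(func: Adapter) -> Adapter:
        ADAPTERS[prefix.lower()] = func
        return func
    return decorator


@adapter("bandcamp_")
def parse_bandcamp(raw: bytes) -> List[Row]:
    # Placeholder body: a real adapter would parse this distributor's CSV layout.
    return [{"source": "bandcamp", "size": str(len(raw))}]


@adapter("emerald_")
def parse_emerald(raw: bytes) -> List[Row]:
    # Placeholder body, isolated from every other adapter.
    return [{"source": "emerald", "size": str(len(raw))}]


def parse_file(filename: str, raw: bytes) -> List[Row]:
    """Pick the adapter for a file and run it; fail loudly if none matches."""
    name = filename.lower()
    for prefix, func in ADAPTERS.items():
        if name.startswith(prefix):
            return func(raw)
    raise LookupError(f"no adapter registered for {filename!r}")


print(parse_file("bandcamp_rev_report_20240801-20240831.csv", b"abc"))
```

The point of the shape is isolation. When one distributor changes its layout, only that one function changes, and an unknown file raises an error instead of being silently misparsed.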
Here's what the pipeline looks like in practice:
Every file goes through its format-specific adapter - handling encoding, column mapping, date parsing, and multi-sheet logic. What comes out the other side is one consistent dataset: same columns, same date format, same encoding. Ready for analysis, reporting, and artist payouts.
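
To give a feel for the work inside a single adapter, here is a minimal Python sketch covering three of those steps for a CSV source: encoding fallback, column mapping, and date parsing. The header names in `ALIASES`, the output columns `artist`, `date`, and `amount`, and the choice of Windows-1252 as the fallback encoding are all assumptions for illustration; real distributor files will differ.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List, Sequence

# Hypothetical header aliases: many source spellings, one target column.
ALIASES = {
    "artist": "artist",
    "artist name": "artist",
    "sale date": "date",
    "transaction date": "date",
    "net revenue": "amount",
    "amount": "amount",
}


def decode(raw: bytes) -> str:
    """Try UTF-8 first (dropping any BOM), then fall back to Windows-1252."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def to_iso_date(value: str, formats: Sequence[str]) -> str:
    """Parse a date using the formats this source is known to use."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def normalise_csv(raw: bytes, aliases: Dict[str, str],
                  date_formats: Sequence[str]) -> List[Dict[str, str]]:
    """Turn one distributor CSV into rows with consistent columns and ISO dates."""
    reader = csv.DictReader(io.StringIO(decode(raw), newline=""))
    rows = []
    for record in reader:
        row: Dict[str, str] = {}
        for header, value in record.items():
            if header is None:  # extra cells beyond the header row
                continue
            target = aliases.get(header.strip().lower())
            if target:
                row[target] = (value or "").strip()
        row["date"] = to_iso_date(row["date"], date_formats)
        rows.append(row)
    return rows


sample = "Artist Name,Sale Date,Net Revenue\r\nZoë Example,31/08/2024,1.50\r\n"
print(normalise_csv(sample.encode("cp1252"), ALIASES, ("%d/%m/%Y",)))
```

Passing the date formats in per source is a deliberate choice in this sketch: a value like 01/08/2024 cannot be disambiguated globally, but the adapter for a given distributor can know which convention that distributor uses.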
That's what we build at MusicTech Lab. Not another dashboard on top of messy data - but the data layer underneath that turns chaos into clarity.
If your monthly royalty workflow involves more spreadsheet wrangling than actual analysis, we should talk. We've built data pipelines for independent labels handling exactly this kind of complexity - and we can do the same for you.