Case Study·

Why Music Companies Need AI-Powered Analytics (And How We Built One)

Music royalty data is complex. Non-technical stakeholders need insights without SQL. Here's how we built an AI dashboard that turns plain English into charts.
Why Music Companies Need AI-Powered Analytics (And How We Built One)

Your data team knows the answer is in the database. Your A&R lead, your finance director, and your label manager do not know how to get it out. This is the gap that costs music companies real money, not in licensing fees or infrastructure, but in decisions delayed, trends missed, and reports that arrive a week too late.

We built an AI-powered analytics dashboard to close that gap. It sits inside MusicData Lab, our royalty analytics platform, and it lets anyone with access type a question in plain English and get a chart back in seconds.

The problem everybody talks about at conferences

The music industry has a data access problem. Not a data collection problem. Labels and distributors already have millions of rows of streaming data, royalty reports, and territorial breakdowns sitting in their databases. The bottleneck is getting from "I need to know which retailers drove revenue last quarter" to an actual answer.

Today, that journey typically looks like this:

  1. A business stakeholder writes an email or Slack message to the data team
  2. The data team interprets the request, writes a SQL query, exports the results
  3. Someone pastes the numbers into a spreadsheet and builds a chart
  4. The chart goes back to the stakeholder, who asks a follow-up question
  5. Repeat from step 1

This loop can take hours, sometimes days. Multiply it across every label, every territory, every reporting period, and you start to see the scale of the problem.

A 2024 IFPI report noted that the global recorded music market generated $28.6 billion in revenue, with streaming accounting for 67% of that. The volume of data behind those numbers is staggering, and growing every quarter.

What if business users could just ask?

That was the design question behind our AI Dashboard. Instead of routing every data request through a technical team, what if the platform could understand a question like "top 5 artists from the US by income" and return a bar chart?

The workflow is simple:

  1. A user types a question in the chat interface
  2. An LLM translates the question into a ClickHouse SQL query
  3. The query runs against the analytics database (read-only, with safety guardrails)
  4. Results come back as an interactive chart and a data table

No SQL knowledge required. No waiting for the data team. No spreadsheets.

Natural Language Input
Ask questions in plain English. No SQL, no training, no onboarding friction.
ClickHouse Analytics
Columnar storage handles millions of streaming records with sub-second query times.
Instant Visualization
Results render as interactive charts. Bar, line, and doughnut, selected automatically.

Why this matters for music companies specifically

Music royalty data is uniquely complex. A single label might receive reports from 13 different distributors, each with its own file format, column naming, and date conventions. Once that data is normalised and loaded into an analytics database, the schema reflects that complexity: dozens of fields covering artists, tracks, retailers, territories, currencies, and time periods.

This is precisely the kind of dataset where AI-assisted querying shines. The system prompt includes the full database schema, domain-specific hints (like "use retailer_union for normalised retailer names" or "use final_income for currency-converted revenue"), and few-shot examples that teach the model how to write correct ClickHouse SQL.

The result is that even someone who has never seen a database can ask:

  • "Monthly income trend for 2024" and get a line chart
  • "Income by country" and get a ranked bar chart
  • "Top 10 tracks by streams" and get a sortable table with the data

Data privacy: keeping everything local

Here is where most "AI analytics" solutions stumble. They require sending your data, or at least your queries, to a third-party API. For a music company handling confidential royalty data, artist earnings, and pre-release catalogue information, that is often a non-starter.

Our architecture solves this with a pluggable LLM design:

Ollama (Local, Default)
Open-source LLM runtime running in Docker alongside the application. All inference happens on your infrastructure. Zero data exposure.
Claude API (Optional)
Anthropic's Claude API for higher-quality SQL generation. Swap in with a single environment variable change.

Switching between providers is a configuration change, not a code change. The same application code, the same security layer, the same chart rendering.

The pluggable pattern means you can start with Ollama for privacy, evaluate quality, and switch to Claude API for specific use cases without any migration work.

Security by design, not by afterthought

Letting an AI write SQL that runs against your production database sounds risky. It is, if you do it naively. Our approach layers multiple security controls:

SQL Guard validates every query before execution:

  • Only SELECT statements are allowed. Any INSERT, UPDATE, DELETE, DROP, or DDL keyword is blocked
  • Queries are restricted to a single analytics table (no access to user accounts, credentials, or other application data)
  • Result sets are capped at 100 rows, with LIMIT enforced automatically
  • Comments, semicolons, and multi-statement queries are stripped or rejected

Read-only execution provides a second layer:

  • ClickHouse is configured with readonly=1 at the connection level
  • Query execution is limited to 10 seconds
  • Maximum result rows are enforced at the database level

This belt-and-suspenders approach means that even if the LLM generates a malicious query (unlikely, but possible), it cannot modify data, access unauthorised tables, or run expensive long-running operations.

No AI system should have write access to production data. Always enforce read-only execution at both the application layer and the database layer.

Why ClickHouse for music analytics

We chose ClickHouse as the analytics engine for MusicData Lab because music streaming data is a textbook columnar analytics workload: append-only, time-series, high-volume, and query-heavy.

A typical label might have 10 to 50 million rows of streaming data, partitioned by month. Common queries aggregate by artist, retailer, territory, or time period. ClickHouse handles these in milliseconds where PostgreSQL would take seconds or minutes.

Key advantages for music data:

  • MergeTree engine with monthly partitioning matches the natural cadence of royalty reporting
  • Low-cardinality string optimisation is perfect for fields like retailer_union, artist, and country_code that have a bounded set of values
  • Columnar compression keeps storage costs low even as data grows to hundreds of millions of rows
  • HTTP API makes it straightforward to build read-only query interfaces with proper access controls

What we learned building this

Three insights from the implementation that apply to any company considering AI-powered analytics:

1. The prompt is the product

The quality of SQL generation depends almost entirely on the system prompt. Including the full schema, domain-specific hints, and few-shot examples made the difference between "sometimes works" and "reliably useful." We dynamically build the prompt from the Django model metadata, so it stays in sync with schema changes automatically.

2. Start local, scale to cloud

Running Ollama locally for development and testing removed the biggest adoption barrier: "we can't send data to an external API." Once stakeholders see the value, the conversation about using a cloud API for better quality becomes much easier.

3. Security is a feature, not a constraint

The SQL Guard and read-only execution are not just safety nets. They are what made the business comfortable deploying this. When your CFO asks "can this AI delete our data?", the answer needs to be a confident "no, here's why."

Who is this for?

This is not a product launch. It is a proof of concept that we built for our own platform and for our clients. If you recognise any of these situations, it might be relevant to you:

  • Your data team spends too much time answering ad-hoc reporting requests
  • Business stakeholders wait days for insights that should take seconds
  • You have sensitive royalty or financial data and cannot use cloud-based AI tools
  • You already have analytics data in ClickHouse, PostgreSQL, or a similar database and want to make it more accessible

At MusicTech Lab, we build data platforms for the music industry. The AI Dashboard is one piece of a larger system that handles royalty ingestion, normalisation, currency conversion, and reporting. If this resonates, we should talk.

What comes next

This is a v1. The underlying pattern, natural language to SQL to visualisation, is not limited to music data. Any company with a structured analytics database can benefit from making that data conversational. The technology is ready. The question is whether your organisation is ready to let business users ask their own questions.

Let's Build Something Together

Have a similar project in mind? We'd love to hear about it.

Get in touch to discuss how we can help bring your vision to life.