
Which Database for Music Data? Redshift vs BigQuery vs ClickHouse and When to Use Each

A practical comparison of BigQuery, Redshift, and ClickHouse for music industry workloads, from royalty reporting to real-time streaming analytics.

Key Takeaways

BigQuery is the best fit for financial reporting, royalty reconciliation, and teams without dedicated DevOps.
ClickHouse delivers sub-second queries on billions of rows, ideal for real-time streaming dashboards.
Redshift fits organizations already invested in the AWS ecosystem with existing data engineering teams.
The right choice depends on your workload pattern, team size, and where your data already lives.

Music companies sit on mountains of data. Streaming royalties from dozens of DSPs. Multi-currency transactions across territories. Financial records that need to reconcile between royalty systems and general ledgers. The database you choose to store and query this data shapes everything downstream: how fast your dashboards load, how much you pay per month, and whether your finance team can actually get answers without filing a ticket.

We have built music data systems on Elasticsearch and ClickHouse, and we are now adopting BigQuery. Each solves a different problem. This article breaks down when to use which, with real pricing numbers and performance benchmarks so you can make the right call for your catalog.

What makes music data different

Before comparing databases, it helps to understand why music data is uniquely challenging. A typical mid-size label deals with several patterns that push general-purpose databases to their limits.

  • High volume: a single quarterly DSP statement can contain millions of rows. One track, dozens of territories, multiple income types.
  • Multi-currency: revenue arrives in GBP, gets converted to USD, reported in EUR, and settled in the artist's local currency.
  • Complex relationships: track to release to label to artist to contract. Each level has its own royalty rates, reserves, and deductions.
  • Dual temporal: every transaction has two dates, the day the DSP reported it (cash basis) and the day the stream actually happened (accrual basis).

These patterns mean your database needs to handle complex joins, large aggregations, and flexible time-based queries. Not every database does this equally well.
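The dual-temporal pattern is easiest to see in code. Here is a minimal Python sketch (field names are illustrative, not from any specific schema) that totals the same transactions on a cash basis and an accrual basis:

```python
from collections import defaultdict

# Each transaction carries two dates: when the DSP reported it (cash basis)
# and when the stream actually happened (accrual basis).
transactions = [
    {"reported": "2024-04", "streamed": "2024-01", "revenue": 120.0},
    {"reported": "2024-04", "streamed": "2024-02", "revenue": 80.0},
    {"reported": "2024-07", "streamed": "2024-02", "revenue": 40.0},
]

def totals_by(rows, date_field):
    """Sum revenue per month, keyed on the chosen date column."""
    out = defaultdict(float)
    for row in rows:
        out[row[date_field]] += row["revenue"]
    return dict(out)

cash = totals_by(transactions, "reported")
accrual = totals_by(transactions, "streamed")
print(cash)     # {'2024-04': 200.0, '2024-07': 40.0}
print(accrual)  # {'2024-01': 120.0, '2024-02': 120.0}
```

The same revenue lands in different months depending on which date you group by, which is why the warehouse needs both dates on every row.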

The three contenders

BigQuery: the serverless warehouse

Google BigQuery is a fully managed, serverless data warehouse. You do not provision clusters, tune nodes, or worry about scaling. You write SQL, BigQuery figures out the rest.

Architecture: Columnar storage with a distributed query engine. Separates storage and compute, so you pay independently for each. Data lives in Google's internal distributed file system (Colossus) under the hood.

What it does well for music companies:

  • Complex SQL joins across 15+ dimension tables (labels, artists, tracks, territories, DSPs)
  • dbt integration is first-class, with native support for the staging-intermediate-marts pattern
  • Handles 50M+ row fact tables without breaking a sweat
  • Native connectors to Looker, Sigma Computing, and Looker Studio for BI
  • Zero infrastructure management, your finance team never waits on DevOps

Where it falls short:

  • Query latency is typically 1-2 seconds, not sub-second
  • On-demand pricing can spike if analysts run unoptimized queries on large tables
  • Not ideal for high-concurrency, user-facing dashboards with hundreds of simultaneous users

BigQuery offers a 1 TB/month free tier for queries. For a small label just getting started with analytics, you can run a meaningful data warehouse for nearly zero cost.
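As a rough sketch, the on-demand bill is just bytes scanned beyond the free tier. The rates below are the figures quoted in this article; check current GCP pricing before relying on them:

```python
def bigquery_on_demand_cost(tb_scanned: float,
                            price_per_tb: float = 6.25,
                            free_tb: float = 1.0) -> float:
    """Estimate monthly on-demand query cost in USD.

    Assumes the $6.25/TB on-demand rate and 1 TB/month free tier
    cited in this article (subject to change by Google).
    """
    return max(0.0, tb_scanned - free_tb) * price_per_tb

print(bigquery_on_demand_cost(0.8))  # 0.0  -> entirely inside the free tier
print(bigquery_on_demand_cost(3.0))  # 12.5
```

A small label scanning under a terabyte a month pays nothing for queries; the bill only starts once usage clears the free tier.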

Redshift: the AWS powerhouse

Amazon Redshift is a cluster-based columnar data warehouse derived from PostgreSQL. It has been the default choice for AWS-native organizations for over a decade.

Architecture: Provisioned clusters with leader and compute nodes. You choose node types and quantities based on your workload. Redshift Serverless is also available as a pay-per-query alternative.

What it does well for music companies:

  • Deep AWS ecosystem integration (S3, Glue, Lambda, Step Functions)
  • PostgreSQL compatibility means most SQL tools and libraries work out of the box
  • Mature ecosystem with extensive documentation and community support
  • Fine-grained cluster tuning for teams that want control over performance
  • Spectrum lets you query data directly in S3 without loading it

Where it falls short:

  • Provisioned clusters cost money even when idle: a 24/7 dc2.large node runs ~$180/month
  • Requires capacity planning and occasional cluster resizing
  • More operational overhead than BigQuery
  • Scaling up means migrating to larger nodes or adding more, not automatic

With Redshift provisioned clusters, you pay for compute whether anyone is running queries or not. If your team primarily queries during business hours, you are paying for 16 idle hours per day. Consider Redshift Serverless or BigQuery if your usage is bursty.

ClickHouse: the speed demon

ClickHouse is an open-source, column-oriented OLAP database designed for sub-second analytical queries. Originally built at Yandex to handle billions of page-view events, it has become the go-to choice for real-time analytics.

Architecture: Vectorized query execution with native compression. Processes data in columns rather than rows, which makes aggregations extremely fast. Available as self-hosted (open source) or ClickHouse Cloud (managed).

What it does well for music companies:

  • Sub-second queries on billions of rows, 5-10x faster than BigQuery and Redshift on aggregations
  • Real-time ingestion with no batch delay
  • Excellent compression ratios reduce storage costs significantly
  • Ideal for streaming play-count dashboards, listener behavior analytics, and DSP performance monitoring
  • Cost-effective at scale, especially self-hosted

Where it falls short:

  • Joins are more limited and less optimized than BigQuery or Redshift
  • Smaller BI tool ecosystem, Looker and Sigma have limited native support
  • Self-hosting requires operational expertise (replication, backups, upgrades)
  • Not designed for complex multi-table financial reporting

ClickHouse compresses data aggressively, often achieving 5-10x compression. A 500 GB dataset in BigQuery might only consume 50-100 GB in ClickHouse, directly reducing storage costs.

Head-to-head comparison

Feature          | BigQuery                              | Redshift                              | ClickHouse
Deployment       | Serverless (managed)                  | Cluster or Serverless                 | Self-hosted or Cloud
Query latency    | 1-2 seconds                           | 1-3 seconds                           | 50-500 ms
Joins            | Full SQL, excellent                   | Full SQL, good                        | Limited, basic
Scaling          | Automatic                             | Manual (cluster) or auto (serverless) | Manual (self) or auto (cloud)
dbt support      | Native, first-class                   | Native, good                          | Community adapter
BI ecosystem     | Looker, Sigma, Tableau, Looker Studio | QuickSight, Tableau, Looker           | Grafana, Metabase, Superset
SQL dialect      | GoogleSQL                             | PostgreSQL-based                      | ClickHouse SQL
Best concurrency | ~100 concurrent                       | ~50 concurrent                        | 1000+ concurrent
Learning curve   | Low                                   | Medium                                | Medium-High
Vendor lock-in   | High (GCP)                            | High (AWS)                            | Low (open source)

Pricing simulation: a mid-size music label

Let's model real costs for a label group with 15 owned labels, 50 million royalty rows, and a five-person analytics team running daily queries.

Storage: 50M rows of royalty data (~200 GB uncompressed)

Database         | Storage model                                      | Monthly cost
BigQuery         | $20/TB active, $10/TB long-term                    | $4/mo (200 GB active)
Redshift         | Included in node cost (managed storage: $0.024/GB) | $4.80/mo
ClickHouse Cloud | $25.30/TB (compressed, ~40 GB after compression)   | $1.01/mo

Storage costs are nearly identical and trivially small at this scale. The real cost difference comes from compute.

Compute: daily queries by a 5-person team

Assumptions: 20 queries/day per analyst, ~5 GB scanned per query, 22 working days/month.

Database                      | Pricing model                                                       | Monthly cost
BigQuery (on-demand)          | $6.25/TB scanned; 5 users x 20 queries x 5 GB x 22 days = ~11 TB/mo | ~$68/mo
BigQuery (flat-rate)          | 100 slots at ~$0.04/slot-hour x 730 hours                           | ~$2,920/mo
Redshift (dc2.large, 2 nodes) | $0.25/node/hour x 2 nodes x 730 hours                               | ~$365/mo
Redshift (Serverless)         | $0.375/RPU-hour, ~4 RPUs x ~8 active hours x 22 days                | ~$264/mo
ClickHouse Cloud              | ~$0.10/compute-unit, auto-scaling with idle pause                   | ~$80-150/mo
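The compute estimates above fall out of straightforward arithmetic. A quick sketch reproducing them, using the rates quoted in this article (verify against current vendor pricing):

```python
# BigQuery on-demand: team scan volume times the per-TB rate
# (free tier ignored, as in the table above).
tb_scanned = 5 * 20 * 5 * 22 / 1000    # users x queries/day x GB/query x days
bq_on_demand = tb_scanned * 6.25       # ~$68.75/mo

# Redshift provisioned: two dc2.large nodes billed around the clock.
redshift_provisioned = 0.25 * 2 * 730  # $365.00/mo

# Redshift Serverless: pay only for active RPU-hours.
redshift_serverless = 0.375 * 4 * 8 * 22  # $264.00/mo

print(round(bq_on_demand, 2), redshift_provisioned, redshift_serverless)
# 68.75 365.0 264.0
```

The structural difference is visible in the formulas: on-demand cost scales with query volume, provisioned cost scales with wall-clock hours, and serverless cost scales with active hours.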

Total monthly cost estimate

Database         | Configuration            | Total/month
BigQuery         | On-demand                | ~$72
BigQuery         | Flat-rate (100 slots)    | ~$2,925
Redshift         | 2x dc2.large (always on) | ~$370
Redshift         | Serverless               | ~$269
ClickHouse Cloud | Dev tier with auto-pause | ~$100-150

BigQuery on-demand looks cheapest, but costs scale linearly with query volume. If your team grows to 20 analysts running complex queries, you could easily hit $500-1,000/month. Partition your tables and use query cost controls to prevent surprises.

Annual cost projection

                        Year 1      Year 2      Year 3
                        (50M rows)  (65M rows)  (85M rows)
BigQuery (on-demand)    $864        $1,080      $1,350
Redshift (2x dc2)      $4,440      $4,440      $6,660*
Redshift Serverless     $3,228      $3,900      $4,800
ClickHouse Cloud        $1,500      $1,800      $2,400

* Redshift cluster upgrade needed at ~70M rows

Performance benchmarks: music data queries

Here is how each database handles typical music industry analytical queries on a 50M-row royalty dataset.

Query 1: Revenue by label, last 12 months

SELECT label_name, SUM(net_revenue) AS total_revenue
FROM fact_revenue
JOIN dim_label USING (label_key)
JOIN dim_date USING (date_key)
WHERE dim_date.date >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 MONTH)
GROUP BY label_name
ORDER BY total_revenue DESC

Database                | Execution time | Data scanned
BigQuery                | 1.2 seconds    | 3.8 GB
Redshift (dc2.large x2) | 1.8 seconds    | Full table
ClickHouse              | 87 ms          | 0.4 GB (compressed)

Query 2: Artist recoupment status across all contracts

SELECT artist_name, contract_id,
       SUM(revenue) AS earned,
       advance_amount,
       advance_amount - SUM(revenue) AS balance
FROM fact_revenue
JOIN dim_artist USING (artist_key)
JOIN dim_contract USING (contract_key)
GROUP BY artist_name, contract_id, advance_amount
HAVING balance > 0
ORDER BY balance DESC

Database   | Execution time | Notes
BigQuery   | 2.1 seconds    | Handles multi-table join well
Redshift   | 2.8 seconds    | Comparable with sort keys
ClickHouse | 450 ms         | Slower due to joins, still fast

Query 3: Real-time play counts by territory (1B events)

SELECT territory, COUNT(*) AS plays, SUM(revenue) AS revenue
FROM streaming_events
WHERE event_date >= today() - 7
GROUP BY territory
ORDER BY plays DESC
LIMIT 50

Database   | Execution time
BigQuery   | 3.4 seconds
Redshift   | 4.1 seconds
ClickHouse | 120 ms

ClickHouse dominates on single-table aggregations. BigQuery and Redshift are better when you need complex joins across many dimension tables, which is the typical pattern for financial reporting.

Decision framework

Choosing the right database is not about which one is "best." It is about which one fits your specific workload, team, and infrastructure.

  • Choose BigQuery: financial reporting, royalty reconciliation, small teams.
  • Choose Redshift: AWS-native orgs with existing data engineering teams.
  • Choose ClickHouse: real-time analytics, high-concurrency dashboards.

Can you combine them?

Yes, and some of the most effective music data architectures do exactly that.

A practical combination: ClickHouse for real-time dashboards (streaming counts, live territory maps, DSP performance) feeding from event streams, plus BigQuery for financial reporting (royalty reconciliation, label P&Ls, catalog valuation) with dbt transformations on a daily batch cycle.

This works well when:

  • Different teams have different latency requirements
  • Real-time event data and financial reporting serve different audiences
  • Your streaming volume justifies a dedicated OLAP engine

It becomes over-engineering when:

  • Your team is small (under 5 people)
  • All queries are batch/daily
  • You do not have dedicated infrastructure engineers

What about Elasticsearch and DynamoDB?

These come up in conversations about music data, but they solve fundamentally different problems.

Database      | Purpose                            | Music use case
Elasticsearch | Full-text search and log analytics | Catalog search (find tracks by title, ISRC, artist), log monitoring, operational dashboards
DynamoDB      | Transactional key-value store      | User accounts, playlist storage, session management, low-latency app backends

Neither is a data warehouse. You would not run royalty reconciliation on Elasticsearch or catalog valuation queries on DynamoDB. Know the difference, and use each tool in its lane.

Our experience at MusicTech Lab

We have built music data systems across this entire spectrum.

Elasticsearch powered our royalty search engine, processing data from 15+ distributors into a unified, searchable catalog. It excels at finding a specific transaction across 200 million records in milliseconds. But it cannot calculate a label P&L.

ClickHouse drives our AI-powered analytics dashboard, where non-technical users type plain English questions and get charts back in seconds. Sub-second aggregation on tens of millions of rows makes the conversational experience feel instant.

BigQuery is the direction we are actively exploring for financial reporting and reconciliation, connecting royalty data to general ledger systems through dbt transformations. We are deep in the learning phase right now, studying dimensional modeling, star schemas, and the dbt ecosystem. The serverless model and native BI integrations make it a compelling foundation for label finance teams.

The lesson: there is no single "best database for music." There is the right database for your specific workload. And sometimes the right answer is to start learning the tool before your next project demands it.

Getting started

If you are a music company evaluating databases for the first time, start here:

  1. Map your workloads. List every report, dashboard, and query your team runs. Categorize each as financial reporting, real-time analytics, or search.
  2. Count your data. How many rows today? What is your annual growth rate? A 10M-row label has very different needs than a 500M-row distributor.
  3. Assess your team. Do you have data engineers? DevOps? Or is your "data team" a finance analyst who knows Excel? This changes the answer dramatically.
  4. Start serverless. Unless you have a clear reason for provisioned infrastructure, BigQuery on-demand or ClickHouse Cloud with auto-pause will get you running faster and cheaper.
  5. Design for the dimensional model first. The database choice matters less than your data model. A well-designed star schema performs well on any of these three platforms. A poorly designed model performs badly on all of them.

The most expensive database decision is not picking the wrong vendor. It is building without a proper data model and having to rebuild everything six months later. Invest the time upfront in your dimensional design.
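To make point 5 concrete, here is a toy star-schema lookup in Python, mirroring the revenue-by-label query from the benchmarks. Table contents and key values are invented for illustration:

```python
# In a star schema, dimension tables map surrogate keys to descriptive
# attributes, while the fact table stores only keys and measures.
dim_label = {1: "Blue Note", 2: "Verve"}  # label_key -> label_name

fact_revenue = [
    {"label_key": 1, "date_key": 20240115, "net_revenue": 1200.0},
    {"label_key": 2, "date_key": 20240116, "net_revenue": 800.0},
    {"label_key": 1, "date_key": 20240220, "net_revenue": 300.0},
]

def revenue_by_label(facts, labels):
    """Equivalent of: SELECT label_name, SUM(net_revenue) ... GROUP BY label_name."""
    totals = {}
    for row in facts:
        name = labels[row["label_key"]]  # the join: resolve key -> attribute
        totals[name] = totals.get(name, 0.0) + row["net_revenue"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(revenue_by_label(fact_revenue, dim_label))
# [('Blue Note', 1500.0), ('Verve', 800.0)]
```

Because the fact table holds only narrow keys and numeric measures, this shape compresses well and aggregates fast on any of the three platforms discussed above.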

Let's Build Something Together

Have a similar project in mind? We'd love to hear about it.

Get in touch to discuss how we can help bring your vision to life.