Running ClickHouse in Production: Ingestion, Merges & Cost at 2B Rows
When we started building tracio.ai's analytics tier, we needed a database that could handle our specific workload: ingest 50,000 device identification events per second, store 2+ billion rows, and answer analytical queries in under one second. We evaluated PostgreSQL (too slow for aggregations at this scale), Elasticsearch (too expensive for time-series analytics), and ClickHouse. ClickHouse won decisively.
Why ClickHouse
ClickHouse is a column-oriented OLAP database designed for real-time analytics. Its key advantage for our workload is that it reads only the columns needed for each query. When a fraud analyst asks "show me the fraud rate by country for the last 7 days," ClickHouse reads only the country, timestamp, and risk_score columns — ignoring the other 40+ columns in the event table. On a 2B row table, this reduces I/O by 95%.
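The dashboard query might look like the following sketch; the table and column names (`events`, `risk_score`, and the 0.8 fraud threshold) are illustrative, not the production schema:

```sql
-- Only three of the table's 40+ columns are read; ClickHouse
-- never touches the column files the query does not reference.
SELECT
    country,
    countIf(risk_score >= 0.8) / count() AS fraud_rate  -- threshold is an assumption
FROM events
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY country
ORDER BY fraud_rate DESC;
```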
ClickHouse also compresses data extremely well. Our 2B row events table occupies 340 GB on disk — about 170 bytes per row compressed, versus 1.2 KB per row uncompressed. The compression ratio of 7:1 means more data fits in memory, which directly translates to faster queries.
Schema Design
Our primary table stores one row per identification event:
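A sketch of the schema, showing only the columns discussed in this post (the production table has 40+ columns; the exact types and the monthly partitioning scheme are assumptions):

```sql
CREATE TABLE events
(
    workspace_id    UInt64,
    timestamp       DateTime,
    visitor_hash    UInt64,
    country         LowCardinality(String),
    device_type     LowCardinality(String),
    browser_family  LowCardinality(String),
    os_family       LowCardinality(String),
    risk_score      Float32,
    fraud_detected  UInt8
    -- ... remaining event columns elided
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)  -- partitioning scheme is an assumption
ORDER BY (workspace_id, toDate(timestamp), visitor_hash)
TTL timestamp + INTERVAL 90 DAY;
```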
The table uses the MergeTree engine, ordered by (workspace_id, toDate(timestamp), visitor_hash). This ordering is critical — it means queries filtered by workspace and date range read minimal data. The visitor_hash column enables fast lookups by visitor ID without a secondary index.
We chose LowCardinality(String) for country, device_type, browser_family, and os_family because these columns have fewer than 10,000 distinct values. ClickHouse stores LowCardinality columns as dictionary-encoded integers, reducing storage by 80% compared to plain strings and speeding up GROUP BY operations.
Sharding Strategy
We shard the events table across 6 nodes using a hash of workspace_id. This ensures that all events for a given customer are on the same shard, which means most queries (filtered by workspace_id) hit a single shard. Cross-shard queries are needed only for internal analytics.
Each shard has 2 replicas for high availability. Replication uses ClickHouse's built-in ReplicatedMergeTree engine with ZooKeeper coordination. Failover is automatic — if a node goes down, queries are routed to its replica with no client-side changes.
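In outline, the topology combines a replicated local table on each node with a Distributed table that routes by workspace. The cluster name (`tracio`), database (`default`), and ZooKeeper path are placeholders:

```sql
-- On each node: the replicated local table ({shard} and {replica}
-- are filled in from per-node macros).
CREATE TABLE events_local ON CLUSTER tracio
(
    workspace_id UInt64,
    timestamp    DateTime,
    visitor_hash UInt64
    -- ... remaining columns elided
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (workspace_id, toDate(timestamp), visitor_hash);

-- The Distributed table hashes workspace_id, so one customer's
-- events always land on (and are queried from) a single shard.
CREATE TABLE events ON CLUSTER tracio AS events_local
ENGINE = Distributed(tracio, default, events_local, cityHash64(workspace_id));
```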
Ingestion Pipeline
Events flow from our Kafka topic into ClickHouse through a custom Go service that batches inserts. We insert in batches of 10,000 rows every 500ms — this balances ingestion latency (sub-second) with insert efficiency (ClickHouse performs best with large batches).
The ingestion service handles back-pressure gracefully. If ClickHouse is slow to accept inserts (during merges or heavy query load), the service buffers up to 1 million events in memory and applies back-pressure to the Kafka consumer. In 18 months of production, we have never lost an event.
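The core batching policy can be sketched in Go as follows. This is a minimal illustration of the 10,000-row / 500 ms flush rule, not the production service; the type and function names are invented for this example, and Kafka consumption, retries, and the in-memory back-pressure buffer are omitted:

```go
package main

import (
	"fmt"
	"time"
)

// Batcher accumulates rows and flushes when either the size or the
// time threshold is hit, whichever comes first.
type Batcher struct {
	batchSize  int
	flushEvery time.Duration
	buf        []string
	lastFlush  time.Time
	flush      func(batch []string) // e.g. issues one INSERT to ClickHouse
}

func NewBatcher(size int, every time.Duration, flush func([]string)) *Batcher {
	return &Batcher{batchSize: size, flushEvery: every, lastFlush: time.Now(), flush: flush}
}

// Add appends one event; a full batch or an elapsed interval triggers a flush.
func (b *Batcher) Add(event string) {
	b.buf = append(b.buf, event)
	if len(b.buf) >= b.batchSize || time.Since(b.lastFlush) >= b.flushEvery {
		b.Flush()
	}
}

// Flush sends any pending rows as one batch and resets the timer.
func (b *Batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	b.buf = nil
	b.lastFlush = time.Now()
}

// runDemo feeds 25,000 events through a 10,000-row batcher and
// records the size of every flushed batch.
func runDemo() []int {
	var sizes []int
	b := NewBatcher(10000, 500*time.Millisecond, func(batch []string) {
		sizes = append(sizes, len(batch))
	})
	for i := 0; i < 25000; i++ {
		b.Add("event")
	}
	b.Flush() // drain the tail on shutdown
	return sizes
}

func main() {
	fmt.Println(runDemo()) // [10000 10000 5000]
}
```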
Query Optimization
Materialized Views
For common dashboard queries, we use materialized views that pre-aggregate data. Our fraud rate dashboard, for example, reads from a materialized view that aggregates fraud_detected counts by workspace, country, and hour. The view reduces the data scanned for this query from 2B rows to 5M rows.
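A sketch of such a view; the view name, engine choice, and column names are illustrative:

```sql
-- Rolls events up to (workspace, country, hour) as rows are inserted;
-- the dashboard queries this view instead of the raw 2B-row table.
CREATE MATERIALIZED VIEW fraud_by_country_hourly
ENGINE = SummingMergeTree
ORDER BY (workspace_id, country, hour)
AS SELECT
    workspace_id,
    country,
    toStartOfHour(timestamp) AS hour,
    countIf(fraud_detected)  AS fraud_events,
    count()                  AS total_events
FROM events
GROUP BY workspace_id, country, hour;
```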
Projection Ordering
ClickHouse projections let us define alternative sort orders for a table without maintaining a separate table: each data part stores a re-sorted copy, trading some extra storage for faster lookups. We added a projection ordered by (workspace_id, visitor_hash, timestamp) for visitor timeline queries. Without the projection, these queries scanned entire date ranges. With it, they read only the blocks containing the target visitor.
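The projection is defined in two steps; the projection name is illustrative:

```sql
-- Declare the alternative sort order for all future parts ...
ALTER TABLE events ADD PROJECTION visitor_timeline
(
    SELECT * ORDER BY workspace_id, visitor_hash, timestamp
);

-- ... and rewrite existing parts so historical data is covered too.
ALTER TABLE events MATERIALIZE PROJECTION visitor_timeline;
```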
Approximate Functions
For dashboard queries where exact counts are not critical, we use ClickHouse's approximate functions: uniqCombined for distinct counts (2% error margin, 10x faster than uniqExact) and quantileTDigest for percentile calculations. The fraud analytics dashboard uses approximate functions exclusively, which keeps all dashboard queries under 200ms.
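For example, a daily-rollup query combining both functions might look like this sketch (column names as above):

```sql
SELECT
    toDate(timestamp)                 AS day,
    uniqCombined(visitor_hash)        AS unique_visitors,  -- approximate distinct count
    quantileTDigest(0.95)(risk_score) AS p95_risk          -- approximate percentile
FROM events
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
```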
Performance Numbers
Here are representative query benchmarks on our 2B row production cluster:
Fraud rate by country, last 7 days: 120ms
Visitor timeline (50 events): 8ms
Unique visitors per day, last 30 days: 340ms
Risk score distribution, last 24 hours: 95ms
Top 100 devices by event count, last 30 days: 210ms
These numbers include network round-trip from our application servers to the ClickHouse cluster. Pure query execution time is typically 30-50% lower.
Operational Lessons
Lesson 1: Monitor merge lag
ClickHouse's MergeTree engine continuously merges small data parts into larger ones. If merges fall behind (due to high insert rate or disk I/O contention), query performance degrades because queries must scan more parts. We monitor the parts count per partition and alert when it exceeds 300.
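The check reads from the built-in system.parts table; the 300-part threshold is our alerting level from above:

```sql
-- Active part count per partition; any row returned triggers an alert.
SELECT database, table, partition, count() AS parts
FROM system.parts
WHERE active AND table = 'events'
GROUP BY database, table, partition
HAVING parts > 300;
```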
Lesson 2: Avoid large ALTER TABLE operations
Adding a column to a 2B row table in ClickHouse is instant (it is metadata-only). But changing a column type requires rewriting all data parts — a process that took 6 hours on our cluster. We now treat the schema as append-only: new columns are added freely, but type changes go through a migration table.
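The contrast between the two operations (the new column name is a made-up example):

```sql
-- Instant: metadata-only change, no parts rewritten.
ALTER TABLE events ADD COLUMN ip_risk Float32 DEFAULT 0;

-- Not instant: rewrites every data part (~6 hours at 2B rows).
ALTER TABLE events MODIFY COLUMN risk_score Float64;
```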
Lesson 3: TTL with caution
ClickHouse supports automatic data expiration via TTL. We set a 90-day TTL on our events table. The gotcha: TTL deletion happens during merges, which means deleted data may persist for hours or days after the TTL expires. For compliance-critical deletion, we run explicit ALTER TABLE DELETE queries on a schedule.
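The scheduled deletion is a one-line mutation:

```sql
-- Runs on a schedule; unlike TTL, this mutation rewrites the affected
-- parts promptly instead of waiting for a background merge.
ALTER TABLE events DELETE WHERE timestamp < now() - INTERVAL 90 DAY;
```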
Cost
Our 6-node ClickHouse cluster (each node: 32 vCPU, 128 GB RAM, 2 TB NVMe) costs approximately $8,400/month on bare metal hosting. This stores 2B rows with 90-day retention and handles 50K inserts/second plus 200 concurrent dashboard queries. The cost per event stored is $0.0000042 — orders of magnitude cheaper than comparable analytics on managed cloud databases.