How We Built tracio.ai's Sub-30ms Pipeline
When we set out to build tracio.ai's device identification engine, we had one non-negotiable requirement: the entire pipeline — from receiving encrypted signals to returning a visitor ID — must complete in under 30 milliseconds at the 95th percentile. This article is a detailed walkthrough of the architecture we built to meet that target.
Pipeline Overview
The identification pipeline has five stages: signal decryption, signal normalization, hash computation, identity resolution, and response serialization. Each stage is optimized independently, and stages that can run in parallel do. The total budget is 30ms, allocated roughly as: decryption 2ms, normalization 3ms, hashing 2ms, identity resolution 20ms, serialization 1ms. The remaining 2ms is buffer.
Signal decryption reverses the client-side encrypted transport. We use Go's crypto packages with hardware acceleration, which complete decryption of a typical 4KB payload in under 1ms. Normalization parses the signal JSON, validates types, and applies platform-specific transformations — for example, normalizing user agent strings to remove version-specific noise.
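The user-agent normalization can be sketched as below. The regex and the `normalizeUserAgent` name are illustrative assumptions, not the production rules — the real pipeline applies a broader set of platform-specific transformations:

```go
package main

import (
	"fmt"
	"regexp"
)

// versionRe matches dotted version numbers such as "124.0.6367.91".
// Illustrative only; the production normalizer applies additional
// platform-specific rules.
var versionRe = regexp.MustCompile(`\d+(\.\d+)+`)

// normalizeUserAgent strips version-specific noise so that routine
// browser updates do not change the normalized signal.
func normalizeUserAgent(ua string) string {
	return versionRe.ReplaceAllString(ua, "X")
}

func main() {
	ua := "Mozilla/5.0 (Windows NT 10.0) Chrome/124.0.6367.91 Safari/537.36"
	fmt.Println(normalizeUserAgent(ua))
	// Mozilla/X (Windows NT X) Chrome/X Safari/X
}
```

Collapsing versions this way keeps the normalized signal stable across routine browser updates, which matters later when comparing software-tier hashes.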
Distributed Identity Resolution
Identity resolution — determining whether this device has been seen before — is the most latency-sensitive stage. We store device profiles in Redis, sharded across a cluster using a distributed key routing layer. The routing distributes keys based on the hardware-tier fingerprint, which ensures that lookups for the same device always hit the same Redis node.
Our sharding layer is a consistent hash ring with virtual nodes (150 per physical node) to ensure even key distribution. When a node is added or removed, only about 1/N of the keys need to be remapped, where N is the number of nodes. We implemented the routing layer in Go with O(log n) lookup time and zero allocations on the lookup path.
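A minimal version of such a ring can be sketched in Go. The `Ring` type, the FNV hash choice, and the node names are assumptions for illustration; the production layer additionally avoids allocations on the lookup path:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const virtualNodes = 150 // 150 virtual nodes per physical node

// Ring is a minimal consistent-hash ring with virtual nodes.
type Ring struct {
	hashes []uint32          // sorted virtual-node hashes
	owner  map[uint32]string // virtual-node hash -> physical node
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(nodes []string) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for v := 0; v < virtualNodes; v++ {
			h := hashKey(fmt.Sprintf("%s#%d", n, v))
			r.hashes = append(r.hashes, h)
			r.owner[h] = n
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// Lookup finds the owning node via binary search: O(log n).
func (r *Ring) Lookup(key string) string {
	h := hashKey(key)
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.hashes[i]]
}

func main() {
	ring := NewRing([]string{"redis-1", "redis-2", "redis-3"})
	fmt.Println(ring.Lookup("hw-fingerprint-abc123"))
}
```

Because the virtual-node hashes are kept in a sorted slice, `sort.Search` gives the O(log n) lookup, and the same key deterministically maps to the same Redis node.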
Redis as the Identity Store
We chose Redis over alternatives (Memcached, ScyllaDB, DynamoDB) because of its consistent sub-millisecond response times and support for complex data structures. Each device profile is stored as a Redis hash with fields for each signal tier's hash, the visitor ID, the last-seen timestamp, and confidence metadata.
The identity resolution query is a single HGETALL call followed by a comparison of the incoming signal hashes against the stored hashes. If the hardware tier matches, we return the existing visitor ID with high confidence. If only the software tier matches, we perform a similarity comparison of the signal-level data to determine whether this is the same device with an updated browser. If nothing matches, we generate a new visitor ID.
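The decision logic after the HGETALL can be sketched as follows. The field names (`hw_hash`, `sw_hash`, `visitor_id`) and the `resolve`/`similar`/`newID` helpers are hypothetical stand-ins for the real schema, similarity comparison, and ID generator:

```go
package main

import "fmt"

// Signals carries the incoming per-tier hashes (names are illustrative).
type Signals struct {
	HardwareHash string
	SoftwareHash string
}

type Resolution struct {
	VisitorID  string
	Confidence string // "high", "medium", or "new"
}

// resolve compares incoming hashes against a stored profile, as
// returned by HGETALL. similar stands in for the signal-level
// similarity comparison; newID stands in for ID generation.
func resolve(stored map[string]string, in Signals, similar func() bool, newID func() string) Resolution {
	switch {
	case in.HardwareHash != "" && stored["hw_hash"] == in.HardwareHash:
		// Hardware tier matches: same device, high confidence.
		return Resolution{stored["visitor_id"], "high"}
	case in.SoftwareHash != "" && stored["sw_hash"] == in.SoftwareHash && similar():
		// Only software tier matches: likely same device, updated browser.
		return Resolution{stored["visitor_id"], "medium"}
	default:
		return Resolution{newID(), "new"}
	}
}

func main() {
	stored := map[string]string{"visitor_id": "v-42", "hw_hash": "h1", "sw_hash": "s1"}
	r := resolve(stored, Signals{HardwareHash: "h1", SoftwareHash: "s2"},
		func() bool { return true }, func() string { return "v-new" })
	fmt.Println(r.VisitorID, r.Confidence) // v-42 high
}
```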
ClickHouse for Event Storage
Every identification event is written to ClickHouse asynchronously. We use a buffered writer that batches inserts — collecting events for 100ms or until 1,000 events accumulate, whichever comes first. This batching is critical because ClickHouse performs best with large inserts (thousands of rows at a time) rather than individual row inserts.
Our ClickHouse schema is optimized for the two most common query patterns: looking up all events for a specific visitor ID, and aggregating events over time periods. We use a MergeTree engine with a primary key of (visitor_id, timestamp), which provides fast point lookups and efficient range scans. Materialized views maintain pre-aggregated daily and hourly metrics.
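A sketch of what such a schema might look like. The table name, the columns beyond `visitor_id` and `timestamp`, and the hourly view are assumptions for illustration, not the actual DDL:

```sql
-- Illustrative sketch; only (visitor_id, timestamp) as the sort key
-- is taken from the article, the rest is assumed.
CREATE TABLE identification_events
(
    visitor_id  String,
    timestamp   DateTime64(3),
    confidence  String,
    signal_hash String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (visitor_id, timestamp);

-- Pre-aggregated hourly counts (a daily view would follow the same shape).
CREATE MATERIALIZED VIEW events_hourly
ENGINE = SummingMergeTree
ORDER BY (hour)
AS SELECT toStartOfHour(timestamp) AS hour, count() AS events
FROM identification_events
GROUP BY hour;
```

In MergeTree, `ORDER BY (visitor_id, timestamp)` doubles as the primary key by default, which is what makes per-visitor point lookups and time-range scans cheap.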
Achieving Sub-30ms at Scale
Three architectural decisions were critical for meeting our latency target. First, the pipeline is fully streaming — we begin processing signals before the entire HTTP request body has been received. Second, Redis lookups use connection pooling with persistent connections, eliminating TCP handshake overhead. Third, ClickHouse writes are fully asynchronous and never block the response path.
Under load testing with 50K requests/second, our p50 latency is 12ms, p95 is 24ms, and p99 is 38ms. The p99 occasionally exceeds our 30ms target during Redis cluster rebalancing, but the p95 remains consistently below 30ms. For customers with stricter latency requirements, we offer dedicated Redis clusters that eliminate multi-tenant contention.