Real-Time Fraud Scoring at Scale
Fraud scoring at scale requires a fundamentally different architecture from batch processing. When a payment is being authorized or an account is being created, you have milliseconds, not minutes, to deliver a risk score. At tracio.ai, we process over 50,000 events per second with a median scoring latency of 22ms. This article explains the architecture that makes this possible.
The Scoring Pipeline
Every incoming event enters a three-stage pipeline: signal enrichment, vector computation, and risk scoring. Signal enrichment attaches device intelligence data — the visitor's fingerprint, bot detection results, IP intelligence, and historical behavior — to the raw event. Vector computation transforms these enriched signals into a fixed-length feature vector optimized for our scoring model. Risk scoring runs the vector through our trained model and returns a score between 0.0 and 1.0.
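The three stages can be sketched as plain function composition. This is a minimal illustration, not our production code: the type names, fields, and the stub model below are all hypothetical.

```go
package main

import "fmt"

// Event is a raw incoming event; the fields are illustrative.
type Event struct {
	VisitorID string
	Kind      string // e.g. "payment", "login"
}

// EnrichedEvent carries the device-intelligence signals attached
// during the enrichment stage.
type EnrichedEvent struct {
	Event
	Fingerprint string
	IsBot       bool
}

// enrich attaches device intelligence to the raw event (stubbed here;
// in production these signals come from the pre-computed profile).
func enrich(e Event) EnrichedEvent {
	return EnrichedEvent{Event: e, Fingerprint: "fp-" + e.VisitorID}
}

// vectorize transforms enriched signals into a fixed-length feature vector.
func vectorize(e EnrichedEvent) []float64 {
	v := make([]float64, 3)
	if e.IsBot {
		v[0] = 1
	}
	// ...remaining features elided...
	return v
}

// score runs the feature vector through the model and returns a risk
// score in [0.0, 1.0]. A trained model replaces this stub.
func score(v []float64) float64 {
	s := 0.0
	for _, f := range v {
		s += f
	}
	if s > 1 {
		s = 1
	}
	return s
}

func main() {
	e := Event{VisitorID: "v-123", Kind: "payment"}
	fmt.Println(score(vectorize(enrich(e))))
}
```

Keeping the stages as separate functions is what makes the next optimization possible: any stage whose inputs are known before the scoring request can be run ahead of time.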
The key design decision is that enrichment and vector computation are separated from scoring. Enrichment data is pre-computed and cached. When a visitor loads a page, we compute their device profile and store it in Redis with a TTL of 60 minutes. When a scoring request arrives — typically triggered by a payment or login — we retrieve the pre-computed profile instead of recomputing it. This reduces scoring latency from 200ms+ to under 30ms.
Stream Processing with Go
Our ingestion layer is written in Go and uses a fan-out architecture. Incoming events arrive via HTTP POST and are immediately placed on an internal channel. A pool of worker goroutines reads from this channel, performs enrichment, and writes the enriched events to ClickHouse for analytics and to a scoring queue for real-time processing. The fan-out pool scales dynamically based on queue depth.
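A stripped-down version of that fan-out looks like the following. The pool size is fixed here for simplicity (the article notes it scales on queue depth in production), and the sinks are reduced to a single channel; everything else is a hypothetical stand-in for the real types.

```go
package main

import (
	"fmt"
	"sync"
)

// Event and EnrichedEvent are simplified stand-ins for the real types.
type Event struct{ ID int }

type EnrichedEvent struct {
	Event
	Fingerprint string
}

func enrich(e Event) EnrichedEvent {
	return EnrichedEvent{Event: e, Fingerprint: fmt.Sprintf("fp-%d", e.ID)}
}

func main() {
	events := make(chan Event, 64)           // fed by the HTTP handler
	enriched := make(chan EnrichedEvent, 64) // drains to ClickHouse + scoring queue

	const workers = 8 // fixed here; scaled dynamically in production
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for e := range events {
				enriched <- enrich(e)
			}
		}()
	}

	// Producer: stands in for the HTTP POST handler.
	go func() {
		for i := 0; i < 100; i++ {
			events <- Event{ID: i}
		}
		close(events)
	}()

	// Close the output once every worker has drained the input.
	go func() {
		wg.Wait()
		close(enriched)
	}()

	count := 0
	for range enriched {
		count++
	}
	fmt.Println(count) // all 100 events enriched
}
```

The buffered channels absorb short bursts, and closing `events` gives every worker a clean shutdown path without extra signaling.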
We chose Go for the ingestion layer because of its excellent concurrency primitives and predictable memory allocation. Each worker goroutine consumes approximately 4KB of stack space, allowing us to run thousands of concurrent workers on a single node. The garbage collector's sub-millisecond pauses are critical for maintaining consistent latency at high throughput.
Edge Caching and Signal Vectors
For our highest-volume customers, we deploy scoring models at the edge using a pre-computed signal vector cache. When a device is first seen, we compute its full signal vector and store it in our edge cache (deployed on Cloudflare Workers KV). Subsequent scoring requests for the same device retrieve the cached vector and run scoring locally at the edge, achieving sub-10ms latency.
The edge scoring model is a distilled version of our full model — smaller and faster, but optimized for the same accuracy targets. We retrain the edge model weekly and deploy updates via rolling deployment to avoid cache invalidation storms. The full model runs server-side for cases where the edge model's confidence is below a configurable threshold.
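The confidence fallback reduces to a small branch. Both model calls below are stubs with made-up return values; only the control flow reflects the design described above.

```go
package main

import "fmt"

// edgeScore returns the distilled model's risk score and its confidence.
// Stubbed values; the real edge model runs on the cached signal vector.
func edgeScore(vector []float64) (score, confidence float64) {
	return 0.2, 0.55
}

// serverScore calls the full server-side model (stubbed here).
func serverScore(vector []float64) float64 {
	return 0.3
}

// scoreWithFallback uses the edge model unless its confidence falls
// below the configurable threshold, then defers to the full model.
func scoreWithFallback(vector []float64, threshold float64) float64 {
	s, conf := edgeScore(vector)
	if conf < threshold {
		return serverScore(vector)
	}
	return s
}

func main() {
	v := []float64{0.1, 0.9}
	fmt.Println(scoreWithFallback(v, 0.5)) // edge model confident enough
	fmt.Println(scoreWithFallback(v, 0.8)) // low confidence: server-side model
}
```

Raising the threshold trades latency for accuracy: more requests leave the edge and take the slower server-side path.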
ClickHouse for Analytics
All enriched events are stored in ClickHouse, our columnar analytics database. ClickHouse's compression and query performance allow us to store billions of events while supporting real-time analytical queries. Our customers use these analytics to understand fraud patterns, tune scoring thresholds, and investigate individual events.
We use materialized views in ClickHouse to maintain pre-aggregated metrics: fraud rate by country, scoring distribution by device type, and false positive rates by threshold. These materialized views update in real time as events arrive, providing dashboard-ready metrics without expensive aggregation queries.
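One of those pre-aggregations might be declared roughly as follows. This is a hedged sketch, not our actual schema: the table name, columns, and the 0.8 flagging threshold are all illustrative.

```sql
-- Hypothetical schema: table and column names are illustrative.
CREATE MATERIALIZED VIEW fraud_rate_by_country
ENGINE = AggregatingMergeTree()
ORDER BY (country, hour)
AS
SELECT
    country,
    toStartOfHour(event_time) AS hour,
    countState() AS events,
    countIfState(risk_score >= 0.8) AS flagged
FROM enriched_events
GROUP BY country, hour;
```

Because the view stores aggregate states, dashboard queries finalize them with the matching `-Merge` combinators (e.g. `countMerge(flagged) / countMerge(events)`), which keeps reads cheap no matter how many raw events back the view.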
Lessons Learned
Building a real-time scoring system taught us several lessons. First, pre-computation is the most important optimization — any work you can do before the scoring request arrives is work that does not count against your latency budget. Second, Go's concurrency model is well-suited to high-throughput event processing, but you must be disciplined about memory allocation to avoid GC pressure. Third, edge deployment is transformative for latency but requires careful model management to avoid stale predictions.