Device Graph Analysis: Connecting the Dots Across Sessions
When a single fraudster operates dozens of accounts, the individual accounts look legitimate in isolation. Each has a unique email, a plausible IP address, and realistic browsing patterns. Traditional rule-based detection checks each account independently and finds nothing suspicious. The connections between accounts — the shared devices, overlapping sessions, common network fingerprints — are invisible to systems that process accounts one at a time.
Why Graphs
Device graph analysis changes the model. Instead of evaluating accounts independently, we build a graph where nodes are devices, accounts, IP addresses, and sessions, and edges represent observed connections: "this device was used to create this account," "this IP was seen with this device," "these two accounts shared a session cookie." The graph reveals structure that flat tables cannot.
A fraud ring that uses 50 accounts across 5 devices and 3 IP addresses forms a distinctive cluster in the graph. The cluster density — many connections within a small group of nodes — is a strong signal. Legitimate users rarely share devices with strangers, and their account-device connections form sparse, tree-like structures rather than dense clusters.
Graph Database Architecture
We use a property graph model with four node types: Device (identified by visitor ID), Account (your user ID), Network (IP address + ASN), and Session (individual identification event). Edges carry metadata: timestamp, confidence score, and event type.
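The node types and edge metadata above can be sketched as plain data structures. This is a minimal illustrative model, not the production schema; any field not named in the text (e.g. the exact key fields) is an assumption:

```python
from dataclasses import dataclass
from enum import Enum

class NodeType(Enum):
    DEVICE = "device"     # keyed by visitor ID
    ACCOUNT = "account"   # keyed by your user ID
    NETWORK = "network"   # IP address + ASN
    SESSION = "session"   # individual identification event

@dataclass(frozen=True)
class Node:
    type: NodeType
    key: str              # e.g. visitor ID, user ID, "ip/asn" pair

@dataclass
class Edge:
    src: Node
    dst: Node
    timestamp: float      # when the connection was observed
    confidence: float     # how certain we are the link is real
    event_type: str       # e.g. "account_created", "session_observed"
```

Keeping edges lightweight (three metadata fields) is what makes the adjacency index practical at billions of edges.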
The graph is stored in a purpose-built adjacency index optimized for 2-hop traversals. When a new identification event arrives, we insert the event as a Session node, connect it to the Device and Network nodes, and check if any linked Account has connections to other devices. This insert-and-query operation completes in under 5ms for graphs with up to 10 million nodes.
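A toy in-memory version of the insert-and-query path might look like the following. This is illustrative only — the production index is a custom sharded store, and the `dev:`/`acct:` key prefixes are an assumed naming convention:

```python
from collections import defaultdict

class DeviceGraph:
    """Minimal in-memory sketch of the adjacency index."""

    def __init__(self):
        self.adj = defaultdict(set)  # node id -> neighbouring node ids

    def link(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def record_session(self, session_id, device_id, account_id, ip):
        # Insert the Session node and wire it to Device, Account, Network.
        self.link(session_id, device_id)
        self.link(device_id, account_id)
        self.link(device_id, ip)

    def accounts_within_two_hops(self, device_id):
        # All Account nodes reachable in at most 2 hops from the device.
        seen = set()
        for n1 in self.adj[device_id]:
            seen.add(n1)
            seen |= self.adj[n1]
        seen.discard(device_id)
        return {n for n in seen if str(n).startswith("acct:")}
```

After `record_session("sess:1", "dev:A", "acct:1", "ip:10.0.0.1")`, a call to `accounts_within_two_hops("dev:A")` returns every account the device has touched — the check that runs on each new identification event.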
We actually tried Neo4j first. It worked great in development with 100K nodes. Then we loaded production data — 500M nodes — and Cypher queries that took 2ms started taking 800ms. David spent a week benchmarking alternatives before we built our own adjacency index backed by sharded RocksDB. Sometimes the boring, custom solution beats the elegant off-the-shelf one.
Clustering Algorithms
We apply two clustering algorithms to the device graph:
Connected Components
The simplest approach: find all nodes reachable from a given device. If Device A is connected to Account 1 and Account 2, and Device B is also connected to Account 2, then Devices A and B are in the same connected component. This identifies all accounts that share any transitive device connection.
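Union-find is a standard way to compute connected components incrementally as edges arrive; this sketch reproduces the Device A / Device B example:

```python
class UnionFind:
    """Union-find over arbitrary hashable node ids
    (path compression + union by size)."""

    def __init__(self):
        self.parent, self.size = {}, {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x], self.size[x] = x, 1
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
```

Because each `union` is near-constant amortized time, component membership stays current as identification events stream in, rather than requiring a batch recomputation.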
Connected components are fast to compute but can produce very large clusters when legitimate shared devices (family computers, library terminals) create bridges between unrelated accounts. We address this with edge weighting — connections through known shared environments get lower weight.
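A minimal sketch of the weighting idea: edges below a confidence threshold (the 0.5 cutoff here is illustrative) are dropped before grouping, so low-weight bridges through shared environments no longer merge unrelated clusters:

```python
from collections import defaultdict, deque

def weighted_components(edges, min_weight=0.5):
    """Group nodes into connected components, ignoring edges whose weight
    falls below min_weight (e.g. links through known shared environments).
    edges: iterable of (u, v, weight) tuples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        if w >= min_weight:
            adj[u].append(v)
            adj[v].append(u)
    assigned, components = {}, []
    for start in adj:
        if start in assigned:
            continue
        queue, members = deque([start]), []
        assigned[start] = len(components)
        while queue:                      # BFS over the pruned graph
            n = queue.popleft()
            members.append(n)
            for nb in adj[n]:
                if nb not in assigned:
                    assigned[nb] = len(components)
                    queue.append(nb)
        components.append(members)
    return components
```

A library terminal linked to two otherwise-unrelated accounts at weight 0.1 leaves them in separate components; the same link at full weight would have merged them.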
Community Detection
For more nuanced analysis, we run Louvain community detection on the weighted graph. This algorithm partitions the graph into communities where intra-community connections are dense and inter-community connections are sparse. Fraud rings form tight communities even when connected to the broader graph through shared infrastructure.
The Louvain algorithm runs in O(n log n) time, making it practical for graphs with millions of nodes. We run it incrementally — when new edges are added, we update the community assignments locally rather than recomputing the entire partition.
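A single-level sketch of Louvain's local-move phase (the production version is incremental and multi-level; this shows only the core modularity-gain move that both share):

```python
from collections import defaultdict

def louvain_local_moves(edges):
    """One level of Louvain local moves over an undirected weighted graph.
    edges: iterable of (u, v, weight). Returns {node: community_id}."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    m = sum(degree.values()) / 2.0          # total edge weight
    comm = {n: n for n in adj}              # each node starts alone
    comm_degree = dict(degree)              # total degree per community
    improved = True
    while improved:
        improved = False
        for n in adj:
            # Edge weight from n into each neighbouring community.
            links = defaultdict(float)
            for nb, w in adj[n].items():
                if nb != n:
                    links[comm[nb]] += w
            old = comm[n]
            comm_degree[old] -= degree[n]   # remove n from its community
            # Modularity gain (up to a constant factor) of joining c.
            best = old
            best_gain = links.get(old, 0.0) - comm_degree[old] * degree[n] / (2 * m)
            for c, w in links.items():
                gain = w - comm_degree[c] * degree[n] / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            comm[n] = best
            comm_degree[best] += degree[n]
            improved |= best != old
    return comm
```

On two 4-node cliques joined by a single bridge edge, the local moves settle into exactly two communities — the "tight community connected through shared infrastructure" shape described above.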
Real-World Pattern: Fraud Ring Detection
A gaming platform integrated our device graph API to detect organized fraud rings. Within the first week, the graph revealed a cluster of 127 accounts connected through 8 devices and 4 IP addresses. The accounts had been created over a 3-month period, each with a unique email and realistic profile. Rule-based detection had flagged zero of them.
The graph structure was the giveaway: 127 accounts sharing 8 devices produces an average of 15.9 accounts per device. Legitimate users average 1.2 accounts per device on this platform. The cluster density was 47x above baseline — an unambiguous fraud signal.
Performance at Scale
Our production device graph handles 2.3 billion nodes and 8.1 billion edges. Insert latency is 2.4ms at p99. Two-hop traversal (find all accounts connected to a device through any path of length 2) completes in 4.1ms at p99. Community detection updates process 50,000 new edges per second.
The graph is sharded by device ID hash across 12 nodes, with each shard holding approximately 190 million nodes. Replication factor of 3 ensures availability. We snapshot the graph hourly for disaster recovery and run full community detection recomputation daily as a consistency check against the incremental updates.
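Shard routing can be as simple as hashing the device ID; the hash function and key layout below are assumptions for illustration, not the production scheme:

```python
import hashlib

NUM_SHARDS = 12  # matches the production topology described above

def shard_for_device(device_id: str) -> int:
    """Route a device to a shard by hashing its ID.
    A cryptographic hash gives an even key distribution, so each of the
    12 shards holds roughly 1/12 of the nodes."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Sharding by device ID keeps a device's adjacency list on one shard, so the common "device → connected accounts" lookup never fans out across the cluster.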
Integration
The device graph is accessible through two interfaces: a real-time query API for individual lookups (is this device connected to other accounts?) and a batch export API for analytics (give me all clusters with more than N accounts). The real-time API is designed for inline fraud decisions — query during account creation to check if the device has seen other accounts. The batch API feeds your data team's investigation workflows.
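The inline decision at account creation might reduce to a check like this once the real-time API responds; the response shape and threshold here are hypothetical, so consult the actual API reference for field names:

```python
def flag_device(graph_response: dict, max_accounts: int = 3) -> bool:
    """Inline fraud decision from a real-time graph lookup.
    graph_response: parsed JSON from the real-time query API
    (the "linked_accounts" field is an assumed shape).
    Flag the signup when the device is already linked to more than
    max_accounts existing accounts."""
    return len(graph_response.get("linked_accounts", [])) > max_accounts
```

Keeping the decision logic this thin means the latency budget is dominated by the graph query itself — which is why the sub-5ms traversal times above matter for inline use.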