Detecting Headless Browsers: Playwright, Puppeteer, and Beyond
Headless browsers are the weapon of choice for sophisticated web scraping, credential stuffing, and fraud operations. Unlike simple HTTP clients, headless browsers execute JavaScript, render pages, and support modern web APIs — making them much harder to detect. Our Bot Detection engine uses multiple independent detection methods to identify 15+ automation frameworks with near-zero false positives.
The Evolution of Browser Automation
Browser automation has come a long way from simple curl scripts. Modern tools like Playwright, Puppeteer, and Selenium WebDriver control real browser engines — Chromium, Firefox, or WebKit — in headless mode. They execute JavaScript, process CSS, render canvas elements, and handle WebGL queries just like headed browsers. This makes them invisible to detection methods that simply check for JavaScript execution capability.
The latest generation of tooling goes further. Stealth plugins for Puppeteer and Playwright, most notably puppeteer-extra-plugin-stealth and its playwright-extra counterpart, patch many of the signals that traditional bot detection relies on: they modify navigator properties, override WebGL vendor strings, and fake user interaction events. These anti-detection measures have created an arms race between bot operators and detection systems.
Detection Method 1: WebDriver Flag Analysis
The navigator.webdriver property is set to true when a browser is controlled by automation, and early detection was as simple as checking it. Modern stealth tools, however, delete or override the property. Our detection goes deeper: we check not just the property's value but its property descriptor, its presence in the prototype chain, and whether attempts have been made to redefine it. We also check related signals, such as navigator.plugins length anomalies that often accompany WebDriver overrides.
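A descriptor-level probe of this kind can be sketched as follows. The function takes a navigator-like object and its prototype so the logic is testable outside a browser; in a real page it would run against navigator and Navigator.prototype. The finding labels are illustrative.

```javascript
// Sketch of a webdriver-flag probe over a navigator-like object.
function probeWebdriver(nav, navProto) {
  const findings = [];

  // 1. The raw value: true means automation is admitted outright.
  if (nav.webdriver === true) findings.push('webdriver-true');

  // 2. A genuine browser defines `webdriver` as a getter on the
  //    Navigator prototype, never as an own property of the instance.
  //    Stealth tools that delete or redefine it break this invariant.
  const own = Object.getOwnPropertyDescriptor(nav, 'webdriver');
  if (own !== undefined) findings.push('own-descriptor'); // redefined on the instance

  const proto = Object.getOwnPropertyDescriptor(navProto, 'webdriver');
  if (proto === undefined) findings.push('missing-from-prototype'); // deleted entirely
  else if (typeof proto.get !== 'function') findings.push('not-a-getter');

  return findings; // empty array = no anomaly found by this probe
}
```

In a browser the call would be `probeWebdriver(navigator, Navigator.prototype)`; an untouched Chrome or Firefox returns an empty list, while common stealth patches trip at least one check.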
Detection Method 2: Chrome DevTools Protocol Artifacts
Playwright and Puppeteer control browsers through the Chrome DevTools Protocol (CDP). Even when stealth mode is active, CDP leaves artifacts in the runtime: specific global variables, modified getter functions, and altered property descriptors on the Window and Navigator objects. We probe for these artifacts using techniques that are resilient to simple overwrites.
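One widely documented probe of this family exploits error serialization: when a CDP client has the Runtime domain enabled, errors passed to console methods are serialized for the protocol, which reads the error's stack property; in an uninstrumented browser nothing touches it. The sketch below takes the console object as a parameter so it can be exercised outside a browser; exact behavior varies by browser and CDP client version.

```javascript
// Sketch of a CDP Runtime-domain probe via error serialization.
function probeCdpRuntime(consoleLike) {
  let stackRead = false;
  const bait = new Error('cdp-bait');
  // Replace the stack with a getter that records any access.
  Object.defineProperty(bait, 'stack', {
    get() { stackRead = true; return ''; },
  });
  // Only a listening CDP client serializes the argument (reading .stack).
  consoleLike.debug(bait);
  return stackRead;
}
```

In a page this would be called as `probeCdpRuntime(console)`; a true result suggests a DevTools Protocol client is attached, whether that is an open DevTools panel or an automation framework.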
Detection Method 3: Headless Browser Fingerprinting
Headless Chrome has a different set of capabilities than headed Chrome. It lacks certain browser plugins, has different rendering characteristics for some CSS properties, and reports different values for some MediaQuery results. We maintain a database of known headless browser characteristics and check incoming fingerprints against it.
Key headless indicators include: a missing chrome.runtime object (present in headed Chrome but absent in older headless builds), a zero-length navigator.plugins array, user agent strings containing tokens historically associated with headless mode (such as HeadlessChrome), and differences in how headless Chrome handles iframe security contexts.
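A fingerprint-matching step along these lines can be sketched as a rule set over collected signals. The field names on the fingerprint object (userAgent, pluginsLength, claimsChrome, hasChromeRuntime) are illustrative, standing in for whatever a collector gathers client-side.

```javascript
// Sketch: match a collected fingerprint against known headless traits.
function headlessIndicators(fp) {
  const hits = [];
  // The classic headless UA carries an explicit token.
  if (/HeadlessChrome/.test(fp.userAgent)) hits.push('ua-headless-token');
  // Headed Chrome historically exposes a non-empty plugins list.
  if (fp.pluginsLength === 0) hits.push('empty-plugins');
  // A browser claiming to be Chrome should expose chrome.runtime.
  if (fp.claimsChrome && !fp.hasChromeRuntime) hits.push('missing-chrome-runtime');
  return hits;
}
```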
Detection Method 4: Eval Length Analysis
Each JavaScript engine implements built-in functions differently, and those implementations serialize to different strings. By checking the length of Function.prototype.toString.call(eval) and comparing it against known values for each browser engine, we can detect environment spoofing — for example, a headless Chrome instance that is pretending to be Firefox.
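The check can be sketched as a lookup table. The lengths below are the widely reported values for current engines (33 for V8, 39 for SpiderMonkey, 37 for JavaScriptCore), but they can shift across engine releases, so a production table needs ongoing maintenance.

```javascript
// Widely reported lengths of Function.prototype.toString.call(eval).
const EVAL_LENGTHS = { v8: 33, spidermonkey: 39, javascriptcore: 37 };

// Engine each claimed browser family should be running.
const ENGINE_FOR = {
  chrome: 'v8',
  edge: 'v8',
  firefox: 'spidermonkey',
  safari: 'javascriptcore',
};

// True when the observed eval length contradicts the claimed browser.
function evalLengthMismatch(claimedBrowser, observedLength) {
  const engine = ENGINE_FOR[claimedBrowser];
  if (!engine) return false; // unknown family: no opinion
  return EVAL_LENGTHS[engine] !== observedLength;
}
```

On the client, observedLength would come from `Function.prototype.toString.call(eval).length`; a headless Chrome spoofing a Firefox user agent still reports the V8 value of 33 and trips the check.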
Detection Method 5: TLS Cross-Validation
As discussed in our TLS fingerprinting article, the TLS Client Hello message reveals the actual browser or HTTP library making the connection. When a Playwright script controls Chrome, the TLS fingerprint matches Chrome — this is expected. But when a custom bot uses a Python requests library or Go's net/http, the TLS fingerprint reveals the deception regardless of what user agent string is sent.
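The cross-check can be sketched server-side as a comparison between the browser family implied by the TLS fingerprint and the family claimed in the User-Agent header. The JA3 keys below are placeholders, not real hashes; a deployment would maintain a curated database of fingerprints actually observed from each browser family.

```javascript
// Placeholder mapping from JA3 hash to the browser family that produces it.
const JA3_FAMILY = new Map([
  ['<ja3-hash-seen-from-chrome>', 'chrome'],
  ['<ja3-hash-seen-from-firefox>', 'firefox'],
]);

// Coarse family claimed by the User-Agent header.
function uaFamily(ua) {
  if (/Firefox\//.test(ua)) return 'firefox';
  if (/Chrome\//.test(ua)) return 'chrome';
  return 'unknown';
}

// True when the TLS-layer family contradicts the HTTP-layer claim.
function tlsUaMismatch(ja3, ua) {
  const tls = JA3_FAMILY.get(ja3);
  const claimed = uaFamily(ua);
  return Boolean(tls && claimed !== 'unknown' && tls !== claimed);
}
```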
Detection Method 6: Timing and Behavioral Analysis
Real users exhibit natural variation in their interaction timing. They move the mouse in curves, not straight lines. They pause before clicking. They scroll at variable speeds. Automated tools, even those that simulate human behavior, produce statistically distinguishable patterns — too-consistent timing, perfectly linear mouse paths, and unnatural scroll velocities.
We collect minimal behavioral signals during the fingerprinting process itself — the timing of API calls, the order of signal collection, and the responsiveness of certain browser APIs. These micro-behavioral signals are difficult for automation tools to spoof because they depend on the actual execution environment, not on overrideable properties.
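Two of the statistics described above can be sketched over recorded pointer samples: path linearity (straight-line distance divided by travelled distance) and the coefficient of variation of inter-sample timing. Scripted input tends toward linearity near 1 and a timing CV near 0; the thresholds a classifier would apply on top are a separate, tuned concern.

```javascript
// Toy behavioral statistics over pointer samples [{x, y, t}, ...].
function behaviorStats(samples) {
  let travelled = 0;
  const gaps = [];
  for (let i = 1; i < samples.length; i++) {
    const dx = samples[i].x - samples[i - 1].x;
    const dy = samples[i].y - samples[i - 1].y;
    travelled += Math.hypot(dx, dy);        // total path length
    gaps.push(samples[i].t - samples[i - 1].t); // inter-sample timing
  }
  const first = samples[0];
  const last = samples[samples.length - 1];
  const direct = Math.hypot(last.x - first.x, last.y - first.y);
  const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
  const variance = gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / gaps.length;
  return {
    linearity: travelled === 0 ? 1 : direct / travelled, // 1 = straight line
    timingCv: mean === 0 ? 0 : Math.sqrt(variance) / mean, // 0 = metronomic
  };
}
```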
Detection Method 7: Permission and API Inconsistency
Real browsers have consistent permission states and API availability. A browser that claims to support notifications but has no Notification constructor, or that reports a specific screen resolution but returns different values from window.screen and CSS media queries, is exhibiting inconsistencies that indicate tampering or emulation.
We check dozens of these cross-validation points, looking for contradictions that arise when automation tools selectively override some signals without maintaining consistency across all related APIs.
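A cross-validation pass of this shape can be sketched as a list of contradiction rules over a collected fingerprint. The three rules and the field names are illustrative examples of the dozens of checks described above.

```javascript
// Sketch: each rule returns true when the fingerprint contradicts itself.
const CONSISTENCY_RULES = [
  // Claims notification permission but lacks a Notification constructor.
  fp => fp.permissions.includes('notifications') && !fp.hasNotificationCtor,
  // window.screen and the CSS media query disagree on width.
  fp => fp.screenWidth !== fp.mediaQueryWidth,
  // Touch events exposed on a device reporting zero touch points.
  fp => fp.hasTouchEvents && fp.maxTouchPoints === 0,
];

// Count how many cross-validation points the fingerprint fails.
function contradictions(fp) {
  return CONSISTENCY_RULES.filter(rule => rule(fp)).length;
}
```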
Detection Method 8: VM and Emulation Detection
Many bot operations run inside virtual machines or cloud instances. While this alone is not proof of automation, it is a strong signal when combined with other indicators. We detect VMs through WebGL renderer strings that contain VM-associated keywords (like "llvmpipe" or "SwiftShader"), hardware characteristics that are inconsistent with consumer devices (exactly 2 CPU cores and 2GB memory — common VM defaults), and known cloud provider IP ranges.
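The renderer and hardware heuristics can be sketched as follows; the token list is a small illustrative sample, and the 2-core/2GB shape is deliberately labeled as a weak signal rather than a verdict on its own. (IP-range matching happens server-side and is omitted here.)

```javascript
// Renderer substrings associated with software or virtualized GPUs.
const VM_RENDERER_TOKENS = ['llvmpipe', 'swiftshader', 'virtualbox', 'vmware'];

// Sketch: collect VM-associated signals from a fingerprint.
function vmSignals(fp) {
  const hits = [];
  const renderer = (fp.webglRenderer || '').toLowerCase();
  if (VM_RENDERER_TOKENS.some(t => renderer.includes(t))) {
    hits.push('vm-renderer');
  }
  // Exactly 2 cores and 2 GB is a common default VM shape; weak alone,
  // meaningful only alongside other indicators.
  if (fp.hardwareConcurrency === 2 && fp.deviceMemory === 2) {
    hits.push('vm-default-shape');
  }
  return hits;
}
```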
The Multi-Method Advantage
Each detection method individually has limitations — a sophisticated bot operator might evade any single method. But evading all methods simultaneously, while maintaining cross-validation consistency across all of them, is prohibitively expensive. The cost of developing and maintaining a bot that passes all checks exceeds the economic value of most bot operations.
Near-Zero False Positives
Our detection operates on a whitelist model for search engine bots (Googlebot, Bingbot, etc.) verified through reverse DNS, and a multi-signal model for other traffic. We require multiple corroborating signals before classifying traffic as automated. This conservative approach ensures a false positive rate below 0.1% — verified across billions of production events.
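The multi-signal requirement can be sketched as a simple decision rule: individual probes contribute weighted evidence, and the automated classification needs both a score threshold and at least two independent corroborating signals. The weights and threshold here are illustrative, not production values.

```javascript
// Sketch of a conservative multi-signal classifier.
// signals: [{ name, fired, weight }, ...] from the probes above.
function classify(signals) {
  const fired = signals.filter(s => s.fired);
  const score = fired.reduce((sum, s) => sum + s.weight, 0);
  // Require corroboration: one signal alone, however strong, is not enough.
  return fired.length >= 2 && score >= 1.0 ? 'automated' : 'human';
}
```

Requiring corroboration is what keeps the false positive rate low: an unusual but legitimate browser may trip one check, but rarely several independent ones at once.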