
Exclude captcha, analytics, and tracking requests from networkidle #194

Open
Hackerbone wants to merge 1 commit into Kaliiiiiiiiii-Vinyzu:main from bugbasesecurity:networkidle-exclusion-list


@Hackerbone

Summary

Playwright's networkidle fires only after 500ms with zero inflight requests. Captcha providers, analytics SDKs, fraud detection scripts, and session heartbeat endpoints poll continuously, preventing networkidle from ever firing on real-world sites.

This adds URL-based filtering to _inflightRequestStarted and _inflightRequestFinished in FrameManager, following the existing _isFavicon exclusion pattern. Requests matching known polling domains or generic heartbeat paths are never added to the inflight set, so they don't delay the idle timer.
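To make the hook placement concrete, here is a minimal TypeScript model of the two FrameManager hooks for a single frame. It is a sketch, not Playwright's code: `FakeRequest`, `InflightTracker`, and the `idleTimerArmed` flag are hypothetical stand-ins for `network.Request`, the frame's `_inflightRequests` set, and `_startNetworkIdleTimer()`/`_stopNetworkIdleTimer()`.

```typescript
// Hypothetical, simplified stand-in for Playwright's network.Request.
interface FakeRequest {
  url(): string;
  isFavicon: boolean;
}

// Models the two FrameManager hooks for one frame (not Playwright's code).
class InflightTracker {
  readonly inflight = new Set<FakeRequest>();
  idleTimerArmed = true; // armed while nothing tracked is inflight
  private readonly isExcluded: (url: string) => boolean;

  constructor(isExcluded: (url: string) => boolean) {
    this.isExcluded = isExcluded;
  }

  requestStarted(req: FakeRequest): void {
    // Same early-return shape as the _isFavicon check, plus the URL filter.
    if (req.isFavicon || this.isExcluded(req.url()))
      return;
    this.inflight.add(req);
    if (this.inflight.size === 1)
      this.idleTimerArmed = false; // stands in for _stopNetworkIdleTimer()
  }

  requestFinished(req: FakeRequest): void {
    if (req.isFavicon || this.isExcluded(req.url()))
      return;
    if (!this.inflight.delete(req))
      return; // never tracked, nothing to do
    if (this.inflight.size === 0)
      this.idleTimerArmed = true; // stands in for _startNetworkIdleTimer()
  }
}

const makeRequest = (u: string): FakeRequest => ({ url: () => u, isFavicon: false });
```

In this model the `delete` return value already ignores requests that were never tracked, so the exclusion check in `requestFinished` mainly keeps the two hooks symmetric.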

Approach

  • Uses ts-morph AST patching (project convention): inserts the check right after the _isFavicon early return in both methods
  • The compiled output is indistinguishable from normal Playwright code (no markers, no post-compilation patching)
  • Follows the exact same pattern as the existing _isFavicon exclusion

Excluded patterns

Category | Domains / paths
--- | ---
Captcha | challenges.cloudflare.com, google.com/recaptcha, www.gstatic.com/recaptcha, hcaptcha.com, api.funcaptcha.com, client-api.arkoselabs.com
Analytics | google-analytics.com, googletagmanager.com, analytics.google.com
Session recording | hotjar.com, fullstory.com, logrocket.com, mouseflow.com, clarity.ms
Telemetry | browser-intake-datadoghq.com, sentry.io, newrelic.com, nr-data.net
Fraud detection | forter.com
Generic | /heartbeat, /keepalive, /keep-alive, /beacon
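The generated code matches these entries with `String.prototype.includes` on the full URL. The sketch below, assuming a WHATWG `URL` global (Node or a browser), contrasts that with a stricter alternative that compares the parsed hostname and whole path segments. `naiveMatch`, `strictMatch`, and `parseUrl` are hypothetical helpers, and `strictMatch` is not what this PR ships.

```typescript
// Entries copied from the exclusion table above.
const EXCLUDED_HOSTS: string[] = [
  "challenges.cloudflare.com", "google.com/recaptcha", "www.gstatic.com/recaptcha",
  "hcaptcha.com", "api.funcaptcha.com", "client-api.arkoselabs.com",
  "google-analytics.com", "googletagmanager.com", "analytics.google.com",
  "hotjar.com", "fullstory.com", "logrocket.com", "mouseflow.com", "clarity.ms",
  "browser-intake-datadoghq.com", "sentry.io", "newrelic.com", "nr-data.net",
  "forter.com",
];
const EXCLUDED_PATH_SEGMENTS: string[] = ["heartbeat", "keepalive", "keep-alive", "beacon"];

// Mirrors the generated code: raw substring checks on the whole URL.
function naiveMatch(url: string): boolean {
  const patterns = [...EXCLUDED_HOSTS, ...EXCLUDED_PATH_SEGMENTS.map((s) => "/" + s)];
  return patterns.some((p) => url.includes(p));
}

function parseUrl(url: string): URL | null {
  try {
    return new URL(url);
  } catch {
    return null;
  }
}

// Alternative: "host[/pathPrefix]" entries must match the hostname (or a
// subdomain of it) plus a path prefix; generic entries must be a whole segment.
function strictMatch(url: string): boolean {
  const parsed = parseUrl(url);
  if (parsed === null)
    return false;
  const { hostname, pathname } = parsed;
  const hostHit = EXCLUDED_HOSTS.some((entry) => {
    const slash = entry.indexOf("/");
    const host = slash === -1 ? entry : entry.slice(0, slash);
    const prefix = slash === -1 ? "/" : entry.slice(slash);
    return (hostname === host || hostname.endsWith("." + host)) && pathname.startsWith(prefix);
  });
  const segments = pathname.split("/");
  return hostHit || EXCLUDED_PATH_SEGMENTS.some((s) => segments.includes(s));
}

console.log(naiveMatch("https://notsentry.io/app.js"), strictMatch("https://notsentry.io/app.js")); // true false
```

Substring matching also fires on URLs that merely mention a listed domain in a query string, on lookalike hosts such as `notsentry.io`, and on paths like `/beacon-theme.css`; the host-suffix and segment checks avoid these at the cost of one `URL` parse per request.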

How it works

The generated code in frames.ts after patching:

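private _inflightRequestStarted(request: network.Request) {
  const frame = request.frame()!;
  // Existing Playwright exclusion: favicon requests never block networkidle.
  if (request._isFavicon)
    return;
  // Inserted by the patch: skip known polling endpoints (list abbreviated here).
  const _reqUrl = request.url();
  if (["challenges.cloudflare.com", "google.com/recaptcha", /* ... */].some(p => _reqUrl.includes(p)))
    return;
  frame._inflightRequests.add(request);
  // The first tracked request cancels the pending 500ms idle timer.
  if (frame._inflightRequests.size === 1)
    frame._stopNetworkIdleTimer();
}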

Context

This was previously submitted as patchright-python#111, which patched the compiled JS after extraction. Per feedback from @Vinyzu, this version:

  • Lives in the driver repo where it belongs
  • Uses ts-morph AST patching (project convention)
  • Compiles naturally with no detectable artifacts

…lculations

Playwright's networkidle waits for 500ms of zero inflight requests. Captcha
providers, analytics SDKs, and session heartbeat endpoints poll continuously,
preventing networkidle from ever firing on real-world sites.

This patch adds URL-based filtering to _inflightRequestStarted and
_inflightRequestFinished in FrameManager, following the existing _isFavicon
exclusion pattern. Matching requests are never added to the inflight set,
so they don't delay the 500ms idle timer.

Excluded patterns:
- Captcha: Cloudflare Turnstile, reCAPTCHA, hCaptcha, Arkose Labs
- Analytics: Google Analytics, GTM
- Session recording: Hotjar, FullStory, LogRocket, Mouseflow, Clarity
- Telemetry: Datadog, Sentry, New Relic
- Fraud detection: Forter
- Generic: /heartbeat, /keepalive, /keep-alive, /beacon
@Vinyzu
Contributor

Vinyzu commented Apr 14, 2026

Can you explain why you think Playwright is waiting 500ms before serving such routes? All I see is that they're getting aborted.
Also, I think the system should be more robust and probably handle all requests like this. Why do you think only these certain URLs should be handled?

@Vinyzu Vinyzu self-assigned this Apr 14, 2026
@Vinyzu Vinyzu added question Further information is requested fp-hard fingerprinting specific to patchright/automated browsers, but no hard detection labels Apr 14, 2026
@Hackerbone
Author

Hi @Vinyzu, to clarify the mechanism and address your point:

Why 500ms

The issue is not about serving routes. It is the networkidle lifecycle event. _inflightRequestStarted tracks inflight requests, and _startNetworkIdleTimer fires a 500ms setTimeout only when the inflight set reaches 0. Captcha/analytics endpoints poll every 100-500ms, so the count never stays at 0 long enough for the timer to complete.
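The timing argument can be checked with a small discrete-event model. This is a toy, not Playwright's implementation: it assumes the idle timer is armed at t = 0, restarts whenever the tracked count drops to zero, and fires after 500 ms of uninterrupted zero. The page, its URLs (including the `/poll` path), and the 300 ms polling interval are made up for illustration.

```typescript
// Idle window described in the thread: 500 ms with zero tracked inflight requests.
const IDLE_MS = 500;

interface SimRequest {
  url: string;
  start: number; // ms
  end: number; // ms
}

type SimEvent = { t: number; kind: "start" | "finish" };

// Returns the virtual time at which networkidle would fire, or null if it
// never fires before `horizon`. Excluded requests never enter the count.
function networkIdleAt(requests: SimRequest[], horizon: number, isExcluded: (url: string) => boolean): number | null {
  const events: SimEvent[] = [];
  for (const r of requests) {
    if (isExcluded(r.url))
      continue;
    events.push({ t: r.start, kind: "start" }, { t: r.end, kind: "finish" });
  }
  // Chronological order; at equal timestamps, finishes are processed first.
  const rank = (e: SimEvent) => (e.kind === "finish" ? 0 : 1);
  events.sort((a, b) => a.t - b.t || rank(a) - rank(b));

  let inflight = 0;
  let idleTimerStartedAt: number | null = 0; // nothing inflight at t = 0
  for (const ev of events) {
    // If the timer ran a full IDLE_MS before this event, idle already fired.
    if (idleTimerStartedAt !== null && ev.t >= idleTimerStartedAt + IDLE_MS)
      return idleTimerStartedAt + IDLE_MS;
    if (ev.kind === "start") {
      inflight++;
      if (inflight === 1)
        idleTimerStartedAt = null; // like _stopNetworkIdleTimer()
    } else {
      inflight--;
      if (inflight === 0)
        idleTimerStartedAt = ev.t; // like _startNetworkIdleTimer()
    }
  }
  if (idleTimerStartedAt !== null && idleTimerStartedAt + IDLE_MS <= horizon)
    return idleTimerStartedAt + IDLE_MS;
  return null;
}

// Hypothetical page: one 100 ms document request, then a captcha endpoint
// polled every 300 ms (50 ms per request) for the whole 5 s window.
function pollingPage(): SimRequest[] {
  const reqs: SimRequest[] = [{ url: "https://example.com/", start: 0, end: 100 }];
  for (let t = 200; t < 5000; t += 300)
    reqs.push({ url: "https://challenges.cloudflare.com/poll", start: t, end: t + 50 });
  return reqs;
}

console.log(networkIdleAt(pollingPage(), 5000, () => false)); // null
console.log(networkIdleAt(pollingPage(), 5000, (u) => u.includes("challenges.cloudflare.com"))); // 600
```

Each poll leaves only 250 ms of zero inflight before the next one starts, so the first call never reaches the 500 ms window; filtering the polling host lets idle fire 500 ms after the document request finishes, at t = 600.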

We're following the _inflightRequestStarted/_inflightRequestFinished exclusion pattern (same as _isFavicon - see here), not the abort pattern. These requests should still succeed (captcha needs to solve, analytics need to fire), they just should not block networkidle.

On making it more robust

Agreed that a hardcoded URL list is not ideal long-term. My current approach covers two categories:

  1. Known polling domains - captcha providers, analytics, telemetry (the most common offenders in practice)
  2. Generic path patterns - /heartbeat, /keepalive, /beacon, /ping etc. to catch first-party polling endpoints regardless of domain

But this is not exhaustive. Some ideas for a more robust system:

  • Resource-type based - exclude ping and beacon resource types from inflight tracking
  • Configurable hook - expose something like networkIdleExcludePatterns on BrowserContext so users can add their own exclusions
  • Combination - ship sensible defaults (the domain/path list) + allow user overrides
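A sketch of the combined option in TypeScript: `networkIdleExcludePatterns` is the name proposed above, while `useDefaultExclusions`, `IdleRequest`, and `makeIdleExclusion` are hypothetical and not existing Playwright or patchright APIs. It also assumes that Chromium reports beacon/ping traffic with the resource type `"ping"`, which should be verified before relying on it.

```typescript
// Hypothetical options; `networkIdleExcludePatterns` is only a proposed name.
interface NetworkIdleOptions {
  networkIdleExcludePatterns?: (string | RegExp)[];
  useDefaultExclusions?: boolean;
}

interface IdleRequest {
  url: string;
  resourceType: string;
}

// Small excerpt of the PR's defaults, plus one regex for the generic paths.
const DEFAULT_EXCLUSIONS: (string | RegExp)[] = [
  "challenges.cloudflare.com",
  "hcaptcha.com",
  /\/(heartbeat|keep-?alive|beacon)(\/|\?|$)/,
];

// Assumption: beacon/ping traffic is reported with resourceType "ping".
const UNTRACKED_RESOURCE_TYPES = new Set(["ping"]);

// Builds a predicate that returns true when a request should stay out of
// the networkidle inflight set.
function makeIdleExclusion(opts: NetworkIdleOptions = {}): (req: IdleRequest) => boolean {
  const patterns: (string | RegExp)[] = [
    ...(opts.useDefaultExclusions === false ? [] : DEFAULT_EXCLUSIONS),
    ...(opts.networkIdleExcludePatterns ?? []),
  ];
  return (req) =>
    UNTRACKED_RESOURCE_TYPES.has(req.resourceType) ||
    patterns.some((p) => (typeof p === "string" ? req.url.includes(p) : p.test(req.url)));
}
```

Keeping the decision in one predicate means both inflight hooks need only a single call, and user patterns extend the defaults rather than replace them unless `useDefaultExclusions` is set to `false`.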

What approach would you prefer? Happy to rework. Would love to hear if you had something else in mind when you said "handle all requests like this."

@Vinyzu
Contributor

Vinyzu commented Apr 16, 2026

@Hackerbone I don't have much of a problem with you using agentic coding, as long as the code quality is met. But I don't like you answering my questions by just copying the answer of an LLM. I can prompt an LLM myself, you know?
Please inform yourself about the topic (possibly with the help of AI) and make sure you understand the topic yourself, then answer my questions yourself.
If you do not answer my questions yourself, I will close this PR.

That said, your linked source says that this might only be an issue in Firefox. Could you please check whether this is even a problem in Chromium, or whether we can patch out the timeout entirely?

@Hackerbone
Author

Hey @Vinyzu, I use LLMs to structure and better present my thinking and to avoid grammatical mistakes. I also use agentic tooling to code the solution.

We use patchright while building our own tooling, and this is something we noticed happening. We only use Chromium in our testing, and that is where we noticed it.

I will share a comparison with and without the patch so that we can discuss this further and improve the solution.

PS: this response is not AI generated at all



2 participants