Commit ebce0b9
feat(amazon): Add stealth extraction for improved success rate (#31)
* docs: add Amazon stealth extraction design
Design for raising the Amazon price extraction success rate from ~10%
to 50-60% using full stealth Playwright techniques:
- Browser identity rotation with fingerprint randomization
- Human-like behavior simulation (mouse, scroll, timing)
- Request strategy with graceful degradation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* chore: add .worktrees to gitignore
Prepare for git worktree usage for feature development.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: add Amazon stealth implementation plan
8 TDD tasks covering:
- Browser identity profiles and rotation
- Human-like behavior simulation
- Stealth extraction with playwright-stealth
- Price service integration
- Monitoring and feature flags
Estimated: ~3 hours
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* deps: add playwright-stealth for Amazon extraction
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat(amazon): add BrowserIdentity dataclass and profile pool
Add BrowserIdentity dataclass to represent unique browser fingerprints
for stealth extraction, and IDENTITY_PROFILES list with 12 realistic
browser configurations (Chrome, Safari, Firefox, Edge across Mac,
Windows, and Linux).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
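A minimal sketch of what an identity dataclass and profile pool of this kind could look like. The field names, the profile ids, and the placeholder user-agent strings are assumptions made for illustration; the commit message does not list the actual fields, and only two of the twelve profiles are imitated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BrowserIdentity:
    # All field names below are hypothetical; the real dataclass may differ.
    profile_id: str
    user_agent: str
    viewport_width: int
    viewport_height: int
    locale: str
    timezone: str


# Placeholder values only; a real pool would hold genuine user-agent strings.
IDENTITY_PROFILES: List[BrowserIdentity] = [
    BrowserIdentity("chrome-windows", "placeholder-ua-chrome-windows",
                    1920, 1080, "en-US", "America/New_York"),
    BrowserIdentity("safari-mac", "placeholder-ua-safari-mac",
                    1440, 900, "en-US", "America/Los_Angeles"),
]
```

Freezing the dataclass keeps each profile hashable and immutable, which is convenient if profile ids are later used as keys for usage counters.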
* feat(amazon): add IdentityManager with Redis persistence
Implements browser identity rotation and burn tracking:
- Rotates identities to avoid detection patterns
- Tracks request counts per identity in Redis
- Burns identities that trigger CAPTCHA for 24 hours
- Persists cookies per identity for session reuse
- Prefers lowest-usage identities for load balancing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
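The rotation and burn rules above can be sketched as follows. This is not the commit's IdentityManager: it swaps Redis for in-memory dictionaries so the example runs on its own, and the class name, method names, and injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional

BURN_SECONDS = 24 * 60 * 60  # identities that hit a CAPTCHA rest for 24 hours


class InMemoryIdentityManager:
    """Illustrative stand-in that keeps counters in dicts instead of Redis."""

    def __init__(self, identity_ids: List[str],
                 clock: Callable[[], float] = time.time) -> None:
        self._ids = list(identity_ids)
        self._clock = clock
        self._counts: Dict[str, int] = {i: 0 for i in self._ids}
        self._burned_until: Dict[str, float] = {}

    def is_burned(self, identity_id: str) -> bool:
        return self._burned_until.get(identity_id, 0.0) > self._clock()

    def acquire(self) -> Optional[str]:
        """Return the least-used identity that is not burned, or None."""
        available = [i for i in self._ids if not self.is_burned(i)]
        if not available:
            return None
        chosen = min(available, key=lambda i: self._counts[i])
        self._counts[chosen] += 1
        return chosen

    def burn(self, identity_id: str) -> None:
        """Take an identity out of rotation for BURN_SECONDS."""
        self._burned_until[identity_id] = self._clock() + BURN_SECONDS
```

Passing the clock in as a parameter makes the 24-hour burn window testable without sleeping. A Redis-backed version would presumably express the same idea with key expiry, but that is a guess about the implementation.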
* feat(amazon): add human-like behavior simulation functions
Add behaviors.py with functions to simulate human browsing:
- human_delay(): Random delays with variance (returns seconds)
- generate_bezier_points(): Natural mouse movement paths
- human_mouse_move(): Move mouse along bezier curve
- human_scroll(): Human-like page scrolling
- handle_cookie_banner(): Dismiss Amazon cookie dialogs
- interact_like_human(): Combined human simulation sequence
- COOKIE_ACCEPT_SELECTORS: Amazon-specific cookie button selectors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
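For illustration, two of the helpers named above might look like the sketch below. The function names come from the commit message, but the signatures, default values, and the choice of a cubic curve with randomly offset control points are assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def human_delay(base: float = 1.0, variance: float = 0.5) -> float:
    """Return a delay in seconds, uniform in base +/- variance, floored at 0."""
    return max(0.0, base + random.uniform(-variance, variance))


def generate_bezier_points(start: Point, end: Point,
                           steps: int = 20, jitter: float = 80.0) -> List[Point]:
    """Sample steps + 1 points on a cubic Bezier curve from start to end."""
    (x0, y0), (x3, y3) = start, end
    # Control points sit near the 1/3 and 2/3 marks of the straight line and
    # are nudged randomly so that each generated path bends differently.
    x1 = x0 + (x3 - x0) / 3 + random.uniform(-jitter, jitter)
    y1 = y0 + (y3 - y0) / 3 + random.uniform(-jitter, jitter)
    x2 = x0 + 2 * (x3 - x0) / 3 + random.uniform(-jitter, jitter)
    y2 = y0 + 2 * (y3 - y0) / 3 + random.uniform(-jitter, jitter)
    points: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points
```

A mouse-move helper could then walk these points in order, pausing for a short `human_delay` between steps, so the cursor follows a curved path rather than jumping in a straight line.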
* feat(amazon): add stealth extractor with failure classification
- Add ExtractionResult dataclass for extraction outcomes
- Add AmazonFailureType enum (CAPTCHA, RATE_LIMITED, NO_PRICE_FOUND, NETWORK_ERROR)
- Add classify_failure() function to categorize extraction failures
- Add stealth_fetch_amazon() async function using playwright-stealth v2.0.1
- Add stealth_fetch_amazon_sync() synchronous wrapper
- Integrate with existing BrowserIdentity and IdentityManager
- Persist cookies between requests via identity_manager
- Update package __init__.py with new exports
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
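A hedged sketch of failure classification using the four enum members listed above. The `classify_failure` signature and the detection heuristics (a "captcha" substring, HTTP 429/503 as rate limiting, a missing status code as a network error) are assumptions and may not match the commit's logic.

```python
from enum import Enum
from typing import Optional


class AmazonFailureType(Enum):
    CAPTCHA = "captcha"
    RATE_LIMITED = "rate_limited"
    NO_PRICE_FOUND = "no_price_found"
    NETWORK_ERROR = "network_error"


def classify_failure(status_code: Optional[int], html: str) -> AmazonFailureType:
    """Map a failed fetch to a failure type (illustrative heuristics only)."""
    if status_code is None:
        # Assumption: no HTTP status means the request never completed.
        return AmazonFailureType.NETWORK_ERROR
    if "captcha" in html.lower():
        return AmazonFailureType.CAPTCHA
    if status_code in (429, 503):
        return AmazonFailureType.RATE_LIMITED
    # The page loaded without an obvious block, but no price was extracted.
    return AmazonFailureType.NO_PRICE_FOUND
```

Checking for a CAPTCHA before the status code means a challenge page served with a 503 is still reported as CAPTCHA, which is the case that should burn the identity.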
* feat(amazon): integrate stealth extraction into price service
- Add AMAZON_STEALTH_ENABLED feature flag (default: True)
- Add _get_identity_manager() singleton for lazy initialization
- Modify _fetch_amazon_price to use stealth when enabled
- Rename original logic to _fetch_amazon_price_legacy for fallback
- Mark identity as burned on CAPTCHA detection
- Fall back to legacy extraction when stealth unavailable
- Add 5 integration tests covering all code paths
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
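The control flow described above can be sketched roughly as follows. Everything here is hypothetical scaffolding: `StealthOutcome` stands in for the commit's ExtractionResult, and the fetchers and burn callback are injected as plain callables so the example runs without Playwright or Redis. Whether the real service retries with the legacy path after a failed stealth attempt is not stated, so this sketch does not.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StealthOutcome:
    # Hypothetical result shape; the real ExtractionResult may differ.
    price: Optional[float]
    captcha: bool = False
    identity_id: Optional[str] = None


def fetch_amazon_price(
    url: str,
    *,
    stealth_enabled: bool,
    stealth_fetch: Optional[Callable[[str], StealthOutcome]],
    legacy_fetch: Callable[[str], Optional[float]],
    burn_identity: Callable[[str], None],
) -> Optional[float]:
    # Use the legacy path when the flag is off or stealth is unavailable
    # (modelled here as stealth_fetch being None).
    if not stealth_enabled or stealth_fetch is None:
        return legacy_fetch(url)
    outcome = stealth_fetch(url)
    # A CAPTCHA marks the identity as burned so it leaves the rotation.
    if outcome.captcha and outcome.identity_id is not None:
        burn_identity(outcome.identity_id)
    return outcome.price
```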
* feat(amazon): add stealth metrics logging and config flag
- Add AMAZON_STEALTH_ENABLED environment variable to config.py
(defaults to true, can be disabled via env var)
- Add log_stealth_extraction() function to price_metrics.py for
monitoring stealth extraction attempts with identity tracking
- Update price_service.py to import feature flag from Config
instead of using hardcoded value
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
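One way to read a default-on boolean flag from the environment is sketched below. The helper name `env_flag` and the set of accepted truthy strings are assumptions; only the variable name AMAZON_STEALTH_ENABLED and its default of true come from the commit message.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = True,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a boolean flag, falling back to default when the variable is unset."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Hypothetical usage inside a config module:
AMAZON_STEALTH_ENABLED = env_flag("AMAZON_STEALTH_ENABLED", default=True)
```

Accepting an optional mapping keeps the parser testable without mutating the process environment.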
* docs: add Amazon stealth extraction documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 034e9ee commit ebce0b9
File tree
18 files changed, +3293 -0 lines changed
- docs/plans
- services
- amazon_stealth
- tests/unit