Commit ebce0b9

ssk42 and claude authored
feat(amazon): Add stealth extraction for improved success rate (#31)
* docs: add Amazon stealth extraction design

Design for improving Amazon price extraction from ~10% to 50-60% success rate using full stealth Playwright techniques:
- Browser identity rotation with fingerprint randomization
- Human-like behavior simulation (mouse, scroll, timing)
- Request strategy with graceful degradation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: add .worktrees to gitignore

Prepare for git worktree usage for feature development.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add Amazon stealth implementation plan

8 TDD tasks covering:
- Browser identity profiles and rotation
- Human-like behavior simulation
- Stealth extraction with playwright-stealth
- Price service integration
- Monitoring and feature flags

Estimated: ~3 hours

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* deps: add playwright-stealth for Amazon extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(amazon): add BrowserIdentity dataclass and profile pool

Add BrowserIdentity dataclass to represent unique browser fingerprints for stealth extraction, and IDENTITY_PROFILES list with 12 realistic browser configurations (Chrome, Safari, Firefox, Edge across Mac, Windows, and Linux).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(amazon): add IdentityManager with Redis persistence

Implements browser identity rotation and burn tracking:
- Rotates identities to avoid detection patterns
- Tracks request counts per identity in Redis
- Burns identities that trigger CAPTCHA for 24 hours
- Persists cookies per identity for session reuse
- Prefers lowest-usage identities for load balancing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(amazon): add human-like behavior simulation functions

Add behaviors.py with functions to simulate human browsing:
- human_delay(): Random delays with variance (returns seconds)
- generate_bezier_points(): Natural mouse movement paths
- human_mouse_move(): Move mouse along bezier curve
- human_scroll(): Human-like page scrolling
- handle_cookie_banner(): Dismiss Amazon cookie dialogs
- interact_like_human(): Combined human simulation sequence
- COOKIE_ACCEPT_SELECTORS: Amazon-specific cookie button selectors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(amazon): add stealth extractor with failure classification

- Add ExtractionResult dataclass for extraction outcomes
- Add AmazonFailureType enum (CAPTCHA, RATE_LIMITED, NO_PRICE_FOUND, NETWORK_ERROR)
- Add classify_failure() function to categorize extraction failures
- Add stealth_fetch_amazon() async function using playwright-stealth v2.0.1
- Add stealth_fetch_amazon_sync() synchronous wrapper
- Integrate with existing BrowserIdentity and IdentityManager
- Persist cookies between requests via identity_manager
- Update package __init__.py with new exports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(amazon): integrate stealth extraction into price service

- Add AMAZON_STEALTH_ENABLED feature flag (default: True)
- Add _get_identity_manager() singleton for lazy initialization
- Modify _fetch_amazon_price to use stealth when enabled
- Rename original logic to _fetch_amazon_price_legacy for fallback
- Mark identity as burned on CAPTCHA detection
- Fall back to legacy extraction when stealth unavailable
- Add 5 integration tests covering all code paths

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(amazon): add stealth metrics logging and config flag

- Add AMAZON_STEALTH_ENABLED environment variable to config.py (defaults to true, can be disabled via env var)
- Add log_stealth_extraction() function to price_metrics.py for monitoring stealth extraction attempts with identity tracking
- Update price_service.py to import feature flag from Config instead of using hardcoded value

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add Amazon stealth extraction documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
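
The commit message names `human_delay()` and `generate_bezier_points()` but this page does not show their bodies. The following is a minimal sketch of what such helpers might look like, not the code from `behaviors.py`: the signatures, default values, the `jitter` parameter, and the injected `rng` are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def human_delay(base: float = 1.0, variance: float = 0.5) -> float:
    """Return a randomized delay in seconds around `base` (hypothetical signature)."""
    # Clamp to a small floor so a large variance can never yield a negative delay.
    return max(0.05, random.uniform(base - variance, base + variance))


def generate_bezier_points(
    start: Point,
    end: Point,
    steps: int = 20,
    jitter: float = 100.0,
    rng: Optional[random.Random] = None,
) -> List[Point]:
    """Return steps + 1 points on a cubic Bezier curve from start to end.

    The two inner control points are offset randomly (assumed approach) so
    that repeated moves between the same endpoints follow different paths.
    """
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    # Control points sit roughly one third and two thirds along the segment.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-jitter, jitter)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-jitter, jitter)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-jitter, jitter)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-jitter, jitter)

    points: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        # Standard cubic Bezier polynomial, evaluated per coordinate.
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        points.append((x, y))
    return points
```

A mouse-move helper could then step through the returned points, calling Playwright's `page.mouse.move(x, y)` for each one and pausing briefly between steps.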
1 parent 034e9ee commit ebce0b9

18 files changed, +3293 -0 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -116,3 +116,6 @@ test_output.txt
 test_run.log
 coverage.xml
 .coverage
+
+# Git worktrees
+.worktrees/

CLAUDE.md

Lines changed: 8 additions & 0 deletions
@@ -152,6 +152,14 @@ The application prevents gift receivers from seeing who claimed/purchased their
 - Production (Heroku): PostgreSQL via `DATABASE_URL` environment variable
 - Automatic postgres:// to postgresql:// URI conversion for Heroku compatibility

+#### Amazon Stealth Extraction
+The application uses stealth Playwright techniques for Amazon price extraction:
+- **Browser Identity Rotation**: 12 realistic browser profiles rotated every 10-20 requests
+- **Human-like Behavior**: Mouse movements, scrolling, natural delays
+- **Identity Burn Tracking**: Identities that trigger CAPTCHA are disabled for 24 hours
+- **Feature Flag**: Controlled via `AMAZON_STEALTH_ENABLED` environment variable
+- **Implementation**: `services/amazon_stealth/` module
+
 ### Testing Architecture
 - **Unit tests** (`tests/unit/`): Use Flask test client with temporary SQLite database
 - **Browser tests** (`tests/browser/`): Playwright-based end-to-end tests with live test server on port 5001
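
The rotation and burn rules added to CLAUDE.md above (rotate every 10-20 requests, disable an identity for 24 hours after a CAPTCHA, prefer the least-used identity) can be sketched roughly as follows. This is an illustrative in-memory stand-in, not the commit's `IdentityManager`, which persists state in Redis. The class name, method names, and injectable `clock` are assumptions.

```python
import random
import time
from typing import Callable, Dict, Iterable, Optional

BURN_SECONDS = 24 * 60 * 60  # 24-hour burn window, per the documentation above


class InMemoryIdentityRotator:
    """Hypothetical in-memory sketch of identity rotation with burn tracking."""

    def __init__(
        self,
        identity_ids: Iterable[str],
        clock: Callable[[], float] = time.time,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._ids = list(identity_ids)
        self._clock = clock
        self._rng = rng or random.Random()
        self._usage: Dict[str, int] = {i: 0 for i in self._ids}
        self._burned_until: Dict[str, float] = {}
        self._current: Optional[str] = None
        self._remaining = 0  # requests left before the next rotation

    def _available(self):
        now = self._clock()
        return [i for i in self._ids if self._burned_until.get(i, 0.0) <= now]

    def get_identity(self) -> Optional[str]:
        """Return the identity to use for the next request, or None if all are burned."""
        available = self._available()
        if not available:
            return None
        if self._current not in available or self._remaining <= 0:
            # Rotate: pick the least-used identity and a fresh 10-20 request budget.
            self._current = min(available, key=lambda i: self._usage[i])
            self._remaining = self._rng.randint(10, 20)
        self._usage[self._current] += 1
        self._remaining -= 1
        return self._current

    def mark_burned(self, identity_id: str) -> None:
        """Disable an identity for 24 hours, e.g. after it triggers a CAPTCHA."""
        self._burned_until[identity_id] = self._clock() + BURN_SECONDS
        if identity_id == self._current:
            self._current = None  # force rotation on the next request
```

Injecting the clock keeps the burn window testable without waiting. A Redis-backed version would presumably replace the dictionaries with keys that expire on their own.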

config.py

Lines changed: 3 additions & 0 deletions
@@ -96,6 +96,9 @@ def get_ratelimit_storage_uri():
 LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')
 LOG_FILE = os.getenv('LOG_FILE', 'wishlist.log')

+# Amazon stealth extraction settings
+AMAZON_STEALTH_ENABLED = os.environ.get('AMAZON_STEALTH_ENABLED', 'true').lower() == 'true'
+
 # Security Headers (configured in app.py)
 SECURITY_HEADERS = {
 'X-Content-Type-Options': 'nosniff',
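
A note on how the `AMAZON_STEALTH_ENABLED` expression in the hunk above behaves: only the string `true` in any letter case enables the flag, so values such as `1` or `yes` disable it. The sketch below wraps the same expression in a hypothetical helper, `stealth_enabled`, that takes the environment as an argument so the parsing can be checked in isolation. The helper is an illustration and not part of the commit.

```python
import os
from typing import Mapping


def stealth_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Mirror the config.py expression against an arbitrary mapping (hypothetical helper)."""
    return environ.get('AMAZON_STEALTH_ENABLED', 'true').lower() == 'true'


# Unset: the default 'true' applies, so stealth extraction is on.
unset = stealth_enabled({})
# Case-insensitive match on 'true'.
upper = stealth_enabled({'AMAZON_STEALTH_ENABLED': 'TRUE'})
# Anything else, including common truthy spellings, turns the flag off.
one = stealth_enabled({'AMAZON_STEALTH_ENABLED': '1'})
off = stealth_enabled({'AMAZON_STEALTH_ENABLED': 'false'})
```

If spellings like `1` or `yes` should also enable the feature, the comparison would need to accept a set of values. The diff as shown does not do this.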
