Implement runtime filter for asset archiving #498

Draft
Copilot wants to merge 3 commits into main from copilot/fix-388

Copilot AI commented Sep 23, 2025

This PR implements runtime filters for asset archiving as requested, allowing users to limit asset archiving by number, file type, and time spent per page.

New Features

1. Maximum Assets Limit (--max-assets)

Limits the number of assets archived per page:

./Zeno get url https://example.com --max-assets 5

2. File Type Filtering

Control which asset types to archive:

# Only archive stylesheets and scripts
./Zeno get url https://example.com --assets-allowed-file-types css,js

# Exclude video files
./Zeno get url https://example.com --assets-disallowed-file-types mp4,avi,mov

3. Time-Based Filtering (--assets-archiving-timeout)

Stop archiving assets after a specified time per page:

./Zeno get url https://example.com --assets-archiving-timeout 30s

Implementation Details

  • Asset filtering is applied in the postprocessor during extraction using file extension matching
  • Timeout handling uses Go context cancellation in the archiver to cleanly stop asset archiving
  • Precedence rules: Allowed file types take precedence over disallowed types when both are specified
  • Default behavior preserved: When no filtering flags are specified, all assets are archived (existing behavior)
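
The filtering rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `filterAssets` implementation: the `AssetFilter` struct, its field names, and the helper functions are hypothetical, and the sketch assumes that the asset limit counts assets that pass the type filter and that URLs which fail to parse are dropped only when a type filter is active.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// AssetFilter is a hypothetical configuration struct; the field names are
// illustrative and not taken from the PR.
type AssetFilter struct {
	MaxAssets  int      // 0 means no limit (assumption)
	Allowed    []string // e.g. {"css", "js"}
	Disallowed []string // e.g. {"mp4", "avi", "mov"}
}

// extOf returns the lowercased extension of the URL path, without the dot.
func extOf(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false
	}
	return strings.ToLower(strings.TrimPrefix(path.Ext(u.Path), ".")), true
}

// contains reports whether list holds s, ignoring case.
func contains(list []string, s string) bool {
	for _, v := range list {
		if strings.EqualFold(v, s) {
			return true
		}
	}
	return false
}

// FilterAssets applies the type filter, then the asset limit.
// When Allowed is non-empty it takes precedence and Disallowed is not consulted.
func FilterAssets(urls []string, f AssetFilter) []string {
	typeFilter := len(f.Allowed) > 0 || len(f.Disallowed) > 0
	kept := make([]string, 0, len(urls))
	for _, raw := range urls {
		if f.MaxAssets > 0 && len(kept) >= f.MaxAssets {
			break
		}
		if typeFilter {
			ext, ok := extOf(raw)
			if !ok {
				continue // assumption: unparsable URLs cannot be matched, so drop them
			}
			if len(f.Allowed) > 0 {
				if !contains(f.Allowed, ext) {
					continue
				}
			} else if contains(f.Disallowed, ext) {
				continue
			}
		}
		kept = append(kept, raw)
	}
	return kept
}

func main() {
	assets := []string{
		"https://example.com/style.css",
		"https://example.com/video.mp4",
		"https://example.com/app.js?v=2",
		"https://example.com/extra.js",
	}
	f := AssetFilter{MaxAssets: 2, Allowed: []string{"css", "js"}}
	for _, a := range FilterAssets(assets, f) {
		fmt.Println(a)
	}
}
```

With an empty `AssetFilter`, every URL is returned unchanged, which mirrors the preserved default behavior.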

Combined Usage

All filters can be combined for fine-grained control:

./Zeno get url https://example.com \
  --max-assets 10 \
  --assets-allowed-file-types css,js \
  --assets-archiving-timeout 1m

Testing

The PR adds tests covering the following:

  • Unit tests for individual filter functionality
  • Integration tests for combined filtering scenarios
  • Edge case handling (nil assets, invalid URLs, configuration conflicts)
  • Validation that existing behavior is preserved

Fixes #388.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: ./Zeno get url REDACTED --max-assets 2 --assets-allowed-file-types css,js --log-level debug --workers 1 (dns block)


Copilot AI and others added 2 commits September 23, 2025 08:44
- Add configuration options for max assets, file type filtering, and timeout
- Implement filterAssets function in postprocessor
- Add command-line flags for asset filtering
- Add comprehensive tests for asset filtering logic
- Modify archiver to support timeout-based asset archiving

Co-authored-by: CorentinB <5089772+CorentinB@users.noreply.github.com>
…documentation

- Add integration tests for combined filtering scenarios
- Add configuration validation tests
- Fix test cases for proper edge case handling
- Add comprehensive documentation with examples
- All features working as designed with default behavior preserved

Co-authored-by: CorentinB <5089772+CorentinB@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement runtime filter for asset archiving" to "Implement runtime filter for asset archiving" Sep 23, 2025
Copilot AI requested a review from CorentinB September 23, 2025 08:50