feat(cli): Add --filename-template and --max-length options #763

Open
AdamQuadmon wants to merge 1 commit into adbar:master from AdamQuadmon:feature-754-output-filenames

@AdamQuadmon

Introduces two new CLI arguments to allow fine-grained control over how output file paths are generated:

- `--filename-template`: Specify a template string using variables like `{domain}`, `{hash}`, `{ext}` to define a custom directory structure and file naming scheme.

- `--max-length`: Set a maximum character limit for generated file paths, truncating if needed while preserving essential components.
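
As a rough illustration of the behavior described above, the following sketch expands a template and then enforces a length cap. It is not the code from this PR: the helper name `build_output_path`, the choice of a shortened SHA-256 digest for `{hash}`, the default template, and the truncation strategy (trim the end of the stem, keep the extension) are all assumptions. Only the `{domain}`, `{hash}` and `{ext}` variables come from the description.

```python
import hashlib
from urllib.parse import urlparse


def build_output_path(url, template="{domain}/{hash}{ext}", ext=".txt", max_length=None):
    """Hypothetical helper: expand a filename template for a URL."""
    domain = urlparse(url).hostname or "unknown"
    # Assumption: the hash is a shortened SHA-256 digest of the URL
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    path = template.format(domain=domain, hash=digest, ext=ext)

    if max_length is not None and len(path) > max_length:
        # Assumed strategy: keep the extension, trim the end of the stem
        suffix = ext if ext and path.endswith(ext) else ""
        stem = path[: len(path) - len(suffix)]
        keep = max(max_length - len(suffix), 0)
        path = stem[:keep] + suffix
    return path


# The domain comes first in the default template, so it survives truncation
print(build_output_path("https://example.com/a/very/long/article", max_length=20))
```

Trimming from the end of the stem is only one possible reading of "preserving essential components"; the actual implementation may prioritize components differently.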

Includes tests and documentation updates covering the new options, examples, and troubleshooting.

Closes #754
@DesBw

DesBw commented Dec 8, 2024

That will be great. I am looking forward to this feature making it into this incredible tool.

Comment thread tests/filename_tests.py
f"Generated path length {len(output_dir)} exceeds limit of 50: {output_dir}",
)
self.assertTrue(
output_dir.startswith("example.com"),

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization

The string `example.com` may be at an arbitrary position in the sanitized URL.

Comment thread tests/filename_tests.py
f"Generated path length {len(output_dir)} exceeds limit of 40: {output_dir}",
)
self.assertTrue(
output_dir.startswith("example.com"),

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization

The string `example.com` may be at an arbitrary position in the sanitized URL.

Comment thread tests/filename_tests.py

# Basic assertions
self.assertTrue(
output_dir.startswith("example.com"), f"Domain not preserved: {output_dir}"

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization

The string `example.com` may be at an arbitrary position in the sanitized URL.

Comment thread tests/filename_tests.py
# Verify long path handling
self.assertLessEqual(len(output_dir2), 50, "Long path not properly truncated")
self.assertTrue(
output_dir2.startswith("example.com"), "Domain lost in long path truncation"

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization

The string `example.com` may be at an arbitrary position in the sanitized URL.
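
For context, this CodeQL rule is aimed at checks that look for a trusted domain anywhere in a full URL string, where the domain could just as well sit in the path or inside a longer hostname. The sketch below contrasts a substring check with a comparison on the parsed hostname. The helper `is_trusted_host` and the sample URLs are hypothetical and not part of this PR; the flagged assertions appear to inspect a generated output path rather than sanitize a URL.

```python
from urllib.parse import urlparse


def is_trusted_host(url, allowed=("example.com",)):
    """Hypothetical helper: compare the parsed hostname, not the raw string."""
    host = urlparse(url).hostname or ""
    # Accept the domain itself or any of its subdomains
    return host in allowed or any(host.endswith("." + d) for d in allowed)


tricky = "https://evil.test/example.com"

# A substring check matches even though the host is evil.test
naive_result = "example.com" in tricky

# Comparing the parsed hostname rejects the same URL
strict_result = is_trusted_host(tricky)
```
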
@codecov

codecov bot commented Dec 9, 2024

Codecov Report

❌ Patch coverage is 94.11765% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.03%. Comparing base (76200b7) to head (c44c7b5).
⚠️ Report is 16 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| trafilatura/filename.py | 93.24% | 10 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #763      +/-   ##
==========================================
- Coverage   99.26%   99.03%   -0.24%     
==========================================
  Files          21       22       +1     
  Lines        3559     3728     +169     
==========================================
+ Hits         3533     3692     +159     
- Misses         26       36      +10     

@adbar
Owner

adbar commented Dec 9, 2024

@AdamQuadmon Thanks for the substantial PR, it's great that you included tests and documentation. Could you please make sure the tests pass for older Python versions? I don't think the code security warnings are important.

@adbar
Owner

adbar commented Dec 9, 2024

Please also improve test coverage.

@adbar
Owner

adbar commented Jul 14, 2025

@AdamQuadmon Are you still working on the PR?

Development

Successfully merging this pull request may close these issues.

CLI: better control of output file names

4 participants