
ETT-316 issues with glacier expiration job#186

Merged: moseshll merged 4 commits into main from ETT-316_glacier_parallelization on Jun 23, 2026



@moseshll (Contributor) commented Jun 16, 2026

  • Add `expire_versions.pl`, invoked as a worker process
  • Add `BackupExpirationBatch.pm`, which holds the core logic run by `expire_versions.pl`
    • No unit tests for this class; it relies on coverage from the existing end-to-end tests
  • `BackupExpiration`
    • Fork/exec worker processes to handle deletion in parallel
    • Set the default number of `BackupExpiration` workers to 8
    • Log some details on worker spawn and despawn
  • Tests
    • Leave the existing test suite as intact as possible
    • Add a test for an unknown storage throwing an exception
    • Add a test with job size 1 so that multiple workers are involved
    • Fix brittle tests with potential collisions from `old_random_timestamp` and `new_random_timestamp`
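For readers less familiar with the pattern, here is a minimal sketch of a bounded fork/exec worker pool along the lines described above. It is not the code in `BackupExpiration.pm`: `run_workers`, its arguments, and the log messages are hypothetical, and each job is simply an argv list that the child execs. The default of 8 mirrors the PR's default worker count.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical sketch: run each command as its own worker process via
# fork() + exec(), keeping at most $max_workers running at once.
sub run_workers {
    my ($commands, $max_workers, $log) = @_;
    $max_workers //= 8;                       # the PR's default worker count
    $log //= sub { print STDERR "$_[0]\n" };
    my @queue = (0 .. $#$commands);           # indexes of jobs still to start
    my %running;                              # pid => job index
    my %status;                               # job index => exit status

    while (@queue or %running) {
        # Spawn until the pool is full or there is nothing left to start.
        while (@queue and scalar(keys %running) < $max_workers) {
            my $i   = shift @queue;
            my @cmd = @{ $commands->[$i] };
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: replace this process image with the worker program.
                exec { $cmd[0] } @cmd;
                warn "exec $cmd[0] failed: $!";
                POSIX::_exit(127);            # skip the parent's END blocks/destructors
            }
            $running{$pid} = $i;
            $log->("spawned worker $pid for job $i");
        }
        # Block until any worker exits, then record how it finished.
        my $pid = waitpid(-1, 0);
        last if $pid == -1;                   # no children left
        next unless exists $running{$pid};    # not one of ours
        my $i = delete $running{$pid};
        $status{$i} = $? >> 8;
        $log->("worker $pid for job $i exited with status $status{$i}");
    }
    return \%status;
}
```

In the PR's setting each command would presumably run `expire_versions.pl` on one batch; the actual arguments aren't shown here. The block form of `exec` avoids going through a shell, and `POSIX::_exit` keeps a child whose `exec` failed from running the parent's `END` blocks or flushing output buffers it copied from the parent.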

Comment thread t/backup_expiration.t
my $exp = HTFeed::BackupExpiration->new(storage_name => $vars{storage_name}, dry_run => 0);
my $exp = HTFeed::BackupExpiration->new(
    storage_name => $vars{storage_name},
    storage_config => $vars{storage_config},

@moseshll (Contributor, Author):

The only change here is to get the storage config into the class under test, so that it can pass it along (via YAML) to the worker processes that actually need it. We can't rely on patching the global config as we used to, back when the work was done in (or close to) the current process.

@moseshll (Contributor, Author):

The mere presence of `storage_config` passed to `new` is what prompts `BackupExpiration` to write the custom config to a file. It would probably be possible to pass a flag instead that says, in effect, "this is custom config that a spawned process won't be able to discover; please write it to a YAML file." But this is the mechanism that emerged, and it seems to work.
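A minimal sketch of the config handoff described above, assuming the parent writes the storage config to a YAML file in a temp directory and gives each worker the path. `write_storage_config`, `read_storage_config`, and the config keys are hypothetical, and `CPAN::Meta::YAML` (the copy of YAML::Tiny that ships with core Perl) stands in for whichever YAML module HTFeed actually uses.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use CPAN::Meta::YAML;

# Parent side (hypothetical): serialize the in-memory storage config so a
# freshly exec'd worker, which cannot see the parent's memory, can load it.
sub write_storage_config {
    my ($config, $dir) = @_;
    my $path = File::Spec->catfile($dir, 'storage_config.yml');
    my $yaml = CPAN::Meta::YAML->new($config)->write_string;
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} $yaml;
    close $fh or die "cannot close $path: $!";
    return $path;
}

# Worker side (hypothetical): read the config back from the given path.
sub read_storage_config {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot read $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return CPAN::Meta::YAML->read_string($text)->[0];   # first YAML document
}
```

The worker would receive the path on its command line (or similar) and load the config before doing any deletions; the parent's temp directory just has to outlive the workers.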

moseshll added 2 commits June 18, 2026 09:20
… be already there, maybe as a subdependency..
- `BackupExpiration`
  - Set default number of `BackupExpiration` workers to 8
  - Allow spawning a worker while iterating versions (inner loop) instead of waiting until the end
    - allows testing with many versions of a single object
  - Log some details on worker spawn and despawn
- Tests
  - Add test for unknown storage throwing exception
  - Add test with job size 1 so multiple workers are involved
  - Fix brittle tests with potential collisions from `old_random_timestamp` and `new_random_timestamp`
    - Now choose one random timestamp and increment or decrement it on each loop iteration
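The inner-loop change can be sketched as follows: a batch is handed off as soon as it reaches the job size, even partway through one object's versions, which is why a single object with many versions can now exercise several workers. `batch_versions`, the object structure, and the tab-separated line format are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sketch: walk objects and their expired versions, handing off a
# batch whenever it reaches $job_size, even in the middle of one object.
sub batch_versions {
    my ($objects, $job_size, $on_batch) = @_;
    my @batch;
    for my $obj (@$objects) {
        for my $version (@{ $obj->{versions} }) {
            push @batch, "$obj->{id}\t$version";
            if (@batch >= $job_size) {
                $on_batch->([@batch]);   # e.g. write a batch file, spawn a worker
                @batch = ();
            }
        }
    }
    $on_batch->([@batch]) if @batch;     # flush whatever is left at the end
}
```

With a job size of 2, an object with three versions is split across two batches rather than forcing one worker to wait for the whole object.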
@moseshll marked this pull request as ready for review June 18, 2026 13:42
@moseshll requested a review from aelkiss June 18, 2026 13:42

@aelkiss (Member) left a comment

Overall my sense is:

  • writing out batches of IDs seems fine
  • expire_versions.pl seems fine
  • I think it's worth considering whether the benefits of the fork/exec logic justify its complexity compared with other approaches
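One of the other approaches is a plain fork() without exec(), where the child is a copy of the parent and therefore already sees in-memory config, including anything a test has patched. A minimal sketch, with a hypothetical `%CONFIG` and `run_in_forked_child`:

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical process-wide config; a test might patch this in memory.
our %CONFIG = (storage => 'default');

# Run $code in a forked child (no exec) and return what it produces.
# The child is a copy of the parent, so it sees the parent's current %CONFIG.
sub run_in_forked_child {
    my ($code) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        print {$writer} $code->();
        close $writer;       # flushes the result to the parent
        POSIX::_exit(0);     # skip END blocks and buffers copied from the parent
    }
    close $writer;
    my $output = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $output;
}
```

The usual trade-off (a general consideration, not something stated in this PR) is that a forked child also inherits state that should not be shared, such as open database connections; DBI's `AutoInactiveDestroy` attribute exists for exactly that case. fork() plus exec() starts each worker clean, at the cost of passing config explicitly.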

AND version < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 180 DAY),"%Y%m%d%H%i%S")
ORDER BY version DESC
SQL
# Write storage config to the temp directory for child processes to get at it.

@aelkiss (Member):

I think I see why this is needed for injecting config into the child process if we are doing fork() and exec(), rather than just fork() and retaining the parent's config. It may be possible to set an env var (HTFEED_CONFIG) pointing to a custom config and write that config out in the test? But if we need to do this, I think it's OK.
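A sketch of the env-var suggestion, assuming (as the comment suggests, but this PR does not confirm) that HTFeed loads its config from the path in `HTFEED_CONFIG`. Environment variables survive exec(), so the child sees the override without any file-path plumbing in `BackupExpiration` itself. `run_with_config_env` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: run a worker command with HTFEED_CONFIG pointing at a
# custom config file. %ENV is inherited across fork() and exec().
sub run_with_config_env {
    my ($config_path, @cmd) = @_;
    local $ENV{HTFEED_CONFIG} = $config_path;   # undone when the sub returns
    # List-form pipe open forks and execs @cmd without a shell.
    open(my $from_child, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my $output = do { local $/; <$from_child> };
    close $from_child;                          # waits for the child; sets $?
    return ($output, $? >> 8);
}
```

Because of `local`, the parent's environment is restored when the sub returns, so the override does not leak into later tests.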

Comment thread lib/HTFeed/BackupExpiration.pm
@moseshll merged commit 02f7989 into main Jun 23, 2026
1 check passed
@moseshll deleted the ETT-316_glacier_parallelization branch June 23, 2026 18:24
2 participants