ETT-316 issues with glacier expiration job #186
- `BackupExpiration.pm` forks/execs worker processes to handle deletion in parallel.
- Existing test suite left as intact as possible.
| my $exp = HTFeed::BackupExpiration->new(storage_name => $vars{storage_name}, dry_run => 0); | ||
| my $exp = HTFeed::BackupExpiration->new( | ||
| storage_name => $vars{storage_name}, | ||
| storage_config => $vars{storage_config}, |
The only change here is to get the storage config into the class under test, so it can pass it along (via YAML) to the worker processes that actually need it. We can no longer rely on patching the global config, as we did when the work was done in (or close to) the current process.
The mere presence of storage_config passed to new is what prompts BackupExpiration to write the custom config to a file. I think it would be possible to just pass a flag that says in effect "this is custom config that won't be discoverable by a spawned process -- please write your config to a YAML file." But this is the mechanism that emerged, and it seems to work.
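A minimal sketch of the "presence of `storage_config` triggers a file write" mechanism described above. The class name `Sketch::Expiration`, the `config_file` accessor, and the file name are hypothetical. The PR serializes to YAML; this sketch swaps in core `JSON::PP` only so that it runs without extra modules.

```perl
use strict;
use warnings;
use File::Temp ();
use File::Spec;
use JSON::PP;

package Sketch::Expiration;    # hypothetical stand-in, not HTFeed::BackupExpiration

sub new {
    my ($class, %args) = @_;
    my $self = bless { storage_name => $args{storage_name} }, $class;

    # Only a caller-supplied config is written out; without it, a spawned
    # worker is assumed to find the normal config on its own.
    if (defined $args{storage_config}) {
        my $dir  = File::Temp::tempdir(CLEANUP => 1);
        my $file = File::Spec->catfile($dir, 'storage_config.json');
        open(my $fh, '>', $file) or die "cannot write $file: $!";
        print {$fh} JSON::PP->new->canonical->encode($args{storage_config});
        close($fh) or die "cannot close $file: $!";
        $self->{config_file} = $file;
    }
    return $self;
}

# Path a worker would be told to read, or undef if no custom config was given.
sub config_file { return $_[0]{config_file} }

package main;

my $with    = Sketch::Expiration->new(storage_name => 'test', storage_config => { obj_dir => '/tmp/obj' });
my $without = Sketch::Expiration->new(storage_name => 'test');
```

An explicit flag, as suggested in the comment, would replace the `defined $args{storage_config}` check with a separate boolean argument; the rest of the sketch would stay the same.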
… be already there, maybe as a subdependency..
- `BackupExpiration`
  - Set default number of `BackupExpiration` workers to 8
  - Allow spawning worker while iterating versions (inner loop) instead of waiting until the end
    - allows testing with a bunch of versions of one object
  - Log some details on worker spawn and despawn
- Tests
  - Add test for unknown storage throwing exception
  - Add test with job size 1 so multiple workers are involved
  - Fix brittle tests with potential collisions from `old_random_timestamp` and `new_random_timestamp`
    - Now choose one and increment or decrement when looping
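The worker handling listed above can be sketched roughly as follows: spawn a child for each batch as soon as it is ready, and block only when the worker limit is reached. `run_batches` and `$command_for` are hypothetical names, and the real command line for `expire_versions.pl` does not appear in this thread, so the usage example runs a trivial Perl one-liner instead.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run one fork/exec'd child per batch, keeping at most
# $max_workers children alive at once. Returns [batch, exit status] pairs in
# the order the children finished.
sub run_batches {
    my ($max_workers, $command_for, @batches) = @_;
    my %running;    # pid => batch
    my @results;

    my $reap_one = sub {
        my $pid = waitpid(-1, 0);    # block until any child exits
        die "no child processes left to reap" if $pid <= 0;
        my $batch = delete $running{$pid};
        push @results, [$batch, $? >> 8] if defined $batch;
    };

    for my $batch (@batches) {
        # Spawn inside the loop rather than after it, waiting only when full.
        $reap_one->() while keys(%running) >= $max_workers;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the worker command.
            exec($command_for->($batch)) or POSIX::_exit(127);
        }
        $running{$pid} = $batch;
    }

    $reap_one->() while %running;    # drain the remaining workers
    return @results;
}

# Usage: three batches, at most two workers; each child exits with its batch number.
my @results = run_batches(2, sub { ($^X, '-e', "exit $_[0]") }, 0, 1, 2);
```

With a batch size of 1, as in the added test, every ID gets its own child, so the limit check is exercised even on small inputs.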
aelkiss left a comment
Overall my sense is:
- writing out batches of IDs seems fine
- expire_versions.pl seems fine
- I think worth considering whether there are benefits to the fork/exec logic that justify the complexity here vs. other approaches
| AND version < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 180 DAY),"%Y%m%d%H%i%S") | ||
| ORDER BY version DESC | ||
| SQL | ||
| # Write storage config to the temp directory for child processes to get at it. |
I think I see why this is needed: if we do fork() and exec(), rather than just fork() and retaining the parent's config, the config has to be injected into the child process. It may be possible to set an env var to use a custom config (HTFEED_CONFIG) and write that out in the test? But if we need to do this I think it's OK.
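A rough sketch of the env var alternative, assuming the child process consults `HTFEED_CONFIG` at startup (how HTFeed actually reads it is not shown in this thread). `run_with_config` is a hypothetical helper and the YAML content is a placeholder, not the real config schema. The point is only that a value set in `%ENV` survives fork() and exec().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run a command with HTFEED_CONFIG pointing at
# $config_path and return whatever the command printed on stdout.
sub run_with_config {
    my ($config_path, @command) = @_;
    local $ENV{HTFEED_CONFIG} = $config_path;    # restored when the sub returns
    open(my $out, '-|', @command) or die "cannot start @command: $!";
    my $output = do { local $/; <$out> };
    close($out);
    return $output;
}

# A test would write its custom config to a temp file first.
my ($fh, $path) = tempfile(SUFFIX => '.yml', UNLINK => 1);
print {$fh} "placeholder_key: placeholder_value\n";
close($fh);

# The child here just echoes the variable to show that it was inherited.
my $seen = run_with_config($path, $^X, '-e', 'print $ENV{HTFEED_CONFIG}');
```

Compared with passing the config path as an argument, this keeps the worker's command line unchanged, at the cost of a less visible dependency on the environment.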
- `expire_versions.pl` invoked as worker
- `BackupExpirationBatch.pm` as core logic inside `expire_versions.pl`