ETT-1459: reuse main_repo_audit for extracting info from mets #187

Merged: aelkiss merged 1 commit into main from ETT-1459-extract-first-ingest-date on Jun 23, 2026

aelkiss (Member) commented Jun 22, 2026
  • rename to "crawl repo mets" and remove md5 checking (done by truenas_audit.pl)
  • record date of first ingest to feed_audit; extracts all PREMIS 'ingestion' events & takes the first
  • separate flag for source mets handling
  • extract source METS PREMIS events
  • extract methods to make testable (similar to treatment for populate_rights)
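
The "extract all PREMIS 'ingestion' events and take the first" step can be sketched roughly as follows. This is a minimal Python illustration, not the script from this PR: the function names and the sample METS are hypothetical, the PREMIS 2 namespace is assumed, and "first" is read here as "earliest `eventDateTime`", which may differ from what the real script does.

```python
import xml.etree.ElementTree as ET

# Assumption: events use the PREMIS 2 namespace.
PREMIS_URI = "info:lc/xmlns/premis-v2"
PREMIS_NS = {"premis": PREMIS_URI}

# Hypothetical, heavily trimmed METS with three PREMIS events.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:premis="info:lc/xmlns/premis-v2">
  <amdSec>
    <digiprovMD ID="premis1">
      <mdWrap MDTYPE="PREMIS">
        <xmlData>
          <premis:event>
            <premis:eventType>ingestion</premis:eventType>
            <premis:eventDateTime>2019-08-14T13:00:00Z</premis:eventDateTime>
          </premis:event>
          <premis:event>
            <premis:eventType>capture</premis:eventType>
            <premis:eventDateTime>2011-01-01T00:00:00Z</premis:eventDateTime>
          </premis:event>
          <premis:event>
            <premis:eventType>ingestion</premis:eventType>
            <premis:eventDateTime>2012-05-01T09:30:00Z</premis:eventDateTime>
          </premis:event>
        </xmlData>
      </mdWrap>
    </digiprovMD>
  </amdSec>
</mets>"""


def ingestion_dates(mets_xml):
    """Return the eventDateTime of every PREMIS 'ingestion' event."""
    root = ET.fromstring(mets_xml)
    dates = []
    for event in root.iter("{%s}event" % PREMIS_URI):
        event_type = event.findtext(
            "premis:eventType", default="", namespaces=PREMIS_NS
        )
        if event_type.strip() == "ingestion":
            when = event.findtext("premis:eventDateTime", namespaces=PREMIS_NS)
            if when:
                dates.append(when.strip())
    return dates


def first_ingest_date(mets_xml):
    """Earliest ingestion timestamp, or None if there is no such event.

    Assumption: all timestamps share one ISO 8601 format and UTC offset,
    so comparing them as strings orders them chronologically.
    """
    dates = ingestion_dates(mets_xml)
    return min(dates) if dates else None


print(first_ingest_date(SAMPLE_METS))
```

If the timestamps carried mixed UTC offsets, they would need to be parsed into timezone-aware datetimes before comparing.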

This is the second half of ETT-1459 -- backfilling feed_audit with the first ingest date for things that don't have it.

This uses the same approach to getting it under test as I did for populate_rights_data.pl. I tried calling out with system but ran into some of the same issues you did with Glacier, with the called script not inheriting the config. I figured making it more directly testable was probably preferable, and I think it's close to a point where it could be changed to be an object to make it more testable still.
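
The general shape of "extract methods so the script can be tested in-process" looks roughly like the sketch below. It is a Python illustration of the pattern only, not the Perl in this PR; `split_htid`, `main`, and the tab-separated output format are hypothetical.

```python
import sys


def split_htid(htid):
    """Split an identifier of the form 'namespace.objid' on the first dot."""
    namespace, objid = htid.split(".", 1)
    return namespace, objid


def main(argv, out=sys.stdout):
    """Write one tab-separated line per identifier to `out`.

    Taking the output stream as a parameter lets a test pass in a buffer,
    while a normal run still writes to stdout.
    """
    for htid in argv:
        namespace, objid = split_htid(htid)
        out.write(f"{namespace}\t{objid}\n")


# Runs only when executed as a script, so a test can import the functions
# above and call them in the same process (and with the same configuration)
# instead of shelling out to a child process.
if __name__ == "__main__":
    main(sys.argv[1:])
```

A test can then call `main(["mdp.39015012345678"], out=io.StringIO())` and inspect the buffer, with no subprocess involved.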

Of particular note is that some of the things in crawl_repo_mets are still duplicative of truenas_audit -- probably we should extract those somewhere separate or do less rigorous checks in crawl_repo_mets.

I don't love that the interface is outputting to stdout, but I can redirect the output to a file in Kubernetes and process later.

@aelkiss aelkiss requested a review from moseshll June 22, 2026 20:45

moseshll (Contributor) left a comment

Yes, lotsa stuff we've seen before. Could see if HTFeed::RepositoryIterator would do some of the lifting, maybe at a later point. It would probably have some gotchas to refactor since it's really tailored to the truenas audit -- not worth the pain right now IMHO.

@aelkiss aelkiss merged commit 8f45469 into main Jun 23, 2026
1 check passed
@aelkiss aelkiss deleted the ETT-1459-extract-first-ingest-date branch June 23, 2026 19:06