fix(github): bisect adaptive time windows for workflow runs 40k pagination cap #8844
Open
yamoyamoto wants to merge 1 commit into apache:main from
…ation cap
GitHub's /actions/runs enforces a 40k cap in unfiltered mode and a 1,000-item cap per filtered search, making any repo with >40k workflow runs uncollectable. Switch to filtered mode and recursively bisect time windows at integer-second midpoints, probing via SubmitBlocking to share rate-limit with the main collector and feeding leaves to a single ApiCollector so raw-table Delete fires only once.
See PR description for the full design rationale.
Closes apache#8842
Signed-off-by: yamoyamoto <yamo7yamoto@gmail.com>
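The recursive bisection described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `window` type, the `split` and `bisect` functions, and the `probe` callback (standing in for a request that returns `total_count`) are all hypothetical names. The sketch also assumes windows are inclusive on both ends, and it treats a one-second window that is still at the cap as an error; the PR may handle that case, and the HTTP 422 trigger, differently.

```go
package main

import (
	"errors"
	"fmt"
)

// searchCap is the 1,000-item cap per filtered search.
const searchCap = 1000

// window is an inclusive range of Unix seconds (hypothetical type).
type window struct{ from, to int64 }

// split bisects w at its integer-second midpoint into two halves that
// share no second: the left half ends at mid, the right starts at mid+1.
func split(w window) (window, window) {
	mid := w.from + (w.to-w.from)/2
	return window{w.from, mid}, window{mid + 1, w.to}
}

// bisect returns leaf windows, in time order, whose probed count is below
// searchCap. probe is a hypothetical stand-in for a total_count request.
func bisect(w window, probe func(window) (int, error)) ([]window, error) {
	n, err := probe(w)
	if err != nil {
		return nil, err
	}
	if n < searchCap {
		return []window{w}, nil
	}
	if w.from >= w.to {
		// Assumption: a single second cannot be split any further.
		return nil, errors.New("one-second window still at or above the search cap")
	}
	left, right := split(w)
	leaves, err := bisect(left, probe)
	if err != nil {
		return nil, err
	}
	rest, err := bisect(right, probe)
	if err != nil {
		return nil, err
	}
	return append(leaves, rest...), nil
}

func main() {
	// Synthetic probe: pretend exactly one run was created per second.
	probe := func(w window) (int, error) { return int(w.to - w.from + 1), nil }
	leaves, err := bisect(window{0, 3999}, probe)
	if err != nil {
		fmt.Println("bisect failed:", err)
		return
	}
	for _, l := range leaves {
		fmt.Printf("created window: %d..%d\n", l.from, l.to)
	}
}
```

Because the leaves come back in time order and never overlap, they can be handed one by one to a single downstream collector, which matches the "feed leaves to a single ApiCollector" idea in the commit message.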
fcf84bb to 35cf936
pr-type/bug-fix.
Summary
Repositories with more than ~40k GitHub Actions workflow runs cannot be collected today: Collect Workflow Runs hits GitHub's per_page * page > 40,000 cap in unfiltered mode and fails with HTTP 422, leaving _tool_github_runs empty.
This PR switches the collector to always use filtered mode (created=<from>..<to>) and adds adaptive time-window bisection to work around filtered mode's own 1,000-item-per-search cap. Leaf windows are collected through a single ApiCollector fed by an Input iterator, so the raw-table Delete still fires only once. github_graphql inherits the fix automatically via its existing CollectRunsMeta import.
Does this close any open issues?
Closes #8842
Screenshots
N/A — internal collector change with no UI surface.
Other Information
Tests in cicd_run_collector_test.go cover pagination edges, the total_count >= 1000 / HTTP 422 bisection triggers, the non-overlapping integer-second split rule, bootstrap from epoch, and a thin integration check that the raw-table Delete is invoked exactly once regardless of how many leaf windows are produced.
make build and make unit-test pass locally.
… v1.0.3-beta10 and deployed it against our own DevLake instance pointing at a GitHub repository that had previously hit the 40k cap. The Collect Workflow Runs subtask now completes successfully and _tool_github_runs is populated as expected, so the change has been exercised against real GitHub traffic rather than only synthetic mocks.
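For readers unfamiliar with the non-overlapping integer-second split rule and the created=<from>..<to> filter, here is a small illustrative sketch. The function names `createdFilter` and `splitSeconds` are hypothetical and not taken from the PR. The sketch assumes windows are inclusive on both ends and that timestamps are rendered as RFC 3339 in UTC; the actual collector may format them differently.

```go
package main

import (
	"fmt"
	"time"
)

// createdFilter renders an inclusive [from, to] window of Unix seconds as a
// "<from>..<to>" value for the created query parameter.
// Assumption: RFC 3339 timestamps in UTC.
func createdFilter(from, to int64) string {
	f := time.Unix(from, 0).UTC().Format(time.RFC3339)
	t := time.Unix(to, 0).UTC().Format(time.RFC3339)
	return f + ".." + t
}

// splitSeconds returns the last second of the left half and the first
// second of the right half, so that the two halves never share a second.
func splitSeconds(from, to int64) (leftTo, rightFrom int64) {
	mid := from + (to-from)/2
	return mid, mid + 1
}

func main() {
	// Bootstrap from the Unix epoch, purely for illustration.
	from, to := int64(0), int64(9)
	leftTo, rightFrom := splitSeconds(from, to)
	fmt.Println("left: ", createdFilter(from, leftTo))
	fmt.Println("right:", createdFilter(rightFrom, to))
}
```

Starting the right half one second after the midpoint is what keeps a run created exactly at the midpoint from being counted in both halves.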