[YUNIKORN-3243] Bounded visit guarantee for sibling queues#1
Open
shubhM13 wants to merge 1 commit into
When two sibling queues have highly asymmetric guaranteed resources
(e.g. 3600:1), the DRF sort keeps the larger queue's ratio below the
smaller queue's for thousands of cycles. Combined with the
single-successful-allocation-per-cycle contract, the smaller queue is
never visited by TryAllocate, which also keeps its asks invisible to
the autoscaler (inspectOutstandingRequests gates on
schedulingAttempted, which is only set inside app.tryAllocate()).
This change adds a configurable bounded-visit guarantee at the parent
queue level:
* New queue property `queue.starvation.delay` (default 15s, 0
disables) configures how long a child queue may be skipped before
the parent's sort hoists it to the front.
* Each queue tracks `lastSchedulingAttempt`, a timestamp that is set
when pending demand first becomes non-zero, cleared when pending
demand drains to zero, and refreshed at the top of every TryAllocate
descent.
* sortQueues() runs the normal DRF/priority sort first, then
hoistStarvedQueues() moves children whose elapsed time exceeds
the delay to the front, ordered oldest-first so the most-neglected
sibling wins when multiple queues cross the threshold in the same
cycle. Non-starved children keep their DRF order.
The implementation is local to queue.go, does not change the scheduling
cycle's single-allocation contract (keeps K8s predicate correctness),
and can be disabled per queue. Integration tests reuse the 3600:1
reproducer from PR apache#1077 to validate that the sparkpi ask is served on
the first cycle after the delay elapses and that its
schedulingAttempted flag flips to true, restoring the autoscaler
signal.
Made-with: Cursor