
[YUNIKORN-3243] Bounded visit guarantee for sibling queues #1

Open
shubhM13 wants to merge 1 commit into master from YUNIKORN-3243


@shubhM13
Owner

When two sibling queues have highly asymmetric guaranteed resources (e.g. 3600:1), the DRF sort keeps the larger queue's usage-to-guarantee ratio below the smaller queue's for thousands of cycles. Combined with the contract of at most one successful allocation per scheduling cycle, the smaller queue is effectively never visited by `TryAllocate`. This also keeps its asks invisible to the autoscaler: `inspectOutstandingRequests` only reports asks whose `schedulingAttempted` flag is set, and that flag is only set inside `app.tryAllocate()`.
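To see why the wait runs into thousands of cycles, the sketch below models a hypothetical version of the 3600:1 case: a single resource type, a large queue with a guarantee of 3600 units and no usage, a small queue with a guarantee of 1 unit that already uses 1 unit, and a large queue that always has a schedulable ask. The `queue`, `ratio`, `pick` and `firstVisit` names are illustrative only and are not YuniKorn's API; `ratio` is a simplified stand-in for the real fair-share comparison, and ties are assumed to go to the first child.

```go
package main

import "fmt"

// queue is a hypothetical, single-resource stand-in for a child queue.
type queue struct {
	guaranteed float64 // guaranteed units
	used       float64 // allocated units
}

// ratio is a simplified fair-share key: usage relative to the guarantee.
func (q *queue) ratio() float64 { return q.used / q.guaranteed }

// pick returns the child the sort would visit first: the lowest ratio wins
// and ties go to the earlier child, as with a stable sort.
func pick(children []*queue) *queue {
	best := children[0]
	for _, c := range children[1:] {
		if c.ratio() < best.ratio() {
			best = c
		}
	}
	return best
}

// firstVisit allows one successful allocation per cycle, assumes the large
// queue's ask always succeeds, and returns the 1-based cycle on which the
// small queue is first visited (or -1 if it never is within maxCycles).
func firstVisit(large, small *queue, maxCycles int) int {
	children := []*queue{large, small}
	for cycle := 1; cycle <= maxCycles; cycle++ {
		chosen := pick(children)
		if chosen == small {
			return cycle
		}
		chosen.used++ // the large queue allocates and the cycle ends
	}
	return -1
}

func main() {
	large := &queue{guaranteed: 3600}
	small := &queue{guaranteed: 1, used: 1}
	fmt.Println(firstVisit(large, small, 10000))
}
```

Under these assumptions the program prints 3602: the large queue wins every cycle until its ratio exceeds the small queue's 1.0, which takes 3601 allocations.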

This change adds a configurable bounded-visit guarantee at the parent queue level:

  • New queue property `queue.starvation.delay` (default 15s; a value of 0 disables it) configures how long a child queue may be skipped before the parent's sort hoists it to the front.
  • Each queue tracks a `lastSchedulingAttempt` timestamp: it is set when pending demand rises from zero, cleared when pending demand drains, and refreshed at the top of every `TryAllocate` descent into the queue.
  • `sortQueues()` runs the normal DRF/priority sort first; `hoistStarvedQueues()` then moves every child whose elapsed time exceeds the delay to the front, ordered oldest-first so the most-neglected sibling wins when several queues cross the threshold in the same cycle. Non-starved children keep their DRF order.

The change is confined to `queue.go`, keeps the single-allocation-per-cycle contract (which preserves Kubernetes predicate correctness), and can be disabled per queue. Integration tests reuse the 3600:1 reproducer from PR apache#1077 to verify that the sparkpi ask is served on the first cycle after the delay elapses and that its `schedulingAttempted` flag flips to true, restoring the autoscaler signal.
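The claim that the starved ask is served on the first cycle after the delay elapses, with no extra allocations per cycle, can be illustrated with a small hypothetical simulation. It assumes a 100ms cycle interval, the 15s default delay, a DRF order that always puts the large queue first (as in the 3600:1 case), both queues pending from cycle 0, and an ask that succeeds whenever its queue is visited first. `order` and `firstSmallVisit` are illustrative names, not YuniKorn functions.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed simulation parameters (hypothetical, not taken from YuniKorn).
const (
	cycleInterval = 100 * time.Millisecond // assumed scheduling cycle period
	delay         = 15 * time.Second       // default queue.starvation.delay
)

// child keeps only the timestamp the bounded-visit rule needs.
type child struct {
	lastSchedulingAttempt time.Time
}

// order moves children skipped for longer than delay ahead of the DRF order.
// Oldest-first ordering among starved children is omitted: only the small
// queue can starve in this scenario.
func order(drf []*child, now time.Time) []*child {
	var starved, rest []*child
	for _, c := range drf {
		if now.Sub(c.lastSchedulingAttempt) > delay {
			starved = append(starved, c)
		} else {
			rest = append(rest, c)
		}
	}
	return append(starved, rest...)
}

// firstSmallVisit returns the 0-based cycle on which the small queue is
// first visited, or -1 if it is not visited within maxCycles.
func firstSmallVisit(maxCycles int) int {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	large := &child{lastSchedulingAttempt: start}
	small := &child{lastSchedulingAttempt: start}
	drf := []*child{large, small} // DRF keeps large ahead throughout
	for cycle := 0; cycle < maxCycles; cycle++ {
		now := start.Add(time.Duration(cycle) * cycleInterval)
		first := order(drf, now)[0]
		first.lastSchedulingAttempt = now // refreshed at the top of the descent
		// The first child's allocation succeeds and ends the cycle, so there
		// is still at most one allocation per cycle.
		if first == small {
			return cycle
		}
	}
	return -1
}

func main() {
	cycle := firstSmallVisit(1000)
	fmt.Println(cycle, time.Duration(cycle)*cycleInterval)
}
```

Under these assumptions it prints `151 15.1s`: at cycle 150 the small queue has waited exactly 15s, which does not exceed the delay, so it is hoisted and served on the next cycle, and every cycle still ends after a single allocation.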

Made-with: Cursor

What is this PR for?


What type of PR is it?

  • Bug Fix
  • Improvement
  • Feature
  • Documentation
  • Hot Fix
  • Refactoring

Todos

  • Task

What is the Jira issue?

How should this be tested?

Screenshots (if appropriate)

Questions:

  • The license files need an update.
  • There are breaking changes for older versions.
  • It needs documentation.
