
[YUNIKORN-3243] Add tests reproducing fair-share queue starvation and autoscaler blindness#1077

Open
shubhM13 wants to merge 2 commits into apache:master from shubhM13:master

shubhM13 commented Mar 20, 2026

What is this PR for?

This PR adds 20 tests (18 integration + 2 unit) that reproduce a critical scheduling bug: sibling queues with asymmetric guaranteed resources cause indefinite starvation of the smaller queue and make its pending asks invisible to the cluster autoscaler. The tests also validate that the priority.offset queue property eliminates the starvation.

The Bug

When two sibling queues under the same parent have vastly different guaranteed resources (e.g., 3600:1 ratio), the fair-share queue sorting in TryAllocate causes the smaller queue to never be visited by the scheduling loop:

  1. sortQueuesByFairnessAndPriority sorts children by max(allocated/guaranteed) (dominant resource fairness), lowest ratio first
  2. Because the larger queue's guarantee is so large, each allocation barely moves its ratio, which stays below the smaller queue's ratio for hundreds or thousands of cycles
  3. TryAllocate returns on the first child success → smaller queue is never reached
  4. SetSchedulingAttempted(true) (set inside app.tryAllocate() at line 1035) is never called for the smaller queue's asks
  5. inspectOutstandingRequests skips asks where schedulingAttempted=false → autoscaler is never notified
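
The five steps above can be reproduced in a toy model. The sketch below is not YuniKorn code: the queue type, scheduleOnce, and simulate are hypothetical names, resources are reduced to a single number, and the schedulingAttempted flag is tracked per queue rather than per ask. It only illustrates how "sort by ratio, return on first success" keeps the small queue from ever being visited.

```go
package main

import (
	"fmt"
	"sort"
)

// queue is a hypothetical stand-in for a leaf queue; it is not YuniKorn's Queue type.
type queue struct {
	name                string
	guaranteed          float64 // guaranteed resource, arbitrary units
	allocated           float64 // currently allocated resource
	pending             int     // pending asks, each of size 1
	schedulingAttempted bool    // set only when the queue is actually visited
}

// ratio is the fair-share usage: allocated / guaranteed.
func (q *queue) ratio() float64 { return q.allocated / q.guaranteed }

// tryAllocate visits the queue, records the attempt, and allocates one ask if any is pending.
func (q *queue) tryAllocate() bool {
	q.schedulingAttempted = true
	if q.pending == 0 {
		return false
	}
	q.pending--
	q.allocated++
	return true
}

// scheduleOnce sorts children by ascending ratio and returns on the first success.
func scheduleOnce(children []*queue) {
	sort.SliceStable(children, func(i, j int) bool {
		return children[i].ratio() < children[j].ratio()
	})
	for _, q := range children {
		if q.tryAllocate() {
			return
		}
	}
}

// simulate runs the given number of cycles on a 3600:1 pair where the small
// queue already holds a prior allocation (illustrative numbers only).
func simulate(cycles int) (small, large *queue) {
	small = &queue{name: "small", guaranteed: 1, allocated: 1, pending: 1}
	large = &queue{name: "large", guaranteed: 3600, allocated: 0, pending: 100}
	children := []*queue{small, large}
	for i := 0; i < cycles; i++ {
		scheduleOnce(children)
	}
	return small, large
}

func main() {
	small, large := simulate(20)
	fmt.Printf("small: allocated=%v attempted=%v\n", small.allocated, small.schedulingAttempted)
	fmt.Printf("large: allocated=%v attempted=%v\n", large.allocated, large.schedulingAttempted)
}
```

In this model the small queue ends 20 cycles with its prior allocation unchanged and its flag still false, while the large queue receives all 20 allocations.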

Production impact: Platform health check jobs (spark-pi) were starved for 9.5+ minutes during peak batch processing. No PodUnschedulable events were emitted, so Karpenter was completely unaware that capacity was needed.

Config-level fix: Setting priority.offset on the smaller queue raises its priority, and sortQueuesByPriorityAndFairness (the default sort with prioritySortEnabled=true) compares priority FIRST, before fair share. The smaller queue therefore sorts first regardless of ratio, which eliminates the starvation.
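
A minimal sketch of a priority-first comparator shows why the offset changes the visiting order. The child type and sortPriorityFirst are hypothetical names, and the sketch assumes the offset simply raises the queue's effective priority; it is not the body of sortQueuesByPriorityAndFairness.

```go
package main

import (
	"fmt"
	"sort"
)

// child is a hypothetical queue summary; the field names are illustrative only.
type child struct {
	name     string
	priority int32   // effective priority, assumed to include priority.offset
	ratio    float64 // allocated / guaranteed
}

// sortPriorityFirst orders by descending priority, then by ascending fair-share ratio.
func sortPriorityFirst(children []child) {
	sort.SliceStable(children, func(i, j int) bool {
		if children[i].priority != children[j].priority {
			return children[i].priority > children[j].priority
		}
		return children[i].ratio < children[j].ratio
	})
}

// names returns the queue names in their current order.
func names(children []child) []string {
	out := make([]string, 0, len(children))
	for _, c := range children {
		out = append(out, c.name)
	}
	return out
}

func main() {
	noOffset := []child{{"small", 0, 1.0}, {"large", 0, 0.001}}
	withOffset := []child{{"small", 1000, 1.0}, {"large", 0, 0.001}}
	sortPriorityFirst(noOffset)
	sortPriorityFirst(withOffset)
	fmt.Println(names(noOffset))   // [large small]
	fmt.Println(names(withOffset)) // [small large]
}
```

With equal priorities the lower ratio wins and the large queue is visited first; with the offset the small queue is visited first even though its ratio is far higher.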

Test Matrix

Files Changed

| File | Change |
| --- | --- |
| pkg/scheduler/tests/queue_starvation_test.go | NEW: 18 integration tests |
| pkg/scheduler/scheduler_test.go | MODIFIED: 1 new unit test added |

Integration Tests (queue_starvation_test.go)

All tests use the mockScheduler harness with MultiStepSchedule (manual scheduling mode).

Core Bug Proof

| Test | Cycles | Assert | What It Proves |
| --- | --- | --- | --- |
| TestQueueStarvationWithPriorAllocation | 20 | small=0, schedulingAttempted=false | 3600:1 ratio + prior allocation on small queue = complete starvation for 20+ cycles. Autoscaler blind. |
| TestQueueStarvationSchedulingAttemptedFlag | 10 | schedulingAttempted=false | Focused proof that the flag is never set when the queue is starved. |

Controls (Expected Behavior)

| Test | Assert | What It Proves |
| --- | --- | --- |
| TestQueueStarvationNoStarvationAtZero | small>0 after 5 cycles | When both queues start at 0 allocated, small IS served within 2 cycles. Starvation requires non-zero prior allocation. |
| TestStarvationWithEqualGuarantees | queue-b>0 after 5 cycles | With 1:1 guaranteed ratio, fair share alternates correctly. No starvation even with prior allocation. |
| TestQueueStarvationTiebreakerRecovery | sparkpi>0 after 5 cycles | At 10:1 ratio, large surpasses small's ratio quickly (2 cycles). Short-lived starvation. |

Severity Scaling

| Test | Sub-tests | What It Proves |
| --- | --- | --- |
| TestStarvationSeverityByRatio | 10:1 (mild), 100:1 (moderate), 3600:1 (severe) | Starvation duration scales with the guaranteed ratio. At 10:1, sparkpi served within 5 cycles. At 3600:1, sparkpi starved for all 20 cycles. |
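
The scaling with the guaranteed ratio follows from simple arithmetic. As a rough sketch under stated assumptions (the large queue starts at zero allocation, receives exactly one ask of fixed size per cycle, has enough pending asks, and a tie in ratio goes to the small queue), the large queue sorts first for ceil(smallRatio × largeGuaranteed / askSize) cycles. The function name and all numbers below are hypothetical and are not taken from the tests.

```go
package main

import "fmt"

// starvedCycles returns how many consecutive cycles the large queue sorts ahead
// of the small queue. Assumptions (illustrative, not YuniKorn behavior): the
// large queue starts at zero, gains one ask of askSize per cycle, never runs
// out of asks, and a tie in ratio lets the small queue go first.
func starvedCycles(smallAlloc, smallGuar, largeGuar, askSize int64) int64 {
	// Large is served first while k*askSize/largeGuar < smallAlloc/smallGuar,
	// for k = 0, 1, 2, ... allocations already made on the large queue.
	num := smallAlloc * largeGuar
	den := askSize * smallGuar
	return (num + den - 1) / den // ceiling division
}

func main() {
	const smallGuar, smallAlloc, askSize = 10, 2, 10
	for _, ratio := range []int64{10, 100, 3600} {
		n := starvedCycles(smallAlloc, smallGuar, ratio*smallGuar, askSize)
		fmt.Printf("%d:1 -> %d cycles\n", ratio, n)
	}
}
```

With these illustrative inputs the starvation window grows linearly with the ratio: 2 cycles at 10:1, 20 at 100:1, and 720 at 3600:1.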

Amplifiers

| Test | Setup | What It Proves |
| --- | --- | --- |
| TestQueueStarvationContinuousDemand | 4 waves of 5 asks each | When new asks keep arriving on the large queue, starvation is infinite. Ratio never catches up because demand is replenished. schedulingAttempted stays false across all 20 cycles. |
| TestUnplaceableAppsAmplifyStarvation | 3 unplaceable apps (1T asks > 500G node) + 1 placeable app | Apps that can never allocate (simulating wrong tolerations) are a permanent scheduling tax. They're tried every cycle but never leave pending. Combined with one placeable app that succeeds, starvation persists. |
| TestReservedAllocatePriorityDocumented | (documentation) | Documents that tryReservedAllocate runs before tryAllocate every cycle, adding another starvation layer when combined with fair share asymmetry. |

Autoscaler Blindness

| Test | Assert | What It Proves |
| --- | --- | --- |
| TestScaleUpTriggeredIsOneShot | HasTriggeredScaleUp() stays true after set | Once SetScaleUpTriggered(true) is called, the ask is permanently excluded from outstanding requests. The autoscaler gets exactly one chance. |
| TestStarvedQueueNoEventsEmitted | allocated=0, schedulingAttempted=false, scaleUpTriggered=false | Complete silence. A starved queue produces zero observable signals: no headroom failure, no scheduling attempt, no autoscaler trigger, no K8s events, no pod conditions. The autoscaler is completely unaware the ask exists. |
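
The two flags interact as a simple filter. The sketch below is a simplified model, not the body of inspectOutstandingRequests: the ask type and collectOutstanding are hypothetical, and it assumes the same pass that reports an ask also marks it as triggered. It shows both the blindness (a never-attempted ask is skipped) and the one-shot behavior (a reported ask is never reported again).

```go
package main

import "fmt"

// ask is a hypothetical pending request; the two flags mirror the names used in this PR.
type ask struct {
	id                  string
	schedulingAttempted bool
	scaleUpTriggered    bool
}

// collectOutstanding returns the IDs of asks to report to the autoscaler.
// An ask is reported only if the scheduler has attempted it and it has not
// been reported before. Reported asks are marked so they are skipped next time.
func collectOutstanding(asks []*ask) []string {
	var out []string
	for _, a := range asks {
		if !a.schedulingAttempted || a.scaleUpTriggered {
			continue
		}
		a.scaleUpTriggered = true
		out = append(out, a.id)
	}
	return out
}

func main() {
	asks := []*ask{
		{id: "large-1", schedulingAttempted: true}, // dominant queue, visited
		{id: "sparkpi-1"},                          // starved queue, never visited
	}
	fmt.Println(collectOutstanding(asks)) // [large-1]
	fmt.Println(collectOutstanding(asks)) // []

	// Simulates setting the flag at ask registration instead of inside tryAllocate.
	asks[1].schedulingAttempted = true
	fmt.Println(collectOutstanding(asks)) // [sparkpi-1]
}
```

The last step mirrors the idea of setting schedulingAttempted at registration: the ask is still unscheduled, but it becomes visible to the filter.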

Compounding

| Test | Setup | What It Proves |
| --- | --- | --- |
| TestFullCompoundStarvation | 3600:1 ratio + prior allocation + 3 unplaceable apps + 30 placeable asks | All amplifiers combined. sparkpi gets 0 allocations, schedulingAttempted=false, scaleUpTriggered=false. Full production scenario reproduced. |

Recovery

| Test | Assert | What It Proves |
| --- | --- | --- |
| TestStarvationRecoveryAfterDemandExhaustion | sparkpi=1M after large exhausts 5 asks | Starvation IS bounded when the large queue runs out of asks. Small queue served on the very next cycle. |

Fix Validation — schedulingAttempted at Registration

| Test | Approach | What It Proves |
| --- | --- | --- |
| TestSchedulingAttemptedSetOnRegistrationFix | Manually sets schedulingAttempted=true on starved ask | If the flag were set at ask registration (not inside tryAllocate), the autoscaler would detect the ask immediately. sparkpi is still starved (scheduling order unchanged), but schedulingAttempted=true ensures inspectOutstandingRequests will include it. |

Fix Validation — priority.offset Config Change

| Test | Config | What It Proves |
| --- | --- | --- |
| TestPriorityOffsetPreventsStarvation | root.small: priority.offset=1000, root.large: priority.offset=0 | With priority offset, small queue sorts BEFORE large regardless of fair-share ratio. sparkpi served on cycle 1. Starvation eliminated. |
| TestPriorityOffsetWithContinuousDemand | Same config, 3 waves of sparkpi + continuous large demand | Each sparkpi cron job is served immediately on arrival. Large queue gets all remaining cycles between waves. No starvation for either queue under sustained load. |
| TestPriorityOffsetContrastWithoutFix | NO priority.offset (same as bug scenario) | Contrast test: without priority.offset, sparkpi is starved. Proves priority.offset is the specific config change that eliminates starvation. |

Unit Test (scheduler_test.go)

| Test | What It Proves |
| --- | --- |
| TestInspectOutstandingRequestsSkipsUnvisitedQueue | Directly calls partition.tryAllocate() and scheduler.inspectOutstandingRequests(). Proves that dominant queue asks (with schedulingAttempted=true) ARE included in outstanding requests, while starved queue asks (with schedulingAttempted=false) are NOT. The autoscaler signal is asymmetric. |

What type of PR is it?

  • Bug Fix
  • Improvement
  • Feature
  • Documentation
  • Hot Fix
  • Refactoring

Todos

  • Task

What is the Jira issue?

YUNIKORN-3243

How should this be tested?

# All starvation tests
go test -run "TestQueueStarvation|TestStarvation|TestScaleUp|TestUnplaceable|TestReserved|TestFullCompound|TestSchedulingAttemptedSet|TestStarvedQueue|TestPriorityOffset|TestInspectOutstandingRequests" \
  -count=1 -v -timeout 180s ./pkg/scheduler/tests/ ./pkg/scheduler/

# Ensure existing tests still pass
go test -timeout 300s ./pkg/scheduler/...

Screenshots (if appropriate)

Questions:

  • The license files need an update.
  • There are breaking changes for older versions.
  • It needs documentation.

@shubhM13 shubhM13 changed the title Add tests reproducing fair-share queue starvation and autoscaler blindness [YUNIKORN-3243] - Add tests reproducing fair-share queue starvation and autoscaler blindness Mar 21, 2026
…the smaller queue first regardless of fair share ratio.
@shubhM13 shubhM13 changed the title [YUNIKORN-3243] - Add tests reproducing fair-share queue starvation and autoscaler blindness [YUNIKORN-3243] Add tests reproducing fair-share queue starvation and autoscaler blindness Mar 21, 2026