fix: avoid race condition when evaluating runs-on with matrix strategy #6050

Open
xingzihai wants to merge 1 commit into nektos:master from xingzihai:fix-matrix-multi-runner

@xingzihai

Summary

Fixes a race condition that caused inconsistent results when using matrix strategy with multiple platforms (windows-latest, macos-latest, ubuntu-latest).

Problem

When running matrix jobs in parallel:

  1. Multiple matrix jobs run via NewParallelExecutor
  2. Each job shares the same Job object (same workflow definition)
  3. runsOnPlatformNames() calls EvaluateYamlNode(ctx, &job.RawRunsOn) which modifies the yaml.Node in-place
  4. Concurrent goroutines modifying job.RawRunsOn caused race conditions

Root Cause

The EvaluateYamlNode method modifies the yaml.Node in-place via ret.Decode(node). When multiple goroutines evaluated matrix jobs concurrently, they would race to modify the shared job.RawRunsOn node.
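The effect of in-place evaluation can be illustrated with a minimal sketch. It uses only the standard library, and `Node`, `Job`, `evaluateInPlace`, `interpolate`, and `platformsInPlace` are hypothetical stand-ins, not the actual `yaml.Node` or act code. It runs the evaluations sequentially so the outcome is deterministic; with real goroutines the same writes would also be a data race, and which job "wins" would vary from run to run.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for yaml.Node: a scalar holding either an
// expression such as "${{ matrix.os }}" or an already-evaluated value.
type Node struct {
	Value string
}

// Job mirrors the idea of a workflow job whose runs-on node is shared by
// every matrix combination.
type Job struct {
	RawRunsOn Node
}

// interpolate replaces "${{ matrix.<key> }}" with the matching matrix value.
func interpolate(expr string, matrix map[string]string) string {
	for key, value := range matrix {
		expr = strings.ReplaceAll(expr, "${{ matrix."+key+" }}", value)
	}
	return expr
}

// evaluateInPlace resolves the expression and writes the result back into
// the node it was given, which is the pattern the PR identifies as unsafe.
func evaluateInPlace(node *Node, matrix map[string]string) {
	node.Value = interpolate(node.Value, matrix)
}

// platformsInPlace evaluates the shared job once per matrix entry, in order,
// and records what each entry observed afterwards.
func platformsInPlace(job *Job, names []string) []string {
	seen := make([]string, 0, len(names))
	for _, name := range names {
		evaluateInPlace(&job.RawRunsOn, map[string]string{"os": name})
		seen = append(seen, job.RawRunsOn.Value)
	}
	return seen
}

func main() {
	job := &Job{RawRunsOn: Node{Value: "${{ matrix.os }}"}}
	names := []string{"ubuntu-latest", "windows-latest", "macos-latest"}
	// The first evaluation overwrites the expression, so later entries
	// never see "${{ matrix.os }}" again.
	fmt.Println(platformsInPlace(job, names)) // prints [ubuntu-latest ubuntu-latest ubuntu-latest]
}
```

In this simplified model every matrix entry after the first inherits the first entry's platform, which is one way a shared, mutated node can produce results that depend on scheduling order.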

Solution

  1. Added EvaluateYamlNodeGetResult(context.Context, *yaml.Node) (*yaml.Node, error) to the ExpressionEvaluator interface
  2. Implemented this method to return an evaluated node without modifying the original
  3. Modified runsOnPlatformNames to use the new method
  4. Added helper functions extractRunsOnFromNode and nodeAsStringSlice
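The approach in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `Node` is a hypothetical stand-in for `yaml.Node`, `evaluateGetResult` plays the role of a non-mutating evaluator, and `asStringSlice` is a simplified analogue of a helper that turns an evaluated node into platform names. Only the standard library is used.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Node is a hypothetical stand-in for yaml.Node. A scalar uses Value; a
// sequence uses Content.
type Node struct {
	Value   string
	Content []*Node
}

// evaluateGetResult builds and returns a new, evaluated node. It only reads
// from the node it was given and never writes to it.
func evaluateGetResult(node *Node, matrix map[string]string) *Node {
	out := &Node{Value: node.Value}
	for key, value := range matrix {
		out.Value = strings.ReplaceAll(out.Value, "${{ matrix."+key+" }}", value)
	}
	for _, child := range node.Content {
		out.Content = append(out.Content, evaluateGetResult(child, matrix))
	}
	return out
}

// asStringSlice flattens a scalar or sequence node into platform names.
func asStringSlice(node *Node) []string {
	if len(node.Content) == 0 {
		return []string{node.Value}
	}
	names := make([]string, 0, len(node.Content))
	for _, child := range node.Content {
		names = append(names, child.Value)
	}
	return names
}

// platformsParallel evaluates one shared node concurrently, once per matrix
// entry. Each goroutine writes only to its own slot in results.
func platformsParallel(shared *Node, names []string) [][]string {
	results := make([][]string, len(names))
	var wg sync.WaitGroup
	for i, name := range names {
		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			evaluated := evaluateGetResult(shared, map[string]string{"os": name})
			results[i] = asStringSlice(evaluated)
		}(i, name)
	}
	wg.Wait()
	return results
}

func main() {
	shared := &Node{Content: []*Node{{Value: "self-hosted"}, {Value: "${{ matrix.os }}"}}}
	names := []string{"ubuntu-latest", "windows-latest", "macos-latest"}
	fmt.Println(platformsParallel(shared, names))
	fmt.Println(shared.Content[1].Value) // the shared node still holds the expression
}
```

Because the shared node is only read, the goroutines need no lock around it; each one gets its own evaluated copy and its own platform list.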

Changes

  • pkg/runner/expression.go - Added new interface method
  • pkg/runner/run_context.go - Modified to use new method, added helpers

Fixes #5971

Issue nektos#5971: Matrix multi-runner inconsistency

When using matrix strategy with multiple platforms (windows-latest, macos-latest,
ubuntu-latest), results were inconsistent. Sometimes all 3 runners were reported
as unsupported, even when ubuntu-latest was properly configured.

Root cause: EvaluateYamlNode modifies the yaml.Node in-place via ret.Decode(node).
Multiple matrix jobs running in parallel share the same Job object, so concurrent
calls to runsOnPlatformNames() raced to modify job.RawRunsOn, causing inconsistent
evaluation results.

Solution:
- Added EvaluateYamlNodeGetResult method to the ExpressionEvaluator interface
  that returns the evaluated node without modifying the original
- Modified runsOnPlatformNames to use the new method
- Added helper functions extractRunsOnFromNode and nodeAsStringSlice
  to extract platform names from the returned evaluated node

This fix ensures each matrix job evaluates its own runs-on expression
without interfering with other parallel matrix jobs.

Development

Successfully merging this pull request may close these issues.

Inconsistent results when using multiple runners in a matrix
