What happened:
updateNextBatch in pkg/util/inplaceupdate/inplace_update.go reads the InPlaceUpdateState annotation from the original pod argument instead of the freshly-fetched clone inside its retry.RetryOnConflict loop (line 267).
On conflict retries, the state is never refreshed: the NextContainerImages, NextContainerRefMetadata, and NextContainerResources used to build the UpdateSpec always come from the pod object passed in by the caller, even if the pod's annotations were modified between retries (e.g., by kubelet or a concurrent reconcile). This can result in wrong container images being applied or pods getting stuck in Updating state during multi-batch in-place updates.
clone, err := c.podAdapter.GetPod(pod.Namespace, pod.Name) // fresh fetch ✓
// ...
if stateStr, ok := appspub.GetInPlaceUpdateState(pod); !ok { // reads stale 'pod' ✗ — should be 'clone'
The sibling function finishGracePeriod in the same file already follows the correct pattern, reading state from clone on every retry (lines 203, 213).
What you expected to happen:
updateNextBatch should read InPlaceUpdateState from the freshly-fetched clone on each retry iteration, consistent with how finishGracePeriod handles it.
How to reproduce it (as minimally and precisely as possible):
Configure a CloneSet with containers that have different in-place update priorities (triggering multi-batch updates).
Trigger an in-place update so updateNextBatch is called for the second batch.
If a conflict occurs on the first UpdatePod call inside updateNextBatch (e.g., due to a concurrent kubelet status update), the retry will use the stale state from the original pod instead of re-reading from the API server.
This is a race condition, so it's hard to reproduce deterministically without injecting a conflict into the retry path.
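To make the race deterministic, one option is to inject the conflict from a fake store. The sketch below is a simplified, standard-library-only model and not Kruise code: fakePod, fakeStore, retryOnConflict, and the readFromClone switch are all hypothetical names, and retryOnConflict only approximates the behaviour of retry.RetryOnConflict. It shows how reading state from the original argument keeps the stale value after a conflict, while reading from the fresh clone picks up the concurrent change.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a Kubernetes 409 Conflict error (assumption).
var errConflict = errors.New("conflict")

// fakePod is a hypothetical, stripped-down pod.
type fakePod struct {
	ResourceVersion int
	State           string // stands in for the InPlaceUpdateState annotation
	Applied         string // what the update wrote, derived from State
}

// fakeStore is a hypothetical in-memory stand-in for the API server.
type fakeStore struct {
	pod             fakePod
	injectConflicts int // number of concurrent writes to simulate
}

func (s *fakeStore) Get() fakePod { return s.pod }

// Update rejects writes based on an outdated ResourceVersion. While
// injectConflicts > 0 it first simulates a concurrent writer that
// changes the state, which forces a conflict for the caller.
func (s *fakeStore) Update(p fakePod) error {
	if s.injectConflicts > 0 {
		s.injectConflicts--
		s.pod.State = "next=v2"
		s.pod.ResourceVersion++
	}
	if p.ResourceVersion != s.pod.ResourceVersion {
		return errConflict
	}
	p.ResourceVersion++
	s.pod = p
	return nil
}

// retryOnConflict is a simplified stand-in for retry.RetryOnConflict.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// updateNextBatch models the retry loop. With readFromClone=false it
// reads state from the original argument (the reported bug); with
// readFromClone=true it reads from the freshly fetched clone (the fix).
func updateNextBatch(s *fakeStore, pod fakePod, readFromClone bool) (string, error) {
	err := retryOnConflict(3, func() error {
		clone := s.Get() // fresh fetch on every attempt
		src := pod
		if readFromClone {
			src = clone
		}
		clone.Applied = src.State
		return s.Update(clone)
	})
	return s.pod.Applied, err
}

func main() {
	for _, fixed := range []bool{false, true} {
		s := &fakeStore{pod: fakePod{State: "next=v1"}, injectConflicts: 1}
		applied, err := updateNextBatch(s, s.Get(), fixed)
		fmt.Printf("readFromClone=%v applied=%q err=%v\n", fixed, applied, err)
	}
}
```

In this model the first attempt always conflicts. The variant that reads from the original argument then applies "next=v1" on retry, while the variant that reads from the clone applies "next=v2". A real regression test would need the same kind of injection in whatever fake pod adapter the Kruise test suite provides.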
Anything else we need to know?:
The fix is a one-line change — pod → clone on line 267. I have a PR ready for this.
Environment:
Kruise version: master (HEAD)
Kubernetes version: all versions affected
Install details: N/A — code-level bug
Others: N/A