cluster: abort force scale-in when PD delete fails #2704

Open
lhy1024 wants to merge 3 commits into pingcap:master from lhy1024:delete-store

Conversation


@lhy1024 lhy1024 commented Apr 29, 2026

What problem does this PR solve?

Close #2705

When `tiup cluster scale-in --force` removes a TiKV instance, TiUP currently ignores any error returned when deleting the store/member from PD, and goes on to stop and destroy the instance and remove it from the topology.

If PD rejects the delete-store request, for example because removing the TiKV store would leave fewer up stores than max-replicas, the store remains Up / Serving in PD while `tiup cluster display` no longer shows the instance.
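
To make the rejection condition concrete, here is a minimal sketch of the kind of replica check described above. `canDeleteStore` and its parameters are hypothetical names used for illustration only; this is not PD's actual implementation or error text.

```go
package main

import "fmt"

// canDeleteStore is a hypothetical stand-in for the replica check described
// above. It rejects a delete when the stores left after removal would be
// fewer than max-replicas. It is not PD's real code.
func canDeleteStore(upStores, maxReplicas int) error {
	remaining := upStores - 1
	if remaining < maxReplicas {
		return fmt.Errorf("cannot delete store: %d up stores would remain, fewer than max-replicas=%d",
			remaining, maxReplicas)
	}
	return nil
}

func main() {
	// Three up stores with max-replicas=3: deleting one is rejected.
	fmt.Println(canDeleteStore(3, 3))
	// Four up stores with max-replicas=3: deleting one is accepted.
	fmt.Println(canDeleteStore(4, 3))
}
```

In the rejected case the store stays registered in PD, which is why continuing with stop/destroy on the TiUP side leaves the two views inconsistent.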

What is changed and how it works?

  • Keep using the normal PD delete-store API for TiKV/TiFlash scale-in. This PR does not mark the store as physically destroyed and does not pass PD's force query.
  • In the scale-in --force path, return the PD delete error immediately instead of warning and continuing.
  • This prevents TiUP from stopping/destroying the instance and updating topology when PD has not accepted the store/member deletion.
  • Add a unit test to ensure PDClient.DelStore does not send the force query parameter.
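
The control-flow change in the bullets above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in `pkg/cluster/operation/scale_in.go`: the `storeDeleter` interface, the `instance` struct, the two fake PD types, and `forceScaleIn` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// storeDeleter is a hypothetical interface standing in for the PD client.
type storeDeleter interface {
	DelStore(addr string) error
}

// rejectingPD simulates PD refusing the delete-store request.
type rejectingPD struct{}

func (rejectingPD) DelStore(addr string) error {
	return errors.New("pd rejected delete-store")
}

// acceptingPD simulates PD accepting the delete-store request.
type acceptingPD struct{}

func (acceptingPD) DelStore(addr string) error { return nil }

// instance records what this sketch did to a node (hypothetical type).
type instance struct {
	Addr       string
	Destroyed  bool
	InTopology bool
}

// forceScaleIn models the behavior described in this PR: if PD does not
// accept the deletion, return the error before touching the instance.
// The previous behavior, per the description, was to warn and continue.
func forceScaleIn(pd storeDeleter, inst *instance) error {
	if err := pd.DelStore(inst.Addr); err != nil {
		return fmt.Errorf("delete store %s from PD: %w", inst.Addr, err)
	}
	// Only reached when PD accepted the deletion.
	inst.Destroyed = true
	inst.InTopology = false
	return nil
}

func main() {
	inst := &instance{Addr: "10.0.0.1:20160", InTopology: true}
	err := forceScaleIn(rejectingPD{}, inst)
	fmt.Println(err != nil, inst.Destroyed, inst.InTopology)

	err = forceScaleIn(acceptingPD{}, inst)
	fmt.Println(err != nil, inst.Destroyed, inst.InTopology)
}
```

Returning early keeps the topology and PD in agreement: the instance is only stopped, destroyed, and dropped from the topology once PD has accepted the store/member deletion.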

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release notes:

Fix an issue that `tiup cluster scale-in --force` could remove a TiKV instance from topology even when PD rejected deleting its store.

Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot Bot requested a review from nexustar April 29, 2026 05:54

ti-chi-bot Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xhebox for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Apr 29, 2026

codecov-commenter commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 42.14%. Comparing base (9bdae04) to head (77a4237).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/cluster/operation/scale_in.go | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2704      +/-   ##
==========================================
- Coverage   42.71%   42.14%   -0.57%     
==========================================
  Files         424      424              
  Lines       49744    47146    -2598     
==========================================
- Hits        21244    19868    -1376     
+ Misses      25803    24595    -1208     
+ Partials     2697     2683      -14     

☔ View full report in Codecov by Sentry.

Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot Bot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Apr 29, 2026
Signed-off-by: lhy1024 <admin@liudos.us>
@ti-chi-bot ti-chi-bot Bot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) and removed the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Apr 30, 2026
@lhy1024 lhy1024 changed the title from "pd: use physically destroyed when tiup using --force" to "cluster: abort force scale-in when PD delete fails" Apr 30, 2026
Labels

size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

scale-in --force leaves removed TiKV store as Up/Serving in PD

2 participants