Allow CLUSTERSCAN 0 to be executed on any node directly #3675

Open
enjoy-binbin wants to merge 4 commits into valkey-io:unstable from enjoy-binbin:clusterscan0

@enjoy-binbin (Member) commented May 12, 2026

Currently, for the initial cursor (CLUSTERSCAN 0), we calculate the
hash slot of the string "0" (which is 13907) and redirect the request
to the node that owns that slot. However, the initial cursor "0"
should, in principle, be executable on any node, since its sole
purpose is to return the next CLUSTERSCAN cursor:

    /* Handle cursor "0" case. If slot information is provided we return
     * the updated cursor to scan input slot, else scan from slot 0. */
    if (strcmp(objectGetVal(c->argv[1]), "0") == 0) {
        if (opts.input_slot != -1) {
            slot = opts.input_slot;
        } else if (opts.match_slot != -1) {
            slot = opts.match_slot; /* If match maps to a particular slot, start scan from there */
        } else {
            slot = 0;
        }

        addReplyArrayLen(c, 2);
        if (skip_scan) {
            addReplyBulkCString(c, "0");
        } else {
            sds new_cursor = sdscatfmt(sdsempty(), "0-{%s}-0", crc16_slot_table[slot]);
            addReplyBulkSds(c, new_cursor);
        }
        addReplyArrayLen(c, 0);
        return;
    }
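
The slot arithmetic behind the current redirect can be reproduced outside the server. The following is a minimal standalone sketch, assuming the standard cluster key hashing (CRC16/XMODEM modulo 16384, hashing only the first non-empty `{hashtag}` when one is present). The names `crc16_xmodem` and `key_hash_slot` are illustrative and are not the server's own functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection. */
static uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical helper: map a key to one of 16384 slots. If the key contains
 * a non-empty {hashtag}, only the text between the braces is hashed. */
static unsigned int key_hash_slot(const char *key) {
    assert(key != NULL);
    size_t len = strlen(key);
    const char *open = memchr(key, '{', len);
    if (open != NULL) {
        const char *close = memchr(open + 1, '}', len - (size_t)(open + 1 - key));
        if (close != NULL && close > open + 1) {
            return crc16_xmodem(open + 1, (size_t)(close - open - 1)) & 16383;
        }
    }
    return crc16_xmodem(key, len) & 16383;
}
```

With these definitions, `key_hash_slot("0")` evaluates to 13907 and `key_hash_slot("0-{06S}-0")` evaluates to 0, which matches the redirects shown in the "before" transcript.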

Add clusterscanGetKeys: for cursor "0" it returns no keys, so the
command is handled locally; for any other cursor it returns the cursor
argument itself, so the embedded {hashtag} routes the command to the
correct slot owner.
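
A rough sketch of that decision, written as standalone C rather than against the server's real get-keys types: `sketch_key_ref`, `SKETCH_KEY_NOT_KEY`, and `sketch_clusterscan_get_keys` are hypothetical stand-ins, and the actual callback in this PR may look different.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_KEY_NOT_KEY (1 << 0) /* stand-in for the server's CMD_KEY_NOT_KEY flag */

/* Simplified stand-in for one entry of a get-keys result. */
typedef struct {
    int pos;   /* index of the argument in argv */
    int flags; /* SKETCH_* flags describing the argument */
} sketch_key_ref;

/* Writes at most one entry to `out` and returns how many were written.
 * argv[0] is the command name and argv[1] is the cursor. */
static int sketch_clusterscan_get_keys(const char **argv, int argc, sketch_key_ref *out) {
    assert(argv != NULL && out != NULL && argc >= 2);
    if (strcmp(argv[1], "0") == 0) {
        return 0; /* initial cursor: nothing to route on, so any node can answer */
    }
    out[0].pos = 1;                    /* the cursor carries the {hashtag} used for routing */
    out[0].flags = SKETCH_KEY_NOT_KEY; /* it is a routing token, not a user key */
    return 1;
}
```

Called with `{"clusterscan", "0"}` this reports zero entries; called with `{"clusterscan", "0-{06S}-0"}` it reports position 1 flagged as a non-key.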

doesCommandHaveKeys: when a command has a getkeys_proc but all of its
key-specs are NOT_KEY (e.g. CLUSTERSCAN), treat it as having no real
keys, so that ACL key checks, COMMAND GETKEYS, and the Module API are
not misled by routing-only tokens.
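
The rule can be sketched as a small predicate over a command's key-spec flags. This is an illustration only: `sketch_command` and `SPEC_FLAG_NOT_KEY` are hypothetical, and the fallback for a command with a getkeys_proc but no key-specs at all is an assumption (the description above only covers the all-NOT_KEY case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SPEC_FLAG_NOT_KEY (1 << 0) /* stand-in for CMD_KEY_NOT_KEY */

/* Simplified stand-in for the parts of a command definition that matter here. */
typedef struct {
    const int *key_spec_flags; /* one flags word per declared key-spec */
    size_t num_key_specs;
    bool has_getkeys_proc;     /* command supplies a custom key-extraction callback */
} sketch_command;

static bool sketch_command_has_keys(const sketch_command *cmd) {
    assert(cmd != NULL);
    for (size_t i = 0; i < cmd->num_key_specs; i++) {
        if (!(cmd->key_spec_flags[i] & SPEC_FLAG_NOT_KEY)) {
            return true; /* at least one key-spec names a real key */
        }
    }
    if (cmd->num_key_specs == 0) {
        /* ASSUMPTION: with no key-specs declared, defer to the callback. */
        return cmd->has_getkeys_proc;
    }
    return false; /* every key-spec is NOT_KEY: routing tokens only */
}
```

Under this sketch, a CLUSTERSCAN-like command (one NOT_KEY key-spec plus a callback) reports no keys, while a command with an ordinary key-spec still reports that it has keys.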

ACLSelectorCheckKey: As a defense-in-depth measure, skip key-pattern
ACL validation for entries flagged CMD_KEY_NOT_KEY, since they are
routing tokens (e.g. CLUSTERSCAN cursor) rather than real user keys.
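
As a simplified illustration of this early return, the sketch below checks an argument against a single allowed pattern and skips the check when the argument is flagged as a non-key. Everything here is hypothetical: `sketch_acl_check_key` and `ACL_SKETCH_NOT_KEY` are stand-ins, and the toy matcher only understands a trailing `*`, whereas real ACL key patterns use glob-style matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ACL_SKETCH_NOT_KEY (1 << 0) /* stand-in for CMD_KEY_NOT_KEY */

/* Toy matcher: a trailing '*' matches any suffix, otherwise exact match. */
static bool sketch_prefix_match(const char *pattern, const char *key) {
    size_t plen = strlen(pattern);
    if (plen > 0 && pattern[plen - 1] == '*') {
        return strncmp(pattern, key, plen - 1) == 0;
    }
    return strcmp(pattern, key) == 0;
}

/* Returns true when the argument is allowed for a user restricted to
 * `allowed_pattern`. */
static bool sketch_acl_check_key(const char *allowed_pattern, const char *arg, int key_flags) {
    assert(allowed_pattern != NULL && arg != NULL);
    if (key_flags & ACL_SKETCH_NOT_KEY) {
        return true; /* routing token, not user data: no pattern check applies */
    }
    return sketch_prefix_match(allowed_pattern, arg);
}
```

For a user limited to `app:*`, the cursor `0-{06S}-0` passes when flagged as a non-key and would be rejected if it were treated as an ordinary key.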

In an empty three-shard cluster, before:

❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
-> Redirected to slot [13907] located at 127.0.0.1:30003
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30003> clusterscan 0-{06S}-0
-> Redirected to slot [0] located at 127.0.0.1:30001
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)

In an empty three-shard cluster, after:

❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{06S}-0
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)

CLUSTERSCAN was introduced in #2934.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Comment thread src/db.c

@madolson (Member) left a comment

I prefer the way it is today for clients specifically. I think it's easier on the client side to mark the second key as opposed to a purpose built get keys command. The only benefit we get is that you can send the command to a random node.

Comment thread src/db.c Outdated
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
coderabbitai Bot commented May 13, 2026

Walkthrough

CLUSTERSCAN cursors are treated as routing-only tokens (CMD_KEY_NOT_KEY) for ACL and key extraction. A new clusterscanGetKeys reports no keys for cursor "0" and reports any other cursor as a CMD_KEY_NOT_KEY argument, the doesCommandHaveKeys logic is adjusted, the command metadata is wired to the new callback, and tests are added for ACL and cluster behavior.

Changes

CLUSTERSCAN cluster routing

| Layer / File(s) | Summary |
|---|---|
| ACL bypass for routing-only tokens (`src/acl.c`) | ACLSelectorCheckKey() returns early for keyspec entries marked with CMD_KEY_NOT_KEY, bypassing normal key-pattern ACL validation for routing-only tokens. |
| CLUSTERSCAN key extraction and doesCommandHaveKeys adjustment (`src/server.h`, `src/db.c`) | Added clusterscanGetKeys: cursor "0" => numkeys=0; non-"0" cursor => report argv[1] as CMD_KEY_NOT_KEY. doesCommandHaveKeys() now treats commands as having keys only if at least one keyspec is not CMD_KEY_NOT_KEY; commands with only CMD_KEY_NOT_KEY specs and a getkeys_proc report no key args. |
| Command metadata wiring (`src/commands/clusterscan.json`, `src/commands.def`) | CLUSTERSCAN command entry now references clusterscanGetKeys as its get_keys_function callback instead of NULL. |
| ACL regression and cluster behavior tests (`tests/unit/cluster/clusterscan.tcl`, `tests/unit/introspection-2.tcl`) | Added ACL regression tests confirming restricted users can run CLUSTERSCAN without ACL errors on cursors, cluster tests asserting clusterscan 0 returns initial cursor and empty keys on all nodes, and introspection tests asserting COMMAND GETKEYS* reports no key args for NOT_KEY-routed commands including CLUSTERSCAN cursor forms. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant CLUSTERSCAN_Command
  participant clusterscanGetKeys
  participant ACLSelectorCheckKey
  participant ClusterRouter

  Client->>CLUSTERSCAN_Command: clusterscan 0
  CLUSTERSCAN_Command->>clusterscanGetKeys: extract keys from argv
  clusterscanGetKeys->>clusterscanGetKeys: cursor == "0"? Yes
  clusterscanGetKeys-->>CLUSTERSCAN_Command: numkeys = 0
  CLUSTERSCAN_Command->>ACLSelectorCheckKey: validate permissions (no keys)
  ACLSelectorCheckKey-->>CLUSTERSCAN_Command: allow
  CLUSTERSCAN_Command-->>Client: initial cursor, empty key array

  Client->>CLUSTERSCAN_Command: clusterscan 0-{06S}-0
  CLUSTERSCAN_Command->>clusterscanGetKeys: extract keys from argv
  clusterscanGetKeys->>clusterscanGetKeys: cursor == "0"? No
  clusterscanGetKeys-->>CLUSTERSCAN_Command: numkeys = 1, pos=1 flagged CMD_KEY_NOT_KEY
  CLUSTERSCAN_Command->>ACLSelectorCheckKey: validate permissions for reported arg
  ACLSelectorCheckKey->>ACLSelectorCheckKey: sees CMD_KEY_NOT_KEY -> bypass
  ACLSelectorCheckKey-->>CLUSTERSCAN_Command: allow routing token
  CLUSTERSCAN_Command->>ClusterRouter: route by embedded hashtag in cursor
  ClusterRouter-->>Client: scan results from target slot

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Allow CLUSTERSCAN 0 to be executed on any node directly' clearly and specifically describes the main change, which is enabling the initial cursor '0' to be handled locally without redirection. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly explains the problem (unnecessary redirection of CLUSTERSCAN 0), the solution (clusterscanGetKeys implementation), and provides concrete before/after examples demonstrating the behavior change. |

@enjoy-binbin (Member, Author)

@nmvk @madolson I don't know the ACL details as well as you do. I agree this is only a nice-to-have option, so let me know if we want to close it.

@coderabbitai coderabbitai Bot left a comment
🧹 Nitpick comments (1)
tests/unit/cluster/clusterscan.tcl (1)

67-94: 💤 Low value

Consider cleaning up the ACL user after the test.

The test creates user scan_acl_leak but doesn't delete it afterward. While this test block likely gets reset between runs, adding cleanup improves test isolation.

♻️ Suggested cleanup
         $rd read

         $rd close
+
+        # Clean up the test user
+        R 0 ACL DELUSER scan_acl_leak
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/cluster/clusterscan.tcl` around lines 67 - 94, Add cleanup to
remove the test ACL user "scan_acl_leak" after the test to avoid leaking state:
after the client $rd is closed (or immediately after the last $rd read), call
the corresponding ACL delete command (the inverse of R 0 ACL SETUSER used to
create the user) — e.g., issue R 0 ACL DELUSER scan_acl_leak — so the user
created by the test is removed when the test finishes.

📥 Commits

Reviewing files that changed from the base of the PR and between a813df0 and 688ee70.

📒 Files selected for processing (6)
  • src/acl.c
  • src/commands.def
  • src/commands/clusterscan.json
  • src/db.c
  • src/server.h
  • tests/unit/cluster/clusterscan.tcl

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.68%. Comparing base (a813df0) to head (87a1338).
⚠️ Report is 2 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3675      +/-   ##
============================================
- Coverage     76.94%   76.68%   -0.26%     
============================================
  Files           162      162              
  Lines         80656    80669      +13     
============================================
- Hits          62058    61865     -193     
- Misses        18598    18804     +206     
| Files with missing lines | Coverage Δ |
|---|---|
| src/acl.c | 92.53% <100.00%> (-0.12%) ⬇️ |
| src/commands.def | 100.00% <ø> (ø) |
| src/db.c | 94.85% <100.00%> (+0.04%) ⬆️ |
| src/server.h | 100.00% <ø> (ø) |

... and 21 files with indirect coverage changes

Comment thread src/acl.c
Signed-off-by: Binbin <binloveplay1314@qq.com>

@nmvk (Contributor) left a comment

LGTM, Thanks @enjoy-binbin.
