Allow CLUSTERSCAN 0 to be executed on any node directly#3675
Allow CLUSTERSCAN 0 to be executed on any node directly#3675enjoy-binbin wants to merge 4 commits into
Conversation
Currently, for the initial cursor—specifically `CLUSTERSCAN 0`,
we calculate the slot for "0" (yielding 13907) and then redirect
the request to the corresponding node. However, the initial cursor
"0" should, in principle, be executable by any node, as its sole
purpose is to return the next `CLUSTERSCAN` cursor:
```
/* Handle cursor "0" case. If slot information is provided we return
* the updated cursor to scan input slot, else scan from slot 0. */
if (strcmp(objectGetVal(c->argv[1]), "0") == 0) {
if (opts.input_slot != -1) {
slot = opts.input_slot;
} else if (opts.match_slot != -1) {
slot = opts.match_slot; /* If match maps to a particular slot, start scan from there */
} else {
slot = 0;
}
addReplyArrayLen(c, 2);
if (skip_scan) {
addReplyBulkCString(c, "0");
} else {
sds new_cursor = sdscatfmt(sdsempty(), "0-{%s}-0", crc16_slot_table[slot]);
addReplyBulkSds(c, new_cursor);
}
addReplyArrayLen(c, 0);
return;
}
```
Add clusterscanGetKeys: cursor "0" returns no keys (handle locally),
non-"0" cursors return the cursor itself so the embedded {hashtag}
routes to the correct slot owner.
In a three shards empty cluster, before:
```
❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
-> Redirected to slot [13907] located at 127.0.0.1:30003
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30003> clusterscan 0-{06S}-0
-> Redirected to slot [0] located at 127.0.0.1:30001
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)
```
In a three shards empty cluster, after:
```
❯ ./src/valkey-cli -p 30001 -c
127.0.0.1:30001> clusterscan 0
1) "0-{06S}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{06S}-0
1) "0-{8M}-0"
2) (empty array)
127.0.0.1:30001> clusterscan 0-{8M}-0
-> Redirected to slot [5461] located at 127.0.0.1:30002
1) "0-{63n}-0"
2) (empty array)
127.0.0.1:30002> clusterscan 0-{63n}-0
-> Redirected to slot [10923] located at 127.0.0.1:30003
1) "0"
2) (empty array)
```
CLUSTERSCAN was introduced in valkey-io#2934.
Signed-off-by: Binbin <binloveplay1314@qq.com>
madolson
left a comment
There was a problem hiding this comment.
I prefer the way it is today for clients specifically. I think it's easier on the client side to mark the second key as opposed to a purpose built get keys command. The only benefit we get is that you can send the command to a random node.
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
📝 WalkthroughWalkthroughCLUSTERSCAN cursors are treated as routing-only tokens (CMD_KEY_NOT_KEY) for ACL and key-extraction. A new clusterscanGetKeys marks/non-marks the cursor, doesCommandHaveKeys logic adjusted, command metadata wired to the new callback, and tests added for ACL and cluster behavior. ChangesCLUSTERSCAN cluster routing
Sequence DiagramsequenceDiagram
participant Client
participant CLUSTERSCAN_Command
participant clusterscanGetKeys
participant ACLSelectorCheckKey
participant ClusterRouter
Client->>CLUSTERSCAN_Command: clusterscan 0
CLUSTERSCAN_Command->>clusterscanGetKeys: extract keys from argv
clusterscanGetKeys->>clusterscanGetKeys: cursor == "0"? Yes
clusterscanGetKeys-->>CLUSTERSCAN_Command: numkeys = 0
CLUSTERSCAN_Command->>ACLSelectorCheckKey: validate permissions (no keys)
ACLSelectorCheckKey-->>CLUSTERSCAN_Command: allow
CLUSTERSCAN_Command-->>Client: initial cursor, local keys
Client->>CLUSTERSCAN_Command: clusterscan 0-{06S}-0
CLUSTERSCAN_Command->>clusterscanGetKeys: extract keys from argv
clusterscanGetKeys->>clusterscanGetKeys: cursor == "0"? No
clusterscanGetKeys-->>CLUSTERSCAN_Command: numkeys = 1, pos=1 flagged CMD_KEY_NOT_KEY
CLUSTERSCAN_Command->>ACLSelectorCheckKey: validate permissions for reported arg
ACLSelectorCheckKey->>ACLSelectorCheckKey: sees CMD_KEY_NOT_KEY -> bypass
ACLSelectorCheckKey-->>CLUSTERSCAN_Command: allow routing token
CLUSTERSCAN_Command->>ClusterRouter: route by embedded hashtag in cursor
ClusterRouter-->>Client: scan results from target slot
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/cluster/clusterscan.tcl (1)
67-94: 💤 Low valueConsider cleaning up the ACL user after the test.
The test creates user
scan_acl_leakbut doesn't delete it afterward. While this test block likely gets reset between runs, adding cleanup improves test isolation.♻️ Suggested cleanup
$rd read $rd close + + # Clean up the test user + R 0 ACL DELUSER scan_acl_leak }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/cluster/clusterscan.tcl` around lines 67 - 94, Add cleanup to remove the test ACL user "scan_acl_leak" after the test to avoid leaking state: after the client $rd is closed (or immediately after the last $rd read), call the corresponding ACL delete command (the inverse of R 0 ACL SETUSER used to create the user) — e.g., issue R 0 ACL DELUSER scan_acl_leak — so the user created by the test is removed when the test finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/cluster/clusterscan.tcl`:
- Around line 67-94: Add cleanup to remove the test ACL user "scan_acl_leak"
after the test to avoid leaking state: after the client $rd is closed (or
immediately after the last $rd read), call the corresponding ACL delete command
(the inverse of R 0 ACL SETUSER used to create the user) — e.g., issue R 0 ACL
DELUSER scan_acl_leak — so the user created by the test is removed when the test
finishes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 4ea48c2c-d593-456f-b0b0-32ac6af5740d
📒 Files selected for processing (6)
src/acl.csrc/commands.defsrc/commands/clusterscan.jsonsrc/db.csrc/server.htests/unit/cluster/clusterscan.tcl
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3675 +/- ##
============================================
- Coverage 76.94% 76.68% -0.26%
============================================
Files 162 162
Lines 80656 80669 +13
============================================
- Hits 62058 61865 -193
- Misses 18598 18804 +206
🚀 New features to boost your workflow:
|
Signed-off-by: Binbin <binloveplay1314@qq.com>
nmvk
left a comment
There was a problem hiding this comment.
LGTM, Thanks @enjoy-binbin.
Currently, for the initial cursor—specifically
CLUSTERSCAN 0,we calculate the slot for "0" (yielding 13907) and then redirect
the request to the corresponding node. However, the initial cursor
"0" should, in principle, be executable by any node, as its sole
purpose is to return the next
CLUSTERSCANcursor:Add clusterscanGetKeys: cursor "0" returns no keys (handle locally),
non-"0" cursors return the cursor itself so the embedded {hashtag}
routes to the correct slot owner.
doesCommandHaveKeys: When a command has getkeys_proc but all its
key-specs are NOT_KEY (e.g. CLUSTERSCAN), treat it as having no real
keys so that ACL key checks, COMMAND GETKEYS, and Module API are not
misled by routing-only tokens.
ACLSelectorCheckKey: As a defense-in-depth measure, skip key-pattern
ACL validation for entries flagged CMD_KEY_NOT_KEY, since they are
routing tokens (e.g. CLUSTERSCAN cursor) rather than real user keys.
In a three shards empty cluster, before:
In a three shards empty cluster, after:
CLUSTERSCAN was introduced in #2934.