Set repl-timeout for slotmigrations tests to prevent disconnections #3703
This test inserts keys and, as the logs show, the insertion is for some reason very slow in this CI, eventually causing a repl-timeout disconnection.

```
50655:M 14 May 2026 00:44:57.812 - DB 0: 5 keys (0 volatile) in 7 slots HT.
50655:M 14 May 2026 00:45:02.884 - DB 0: 6 keys (0 volatile) in 7 slots HT.
50655:M 14 May 2026 00:45:06.014 # Timing out slot migration xxx after not receiving ack for too long
```

Set repl-timeout for these tests to prevent the timeout disconnections.

Also, the tests do not actually require inserting distinct keys. We only need to fill the replication buffer, so there is no need for different key names, and reusing one key can save the CI some memory.

Closes valkey-io#3702.

Signed-off-by: Binbin <binloveplay1314@qq.com>
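To illustrate why reusing one key still fills the replication buffer, the sketch below computes how many bytes a `SET` command occupies when encoded in RESP. It assumes the writes are propagated as RESP-encoded commands. The helper names, the key names, and the value size are hypothetical and not taken from the test suite.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers for illustration only, not Valkey code. */

/* Bytes needed for one RESP bulk string: "$<len>\r\n<data>\r\n". */
static size_t resp_bulk_len(size_t data_len) {
    int digits = snprintf(NULL, 0, "%zu", data_len);
    return 1 + (size_t)digits + 2 + data_len + 2;
}

/* Bytes for "SET <key> <value>" encoded as a RESP array of three bulk strings. */
static size_t resp_set_len(const char *key, size_t value_len) {
    return 4 /* "*3\r\n" */
         + resp_bulk_len(3) /* "SET" */
         + resp_bulk_len(strlen(key))
         + resp_bulk_len(value_len);
}

/* Stream bytes produced by n writes to the same key.
 * The keyspace holds a single entry no matter how large n is. */
static size_t stream_bytes_same_key(const char *key, size_t value_len, size_t n) {
    return n * resp_set_len(key, value_len);
}
```

Under these assumptions, each write adds roughly the value size plus a small framing overhead to the stream, whether or not the key name changes. Only the number of entries kept in the keyspace differs, which is where the memory saving comes from.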
No actionable comments were generated in the recent review.
Walkthrough: Two test cases in the cluster migration test suite are updated: each receives an increased replication timeout override and a modification to how test data is populated. The data generation now writes the same key repeatedly instead of creating unique keys per loop iteration.
Changes: Cluster migration test reliability adjustments
Estimated code review effort: 2 (Simple), ~8 minutes
Pre-merge checks: 5 passed
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## unstable #3703 +/- ##
============================================
- Coverage 76.71% 76.69% -0.03%
============================================
Files 162 162
Lines 80656 80662 +6
============================================
- Hits 61872 61860 -12
- Misses 18784 18802 +18
rainsupreme left a comment:
The change looks good to me. Did you verify that this fixes the flakiness? It's weird that it would be so slow in the CI workflow 🤔
zuiderkwast left a comment:
Ah, I see the ASM code uses repl-timeout for this:

    if (last_interaction &&
        (server.unixtime - last_interaction > server.repl_timeout)) {
        serverLog(LL_WARNING,
                  "Timing out slot migration %s "
                  "after not receiving ack for too long",
                  job->description);

Then, I'm convinced it will fix the failure where this warning is logged. 👍
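The quoted condition can be restated as a small standalone predicate to see why a larger repl-timeout suppresses the warning. This is a minimal sketch, not the Valkey source: the function name and parameters are hypothetical, and treating a zero `last_interaction` as "nothing recorded yet" is an assumption drawn from the truthiness test in the snippet.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper mirroring the quoted check, not Valkey code.
 * Returns true when the last recorded interaction is older than
 * repl_timeout seconds. A zero last_interaction never times out. */
static bool migration_ack_timed_out(time_t now, time_t last_interaction,
                                    time_t repl_timeout) {
    return last_interaction != 0 && (now - last_interaction > repl_timeout);
}
```

With this predicate, a 10-second gap exceeds a 5-second timeout but not a 3600-second one, which matches the reasoning that raising repl-timeout in the tests avoids the disconnection. Note that the comparison is strict, so a gap exactly equal to the timeout does not trigger it.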
I did not bother to trigger the test in slow CI... I think the change is good enough to merge.