draft: make dict a fully concrete, transparent set of map helpers for hashtable, removing dict elements where redundant#3720
Conversation
|
Important Review skippedIgnore keyword(s) in the title. ⛔ Ignored keywords (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Comment |
Replace the dictAddOrFind + dictSetKey pattern in rememberReplicaKeyWithExpire with dictAddRaw, which separates existence-check from entry creation. On new entries, set de->key directly to the owned sds copy — no post-insert mutation needed. This is the only caller of dictSetKey, so remove it from the dict API entirely. 81 expire tests pass. Signed-off-by: Rain Valentine <rsg000@gmail.com> Signed-off-by: Rain Valentine <rainval@amazon.com>
dictEntry is a transparent struct — the accessor functions (dictGetKey, dictGetVal, dictSetVal, dictGetUnsignedIntegerVal, dictSetUnsignedIntegerVal, dictIncrUnsignedIntegerVal, etc.) add no abstraction value. They just wrap direct field access with extra function call syntax. Remove all accessor wrappers from dict.h and replace all call sites with direct field access: dictGetKey(de) -> de->key dictGetVal(de) -> de->v.val dictSetVal(d, de, v) -> de->v.val = v dictGetUnsignedIntegerVal(de) -> de->v.u64 dictSetUnsignedIntegerVal(de,v) -> de->v.u64 = v dictIncrUnsignedIntegerVal(de,v)-> de->v.u64 += v (or --de->v.u64) This makes the code shorter, removes a layer of indirection in reading, and makes it explicit that dictEntry is a plain data struct — not an opaque handle. dictEntryGetKey (the hashtableType callback) is retained since it serves as a function pointer for the hashtable infrastructure. 22 files changed, 227 insertions(+), 284 deletions(-). 954 tests pass (expire, scripting, multi, pubsub, latency, maxmemory, wait). Signed-off-by: Rain Valentine <rsg000@gmail.com> Signed-off-by: Rain Valentine <rainval@amazon.com>
Match the concise copyright format used by hashtable.h/hashtable.c. Signed-off-by: Rain Valentine <rainval@amazon.com>
rainsupreme
left a comment
There was a problem hiding this comment.
It's a little rough maybe, but fine for discussion
3008fec to
b4d6efb
Compare
There was a problem hiding this comment.
Interesting!
You removed the most common accessors like dictGetVal and the diff is just about 300 lines in the whole code base! I'm surprised it's not more.
Is the point to remove all abstractions around dictEntry, so that we don't have a leaky abstraction? Either fully opaque or fully concrete, but not a leaky abstraction with non-opaque types?
I could buy that... My main argument against it is to avoid touching all these lines.
In the past, long ago, the dictEntry was not opaque, yet there were accessors like dictGetVal, though they were just simple defines: https://github.com/valkey-io/valkey/blob/6.0/src/dict.h#L144
The reason I made it opaque (in 7.2) was to be able to mess with the internals and do pointer tagging for the dictEntry pointer. I wouldn't have proposed it otherwise. But also, I wouldn't have proposed swapping all dictGetVal to direct field access either... Avoiding unnecessary changes was a strong desire of the maintainers at the time.
|
@zuiderkwast Yeah, exactly! 😁 My opinion is that it should either be fully opaque or fully concrete "in the end", and this is my idea of what the fully-concrete version might look like. The first commit alone is more conservative - it's just a version that gets rid of dictSetKey by taking advantage of the fact that dict is transparent. Ignoring dictSetKey for now, I'm more interested in what you think of this style of using dict where I reach through to the hashtable underneath. For some reason I felt like I was sinning when I wrote it, but maybe I'm just used to the old dict 😆 I think I disagree with the old maintainers about avoiding change 😅 I lean towards frequently cleaning and organizing as our needs change and evolve, even if it doesn't fix a bug, add a feature, or improve performance. I've had challenging experiences in the past where the team had leaned too heavily in the "avoid changing too much" direction, and it resulted in massive built up tech debt that was miserable to work with and seriously slowed us down. Of course, to go the always-cleaning route you need very strong unit tests and integration tests to make reviewing easier and avoid accidental regression, but Valkey is doing pretty well in that regard. As for solid reasons to go the opaque route instead - I think embedding keys into dictEntry could make sense to eliminate 8B overhead and one memory fetch. If we did that then maybe opaque starts to make more sense, as it would help prevent bugs from people incorrectly accessing/modifying it. 🤷♀️ On the other hand, maybe those cases should be migrated to hashtable instead - it might make more sense to embed the key data in whatever object we're indexing. I raised this PR as an example: #3706 |
… directly Remove the typedef aliases (dict -> hashtable, dictType -> hashtableType, dictIterator -> hashtableIterator) and update all callers to use the hashtable type names directly. This makes it explicit that dict.h is a helper layer on top of hashtable, not a separate type system. The dict* operation helpers (dictFind, dictAdd, dictDelete, etc.) remain as convenience functions that manage dictEntry allocation. 29 files changed, 279/279 unit tests pass. Signed-off-by: Rain Valentine <rainval@amazon.com>
|
just to see what it looks like for discussion, I added a fourth commit that removes the dict and dictType typedefs. Makes it about a 600 line change |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #3720 +/- ##
============================================
+ Coverage 76.65% 76.68% +0.03%
============================================
Files 162 162
Lines 80662 80645 -17
============================================
+ Hits 61830 61846 +16
+ Misses 18832 18799 -33
🚀 New features to boost your workflow:
|
disclaimer: This is mainly for discussion purposes, and represents one potential final state for dict as convenience helpers for using hashtable as a map.
Summary
This is an alternative approach to #3566 (eliminate
dictSetKey). Instead of migratingreplicaKeysWithExpireto a new hashtable struct, this PR:Eliminates
dictSetKeyby usingdictAddRawto get the insert position, then setting the owned key directly on the new entry — no post-insert key mutation needed.Removes all dict accessor wrappers (
dictGetKey,dictGetVal,dictSetVal,dictGetUnsignedIntegerVal,dictSetUnsignedIntegerVal,dictIncrUnsignedIntegerVal, etc.) and replaces all call sites with direct field access (de->key,de->v.val,de->v.u64).Motivation
If we treat
dictEntryas a transparent struct (which it is — it's defined in the public header), then the accessor functions add no abstraction value. They just wrapde->fieldwith extra syntax.This reframes dict as a set of operation helpers (find, add, delete, iterate) that manage the dictEntry allocation + hashtable position dance, rather than an opaque abstraction that hides its internals behind getters/setters.
What remains in dict.h after this
dict = hashtable,dictType = hashtableType,dictIterator = hashtableIteratordictSize,dictCreate,dictRelease, etc. (just naming)dictAdd,dictAddRaw,dictFind,dictDelete,dictNext, etc. (handle zmalloc + cast)dictEntryGetKey(hashtableType callback),dictEntryMemUsage,dictMemUsageFor discussion
Opening as draft to explore what "dict as transparent convenience layer" looks like in practice. This is relevant to the ongoing discussion about dict's long-term role (#3561, #3566).
22 files changed, ~240 insertions, ~290 deletions. All tests pass.