[#10474] docs: Add concurrency-control design for multi-node Gravitino (TreeLock)#11888
Open
yuqi1129 wants to merge 1 commit into
Open
[#10474] docs: Add concurrency-control design for multi-node Gravitino (TreeLock)#11888yuqi1129 wants to merge 1 commit into
yuqi1129 wants to merge 1 commit into
Conversation
…avitino Analyzes what TreeLock protects today, why it breaks under HA, and compares two directions (remove the lock + DB-level OCC vs. a cross-node lock). Concludes with the remove/shrink direction and a phased plan. Includes rendered diagrams under design-docs/images and their .mmd sources under design-docs/diagrams.
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a new design document analyzing why the current in-process TreeLock does not provide cross-node safety in HA deployments, compares “DB-driven correctness (OCC + conditional inserts)” vs “distributed locking”, and recommends Direction 1 (move correctness to the shared database, then shrink TreeLock to a small per-entity in-process helper).
Changes:
- Added a detailed HA concurrency-control design doc for TreeLock limitations and the dual-write model (external catalog + Gravitino store).
- Documented a phased proposal centered on OCC retries scoped to the relational store layer, always-increasing versioning, and parent-aliveness conditional inserts.
- Embedded/linked rendered diagrams to illustrate write flows, import/read-repair, and tradeoffs.
| /metalake/cat/db/t1 READ (parent write-locked) (parent write-locked) | ||
| ``` | ||
|
|
||
| This is built on `LockManager`, which keeps an in-memory tree of `TreeLockNode`s (each one wraps a `ReentrantReadWriteLock`), plus reference counting, a background thread that removes unused nodes, and another background thread that checks for deadlocks inside the same JVM. It is about 700 lines of code in total. |
|
|
||
| ### Not every catalog has an external system that decides the winner | ||
|
|
||
| Whether the external system can act as the judge depends on the catalog. In the code this is the `managedStorage` capability (`Capability` in `core/.../connector/capability/Capability.java`; the default returns "managed" only for functions). The catalogs split into two groups: |
Comment on lines
+98
to
+99
|  | ||
|
|
Code Coverage Report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Adds a design document (
design-docs/treelock-necessity-and-concurrency-design.md, with rendered diagrams) that:TreeLockprotects today and why it is per-JVM, so it stops protecting anything across servers under HA.Rendered diagrams are embedded under
design-docs/images/treelock-*.png.Why are the changes needed?
#10474 asks for a concrete proposal on TreeLock's limitations for HA. This document provides the analysis and a recommended direction before any code change.
Related to #10474
Does this PR introduce any user-facing change?
No. Documentation only.
How was this patch tested?
Not applicable — documentation only. Diagrams were rendered from their
.mmdsources with mermaid-cli and embedded as PNGs.