Skip to content

fix(db): resolve resource leakage in database close()#6688

Open
warku123 wants to merge 17 commits intotronprotocol:developfrom
warku123:481_database_leakage_fix
Open

fix(db): resolve resource leakage in database close()#6688
warku123 wants to merge 17 commits intotronprotocol:developfrom
warku123:481_database_leakage_fix

Conversation

@warku123
Copy link
Copy Markdown

What does this PR do?

Fix resource leakage in CheckPointV2Store.close() and TronDatabase.close() by ensuring dbSource.closeDB() is always executed, even when writeOptions.close() throws an exception.

Changes:

  • TronDatabase.close(): Split the single try-catch block into two independent try-catch blocks for writeOptions.close() and dbSource.closeDB(), so that a failure in one does not prevent the other from executing.
  • CheckPointV2Store.close(): Close the subclass-specific writeOptions first, then delegate to super.close() for parent resource cleanup, removing duplicated close logic and logging.
  • CheckPointV2StoreTest: Add two test cases:
    • testCloseReleasesAllResources — verifies that dbSource.closeDB() is called during normal close.
    • testCloseDbSourceWhenWriteOptionsThrows — mocks both child and parent writeOptions.close() to throw exceptions, then verifies dbSource.closeDB() is still called.

Why are these changes required?

Previously, writeOptions.close() and dbSource.closeDB() were in the same try block in both TronDatabase.close() and CheckPointV2Store.close(). If writeOptions.close() threw an exception, dbSource.closeDB() would be skipped, causing the underlying database (LevelDB/RocksDB) to remain open and leak native resources. This is particularly problematic in environments where multiple database instances are created and destroyed.

This PR has been tested by:

  • Unit Tests (CheckPointV2StoreTest — both happy path and exception scenario)
  • Manual Testing

Follow up

N/A

Extra details

This is a revised version of #6621, addressing all review feedback:

  • Fixed the root cause in TronDatabase.close() (parent class), not just the subclass
  • Tests now cover the actual bug scenario (writeOptions throwing exceptions)
  • Tests verify dbSource.closeDB() is called, which is the core guarantee of this fix
  • Removed vacuous assertions, duplicate close calls in finally blocks, and redundant Javadoc

warku123 and others added 5 commits February 6, 2026 16:49
Split the try-catch block to ensure parent resources are released
even when writeOptions.close() throws exception.

- Add proper cleanup in finally block to prevent cross-test contamination
- Add Javadoc annotations for test class and methods
- Fix checkstyle violations (import order and line length)

Test: Add unit test to verify super.close() is always called
Split the single try-catch block in TronDatabase.close() into separate
blocks for writeOptions and dbSource to ensure dbSource.closeDB() runs
even when writeOptions.close() throws an exception.

Rewrite tests to verify the exception scenario: mock writeOptions to
throw and assert that dbSource.closeDB() is still called.
@warku123 warku123 changed the title fix(db): resolve resource leakage in CheckPointV2Store and TronDatabase close() fix(db): resolve resource leakage in database close() Apr 16, 2026
@halibobo1205 halibobo1205 added the topic:DB Database label Apr 16, 2026
@halibobo1205 halibobo1205 added this to the GreatVoyage-v4.8.2 milestone Apr 16, 2026
logger.warn("Failed to close dbSource in {}.", getName(), e);
}
logger.info("******** End to close {}. ********", getName());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix for splitting the try-catch blocks is clean and directly addresses the root cause — really nice work isolating each resource cleanup independently!

One small thing: the "End to close" log moved from a finally block to a bare statement after both try-catch blocks. In the (admittedly rare) event that an unchecked Error (e.g. OutOfMemoryError) escapes from dbSource.closeDB(), this line won't execute — whereas the original finally guaranteed it would. Would it be worth wrapping both try-catch blocks in an outer try-finally?

try {
  try { writeOptions.close(); } catch (Exception e) { ... }
  try { dbSource.closeDB(); } catch (Exception e) { ... }
} finally {
  logger.info("******** End to close {}. ********", getName());
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! Good catch on the Error case. I've moved the log into a finally on the second try-catch block instead of adding an outer try-finally — same guarantee, slightly less nesting. Updated in bcb7c0b.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, putting the log in the second finally is a clean alternative — less nesting with the same guarantee. Looks good!

Move the "End to close" log back into a finally block so it is
guaranteed to execute even if dbSource.closeDB() throws an Error.
logger.debug("******** End to close {}. ********", getName());
logger.warn("Failed to close writeOptions in {}.", getName(), e);
}
super.close();
Copy link
Copy Markdown
Collaborator

@halibobo1205 halibobo1205 Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[[SHOULD]]CheckPointV2Store.close() previously logged at debug on purpose — SnapshotManager.createCheckpoint() opens/closes one on every checkpoint flush. Delegating to super.close() promotes those lines to info and will spam logs.

Suggest a protected doClose() hook in the parent so the subclass can keep its debug level without duplicating logs:

// TronDatabase
@Override
public void close() {
  logger.info("******** Begin to close {}. ********", getName());
  doClose();
  logger.info("******** End to close {}. ********", getName());
}

protected void doClose() {
  try { writeOptions.close(); }
 catch (Exception e) { logger.warn("Failed to close writeOptions in {}.", getName(), e); }
  try { dbSource.closeDB(); }
  catch (Exception e) { logger.warn("Failed to close dbSource in {}.", getName(), e); }
}
// CheckPointV2Store
@Override
public void close() {
  logger.debug("******** Begin to close {}. ********", getName());
  try { writeOptions.close(); }
  catch (Exception e) { logger.warn("Failed to close writeOptions in {}.", getName(), e); }
  doClose();
  logger.debug("******** End to close {}. ********", getName());
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Implemented the doClose() hook in 39cddb5 — CheckPointV2Store.close() now logs at debug level and calls doClose() directly, so high-frequency checkpoint flushes won't spam the info log.

…ug level

TronDatabase.close() now delegates resource cleanup to a protected
doClose() method, keeping info-level logging in the base class.
CheckPointV2Store.close() overrides with debug-level logging and calls
doClose() directly, avoiding log spam on high-frequency checkpoint flushes.
@warku123 warku123 force-pushed the 481_database_leakage_fix branch from 4a979db to 22dd4a1 Compare April 17, 2026 09:44
@Before
public void before() throws IOException {
EventPluginLoader.getInstance().setFilterQuery(null);
Args.getInstance().setNodeListenPort(10000 + port.incrementAndGet());
Copy link
Copy Markdown
Author

@warku123 warku123 Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defensive reset unrelated to this PR's resource-leakage fix, but needed to stabilize the JaCoCo overall-delta gate.

Root cause: FilterQueryTest (in org.tron.common) sets EventPluginLoader.filterQuery = {fromBlock=100, toBlock=190} at the end of testMatchFilter() with no cleanup. With forkEvery=100, this PR's new test methods shifted the JVM batch boundary, causing FilterQueryTest and BlockEventGetTest to land in the same JVM. Since start(config) does not reset filterQuery, the stale range filter caused matchFilter() to return false for block 1, skipping processTrigger entirely — 0/293 instructions covered in the PR run vs 277/293 on base.

Evidence:

Fix: FilterQueryTest now resets filterQuery to null in @After, eliminating the pollution at its source. The @Before reset here is an additional guard against any other test leaving dirty singleton state.

Note: This singleton pollution pattern may exist elsewhere in the codebase — any test that writes to a static singleton (e.g. Args, EventPluginLoader, ConfigLoader) without resetting it in @After is a potential source of similar unexpected JaCoCo coverage drops when batch boundaries shift.

FilterQueryTest.testMatchFilter() sets filterQuery={fromBlock=100,
toBlock=190} on the EventPluginLoader singleton with no cleanup.
With forkEvery=100, tests sharing the same JVM inherit this stale
state, causing matchFilter() to return false for block 1 and
skipping processTrigger() entirely (0/293 instructions covered).

Add @after teardown in FilterQueryTest to reset filterQuery to null,
and add a defensive reset in BlockEventGetTest.@before as a guard
against any other preceding test leaving dirty singleton state.
@warku123 warku123 force-pushed the 481_database_leakage_fix branch from 6b37267 to ab0db5f Compare April 22, 2026 08:11
@warku123 warku123 requested a review from halibobo1205 April 22, 2026 08:20
Copy link
Copy Markdown
Collaborator

@halibobo1205 halibobo1205 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved

@Slf4j
public class FilterQueryTest {

@After
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@warku123, [SHOULD] This @After cleanup of EventPluginLoader.getInstance().setFilterQuery(null) and the similar change in BlockEventGetTest.before() are test-pollution fixes unrelated to the TronDatabase.close() resource leak this PR targets. Mixing them into the same PR makes git blame and review harder. Please split the test-pollution fixes into a separate PR, or add a note in the PR description explaining why they must ship together (e.g. the new tests in this PR surface the existing pollution).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The background and rationale are already written up in an earlier inline comment: #6688 (comment)

TL;DR: the new tests in this PR shifted Gradle's forkEvery=100 batch boundary, which exposed a pre-existing singleton-pollution bug in FilterQueryTestEventPluginLoader.filterQuery was never reset in @After, so BlockEventGetTest landing in the same JVM read a stale {fromBlock=100, toBlock=190} and matchFilter() returned false for block 1, causing processTrigger to go from 277/293 to 0/293 instructions and failing the overall-delta gate.

So these @After / @Before resets aren't ornamental test-pollution fixes — they are a key finding of this PR: without them the CI coverage gate fails (evidence links in the inline comment). I kept them here so the PR lands green.

That said, you're right that the root fix belongs outside this PR's scope. My plan is to open a follow-up issue after this merges, covering both:

  1. The FilterQueryTest singleton cleanup (done here defensively, but the pattern likely exists elsewhere — any test writing to a static singleton without @After reset is a latent coverage-gate hazard).
  2. Auditing other singletons (e.g. Args, EventPluginLoader, ConfigLoader) for similar cross-test leakage.

I'll also update the PR description with a short pointer to the inline comment so reviewers don't have to dig for it.

try {
writeOptions.close();
dbSource.closeDB();
} catch (Exception e) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@warku123, [SHOULD] The fact that testCloseDbSourceWhenWriteOptionsThrows has to reflectively set BOTH CheckPointV2Store.writeOptions and TronDatabase.writeOptions proves the subclass has its own writeOptions field shadowing the parent's — two WriteOptionsWrapper instances live side by side. This is pre-existing, but the new close() structure makes the implicit contract "child closes its own writeOptions first, then delegates to parent" very fragile. Consider either promoting the parent field to protected and removing the subclass field, or adding a class-level comment explaining why two writeOptions instances are intentional.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. The two WriteOptionsWrapper instances are intentional rather than accidental shadowing:

  • Parent TronDatabase.writeOptions is configured with isDbSync()
  • CheckPointV2Store.writeOptions is configured with isCheckpointSync() (a dedicated checkpoint-specific sync flag)

Collapsing them into a single protected field would change the sync semantics for either the parent or the checkpoint store, so it's not a zero-risk refactor. Since this shadowing pre-dates the PR and the scope here is the resource-leak fix, I'd prefer to leave the structure as-is and file a follow-up issue if you'd like the shadowing formally addressed. Happy to add a class-level comment documenting the intent if that would help.

try {
doClose();
} finally {
logger.debug("******** End to close {}. ********", getName());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@warku123, [SHOULD] doClose() already swallows every exception internally (both the writeOptions and dbSource paths turn into logger.warn), so this outer try { doClose(); } finally { ... } never actually protects anything — the finally only exists to print the End log. It can be simplified to:

doClose();
logger.debug("******** End to close {}. ********", getName());

Keeping the current form isn't wrong, but the finally adds cognitive load without guarding a resource.

Copy link
Copy Markdown
Author

@warku123 warku123 Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — doClose() swallows everything internally so the outer try { ... } finally { ... } only existed to print the End log. Simplified to a straight doClose(); logger.debug(...) .

}
try {
doClose();
} finally {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@warku123, [QUESTION] Why call doClose() directly instead of super.close()? I assume it's to avoid the "Begin/End to close" info logs being emitted twice, but this turns "child must run its own cleanup first and then invoke doClose() (not super.close())" into an implicit contract — if TronDatabase.close() ever grows new template logic, subclasses will silently miss it. Consider adding Javadoc on doClose() stating: "Subclasses that need to release extra resources before parent cleanup should override close(), release their own resources, and invoke doClose() directly — not super.close() — to avoid duplicated logging."

Copy link
Copy Markdown
Author

@warku123 warku123 Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — the "call doClose() not super.close()" contract was implicit and would quietly break subclasses if TronDatabase.close() ever gained template logic. Added Javadoc on doClose() stating the contract explicitly and noting the best-effort/WARN exception semantics.

…ore.close()

Address review feedback on PR tronprotocol#6688:
- Add Javadoc on TronDatabase.doClose() making the "subclasses should call
  doClose() not super.close() to avoid duplicated logs" contract explicit
- Simplify CheckPointV2Store.close() by dropping the outer try/finally around
  doClose(), which swallows all exceptions internally — the finally only
  existed to print the End log
@Override
public void close() {
logger.info("******** Begin to close {}. ********", getName());
try {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD] I think the try-finally statement is redundant; the following statement would suffice.

doClose();
logger.info("******** End to close {}. ********", getName());

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, same reasoning as the CheckPointV2Store.close() simplification from the previous round — doClose() swallows everything internally, so the outer try/finally only existed to print the End log. Applied the suggested form in dadf9bb85.

…/finally

doClose() swallows all exceptions internally (both writeOptions and dbSource
paths log at WARN and do not rethrow), so the outer try/finally only existed
to print the End log. Matches the same simplification applied to
CheckPointV2Store.close() in the previous commit.

Addresses review feedback on PR tronprotocol#6688.
@warku123 warku123 requested a review from xxo1shine April 24, 2026 07:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic:DB Database

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants