fix(index): invalidate stale index cache entries after index replacement #7094

Open: geserdugarov wants to merge 2 commits into lance-format:main from geserdugarov:fix-invalid-cache-for-removed

@geserdugarov

Contributor

What

CreateIndex commits can retire existing index metadata, for example when optimize_indices(...retrain...) replaces an IVF index with a new UUID. If the old index had been prewarmed, the session index cache could still hold root entries and IVF partition entries for the removed UUID.

A later read in the same session could then observe stale cached index state that no longer matches the manifest. This clears cache entries for removed index UUIDs after the commit succeeds.

Changes

  • Add exact-key invalidation to LanceCache via invalidate_with_key and invalidate_unsized_with_key.
  • Extend CacheBackend with invalidate_entry and implement it for the Moka backend and existing serializing test backends.
  • On successful Operation::CreateIndex commits, invalidate cache entries for each removed index:
    • legacy vector index root entry
    • IVF state root entry
    • IVF partition sub-cache under uuid/
    • corresponding uuid-fri_uuid/ entries when a fragment-reuse index is active
  • Use the rebased transaction’s removed_indices, so conflict resolution invalidates the indices actually retired by the committed manifest.
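
A minimal sketch of the ordering in the last two bullets, using hypothetical stand-ins rather than the real Lance types: `ToyTransaction` models only a `removed_indices` list, and a `HashSet<String>` plays the role of the session index cache. The point it illustrates is that invalidation is driven by the transaction as it was actually committed (after any rebase), and that a failed commit leaves the cache untouched.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a committed transaction. Only the list of
/// retired index UUIDs is modelled; the real type carries much more.
#[derive(Clone, Debug)]
pub struct ToyTransaction {
    pub removed_indices: Vec<String>,
}

/// Takes the result of a commit attempt. On success the value is assumed to
/// be the rebased transaction, so its `removed_indices` reflect what the
/// committed manifest really retired. On failure nothing is invalidated.
pub fn invalidate_after_commit(
    cache: &mut HashSet<String>,
    commit_result: Result<ToyTransaction, String>,
) -> Result<(), String> {
    // `?` returns early on a failed commit, before any cache mutation.
    let committed = commit_result?;
    for uuid in &committed.removed_indices {
        cache.remove(uuid);
    }
    Ok(())
}
```

Reading `removed_indices` from the pre-rebase transaction instead could drop entries for an index that conflict resolution kept, or miss one that it retired.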

Notes

  • Exact-key invalidation is needed because root index cache entries are keyed by UUID, while partition entries live under a cache prefix. Prefix invalidation alone cannot remove one root entry without risking unrelated cache entries.
  • Invalidation happens only after the manifest write succeeds.
  • This does not change index metadata, params, protos, or Python / Java APIs.
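
To illustrate the first note, here is a toy cache (not `LanceCache`, and not the Moka backend) whose entries are addressed by a string key plus the value's type. The names `ToyCache`, `RootState`, `Partition`, and the `idx-1` / `idx-10` keys are invented for this sketch. It shows why one root entry is removed by exact key while partition entries are swept by the `uuid/` prefix.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Toy typed cache: one entry per (key, value type) pair.
#[derive(Default)]
pub struct ToyCache {
    entries: HashMap<(String, TypeId), Arc<dyn Any + Send + Sync>>,
}

impl ToyCache {
    pub fn insert<T: Any + Send + Sync>(&mut self, key: &str, value: T) {
        let boxed: Arc<dyn Any + Send + Sync> = Arc::new(value);
        self.entries.insert((key.to_string(), TypeId::of::<T>()), boxed);
    }

    pub fn contains<T: Any>(&self, key: &str) -> bool {
        self.entries.contains_key(&(key.to_string(), TypeId::of::<T>()))
    }

    /// Exact-key invalidation: removes one (key, type) entry.
    /// A missing key is a no-op.
    pub fn invalidate_exact<T: Any>(&mut self, key: &str) {
        self.entries.remove(&(key.to_string(), TypeId::of::<T>()));
    }

    /// Prefix invalidation: removes every entry whose key starts with `prefix`.
    pub fn invalidate_prefix(&mut self, prefix: &str) {
        self.entries.retain(|(k, _), _| !k.starts_with(prefix));
    }
}

/// Hypothetical entry types standing in for root state and partition data.
pub struct RootState(pub u32);
pub struct Partition(pub u32);

/// Drops the root entry (exact key) and the partition sub-cache (`uuid/` prefix)
/// for one retired index.
pub fn invalidate_removed_index(cache: &mut ToyCache, uuid: &str) {
    cache.invalidate_exact::<RootState>(uuid);
    cache.invalidate_prefix(&format!("{}/", uuid));
}
```

In this toy, calling `invalidate_prefix("idx-1")` to reach the root entry would also remove everything stored for `idx-10`, which is the kind of collateral damage the note describes. Real UUIDs have a fixed length, so the actual hazard in Lance may take a different shape.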

Tests

  • Cache unit tests for exact-key invalidation, sibling key/type preservation, prefix scoping, nested sub-cache preservation, missing-key no-op behavior, and unsized cache keys.
  • An index integration test that prewarms an IVF-PQ index, retrains it through optimize_indices, and verifies the removed index UUID no longer has cached root or partition entries.

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the "bug" (Something isn't working) label Jun 4, 2026

@geserdugarov

Contributor Author

@Xuanwo, @wjones127, hi!
If you don't mind, could you please review this PR related to index cache entries?

@codecov

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 98.41897% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-core/src/cache/mod.rs | 98.48% | 2 Missing ⚠️ |
| rust/lance/src/index/vector/ivf/v2.rs | 0.00% | 2 Missing ⚠️ |

@wjones127 wjones127 self-assigned this Jun 4, 2026
Comment thread rust/lance/src/index.rs

    // Bare-UUID entries from caches created without FRI.
    cache
        .invalidate_with_key(&LegacyVectorIndexCacheKey::new(&uuid_str, None))

Contributor

nit: Also invalidate ScalarIndexDetailsKey?

@wjones127 wjones127 left a comment

Contributor

> A later read in the same session could then observe stale cached index state that no longer matches the manifest.

I'm confused by this sentence. Does this suggest a later read could read the wrong index cache entry? Or just that those entries will take up memory when they are likely no longer needed?

This PR seems to presuppose that there are no readers intentionally reading an old version of the table. I think that's a bad assumption. This can happen in two ways:

  1. Because of MVCC, a query that started earlier than the write, but is still ongoing, may still reference that index and want to use it.
  2. Users can and do check out older versions of tables. Sometimes they pin their readers to a version while writers perform multiple operations on the table.

I worry this PR is going to hurt performance in those two cases. I wonder if a better solution would instead be a TTL expiry on cache entries. What do you think of that?
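
For reference, the TTL idea could look roughly like the sketch below. It is a std-only toy, not the Lance cache or the Moka backend: `TtlCache`, `insert_at`, and `get_at` are invented names, and the clock is passed in explicitly so the behaviour is easy to test. Entries for a retired index stay readable for pinned or in-flight readers and simply age out once they are older than the TTL.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Toy time-to-live cache: each entry remembers when it was inserted.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Inserts (or overwrites) an entry stamped with the caller-supplied time.
    pub fn insert_at(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (now, value));
    }

    /// Returns the value if it is younger than the TTL.
    /// Expired entries are evicted lazily, on lookup.
    pub fn get_at(&mut self, key: &str, now: Instant) -> Option<&V> {
        let expired = match self.entries.get(key) {
            Some((inserted, _)) => now.duration_since(*inserted) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(_, v)| v)
    }
}
```

This sketch measures age from insertion. A reader pinned to an old version for longer than the TTL would still see its entries expire, so an idle-based expiry (age measured from last access) may fit that case better. Moka's cache builder offers both `time_to_live` and `time_to_idle`; whether `LanceCache` exposes either is not stated in this thread.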

Labels

bug Something isn't working

3 participants