perf(migrations): skippable extension reconcile + drop unused global vector index #2309

Merged
nicoloboschi merged 2 commits into main from perf-bulk-extension-precheck
Jun 19, 2026
@nicoloboschi nicoloboschi commented Jun 19, 2026

Collaborator

Summary

Two changes that trim the post-migration extension reconcile
(ensure_vector_extension / ensure_text_search_extension), which runs per
tenant schema and dominates the migration job at large tenant counts.

1. --skip-extension-reconcile on run-db-migration

The reconcile only does real work when the configured backend
(HINDSIGHT_API_VECTOR_EXTENSION / HINDSIGHT_API_TEXT_SEARCH_EXTENSION) differs
from a schema's existing index/column shape — a rare, operator-driven change (and
on populated tables it refuses and asks you to re-embed anyway). On a routine
no-change re-migration over many tenants it's pure per-tenant overhead.

run_migrations_for_schemas already takes an ensure_extensions parameter; this
exposes it on the CLI as --skip-extension-reconcile (default off → behavior
unchanged). A backend change still needs a normal run to reshape the indexes —
the flag is an explicit "I haven't changed the backend" assertion.
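A minimal sketch of how the flag could map onto `ensure_extensions`. This is an illustration, not the project's actual CLI code: the use of `argparse`, the `build_parser` and `main` helpers, and the stand-in `run_migrations_for_schemas` body are all assumptions. Only the flag name, the parameter name, and the default-off behavior come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real run-db-migration command may be built differently.
    parser = argparse.ArgumentParser(prog="run-db-migration")
    parser.add_argument(
        "--skip-extension-reconcile",
        action="store_true",  # default False, so existing behavior is unchanged
        help="Skip the per-schema extension reconcile; only safe if the "
        "configured backend has not changed.",
    )
    return parser


def run_migrations_for_schemas(schemas, ensure_extensions=True):
    # Stand-in for the real function: it only records what it was asked to do.
    return {"schemas": list(schemas), "ensure_extensions": ensure_extensions}


def main(argv, schemas):
    args = build_parser().parse_args(argv)
    # The flag is a negative ("skip"), the parameter a positive ("ensure"),
    # so the value is inverted on the way through.
    return run_migrations_for_schemas(
        schemas, ensure_extensions=not args.skip_extension_reconcile
    )
```

The inversion is the only logic involved: without the flag, `ensure_extensions` stays `True` and the reconcile runs as before.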

2. Stop creating the unused global memory_units vector index

For per-bank backends (pgvector / pgvectorscale / vchord), every vector search is
bank + fact_type scoped and served by the per-(bank, fact_type) partial indexes
created at bank-creation time (bank_utils.create_bank_vector_indexes). The
global idx_memory_units_embedding is never chosen by the planner when bank_id
is in the WHERE clause — which is exactly why migration d5e6f7a8b9c0 drops it
for these backends.

ensure_vector_extension nonetheless recreated that global index on a fresh/empty
schema (the "No embedding index found, will create it" path), putting an unused,
write-amplifying index back on every schema. This skips creating it for per-bank
backends. scann (which genuinely uses a global filtered index) is unaffected.
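The decision described above can be sketched as a small predicate. The function name, the `has_embedding_index` argument, and the exact backend strings are assumptions made for illustration; the real `ensure_vector_extension` may structure this check differently.

```python
# Backends whose searches are served by per-(bank, fact_type) partial indexes.
# The string values are assumed to match the configured backend names.
PER_BANK_BACKENDS = frozenset({"pgvector", "pgvectorscale", "vchord"})


def should_create_global_embedding_index(backend: str, has_embedding_index: bool) -> bool:
    """Hypothetical helper: decide whether the reconcile should create the
    global idx_memory_units_embedding on a schema."""
    if has_embedding_index:
        # An index already exists, so this is not the "No embedding index found" path.
        return False
    # Per-bank backends never use the global index, so skip creating it.
    # Other backends (for example scann) still get it.
    return backend.lower() not in PER_BANK_BACKENDS
```

Under these assumptions a fresh pgvector schema ends up without the global index, while a fresh scann schema still gets one.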

Verified via EXPLAIN on a real DB (3000 rows, both indexes present): the
semantic query uses the per-bank idx_mu_emb_* index, and dropping the global
index leaves the plan identical.
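One way to repeat this kind of check is to scan the text output of `EXPLAIN` for the index names the planner chose. The sketch below is not part of this PR: the helper names are hypothetical, and it only understands the default text format of PostgreSQL plans (`Index Scan using <name> on ...` and `Bitmap Index Scan on <name>`).

```python
import re

# Captures the index name from index-scan nodes in EXPLAIN text output.
_INDEX_NODE = re.compile(
    r"Index (?:Only )?Scan(?: Backward)? using (\S+)|Bitmap Index Scan on (\S+)"
)


def indexes_in_plan(plan_lines):
    """Return the set of index names referenced by a text-format EXPLAIN plan."""
    names = set()
    for line in plan_lines:
        for using_name, bitmap_name in _INDEX_NODE.findall(line):
            # Exactly one of the two groups is non-empty for each match.
            names.add(using_name or bitmap_name)
    return names


def uses_only_per_bank_index(
    plan_lines,
    per_bank_prefix="idx_mu_emb_",
    global_index="idx_memory_units_embedding",
):
    """True if the plan uses at least one index and every one is a per-bank index."""
    names = indexes_in_plan(plan_lines)
    return (
        bool(names)
        and global_index not in names
        and all(name.startswith(per_bank_prefix) for name in names)
    )
```

Feeding it the rows returned by `EXPLAIN <semantic query>` before and after dropping the global index should give the same result if the plan is unchanged.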

Tests

  • Parametrized check that --skip-extension-reconcile threads ensure_extensions
    through to run_migrations_for_schemas.
  • A pg0 integration test asserting a fresh pgvector schema run through
    migrate + ensure_vector_extension ends up with no global
    idx_memory_units_embedding (fails without the fix).

The post-migration step (ensure_vector_extension / ensure_text_search_extension)
runs per tenant schema, each run opening a fresh connection and probing the catalog.
Across tens of thousands of tenant schemas this dominates the migration job, even
though the step only does real work when the configured backend
(HINDSIGHT_API_VECTOR_EXTENSION / HINDSIGHT_API_TEXT_SEARCH_EXTENSION) differs from
a schema's existing index/column shape. That is a rare, operator-driven change, and
on populated tables the step refuses and asks you to re-embed anyway.

`run_migrations_for_schemas` already accepts `ensure_extensions`; this just exposes
it on the CLI as `--skip-extension-reconcile` (default off, so behavior is
unchanged) so an operator who has NOT changed the backend can skip the reconcile
on a routine re-migration over many tenants. A backend change still needs a normal
run to reshape the indexes.

Test: parametrized check that the flag threads ensure_extensions through to
run_migrations_for_schemas.
@nicoloboschi nicoloboschi force-pushed the perf-bulk-extension-precheck branch from 2b792b8 to bbde4ea Compare June 19, 2026 13:20
@nicoloboschi nicoloboschi changed the title perf(migrations): bulk pre-filter the post-migration extension reconcile feat(admin-cli): add --skip-extension-reconcile to run-db-migration Jun 19, 2026
…index

For per-bank vector backends (pgvector / pgvectorscale / vchord) every vector
search is bank + fact_type scoped and served by the per-(bank, fact_type) partial
indexes created at bank-creation time (bank_utils.create_bank_vector_indexes).
The global idx_memory_units_embedding is never chosen by the planner when bank_id
is in the WHERE clause — which is exactly why migration d5e6f7a8b9c0 drops it for
these backends. Verified via EXPLAIN: the semantic query uses idx_mu_emb_* whether
or not the global index exists.

ensure_vector_extension nonetheless recreated that global index on a fresh/empty
schema (the "No embedding index found, will create it" path), adding an unused,
write-amplifying index back to every schema. Skip creating it for per-bank
backends. scann (which genuinely uses a global filtered index) is unaffected.

Test: a fresh pgvector schema run through migrate + ensure_vector_extension ends
up with no idx_memory_units_embedding (fails without the fix).
@nicoloboschi nicoloboschi changed the title feat(admin-cli): add --skip-extension-reconcile to run-db-migration perf(migrations): skippable extension reconcile + drop unused global vector index Jun 19, 2026
@nicoloboschi nicoloboschi merged commit 51cb328 into main Jun 19, 2026
98 checks passed
@nicoloboschi nicoloboschi deleted the perf-bulk-extension-precheck branch June 19, 2026 13:58