Performance regression: IVF/PQ Python search slower in 466405f #7137

The lance-bench run for Lance `466405f476bcd02b6dfa3f78133eaebd76158c66` flagged many Python IVF/PQ search benchmarks as slower, mostly cached 100K/1M variants with 10 or 50 probes.

Benchmark run: https://github.com/lancedb/lance-bench/actions/runs/27047460383

High-concern statistical results from lance-bench include:

- `python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-10probes-1M]`: p=0.000020, +55.05% slower
- `python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-50probes-1M]`: p=0.000256, +82.81% slower
- `python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k100-no_refine-50probes-1M]`: p=0.000286, +74.22% slower
- `python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-refine_1x-50probes-1M]`: p=0.000601, +72.98% slower
- `python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search_with_payload[cache-k10-no_refine-50probes-1M]`: p=0.000632, +78.38% slower
- `python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search_with_payload[cache-k10-refine_1x-50probes-1M]`: p=0.000678, +74.42% slower
- Several related cached 100K and k100/refine variants are also flagged slower in the 25-70% range.

Representative per-result timings queried from the benchmark DB:

```text
python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-10probes-1M]
older avg: 2.05 ms, last4 avg: 3.18 ms, change: +55.1%
2026-05-17 17:25  53b8556  3.38 ms
2026-05-18 00:10  2f7a96f  2.89 ms
2026-05-18 07:17  3313b56  2.91 ms
2026-06-05 23:46  466405f  3.57 ms

python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-50probes-1M]
older avg: 6.14 ms, last4 avg: 11.23 ms, change: +82.8%
2026-05-17 17:25  53b8556  10.95 ms
2026-05-18 00:10  2f7a96f  10.65 ms
2026-05-18 07:17  3313b56  11.55 ms
2026-06-05 23:46  466405f  11.76 ms
```

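Note: for the 50-probe representative, the latest value (11.76 ms) is similar to the 2026-05-17/18 measurements (10.65-11.55 ms). It is still flagged because the last-4 analysis compares against the benchmark's full historical baseline, which is much lower (6.14 ms). The 10-probe May values (2.89-3.38 ms) likewise already sit above that benchmark's 2.05 ms older average. The broader report shows many IVF/PQ variants moving slower together.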
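The arithmetic behind these figures can be rechecked from the rounded values in the listing above. This is a minimal sketch, not lance-bench's actual analysis: the helper name `percent_change` is hypothetical, and because the inputs are rounded to two decimals the results may differ from the reported percentages by a fraction of a point.

```python
from statistics import mean

def percent_change(baseline_ms, recent_ms):
    """Percent change of the mean of recent_ms relative to baseline_ms."""
    return (mean(recent_ms) / baseline_ms - 1.0) * 100.0

# Rounded timings copied from the listing above (oldest first).
last4_10probes = [3.38, 2.89, 2.91, 3.57]
last4_50probes = [10.95, 10.65, 11.55, 11.76]

# Last-4 average versus the reported "older avg" baseline.
change_10 = percent_change(2.05, last4_10probes)
change_50 = percent_change(6.14, last4_50probes)

# Latest run (466405f) versus the three May runs just before the data gap.
latest_vs_may_10 = percent_change(mean(last4_10probes[:3]), last4_10probes[3:])
latest_vs_may_50 = percent_change(mean(last4_50probes[:3]), last4_50probes[3:])

print(f"10 probes: last4 vs older {change_10:+.1f}%, latest vs May {latest_vs_may_10:+.1f}%")
print(f"50 probes: last4 vs older {change_50:+.1f}%, latest vs May {latest_vs_may_50:+.1f}%")
```

The second comparison in each line is much smaller than the first, which is consistent with most of the flagged change already being present in the May runs.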

The Python benchmark history has a data gap for these benchmarks between `3313b56` on 2026-05-18 and `466405f` on 2026-06-05, so I cannot isolate this to one commit. Potentially relevant recent changes include:

```text
2026-06-05T06:09:49Z 1f7e8f7 feat(index): support multi-bit IVF_RQ storage (#7038)
2026-06-04T21:25:31Z f80b83a perf(pq): wrap l2_targets in Arc to eliminate per-partition clones (#7093)
2026-06-04T09:04:27Z 679ef3d fix: advance kmeans redo random init rng (#7074)
2026-06-04T05:58:34Z 38d289d feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds (#7014)
2026-06-03T16:44:47Z 7e4cbad perf: make HNSW cheaper to load (#6798)
```

Suggested reproduction: run the listed Python IVF/PQ benchmarks against `466405f` and compare with `3313b56` plus intermediate commits in the 2026-05-18 to 2026-06-05 range.
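One way to script that comparison is sketched below. It assumes a local Lance checkout, that the benchmark datasets already exist, and that the Python package is rebuilt at each commit before running pytest (none of which is shown). The helper names `commits_between` and `pytest_command` are hypothetical; only `git rev-list` and pytest node IDs are relied on.

```python
import subprocess

BENCH_FILE = "python/ci_benchmarks/benchmarks/test_ivf_pq_search.py"

# Two of the flagged benchmarks from this report; extend as needed.
NODE_IDS = [
    f"{BENCH_FILE}::test_ivf_pq_search[cache-k10-no_refine-10probes-1M]",
    f"{BENCH_FILE}::test_ivf_pq_search[cache-k10-no_refine-50probes-1M]",
]

def commits_between(repo, old, new):
    """List commits after `old` up to and including `new`, oldest first."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", f"{old}..{new}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()

def pytest_command(node_ids):
    """Build the pytest invocation for the selected benchmark node IDs."""
    return ["python", "-m", "pytest", *node_ids]

command = pytest_command(NODE_IDS)
print(" ".join(command))
```

Calling `commits_between(repo, "3313b56", "466405f")` on a checkout would give the candidate commits in the data gap. Timing the midpoint commit first and halving the range each time keeps the number of rebuilds small.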

