Performance regression: IVF/PQ Python search slower in 466405f #7137

@westonpace

The lance-bench run for Lance 466405f476bcd02b6dfa3f78133eaebd76158c66 flagged many Python IVF/PQ search benchmarks as slower, mostly cached 100K/1M variants with 10 or 50 probes.

Benchmark run: https://github.com/lancedb/lance-bench/actions/runs/27047460383

High-concern statistical results from lance-bench include:

  • python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-10probes-1M]: p=0.000020, +55.05% slower
  • python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-50probes-1M]: p=0.000256, +82.81% slower
  • python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k100-no_refine-50probes-1M]: p=0.000286, +74.22% slower
  • python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-refine_1x-50probes-1M]: p=0.000601, +72.98% slower
  • python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search_with_payload[cache-k10-no_refine-50probes-1M]: p=0.000632, +78.38% slower
  • python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search_with_payload[cache-k10-refine_1x-50probes-1M]: p=0.000678, +74.42% slower
  • Several related cached 100K and k100/refine variants are also flagged slower in the 25-70% range.

Representative per-result timings queried from the benchmark DB:

python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-10probes-1M]
older avg: 2.05 ms, last4 avg: 3.18 ms, change: +55.1%
2026-05-17 17:25  53b8556  3.38 ms
2026-05-18 00:10  2f7a96f  2.89 ms
2026-05-18 07:17  3313b56  2.91 ms
2026-06-05 23:46  466405f  3.57 ms

python/ci_benchmarks/benchmarks/test_ivf_pq_search.py::test_ivf_pq_search[cache-k10-no_refine-50probes-1M]
older avg: 6.14 ms, last4 avg: 11.23 ms, change: +82.8%
2026-05-17 17:25  53b8556  10.95 ms
2026-05-18 00:10  2f7a96f  10.65 ms
2026-05-18 07:17  3313b56  11.55 ms
2026-06-05 23:46  466405f  11.76 ms

Note: for the 50-probe representative, the 466405f value (11.76 ms) is close to the three May measurements shown (10.65 to 11.55 ms), but the benchmark's full historical baseline (6.14 ms average) is much lower, which is why the last-4 analysis still flags it. The broader report shows many IVF/PQ variants moving slower together.
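For anyone checking the numbers by hand, here is a minimal sketch of the last-4 comparison, using only the rounded timings and baseline averages quoted above. The helper name `percent_change` is made up for illustration; this is not the lance-bench implementation, and because the inputs are rounded the results only approximately match the reported +55.05% and +82.81%.

```python
from statistics import mean

def percent_change(baseline_ms, value_ms):
    """Percent change of value_ms relative to baseline_ms (hypothetical helper)."""
    return (value_ms - baseline_ms) / baseline_ms * 100.0

# Rounded values copied from the per-result timings in this issue.
# The last entry in each list is the 466405f run.
last4_10probes = [3.38, 2.89, 2.91, 3.57]
last4_50probes = [10.95, 10.65, 11.55, 11.76]
older_avg_10probes = 2.05
older_avg_50probes = 6.14

for name, older_avg, last4 in [
    ("10probes", older_avg_10probes, last4_10probes),
    ("50probes", older_avg_50probes, last4_50probes),
]:
    last4_avg = mean(last4)
    # Comparison used for flagging: last-4 average vs. older baseline.
    vs_baseline = percent_change(older_avg, last4_avg)
    # Comparison behind the note: 466405f vs. the three May runs only.
    vs_may = percent_change(mean(last4[:3]), last4[3])
    print(f"{name}: last4 avg {last4_avg:.2f} ms, "
          f"vs older avg {vs_baseline:+.1f}%, "
          f"466405f vs May runs {vs_may:+.1f}%")
```

The second figure per benchmark makes the note concrete: for the 50-probe case, 466405f is only a few percent above the May runs, while the last-4 average sits far above the older baseline.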

The Python benchmark history has a data gap for these benchmarks between 3313b56 on 2026-05-18 and 466405f on 2026-06-05, so I cannot isolate this to one commit. Potentially relevant recent changes include:

2026-06-05T06:09:49Z 1f7e8f7 feat(index): support multi-bit IVF_RQ storage (#7038)
2026-06-04T21:25:31Z f80b83a perf(pq): wrap l2_targets in Arc to eliminate per-partition clones (#7093)
2026-06-04T09:04:27Z 679ef3d fix: advance kmeans redo random init rng (#7074)
2026-06-04T05:58:34Z 38d289d feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds (#7014)
2026-06-03T16:44:47Z 7e4cbad perf: make HNSW cheaper to load (#6798)

Suggested reproduction: run the listed Python IVF/PQ benchmarks against 466405f and compare with 3313b56 plus intermediate commits in the 2026-05-18 to 2026-06-05 range.
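A rough sketch of the reproduction loop, assuming it is run from the repository root and that the benchmark datasets are already in place. It only prints the `git checkout` and `pytest` commands for the two representative cases at the two endpoint commits; rebuilding the Python package at each commit is a separate step not shown here, and any benchmark-specific options the CI job passes are unknown to me and omitted. The function name `reproduction_commands` is hypothetical.

```python
import shlex

BENCH_FILE = "python/ci_benchmarks/benchmarks/test_ivf_pq_search.py"

# Two of the flagged cases listed in this issue; extend as needed.
CASES = [
    "test_ivf_pq_search[cache-k10-no_refine-10probes-1M]",
    "test_ivf_pq_search[cache-k10-no_refine-50probes-1M]",
]

# Endpoints of the data gap; intermediate commits can be added in between.
COMMITS = ["3313b56", "466405f"]

def reproduction_commands(commits, cases):
    """Build shell-safe command strings (hypothetical helper, nothing is executed)."""
    commands = []
    for sha in commits:
        commands.append(shlex.join(["git", "checkout", sha]))
        for case in cases:
            # shlex.join quotes the node ID so the shell does not glob the brackets.
            node_id = f"{BENCH_FILE}::{case}"
            commands.append(shlex.join(["python", "-m", "pytest", node_id]))
    return commands

for command in reproduction_commands(COMMITS, CASES):
    print(command)
```

Printing instead of executing keeps the sketch safe to run anywhere; the same list could drive a manual bisect over the 2026-05-18 to 2026-06-05 range.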