Add a fast path to `Map::prepare`. by hildebrandmw · Pull Request #1023 · microsoft/DiskANN

hildebrandmw · 2026-05-05T20:32:01Z

When the map is empty, `Map::prepare` can skip all of its work. This change makes `Map::prepare` cheap to call after `Map::clear`, which keeps the calling code in diskann-garnet simpler when we dynamically switch to quantization partway through backedge pruning.

Copilot

Pull request overview

This PR adds an early-return fast path to diskann::graph::workingset::Map::prepare when the internal HashMap is empty, aiming to make prepare() cheap to call after Map::clear() and avoid touching the provided iterator in that case.

Changes:

- Add a `self.map.is_empty()` fast path to `Map::prepare`, returning before doing generation/eviction work.
- Document that `prepare()` may not consume its iterator.
- Add a unit test asserting the iterator is not advanced (and that generation does not change) on the empty-map fast path.
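
To make the behavior described above concrete, here is a minimal sketch of an empty-map early return. It is not the DiskANN implementation: the `Map` struct, its fields, the `u32` id type, and the `prepare_after_clear` helper are all hypothetical stand-ins, and the eviction logic after the early return is only an assumed placeholder for the real generation/eviction work.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the working-set map; not the DiskANN type.
struct Map {
    /// id -> generation in which the id was last requested (assumed layout).
    map: HashMap<u32, u64>,
    generation: u64,
}

impl Map {
    fn new() -> Self {
        Self { map: HashMap::new(), generation: 0 }
    }

    fn insert(&mut self, id: u32) {
        self.map.insert(id, self.generation);
    }

    fn clear(&mut self) {
        self.map.clear();
    }

    /// Keep only the entries named by `keep`, evicting the rest.
    /// On an empty map this returns early and leaves `keep` unconsumed.
    fn prepare<I: Iterator<Item = u32>>(&mut self, keep: I) {
        // Fast path: nothing to evict, so do not bump the generation
        // and do not touch the iterator.
        if self.map.is_empty() {
            return;
        }
        self.generation += 1;
        let current = self.generation;
        for id in keep {
            if let Some(seen) = self.map.get_mut(&id) {
                *seen = current;
            }
        }
        // Evict everything that was not requested in this generation.
        self.map.retain(|_, seen| *seen == current);
    }
}

/// Calls `prepare` right after `clear` and reports what the caller can observe:
/// the next item left in the iterator and the map's generation counter.
fn prepare_after_clear() -> (Option<u32>, u64) {
    let mut map = Map::new();
    map.insert(7);
    map.clear();

    let mut ids = vec![1, 2, 3].into_iter();
    // `by_ref` lends the iterator so it can be inspected afterwards.
    map.prepare(ids.by_ref());
    (ids.next(), map.generation)
}
```

In this sketch the caller can invoke `prepare` unconditionally after `clear` without paying for iteration, at the cost of not being able to rely on the iterator being drained.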

codecov-commenter · 2026-05-05T21:37:59Z

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.51%. Comparing base (09af6f0) to head (baea179).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
diskann/src/graph/workingset/map.rs	91.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1023      +/-   ##
==========================================
- Coverage   89.51%   89.51%   -0.01%     
==========================================
  Files         460      460              
  Lines       85424    85436      +12     
==========================================
+ Hits        76467    76477      +10     
- Misses       8957     8959       +2

Flag	Coverage Δ
miri	`89.51% <91.66%> (-0.01%)`	⬇️
unittests	`89.35% <91.66%> (-0.01%)`	⬇️

Files with missing lines	Coverage Δ
diskann/src/graph/workingset/map.rs	`97.36% <91.66%> (-0.08%)`	⬇️

... and 7 files with indirect coverage changes

@suhasjs

# DiskANN v0.52.0 Release Notes

## Breaking Changes

An AI generated, human reviewed list of changes is summarized below.

### `get_degree_stats` signature changed ([#998](#998))

`DiskANNIndex::get_degree_stats` now takes an explicit iterator of IDs instead of requiring the data provider to implement `IntoIterator`.

```rust
// Before — provider had to impl IntoIterator
index.get_degree_stats(&mut accessor)?;

// After — caller supplies the ID iterator
index.get_degree_stats(&mut accessor, id_iter)?;
```

### PQ dimension contract tightened; entries now `&[f32]` only ([#1044](#1044))

With `AlignedBoxWithSlice` removed from the PQ path, the dimension handling has been refactored into a three-layer contract:

| Layer | Where | Contract |
|---|---|---|
| **Boundary (inmem)** | `QueryComputer::new`, `MultiQueryComputer::new`, `DistanceComputer::evaluate_similarity` | `len == dim` (returns `Err` on mismatch) |
| **Boundary (disk)** | `PQScratch::set` | `len >= dim`, slices to `[..dim]` |
| **Internal** | `TableL2/IP/Cosine::{new, populate}` | Trusted — no re-validation |

**Other changes:**

- PQ table populate/distance methods now accept `&[f32]` instead of `<U: Into<f32>>`. Callers must pre-decode quantized vectors via `VectorRepr::as_f32`.
- Generic trampoline impls (`&Vec<u8>`, `&&[u8]`) on `QueryComputer` / `DistanceComputer` have been removed.

### `calculate_chunk_offsets` relocated to `ChunkOffsets` constructors ([#976](#976))

The free functions `calculate_chunk_offsets` and `calculate_chunk_offsets_auto` have been moved into constructors on `ChunkOffsets` / `ChunkOffsetsView` in `diskann-quantization::views`.

```rust
// Before
let offsets = calculate_chunk_offsets(dim, num_chunks);

// After (allocating)
let offsets = ChunkOffsets::partition(dim, num_chunks)?;

// After (zero-alloc, borrows caller-owned scratch)
let view = ChunkOffsetsView::partition_into(dim, &mut scratch)?;
```

Additionally, `get_chunk_from_training_data` has been moved from public API.

### `CachingProvider` removed ([#1052](#1052))

The entire `diskann_providers::model::graph::provider::async_::caching` module has been deleted.

**Why:** The `CachingProvider` was an experiment in transparent caching over `DataProvider`. In practice it required double monomorphization of the indexing code, didn't save integration work for bulk methods like `on_elements_unordered`/`distances_unordered`, and was complex to maintain. An internal user who …migrated off it removed ~1,000 lines of code, improved compile times by ~20%, and substantially reduced complexity.

**Upgrade:** Manage caching directly in your `DataProvider` implementation.

## New Features

### AVX-512 4-bit distance kernels ([#1045](#1045))

Native V4 (AVX-512) specializations for 4-bit packed vector distance computations:

- **`SquaredL2`** — 16 × `u32` lanes per iteration via `_mm512_madd_epi16`.
- **`InnerProduct`** — AVX-512 VNNI (`_mm512_dpbusd_epi32`) over `u8x64` / `i8x64` operands.

Previously, V4 hardware fell back to two AVX2 (V3) kernel invocations per 512-bit chunk. The native kernels double per-instruction throughput. No API changes — existing code benefits automatically on AVX-512 capable hardware.

## Merged PRs

* Deprecate 32-bit targets by @suhasjs in #1022
* Add a fast path to `Map::prepare`. by @hildebrandmw in #1023
* Add boundary checks in gen_associated_data_from_range() by @Copilot in #847
* [deps] Don't pull `rayon` as a dependency of `diskann`. by @hildebrandmw in #1024
* Bump openssl from 0.10.78 to 0.10.79 by @dependabot[bot] in #1026
* Cleaning up test work and changing the get_degree_stats signature. by @JordanMaples in #998
* Reduce scalar-quantization benchmark monomorphization by @suri-kumkaran in #1041
* [diskann-vector] Support truly unaligned distances. by @hildebrandmw in #981
* rename spherical.json to graph index with spherical quantization by @harsha-simhadri in #1042
* [PQ Cleanup] Part 2: Consolidate `calculate_chunk_offsets*` by @arkrishn94 in #976
* PQ: tighten dim contract; right-size scratch buffer by @wuw92 in #1044
* Add v4 distance kernels (4-bit SquaredL2 / InnerProduct) by @m3hm3t in #1045
* Remove the Caching Provider by @hildebrandmw in #1052

## New Contributors

* @suhasjs made their first contribution in #1022
* @m3hm3t made their first contribution in #1045

**Full Changelog**: v0.51.0...v0.52.0

Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>

Add a fast path to Map::prepare.

7923922

hildebrandmw requested review from a team and Copilot May 5, 2026 20:32

Copilot started reviewing on behalf of hildebrandmw May 5, 2026 20:37

Copilot AI reviewed May 5, 2026

Comment thread diskann/src/graph/workingset/map.rs Outdated

Comment thread diskann/src/graph/workingset/map.rs Outdated

metajack approved these changes May 5, 2026

harsha-simhadri approved these changes May 5, 2026

Mark Hildebrand added 2 commits May 5, 2026 14:07

Run formatter.

eb06ad1

Fix failing tests.

baea179

metajack approved these changes May 6, 2026

hildebrandmw merged commit f4b44c5 into main May 6, 2026
26 checks passed

hildebrandmw deleted the mhildebr/prepare-fast-path branch May 6, 2026 16:08

hildebrandmw mentioned this pull request May 12, 2026

Set version to v0.52.0 #1056

Merged

hildebrandmw merged 3 commits into main from mhildebr/prepare-fast-path

5 participants
