Add a fast path to Map::prepare. #1023
Merged
Pull request overview
This PR adds an early-return fast path to diskann::graph::workingset::Map::prepare when the internal HashMap is empty, aiming to make prepare() cheap to call after Map::clear() and avoid touching the provided iterator in that case.
Changes:
- Add `self.map.is_empty()` fast path to `Map::prepare`, returning before doing generation/eviction work.
- Document that `prepare()` may not consume its iterator.
- Add a unit test asserting the iterator is not advanced (and that generation does not change) on the empty-map fast path.
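The early return described in this overview can be sketched roughly as follows. This is a minimal illustration, not the DiskANN implementation: the `Map` struct here, its `map` and `generation` fields, the `insert`, `generation`, and `len` helpers, and the eviction rule in the slow path are all assumptions made for the example. Only the `self.map.is_empty()` check and the "iterator may be left untouched" behavior come from the PR description.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the working-set map (NOT the real DiskANN type).
pub struct Map {
    /// id -> generation in which the id was last requested (assumed layout).
    map: HashMap<u32, u64>,
    /// Counter bumped on every non-trivial `prepare` call (assumed).
    generation: u64,
}

impl Map {
    pub fn new() -> Self {
        Self { map: HashMap::new(), generation: 0 }
    }

    /// Hypothetical helper: track `id` under the current generation.
    pub fn insert(&mut self, id: u32) {
        self.map.insert(id, self.generation);
    }

    pub fn clear(&mut self) {
        self.map.clear();
    }

    pub fn generation(&self) -> u64 {
        self.generation
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }

    /// Prepare for a new batch of ids. May not consume `ids`.
    pub fn prepare<I>(&mut self, ids: I)
    where
        I: IntoIterator<Item = u32>,
    {
        // Fast path: nothing is stored, so there is nothing to age or evict.
        // Return before touching `ids` or bumping the generation.
        if self.map.is_empty() {
            return;
        }

        // Slow path (assumed policy): mark requested ids with the new
        // generation, then evict every entry that was not requested.
        self.generation += 1;
        let current = self.generation;
        for id in ids {
            if let Some(seen) = self.map.get_mut(&id) {
                *seen = current;
            }
        }
        self.map.retain(|_, seen| *seen == current);
    }
}
```

Passing `iter.by_ref()` to `prepare` lets a caller check afterwards whether the iterator was advanced, which is the property the new unit test is described as asserting.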
metajack approved these changes May 5, 2026
harsha-simhadri approved these changes May 5, 2026
added 2 commits May 5, 2026 14:07
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1023 +/- ##
==========================================
- Coverage 89.51% 89.51% -0.01%
==========================================
Files 460 460
Lines 85424 85436 +12
==========================================
+ Hits 76467 76477 +10
- Misses 8957 8959 +2
metajack approved these changes May 6, 2026
hildebrandmw added a commit that referenced this pull request May 12, 2026
# DiskANN v0.52.0 Release Notes

## Breaking Changes

An AI generated, human reviewed list of changes is summarized below.

### `get_degree_stats` signature changed ([#998](#998))

`DiskANNIndex::get_degree_stats` now takes an explicit iterator of IDs instead of requiring the data provider to implement `IntoIterator`.

```rust
// Before: provider had to impl IntoIterator
index.get_degree_stats(&mut accessor)?;

// After: caller supplies the ID iterator
index.get_degree_stats(&mut accessor, id_iter)?;
```

### PQ dimension contract tightened; entries now `&[f32]` only ([#1044](#1044))

With `AlignedBoxWithSlice` removed from the PQ path, the dimension handling has been refactored into a three-layer contract:

| Layer | Where | Contract |
|---|---|---|
| **Boundary (inmem)** | `QueryComputer::new`, `MultiQueryComputer::new`, `DistanceComputer::evaluate_similarity` | `len == dim` (returns `Err` on mismatch) |
| **Boundary (disk)** | `PQScratch::set` | `len >= dim`, slices to `[..dim]` |
| **Internal** | `TableL2/IP/Cosine::{new, populate}` | Trusted, no re-validation |

**Other changes:**

- PQ table populate/distance methods now accept `&[f32]` instead of `<U: Into<f32>>`. Callers must pre-decode quantized vectors via `VectorRepr::as_f32`.
- Generic trampoline impls (`&Vec<u8>`, `&&[u8]`) on `QueryComputer` / `DistanceComputer` have been removed.

### `calculate_chunk_offsets` relocated to `ChunkOffsets` constructors ([#976](#976))

The free functions `calculate_chunk_offsets` and `calculate_chunk_offsets_auto` have been moved into constructors on `ChunkOffsets` / `ChunkOffsetsView` in `diskann-quantization::views`.

```rust
// Before
let offsets = calculate_chunk_offsets(dim, num_chunks);

// After (allocating)
let offsets = ChunkOffsets::partition(dim, num_chunks)?;

// After (zero-alloc, borrows caller-owned scratch)
let view = ChunkOffsetsView::partition_into(dim, &mut scratch)?;
```

Additionally, `get_chunk_from_training_data` has been moved from public API.

### `CachingProvider` removed ([#1052](#1052))

The entire `diskann_providers::model::graph::provider::async_::caching` module has been deleted.

**Why:** The `CachingProvider` was an experiment in transparent caching over `DataProvider`. In practice it required double monomorphization of the indexing code, didn't save integration work for bulk methods like `on_elements_unordered`/`distances_unordered`, and was complex to maintain. An internal user who …migrated off it removed ~1,000 lines of code, improved compile times by ~20%, and substantially reduced complexity.

**Upgrade:** Manage caching directly in your `DataProvider` implementation.

## New Features

### AVX-512 4-bit distance kernels ([#1045](#1045))

Native V4 (AVX-512) specializations for 4-bit packed vector distance computations:

- **`SquaredL2`**: 16 × `u32` lanes per iteration via `_mm512_madd_epi16`.
- **`InnerProduct`**: AVX-512 VNNI (`_mm512_dpbusd_epi32`) over `u8x64` / `i8x64` operands.

Previously, V4 hardware fell back to two AVX2 (V3) kernel invocations per 512-bit chunk. The native kernels double per-instruction throughput. No API changes: existing code benefits automatically on AVX-512 capable hardware.

## Merged PRs

* Deprecate 32-bit targets by @suhasjs in #1022
* Add a fast path to `Map::prepare`. by @hildebrandmw in #1023
* Add boundary checks in gen_associated_data_from_range() by @Copilot in #847
* [deps] Don't pull `rayon` as a dependency of `diskann`. by @hildebrandmw in #1024
* Bump openssl from 0.10.78 to 0.10.79 by @dependabot[bot] in #1026
* Cleaning up test work and changing the get_degree_stats signature. by @JordanMaples in #998
* Reduce scalar-quantization benchmark monomorphization by @suri-kumkaran in #1041
* [diskann-vector] Support truly unaligned distances. by @hildebrandmw in #981
* rename spherical.json to graph index with spherical quantization by @harsha-simhadri in #1042
* [PQ Cleanup] Part 2: Consolidate `calculate_chunk_offsets*` by @arkrishn94 in #976
* PQ: tighten dim contract; right-size scratch buffer by @wuw92 in #1044
* Add v4 distance kernels (4-bit SquaredL2 / InnerProduct) by @m3hm3t in #1045
* Remove the Caching Provider by @hildebrandmw in #1052

## New Contributors

* @suhasjs made their first contribution in #1022
* @m3hm3t made their first contribution in #1045

**Full Changelog**: v0.51.0...v0.52.0

Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>
When the map is empty, we can avoid doing any work in `Map::prepare`. This change makes `Map::prepare` efficient to call after a `Map::clear`, which helps keep calling code simpler in `diskann-garnet` when we dynamically switch to quantization part-way through backedge pruning.