bf_tree migration away from diskann-providers#1020
Conversation
b94f4df to
44be934
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1020 +/- ##
==========================================
- Coverage 89.51% 89.50% -0.01%
==========================================
Files 461 476 +15
Lines 85920 90026 +4106
==========================================
+ Hits 76912 80581 +3669
- Misses 9008 9445 +437
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
ca413d4 to
1051152
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Migrates the async bf_tree provider away from diskann-providers by extracting it into a dedicated diskann-bf_tree-provider crate, while also modernizing quantization (PQ → spherical) and simplifying deletion semantics (hard deletes).
Changes:
- Extracts bf_tree + caching into
diskann-bf_tree-providerand updates workspace/CI accordingly. - Replaces PQ-based quant vector store with spherical quantization (
Poly<dyn Quantizer>) + updated serialization. - Simplifies delete handling by removing the delete-provider generic and switching to hard deletes.
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs | Makes delete-table operations public and removes bf_tree-specific bitmap (de)serialization helpers/tests. |
| diskann-providers/src/model/graph/provider/async_/mod.rs | Adjusts module exports, removes bf_tree/caching modules, re-exports postprocess utilities. |
| diskann-providers/src/model/graph/provider/async_/caching/utils.rs | Updates import path and minor formatting in tests. |
| diskann-providers/src/model/graph/provider/async_/caching/provider.rs | Refactors imports/bounds formatting; adds Future import. |
| diskann-providers/src/model/graph/provider/async_/caching/mod.rs | Removes caching module root from diskann-providers. |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates test imports to use diskann_providers / local module paths. |
| diskann-providers/src/model/graph/provider/async_/caching/bf_cache.rs | Updates ConfigError import path and test formatting. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/vector_provider.rs | Adds hard-delete helper and adjusts TestCallCount assertions. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches PQ → spherical quantization + new query/distance computer adapter + test updates. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Removes delete-provider generic, implements hard delete, switches postprocess to CopyIds, updates quantizer serialization. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/neighbor_provider.rs | Updates TestCallCount import path. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/mod.rs | Switches re-exports to new bf_tree_provider module and removes QuantAccessor re-export. |
| diskann-providers/Cargo.toml | Removes bf_tree feature/deps from diskann-providers. |
| diskann-bf_tree-provider/src/lib.rs | Introduces new crate entry point with provider + caching modules. |
| diskann-bf_tree-provider/src/caching/mod.rs | Adds caching module root (example tests disabled with TODO). |
| diskann-bf_tree-provider/Cargo.toml | Adds new crate manifest + dependencies for bf_tree provider and spherical quantization. |
| Cargo.toml | Adds diskann-bf_tree-provider to workspace members. |
| .github/workflows/ci.yml | Removes bf_tree from explicitly tested feature list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1219bcc to
cbea923
Compare
|
@copilot would you mind giving this a review |
Reviewed the current branch at |
Add bf_tree insert benchmarks (Criterion + IAI)
Add pre-migration bf_tree provider benchmarks mirroring the existing
inmem insert benchmark. Both Criterion (wall-clock) and IAI-Callgrind
(instruction-count) benchmarks use the same sift-256 dataset and PQ
training as the inmem benchmarks for apples-to-apples comparison.
Gated behind #[cfg(feature = "bf_tree")] so existing benchmarks
continue to work without the feature flag.
Run with: cargo bench --bench bench_main --features virtual_storage,bf_tree
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add bf_tree search benchmarks for migration parity
Add search benchmarks (Criterion + IAI-Callgrind) alongside existing
insert benchmarks. Search exercises the read path through
BfTreeProvider (vector lookups, neighbor access, greedy search
traversal) which is critical for detecting regressions post-migration.
The search benchmark builds the index once during setup, then measures
10 KNN queries (k=10, l=20) against the populated index.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract bf_tree provider into diskann-bf_tree-provider crate
Move the bf_tree provider and caching modules from diskann-providers
into the dedicated diskann-bf_tree-provider crate. This makes the
bf-tree dependency explicit at the crate level rather than hidden
behind a feature flag.
Changes:
- Move bf_tree/ and caching/ modules to diskann-bf_tree-provider/src/
- Set up Cargo.toml with proper dependencies (bf-tree, diskann,
diskann-providers, diskann-utils, diskann-vector, etc.)
- Convert all crate:: imports to diskann_providers:: imports
- Bump visibility of postprocess module, AsDeletionCheck/DeletionCheck
traits, and TableDeleteProviderAsync methods to pub
- Remove bf_tree feature flag and optional deps from diskann-providers
- Remove cfg(feature = "bf_tree") gates from table_delete_provider.rs
- Rename provider/provider.rs to provider/bf_tree_provider.rs (clippy)
- Disable caching example.rs tests (cross-crate cfg(test) limitation)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move bf_tree benchmarks to diskann-bf_tree-provider
Migrate Criterion and IAI-Callgrind benchmark targets from
diskann-providers to the new diskann-bf_tree-provider crate.
Update imports to use the new crate's module paths. Remove
bf_tree benchmark remnants (cfg gates, modules) from
diskann-providers benches.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cargo.lock got missed by last commit
Move delete bitmap serialization to diskann-bf_tree-provider
Move to_bytes/from_bytes logic out of TableDeleteProviderAsync (where
it was bf_tree-scoped) into a dedicated delete_bitmap_serde module in
the new crate. The reimplementation works through the public API and
produces byte-identical output (LE u32 words). Revert the methods in
diskann-providers to pub(crate) + #[cfg(test)].
Adds format-level compatibility tests including known fixtures and
padding bit rejection.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move DeletionCheck traits and RemoveDeletedIdsAndCopy to diskann::graph::glue
These are core to the inplace delete algorithm's search post-processing
and belong in the diskann crate alongside SearchPostProcess, CopyIds,
FilterStartPoints, and Pipeline. This removes the need to widen
diskann-providers' postprocess module to pub for external crates.
Also removes the now-dead to_bytes/from_bytes methods and their tests
from TableDeleteProviderAsync in diskann-providers, since that
serialization logic has been moved to diskann-bf_tree-provider.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use workspace_root() for benchmark test data paths
Replace env!("CARGO_MANIFEST_DIR")-relative paths with
diskann_utils::workspace_root() so benchmarks find test_data/
without requiring symlinks in each crate directory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The bf_tree feature was removed from diskann-providers when the provider was extracted into its own crate (diskann-bf_tree-provider). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rewrite QuantVectorProvider to use Arc<Poly<dyn Quantizer>> - Add local HybridComputer and QuantQueryComputer wrapper - Update CreateQuantProvider impl for Poly<dyn Quantizer> - Update serialization to use Quantizer::serialize/try_deserialize - Rewrite all tests and doc examples for spherical quantizer - Update benchmarks to use SphericalQuantizer::train with ReciprocalMeanNorm - Move rand to regular dependencies (needed by doc tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Delete HybridAccessor, HybridComputer, and all Hybrid strategy impls (Search, PostProcess, Prune, Insert, MultiInsert, InplaceDelete) - Delete Rerank post-processor (only used by Hybrid) - Remove QuantAccessor (only constructed by Hybrid search path) - Move QuantQueryComputer from hybrid_computer.rs to quant_vector_provider.rs - Delete hybrid_computer.rs - Remove metric field from QuantVectorProvider (dead after Hybrid removal) - Add SaveWith/LoadWith serialization impls back (accidentally lost during D removal) - Fix test compilation: remove deleted_ids references, fix turbofish generics, insert vectors before status checks, skip hard-deleted IDs in verification loops - Update doc examples to remove NoDeletes/TableBasedDeletes from constructors - Clean up unused imports (distances::pq, Hybrid, DistanceFunction, etc.) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ree impl now uses hard deletion)
Restore postprocess.rs in diskann-providers with AsDeletionCheck, DeletionCheck, and RemoveDeletedIdsAndCopy. These were temporarily moved into diskann/src/graph/glue.rs during bf_tree extraction but belong alongside the inmem providers that use them. - Create diskann-providers/src/model/graph/provider/async_/postprocess.rs - Remove traits and struct from diskann/src/graph/glue.rs - Update all inmem provider imports to use local postprocess module - Update DeletionCheck impls on NoDeletes and TableDeleteProviderAsync Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore QuantAccessor with Opaque-based element types for spherical quantization compatibility - Add Quantized strategy impls: SearchStrategy, DefaultPostProcessor, PruneStrategy, InsertStrategy, MultiInsertStrategy, InplaceDeleteStrategy - Use PassThrough working set pattern (matching inmem spherical approach) - Add OwnedOpaque newtype for workingset::View (bf_tree copies vs zero-copy) - Add BuildDistanceComputer impl using UnwrapErr<DistanceComputer> - Remove #[cfg(test)] from get_vector_into and get_vector_sync Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Hybrid searches in quantized space (same as Quantized), then reranks final candidates using full-precision distances. Pruning stays in quantized space. This replaces the old HybridComputer with its 4 distance modes (full×full, quant×quant, cross-type) with a simpler search-quant → rerank-full → prune-quant pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts accidental pub visibility on TableDeleteProviderAsync methods (is_deleted, delete, undelete, clear) and removes unused RemoveDeletedIdsAndCopy re-export. Nothing outside diskann-providers uses these, so widening the public API was unnecessary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix FullAccessor::on_elements_unordered to silently skip deleted entries - Fix QuantAccessor::on_elements_unordered to silently skip deleted entries - Add create_quant_index and create_full_precision_index test helpers - Add tests: quantized insert+search, multi_insert, delete+search - Add tests: hybrid insert+search, hybrid delete+search - Add tests: full precision insert+search, full precision delete+search - Add doc comments for OwnedOpaque, strategy descriptions, and BfTreeProvider Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rerank::post_process used .expect() on get_vector_sync, which could panic if a concurrent delete removed a vector between the quantized search phase and the reranking phase. Replace with .filter_map() to silently skip unreadable entries, consistent with the pattern used in QuantAccessor::on_elements_unordered and FullAccessor. Caught during local code review. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0ad39b8 to
83e6231
Compare
…on delete FullAccessor::get_element and FullPrecision::get_delete_element used .expect() which was safe under the old soft-delete model where entries remained physically in the bf_tree. With hard deletes, data is removed from the tree immediately but graph edges are cleaned up lazily via drop_deleted_neighbors. Stale edges can cause get_vector_into to fail for deleted IDs, making .expect() a panic risk in production. Change GetError and DeleteElementError from Panics to ANNError and propagate errors to callers, consistent with QuantAccessor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Jordan, we can do quite a bit of cleanup on the crate structure and item names which I think would really start to tighten up the crate.
One question I had for you is what the expected end state is: are we targeting a full clean up or an incremental, non-published state while we continue working? I ask because the bf-tree provider inherited a bit of cruft from earlier iterations of the code, and I'm wondering how aggressiver we want to be in this PR in particular, considering that we still need integration tests and benchmarks.
|
|
||
| use super::super::common::TestCallCount; | ||
| use super::ConfigError; | ||
| use diskann_providers::model::graph::provider::async_::common::TestCallCount; |
There was a problem hiding this comment.
Maybe either duplicate TestCallCount or drop entirely? Alternatively, we can move this into the Context used by the bf-tree provider and always gather these metrics. It's generally pretty useful to have these things.
There was a problem hiding this comment.
duping for now, while the context would probably be the better solution, it's a lot of plumbing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| T: VectorRepr, | ||
| Q: AsyncFriendly + QuantDelete, | ||
| { | ||
| fn release( |
There was a problem hiding this comment.
Who is the intended caller of this release function?
There was a problem hiding this comment.
It was originally intended as a mechanism for completely a soft-delete (when all incoming edges where removed). However, we have generally moved away from soft-deletes in the usual sense, so this method can probably be removed.
There was a problem hiding this comment.
can we resolve this and address removing release in a different pr?
harsha-simhadri
left a comment
There was a problem hiding this comment.
Thanks for the PR, Jordan.
Could you wire it through diskann-benchmarks and run it in streaming mode to replicate the recall results in the IP-diskann paper? That would be good E2E validation of the provider. Please all latency and total runtime to analyze performance.
I'll take care of that after I finish addressing Mark's feedback. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Migrate bf_tree provider to standalone
diskann-bftreecrateMotivation
The long-term goal is to remove
diskann-providersentirely. The bf_tree provider was one of several components keeping that crate alive — it depended ondiskann-providersfor generic delete infrastructure (DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopy) and PQ quantization types. This PR extracts bf_tree into a standalonediskann-bftreecrate, cuts those dependencies, simplifies the generics, and switches from PQ to spherical quantization — all to reduce the surface area ofdiskann-providersand move closer to removing it.What Changed
Crate Extraction & Rename
Moved the bf_tree provider from
diskann-providers/src/model/graph/provider/async_/bf_tree/into a standalonediskann-bftreecrate (flat module structure:lib.rs,provider.rs,vectors.rs,quant.rs,neighbors.rs). Updated workspaceCargo.tomland CI configuration.PQ → Spherical Quantization
QuantVectorProvider: StoresPoly<dyn Quantizer>(spherical) instead ofArc<FixedChunkPQTable>.QuantQueryComputer: Newtype wrapper adapting spherical'sOpaque-based API toPreprocessedDistanceFunction<&[u8], f32>.Quantizer::serialize/iface::try_deserialize(flatbuffers format).QueryLayout::FullPrecisionfor better recall with low-bit quantizers.benches/directory).Remove
D(Delete Provider) ParameterBfTreeProvider<T, Q, D>→BfTreeProvider<T, Q>.delete(key)API replace bitmap-based soft deletes.delete()removes from all three trees (neighbors, full_vectors, quant_vectors).delete_bitmap_serde.rs,NoDeletes/TableBasedDeletesfrom constructors.Simplify to Two Strategies
Removed the
Hybridstrategy entirely — onlyFullPrecisionandQuantizedremain.Quantizedsearch usesRerankpost-processor to re-rank candidates using full-precision vectors.Reduce
diskann-providersCouplingFullPrecisionandQuantizedmarker structs moved todiskann::graph::strategy(canonical home), re-exported fromdiskann-providers.NoStoredefined locally indiskann-bftree.TestCallCountduplicated locally (conditional compile: real counter in test, no-op in release).AsKeytrait added locally to replace scatteredbytes_ofcalls.diskann-providersimports:UnwrapErr(distances) and storage traits.ToRankedError HandlingReplaced blanket
ANNErrorwith ranked error types for vector access:VectorError(Deleted|NotFound): transient errors from bf_tree reads.VectorUnavailable: implementsTransientError<ANNError>— acknowledge (skip) or escalate.AccessError=RankedError<VectorUnavailable, ANNError>: type alias used asGetError.on_elements_unordered: skips transient errors, propagates real errors.status_by_internal_id: transient →ElementStatus::Deleted, real → propagate.get_delete_element: escalates (delete target must exist).Rerank::post_process: acknowledges transient (skips candidate), propagates real errors.Use
CopyIdsInstead ofRemoveDeletedIdsAndCopySince bf_tree uses hard deletes, deleted IDs never appear in neighbor lists. Replaced with
glue::CopyIds.Move
DeletionCheckTraits Back todiskann-providersThese traits were temporarily in
diskann/src/graph/glue.rsduring extraction. Now restored todiskann-providerswithpub(crate)visibility.Scope
diskann-providersAPI Back topub(crate)Reverted methods on
TableDeleteProviderAsyncfrompubback topub(crate). Removed unused re-exports.Other Cleanups
new_emptyremoved —newrequires start points upfront.PassThroughaccessor replaced withMap(avoids full refetch during prune).SearchAccessorError→Infallible(search accessor cannot fail).create_test_quantizerinquant.rs).Poly<dyn Quantizer>instead ofFixedChunkPQTable.Dparameter removed,new_emptyremoved,newrequires start points.GetErroris nowAccessError(ranked) instead ofANNError.Net Impact
diskann-bftreeis a self-contained crate with two generic parameters (T,Q)FullPrecisionandQuantized(withRerank)DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopydiskann-providersAPI surface reducedOpen Items
NeighborAccessortrait.Debugimpls indiskann-quantizationfirst.