perf(brute_force): skip csr_to_coo on inner-product filtered search by maxwbuckley · Pull Request #2128 · rapidsai/cuvs

maxwbuckley · 2026-05-26T15:52:57Z

Summary

Small perf cleanup for the filtered brute-force CSR path in knn_brute_force.cuh.

When sparsity ≥ 0.9, filtered brute-force takes the sparse CSR / masked_matmul path. As part of setup it builds a per-nonzero rows array via raft::sparse::convert::csr_to_coo.
That rows array is only consumed by cuvs::neighbors::detail::epilogue_on_csr, which only runs for L2Expanded / L2SqrtExpanded / CosineExpanded (it combines masked inner products with precomputed norms).
For InnerProduct, the epilogue is skipped — so the allocation and the csr_to_coo kernel launch were dead work on every IP-metric filtered search hitting the CSR path.

This PR hoists the rmm::device_uvector<IdxT> rows(...) allocation and the csr_to_coo call inside the L2/Cosine branch, next to their only consumer.

What changes

No behavior change for L2Expanded, L2SqrtExpanded, CosineExpanded — same kernels, same order.
For InnerProduct filtered searches that hit the sparse CSR branch: saves one device allocation of nnz * sizeof(IdxT) and one kernel launch + write pass per call.

Test plan

Existing filtered brute-force tests pass (BRUTE_FORCE_TEST, prefiltered variants) for InnerProduct, L2Expanded, L2SqrtExpanded, CosineExpanded.
No diff in output for L2 / Cosine vs. base.

🤖 Generated with Claude Code

…ered search The CSR-path of filtered brute-force builds a per-nonzero `rows` array via `raft::sparse::convert::csr_to_coo`. That array is only consumed by `epilogue_on_csr`, which itself only runs for L2 / L2Sqrt / Cosine to combine the masked inner products with the precomputed norms. For InnerProduct the epilogue is skipped, so the allocation and kernel launch were dead work on every IP-metric filtered search. Move the `rmm::device_uvector<IdxT> rows(...)` allocation and the `csr_to_coo` call inside the L2/Cosine branch alongside their only consumer. No behavior change for L2/L2Sqrt/Cosine. For InnerProduct filtered searches that hit the sparse CSR path (sparsity >= 0.9) this saves one device allocation of `nnz * sizeof(IdxT)` and one kernel launch + write pass per call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Max Buckley <maxwbuckley@gmail.com>

copy-pr-bot · 2026-05-26T15:53:02Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

maxwbuckley · 2026-05-26T23:37:38Z

Local test results

Built and ran the brute-force test suites against this PR cherry-picked onto upstream/main (commit 4addf4e). Built with CUDA 13.2, SM 89, RTX 5090.

NEIGHBORS_TEST (contains brute_force_prefiltered.cu — the suite that exercises the modified brute_force_search_filtered code path):

Suite	Cases	Result
`PrefilteredBruteForceOnBitmapTest_float_int64`	39	✅ pass
`PrefilteredBruteForceOnBitmapTest_half_int64`	39	✅ pass
`PrefilteredBruteForceOnBitsetTest_float_int64`	39	✅ pass
`PrefilteredBruteForceOnBitsetTest_half_int64`	39	✅ pass
`KNNTest` + `RandomBruteForceKNNTest` + `SparseKNNTest` + `RefineTest` + `NNTest` (rest of the binary)	157	✅ pass
NEIGHBORS_TEST total	313/313	✅

NEIGHBORS_ANN_BRUTE_FORCE_TEST (dense ANN brute-force smoke):

Suite	Cases	Result
`AnnBruteForceTest_float`	39	✅ pass
`AnnBruteForceTest_half_float`	39	✅ pass
total	78/78	✅

The 156 PrefilteredBruteForce* cases cover the full (metric × bitmap-or-bitset × float-or-half × sparsity) matrix, including:

All four metrics that this PR's if guard discriminates on: InnerProduct (the path that now skips the csr_to_coo + rows allocation), L2Expanded, L2SqrtExpanded, CosineExpanded (paths that still hit epilogue_on_csr with a now-locally-allocated rows buffer).
Sparsities at 0.01, 0.1, 0.2, 0.4, 0.5 — i.e. both the dense tiled_brute_force_knn branch (sparsity < sparse_filter_threshold = 0.9) and the sparse CSR / masked_matmul branch (sparsity ≥ 0.9). The PR only touches the sparse branch.

Total: 391 / 391 passed, 0 failures. ~17 min wall on the prefilter half.

coderabbitai · 2026-05-26T23:40:33Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: dbb26125-9d4f-426e-b767-cbd477fa9ad7

📥 Commits

Reviewing files that changed from the base of the PR and between 4addf4e and 23059ea.

📒 Files selected for processing (1)

cpp/src/neighbors/detail/knn_brute_force.cuh

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Optimized the filtered brute-force search path to defer CSR to COO conversion to the post-processing step, improving efficiency in the epilogue handling for L2 and Cosine distance metrics.

Walkthrough

The PR optimizes the filtered brute-force search path by deferring the construction of the COO rows buffer. The buffer is now built only when needed—just before the CSR epilogue call—rather than unconditionally earlier. dataset_view construction has been reordered to occur before the deferred rows buffer computation.

Changes

Defer COO rows buffer construction in brute-force filtered search

Layer / File(s)	Summary
Defer COO rows buffer construction to CSR epilogue `cpp/src/neighbors/detail/knn_brute_force.cuh`	The `rows` buffer construction from the early section is removed, and is now built only at the point just before the CSR epilogue call with a comment indicating it is only needed for L2/Cosine epilogue, while `dataset_view` construction is moved earlier.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

cjnolet

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and accurately describes the main optimization: skipping unnecessary csr_to_coo conversion for inner-product filtered searches on the sparse CSR path.
Description check	✅ Passed	The description is well-related to the changeset, clearly explaining why the change is made, what it affects, and what the expected outcomes are.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-project-automation Bot added this to Unstructured Data Processing May 26, 2026

aamijar assigned maxwbuckley May 26, 2026

aamijar added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels May 26, 2026

aamijar moved this to In Progress in Unstructured Data Processing May 26, 2026

maxwbuckley mentioned this pull request May 26, 2026

[FEA] Roaring Bitmap support #1972

Open

maxwbuckley marked this pull request as ready for review May 26, 2026 23:37

maxwbuckley requested a review from a team as a code owner May 26, 2026 23:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(brute_force): skip csr_to_coo on inner-product filtered search#2128

perf(brute_force): skip csr_to_coo on inner-product filtered search#2128
maxwbuckley wants to merge 1 commit into
rapidsai:mainfrom
maxwbuckley:perf/brute-force-skip-csr-to-coo-on-ip

maxwbuckley commented May 26, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented May 26, 2026

Uh oh!

maxwbuckley commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

maxwbuckley commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changes

Test plan

Uh oh!

copy-pr-bot Bot commented May 26, 2026

Uh oh!

maxwbuckley commented May 26, 2026

Local test results

Uh oh!

coderabbitai Bot commented May 26, 2026

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

maxwbuckley commented May 26, 2026 •

edited

Loading