PoC: Python UDFs (for IVF Flat) via numba-cuda-mlir on top of JIT/LTO #2133
Draft
dantegd wants to merge 3 commits into
This is a personal PoC exploring the idea of using the recently released numba-cuda-mlir as a Python frontend on top of the existing cuVS JIT/LTO infrastructure to add Python UDF capabilities to cuVS. The goal is to validate the end-to-end shape rather than propose a finalized public API: define an IVF Flat metric in Python, lower it to LTO-IR, package it as a cuVS device UDF artifact, pass it through Python/C/C++, and link it into the IVF Flat JIT/LTO search path.
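To make the "package it as a cuVS device UDF artifact" step more concrete, here is a small host-side sketch of what such an artifact might carry and how it could be checked against the v1 scope before crossing into C/C++. The class name MetricUDFArtifact, its field names, and the validate() method are hypothetical illustrations, not the PoC's actual descriptor; only the ABI string, order="min", initial=0.0, and the single named capture come from the scope listed in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the "Current PoC Scope" list; everything else here is hypothetical.
V1_ABI = "rapids.cuvs.ivf_flat.metric.v1"


@dataclass(frozen=True)
class MetricUDFArtifact:
    """Hypothetical container for a packaged IVF Flat metric UDF (illustrative names only)."""
    abi: str                            # ABI tag the kernel adapter is built against
    ltoir: bytes                        # LTO-IR from numba-cuda-mlir or a CUDA/C++ source string
    order: str = "min"                  # smaller accumulated distance ranks first
    initial: float = 0.0                # starting value of the accumulator
    capture_name: Optional[str] = None  # name of the single captured float32 array, if any

    def validate(self) -> None:
        """Reject artifacts outside the v1 scope before handing them to C/C++."""
        if self.abi != V1_ABI:
            raise ValueError(f"unsupported ABI: {self.abi!r}")
        if self.order != "min" or self.initial != 0.0:
            raise ValueError("v1 supports only order='min' with initial=0.0")
        if not self.ltoir:
            raise ValueError("artifact has no LTO-IR payload")


if __name__ == "__main__":
    art = MetricUDFArtifact(abi=V1_ABI, ltoir=b"\x00placeholder", capture_name="weights")
    art.validate()  # passes silently
    try:
        MetricUDFArtifact(abi=V1_ABI, ltoir=b"x", order="max").validate()
    except ValueError as err:
        print(err)  # v1 supports only order='min' with initial=0.0
```

Carrying the ABI tag on the artifact itself is one way to let a later, more general ABI coexist with v1 rather than silently overloading it.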
This also includes an "expert" CUDA/C++ source-string path via ivf_flat.cuda_source_metric(...), which makes it possible to compare the new Python/LTO-IR flow against the existing JIT/LTO UDF mechanism.
Example
This PoC includes an end-to-end demo at:
Python metric UDF with a CuPy capture
Output:
Expert CUDA/C++ source-string metric
Output:
Design Notes / ABI Framing
The Python UDF API in this PoC should be read as a coordinate-wise accumulator ABI, not as a fully general metric API.
The current supported shape is:
This maps cleanly onto the existing IVF Flat scan kernel and the existing C++ UDF model, where the kernel already accumulates a distance one coordinate update at a time. In the previous C++ source UDF path, the practical shape was based on x, y, and acc; this PoC adds ctx so Python UDFs can access limited contextual state such as ctx.dim and one captured CUDA array.
This is intentionally not “the one true UDF API forever.” It is useful for metrics and transforms that can be expressed as independent coordinate updates plus an accumulator, including L2-like metrics, weighted L2, simple per-dimension transforms, and other custom distances that fit the existing ANN fine-scoring loop.
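As a host-side reference model of the coordinate-wise accumulator ABI, the sketch below folds a weighted-L2 update over the dimensions of one query/database pair, reading the captured weights at ctx.dim. It is plain Python rather than device code, and the names MetricCtx, weighted_l2_update, score_pair, and the capture field weights are hypothetical; they only mirror the (x, y, acc, ctx) shape described above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MetricCtx:
    """Hypothetical stand-in for ctx: the current dimension plus one captured array."""
    dim: int
    weights: Sequence[float]  # stands in for the single captured float32 CUDA array


# An update sees one query coordinate (x), one database coordinate (y),
# the running accumulator, and the context, and returns the new accumulator.
Update = Callable[[float, float, float, MetricCtx], float]


def weighted_l2_update(x: float, y: float, acc: float, ctx: MetricCtx) -> float:
    d = x - y
    # Equivalent of ctx.<capture_name>[ctx.dim] with capture_name = "weights".
    return acc + ctx.weights[ctx.dim] * d * d


def score_pair(update: Update, query: Sequence[float], vec: Sequence[float],
               weights: Sequence[float], initial: float = 0.0) -> float:
    """Fold the update over every dimension, as the scan kernel does per candidate."""
    acc = initial
    for dim, (x, y) in enumerate(zip(query, vec)):
        acc = update(x, y, acc, MetricCtx(dim=dim, weights=weights))
    return acc


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    v = [0.0, 2.0, 5.0]
    w = [0.5, 1.0, 2.0]
    # 0.5 * 1 + 1.0 * 0 + 2.0 * 4 = 8.5
    print(score_pair(weighted_l2_update, q, v, w))  # 8.5
```

Everything the update needs is local to one coordinate plus the scalar acc, which is why this shape fits the kernel's existing per-coordinate accumulation.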
More general metrics should get a separate ABI/version rather than overloading this coordinate-wise ABI until it becomes confusing. A future block/vector-level ABI might look more like:
or use a block-level/kernel-adapter shape. That would be needed for metrics that require custom reductions, multiple passes, normalization over the whole vector, shared memory, synchronization, control flow across dimensions, or richer per-query/per-vector state.
That future API would require real kernel support for how user code participates in loading, reducing, shared memory, synchronization, and output selection. This PR does not imply plug-and-play block-level primitives yet.
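Cosine distance is a simple example of a metric that does not fit the coordinate-wise shape: it needs three running reductions (x·y, ‖x‖², ‖y‖²) and a normalization after the last dimension, whereas the v1 update threads only a single scalar acc through the loop and has no finalize step. The plain-Python sketch below uses a hypothetical vector-level function, cosine_distance_vector, to show that structure; it is not an ABI this PR defines.

```python
import math
from typing import Sequence


def cosine_distance_vector(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical vector-level metric: sees whole vectors, not one coordinate."""
    dot = 0.0
    xx = 0.0
    yy = 0.0
    # Three independent reductions over all dimensions...
    for a, b in zip(x, y):
        dot += a * b
        xx += a * a
        yy += b * b
    # ...followed by a normalization that needs all three results at once.
    denom = math.sqrt(xx) * math.sqrt(yy)
    if denom == 0.0:
        return 1.0  # convention chosen for this sketch: treat zero vectors as orthogonal
    return 1.0 - dot / denom


if __name__ == "__main__":
    print(cosine_distance_vector([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 1.0
    print(cosine_distance_vector([1.0, 2.0], [2.0, 4.0]))  # parallel: approximately 0.0
```

On a GPU, the multiple reductions and the final division are exactly the kind of cross-dimension work that needs kernel support (shared memory, synchronization, a finalize hook) rather than a per-coordinate callback.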
What Changed
- cuvsDeviceUDF descriptors and conversion helpers.
- ivf_flat.metric(...) support that compiles Python metric update functions to LTO-IR using numba-cuda-mlir.
- ivf_flat.cuda_source_metric(...) for expert CUDA/C++ source-string metrics.
- SearchParams in Python/C/C++ to accept an optional metric UDF artifact.
- compute_dist call path.
- examples/experimental/ivf_flat_udf_e2e_demo.py.
Current PoC Scope
- rapids.cuvs.ivf_flat.metric.v1.
- order="min" and initial=0.0.
- float32 CUDA array.
- ctx.<capture_name>[ctx.dim].
- numba-cuda-mlir for the Python-to-LTO-IR path.
Potential Future Directions
- order, initial, and coarse-routing combinations.
- float32 capture.
Test Coverage
This PR adds coverage for:
- numba-cuda-mlir backend compilation to LTO-IR.
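Alongside compilation tests, a host-side parity check is a cheap way to pin down a metric's semantics under the v1 scope (order="min", initial=0.0). The sketch below is hypothetical and does not reflect the PR's actual tests: l2_update, fold, and search_min are illustrative names, ctx is omitted because the plain L2 update does not use it, and neighbor selection keeps the smallest accumulated distances first.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_update(x: float, y: float, acc: float) -> float:
    # Coordinate-wise squared-L2 update (ctx omitted because it is unused here).
    d = x - y
    return acc + d * d


def fold(query: Sequence[float], vec: Sequence[float], initial: float = 0.0) -> float:
    """Accumulate the update over all dimensions, starting from initial."""
    acc = initial
    for x, y in zip(query, vec):
        acc = l2_update(x, y, acc)
    return acc


def search_min(query: Sequence[float], dataset: Sequence[Sequence[float]],
               k: int) -> List[Tuple[float, int]]:
    """order="min": keep the k smallest accumulated distances as (distance, index)."""
    scored = ((fold(query, vec), i) for i, vec in enumerate(dataset))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    data = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [10.0, 0.0]]
    q = [0.0, 0.0]
    # Parity: the folded update matches the squared Euclidean distance from math.dist.
    for vec in data:
        assert math.isclose(fold(q, vec), math.dist(q, vec) ** 2)
    print(search_min(q, data, k=2))  # [(0.0, 0), (2.0, 2)]
```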