Skip to content

coverage: add cuda.core test_utils.py tests for DLPack/StridedMemoryView#2181

Open
rluo8 wants to merge 8 commits into
NVIDIA:mainfrom
rluo8:coverage-test-utils
Open

coverage: add cuda.core test_utils.py tests for DLPack/StridedMemoryView#2181
rluo8 wants to merge 8 commits into
NVIDIA:mainfrom
rluo8:coverage-test-utils

Conversation

@rluo8

@rluo8 rluo8 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Description

Split out from #2168 per review. Adds tests targeting
cuda/core/experimental/_memoryview.pyx:

  • DLPack export round-trip over all NumPy-native dtypes and
    scalar/zero-volume shapes
  • versioned/unversioned capsule + deleter paths
  • from_dlpack: ambiguous-stream and unsupported-device errors, TypeError -> unversioned-import fallback
  • CAI path: device-pointer + stream-ordering view, deprecated init path, has_dlpack=False proxy, device-accessible view, None-dtype repr
  • DLPack C exchange-API helpers

Coverage change:
_memoryview.pyx coverage 65.82% -> 85.73%.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

@leofang leofang requested review from mdboom and seberg June 9, 2026 14:15
@leofang leofang added P1 Medium priority - Should do test Improvements or additions to tests labels Jun 9, 2026
@leofang leofang added this to the cuda.core v1.1.0 milestone Jun 9, 2026

@seberg seberg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd wait for @mdboom to have a brief look. As we discussed yesterday, I am not sure about all tests but I guess the improved coverage is really nice.
FWIW, in these changes I don't really see trivial gains for parametrization.

That said, I left a few comments and especially the TMD tests seem like a lot of effort to add coverage that should already exists (but you need the support to see that in the coverage).
I think it may make sense to add one test (which could use funky arguments) to check that an error is given when it isn't available. But I wouldn't add monkeypatch tests just to get coverage up here, I think.
(We should cover this with real tests not with dummy tests.)

The DLPack tests are pretty complex unittests with ctypes, I guess they are fine (this is unused, if there are bugs mirrored in implementation and tests we won't even notice, though -- which is a bit deceptive; it seemed fine but I'd have to look closer to be confident).
Either way, I feel this file is big enough and those tests complicated enough, that it may be good to split DLPack tests into a new one? (test_utils_dlpack.py or a new utils folder?)

Comment thread cuda_core/tests/test_utils.py Outdated

The real ``_from_tiled`` requires a device-accessible, 16-byte-aligned view
on TMA-capable hardware (sm90+), so we replace the (module-level) class the
method imports with a recorder and assert the assembled call instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but can't we just skip the test if the hardware isn't available? And if it is, I expect we are already testing this?

I feel this test belongs into test_tensormap as a single test that checks that tensormap instantiation fails gracefully when TMA isn't available.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review, @seberg ! I'll update based on these comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @seberg . I have updated the test cases. Dropped both monkeypatch tests, and added one real test test_as_tensor_map_host_view_rejected_without_tma to test_tensor_map.py.

assert view.ptr == int(buffer.handle)
assert view.device_id == dev.device_id
assert view.shape == (8,)
dev.default_stream.sync()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this test a bit deceptive since it adds coverage without actually checking stream order correctness. (It does check for device correctness of course).
Now, testing stream-order correctness is annoying (you need either large buffers, or launch many kernels on the first stream and then e.g. a copy-to-host on after export and see that all of them finished)
So maybe it's OK, but the bot oversells the test by a lot.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I've updated the docstring to be honest about what the test does and doesn't check.

Comment thread cuda_core/tests/test_utils.py Outdated
"""The deprecated ``StridedMemoryView(obj)`` constructor routes a CAI-only
object through the CAI branch (warn + ``view_as_cai``), not the DLPack one."""
obj = _make_cuda_array_interface_obj(shape=(4,), strides=None, typestr="<f4", data=(0, False))
with pytest.deprecated_call(match="deprecated"):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with pytest.deprecated_call(match="deprecated"):
with pytest.deprecated_call(match="CUDA-array-interface-supporting"):

if we bother to match a string, try to make it unique.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread cuda_core/tests/test_utils.py Outdated
#
# Drive the C function pointers exposed by the capsule the way a native
# consumer would, exercising the StridedMemoryView exchange-API implementation.
# ---------------------------------------------------------------------------

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add this, but I am starting to think this should be in it's own test_dlpack file, there is a lot of complexity here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved all the DLPack/exchange-API tests into a new file test_utils_dlpack.py.

Comment thread cuda_core/tests/test_utils.py Outdated
def _exchange_api_cause(exc):
"""Underlying exception raised by the noexcept C fn (surfaced by ctypes as
SystemError, with the real error chained as __cause__ or __context__)."""
return exc.value.__cause__ or exc.value.__context__

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unpacking the SystemError is a bad workaround for not supplying an errcheck function to the ctypes definition.
We shouldn't test for SystemError (unless we set them ourselves, which we don't).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I dropped the SystemError unpacking. The exchange-API function pointers are now declared with ctypes.PYFUNCTYPE instead of CFUNCTYPE, so ctypes surfaces the Python exception set by the Cython function directly. The tests now assert the intended TypeError/RuntimeError/NotImplementedError instead of inspecting SystemError.cause/context.

Comment thread cuda_core/tests/test_utils.py Outdated
api = _get_exchange_api()
out = ctypes.c_void_p(123)
with pytest.raises(SystemError) as exc:
api.managed_tensor_allocator(None, ctypes.byref(out), None, None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
api.managed_tensor_allocator(None, ctypes.byref(out), None, None)
# Currently sets a Python error when `SetErr` isn't passed
api.managed_tensor_allocator(None, ctypes.byref(out), None, None)

This is OK, honestly, I am not sure if there is a point in even forcing this function to be non-NULL on the dlpack side if we just always make an error, but whatever.

I just prefer mentioning in the test that we know it's a weird test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the test with suggested comments.

Comment thread cuda_core/tests/test_utils.py Outdated
@@ -1094,3 +1094,425 @@ def test_strided_memory_view_dtype_roundtrip_all(dtype):
pytest.skip(f"NumPy does not export {np.dtype(dtype)} via DLPack: {e}")
view = StridedMemoryView.from_dlpack(src, stream_ptr=-1)
assert view.dtype == np.dtype(dtype) # .dtype triggers dtype_dlpack_to_numpy

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well delete this test and move the comment about bfloat16 to _NUMPY_NATIVE_DLPACK_DTYPES your new test seems basically identical but expands it a bit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Deleted the test, and moved the bfloat16 comment to _NUMPY_NATIVE_DLPACK_DTYPES.

Comment thread cuda_core/tests/test_utils.py Outdated
np.complex64,
np.complex128,
np.bool_,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
)
)
if ml_dtypes is not None:
# Supported on NumPy 2.5 and ml_dtypes (probably) 0.5.5+
_NUMPY_NATIVE_DLPACK_DTYPES += (ml_dtypes.bfloat16,)

Although then the "native" is slightly off :). But we have the try/except that should make this skipped and NumPy 2.5 + next ml_dtypes version this should actually work.
(In fact, we can probably add more dtypes if we want to.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I added ml_dtypes.bfloat16 to _NUMPY_NATIVE_DLPACK_DTYPES when ml_dtypes is available. And unsupported NumPy/ml_dtypes combinations are still skipped by the per-test DLPack probe.

Comment thread cuda_core/tests/test_utils.py Outdated
TypeError fallback in ``view_as_dlpack`` and an *unversioned* capsule import."""

def __init__(self, arr):
self._arr = arr

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self._arr = arr
self._arr = StridedMemoryView.from_any_interface(arr)

I would suggest to use a StridedMemoryView here rather than NumPy. Should do the same thing, but means we don't care if NumPy changes the behavior (which eventually it may) and stops exporting version 0.x (unversioned).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Used a StridedMemoryView instead of NumPy, and passed stream_ptr=-1.

@rwgk

rwgk commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Drive-by suggestion: @rluo8 could you please try adopting #2232 here, then ask your agent to add the tags? It'll be interesting to see if it works as intended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module P1 Medium priority - Should do test Improvements or additions to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants