cuda.core: add CUDAArray, TextureObject, SurfaceObject, MipmappedArray (texture/surface API, #467) by rparolin · Pull Request #2095 · NVIDIA/cuda-python

rparolin · 2026-05-15T19:01:32Z

Summary

Implements #467 (TextureObject / SurfaceObject) plus the supporting texture/surface stack on cuda.core. The API shape and naming were settled in the design issue #2188.

CUDAArray + ArrayFormat — opaque, hardware-laid-out GPU allocations backing textures/surfaces (renamed from Array per Design: cuda.core Texture/Surface API surface #2188 decision 1).
MipmappedArray — wraps CUmipmappedArray; get_level returns a non-owning CUDAArray level view whose parent is kept alive structurally.
TextureObject + TextureDescriptor — bindless texture handle with the full sampling-state surface (filter/read/address modes, border color, mipmap clamps, anisotropy, sRGB, seamless cubemap).
SurfaceObject — bindless surface handle for kernel-side typed load/store; requires CUDAArray(is_surface_load_store=True) (renamed per Design: cuda.core Texture/Surface API surface #2188 decision 6).
ResourceDescriptor — four factories (from_array, from_mipmapped_array, from_linear, from_pitch2d) covering all texture-eligible arms of CUDA_RESOURCE_DESC.

All of these are exposed under the cuda.core.textures namespace (not the flat cuda.core namespace) — see Review feedback below.

Design decisions reflected (#2188)

| # | Decision | Status |
|---|----------|--------|
| 1 | `Array` → `CUDAArray`; keep `ArrayFormat` | ✅ |
| 2 | Copy-only interop (`copy_from`/`copy_to`), documented; no allocation helper | ✅ |
| 3 | All four `ResourceDescriptor` factories | ✅ |
| 4 | Folded `format` + `num_channels` | ✅ |
| 5 | Keep the descriptor type split (`@dataclass` vs `from_*` factories) | ✅ |
| 6 | `is_surface_load_store` property and constructor keyword | ✅ |
| 7 | Split the `gl_interop_*` examples into a follow-up PR | ⏳ see Remaining |

Review feedback addressed (#2188)

Copy API parity with Buffer (@Andy-Jost): CUDAArray.copy_from / copy_to now accept Stream | GraphBuilder (via Stream_accept), so array copies can be captured into a CUDA graph, and copy_to now returns dst.
Module grouping (@Andy-Jost): the texture/surface types are grouped under cuda.core.textures (grouped-only — no flat cuda.core exports; pre-release, so no deprecation needed).

Resource lifetime

All four resource-owning types (CUDAArray, MipmappedArray, TextureObject, SurfaceObject) route CUDA resource lifetime through cuda.core's C++ std::shared_ptr resource-handle layer (_cpp/resource_handles.{hpp,cpp}). Each owning handle's deleter calls the matching cu*Destroy with the GIL released, and inter-resource dependencies (mipmap level → parent, texture/surface → backing) are embedded structurally in the box. There are no raw handles, _owning flags, Python-reference keepalives, or __dealloc__ methods, which is consistent with the rest of cuda.core (_cpp/DESIGN.md).

What's included

Public API under cuda.core.textures; docs in docs/source/api.rst (Textures and surfaces), including the copy-only interop contract.
Examples kept here as representative end-to-end coverage:
- examples/texture_sample.py — allocate a 2D CUDAArray, bind a LINEAR/CLAMP TextureObject, verify POINT-exact + bilinear sampling from a kernel.
- examples/gl_interop_mipmap_lod.py — MipmappedArray pyramid built per-level via SurfaceObject, sampled through a TextureObject with trilinear (LINEAR mipmap) filtering and LOD bias (covers the MipmappedArray / get_level headline path).
- examples/gl_interop_fluid.py (cuda.core CUDAArray/TextureObject/SurfaceObject) and examples/gl_interop_fluid_numba_cuda_mlir.py (numba-cuda-mlir port, in its own pixi env) — Stable-Fluids.
The remaining gl_interop_* showcase is split out (see Remaining).
65 texture/surface unit tests: happy paths, negative-path validation for every raise site, boundary cases, structural-lifetime (level outlives dropped parent), idempotent close(), copy_to-returns-dst, and graph-capture of array copies.

Test plan

texture/surface suite — 65 passed.
pixi run full cuda_core suite — 3451 passed, 214 skipped, 3 xfailed.
Cython ABI test (tests/cython/) — passed.
All examples/ byte-compile; the numba-cuda-mlir port runs end-to-end.
CI green on NVIDIA runners.

Remaining (follow-up PRs)

The other 14 standalone gl_interop_*.py demos have been moved to the feature/cuda-core-texture-examples-467 branch for continued iteration and will land in a separate examples PR (Design: cuda.core Texture/Surface API surface #2188 decision 7).
Layered / cubemap / sparse CUDAArray variants (currently NotImplementedError-deferred).
Descriptor round-trip introspection (cuTexObjectGetResourceDesc, etc.).

🤖 Generated with Claude Code

Introduce a Pythonic wrapper around CUarray as a prerequisite for TextureObject / SurfaceObject support. This initial slice covers plain 1D/2D/3D allocations via cuArrayCreate / cuArray3DCreate, with an opt-in surface_load_store flag for binding as a SurfaceObject. Layered, cubemap, sparse, and texture-gather variants are intentionally deferred. _from_handle is provided for graphics-interop borrowing and queries shape, format, and channel count from the driver. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Full-array async copies between an Array and either a Buffer or any buffer-protocol host object (numpy, bytes, bytearray, array.array). Implemented as a single cuMemcpy3DAsync path so 1D/2D/3D arrays share one code path. Also exposes a size_bytes property used to size matching host or device buffers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
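For reviewers, a minimal sketch of how a `size_bytes` value can be used to size a matching host buffer, assuming tightly packed elements. `array_size_bytes`, `make_host_buffer`, and the format labels in `ELEMENT_SIZE` are hypothetical names for this illustration only; they are not cuda.core APIs.

```python
from math import prod

# Element widths in bytes, keyed by an illustrative format label (assumption:
# these labels exist only in this sketch, they are not ArrayFormat members).
ELEMENT_SIZE = {"uint8": 1, "uint16": 2, "float16": 2, "float32": 4}


def array_size_bytes(shape, fmt, num_channels):
    """Bytes needed to hold every element of a tightly packed 1D/2D/3D array."""
    if not 1 <= len(shape) <= 3:
        raise ValueError(f"shape must have 1 to 3 dimensions, got {len(shape)}")
    return prod(shape) * num_channels * ELEMENT_SIZE[fmt]


def make_host_buffer(shape, fmt, num_channels):
    """Allocate a buffer-protocol object large enough for a full-array copy."""
    return bytearray(array_size_bytes(shape, fmt, num_channels))


host = make_host_buffer((64, 32), "float32", 4)
print(len(host))  # 64 * 32 texels * 4 channels * 4 bytes = 32768
```

Because the copy always covers the full array, the host side only needs one number (the total byte count) regardless of dimensionality.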

…IA#467)

Wraps cuTexObjectCreate with a Pythonic descriptor pair:
- ResourceDescriptor.from_array(array) is the only resource kind supported in this initial slice; from_linear and from_pitch2d will follow once Buffer carries format/channel metadata.
- TextureDescriptor mirrors CUDA_TEXTURE_DESC: per-axis AddressMode, FilterMode, ReadMode, normalized coords, sRGB, border color, mipmap params, anisotropy.
- TextureObject holds a strong ref to the ResourceDescriptor (and transitively the backing Array) for the lifetime of the handle to prevent dangling-pointer kernel launches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Completes the second half of NVIDIA#467 alongside the existing TextureObject:
- SurfaceObject wraps cuSurfObjectCreate / cuSurfObjectDestroy. Unlike a texture it has no sampling state (no filter mode, no addressing, no normalization); kernels read and write through it with integer pixel coordinates.
- Track CUDA_ARRAY3D_SURFACE_LDST on Array as a new surface_load_store property, populated in both Array.from_descriptor and Array._from_handle. SurfaceObject.from_array validates this upfront rather than letting the driver surface CUDA_ERROR_INVALID_VALUE late.
- Add a convenience SurfaceObject.from_array shortcut next to from_descriptor so the common case skips building a ResourceDescriptor by hand.

Covered by tests/test_texture_surface.py (14 tests: array shape/format/flag plumbing, texture + surface creation, surface_load_store validation, unsupported-resource-kind guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
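The upfront validation described above can be sketched in plain Python as follows. `FakeArray` and `check_surface_backing` are hypothetical stand-ins for illustration, not the classes added in this commit; the flag value 0x02 is the driver's `CUDA_ARRAY3D_SURFACE_LDST` constant.

```python
from dataclasses import dataclass

CUDA_ARRAY3D_SURFACE_LDST = 0x02  # driver flag value from cuda.h


@dataclass(frozen=True)
class FakeArray:
    """Hypothetical stand-in for an array wrapper; it only tracks creation flags."""

    flags: int

    @property
    def surface_load_store(self) -> bool:
        # True only when the array was created with the surface load/store bit.
        return bool(self.flags & CUDA_ARRAY3D_SURFACE_LDST)


def check_surface_backing(array: FakeArray) -> None:
    """Fail early with a clear message instead of a late driver error."""
    if not array.surface_load_store:
        raise ValueError(
            "array was not created with surface load/store enabled; "
            "recreate it with surface_load_store=True"
        )


check_surface_backing(FakeArray(flags=CUDA_ARRAY3D_SURFACE_LDST))  # passes silently
```

Checking the flag in Python means the user sees which argument was wrong, rather than a generic CUDA_ERROR_INVALID_VALUE from surface creation.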

Widens the texture-resource surface to cover the two Buffer-backed variants from CUDA_RESOURCE_DESC:
- ResourceDescriptor.from_linear(buffer, *, format, num_channels, size_bytes=None) wraps a Buffer as a typed 1D fetch. Defaults size_bytes to buffer.size; validates against it.
- ResourceDescriptor.from_pitch2d(buffer, *, format, num_channels, width, height, pitch_bytes) wraps a Buffer as a row-pitched 2D image. Validates pitch_bytes >= width * element_size and pitch_bytes * height <= buffer.size; the driver enforces its own CU_DEVICE_ATTRIBUTE_TEXTURE_PITCH_ALIGNMENT on top.
- TextureObject.from_descriptor handles the three resType branches (ARRAY, LINEAR, PITCH2D); SurfaceObject continues to require an array-backed resource.
- ResourceDescriptor gains format/num_channels read-only properties (None for array-backed) and a kind-aware __repr__.

Tests: 9 new (linear/pitch2D creation, validation paths, surface rejection of non-array resources) on top of the existing 14. Full test-core suite green (3287 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
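A rough sketch of the two size checks stated above, written as standalone functions. `resolve_linear_size` and `check_pitch2d` are hypothetical helper names, and `element_size` is assumed here to mean bytes per texel; the driver's pitch-alignment requirement is not modelled.

```python
def resolve_linear_size(buffer_size, size_bytes=None):
    """Default size_bytes to the whole buffer and reject oversized requests."""
    if size_bytes is None:
        return buffer_size
    if size_bytes > buffer_size:
        raise ValueError(
            f"size_bytes ({size_bytes}) exceeds buffer size ({buffer_size})"
        )
    return size_bytes


def check_pitch2d(buffer_size, width, height, pitch_bytes, element_size):
    """Validate a row-pitched 2D view against the backing buffer."""
    row_bytes = width * element_size
    if pitch_bytes < row_bytes:
        # Rows would overlap: the pitch cannot hold one full row of texels.
        raise ValueError(
            f"pitch_bytes ({pitch_bytes}) is smaller than one row ({row_bytes})"
        )
    if pitch_bytes * height > buffer_size:
        # The last row would extend past the end of the buffer.
        raise ValueError(
            f"pitch_bytes * height ({pitch_bytes * height}) exceeds "
            f"buffer size ({buffer_size})"
        )


# 60 texels of 4 bytes per row, padded to a 256-byte pitch, 64 rows.
check_pitch2d(buffer_size=256 * 64, width=60, height=64,
              pitch_bytes=256, element_size=4)
```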

Wire the newly public Array, ArrayFormat, TextureObject, SurfaceObject, ResourceDescriptor, TextureDescriptor, AddressMode, FilterMode, and ReadMode symbols into the cuda.core Sphinx reference under a new "Textures and surfaces" section in api.rst. No source docstring changes; documentation is rendered via the existing autosummary templates and the enum_documenter extension. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…VIDIA#467) Introduces a MipmappedArray cdef class wrapping CUmipmappedArray with the same lifetime model as Array (close/__dealloc__/context-manager). Levels are obtained via get_level(L), which returns a non-owning Array that holds a strong ref back to the parent MipmappedArray via a new Array._parent_ref slot, ensuring level views cannot outlive the underlying storage. Surfaces continue to require a single-Array backing; the existing kind != "array" check in SurfaceObject.from_descriptor naturally rejects mipmapped resources (covered by a new test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

End-to-end example that builds a 2D Array with a known pattern, binds it as a bindless TextureObject with LINEAR/CLAMP/non-normalized sampling, and launches a kernel that samples both texel-center and half-integer coordinates. Verifies POINT-exact returns at texel centers and analytical bilinear blends at half-pixel positions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
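The expected values in such a check can be derived with a small CPU reference. The sketch below is not the example's code; it assumes the CUDA convention that, with non-normalized coordinates, texel i has its center at i + 0.5, and it uses exact floating-point weights (hardware filtering uses lower-precision fixed-point weights, so real comparisons need a tolerance). `tex2d_linear` is a hypothetical name.

```python
import math


def clamp_index(i, n):
    """CLAMP addressing: out-of-range indices snap to the nearest edge texel."""
    return min(max(i, 0), n - 1)


def tex2d_linear(texels, x, y):
    """Bilinear fetch with non-normalized coordinates and CLAMP addressing."""
    height, width = len(texels), len(texels[0])
    xb, yb = x - 0.5, y - 0.5              # shift so texel centers map to integers
    i, j = math.floor(xb), math.floor(yb)  # lower-left texel of the 2x2 footprint
    a, b = xb - i, yb - j                  # fractional blend weights in [0, 1)

    def t(row, col):
        return texels[clamp_index(row, height)][clamp_index(col, width)]

    return ((1 - a) * (1 - b) * t(j, i) + a * (1 - b) * t(j, i + 1)
            + (1 - a) * b * t(j + 1, i) + a * b * t(j + 1, i + 1))


texels = [[0.0, 10.0],
          [20.0, 30.0]]
print(tex2d_linear(texels, 0.5, 0.5))  # 0.0: exactly at a texel center
print(tex2d_linear(texels, 1.0, 1.0))  # 15.0: equal blend of all four texels
```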

Safety and correctness:
- Validate buffer sizes against array extent in Array.copy_from/copy_to; undersized host or device Buffer inputs were previously silent stomps via cuMemcpy3DAsync. Both branches now raise ValueError before issuing the copy.
- Zero the underlying handle BEFORE calling cuXxxDestroy in close() for Array, MipmappedArray, TextureObject, SurfaceObject. Prevents a double-destroy via __dealloc__ if the driver call raises.
- ResourceDescriptor.from_linear: require size_bytes >= element_size and size_bytes % element_size == 0; previously accepted zero and arbitrary non-multiples.
- Reject bool in num_channels across Array, MipmappedArray, and the two Buffer-backed ResourceDescriptor factories (True was silently treated as 1 channel).

API polish:
- Rename TextureObject.from_descriptor params resource_desc/texture_desc to resource/texture_descriptor so they match the .resource and .texture_descriptor properties; same rename in SurfaceObject. Both factories are now keyword-only, consistent with Array.from_descriptor and MipmappedArray.from_descriptor.
- Add four ResourceDescriptor properties (size_bytes, width, height, pitch_bytes) so values shown in __repr__ are reachable programmatically.
- Add MipmappedArray to docs/source/api.rst (was exported but unlinked).
- Align error message style across new files: type(x).__name__ instead of type(x); include got <type> in three previously-bare TypeErrors in TextureObject.from_descriptor.

Refactor:
- Extract _get_current_context_ptr and _get_current_device_id to cuda_utils.{pxd,pyx} and share across all four new files (was duplicated four times). Generic error message keeps the helper reusable for the 9+ remaining duplicate sites in cuda.core.
- Hoist the buffer-protocol path in _fill_linear_endpoint into a new _fill_host_endpoint helper. Original function becomes a thin Buffer-vs-host router.
- Type Array._format and MipmappedArray._format as cydriver.CUarray_format instead of int (was a comment-typed int; now C-level type-checked).
- Drop unused `field` import from _texture.pyx.

Tests (+28, total 62 in this file):
- Undersized host/device buffer rejection in Array.copy_from/copy_to.
- ResourceDescriptor.from_linear rejects size_bytes=0 and non-multiples.
- _normalize_address_modes unit tests now make explicit assertions instead of only smoke-testing TextureObject creation.
- Negative-path coverage for Array.from_descriptor (bad format, non-iterable shape, zero dim), MipmappedArray.from_descriptor, all TextureObject.from_descriptor validation branches (filter_mode, read_mode, mipmap_filter_mode, max_anisotropy, border_color length), address-mode normalization (scalar non-AddressMode, empty/4-entry tuples, mixed-type entries), ResourceDescriptor.from_pitch2d, and copy_from/copy_to non-Stream rejection.
- TextureObject and SurfaceObject keepalive lifetime tests verifying the _source_ref chain holds after gc.collect() (mirrors the existing MipmappedArray level keepalive test).
- copy_from must not mutate the source buffer (round-trip test now also asserts list(src) is unchanged).

Example:
- texture_sample.py uses `with` blocks for Array and TextureObject so the user-facing demo shows the idiomatic context-manager pattern rather than manual try/finally.

Full cuda_core suite: 3326 passed, 199 skipped, 2 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
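The bool rejection above matters because `bool` is a subclass of `int` in Python, so a plain `isinstance(x, int)` check lets `True` through as 1. A minimal sketch of such a validator follows; `validate_num_channels` is a hypothetical name, and the allowed set (1, 2, 4) is the channel counts CUDA array descriptors accept.

```python
def validate_num_channels(num_channels):
    """Return num_channels unchanged if valid, otherwise raise."""
    # bool must be tested first: isinstance(True, int) is True in Python.
    if isinstance(num_channels, bool) or not isinstance(num_channels, int):
        raise TypeError(
            f"num_channels must be an int, got {type(num_channels).__name__}"
        )
    if num_channels not in (1, 2, 4):
        raise ValueError(f"num_channels must be 1, 2, or 4, got {num_channels}")
    return num_channels


print(validate_num_channels(4))  # 4
try:
    validate_num_channels(True)
except TypeError as exc:
    print(exc)  # num_channels must be an int, got bool
```

The message uses `type(x).__name__`, matching the error style this commit aligns on.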

copy-pr-bot · 2026-05-15T19:01:36Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

leofang

We should have a design review as both @mdboom and @Andy-Jost suggested in the past meetings. At least, the API surface should be sketched in the issue before a PR is fired up. Having major feature PRs vibed without design discussion makes it very easy to merge regrettable changes.

Array is in particular a terrible name.

These graphical examples demonstrate the new Array, TextureObject, SurfaceObject, MipmappedArray, and ResourceDescriptor APIs in increasing order of complexity. All use the existing GraphicsResource + GL PBO pattern for display (matching gl_interop_plasma.py); CI is gated on has_display so headless runners skip them.

Minimum-API examples:
- gl_interop_image_show.py
  Hello-world for the stack: 64x64 Array, TextureObject, key F toggles POINT/LINEAR. Read this file first.
- gl_interop_texture_filter.py
  POINT vs LINEAR side-by-side on one Array with two TextureObjects; mouse pan/zoom, key M cycles AddressMode.

Simulation examples (Array + SurfaceObject + TextureObject ping-pong):
- gl_interop_reaction_diffusion.py
  Gray-Scott with FLOAT32 x 2 channels; LINEAR + WRAP for toroidal diffusion.
- gl_interop_lenia.py
  Continuous-state CA with bell-curve convolution; FLOAT32 x 1 channel.
- gl_interop_fire.py
  Canonical Doom fire (37-color indexed palette, UINT8 intensity 0..36, gather equivalent of the original scatter algorithm); exercises ArrayFormat.UINT8.
- gl_interop_ocean.py
  Animated Gerstner-wave ocean with normal mapping via finite-difference texture reads and Phong + Fresnel shading.

Visualization examples:
- gl_interop_mandelbrot.py
  Real-time deep-zoom using a 1D Array as a color LUT (TextureObject for palette lookup, not simulation).
- gl_interop_mipmap_lod.py
  Procedural mipmap pyramid built with a SurfaceObject per level; trilinear sampling via tex2DLod and TextureDescriptor mipmap fields.
- gl_interop_sdf_volume.py
  3D ray-marched gyroid via a 128^3 Array, surf3Dwrite for bake, tex3D for trilinear SDF sampling. Only example exercising the 3D side of the API.

Every public symbol added in this PR is exercised by at least one of these examples.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rparolin · 2026-06-09T23:07:00Z

Opened #2188 to host the design discussion (API surface + Array naming + interop contract) ahead of merging this, per review feedback. Let's settle the design there; I'll split the gl_interop_* examples into a separate PR. xref #467.

…ad_store (NVIDIA#467)

Applies design decisions resolved in NVIDIA#2188:
- #1: rename public `Array` class to `CUDAArray` (PEP 8 CapWords; aligns with CuPy's `cupy.cuda.texture.CUDAarray`). `ArrayFormat` left unchanged (open detail).
- NVIDIA#6: rename the bool property `surface_load_store` -> `is_surface_load_store` to follow the repo's `is_<x>` convention. Constructor keyword `surface_load_store=` kept as-is (open detail). Private field `_surface_load_store` unchanged.

GL interop examples retained (decision NVIDIA#7 reversed) and updated to the new names.

Verified: cuda.core builds in the cu12 env and the renamed public API imports (`CUDAArray`, `is_surface_load_store` present; old `Array`/`surface_load_store` gone).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts: # cuda_core/tests/example_tests/test_basic_examples.py

Post-merge cleanup so pre-commit.ci passes on the texture/surface stack:
- Switch ReadMode/AddressMode/FilterMode/ArrayFormat to `from enum import IntEnum` so stubgen-pyx preserves the IntEnum base in the generated stubs (qualified `enum.IntEnum` was dropped, making members infer as `int` and failing mypy assignment checks).
- Annotate TextureDescriptor.border_color as `tuple[float, ...] | None` (disallow_any_generics flagged the bare `tuple`).
- Prefix unused pyglet event-handler args with `_` (ARG001) and lowercase in-function locals (N806) across the GL interop examples; drop dead pre-try buffer inits in texture_sample.main (F841).
- Commit the auto-generated .pyi stubs for the new _array/_texture/_surface/_mipmapped_array modules.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…store Closes the open detail on design issue NVIDIA#2188 item NVIDIA#6: the read-back property is already `is_surface_load_store`, so rename the `from_descriptor` keyword on both `CUDAArray` and `MipmappedArray` to match, giving one symmetric name for set and read-back (following the existing `StridedMemoryView(is_readonly=...)` precedent). Updates call sites in tests and GL examples, the SurfaceObject error message + docstrings, and regenerates the .pyi stubs. The unrelated GraphicsResource `"surface_load_store"` register-flag string is left untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rparolin · 2026-06-10T22:41:30Z

`cuda.core` texture / surface examples

Stable Fluids (gl_interop_fluid.py) — a Jos Stam fluid solver whose velocity and dye fields are CUDAArrays bound both as a TextureObject (hardware-filtered reads) and a SurfaceObject (writes); semi-Lagrangian advection is a single tex2D linear fetch.

Mipmap / LOD (gl_interop_mipmap_lod.py) — a MipmappedArray pyramid built per level via SurfaceObject writes, then sampled through a TextureObject with trilinear (LINEAR-mipmap) filtering and an adjustable LOD bias.

The two shots below were part of a larger set of texture/surface showcase examples that may be integrated in a later PR:

Caustics (gl_interop_caustics.py) — animated water caustics; TextureObject sampling with MIRROR addressing, NORMALIZED_FLOAT reads, sRGB, and anisotropic filtering.

Clouds (gl_interop_clouds.py) — volumetric clouds raymarched against a 3D CUDAArray sampled as a volume TextureObject with WRAP addressing.

Add seven new cuda.core GL-interop examples exercising the new CUDAArray / MipmappedArray / SurfaceObject / TextureObject / GraphicsResource APIs, each centered on a distinct feature and verified on-GPU:
- gl_interop_fluid.py
  Stable-Fluids ink: LINEAR advection, float4 dye, frame-rate-independent stepping
- gl_interop_physarum.py
  slime-mold: Buffer agents, surface deposit / texture sense, direction-hued veins
- gl_interop_clouds.py
  3D CUDAArray + tex3D trilinear volumetric raymarch with HG forward-scattering
- gl_interop_particles.py
  VBO interop (from_gl_buffer) + baked curl-noise TextureObject, additive points
- gl_interop_bloom.py
  MipmappedArray get_level + per-level surface downsample + tex2DLod composite, live LOD
- gl_interop_jfa_voronoi.py
  POINT-filtered JFA, AddressMode.BORDER + border_color sentinel
- gl_interop_caustics.py
  UINT8 background sampled LINEAR + MIRROR + sRGB + max_anisotropy, chromatic dispersion

Each example documents which cuda.core APIs it uses via a code->API comment map and a live config string in the window caption. All seven are registered (display-gated) in test_basic_examples.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…de type

- Extract shared _validate_format_channels / _validate_array_shape helpers in _array.pyx; adopt them in CUDAArray, MipmappedArray, and the texture from_linear/from_pitch2d factories (removes 4x num_channels, 4x format, and 2x shape duplicate validators).
- ResourceDescriptor docstring now lists from_mipmapped_array (was 3 of 4 factories).
- TextureDescriptor.address_mode annotated AddressMode | tuple[AddressMode, ...] instead of object (CLAUDE.md: avoid Any).
- Regenerate .pyi stubs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The _context field (raw intptr_t) was captured in CUDAArray, MipmappedArray, SurfaceObject, and TextureObject constructors but never read. It could not be safely used to gate destruction (it is not a refcounted context-handle ref like Stream._h_context), so it was pure dead state plus an extra cuCtxGetCurrent per construction.
- Drop the _context slot from all four .pxd files (and the now-unused intptr_t cimports).
- Drop the self._context assignments and the _get_current_context_ptr cimport from all four .pyx files.
- Remove the now-orphaned _get_current_context_ptr helper from cuda_utils (.pyx + .pxd); _tensor_map.pyx keeps its own local copy and is unaffected. _get_current_device_id stays (still used for the .device property).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add examples/gl_interop_fluid_numba_cuda_mlir.py, a numba-cuda port of gl_interop_fluid.py. numba has no texture/surface support, so the same Stable Fluids solver runs on plain linear device arrays with a hand-written bilinear sampler; the physics and pipeline are unchanged from the CUDA C++ version, only the memory model and read path differ. It targets the MLIR numba-cuda backend (numba-cuda-mlir), which tracks the current cuda.bindings/cuda.core API. Classic numba-cuda lags the editable dev tip (cuda.core.graph relocation per NVIDIA#1858, property-style kernel attributes), so it cannot launch kernels against it. Declare numba-cuda-mlir as a linux-gated pypi dependency of the test feature so `pixi run python examples/...` works in the default env. Upstream ships only manylinux cp311-cp314 wheels (no win-64), hence the linux target gating. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Apply ruff-format and suppress a false-positive RUF046 on the int(math.floor(...)) index conversion: stock numba.cuda's math.floor returns a float (unlike CPython), so the int() cast is required for array indexing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…exture-surface-467

…es in default) numba-cuda-mlir's pypi dependency closure pulls *published* cuda-core / cuda-bindings wheels, which overwrite the editable dev path builds in site-packages and strip the texture/surface modules. That broke every gl_interop_* texture example (e.g. gl_interop_caustics.py) in the default environment. Remove numba-cuda-mlir from the test/default feature and give it a dedicated, isolated `numba-mlir` environment that intentionally uses published cuda-core (not the local-deps path builds). Its [cu13] extra makes the env self-contained (nvvm/cudart/nvrtc from nvidia pypi wheels), so no conda CUDA toolkit or CUDA_HOME wiring is needed. default: dev path cuda.core -> texture/surface examples work numba-mlir: published cuda.core -> gl_interop_fluid_numba_cuda_mlir.py works via `pixi run -e numba-mlir python examples/...` Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…le layer

The four texture/surface types (CUDAArray, MipmappedArray, TextureObject, SurfaceObject) managed their CUDA resources with raw driver handles, ad-hoc _owning flags, Python-reference keepalives (_parent_ref / _source_ref), and __dealloc__ methods that re-called cu*Destroy and swallowed errors. That diverged from the rest of cuda.core, where every resource-owning type routes lifetime through the C++ std::shared_ptr handle layer (see _cpp/DESIGN.md), re-introducing the GC-ordering, interpreter-shutdown, and double-free hazards that layer exists to prevent.

Bring all four into the handle architecture:
- _cpp/resource_handles.{hpp,cpp}: add ArrayHandle, MipmappedArrayHandle, TexObjectHandle, SurfObjectHandle. CUtexObject/CUsurfObject are both `unsigned long long` (as is CUdeviceptr), so the latter two are TaggedHandle-wrapped to keep each handle a distinct C++ type (mirrors NVVM/nvJitLink). Boxes embed structural dependencies (mipmap-level -> parent mipmap, texture -> type-erased backing, surface -> backing array); deleters call the matching cu*Destroy with the GIL released; create_* return an empty handle + thread-local error on failure. Add as_cu/as_intptr/as_py overloads and 9 driver function pointers.
- _resource_handles.{pxd,pyx}: declare the new aliases, create_* functions, and accessors; populate the new driver pointers from cydriver.__pyx_capi__.
- _array/_mipmapped_array/_texture/_surface (.pyx+.pxd): store a *Handle; close() is now self._handle.reset(); all four __dealloc__ removed; _owning and _parent_ref dropped (ownership and the parent dependency live in the box); _source_ref/_texture_desc retained for introspection only. Creation goes through create_*_handle + HANDLE_RETURN(get_last_error()).
- tests: rewrite the mip-level keepalive test to assert behavior (a level CUDAArray survives a round-trip copy after its parent MipmappedArray is dropped and GC'd, proving the structural keepalive) and add an idempotent-close guard for all four types.

Public API is unchanged.

Verified: texture/surface suite 63 passed; full cuda_core suite 3449 passed / 228 skipped / 3 xfailed; cython ABI test passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
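The ownership graph can be modelled in a few lines of pure Python. This is only an illustration of the behavior the tests assert (structural keepalive and idempotent close), not the C++ implementation; `Box` and `Handle` are hypothetical classes with explicit reference counting standing in for the std::shared_ptr control block.

```python
class Box:
    """One resource, whether it must be destroyed, and what it depends on."""

    def __init__(self, name, log, owns_resource=True, depends_on=None):
        self.name = name
        self.log = log
        self.owns_resource = owns_resource
        self.depends_on = depends_on  # a Handle kept alive by this box, or None
        self.refs = 0


class Handle:
    """Tiny shared_ptr-like handle with explicit reference counting."""

    def __init__(self, box):
        self._box = box
        box.refs += 1

    def copy(self):
        if self._box is None:
            raise ValueError("cannot copy an empty handle")
        return Handle(self._box)

    def reset(self):
        """Drop this reference; safe to call more than once."""
        box, self._box = self._box, None
        if box is None:
            return
        box.refs -= 1
        if box.refs == 0:
            if box.owns_resource:
                box.log.append(f"destroy {box.name}")  # stands in for cu*Destroy
            if box.depends_on is not None:
                box.depends_on.reset()  # release the dependency last


log = []
mipmap = Handle(Box("mipmap", log))
# A level view owns nothing itself but embeds a handle to its parent.
level = Handle(Box("level 0", log, owns_resource=False, depends_on=mipmap.copy()))

mipmap.reset()  # the user drops the parent first
print(log)      # []: the level's box still keeps the parent alive
level.reset()
level.reset()   # idempotent: the second call is a no-op
print(log)      # ['destroy mipmap']
```

The parent is destroyed exactly once, and only after the last dependent view is released, without any Python-level back-reference on the wrapper objects.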

…ore.textures namespace

Three items from the NVIDIA#2188 design review (Andy-Jost):

A+B. CUDAArray.copy_from / copy_to now match the Buffer copy API:
- Accept Stream | GraphBuilder (via Stream_accept), so array copies can be captured into a CUDA graph; previously only a concrete Stream was accepted.
- copy_to returns the destination object, for parity with Buffer.copy_to.
The internal _copy3d helper now takes an already-coerced Stream.

C. Group the texture/surface object model under a new cuda.core.textures namespace instead of the flat cuda.core namespace, before v1.1.0 ships (it is hard to move post-release). Adds cuda/core/textures.py re-exporting CUDAArray, ArrayFormat, MipmappedArray, ResourceDescriptor, TextureDescriptor, TextureObject, SurfaceObject, AddressMode, FilterMode, ReadMode; removes the flat exports from cuda.core.__init__ (grouped-only; no deprecation needed pre-release); migrates the texture/surface examples, texture_sample, and the docs api.rst section to the new namespace.

Tests: update the stream-rejection match; add copy_to-returns-dst and a GraphBuilder-capture test for CUDAArray copies.

Verified: texture/surface suite 65 passed; full cuda_core suite 3449 passed / 230 skipped / 3 xfailed; all examples byte-compile; cuda.core.textures exports the 10 symbols and cuda.core.CUDAArray is gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…decision 2) Decision 2 of NVIDIA#2188 resolved to "ship copy_from/copy_to only, and document the copy-only contract." The code shipped; this adds the documentation. State plainly in the CUDAArray class docstring (and the .pyi stub) and the api.rst "Textures and surfaces" section that a CUDAArray has an opaque, hardware-defined layout with no linear device pointer, so it cannot expose __cuda_array_interface__ / DLPack or share memory zero-copy; data is moved in and out only by copying via copy_from / copy_to against a linear Buffer or a host buffer-protocol object, and there is no allocation helper. Docs only; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…VIDIA#2188 decision 7)

Per NVIDIA#2188 decision 7, the texture/surface example showcase is orthogonal to the core API and lands in a follow-up. Move the 15 standalone gl_interop_*.py demos (bloom, caustics, clouds, fire, image_show, jfa_voronoi, lenia, mandelbrot, mipmap_lod, ocean, particles, physarum, reaction_diffusion, sdf_volume, texture_filter) onto the feature/cuda-core-texture-examples-467 branch for continued iteration and a future examples PR.

Kept here as the representative end-to-end coverage of the new API:
- examples/texture_sample.py (canonical sampling smoke test),
- examples/gl_interop_fluid.py (cuda.core CUDAArray/TextureObject/SurfaceObject),
- examples/gl_interop_fluid_numba_cuda_mlir.py (numba-cuda-mlir port) + its isolated pixi env.

Drop the moved examples from the example-test requirements map; gl_interop_fluid and the pre-existing gl_interop_plasma stay wired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MipmappedArray is a headline type of this PR but was only exercised by unit tests after the example split. Pull gl_interop_mipmap_lod.py back in so the mipmap pyramid + get_level + trilinear (LINEAR mipmap) sampling with LOD bias path has an end-to-end demonstration; re-wire it in the example-test requirements map (has_display-gated). The other gl_interop_*.py demos stay on feature/cuda-core-texture-examples-467 for the follow-up examples PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
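For readers unfamiliar with the mipmap path, the arithmetic behind a pyramid and a trilinear fetch can be sketched on the CPU. Both functions are hypothetical illustrations, not part of the example: the first lists level extents under the usual halve-and-floor rule, the second picks the two levels a LINEAR mipmap filter blends once the LOD bias is applied (the min/max level clamps of a real texture descriptor are simplified to the level range).

```python
import math


def mip_level_extents(width, height):
    """Extents of every level in a full 2D chain, halving until 1x1."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be at least 1")
    levels = [(width, height)]
    while levels[-1] != (1, 1):
        w, h = levels[-1]
        levels.append((max(1, w // 2), max(1, h // 2)))
    return levels


def select_trilinear_levels(lod, bias, num_levels):
    """Return (lower level, upper level, blend weight toward the upper level)."""
    lod = min(max(lod + bias, 0.0), float(num_levels - 1))
    lower = int(math.floor(lod))
    upper = min(lower + 1, num_levels - 1)
    return lower, upper, lod - lower


levels = mip_level_extents(256, 64)
print(len(levels))                                       # 9
print(levels[-3:])                                       # [(4, 1), (2, 1), (1, 1)]
print(select_trilinear_levels(1.25, 0.5, len(levels)))   # (1, 2, 0.75)
```

A positive bias shifts sampling toward smaller, blurrier levels; a negative bias shifts it toward the full-resolution level.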

github-actions · 2026-06-17T18:23:18Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-2095/
https://nvidia.github.io/cuda-python/pr-preview/pr-2095/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-2095/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-2095/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

`@pytest.mark.parametrize("in_arr,", _cpu_array_samples())` has a stray trailing comma in the names string. Older pytest tolerated it, but pytest 8.4+ (used in CI) treats `"in_arr,"` as a request to unpack each argvalue across multiple names — so a 3-element sample array becomes "3 values for 1 name" and the whole test session aborts at collection ("1 error during collection"), failing every GPU test job. Drop the trailing comma so each array sample is one parameter. Pre-existing (introduced in NVIDIA#1894, also on main); fixing here to unblock CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Drop unused `intptr_t` cimports in _mipmapped_array.pyx / _surface.pyx (flagged by cython-lint; the handle ints now go through as_intptr).
- Regenerate the .pyi stubs (stubgen-pyx) to match the texture/surface .pyx after the resource-handle + copy-API changes.
- isort/ruff-format fixups (cuda.core.textures import ordering, etc.).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rparolin · 2026-06-17T19:23:14Z

@leofang @mdboom @Andy-Jost @kkraus14 — this is ready for review. 🎉

Design signed off in Design: cuda.core Texture/Surface API surface #2188, and all the review feedback there is addressed: Array → CUDAArray and is_surface_load_store (decisions 1 & 6), copy-API parity with Buffer (copy_from/copy_to accept Stream | GraphBuilder; copy_to returns dst), and the texture/surface types are grouped under the new cuda.core.textures namespace (per @Andy-Jost's suggestion).
Resource lifetime now routes through cuda.core's C++ std::shared_ptr handle layer (structural dependencies in the box, GIL-guarded deleters, no __dealloc__/_parent_ref) — consistent with the rest of cuda.core.
Scope is the core API plus representative examples (texture_sample, gl_interop_mipmap_lod, and both Stable-Fluids ports); the remaining gl_interop_* showcase is split into a follow-up PR (decision 7).
CI is green — full GPU test matrix (linux-64 / aarch64 / win, py3.10–3.14t, CUDA 12.9 / 13.0 / 13.3), docs build, CodeQL/analyze, and pre-commit.ci all passing.

Example screenshots are in the comment above. Thanks!

rparolin and others added 10 commits May 13, 2026 13:51

Merge branch 'main' into feature/cuda-core-texture-surface-467

e653b68

rparolin added this to the cuda.core v1.1.0 milestone May 15, 2026

rparolin added the feature (New feature or request) and cuda.core (Everything related to the cuda.core module) labels May 15, 2026

leofang reviewed May 15, 2026

rparolin changed the title ~~cuda.core: add Array, TextureObject, SurfaceObject, MipmappedArray (#467)~~ [DO NOT REVIEW] cuda.core: add Array, TextureObject, SurfaceObject, MipmappedArray (#467) May 15, 2026

rparolin mentioned this pull request Jun 9, 2026

Design: cuda.core Texture/Surface API surface #2188

Open

rparolin linked an issue Jun 10, 2026 that may be closed by this pull request

Design: cuda.core Texture/Surface API surface #2188

Open

rparolin and others added 4 commits June 9, 2026 18:01

Merge branch 'main' into feature/cuda-core-texture-surface-467

fbe880a

# Conflicts:
# cuda_core/tests/example_tests/test_basic_examples.py

rparolin and others added 5 commits June 10, 2026 15:44

caustics improvements

d8c2db6

rparolin and others added 3 commits June 12, 2026 11:30

Merge remote-tracking branch 'upstream/main' into feature/cuda-core-texture-surface-467

8016ece

rparolin self-assigned this Jun 16, 2026

rparolin and others added 3 commits June 16, 2026 18:07

rparolin changed the title ~~[DO NOT REVIEW] cuda.core: add Array, TextureObject, SurfaceObject, MipmappedArray (#467)~~ cuda.core: add CUDAArray, TextureObject, SurfaceObject, MipmappedArray (texture/surface API, #467) Jun 17, 2026

rparolin requested a review from Andy-Jost June 17, 2026 17:37

rparolin marked this pull request as ready for review June 17, 2026 17:39

rparolin and others added 2 commits June 17, 2026 11:25

rparolin requested a review from leofang June 17, 2026 19:23

rparolin mentioned this pull request Jun 17, 2026

cuda.core: document the canonical resource-lifetime pattern in AGENTS.md #2234

Open

Merge branch 'main' into feature/cuda-core-texture-surface-467

cafcc3a

rparolin wants to merge 31 commits into NVIDIA:main from rparolin:feature/cuda-core-texture-surface-467

github-actions Bot commented Jun 17, 2026

Preview will be ready when the GitHub Pages deployment is complete.

2 participants
