Summary
Several cuda.core APIs accept stream=None and silently fall back to default_stream() (or the NULL stream). The stream choice then becomes implicit and depends on the environment (CUDA_PYTHON_CUDA_PER_THREAD_DEFAULT_STREAM decides whether the default is the legacy or the per-thread default stream), which is error-prone. Users should always pass a stream explicitly, including device.default_stream when they want the default.
Design rule
stream should be a keyword-only argument with no default value (*, stream: Stream), so callers must always pass a stream explicitly.
APIs that already follow this convention:
Buffer.copy_to(dst=None, *, stream: Stream | GraphBuilder)
Buffer.copy_from(src, *, stream: Stream | GraphBuilder)
Buffer.fill(value, *, stream: Stream | GraphBuilder)
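To illustrate what the design rule enforces at the call site, here is a minimal sketch in plain Python. The Stream and Buffer classes below are hypothetical stand-ins written for this example, not the real cuda.core classes; only the signature shape (*, stream: Stream) is taken from the list above.

```python
class Stream:
    """Hypothetical stand-in for a stream object; not cuda.core's Stream."""

    def __init__(self, name):
        self.name = name


class Buffer:
    """Hypothetical stand-in that only records which stream was passed."""

    def __init__(self):
        self.last_stream = None

    def fill(self, value, *, stream: Stream):
        # The bare `*` makes `stream` keyword-only, and the absence of a
        # default value makes it required.
        self.last_stream = stream
        return self


def call_outcome(fn):
    """Return 'ok' if fn() succeeds, otherwise the raised exception's name."""
    try:
        fn()
        return "ok"
    except TypeError as exc:
        return type(exc).__name__


buf = Buffer()
work = Stream("work")

outcomes = {
    "keyword": call_outcome(lambda: buf.fill(0, stream=work)),
    "omitted": call_outcome(lambda: buf.fill(0)),
    "positional": call_outcome(lambda: buf.fill(0, work)),
}
print(outcomes)
```

With this signature, Python itself rejects both the omitted and the positional form before any stream logic runs, so no runtime fallback is needed for those two cases.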
APIs that need to change
stream=None → default_stream() (implicit fallback, remove it)
MemoryPool.allocate() / deallocate()
GraphMemoryResource.allocate() / deallocate()
GraphicsResource.map()
LegacyPinnedMemoryResource.allocate()
_SynchronousMemoryResource.allocate()
Device.allocate() (delegates to the above)
stream=None → NULL / legacy default (same issue)
Kernel.max_potential_cluster_size(config, stream=None)
Kernel.max_active_clusters(config, stream=None)
stream should become keyword-only
GraphBuilder.launch(self, stream: Stream) — currently positional, should be (self, *, stream: Stream)
APIs that are fine (no change needed)
launch(stream, config, kernel, *args) — stream is the 1st positional arg by design
Buffer.close(stream=None) — None means "reuse the original stream", by design
GraphicsResource.unmap(stream=None) / close(stream=None) — same, reuses the mapping stream
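The distinction drawn for the close()/unmap() cases can be sketched as follows: here None does not mean "some default stream" but "the stream this object already remembers". The classes and attribute names below are hypothetical and exist only for this illustration; they do not mirror the actual cuda.core implementation.

```python
class Stream:
    """Hypothetical stand-in for a stream object; not cuda.core's Stream."""

    def __init__(self, name):
        self.name = name


class Buffer:
    """Hypothetical buffer that remembers the stream it was allocated on."""

    def __init__(self, *, stream: Stream):
        self._alloc_stream = stream  # explicit at allocation time
        self.closed_on = None

    def close(self, stream=None):
        # None here refers to a stream the caller already chose explicitly,
        # so no environment-dependent default is involved.
        self.closed_on = self._alloc_stream if stream is None else stream


alloc = Stream("alloc")
other = Stream("other")

a = Buffer(stream=alloc)
a.close()              # reuses the allocation stream

b = Buffer(stream=alloc)
b.close(stream=other)  # caller overrides explicitly

print(a.closed_on.name, b.closed_on.name)
```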
Proposed implementation
- Have Stream_accept() raise TypeError when stream is None, so the check is centralized and cannot be forgotten.
- Remove the = None default from every affected API signature.
- Make stream keyword-only where it isn't already (except launch() where it's intentionally positional).
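The steps above could look roughly like the following sketch. It is written in plain Python under stated assumptions: stream_accept is a simplified, hypothetical stand-in for the internal Stream_accept() helper (whose real signature and accepted types are not shown here), and MemoryPool is a toy class, not the cuda.core one.

```python
class Stream:
    """Hypothetical stand-in for a stream object; not cuda.core's Stream."""

    def __init__(self, name):
        self.name = name


def stream_accept(arg):
    """Hypothetical, simplified stand-in for the centralized stream check."""
    if arg is None:
        # Centralized rejection: every API routed through this helper
        # refuses None without needing its own check.
        raise TypeError(
            "stream must be passed explicitly; "
            "pass device.default_stream to request the default stream"
        )
    if not isinstance(arg, Stream):
        raise TypeError(f"expected a Stream, got {type(arg).__name__}")
    return arg


class MemoryPool:
    """Toy pool that only records the (size, stream) pairs it was asked for."""

    def __init__(self):
        self.requests = []

    def allocate(self, size, *, stream):  # no `= None` default
        s = stream_accept(stream)
        self.requests.append((size, s))
        return len(self.requests) - 1


pool = MemoryPool()
work = Stream("work")
handle = pool.allocate(16, stream=work)

try:
    pool.allocate(16, stream=None)
    none_result = "accepted"
except TypeError as exc:
    none_result = str(exc)

print(handle, none_result)
```

Removing the default catches callers who omit the argument, while the helper catches callers who still pass stream=None explicitly; the two changes cover different mistakes.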
Prior art: CCCL's cccl-runtime
CCCL's modern CUDA runtime (libcudacxx, cuda:: namespace) follows the same explicit-stream principle:
The CUDA default (NULL) stream is not exposed as a first-class runtime object because it is tied to implicit per-device state and encourages hidden dependencies.
— docs/libcudacxx/runtime/cudart_interactions.rst
Concretely:
cuda::launch(stream, config, kernel, args...) requires an explicit stream_ref as the first parameter, with no default.
The cuda::mr::resource concept requires allocate(cuda::stream_ref, ...) / deallocate(cuda::stream_ref, ...), with no fallback to a default stream.
cuda::stream_ref deletes constructors from int and nullptr, and deprecates the parameterless (default stream) constructor.