Add nightly CI for optional-dependency testing (PyTorch, numba-cuda) #1987
Draft
leofang wants to merge 12 commits into NVIDIA:main
Conversation
…ba-cuda)

Add ci-nightly.yml that downloads wheels from the latest successful CI run on main and tests them against PyTorch and numba-cuda, without rebuilding.

Key changes:
- ci-nightly.yml: new orchestrator (schedule 2 AM UTC + workflow_dispatch)
- test-wheel-linux/windows.yml: add run-id input for cross-run artifact downloads, and test-mode input (standard/nightly-pytorch/nightly-numba-cuda) with conditional test steps
- ci/test-matrix.yml: add nightly entries with MODE field (4 pytorch + 6 numba-cuda across linux-64, linux-aarch64, win-64)
- ci/tools/run-tests: add nightly-install mode that installs all wheels without running standard tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
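Based only on that summary, the orchestrator's shape might look roughly like the sketch below. The job and step names, the `inputs.run-id` override, and the exact `gh` query are illustrative assumptions, not the PR's actual file:

```yaml
# Hypothetical sketch of ci-nightly.yml (names and structure assumed).
name: ci-nightly
on:
  schedule:
    - cron: "0 2 * * *"   # 2 AM UTC daily
  workflow_dispatch:
    inputs:
      run-id:
        description: "CI run to pull wheels from (default: latest successful)"
        required: false

jobs:
  find-run:
    runs-on: ubuntu-latest
    outputs:
      run-id: ${{ steps.pick.outputs.run-id }}
    steps:
      - id: pick
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # Use the given run-id, else the latest successful CI run on main.
          RUN_ID="${{ inputs.run-id }}"
          if [ -z "$RUN_ID" ]; then
            RUN_ID=$(gh run list -R "${{ github.repository }}" \
              --workflow ci.yml --branch main --status success \
              --limit 1 --json databaseId --jq '.[0].databaseId')
          fi
          echo "run-id=$RUN_ID" >> "$GITHUB_OUTPUT"

  test-pytorch:
    needs: find-run
    uses: ./.github/workflows/test-wheel-linux.yml
    with:
      run-id: ${{ needs.find-run.outputs.run-id }}
      test-mode: nightly-pytorch
```

The key design point is that the nightly run builds nothing itself; it only resolves a run-id and hands it to the existing reusable test workflows.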
/ok to test 4aadce2
- Add concurrency group matching ci.yml's pattern
- Replace jq one-liner with explicit cancelled/failure checks per ci.yml's battle-tested pattern (see long comment there for rationale)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove before merging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ok to test ac5238c
Full history is not needed — we only read ci/versions.yml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Artifact names embed the commit SHA from the build that created them. When the nightly workflow downloads artifacts from a different CI run, it must use that run's SHA — not github.sha (the nightly run's own SHA) — to construct the correct artifact names.

- ci-nightly.yml: resolve head_sha from the source CI run via `gh run view --json headSha`, pass it to test workflows
- test-wheel-linux/windows.yml: add `sha` input (defaults to github.sha for backward compatibility), use it in env-vars

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
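A step resolving the source run's head SHA could be sketched like this; the `gh run view --json headSha` command is from the commit message, while the step id, output name, and input plumbing are assumptions:

```yaml
# Hypothetical resolution step in ci-nightly.yml.
- id: resolve
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Artifact names embed the SHA of the build that produced them,
    # so we need the source run's headSha rather than github.sha.
    HEAD_SHA=$(gh run view "${{ inputs.run-id }}" \
      --json headSha --jq .headSha)
    echo "head-sha=$HEAD_SHA" >> "$GITHUB_OUTPUT"
```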
/ok to test 9286598
Force-pushed a279179 to 8720de0

/ok to test 8720de0
Force-pushed 8720de0 to 6976f8a

/ok to test 6976f8a
- Install ALL wheels (pathfinder + bindings + core) and the optional dep (torch/numba-cuda) in a single pip call so pip resolves everything together and avoids costly reinstall cycles from version conflicts
- Fix "Display structure" step: show only artifact files (cuda_python*.whl, cuda_pathfinder/) instead of `ls -lahR .`, which lists the entire repo
- Fix numba-cuda test command: python -m numba.runtests numba.cuda.tests
- Install Visual C++ Redistributable on Windows before PyTorch (pytorch/pytorch#166628)
- run-tests now does `pip list` at the end of nightly installs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
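The single-resolve install described in the first bullet could look roughly as follows; the wheel glob paths and the PyTorch index URL are assumptions for illustration, not taken verbatim from the PR:

```yaml
# Hypothetical "nightly-install" step for the pytorch mode.
- name: Install all wheels plus optional dependency in one resolve
  run: |
    # One pip invocation lets the resolver see every constraint at once,
    # avoiding install/uninstall churn from version conflicts between
    # pathfinder, bindings, core, and torch.
    pip install cuda_pathfinder/*.whl cuda_python*.whl \
      torch --extra-index-url https://download.pytorch.org/whl/cu130
    pip list
```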
Force-pushed 6976f8a to 0b7cc50

/ok to test fc1dc5d
Force-pushed fc1dc5d to 3653e7a

/ok to test 3653e7a
Force-pushed 3653e7a to bf62c2b

/ok to test bf62c2b
Force-pushed 0c98a26 to 5d653ce

/ok to test 5d653ce
Force-pushed 24ea333 to 8586cf7

/ok to test 8586cf7
Force-pushed 8586cf7 to 6953cdd

/ok to test 6953cdd
Force-pushed 6953cdd to 294eee4

/ok to test 294eee4
Force-pushed 294eee4 to 4f409a7

/ok to test 4f409a7
CUDA_VER in the test environment should match TORCH_CUDA in major.minor. BUILD_CUDA_VER (from build-ctk-ver input) is used for artifact names, so CUDA_VER can differ.

- cu126 → CUDA_VER: 12.6.3 (was 12.9.1)
- cu130 → CUDA_VER: 13.0.2 (was 13.2.1)

For CUDA 12 entries, USE_BACKPORT_BINDINGS kicks in automatically since BUILD_CUDA_MAJOR (13) != TEST_CUDA_MAJOR (12), pulling bindings from the backport branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
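The adjusted matrix entries might read as below. The TORCH_CUDA, MODE, and CUDA_VER field names come from the commit messages; the surrounding keys and entry structure are illustrative guesses at ci/test-matrix.yml:

```yaml
# Hypothetical nightly matrix entries after the fix (structure assumed).
nightly:
  - ARCH: linux-64
    MODE: pytorch
    TORCH_CUDA: cu126
    CUDA_VER: "12.6.3"   # was 12.9.1; now matches cu126 in major.minor
  - ARCH: linux-64
    MODE: pytorch
    TORCH_CUDA: cu130
    CUDA_VER: "13.0.2"   # was 13.2.1; now matches cu130 in major.minor
```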
Force-pushed 4f409a7 to edeaa76

/ok to test edeaa76
The failing numba-cuda tests will be fixed by a new release (where the fix is included, NVIDIA/numba-cuda#873). The failing PyTorch tests will be fixed by #1988.
leofang commented Apr 30, 2026
Force-pushed ea811c1 to 6f04205
The indentation bug in test_linker.py was fixed in the latest numba-cuda release, so the workaround patch is no longer needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…python into ci-nightly-optdeps
/ok to test 833490e
leofang added a commit that referenced this pull request on May 1, 2026:
…1999)

* Fix torch-incompatible assertions in TestViewCudaArrayInterfaceGPU

  The _check_view method in TestViewCudaArrayInterfaceGPU was missed during the tensor bridge refactor (#1894) and still used raw numpy attributes (in_arr.size, in_arr.strides, in_arr.flags, etc.) that don't work with torch tensors. Use the _arr_* helpers that #1894 added for torch/numpy compatibility. Caught by the nightly optional-dependency CI (#1987).

* Fix strides assertion for torch CAI: allow explicit C-contiguous strides

  torch's __cuda_array_interface__ always reports strides, even for C-contiguous tensors. Use the same assertion pattern as the other _check_view methods: allow strides to equal the C-contiguous values instead of requiring None. Verified locally: 7/7 torch CAI tests pass.

* Unify strides assertion pattern across all _check_view methods

  Use the same if/else pattern with `in (None, strides_in_counts)` in all three _check_view methods for consistency. Previously TestViewCPU and TestViewCudaArrayInterfaceGPU used a one-liner that was harder to read and behaved slightly differently. Verified locally: 66/66 tests pass across TestViewCPU, TestViewGPU, and TestViewCudaArrayInterfaceGPU (including all torch variants).

* Address review: flip strides assertion, add _arr_dtype, merge main

  Per @rwgk's review:
  - Flip strides check to branch on view.strides (all 3 _check_view)
  - Add _arr_dtype helper using __cuda_array_interface__["typestr"] for torch tensors, restore dtype assertion in CAI _check_view
  - Merge main to pick up #1998 (numba flags fix)

  Verified locally: 76/76 tests pass across all three test classes.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ok to test 2a749d5
Add a nightly CI pipeline that tests cuda-python wheels against optional dependencies (PyTorch and numba-cuda) without rebuilding wheels. Wheels are downloaded from the latest successful CI run on main.

Design

- ci-nightly.yml: New orchestrator workflow (2 AM UTC daily + workflow_dispatch for manual testing). Finds the latest successful CI run on main and passes its run-id to the existing test workflows.
- test-wheel-linux/windows.yml: Extended with two new inputs:
  - run-id: enables actions/download-artifact to pull wheels from a different workflow run (defaults to github.run_id for backward compatibility)
  - test-mode: standard (default, current behavior), nightly-pytorch, or nightly-numba-cuda
- test-matrix.yml: New nightly: entries with a MODE field. The orchestrator uses the existing matrix_filter input to select by mode.
- run-tests: New nightly-install mode that installs all wheels without running standard tests.

Test matrix (14 jobs)

PyTorch (8 jobs: 4 linux-64 + 4 win-64)
Tests: cuda_core/tests/test_utils.py (SMV/DLPack interop) + cuda_core/tests/example_tests/ (pytorch_example)

numba-cuda (6 jobs: 2 linux-64 + 2 linux-aarch64 + 2 win-64)
Tests: python -m numba_cuda.numba.cuda.tests (numba-cuda's bundled test suite)

How to test this PR

Trigger the workflow via workflow_dispatch with a run-id from a recent successful CI run.

Standard CI (ci.yml) is unaffected — test-mode defaults to standard and run-id defaults to github.run_id.

-- Leo's bot