Move CUDA interop behind native opt-in #1067
Open
AnastaZIuk wants to merge 27 commits into vk_cuda_interop from cuInteropBS
f4ce3dc Move CUDA interop behind extension target (AnastaZIuk)
78845ae Address CUDA interop review cleanup (AnastaZIuk)
ab9a7e5 Simplify CUDA interop smoke CMake (AnastaZIuk)
bf8eeb3 Clean CUDA interop smoke usage requirements (AnastaZIuk)
f701ac6 Export CUDA interop package target (AnastaZIuk)
a520d57 Use CUDAToolkit package targets (AnastaZIuk)
4bddc57 Require CUDA version via CMake (AnastaZIuk)
6f68e66 Split CUDA interop native surface (AnastaZIuk)
49bcb2c Add native CUDA accessor overloads (AnastaZIuk)
d85657e Document CUDA interop target split (AnastaZIuk)
6e8c4f9 Trim CUDA interop README wording (AnastaZIuk)
881e9b8 Move CUDA interop into Nabla (AnastaZIuk)
5dd1134 Document CUDA interop accessor model (AnastaZIuk)
e514df7 Inline CUDA interop stubs (AnastaZIuk)
e53c838 Refine CUDA interop boundary (AnastaZIuk)
1417905 Add CUDA interop runtime header discovery (AnastaZIuk)
045432e Tighten CUDA interop native helpers (AnastaZIuk)
8a119dd Hide CUDA interop native state construction (AnastaZIuk)
e018545 Clean up CUDA runtime header discovery (AnastaZIuk)
c6ef6ee Move CUDA interop API back into video (AnastaZIuk)
d559a2c Move smart pointer helpers into core (AnastaZIuk)
38705b9 Use CUDA interop accessors (AnastaZIuk)
23e6ef5 Use explicit CUDA compile log (AnastaZIuk)
a640183 Trim CUDA interop API surface (AnastaZIuk)
5bf0e2d Keep CUDA SDK layouts private (AnastaZIuk)
d745421 Simplify CUDA interop helper (AnastaZIuk)
ffba3d4 Update CUDA interop examples pointer (AnastaZIuk)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| function(nbl_target_link_cuda_interop TARGET_NAME SCOPE) | ||
| if(NOT SCOPE MATCHES "^(PRIVATE|PUBLIC|INTERFACE)$") | ||
| set(SCOPE PRIVATE) | ||
| endif() | ||
| cmake_parse_arguments(_NBL_CUDA_INTEROP "" "RUNTIME_JSON" "INCLUDE_DIRS" ${ARGN}) | ||
| target_link_libraries("${TARGET_NAME}" ${SCOPE} Nabla::ext::CUDAInterop) | ||
| set(_include_dir_entries "") | ||
| foreach(_include_dir IN LISTS _NBL_CUDA_INTEROP_INCLUDE_DIRS CUDAToolkit_INCLUDE_DIRS) | ||
| if(_include_dir) | ||
| file(TO_CMAKE_PATH "${_include_dir}" _include_dir) | ||
| list(APPEND _include_dir_entries " \"${_include_dir}\"") | ||
| endif() | ||
| endforeach() | ||
| list(JOIN _include_dir_entries "," _include_dirs_json) | ||
| set(_runtime_json [=[ | ||
| { | ||
| "cudaRuntimeIncludeDirs": [ | ||
| @_include_dirs_json@ | ||
| ] | ||
| } | ||
| ]=]) | ||
| string(CONFIGURE "${_runtime_json}" _runtime_json @ONLY) | ||
| set(_runtime_json_path "$<TARGET_FILE_DIR:${TARGET_NAME}>/nbl_cuda_interop_runtime.json") | ||
| if(_NBL_CUDA_INTEROP_RUNTIME_JSON) | ||
| set(_runtime_json_path "${_NBL_CUDA_INTEROP_RUNTIME_JSON}") | ||
| endif() | ||
| file(GENERATE OUTPUT "${_runtime_json_path}" CONTENT "${_runtime_json}" TARGET "${TARGET_NAME}") | ||
| endfunction() |
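To make the generated `nbl_cuda_interop_runtime.json` concrete, here is a minimal standalone C++ sketch that mirrors the string assembly done by `nbl_target_link_cuda_interop` above. It is not part of the PR: `makeRuntimeJson` is a hypothetical name, the backslash replacement is only a rough stand-in for `file(TO_CMAKE_PATH ...)`, and the exact whitespace of the real output may differ. Like the CMake code, it does not escape quotes inside paths.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not in the PR): builds the same JSON shape that the
// CMake function writes next to the target binary.
std::string makeRuntimeJson(const std::vector<std::string>& includeDirs)
{
    std::string entries;
    for (std::string dir : includeDirs)
    {
        if (dir.empty())
            continue; // the CMake loop also skips empty entries

        // Rough stand-in for file(TO_CMAKE_PATH): use forward slashes only.
        std::replace(dir.begin(), dir.end(), '\\', '/');

        if (!entries.empty())
            entries += ","; // list(JOIN ... ",") inserts a comma and no newline
        entries += "\"" + dir + "\"";
    }
    return "{\n  \"cudaRuntimeIncludeDirs\": [\n    " + entries + "\n  ]\n}\n";
}
```

Calling `makeRuntimeJson` with a user-supplied directory followed by the toolkit directories reproduces the order used by the `foreach` loop, where `INCLUDE_DIRS` entries come before `CUDAToolkit_INCLUDE_DIRS`.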
Submodule examples_tests updated (4 files)
| +1 −1 | 67_RayQueryGeometry/main.cpp | |
| +3 −1 | 76_CudaInterop/CMakeLists.txt | |
| +49 −101 | 76_CudaInterop/main.cpp | |
| +4 −2 | CMakeLists.txt |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,214 @@ | ||
| // Copyright (C) 2018-2020 - DevSH Graphics Programming Sp. z O.O. | ||
| // This file is part of the "Nabla Engine". | ||
| // For conditions of distribution and use, see copyright notice in nabla.h | ||
| #ifndef _NBL_EXT_CUDA_INTEROP_NATIVE_H_INCLUDED_ | ||
| #define _NBL_EXT_CUDA_INTEROP_NATIVE_H_INCLUDED_ | ||
| | ||
| #include "nbl/video/CUDAInterop.h" | ||
| | ||
| #include "nbl/asset/ICPUBuffer.h" | ||
| #include "nbl/system/DynamicFunctionCaller.h" | ||
| | ||
| #include <string> | ||
| | ||
| #include "cuda.h" | ||
| #include "nvrtc.h" | ||
| #if CUDA_VERSION < 13000 | ||
| #error "Need CUDA 13.0 SDK or higher." | ||
| #endif | ||
| | ||
| namespace nbl::video::cuda_native | ||
| { | ||
| | ||
| inline constexpr int MinimumCUDADriverVersion = 13000; | ||
| inline constexpr int MinimumNVRTCMajorVersion = MinimumCUDADriverVersion/1000; | ||
| | ||
| using LibLoader = system::DefaultFuncPtrLoader; | ||
| | ||
| NBL_SYSTEM_DECLARE_DYNAMIC_FUNCTION_CALLER_CLASS(CUDA,LibLoader | ||
| ,cuCtxCreate_v4 | ||
| ,cuDevicePrimaryCtxRetain | ||
| ,cuDevicePrimaryCtxRelease | ||
| ,cuDevicePrimaryCtxSetFlags | ||
| ,cuDevicePrimaryCtxGetState | ||
| ,cuCtxDestroy_v2 | ||
| ,cuCtxEnablePeerAccess | ||
| ,cuCtxGetApiVersion | ||
| ,cuCtxGetCurrent | ||
| ,cuCtxGetDevice | ||
| ,cuCtxGetSharedMemConfig | ||
| ,cuCtxPopCurrent_v2 | ||
| ,cuCtxPushCurrent_v2 | ||
| ,cuCtxSetCacheConfig | ||
| ,cuCtxSetCurrent | ||
| ,cuCtxSetSharedMemConfig | ||
| ,cuCtxSynchronize | ||
| ,cuDeviceComputeCapability | ||
| ,cuDeviceCanAccessPeer | ||
| ,cuDeviceGetCount | ||
| ,cuDeviceGet | ||
| ,cuDeviceGetAttribute | ||
| ,cuDeviceGetLuid | ||
| ,cuDeviceGetUuid_v2 | ||
| ,cuDeviceTotalMem_v2 | ||
| ,cuDeviceGetName | ||
| ,cuDriverGetVersion | ||
| ,cuEventCreate | ||
| ,cuEventDestroy_v2 | ||
| ,cuEventElapsedTime | ||
| ,cuEventQuery | ||
| ,cuEventRecord | ||
| ,cuEventSynchronize | ||
| ,cuFuncGetAttribute | ||
| ,cuFuncSetCacheConfig | ||
| ,cuGetErrorName | ||
| ,cuGetErrorString | ||
| ,cuGraphicsMapResources | ||
| ,cuGraphicsResourceGetMappedPointer_v2 | ||
| ,cuGraphicsResourceGetMappedMipmappedArray | ||
| ,cuGraphicsSubResourceGetMappedArray | ||
| ,cuGraphicsUnmapResources | ||
| ,cuGraphicsUnregisterResource | ||
| ,cuInit | ||
| ,cuLaunchKernel | ||
| ,cuMemAlloc_v2 | ||
| ,cuMemcpyDtoD_v2 | ||
| ,cuMemcpyDtoH_v2 | ||
| ,cuMemcpyHtoD_v2 | ||
| ,cuMemcpyDtoDAsync_v2 | ||
| ,cuMemcpyDtoHAsync_v2 | ||
| ,cuMemcpyHtoDAsync_v2 | ||
| ,cuMemGetAddressRange_v2 | ||
| ,cuMemFree_v2 | ||
| ,cuMemFreeHost | ||
| ,cuMemGetInfo_v2 | ||
| ,cuMemHostAlloc | ||
| ,cuMemHostRegister_v2 | ||
| ,cuMemHostUnregister | ||
| ,cuMemsetD32_v2 | ||
| ,cuMemsetD32Async | ||
| ,cuMemsetD8_v2 | ||
| ,cuMemsetD8Async | ||
| ,cuModuleGetFunction | ||
| ,cuModuleGetGlobal_v2 | ||
| ,cuModuleLoadDataEx | ||
| ,cuModuleLoadFatBinary | ||
| ,cuModuleUnload | ||
| ,cuOccupancyMaxActiveBlocksPerMultiprocessor | ||
| ,cuPointerGetAttribute | ||
| ,cuStreamAddCallback | ||
| ,cuStreamCreate | ||
| ,cuStreamDestroy_v2 | ||
| ,cuStreamQuery | ||
| ,cuStreamSynchronize | ||
| ,cuStreamWaitEvent | ||
| ,cuSurfObjectCreate | ||
| ,cuSurfObjectDestroy | ||
| ,cuTexObjectCreate | ||
| ,cuTexObjectDestroy | ||
| ,cuImportExternalMemory | ||
| ,cuDestroyExternalMemory | ||
| ,cuExternalMemoryGetMappedBuffer | ||
| ,cuMemUnmap | ||
| ,cuMemAddressFree | ||
| ,cuMemGetAllocationGranularity | ||
| ,cuMemAddressReserve | ||
| ,cuMemCreate | ||
| ,cuMemExportToShareableHandle | ||
| ,cuMemMap | ||
| ,cuMemRelease | ||
| ,cuMemSetAccess | ||
| ,cuMemImportFromShareableHandle | ||
| ,cuLaunchHostFunc | ||
| ,cuDestroyExternalSemaphore | ||
| ,cuImportExternalSemaphore | ||
| ,cuSignalExternalSemaphoresAsync | ||
| ,cuWaitExternalSemaphoresAsync | ||
| ,cuLogsRegisterCallback | ||
| ); | ||
| | ||
| NBL_SYSTEM_DECLARE_DYNAMIC_FUNCTION_CALLER_CLASS(NVRTC,LibLoader, | ||
| nvrtcGetErrorString, | ||
| nvrtcVersion, | ||
| nvrtcAddNameExpression, | ||
| nvrtcCompileProgram, | ||
| nvrtcCreateProgram, | ||
| nvrtcDestroyProgram, | ||
| nvrtcGetLoweredName, | ||
| nvrtcGetPTX, | ||
| nvrtcGetPTXSize, | ||
| nvrtcGetProgramLog, | ||
| nvrtcGetProgramLogSize | ||
| ); | ||
| | ||
| struct SCUDADeviceInfo | ||
| { | ||
| CUdevice handle = {}; | ||
| CUuuid uuid = {}; | ||
| }; | ||
| | ||
| struct SExportableMemoryCreationParams | ||
| { | ||
| size_t size; | ||
| uint32_t alignment; | ||
| CUmemLocationType location; | ||
| }; | ||
| | ||
| struct SPTXResult | ||
| { | ||
| core::smart_refctd_ptr<asset::ICPUBuffer> ptx; | ||
| nvrtcResult result; | ||
| }; | ||
| | ||
| // Opt-in native CUDA API. The declarations below are implemented by the Nabla library. | ||
| // This header is intentionally the only public path that includes CUDA SDK types. | ||
| class NBL_API2 CCUDAHandlerAccessor | ||
| { | ||
| public: | ||
| static const CUDA& getCUDAFunctionTable(const CCUDAHandler& handler); | ||
| static const NVRTC& getNVRTCFunctionTable(const CCUDAHandler& handler); | ||
| static bool defaultHandleResult(CUresult result, const system::logger_opt_ptr& logger); | ||
| static bool defaultHandleResult(const CCUDAHandler& handler, CUresult result); | ||
| static bool defaultHandleResult(const CCUDAHandler& handler, nvrtcResult result); | ||
| static const core::vector<SCUDADeviceInfo>& getAvailableDevices(const CCUDAHandler& handler); | ||
| static nvrtcResult createProgram(CCUDAHandler& handler, nvrtcProgram* prog, std::string&& source, const char* name, const int headerCount=0, const char* const* headerContents=nullptr, const char* const* includeNames=nullptr); | ||
| static nvrtcResult compileProgram(const CCUDAHandler& handler, nvrtcProgram prog, core::SRange<const char* const> options); | ||
| static nvrtcResult getProgramLog(const CCUDAHandler& handler, nvrtcProgram prog, std::string& log); | ||
| static SPTXResult getPTX(const CCUDAHandler& handler, nvrtcProgram prog); | ||
| static SPTXResult compileDirectlyToPTX( | ||
| CCUDAHandler& handler, std::string&& source, const char* filename, core::SRange<const char* const> nvrtcOptions, | ||
| std::string& log, const int headerCount=0, const char* const* headerContents=nullptr, const char* const* includeNames=nullptr | ||
| ); | ||
| }; | ||
| | ||
| class NBL_API2 CCUDADeviceAccessor | ||
| { | ||
| public: | ||
| static CUdevice getInternalObject(const CCUDADevice& device); | ||
| static CUcontext getContext(const CCUDADevice& device); | ||
| static size_t roundToGranularity(const CCUDADevice& device, CUmemLocationType location, size_t size); | ||
| static core::smart_refctd_ptr<CCUDAExportableMemory> createExportableMemory(CCUDADevice& device, SExportableMemoryCreationParams&& params); | ||
| }; | ||
| | ||
| class NBL_API2 CCUDAExportableMemoryAccessor | ||
| { | ||
| public: | ||
| static CUdeviceptr getDeviceptr(const CCUDAExportableMemory& memory); | ||
| }; | ||
| | ||
| class NBL_API2 CCUDAImportedMemoryAccessor | ||
| { | ||
| public: | ||
| static CUexternalMemory getInternalObject(const CCUDAImportedMemory& memory); | ||
| static CUresult getMappedBuffer(const CCUDAImportedMemory& memory, CUdeviceptr* mappedBuffer); | ||
| }; | ||
| | ||
| class NBL_API2 CCUDAImportedSemaphoreAccessor | ||
| { | ||
| public: | ||
| static CUexternalSemaphore getInternalObject(const CCUDAImportedSemaphore& semaphore); | ||
| }; | ||
| | ||
| } | ||
| | ||
| #endif | ||
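The accessor classes in this header follow a recognisable shape: the public class keeps its native state private, and a separate static-only class in the opt-in header hands it out as an SDK type. The sketch below illustrates that shape in isolation. Everything in it is hypothetical: `ExportableMemory`, `ExportableMemoryAccessor` and `NativeDevicePtr` are stand-ins, and the use of `friend` is an assumption, since the diff shown here does not reveal how the real accessors reach the private state.

```cpp
#include <cstdint>

// Hypothetical stand-in for an SDK handle type such as CUdeviceptr.
using NativeDevicePtr = std::uint64_t;

// Stand-in for a class from the SDK-free public header:
// no native types appear anywhere in its public interface.
class ExportableMemory
{
    public:
        explicit ExportableMemory(std::uint64_t rawHandle) : m_rawHandle(rawHandle) {}

    private:
        // Assumption: the accessor is granted access through friendship.
        friend class ExportableMemoryAccessor;
        std::uint64_t m_rawHandle; // opaque storage, no SDK type needed here
};

// Stand-in for a class from the opt-in native header:
// the only place where the native handle type is visible to callers.
class ExportableMemoryAccessor
{
    public:
        static NativeDevicePtr getDeviceptr(const ExportableMemory& memory)
        {
            return static_cast<NativeDevicePtr>(memory.m_rawHandle);
        }
};
```

With this layout, code that never includes the native header cannot name the handle type at all, while code that opts in calls `ExportableMemoryAccessor::getDeviceptr(memory)` to obtain it.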
The accessors make no sense; just move all the `nbl/video/CCUDA*.h` headers to the extension.