viktor-thodin: Implemented CUDA matrix multiplication #14

Open
VThodin wants to merge 1 commit into parallelcomputingabo:main from VThodin:Viktor-Thodin

VThodin commented May 31, 2025

Homework-3 solution.
To get CUDA working I had to make some changes under Build, Execution, Deployment and manually add the CMake options -DCUDAToolkit_ROOT and -DCMAKE_CUDA_ARCHITECTURES. After that there were no difficulties getting it working.

The tile size that I found performed most consistently was 16; when testing with 32 it was usually slower.
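For readers who want to reproduce the tile-size comparison, here is a minimal sketch of a shared-memory tiled kernel with the tile edge as a template parameter. It is not the code from this PR: the names `matmulTiledKernel`, `matmulTiled`, and `check` are hypothetical, and row-major `float` matrices are assumed.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Hypothetical kernel: C (M x N) = A (M x K) * B (K x N), all row-major.
// TILE is the tile edge; each block computes one TILE x TILE tile of C.
template <int TILE>
__global__ void matmulTiledKernel(const float* A, const float* B, float* C,
                                  int M, int K, int N) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk along the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as zero so edge tiles stay correct.
        tileA[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before use

        for (int k = 0; k < TILE; ++k)
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();  // finish reading before the next load overwrites
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

// Hypothetical helper: turn a CUDA error code into an exception.
static void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical host wrapper: copies inputs to the device, launches the
// kernel with TILE x TILE thread blocks, and returns the result.
// Device buffers are not freed if an error is thrown (kept short on purpose).
template <int TILE>
std::vector<float> matmulTiled(const std::vector<float>& A,
                               const std::vector<float>& B,
                               int M, int K, int N) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytesC = static_cast<size_t>(M) * N * sizeof(float);
    check(cudaMalloc(&dA, A.size() * sizeof(float)));
    check(cudaMalloc(&dB, B.size() * sizeof(float)));
    check(cudaMalloc(&dC, bytesC));
    check(cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice));
    check(cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice));

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matmulTiledKernel<TILE><<<grid, block>>>(dA, dB, dC, M, K, N);
    check(cudaGetLastError());

    std::vector<float> C(static_cast<size_t>(M) * N);
    // A blocking copy on the default stream also waits for the kernel.
    check(cudaMemcpy(C.data(), dC, bytesC, cudaMemcpyDeviceToHost));
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Calling `matmulTiled<16>(A, B, M, K, N)` and `matmulTiled<32>(A, B, M, K, N)` on the same inputs gives the two configurations to time. With a tile of 32, each block has 1024 threads, which is the per-block limit on current GPUs. This may reduce occupancy on some devices and could be one reason 32 was slower, though that is a guess and has not been measured here.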
