Moinul-Laskar: Implemented CUDA matrix multiplication #7

Open: priyoislam wants to merge 1 commit into parallelcomputingabo:main from priyoislam:Moinul-laskar

@priyoislam

  1. The first run failed with `CUDA error: the provided PTX was compiled with an unsupported toolchain`. Setting `CUDA_ARCHITECTURES` to 80 only in `CMakeLists.txt` resolved it.
  2. Wrote a batch script that runs all 10 test cases in sequence on a single A100 GPU.
  3. Benchmarked the CUDA implementations across the 10 test cases and compared them against the CPU results (naive, blocked, and parallel) from Assignment 2.
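
For reviewers who want to reproduce the fix in item 1, the change could look roughly like the fragment below. This is a minimal sketch, not the project's actual `CMakeLists.txt`: the project name `matmul_project`, the target name `matmul_cuda`, and the source file `main.cu` are hypothetical placeholders. The only part taken from this PR is restricting `CUDA_ARCHITECTURES` to 80, which matches the A100's compute capability (8.0).

```cmake
# CUDA_ARCHITECTURES is available from CMake 3.18 onward.
cmake_minimum_required(VERSION 3.18)

# Hypothetical project name; enable CUDA as a first-class language.
project(matmul_project LANGUAGES CXX CUDA)

# Hypothetical target and source file names.
add_executable(matmul_cuda main.cu)

# Build for compute capability 8.0 (A100) only, as described in item 1.
set_target_properties(matmul_cuda PROPERTIES CUDA_ARCHITECTURES 80)
```

An equivalent project-wide alternative is to set the `CMAKE_CUDA_ARCHITECTURES` variable to 80 before the target is created. A likely reason the change helps is that the executable then carries native code for the A100, so the driver does not have to JIT-compile PTX generated by a newer toolkit. That explanation is an assumption and has not been checked against this cluster's driver and toolkit versions.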
