61 changes: 61 additions & 0 deletions .editorconfig
@@ -0,0 +1,61 @@
[*]
cpp_indent_braces=false
cpp_indent_multi_line_relative_to=innermost_parenthesis
cpp_indent_within_parentheses=indent
cpp_indent_preserve_within_parentheses=false
cpp_indent_case_labels=false
cpp_indent_case_contents=true
cpp_indent_case_contents_when_block=false
cpp_indent_lambda_braces_when_parameter=true
cpp_indent_goto_labels=one_left
cpp_indent_preprocessor=leftmost_column
cpp_indent_access_specifiers=false
cpp_indent_namespace_contents=true
cpp_indent_preserve_comments=false
# Changed from ignore to remove
cpp_new_line_before_open_brace_namespace=remove
cpp_new_line_before_open_brace_type=remove
cpp_new_line_before_open_brace_function=remove
cpp_new_line_before_open_brace_block=remove
cpp_new_line_before_open_brace_lambda=remove
# ---
cpp_new_line_scope_braces_on_separate_lines=false
cpp_new_line_close_brace_same_line_empty_type=false
cpp_new_line_close_brace_same_line_empty_function=false
cpp_new_line_before_catch=true
cpp_new_line_before_else=true
cpp_new_line_before_while_in_do_while=false
cpp_space_before_function_open_parenthesis=remove
cpp_space_within_parameter_list_parentheses=false
cpp_space_between_empty_parameter_list_parentheses=false
cpp_space_after_keywords_in_control_flow_statements=true
cpp_space_within_control_flow_statement_parentheses=false
cpp_space_before_lambda_open_parenthesis=false
cpp_space_within_cast_parentheses=false
cpp_space_after_cast_close_parenthesis=false
cpp_space_within_expression_parentheses=false
cpp_space_before_block_open_brace=true
cpp_space_between_empty_braces=false
cpp_space_before_initializer_list_open_brace=false
cpp_space_within_initializer_list_braces=true
cpp_space_preserve_in_initializer_list=true
cpp_space_before_open_square_bracket=false
cpp_space_within_square_brackets=false
cpp_space_before_empty_square_brackets=false
cpp_space_between_empty_square_brackets=false
cpp_space_group_square_brackets=true
cpp_space_within_lambda_brackets=false
cpp_space_between_empty_lambda_brackets=false
cpp_space_before_comma=false
cpp_space_after_comma=true
cpp_space_remove_around_member_operators=true
cpp_space_before_inheritance_colon=true
cpp_space_before_constructor_colon=true
cpp_space_remove_before_semicolon=true
cpp_space_after_semicolon=false
cpp_space_remove_around_unary_operator=true
cpp_space_around_binary_operator=insert
cpp_space_around_assignment_operator=insert
cpp_space_pointer_reference_alignment=left
cpp_space_around_ternary_operator=insert
cpp_wrap_preserve_blocks=one_liners
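
As a rough illustration of what these settings aim for, the sketch below is laid out the way the formatter would presumably leave it, assuming the keys behave as their names suggest. The 4-space indent width and the `demo::classify` function are hypothetical and do not come from this change.

```cpp
namespace demo {
    // cpp_indent_namespace_contents=true: members sit one level inside the namespace.
    // cpp_new_line_before_open_brace_function=remove: '{' stays on the signature line.
    // cpp_space_pointer_reference_alignment=left: '*' attaches to the type.
    inline int classify(const int* value) {
        if (value == nullptr) { // space after the keyword, none inside the parentheses
            return -1;
        }
        else { // cpp_new_line_before_else=true: 'else' starts its own line
            switch (*value) {
            case 0: // cpp_indent_case_labels=false: label aligned with 'switch'
                return 0; // cpp_indent_case_contents=true: body indented under the label
            default:
                return 1;
            }
        }
    }
}
```
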
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.18)
+cmake_minimum_required(VERSION 3.15)
project(app LANGUAGES CXX CUDA)

# Set C++ and CUDA standards
31 changes: 31 additions & 0 deletions README.md
@@ -205,3 +205,34 @@ git push origin student-name

Good luck, and enjoy accelerating matrix multiplication with CUDA!

# Steps to run the code
1. Copy the project files to Dione.
2. Log in to Dione.
3. Run the following commands:

```bash
module load cuda
module load GCC
module load cmake
nvcc -arch=sm_70 main.cu -o main -lm
srun -p gpu --mem=10G -t 1:00:00 ./main <test_case>
```

Both the naive and the tiled matmul kernels were launched with a `threadsPerBlock` of 16 × 16.
The tiled matmul used a tile size of 16.
File transfer time was included in both performance measurements.
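
To make the 16 × 16 configuration concrete, here is a minimal host-side sketch of how the grid size could be derived with ceiling division. It is not taken from `main.cu`: `kBlockDim`, `LaunchDims`, `ceilDiv`, and `gridFor` are hypothetical names, and the output matrix is assumed to be $m \times p$.

```cpp
// Hypothetical helpers for illustration; none of these names come from main.cu.
constexpr int kBlockDim = 16; // threads per block along each axis (16 x 16)

struct LaunchDims {
    int gridX; // blocks needed to cover the p output columns
    int gridY; // blocks needed to cover the m output rows
};

// Ceiling division, so a partially filled block at the edge is still launched.
constexpr int ceilDiv(int a, int b) { return (a + b - 1) / b; }

// Grid size for an m x p output matrix (assumed layout).
constexpr LaunchDims gridFor(int m, int p) {
    return LaunchDims{ ceilDiv(p, kBlockDim), ceilDiv(m, kBlockDim) };
}

// A 100 x 56 output needs 4 x 7 blocks under these assumptions.
static_assert(gridFor(100, 56).gridX == 4, "56 columns need 4 blocks of 16");
static_assert(gridFor(100, 56).gridY == 7, "100 rows need 7 blocks of 16");
```

When a dimension is not a multiple of 16, the last block along that axis is only partly used, so a kernel launched this way has to bounds-check its row and column indices.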

# Results
| Test Case | Dimensions ($m \times n \times p$) | Naive CPU (s) | Blocked CPU (s) | Parallel CPU (s) | Naive CUDA (s) | Tiled CUDA (s) | Tiled CUDA Speedup (vs. Naive CUDA) | Tiled CUDA Speedup (vs. Parallel CPU) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Case 0 | 64 × 64 × 64 | 0.001092 | 0.001221 | 0.000386 | 0.000104448 | 0.000069568 | 1.50x | 5.55x |
| Case 1 | 128 × 64 × 128 | 0.003803 | 0.004292 | 0.001009 | 0.000164928 | 0.000141472 | 1.17x | 7.13x |
| Case 2 | 100 × 128 × 56 | 0.002663 | 0.002945 | 0.000933999 | 0.000132320 | 0.000114848 | 1.15x | 8.13x |
| Case 3 | 128 × 64 × 128 | 0.00387 | 0.00427 | 0.001542 | 0.000165760 | 0.000124928 | 1.33x | 12.34x |
| Case 4 | 32 × 128 × 32 | 0.000464 | 0.000565 | 0.00024 | 0.000093344 | 0.000080416 | 1.16x | 2.98x |
| Case 5 | 200 × 100 × 256 | 0.018409 | 0.021625 | 0.004873 | 0.000314528 | 0.000287104 | 1.10x | 16.97x |
| Case 6 | 256 × 256 × 256 | 0.059081 | 0.068912 | 0.014825 | 0.000478688 | 0.000484800 | 0.99x | 30.58x |
| Case 7 | 256 × 300 × 256 | 0.070174 | 0.082439 | 0.019428 | 0.000489696 | 0.000509376 | 0.96x | 38.14x |
| Case 8 | 64 × 128 × 64 | 0.001954 | 0.002169 | 0.000576 | 0.000123552 | 0.000092864 | 1.33x | 6.20x |
| Case 9 | 256 × 256 × 257 | 0.059597 | 0.06816 | 0.013344 | 0.000472896 | 0.000455456 | 1.04x | 29.30x |
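
The two speedup columns are consistent with plain ratios of the measured times. The symbols below are introduced here only for illustration; for Case 0 the arithmetic works out as:

$$
S_{\text{vs. Naive CUDA}} = \frac{t_{\text{Naive CUDA}}}{t_{\text{Tiled CUDA}}} = \frac{0.000104448}{0.000069568} \approx 1.50,
\qquad
S_{\text{vs. Parallel CPU}} = \frac{t_{\text{Parallel CPU}}}{t_{\text{Tiled CUDA}}} = \frac{0.000386}{0.000069568} \approx 5.55
$$

A value below 1, as in Cases 6 and 7, means the tiled kernel was slower than the naive kernel for that input.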
