meteharun: Implemented optimized matrix multiplication #3

Open
meteharun wants to merge 1 commit into parallelcomputingabo:main from meteharun:meteharun
@meteharun
Summary

Implemented and benchmarked three matrix multiplication methods:

  • Naive
  • Blocked (cache-optimized, block_size = 32)
  • Parallel (using OpenMP)

Optimizations

  • Used tiling in the blocked version to improve cache locality
  • Used #pragma omp parallel for to parallelize the computation across rows in the parallel version
  • Rounded float values to 2 decimal places so that validation passes
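
As a rough illustration of the tiling and rounding described above, here is a minimal sketch. It is not the code from this PR: the names matmul_blocked and round2, the row-major std::vector<float> layout, and the loop order are all assumptions chosen for the example. Only the default block size of 32 comes from the summary.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): C = A * B for row-major matrices,
// where A is m x n, B is n x p and the result C is m x p.
std::vector<float> matmul_blocked(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t m, std::size_t n, std::size_t p,
                                  std::size_t block_size = 32) {
    assert(A.size() == m * n && B.size() == n * p && block_size > 0);
    std::vector<float> C(m * p, 0.0f);

    // Walk the matrices tile by tile so that each tile of A, B and C
    // is reused while it is still likely to be in cache.
    for (std::size_t ii = 0; ii < m; ii += block_size) {
        for (std::size_t kk = 0; kk < n; kk += block_size) {
            for (std::size_t jj = 0; jj < p; jj += block_size) {
                // Clamp tile bounds so sizes that are not a multiple
                // of block_size are still handled correctly.
                const std::size_t i_end = std::min(ii + block_size, m);
                const std::size_t k_end = std::min(kk + block_size, n);
                const std::size_t j_end = std::min(jj + block_size, p);

                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float a_ik = A[i * n + k];
                        // Innermost loop runs over contiguous memory in B and C.
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * p + j] += a_ik * B[k * p + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}

// Hypothetical helper: round a value to two decimal places before
// it is written out or compared against reference output.
inline float round2(float x) {
    return std::round(x * 100.0f) / 100.0f;
}
```

For very small matrices the extra loop nesting and bounds clamping do work that brings no cache benefit, which is consistent with the overhead noted under Challenges.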

Performance Results

A detailed benchmark table is included in README.md.

Challenges

  • For smaller matrix sizes, the blocked matmul added overhead and performed worse
  • Had to adjust file paths because CLion runs the executable from the cmake-build-debug directory
  • Resolved minor OpenMP data-scoping issues by using default(none) and shared(...)
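
The scoping fix mentioned in the last point could look roughly like the sketch below. It is an illustration under assumptions, not the PR's actual code: the function name matmul_parallel, the int dimensions, and the use of raw pointers are hypothetical. With default(none), every variable used inside the parallel region must be given an explicit data-sharing attribute, so a forgotten variable becomes a compile error instead of a silent race.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not from the PR): row-parallel C = A * B for
// row-major matrices, A is m x n, B is n x p, C is m x p.
std::vector<float> matmul_parallel(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   int m, int n, int p) {
    assert(m >= 0 && n >= 0 && p >= 0);
    assert(A.size() == static_cast<std::size_t>(m) * n);
    assert(B.size() == static_cast<std::size_t>(n) * p);
    std::vector<float> C(static_cast<std::size_t>(m) * p, 0.0f);

    // Plain pointers keep the shared(...) list simple and avoid
    // sharing the std::vector objects themselves.
    const float* a = A.data();
    const float* b = B.data();
    float* c = C.data();

    // default(none) forces every variable from the enclosing scope to be
    // listed. The loop index i is private automatically; k, j and a_ik are
    // declared inside the loop body and are therefore private as well.
    #pragma omp parallel for default(none) shared(a, b, c, m, n, p)
    for (int i = 0; i < m; ++i) {
        const std::size_t row_a = static_cast<std::size_t>(i) * n;
        const std::size_t row_c = static_cast<std::size_t>(i) * p;
        for (int k = 0; k < n; ++k) {
            const float a_ik = a[row_a + k];
            const std::size_t row_b = static_cast<std::size_t>(k) * p;
            for (int j = 0; j < p; ++j) {
                c[row_c + j] += a_ik * b[row_b + j];
            }
        }
    }
    return C;
}
```

Each thread writes only to its own rows of C, so no synchronization is needed inside the loop. If the code is compiled without OpenMP enabled (for example without -fopenmp on GCC or Clang), the pragma is ignored and the function runs serially with the same result.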

All tests pass. The speedup is significant for large matrices, especially with the parallel version.