Moh'd Abu Quttain: Implemented optimized matrix multiplication by MohdFawaz · Pull Request #30 · parallelcomputingabo/Homework-2

MohdFawaz · 2025-05-10T20:33:21Z

For more details, see the first comment block in main.cpp and the tables in the README.

I implemented three versions of the multiply:

- Naive triple loop, used as the correctness baseline.
- Cache-blocked (B=16): works on 16×16 tiles to improve data locality; this gave a modest ~1.05–1.10× speedup on large matrices.
- OpenMP-parallel naive loops with `#pragma omp parallel for collapse(2)` and `OMP_NUM_THREADS=8`, yielding a ~4.7–5.2× speedup on my M1 Air.
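The naive and cache-blocked variants can be sketched roughly as follows. This is not the code from main.cpp: it assumes square n×n row-major matrices stored in a flat `std::vector<double>`, and the names `Matrix`, `multiply_naive` and `multiply_blocked` are hypothetical. Inside each tile the loops run in i-k-j order so the innermost loop walks contiguous rows of B and C.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: row-major n x n matrix in a flat vector.
using Matrix = std::vector<double>;

// Naive i-j-k triple loop, C = A * B. Serves as the correctness baseline.
void multiply_naive(const Matrix& A, const Matrix& B, Matrix& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    C.assign(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Cache-blocked version: process bs x bs tiles so the touched parts of A, B
// and C stay in cache while they are reused. std::min handles the ragged
// edge when n is not a multiple of bs.
void multiply_blocked(const Matrix& A, const Matrix& B, Matrix& C,
                      std::size_t n, std::size_t bs = 16) {
    assert(bs > 0 && A.size() == n * n && B.size() == n * n);
    C.assign(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs) {
                const std::size_t i_end = std::min(ii + bs, n);
                const std::size_t k_end = std::min(kk + bs, n);
                const std::size_t j_end = std::min(jj + bs, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a = A[i * n + k];  // reused across j
                        for (std::size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Because the blocked version adds partial products in a different order, its results can differ from the naive ones in the last few bits for general floating-point inputs, so a tolerance-based comparison is the safer correctness check.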

Challenges I faced included:

OpenMP support on macOS: AppleClang doesn’t ship with OpenMP, so I had to switch to Homebrew’s GCC/G++.
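One way to confirm that the toolchain switch actually enabled OpenMP is to check the standard `_OPENMP` macro, which compilers define (as a yyyymm date) only when OpenMP is on, for example when building with Homebrew GCC as `g++-<version> -fopenmp` (the exact binary name depends on the installed GCC version). A minimal sketch; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdio>

// Returns _OPENMP (a yyyymm date such as 201511) when this translation unit
// was compiled with OpenMP enabled, or 0 when it was not.
long openmp_version() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Prints a one-line diagnostic, e.g. at the start of a benchmark run.
void report_openmp() {
    if (openmp_version() == 0)
        std::printf("OpenMP is NOT enabled: omp pragmas will be ignored.\n");
    else
        std::printf("OpenMP enabled, _OPENMP = %ld\n", openmp_version());
}
```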

Block-size tuning: blocks that are too large or too small can hurt performance; on my cache hierarchy B=16 worked best, for example on the 256³ triple loop in case 6.
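A block-size search like this can be automated with a small timing sweep. The sketch assumes a hypothetical `fastest_block_size` helper that takes a callable running one multiply with a given tile size (for example, a lambda wrapping your blocked multiply) and returns the candidate with the lowest best-of-`reps` wall-clock time:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Runs run(bs) `reps` times for each candidate tile size, keeps the minimum
// time per candidate (to reduce noise from other processes) and returns the
// candidate with the smallest minimum.
template <class Run>
std::size_t fastest_block_size(const std::vector<std::size_t>& candidates,
                               Run run, int reps = 3) {
    assert(!candidates.empty() && reps > 0);
    using clock = std::chrono::steady_clock;
    std::size_t best = candidates.front();
    double best_time = std::numeric_limits<double>::infinity();
    for (std::size_t bs : candidates) {
        double t_min = std::numeric_limits<double>::infinity();
        for (int r = 0; r < reps; ++r) {
            const auto t0 = clock::now();
            run(bs);
            const std::chrono::duration<double> dt = clock::now() - t0;
            if (dt.count() < t_min) t_min = dt.count();
        }
        if (t_min < best_time) { best_time = t_min; best = bs; }
    }
    return best;
}
```

The winner depends on matrix size and cache sizes, so it is worth sweeping on the same shapes (such as the 256³ case) that will actually be benchmarked.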

Limited blocking gain: aggressive hardware prefetching and the large caches on the M1 reduce the benefit of tiling for medium-sized problems.

When I scaled up to 8 threads on my M1 Air, I ran into a few limits:

Heterogeneous cores: the M1 has 4 “performance” and 4 “efficiency” cores. With OMP_NUM_THREADS=8, half of the threads land on the slower Icestorm (efficiency) cores, so I saw only ~4–5× instead of 8×.
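The gap between ~4–5× and 8× can be reasoned about with a simple model. Assume, hypothetically, that an efficiency-core thread completes r units of work for every 1 unit a performance-core thread completes, and ignore OpenMP overhead and memory-bandwidth limits. If work were split in proportion to speed, the best case would be 4 + 4r; with an equal split, as `schedule(static)` produces, every thread must finish its 1/8 share, so the efficiency cores set the wall time and the speedup is 8r. The value of r is not measured here; the function names are illustrative.

```cpp
#include <cassert>

// Assumption: a P-core thread does 1 unit of work per unit time, an E-core
// thread does r units (0 < r <= 1); threading overhead is ignored.

// Upper bound when work is split in proportion to each core's speed:
// total throughput relative to a single P-core.
double ideal_speedup(int p_threads, int e_threads, double r) {
    assert(p_threads >= 0 && e_threads >= 0 && r > 0.0 && r <= 1.0);
    return p_threads + e_threads * r;
}

// Equal split: each of the n threads gets 1/n of the work, so wall time is
// set by the slowest thread. Serial time T becomes T / (n * r) => speedup n*r.
double equal_split_speedup(int p_threads, int e_threads, double r) {
    assert(p_threads >= 0 && e_threads >= 0 && r > 0.0 && r <= 1.0);
    const int n = p_threads + e_threads;
    assert(n > 0);
    const double slowest = (e_threads > 0) ? r : 1.0;
    return n * slowest;
}
```

For example, r = 0.6 would give 4 + 2.4 = 6.4× in the proportional case and 8 × 0.6 = 4.8× with an equal split.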

OpenMP overhead: spawning and joining 8 threads and managing the schedule adds non-trivial overhead, especially when each thread has little work to do (small matrices).

To work around this, I used collapse(2) with schedule(static) to split the iterations evenly across threads, and fixed the thread count at exactly 8 (one per core) to avoid oversubscription and the extra overhead it brings.
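The OpenMP variant described above could look roughly like this. It is a sketch assuming square row-major matrices in a flat `std::vector<double>`, with a hypothetical `multiply_omp` function; the thread count is left to `OMP_NUM_THREADS` (8 in the runs above). Signed loop indices are used because they are the most portable choice for OpenMP loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive i-j-k multiply parallelized with OpenMP.
// collapse(2) merges the i and j loops into one n*n iteration space, and
// schedule(static) gives each thread one contiguous, equal-sized chunk.
// Team size comes from OMP_NUM_THREADS. Without -fopenmp the pragma is
// ignored and the function runs serially.
void multiply_omp(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    C.assign(n * n, 0.0);
    const long long N = static_cast<long long>(n);
    #pragma omp parallel for collapse(2) schedule(static)
    for (long long i = 0; i < N; ++i)
        for (long long j = 0; j < N; ++j) {
            double sum = 0.0;  // private to this (i, j) iteration
            for (long long k = 0; k < N; ++k)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;  // each (i, j) writes a distinct element
        }
}
```

Each (i, j) pair writes a distinct element of C and accumulates into its own local `sum`, so no synchronization is needed inside the loop.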


MohdFawaz added 5 commits May 10, 2025 23:17

- 92ecc12 Moh'd Abu Quttain: Implemented optimized matrix multiplication (first batch of solution files)
- 8690292 Moh'd Abu Quttain: Implemented optimized matrix multiplication (second batch of solution files)
- b952020 Update README.md
- 6c72437 Update README.md
- 172e8c2 Update README.md

MohdFawaz changed the title from "Moh'd Abu Quttain" to "Moh'd Abu Quttain: Implemented optimized matrix multiplication" on May 10, 2025.

MohdFawaz wants to merge 5 commits into parallelcomputingabo:main from MohdFawaz:Moh'd-Abu-Quttain
