
Moh'd Abu Quttain: Implemented optimized matrix multiplication #30

Open
MohdFawaz wants to merge 5 commits into parallelcomputingabo:main from MohdFawaz:Moh'd-Abu-Quttain

@MohdFawaz

For more explanation, please refer to the first comment block in main.cpp and to the tables in the README.

I implemented three versions of the matrix multiply:

- Naive triple loop, used as a correctness baseline.

- Cache-blocked (B=16) to improve data locality by working on 16×16 tiles; this gave a modest ~1.05–1.10× speedup on large matrices.

- OpenMP-parallel naive loops using #pragma omp parallel for collapse(2) with OMP_NUM_THREADS=8, yielding a ~4.7–5.2× speedup on my M1 Air.
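A minimal sketch of the first two versions, assuming row-major n×n matrices of double stored in flat std::vector buffers. The function names (multiply_naive, multiply_blocked, max_abs_diff) are hypothetical and not taken from main.cpp. Inside each tile the blocked loop uses i-k-j order so the innermost loop walks rows of B and C contiguously; that loop order is a choice made for this sketch, not necessarily the one used in the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major n x n matrix in a flat buffer (layout is an assumption).
using Matrix = std::vector<double>;

// Naive i-j-k triple loop: the correctness baseline.
void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Cache-blocked multiply: process bs x bs tiles so the A, B and C tiles in use
// stay cache-resident. std::min handles n that is not a multiple of bs.
void multiply_blocked(const Matrix& a, const Matrix& b, Matrix& c,
                      std::size_t n, std::size_t bs = 16) {
    assert(bs > 0);                      // bs == 0 would never advance the tile loops
    std::fill(c.begin(), c.end(), 0.0);  // tiles accumulate partial sums into C
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs) {
                const std::size_t i_end = std::min(ii + bs, n);
                const std::size_t k_end = std::min(kk + bs, n);
                const std::size_t j_end = std::min(jj + bs, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];  // reused across the j loop
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

// Largest element-wise absolute difference, for checking blocked vs naive.
double max_abs_diff(const Matrix& x, const Matrix& y) {
    assert(x.size() == y.size());
    double d = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        d = std::max(d, std::fabs(x[i] - y[i]));
    return d;
}
```

Testing on a size that is not a multiple of 16 (for example n = 37) exercises the std::min edge handling. Because the blocked version sums the k terms in a different order, results can differ from the baseline by floating-point rounding, so compare with a tolerance rather than exact equality.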

Challenges I faced included:

OpenMP support on macOS: AppleClang doesn’t ship with OpenMP, so I had to switch to Homebrew’s GCC/G++.
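One quick way to confirm that a toolchain switch actually enabled OpenMP is the standard _OPENMP macro, which the compiler defines (to a yyyymm date) only when OpenMP is turned on, for example GCC with -fopenmp. A small sketch; the function name is illustrative:

```cpp
#include <cassert>

// Returns the value of _OPENMP (release date of the supported OpenMP spec,
// as yyyymm) when OpenMP is enabled, or 0 otherwise. In the latter case every
// #pragma omp is ignored (at most with a warning) and the loops run serially.
long openmp_spec_date() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

A return value of 0 from a build that was expected to be parallel means the flag or the OpenMP runtime is missing.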

Block-size tuning: blocks that are too large or too small can regress performance; on my cache hierarchy B=16 worked best, for example on the 256³ triple-loop multiply in case 6.
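As a rough starting point before timing candidates, the block size can be bounded by requiring the three B×B tiles touched in the inner loops (one each from A, B and C) to fit in a chosen cache budget. This sketch assumes 8-byte double elements and a budget you pick yourself (the M1's cache sizes are not stated in this PR); it only gives a heuristic upper bound, and measured runs like the 256³ case are what decide the final B.

```cpp
#include <cassert>
#include <cstddef>

// Bytes touched by one tile step of a blocked multiply: a bs x bs tile
// from each of A, B and C.
constexpr std::size_t tile_working_set_bytes(std::size_t bs, std::size_t elem_bytes) {
    return 3 * bs * bs * elem_bytes;
}

// Largest power-of-two block size whose three tiles fit in cache_budget_bytes.
// Returns 0 if even bs = 1 does not fit. The budget is a caller-chosen input.
constexpr std::size_t largest_pow2_block(std::size_t cache_budget_bytes, std::size_t elem_bytes) {
    assert(elem_bytes > 0);  // a zero element size would make the loop endless
    std::size_t best = 0;
    for (std::size_t bs = 1; tile_working_set_bytes(bs, elem_bytes) <= cache_budget_bytes; bs *= 2)
        best = bs;
    return best;
}
```

With doubles, B=16 touches 3·16²·8 = 6144 bytes per tile triple and B=32 touches 24576 bytes, so for a hypothetical 8 KiB budget the helper returns 16 and for 32 KiB it returns 32.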

Limited blocking gain: aggressive hardware prefetching and large caches on the M1 reduce the benefit of tiling for medium-sized problems.

When I scaled up to 8 threads on my M1 Air, I ran into a few limits:

Heterogeneous cores: the M1 has 4 “performance” and 4 “efficiency” cores. With OMP_NUM_THREADS=8, half my threads land on slower Icestorm cores, so I only saw ~4–5× instead of 8×.
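A simplified model of why 8 threads on 4 performance + 4 efficiency cores cannot reach 8×. It assumes (hypothetically) that an efficiency core runs this loop at a fraction r < 1 of a performance core's speed s_P, that threads stay on the core they start on, that the total work W divides perfectly, and that memory bandwidth is not the limit. If each of the 8 threads receives an equal share W/8 of the iterations, the four efficiency-core threads finish last; if the work could instead be rebalanced perfectly, the speedup would equal the combined throughput of all eight cores:

```math
S_{\text{equal split}} = \frac{T_1}{T_8} = \frac{W/s_P}{(W/8)/(r\,s_P)} = 8r,
\qquad
S_{\text{balanced}} = \frac{W/s_P}{W/\big((4 + 4r)\,s_P\big)} = 4 + 4r
```

Both are below 8 whenever r < 1, and 8r ≤ 4 + 4r for r ≤ 1. The actual value of r for this kernel on the M1 is not measured in this PR.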

OpenMP overhead: spawning/joining 8 threads and managing schedules adds non-trivial overhead, especially on small tiles.

To work around this, I used collapse(2) with schedule(static) to split the iterations evenly across threads, and fixed the thread count at exactly 8 to avoid oversubscription and the extra runtime overhead it brings.
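A sketch of the parallel version described here, under the same row-major std::vector<double> assumption; multiply_omp and default_thread_count are hypothetical names, not taken from main.cpp. It compiles with or without OpenMP: the omp.h include and the omp_get_max_threads() call are guarded by _OPENMP, and without -fopenmp the pragma is ignored and the loops run serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

using Matrix = std::vector<double>;

// Number of threads a parallel region uses by default (this honours
// OMP_NUM_THREADS); 1 when compiled without OpenMP.
int default_thread_count() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Naive multiply parallelised over output cells. collapse(2) fuses the i and j
// loops into one n*n iteration space; schedule(static) cuts it into equal,
// contiguous chunks, one per thread. Each (i, j) cell is written by exactly
// one thread, so no locks or reductions are needed. Signed loop indices are
// used because OpenMP versions before 3.0 required them.
void multiply_omp(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    const long long nn = static_cast<long long>(n);
#pragma omp parallel for collapse(2) schedule(static)
    for (long long i = 0; i < nn; ++i)
        for (long long j = 0; j < nn; ++j) {
            double sum = 0.0;  // declared inside the loop, so private per iteration
            for (long long k = 0; k < nn; ++k)
                sum += a[i * nn + k] * b[k * nn + j];
            c[i * nn + j] = sum;
        }
}
```

Built with GCC and -fopenmp and run with OMP_NUM_THREADS=8, default_thread_count() would normally report 8 and the n² output cells are split into 8 equal chunks. Each thread only reads A and B and writes its own cells of C, which is why the loop body needs no synchronisation.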

@MohdFawaz changed the title from "Moh'd Abu Quttain" to "Moh'd Abu Quttain: Implemented optimized matrix multiplication" on May 10, 2025.