Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Matrix Multiplication
Search
Moro
November 14, 2018
Programming
0
8
Matrix Multiplication
Parallel Computing in Shared Memory using OpenMP - Matrix Multiplication problem.
Moro
November 14, 2018
Tweet
Share
More Decks by Moro
See All by Moro
MockK and Truth - Unit Tests - Android
gabrielbmoro
0
150
More Accessible Apps - Android
gabrielbmoro
0
9
Variables and Tips - Android
gabrielbmoro
0
10
Migrating an Existing App to Compose - Android
gabrielbmoro
0
12
Recycler View and Performance - Android
gabrielbmoro
0
12
Repository Pattern and Productivity - Android
gabrielbmoro
0
13
What is new in Android Jetpack?
gabrielbmoro
0
18
List Users - Android
gabrielbmoro
0
5
Working with Collections - Kotlin
gabrielbmoro
0
12
Other Decks in Programming
See All in Programming
Context is King? 〜Verifiability時代とコンテキスト設計 / Beyond "Context is King"
rkaga
10
1.6k
Honoを使ったリモートMCPサーバでAIツールとの連携を加速させる!
tosuri13
1
140
Fragmented Architectures
denyspoltorak
0
110
クラウドに依存しないS3を使った開発術
simesaba80
0
230
20251212 AI 時代的 Legacy Code 營救術 2025 WebConf
mouson
0
250
ZJIT: The Ruby 4 JIT Compiler / Ruby Release 30th Anniversary Party
k0kubun
1
380
2年のAppleウォレットパス開発の振り返り
muno92
PRO
0
180
ELYZA_Findy AI Engineering Summit登壇資料_AIコーディング時代に「ちゃんと」やること_toB LLMプロダクト開発舞台裏_20251216
elyza
2
1.1k
今こそ知るべき耐量子計算機暗号(PQC)入門 / PQC: What You Need to Know Now
mackey0225
3
320
CSC307 Lecture 04
javiergs
PRO
0
640
Giselleで作るAI QAアシスタント 〜 Pull Requestレビューに継続的QAを
codenote
0
340
CSC307 Lecture 01
javiergs
PRO
0
670
Featured
See All Featured
世界の人気アプリ100個を分析して見えたペイウォール設計の心得
akihiro_kokubo
PRO
65
36k
A Soul's Torment
seathinner
5
2.1k
Six Lessons from altMBA
skipperchong
29
4.1k
The innovator’s Mindset - Leading Through an Era of Exponential Change - McGill University 2025
jdejongh
PRO
1
82
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
0
1.1k
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
69
Public Speaking Without Barfing On Your Shoes - THAT 2023
reverentgeek
1
290
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.3k
The Curse of the Amulet
leimatthew05
0
7.5k
Statistics for Hackers
jakevdp
799
230k
Introduction to Domain-Driven Design and Collaborative software design
baasie
1
560
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
115
100k
Transcript
Matrix Multiplication Parallel Computing in Shared Memory using OpenMP Gabriel
Moro - KNOWLEDGE TRANSFER - KT, Porto Alegre - November 2018
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Parallel OpenMP Model A C T1 T2 T3
Parallel OpenMP Model A C T1 T2 T3
Turing - Processor - 4 x Intel Xeon X7550 Nehalem
- 32 physical cores - HyperThreading - Memory - 128GB DDR3 - GPPD-UFRGS
Version: normal_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: normal_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: continuos_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: continuos_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: tiling_seq register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Version: tiling_par register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { #pragma omp parallel for private(i,j,k,tmp) schedule(static) for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Links - Top 500: https://www.top500.org/lists/2018/11/ - Green 500: https://www.top500.org/green500/lists/2018/11/ -
NAS Parallel Benchmark: https://www.nas.nasa.gov/publications/npb.html
Thanks! https://github.com/tido4410/knowledge-transfer-gbmoro.git Gabriel Moro - Matrix Multiplication - OpenMP -
KT, Porto Alegre - November 2018