Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Matrix Multiplication
Search
Moro
November 14, 2018
Programming
10
0
Share
Matrix Multiplication
Parallel Computing in Shared Memory using OpenMP - Matrix Multiplication problem.
Moro
November 14, 2018
More Decks by Moro
See All by Moro
MockK and Truth - Unit Tests - Android
gabrielbmoro
0
150
More Accessible Apps - Android
gabrielbmoro
0
10
Variables and Tips - Android
gabrielbmoro
0
10
Migrating an Existing App to Compose - Android
gabrielbmoro
0
13
Recycler View and Performance - Android
gabrielbmoro
0
14
Repository Pattern and Productivity - Android
gabrielbmoro
0
14
What is new in Android Jetpack?
gabrielbmoro
0
19
List Users - Android
gabrielbmoro
0
5
Working with Collections - Kotlin
gabrielbmoro
0
16
Other Decks in Programming
See All in Programming
なぜあなたのコードには「コシ」がないのか?〜AI時代に問う、最後まで美味しい設計と戦略〜 #phpconkagawa / phpconkagawa2026
shogogg
0
200
Agentic Elixir
whatyouhide
0
450
cloudnative conference 2026 flyle
azihsoyn
0
180
過去のレビュー知見をSkillsで資産化した話
pkshadeck
PRO
1
1.9k
AWSはOSSをどのように 考えているのか?
akihisaikeda
0
120
20260514 - build with ai 2026 - build LINE Bot with Gemini CLI
line_developers_tw
PRO
0
440
Firefoxにコントリビューションして得られた学び
ken7253
2
160
Making the RBS Parser Faster
soutaro
0
710
サーバーレスで作る、動画データ管理基盤
oyasumipants
0
120
要はバランスからの卒業 #yumemi_grow
kajitack
0
160
〜バイブコーディングを超えて〜 チームで実験し続けたAI駆動開発
tigertora7571
0
200
From Formal Specification to Property Based Test
ohbarye
0
2.5k
Featured
See All Featured
Digital Ethics as a Driver of Design Innovation
axbom
PRO
1
280
Designing for humans not robots
tammielis
254
26k
Ecommerce SEO: The Keys for Success Now & Beyond - #SERPConf2024
aleyda
1
2k
What does AI have to do with Human Rights?
axbom
PRO
1
2.1k
Principles of Awesome APIs and How to Build Them.
keavy
128
17k
Unlocking the hidden potential of vector embeddings in international SEO
frankvandijk
0
790
Skip the Path - Find Your Career Trail
mkilby
1
120
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.8k
How to build a perfect <img>
jonoalderson
1
5.5k
Future Trends and Review - Lecture 12 - Web Technologies (1019888BNR)
signer
PRO
0
3.5k
Git: the NoSQL Database
bkeepers
PRO
432
67k
Documentation Writing (for coders)
carmenintech
77
5.3k
Transcript
Matrix Multiplication Parallel Computing in Shared Memory using OpenMP Gabriel
Moro - KNOWLEDGE TRANSFER - KT, Porto Alegre - November 2018
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Parallel OpenMP Model A C T1 T2 T3
Parallel OpenMP Model A C T1 T2 T3
Turing - Processor - 4 x Intel Xeon X7550 Nehalem
- 32 physical cores - HyperThreading - Memory - 128GB DDR3 - GPPD-UFRGS
Version: normal_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: normal_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: continuos_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: continuos_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: tiling_seq register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Version: tiling_par register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { #pragma omp parallel for private(i,j,k,tmp) schedule(static) for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Links - Top 500: https://www.top500.org/lists/2018/11/ - Green 500: https://www.top500.org/green500/lists/2018/11/ -
NAS Parallel Benchmark: https://www.nas.nasa.gov/publications/npb.html
Thanks! https://github.com/tido4410/knowledge-transfer-gbmoro.git Gabriel Moro - Matrix Multiplication - OpenMP -
KT, Porto Alegre - November 2018