Matrix Multiplication
Moro
November 14, 2018
Parallel Computing in Shared Memory using OpenMP - Matrix Multiplication problem.
Transcript
Matrix Multiplication: Parallel Computing in Shared Memory using OpenMP
Gabriel Moro - Knowledge Transfer (KT), Porto Alegre - November 2018
Matrix Multiplication (figure: matrices A, B and C, with C = A × B)
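Written element by element, each entry of C is the dot product of row i of A and column j of B. For square n × n matrices (n plays the role of `size` in the code versions), this takes n multiply-add steps per entry, n³ in total:

```latex
C_{ij} = \sum_{k=0}^{n-1} A_{ik}\,B_{kj}, \qquad 0 \le i, j < n
```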
Ways to improve the performance of this algorithm
- Algorithm complexity
- Parallelism
  - Shared Memory
  - Distributed Memory
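One way to compare the versions below on a common scale is to convert run time into GFLOP/s. This sketch assumes the classic algorithm, which does one multiplication and one addition per inner-loop iteration, i.e. 2·size³ floating-point operations; the function name `matmul_gflops` is hypothetical.

```c
/* Hypothetical helper: convert a measured run time into GFLOP/s for the
 * classic O(n^3) algorithm, which does one multiply and one add per
 * inner-loop iteration (2 * n^3 floating-point operations in total). */
double matmul_gflops(int n, double seconds)
{
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / seconds / 1e9;
}
```

For example, a 1000 × 1000 product that takes 2 s corresponds to 1 GFLOP/s.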
Parallel OpenMP Model (figure: matrices A and C divided among threads T1, T2 and T3)
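With `#pragma omp parallel for` on the outer `i` loop, each thread computes a disjoint set of rows of C, reading the matching rows of A (and all of B). With `schedule(static)` and no chunk size, OpenMP only promises roughly equal chunks with at most one chunk per thread; the exact split is implementation-defined. The sketch below assumes one common even block split, via a hypothetical function `static_row_range`.

```c
/* Hypothetical helper: rows [*begin, *end) that thread `t` of `nthreads`
 * would receive under an even block split of `n` rows. The first
 * n % nthreads threads get one extra row. This mirrors the idea of
 * schedule(static) without a chunk size; a real OpenMP runtime may
 * split the rows differently. */
void static_row_range(int n, int nthreads, int t, int *begin, int *end)
{
    int base = n / nthreads;
    int extra = n % nthreads;
    *begin = t * base + (t < extra ? t : extra);
    *end = *begin + base + (t < extra ? 1 : 0);
}
```

For 10 rows and 3 threads this gives T1 rows 0 to 3, T2 rows 4 to 6 and T3 rows 7 to 9.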
Turing (GPPD-UFRGS)
- Processor: 4 x Intel Xeon X7550 (Nehalem)
  - 32 physical cores
  - HyperThreading
- Memory: 128 GB DDR3
Version: normal_seq

    for (i = 0; i < size; i++) {
        for (j = 0; j < size; j++) {
            tmp = 0;
            /* dot product of row i of A and column j of B */
            for (k = 0; k < size; k++)
                tmp = tmp + A[i][k] * B[k][j];
            C[i][j] = tmp;
        }
    }
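A self-contained form of `normal_seq`, assuming C99 variable-length array parameters so that `A[i][k]` indexing works for any `size`; the function name `matmul_normal_seq` is hypothetical.

```c
/* Hypothetical wrapper around the normal_seq loop nest.
 * A, B and C are size x size matrices passed as C99 VLA parameters. */
void matmul_normal_seq(int size, double A[size][size],
                       double B[size][size], double C[size][size])
{
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            double tmp = 0.0;   /* dot product of row i of A and column j of B */
            for (int k = 0; k < size; k++)
                tmp += A[i][k] * B[k][j];
            C[i][j] = tmp;
        }
    }
}
```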
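Version: normal_par

    /* The rows i are split among the threads; j, k and tmp are private
       so that threads do not overwrite each other's partial sums. */
    #pragma omp parallel for private(i,j,k,tmp)
    for (i = 0; i < size; i++) {
        for (j = 0; j < size; j++) {
            tmp = 0;
            for (k = 0; k < size; k++)
                tmp = tmp + A[i][k] * B[k][j];
            C[i][j] = tmp;
        }
    }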
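The `private(i,j,k,tmp)` clause matters most for `tmp`: if it were shared, threads would overwrite each other's partial sums. An equivalent, arguably more idiomatic formulation declares the variables inside the loops, which makes them private automatically. The sketch assumes C99 VLA parameters and a hypothetical function name; compiled without OpenMP (e.g. without GCC's `-fopenmp`), the pragma is ignored and the code runs sequentially with the same result.

```c
/* Hypothetical parallel wrapper in the style of normal_par.
 * Variables declared inside the parallel loop are private to each
 * thread, so no private() clause is needed. Each thread writes a
 * disjoint set of rows of C, so there is no data race. */
void matmul_normal_par(int size, double A[size][size],
                       double B[size][size], double C[size][size])
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            double tmp = 0.0;
            for (int k = 0; k < size; k++)
                tmp += A[i][k] * B[k][j];
            C[i][j] = tmp;
        }
    }
}
```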
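Version: continuos_seq (matrices stored as contiguous one-dimensional arrays)

    /* Row-major layout: element (r, c) is stored at index r * size + c. */
    for (i = 0; i < size; i++) {
        for (j = 0; j < size; j++) {
            tmp = 0;
            for (k = 0; k < size; k++)
                tmp = tmp + A[i * size + k] * B[k * size + j];
            C[i * size + j] = tmp;
        }
    }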
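In the `continuos_*` versions each matrix is a single block in row-major order, so element (i, j) lives at offset `i * size + j`. A minimal sketch, assuming `double` elements, with hypothetical helpers `alloc_matrix` and `matmul_contiguous_seq`:

```c
#include <stdlib.h>

/* Hypothetical helper: one contiguous, zero-initialised size x size block.
 * Element (i, j) is stored at index i * size + j (row-major order). */
double *alloc_matrix(int size)
{
    return calloc((size_t)size * (size_t)size, sizeof(double));
}

/* Hypothetical wrapper around the continuos_seq loop nest. */
void matmul_contiguous_seq(int size, const double *A, const double *B, double *C)
{
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            double tmp = 0.0;
            for (int k = 0; k < size; k++)
                tmp += A[i * size + k] * B[k * size + j];
            C[i * size + j] = tmp;
        }
    }
}
```

A single allocation per matrix keeps all rows adjacent in memory, which is what makes the flat `i * size + j` indexing possible.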
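Version: continuos_par (matrices stored as contiguous one-dimensional arrays)

    /* Row-major layout; the rows i are split among the threads. */
    #pragma omp parallel for private(i,j,k,tmp)
    for (i = 0; i < size; i++) {
        for (j = 0; j < size; j++) {
            tmp = 0;
            for (k = 0; k < size; k++)
                tmp = tmp + A[i * size + k] * B[k * size + j];
            C[i * size + j] = tmp;
        }
    }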
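To compare a sequential version such as `continuos_seq` with its parallel counterpart, wall-clock time is the relevant measure; CPU time summed over all threads would hide the speed-up. A sketch of a hypothetical `wall_seconds` helper that uses the OpenMP runtime's `omp_get_wtime()` when compiled with OpenMP and falls back to C11 `timespec_get` otherwise, plus a hypothetical `speedup` function:

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper returning wall-clock time in seconds.
 * omp_get_wtime() is OpenMP's wall-clock timer; without OpenMP,
 * fall back to C11 timespec_get(). */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Speed-up of a parallel run over a sequential one. */
double speedup(double seq_seconds, double par_seconds)
{
    return seq_seconds / par_seconds;
}
```

Typical use: read `wall_seconds()` before and after each version and pass the two durations to `speedup`.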
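Version: tiling_seq

    /* min(a, b) is assumed to be a helper macro returning the smaller value.
       R must be zero-initialized before this loop nest. */
    register int jj, kk, i, j, k;
    double tmp = 0;
    for (jj = 0; jj < size; jj = jj + block) {
        for (kk = 0; kk < size; kk = kk + block) {
            for (i = 0; i < size; i++) {
                for (j = jj; j < min(jj + block, size); j++) {
                    tmp = 0;
                    for (k = kk; k < min(kk + block, size); k++) {
                        tmp = tmp + A[i][k] * B[k][j];
                    }
                    R[i][j] += tmp;  /* accumulate this kk block's partial sum */
                }
            }
        }
    }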
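A self-contained sketch of the tiled loop nest, assuming contiguous row-major storage and hypothetical names `matmul_tiled_seq` and `min_int`. Because the `k` range is split into blocks of width `block`, each pass only produces a partial sum; assigning `R[i][j] = tmp` would keep just the last block's contribution, so the sums are accumulated into `R`, which starts at zero.

```c
/* Hypothetical helper: smaller of two ints. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical tiled (blocked) multiplication R = A * B for size x size
 * row-major matrices. The j and k loops are split into blocks of width
 * `block`, intended to keep the touched part of B small enough to be
 * reused from cache across the rows i. */
void matmul_tiled_seq(int size, int block, const double *A,
                      const double *B, double *R)
{
    for (int idx = 0; idx < size * size; idx++)
        R[idx] = 0.0;                       /* R accumulates partial sums */

    for (int jj = 0; jj < size; jj += block)
        for (int kk = 0; kk < size; kk += block)
            for (int i = 0; i < size; i++)
                for (int j = jj; j < min_int(jj + block, size); j++) {
                    double tmp = 0.0;
                    for (int k = kk; k < min_int(kk + block, size); k++)
                        tmp += A[i * size + k] * B[k * size + j];
                    R[i * size + j] += tmp;
                }
}
```

Using a block size that does not divide `size` (for example 2 for a 3 × 3 matrix) is a quick way to check that the `min` bounds and the accumulation are both right.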
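Version: tiling_par

    /* min(a, b) is assumed to be a helper macro returning the smaller value.
       R must be zero-initialized before this loop nest. */
    register int jj, kk, i, j, k;
    double tmp = 0;
    for (jj = 0; jj < size; jj = jj + block) {
        for (kk = 0; kk < size; kk = kk + block) {
            /* Each row i is updated by exactly one thread, so += is race-free. */
            #pragma omp parallel for private(i,j,k,tmp) schedule(static)
            for (i = 0; i < size; i++) {
                for (j = jj; j < min(jj + block, size); j++) {
                    tmp = 0;
                    for (k = kk; k < min(kk + block, size); k++) {
                        tmp = tmp + A[i][k] * B[k][j];
                    }
                    R[i][j] += tmp;  /* accumulate this kk block's partial sum */
                }
            }
        }
    }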
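In `tiling_par` a new parallel region, with its implicit barrier at the end, is entered once per (jj, kk) block pair. One alternative, sketched here as an assumption and not as the author's measured variant, opens a single parallel region for the whole computation and uses the `omp for` worksharing construct inside it; the names `matmul_tiled_par` and `tile_end` are hypothetical, and storage is assumed contiguous and row-major.

```c
/* Hypothetical helper: end of the tile starting at `start`. */
static int tile_end(int start, int block, int size)
{
    return start + block < size ? start + block : size;
}

/* Hypothetical variant of tiling_par: one parallel region; every thread
 * walks the same (jj, kk) sequence and `omp for` splits the rows i of
 * each block among the threads. Each row of R is updated by one thread
 * per block, and the implicit barrier after each `omp for` orders the
 * blocks, so the accumulation is race-free. */
void matmul_tiled_par(int size, int block, const double *A,
                      const double *B, double *R)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int idx = 0; idx < size * size; idx++)
            R[idx] = 0.0;                   /* barrier follows: R is zeroed */

        for (int jj = 0; jj < size; jj += block)
            for (int kk = 0; kk < size; kk += block) {
                #pragma omp for schedule(static)
                for (int i = 0; i < size; i++)
                    for (int j = jj; j < tile_end(jj, block, size); j++) {
                        double tmp = 0.0;
                        for (int k = kk; k < tile_end(kk, block, size); k++)
                            tmp += A[i * size + k] * B[k * size + j];
                        R[i * size + j] += tmp;
                    }
            }
    }
}
```

Whether this is measurably faster than re-entering the parallel region per block depends on the OpenMP runtime, since many implementations reuse their thread pool.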
Links
- Top 500: https://www.top500.org/lists/2018/11/
- Green 500: https://www.top500.org/green500/lists/2018/11/
- NAS Parallel Benchmark: https://www.nas.nasa.gov/publications/npb.html
Thanks!
Repository: https://github.com/tido4410/knowledge-transfer-gbmoro.git