Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Matrix Multiplication
Search
Moro
November 14, 2018
Programming
0
8
Matrix Multiplication
Parallel Computing in Shared Memory using OpenMP - Matrix Multiplication problem.
Moro
November 14, 2018
Tweet
Share
More Decks by Moro
See All by Moro
MockK and Truth - Unit Tests - Android
gabrielbmoro
0
150
More Accessible Apps - Android
gabrielbmoro
0
9
Variables and Tips - Android
gabrielbmoro
0
10
Migrating an Existing App to Compose - Android
gabrielbmoro
0
11
Recycler View and Performance - Android
gabrielbmoro
0
12
Repository Pattern and Productivity - Android
gabrielbmoro
0
13
What is new in Android Jetpack?
gabrielbmoro
0
17
List Users - Android
gabrielbmoro
0
5
Working with Collections - Kotlin
gabrielbmoro
0
12
Other Decks in Programming
See All in Programming
都市をデータで見るってこういうこと PLATEAU属性情報入門
nokonoko1203
1
580
なぜ「共通化」を考え、失敗を繰り返すのか
rinchoku
1
570
WebViewの現在地 - SwiftUI時代のWebKit - / The Current State Of WebView
marcy731
0
100
High-Level Programming Languages in AI Era -Human Thought and Mind-
hayat01sh1da
PRO
0
570
今ならAmazon ECSのサービス間通信をどう選ぶか / Selection of ECS Interservice Communication 2025
tkikuc
20
3.7k
Create a website using Spatial Web
akkeylab
0
310
LINEヤフー データグループ紹介
lycorp_recruit_jp
0
1.2k
CursorはMCPを使った方が良いぞ
taigakono
1
190
iOSアプリ開発で 関数型プログラミングを実現する The Composable Architectureの紹介
yimajo
2
220
A2A プロトコルを試してみる
azukiazusa1
2
1.2k
VS Code Update for GitHub Copilot
74th
1
470
NPOでのDevinの活用
codeforeveryone
0
450
Featured
See All Featured
The Invisible Side of Design
smashingmag
300
51k
Stop Working from a Prison Cell
hatefulcrawdad
270
20k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
Optimizing for Happiness
mojombo
379
70k
Visualization
eitanlees
146
16k
Reflections from 52 weeks, 52 projects
jeffersonlam
351
20k
Building Adaptive Systems
keathley
43
2.6k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Into the Great Unknown - MozCon
thekraken
39
1.9k
VelocityConf: Rendering Performance Case Studies
addyosmani
331
24k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
657
60k
Scaling GitHub
holman
459
140k
Transcript
Matrix Multiplication Parallel Computing in Shared Memory using OpenMP Gabriel
Moro - KNOWLEDGE TRANSFER - KT, Porto Alegre - November 2018
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Parallel OpenMP Model A C T1 T2 T3
Parallel OpenMP Model A C T1 T2 T3
Turing - Processor - 4 x Intel Xeon X7550 Nehalem
- 32 physical cores - HyperThreading - Memory - 128GB DDR3 - GPPD-UFRGS
Version: normal_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: normal_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: continuos_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: continuos_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: tiling_seq register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Version: tiling_par register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { #pragma omp parallel for private(i,j,k,tmp) schedule(static) for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Links - Top 500: https://www.top500.org/lists/2018/11/ - Green 500: https://www.top500.org/green500/lists/2018/11/ -
NAS Parallel Benchmark: https://www.nas.nasa.gov/publications/npb.html
Thanks! https://github.com/tido4410/knowledge-transfer-gbmoro.git Gabriel Moro - Matrix Multiplication - OpenMP -
KT, Porto Alegre - November 2018