Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Matrix Multiplication
Search
Moro
November 14, 2018
Programming
0
8
Matrix Multiplication
Parallel Computing in Shared Memory using OpenMP - Matrix Multiplication problem.
Moro
November 14, 2018
Tweet
Share
More Decks by Moro
See All by Moro
MockK and Truth - Unit Tests - Android
gabrielbmoro
0
150
More Accessible Apps - Android
gabrielbmoro
0
9
Variables and Tips - Android
gabrielbmoro
0
10
Migrating an Existing App to Compose - Android
gabrielbmoro
0
12
Recycler View and Performance - Android
gabrielbmoro
0
12
Repository Pattern and Productivity - Android
gabrielbmoro
0
13
What is new in Android Jetpack?
gabrielbmoro
0
18
List Users - Android
gabrielbmoro
0
5
Working with Collections - Kotlin
gabrielbmoro
0
12
Other Decks in Programming
See All in Programming
MCPでVibe Working。そして、結局はContext Eng(略)/ Working with Vibe on MCP And Context Eng
rkaga
5
2.2k
Vue・React マルチプロダクト開発を支える Vite
andpad
0
110
RDoc meets YARD
okuramasafumi
4
170
AI Coding Agentのセキュリティリスク:PRの自己承認とメルカリの対策
s3h
0
200
Improving my own Ruby thereafter
sisshiki1969
1
160
もうちょっといいRubyプロファイラを作りたい (2025)
osyoyu
1
420
Compose Multiplatform × AI で作る、次世代アプリ開発支援ツールの設計と実装
thagikura
0
140
Zendeskのチケットを Amazon Bedrockで 解析した
ryokosuge
3
290
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
340
go test -json そして testing.T.Attr / Kyoto.go #63
utgwkk
3
290
rage against annotate_predecessor
junk0612
0
160
Design Foundational Data Engineering Observability
sucitw
3
190
Featured
See All Featured
Git: the NoSQL Database
bkeepers
PRO
431
66k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Imperfection Machines: The Place of Print at Facebook
scottboms
268
13k
Fireside Chat
paigeccino
39
3.6k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.9k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
Statistics for Hackers
jakevdp
799
220k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
13k
A better future with KSS
kneath
239
17k
Art, The Web, and Tiny UX
lynnandtonic
302
21k
Being A Developer After 40
akosma
90
590k
Transcript
Matrix Multiplication Parallel Computing in Shared Memory using OpenMP Gabriel
Moro - KNOWLEDGE TRANSFER - KT, Porto Alegre - November 2018
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Parallel OpenMP Model A C T1 T2 T3
Parallel OpenMP Model A C T1 T2 T3
Turing - Processor - 4 x Intel Xeon X7550 Nehalem
- 32 physical cores - HyperThreading - Memory - 128GB DDR3 - GPPD-UFRGS
Version: normal_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: normal_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: continuos_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: continuos_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: tiling_seq register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Version: tiling_par register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { #pragma omp parallel for private(i,j,k,tmp) schedule(static) for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Links - Top 500: https://www.top500.org/lists/2018/11/ - Green 500: https://www.top500.org/green500/lists/2018/11/ -
NAS Parallel Benchmark: https://www.nas.nasa.gov/publications/npb.html
Thanks! https://github.com/tido4410/knowledge-transfer-gbmoro.git Gabriel Moro - Matrix Multiplication - OpenMP -
KT, Porto Alegre - November 2018