Matrix Multiplication
Moro
November 14, 2018
Parallel Computing in Shared Memory using OpenMP - Matrix Multiplication problem.
Transcript
Matrix Multiplication
Parallel Computing in Shared Memory using OpenMP
Gabriel Moro - KNOWLEDGE TRANSFER - KT, Porto Alegre - November 2018
Matrix Multiplication A B C
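The repeated slide above presumably animates how each entry of C is built from one row of A and one column of B. For square matrices of order n (written `size` in the code versions of this deck), the operation can be stated as:

```latex
C_{ij} = \sum_{k=0}^{n-1} A_{ik} \, B_{kj}, \qquad 0 \le i, j < n
```

Each of the n^2 entries of C needs n multiply-add steps, so the straightforward algorithm performs on the order of n^3 operations.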
Ways to improve the performance of this algorithm
- Algorithm complexity
- Parallelism
  - Shared Memory
  - Distributed Memory
Parallel OpenMP Model A C T1 T2 T3
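The model slide shows threads T1, T2 and T3 each working on their own part of A and C. A minimal sketch of one way such a split can be computed, assuming contiguous and nearly equal blocks of rows per thread: `row_range` is a hypothetical helper for illustration, not part of the deck or of OpenMP. `schedule(static)` without a chunk size typically yields a partition like this, but the exact split is up to the OpenMP implementation.

```c
#include <assert.h>

/* Hypothetical helper (not from the deck): computes the half-open row range
 * [*begin, *end) owned by thread `tid` when `size` rows are split into
 * contiguous, nearly equal blocks among `nthreads` threads. */
void row_range(int size, int nthreads, int tid, int *begin, int *end)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int base  = size / nthreads;   /* rows that every thread receives */
    int extra = size % nthreads;   /* the first `extra` threads get one more */
    *begin = tid * base + (tid < extra ? tid : extra);
    *end   = *begin + base + (tid < extra ? 1 : 0);
}
```

With 10 rows and 3 threads, the ranges are [0, 4), [4, 7) and [7, 10). Since a thread only writes the rows of C in its own range, no two threads write the same element.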
Turing
- Processor: 4 x Intel Xeon X7550 (Nehalem)
  - 32 physical cores
  - HyperThreading
- Memory: 128GB DDR3
- GPPD-UFRGS
Version: normal_seq
/* Row i of A times column j of B gives C[i][j]. */
for (i = 0; i < size; i++) {
    for (j = 0; j < size; j++) {
        tmp = 0;
        for (k = 0; k < size; k++)
            tmp = tmp + A[i][k] * B[k][j];
        C[i][j] = tmp;
    }
}
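Version: normal_par
/* The iterations of the i loop (rows of C) are divided among the threads;
 * j, k and tmp are private so that each thread has its own copies. */
#pragma omp parallel for private(i,j,k,tmp)
for (i = 0; i < size; i++) {
    for (j = 0; j < size; j++) {
        tmp = 0;
        for (k = 0; k < size; k++)
            tmp = tmp + A[i][k] * B[k][j];
        C[i][j] = tmp;
    }
}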
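The normal_par loop nest can be wrapped in a function as follows. This is a sketch under assumptions: the name `matmul_par` and the `double **` (array of row pointers) parameter type are illustrative, since the slides do not show how A, B and C are declared. Declaring the loop variables inside the loops makes them private to each thread, so no `private` clause is needed here.

```c
#include <assert.h>

/* Hypothetical wrapper around the normal_par loop nest. Each of A, B and C
 * is assumed to be an array of `size` row pointers. Without OpenMP enabled
 * at compile time, the pragma is ignored and the loop runs sequentially. */
void matmul_par(double **A, double **B, double **C, int size)
{
    assert(A && B && C);
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {        /* rows of C are split among threads */
        for (int j = 0; j < size; j++) {
            double tmp = 0.0;               /* local, so private to each thread */
            for (int k = 0; k < size; k++)
                tmp += A[i][k] * B[k][j];
            C[i][j] = tmp;
        }
    }
}
```

With GCC or Clang, OpenMP is enabled by the `-fopenmp` flag.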
Version: continuos_seq
/* A, B and C are flat arrays: element (i, j) is stored at index i * size + j. */
for (i = 0; i < size; i++) {
    for (j = 0; j < size; j++) {
        tmp = 0;
        for (k = 0; k < size; k++)
            tmp = tmp + A[i * size + k] * B[k * size + j];
        C[i * size + j] = tmp;
    }
}
Version: continuos_par
/* Same flat indexing, with the i loop divided among the threads. */
#pragma omp parallel for private(i,j,k,tmp)
for (i = 0; i < size; i++) {
    for (j = 0; j < size; j++) {
        tmp = 0;
        for (k = 0; k < size; k++)
            tmp = tmp + A[i * size + k] * B[k * size + j];
        C[i * size + j] = tmp;
    }
}
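The continuos_* versions index one flat array per matrix instead of using `A[i][k]`. The deck does not show the allocation, so the sketch below assumes a single zeroed block per matrix in row-major order. `alloc_flat` and `matmul_flat` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a size x size matrix as ONE zeroed block.
 * Element (i, j) lives at index i * size + j (row-major order).
 * Returns NULL if the allocation fails; the caller frees it with free(). */
double *alloc_flat(int size)
{
    assert(size > 0);
    return calloc((size_t)size * (size_t)size, sizeof(double));
}

/* The loop nest of the continuos_* slides, wrapped in a function. */
void matmul_flat(const double *A, const double *B, double *C, int size)
{
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            double tmp = 0.0;
            for (int k = 0; k < size; k++)
                tmp += A[i * size + k] * B[k * size + j];
            C[i * size + j] = tmp;
        }
    }
}
```

A single block keeps consecutive rows adjacent in memory, whereas rows allocated one by one may end up scattered across the heap.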
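Version: tiling_seq
/* min() is assumed to be a helper or macro defined elsewhere.
 * R must be zero-initialized: each (jj, kk) tile contributes only a partial
 * sum to R[i][j], so the sums are accumulated rather than assigned. */
register int jj, kk, i, j, k;
double tmp = 0;
for (jj = 0; jj < size; jj = jj + block) {
    for (kk = 0; kk < size; kk = kk + block) {
        for (i = 0; i < size; i++) {
            for (j = jj; j < min(jj + block, size); j++) {
                tmp = 0;
                for (k = kk; k < min(kk + block, size); k++) {
                    tmp = tmp + A[i][k] * B[k][j];
                }
                R[i][j] = R[i][j] + tmp;
            }
        }
    }
}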
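Version: tiling_par
/* min() is assumed to be a helper or macro defined elsewhere.
 * R must be zero-initialized: each (jj, kk) tile contributes only a partial
 * sum to R[i][j]. Each thread owns distinct rows i, so the accumulation
 * into R[i][j] does not race. */
register int jj, kk, i, j, k;
double tmp = 0;
for (jj = 0; jj < size; jj = jj + block) {
    for (kk = 0; kk < size; kk = kk + block) {
        #pragma omp parallel for private(i,j,k,tmp) schedule(static)
        for (i = 0; i < size; i++) {
            for (j = jj; j < min(jj + block, size); j++) {
                tmp = 0;
                for (k = kk; k < min(kk + block, size); k++) {
                    tmp = tmp + A[i][k] * B[k][j];
                }
                R[i][j] = R[i][j] + tmp;
            }
        }
    }
}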
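In the tiled loop nest, the kk loop visits each R[i][j] once per tile of k, so the per-tile partial sums have to be added together to give the full dot product. The sketch below shows that accumulation on flat row-major arrays, which is an assumption made to keep the example self-contained (the slides index `R[i][j]`). `matmul_tiled` and `min_int` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper standing in for the min() used on the slides. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Tiled multiplication on flat row-major arrays. The caller must zero R
 * first, because every (jj, kk) tile adds a partial sum to R[i * size + j]. */
void matmul_tiled(const double *A, const double *B, double *R,
                  int size, int block)
{
    assert(block > 0);
    for (int jj = 0; jj < size; jj += block) {
        for (int kk = 0; kk < size; kk += block) {
            /* Threads get distinct rows i, so the += below does not race. */
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < size; i++) {
                for (int j = jj; j < min_int(jj + block, size); j++) {
                    double tmp = 0.0;
                    for (int k = kk; k < min_int(kk + block, size); k++)
                        tmp += A[i * size + k] * B[k * size + j];
                    R[i * size + j] += tmp;   /* accumulate across kk tiles */
                }
            }
        }
    }
}
```

A `block` value that does not divide `size` evenly is handled by the `min_int` bounds, which clip the last tile in each direction.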
Links
- Top 500: https://www.top500.org/lists/2018/11/
- Green 500: https://www.top500.org/green500/lists/2018/11/
- NAS Parallel Benchmark: https://www.nas.nasa.gov/publications/npb.html
Thanks!
https://github.com/tido4410/knowledge-transfer-gbmoro.git
Gabriel Moro - Matrix Multiplication - OpenMP - KT, Porto Alegre - November 2018