Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Matrix Multiplication
Search
Moro
November 14, 2018
Programming
0
9
Matrix Multiplication
Parallel Computing in Shared Memory using OpenMP - Matrix Multiplication problem.
Moro
November 14, 2018
Tweet
Share
More Decks by Moro
See All by Moro
MockK and Truth - Unit Tests - Android
gabrielbmoro
0
150
More Accessible Apps - Android
gabrielbmoro
0
9
Variables and Tips - Android
gabrielbmoro
0
10
Migrating an Existing App to Compose - Android
gabrielbmoro
0
12
Recycler View and Performance - Android
gabrielbmoro
0
12
Repository Pattern and Productivity - Android
gabrielbmoro
0
13
What is new in Android Jetpack?
gabrielbmoro
0
18
List Users - Android
gabrielbmoro
0
5
Working with Collections - Kotlin
gabrielbmoro
0
15
Other Decks in Programming
See All in Programming
Laravel Nightwatchの裏側 - Laravel公式Observabilityツールを支える設計と実装
avosalmon
1
250
メッセージングを利用して時間的結合を分離しよう #phperkaigi
kajitack
3
340
見せてもらおうか、 OpenSearchの性能とやらを!
shunta27
1
140
Vuetify 3 → 4 何が変わった?差分と移行ポイント10分まとめ
koukimiura
0
190
どんと来い、データベース信頼性エンジニアリング / Introduction to DBRE
nnaka2992
1
330
Windows on Ryzen and I
seosoft
0
400
Xdebug と IDE による デバッグ実行の仕組みを見る / Exploring-How-Debugging-Works-with-Xdebug-and-an-IDE
shin1x1
0
150
夢の無限スパゲッティ製造機 -実装篇- #phpstudy
o0h
PRO
0
160
生成 AI 時代のスナップショットテストってやつを見せてあげますよ(α版)
ojun9
0
310
PHPのバージョンアップ時にも役立ったAST(2026年版)
matsuo_atsushi
0
250
Nuxt Server Components
wattanx
0
130
20260228_JAWS_Beginner_Kansai
takuyay0ne
5
620
Featured
See All Featured
What does AI have to do with Human Rights?
axbom
PRO
1
2.1k
Optimizing for Happiness
mojombo
378
71k
How GitHub (no longer) Works
holman
316
150k
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
440
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
118
110k
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
89
Between Models and Reality
mayunak
2
240
Learning to Love Humans: Emotional Interface Design
aarron
275
41k
Technical Leadership for Architectural Decision Making
baasie
3
300
First, design no harm
axbom
PRO
2
1.1k
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
330
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
16k
Transcript
Matrix Multiplication Parallel Computing in Shared Memory using OpenMP Gabriel
Moro - KNOWLEDGE TRANSFER - KT, Porto Alegre - November 2018
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Matrix Multiplication A B C
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Ways to improve the performance to this algorithm - Algorithm
complexity - Parallelism - Shared Memory - Distributed Memory
Parallel OpenMP Model A C T1 T2 T3
Parallel OpenMP Model A C T1 T2 T3
Turing - Processor - 4 x Intel Xeon X7550 Nehalem
- 32 physical cores - HyperThreading - Memory - 128GB DDR3 - GPPD-UFRGS
Version: normal_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: normal_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i][k] * B[k][j]; C[i][j] = tmp; } }
Version: continuos_seq for(i=0;i < size; i++) { for(j=0;j < size;
j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: continuos_par #pragma omp parallel for private(i,j,k,tmp) for(i=0;i < size;
i++) { for(j=0;j < size; j++) { tmp=0; for(k=0; k < size; k++) tmp = tmp + A[i * size + k] * B[k * size + j]; C[i * size + j] = tmp; } }
Version: tiling_seq register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Version: tiling_par register int jj,kk,i,j,k; double tmp=0; for(jj=0;jj < size;
jj=jj+block) { for(kk=0; kk < size; kk=kk+block) { #pragma omp parallel for private(i,j,k,tmp) schedule(static) for(i=0; i < size; i++) { for(j=jj; j < min(jj+block, size); j++) { tmp=0; for(k=kk; k < min(kk+block,size); k++) { tmp = tmp + A[i][k] * B[k][j]; } R[i][j] = tmp; } } } }
Links - Top 500: https://www.top500.org/lists/2018/11/ - Green 500: https://www.top500.org/green500/lists/2018/11/ -
NAS Parallel Benchmark: https://www.nas.nasa.gov/publications/npb.html
Thanks! https://github.com/tido4410/knowledge-transfer-gbmoro.git Gabriel Moro - Matrix Multiplication - OpenMP -
KT, Porto Alegre - November 2018