Spark Shuffle Study Session
huydx
March 18, 2016
Transcript
A Look at Spark's Shuffle @huydx
About Shuffle
• The MapReduce model
• The intermediate layer between the Map stage and the Reduce stage is called the "Shuffle"
In the Shuffle
• Spark uses a pull model: map output is first written to disk, then the Reduce jobs go and fetch it
• In Spark, the data a Reduce job needs must fit in memory
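The pull model above can be sketched in plain Python. This is only an illustration, not Spark code: the file naming scheme, the `key % num_reducers` partitioning, and the function names are all assumptions made for the example. Map tasks write one bucket per reducer to local disk, and each reduce task later pulls its own bucket from every map output and aggregates it in memory.

```python
import os
import pickle
import tempfile


def run_map_task(map_id, records, num_reducers, out_dir):
    """Map side: bucket records by reducer and write each bucket to disk."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        # Hypothetical partitioner: integer key modulo the reducer count.
        buckets[key % num_reducers].append((key, value))
    for reduce_id, bucket in enumerate(buckets):
        # Hypothetical file name, chosen for this sketch only.
        path = os.path.join(out_dir, f"shuffle_{map_id}_{reduce_id}.bin")
        with open(path, "wb") as f:
            pickle.dump(bucket, f)


def run_reduce_task(reduce_id, num_maps, out_dir):
    """Reduce side: pull this reducer's bucket from every map output."""
    totals = {}  # the whole aggregate is held in memory
    for map_id in range(num_maps):
        path = os.path.join(out_dir, f"shuffle_{map_id}_{reduce_id}.bin")
        with open(path, "rb") as f:
            for key, value in pickle.load(f):
                totals[key] = totals.get(key, 0) + value
    return totals


def demo():
    map_inputs = [[(1, 10), (2, 20)], [(1, 1), (3, 3)]]
    num_reducers = 2
    with tempfile.TemporaryDirectory() as out_dir:
        # All map output lands on disk before any reducer starts pulling.
        for map_id, records in enumerate(map_inputs):
            run_map_task(map_id, records, num_reducers, out_dir)
        return [
            run_reduce_task(r, len(map_inputs), out_dir)
            for r in range(num_reducers)
        ]
```

Calling `demo()` sums the values per key. Every value for a given key ends up at the same reducer, even though the keys started out spread across two map inputs.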
When Shuffle Occurs
• Join
• Cogroup
• *ByKey operations
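Shuffle Problems
• Shuffle files
• With M map tasks and R reduce tasks, the number of files written to disk is M * R (M = 5000, R = 1024 gives about 5 million files!)
• Reducing requires a sorting algorithm
• The sort has to run in parallel
• Network communication is heavy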
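The file-count arithmetic from the slide can be checked with a few lines of Python. The function name is made up for this sketch; the only input is the M * R relationship stated above.

```python
def hash_shuffle_file_count(num_map_tasks, num_reduce_tasks):
    """One file per (map task, reduce task) pair, as described on the slide."""
    return num_map_tasks * num_reduce_tasks


# The slide's example: M = 5000 map tasks, R = 1024 reduce tasks.
print(hash_shuffle_file_count(5000, 1024))  # 5120000, i.e. roughly 5 million

# The count grows with the product, so doubling either side doubles it.
for m, r in [(100, 100), (1000, 1000), (5000, 1024)]:
    print(m, r, hash_shuffle_file_count(m, r))
```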
Solving the Shuffle Problems
• Shuffle files:
• The file count can be cut from O(M * R) down to O(R)
• Use sort-based shuffle instead of hash-based shuffle (one file per reducer)
• References: https://issues.apache.org/jira/secure/attachment/12637642/Consolidating%20Shuffle%20Files%20in%20Spark.pdf
• https://issues.apache.org/jira/browse/SPARK-2045
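A minimal sketch of the idea behind a sort-based shuffle, under stated assumptions: each map task sorts its records by target reduce partition and emits a single data sequence plus an index of offsets, so the number of outputs per map task no longer grows with R. The function names and the `key % num_reducers` partitioner are hypothetical, and in-memory lists stand in for the on-disk data and index files.

```python
def write_sorted_map_output(records, num_reducers):
    """Return (data, index) for one map task.

    data  : records ordered by reduce partition id (stands in for one data file)
    index : index[r]..index[r + 1] is the slice of data that belongs to reducer r
    """
    # Tag each record with its partition id, then sort by that id only.
    tagged = sorted(
        ((key % num_reducers, key, value) for key, value in records),
        key=lambda t: t[0],
    )
    data = [(key, value) for _, key, value in tagged]

    # Count records per partition, then turn counts into start offsets.
    index = [0] * (num_reducers + 1)
    for partition_id, _, _ in tagged:
        index[partition_id + 1] += 1
    for r in range(num_reducers):
        index[r + 1] += index[r]
    return data, index


def read_partition(data, index, reduce_id):
    """Reducer side: read only the contiguous slice for this reducer."""
    return data[index[reduce_id]:index[reduce_id + 1]]


records = [(5, "e"), (2, "b"), (4, "d"), (1, "a"), (3, "c")]
data, index = write_sorted_map_output(records, num_reducers=3)
print(index)
print(read_partition(data, index, 1))
```

Because each reducer's records are contiguous, a reducer needs only two offsets from the index to find its share of a map task's output.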
Solving the Shuffle Problems
• Choosing the sort algorithm
• https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
• Implement TimSort
• Combining several sorting algorithms lowers the average and worst-case cost
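TimSort is a hybrid of merge sort and insertion sort that takes advantage of runs that are already ordered. CPython's built-in `sorted` is derived from the same algorithm, so it can illustrate the effect. This is not Spark's implementation: the `Counted` wrapper and the input sizes are assumptions for the sketch, which counts comparisons on already-sorted input versus shuffled input.

```python
import random


class Counted:
    """Wraps a value and counts every '<' comparison made during sorting."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort the values with the built-in sort and return the comparison count."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 10_000
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable

sorted_cost = count_comparisons(already_sorted)
random_cost = count_comparisons(shuffled)
print("comparisons on sorted input:  ", sorted_cost)
print("comparisons on shuffled input:", random_cost)
```

On input that is already ordered the comparison count stays close to linear in n, while shuffled input costs on the order of n log n comparisons.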
Solving the Shuffle Problems
• Improve the network module
• https://issues.apache.org/jira/browse/SPARK-2468
• Netty-based data transfer implementation (zero copy with FileChannel.transferTo)
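`FileChannel.transferTo` is a Java API, so the following is only a rough Python analogue of the same idea, not Spark's Netty code. `socket.sendfile` asks the kernel to move file bytes to a socket through `os.sendfile` where the platform supports it, and falls back to ordinary `send` calls otherwise. The helper name and the use of a local socket pair are assumptions for the sketch.

```python
import socket
import tempfile


def send_file_over_socket(payload):
    """Write payload to a temp file, then stream the file through a socket pair."""
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            f.write(payload)
            f.flush()
            f.seek(0)
            # No read() into a Python buffer here: where os.sendfile is
            # available, the file contents are handed to the socket in-kernel.
            sent = sender.sendfile(f)
        sender.close()  # signals end-of-stream to the receiver

        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()


sent, received = send_file_over_socket(b"x" * 16384)
print(sent, len(received))
```

The payload is kept small so it fits in the socket buffer, which lets one thread do the sending and the receiving in this sketch.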