spark shuffle study session
huydx
March 18, 2016
Programming
Transcript
All About Spark's Shuffle @huydx
About Shuffle
• The Map-Reduce model
• The intermediate layer between the Map stage and the Reduce stage is called "Shuffle"
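The Map → Shuffle → Reduce flow can be sketched in a few lines of plain Python using word count as the example. The names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical and this is not Spark's API; routing each key with `hash(key) % num_reducers` is an assumption that is similar in spirit to a hash partitioner.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs, num_reducers):
    """Shuffle: route each pair to a reduce partition by hashing its key,
    then group the values by key inside that partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(partitions):
    """Reduce: combine all values that share a key."""
    result = {}
    for partition in partitions:
        for key, values in partition.items():
            result[key] = sum(values)
    return result


lines = ["spark shuffle", "spark sort shuffle"]
counts = reduce_phase(shuffle(map_phase(lines), num_reducers=2))
print(sorted(counts.items()))  # [('shuffle', 2), ('sort', 1), ('spark', 2)]
```

Everything between `map_phase` and `reduce_phase` is the "intermediate layer": the map side never talks to the reduce side directly.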
Shuffle in Spark
• Spark uses a pull model (map tasks first write their results to disk, then reduce tasks go and fetch them)
• In Spark, the data a reduce task needs has to fit in memory
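The pull model can be sketched as follows, assuming a local temporary directory stands in for each executor's disk and plain file reads stand in for network fetches. Map tasks only write their bucketed output; reduce tasks later pull their partition from every map task and aggregate it in an in-memory dict, which is where the "must fit in memory" constraint shows up. The function names and the file layout are hypothetical.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def run_map_task(map_id, records, num_reducers, shuffle_dir):
    """Map side: bucket records by reduce partition and write each bucket
    to disk. Nothing is pushed to reducers; the files wait to be fetched."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[hash(key) % num_reducers].append((key, value))
    for reduce_id, bucket in enumerate(buckets):
        path = os.path.join(shuffle_dir, f"map{map_id}_reduce{reduce_id}.bin")
        with open(path, "wb") as f:
            pickle.dump(bucket, f)


def run_reduce_task(reduce_id, num_maps, shuffle_dir):
    """Reduce side: pull this partition's file from every map task, then
    aggregate in an in-memory dict (so the partition must fit in RAM)."""
    totals = defaultdict(int)
    for map_id in range(num_maps):
        path = os.path.join(shuffle_dir, f"map{map_id}_reduce{reduce_id}.bin")
        with open(path, "rb") as f:
            for key, value in pickle.load(f):
                totals[key] += value
    return dict(totals)


map_inputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
num_reducers = 2
with tempfile.TemporaryDirectory() as shuffle_dir:
    for map_id, records in enumerate(map_inputs):
        run_map_task(map_id, records, num_reducers, shuffle_dir)
    merged = {}
    for reduce_id in range(num_reducers):
        merged.update(run_reduce_task(reduce_id, len(map_inputs), shuffle_dir))
print(sorted(merged.items()))  # [('a', 4), ('b', 2), ('c', 4)]
```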
When does a shuffle happen?
• Join
• Cogroup
• *ByKey operations
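Joins, cogroups, and *ByKey operations all need every record with a given key in the same place, which is what forces a shuffle. The sketch below (helper names are hypothetical) hash-partitions both inputs the same way and then groups each partition independently; the inner join is built on top of `cogroup`, loosely mirroring how Spark's RDD API expresses `join` in terms of `cogroup`.

```python
from collections import defaultdict


def partition_by_key(records, num_partitions):
    """Shuffle step: send each record to the partition chosen by its key's
    hash, so equal keys from different datasets land in the same partition."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def cogroup(left, right, num_partitions):
    """After both sides are partitioned identically, each partition can be
    grouped on its own with no cross-partition lookups."""
    result = {}
    for lp, rp in zip(partition_by_key(left, num_partitions),
                      partition_by_key(right, num_partitions)):
        groups = defaultdict(lambda: ([], []))
        for k, v in lp:
            groups[k][0].append(v)
        for k, v in rp:
            groups[k][1].append(v)
        result.update(groups)
    return result


def join(left, right, num_partitions):
    """Inner join on top of cogroup: emit every left/right pair per key."""
    return sorted(
        (k, (lv, rv))
        for k, (lvs, rvs) in cogroup(left, right, num_partitions).items()
        for lv in lvs
        for rv in rvs
    )


users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "cup")]
print(join(users, orders, num_partitions=4))
# [(1, ('alice', 'book')), (1, ('alice', 'pen'))]
```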
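Problems with shuffle
• Shuffle files
• With M map tasks and R reduce tasks, M * R files are written to disk (M = 5000, R = 1024 means roughly 5 million files!)
• Reducing requires a sorting algorithm
• The sort has to run in parallel
• Network transfer is expensive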
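The file-count arithmetic from the slide, as a quick check (helper names are hypothetical). It also shows the reduce-side fan-in: each reduce task has to fetch one block from every map task, which is part of why network transfer gets expensive.

```python
def hash_shuffle_files(num_maps: int, num_reducers: int) -> int:
    """Hash-based shuffle: every map task writes one file per reduce partition."""
    return num_maps * num_reducers


def fetches_per_reducer(num_maps: int) -> int:
    """Each reduce task pulls one block from every map task's output."""
    return num_maps


M, R = 5000, 1024
print(f"shuffle files on disk: {hash_shuffle_files(M, R):,}")    # 5,120,000
print(f"blocks fetched per reducer: {fetches_per_reducer(M):,}")  # 5,000
```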
Solutions for shuffle
• Shuffle files:
• The file count can be brought down from O(M * R) to O(R)
• Use sort-based shuffle instead of hash-based shuffle (one file per reduce partition)
• References: https://issues.apache.org/jira/secure/attachment/12637642/Consolidating%20Shuffle%20Files%20in%20Spark.pdf
• https://issues.apache.org/jira/browse/SPARK-2045
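A simplified sketch of the sort-based idea, under stated assumptions: each map task sorts its records by reduce partition ID, writes them into a single data file, and writes an index of byte offsets so a reducer can read only its slice. This loosely mirrors the data-file-plus-index-file layout of Spark's sort-based shuffle, but the file names, pickle serialization, and 8-byte big-endian offsets are illustrative choices, not Spark's on-disk format. The point is that the number of files per map task no longer depends on R.

```python
import os
import pickle
import struct
import tempfile


def write_sorted_map_output(records, num_reducers, data_path, index_path):
    """Write ONE data file whose partitions are contiguous, plus an index."""
    # Tag each record with its reduce partition ID and sort by that ID only.
    keyed = sorted(((hash(k) % num_reducers, k, v) for k, v in records),
                   key=lambda t: t[0])
    offsets = [0]
    pos = 0
    with open(data_path, "wb") as data:
        for reduce_id in range(num_reducers):
            block = []
            # Thanks to the sort, this partition's records are adjacent.
            while pos < len(keyed) and keyed[pos][0] == reduce_id:
                block.append(keyed[pos][1:])
                pos += 1
            data.write(pickle.dumps(block))
            offsets.append(data.tell())
    # R + 1 offsets: partition r lives at [offsets[r], offsets[r + 1]).
    with open(index_path, "wb") as index:
        index.write(struct.pack(f">{len(offsets)}q", *offsets))


def read_partition(reduce_id, data_path, index_path):
    """Reducer side: look up the byte range in the index, read only that slice."""
    with open(index_path, "rb") as index:
        raw = index.read()
    offsets = struct.unpack(f">{len(raw) // 8}q", raw)
    start, end = offsets[reduce_id], offsets[reduce_id + 1]
    with open(data_path, "rb") as data:
        data.seek(start)
        return pickle.loads(data.read(end - start))


records = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
R = 3
with tempfile.TemporaryDirectory() as d:
    data_path = os.path.join(d, "map0.data")
    index_path = os.path.join(d, "map0.index")
    write_sorted_map_output(records, R, data_path, index_path)
    files_written = len(os.listdir(d))  # 2, whatever the value of R
    partitions = [read_partition(r, data_path, index_path) for r in range(R)]
print(files_written, sum(len(p) for p in partitions))  # 2 4
```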
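Solutions for shuffle
• Choosing the sorting algorithm
• https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
• Implement Timsort
• It combines several sorting algorithms (merge sort and insertion sort) to improve both average and worst-case performance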
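A deliberately simplified sketch of the hybrid idea behind Timsort: insertion-sort short runs, then merge them pairwise. Real Timsort also detects naturally ordered runs, computes `minrun` from the input size, keeps a run stack with merge invariants, and uses galloping; none of that is shown here, and `MIN_RUN = 32` is an assumption. CPython's built-in `sorted` is itself based on Timsort, which makes it a convenient reference to check against.

```python
MIN_RUN = 32  # assumption: fixed run length; real Timsort derives minrun from n


def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place; cheap for short or nearly sorted runs."""
    for i in range(lo + 1, hi):
        item = a[i]
        j = i - 1
        while j >= lo and a[j] > item:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item


def merge(left, right):
    """Stable merge of two sorted lists (takes from `left` on ties)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def simple_timsort(items):
    a = list(items)
    n = len(a)
    # Phase 1: insertion-sort fixed-size runs.
    for lo in range(0, n, MIN_RUN):
        insertion_sort(a, lo, min(lo + MIN_RUN, n))
    # Phase 2: merge adjacent runs, doubling the run width on each pass.
    width = MIN_RUN
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            a[lo:hi] = merge(a[lo:mid], a[mid:hi])
        width *= 2
    return a


print(simple_timsort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Insertion sort wins on tiny inputs because of its low overhead, while merging keeps the overall cost at O(n log n); combining them is what the slide means by mixing algorithms.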
Solutions for shuffle
• Improve the network module
• https://issues.apache.org/jira/browse/SPARK-2468
• Implement Netty-based data transfer (zero copy via FileChannel.transferTo)
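Spark's Netty transport is JVM code built on `FileChannel.transferTo`. As a Python analogue of the same zero-copy idea, `socket.sendfile` asks the kernel to copy file bytes straight to a socket via `os.sendfile` where the platform supports it, and falls back to ordinary `send()` otherwise. The sketch serves one hypothetical "shuffle block" (a byte range of a file) over a local socket pair; `serve_block` and `receive_exactly` are illustrative names, and whether the kernel path is actually taken depends on the OS.

```python
import os
import socket
import tempfile
import threading


def serve_block(sock, path, offset, count):
    """Send `count` bytes of `path` starting at `offset` over `sock`.
    socket.sendfile uses os.sendfile (no user-space buffer) when available
    and falls back to send() otherwise."""
    with open(path, "rb") as f:
        return sock.sendfile(f, offset=offset, count=count)


def receive_exactly(sock, n):
    """Read until n bytes have arrived or the peer closes the connection."""
    chunks, received = [], 0
    while received < n:
        chunk = sock.recv(n - received)
        if not chunk:
            break
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)


payload = bytes(range(256)) * 64  # 16 KiB stand-in for a shuffle data file
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "block.data")
    with open(path, "wb") as f:
        f.write(payload)
    server, client = socket.socketpair()
    result = {}
    # Read concurrently so the sender never blocks on a full socket buffer.
    reader = threading.Thread(
        target=lambda: result.update(data=receive_exactly(client, 4096)))
    reader.start()
    sent = serve_block(server, path, offset=1024, count=4096)
    reader.join()
    server.close()
    client.close()
print(sent, result["data"] == payload[1024:1024 + 4096])  # 4096 True
```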