Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.2k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
守る「だけ」の優しいEMを抜けて、 事業とチームを両方見る視点を身につけた話
maroon8021
3
1.3k
野球解説AI Agentを開発してみた - 2026/02/27 LayerX社内LT会資料
shinyorke
PRO
0
360
Fundamentals of Software Engineering In the Age of AI
therealdanvega
2
290
20260228_JAWS_Beginner_Kansai
takuyay0ne
5
610
[SF Ruby Feb'26] The Silicon Heel
palkan
0
120
モダンOBSプラグイン開発
umireon
0
170
テレメトリーシグナルが導くパフォーマンス最適化 / Performance Optimization Driven by Telemetry Signals
seike460
PRO
2
120
PHP 7.4でもOpenTelemetryゼロコード計装がしたい! / PHPerKaigi 2026
arthur1
1
380
技術検証結果の整理と解析をAIに任せよう!
keisukeikeda
0
130
メタプログラミングで実現する「コードを仕様にする」仕組み/nikkei-tech-talk43
nikkei_engineer_recruiting
0
210
脱 雰囲気実装!AgentCoreを良い感じにWEBアプリケーションに組み込むために
takuyay0ne
3
380
The free-lunch guide to idea circularity
hollycummins
0
330
Featured
See All Featured
Building a Modern Day E-commerce SEO Strategy
aleyda
45
9k
Rebuilding a faster, lazier Slack
samanthasiow
85
9.4k
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
140
Game over? The fight for quality and originality in the time of robots
wayneb77
1
140
30 Presentation Tips
portentint
PRO
1
260
How to Get Subject Matter Experts Bought In and Actively Contributing to SEO & PR Initiatives.
livdayseo
0
89
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7.2k
How to build an LLM SEO readiness audit: a practical framework
nmsamuel
1
690
Navigating Weather and Climate Data
rabernat
0
140
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
120
Un-Boring Meetings
codingconduct
0
230
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
91
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ