Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.2k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
How to stabilize UI tests using XCTest
akkeylab
0
140
技術検証結果の整理と解析をAIに任せよう!
keisukeikeda
0
130
クライアントワークでSREをするということ。あるいは事業会社におけるSREと同じこと・違うこと
nnaka2992
1
360
コーディングルールの鮮度を保ちたい / keep-fresh-go-internal-conventions
handlename
0
230
CSC307 Lecture 15
javiergs
PRO
0
260
AI活用のコスパを最大化する方法
ochtum
0
280
Fundamentals of Software Engineering In the Age of AI
therealdanvega
2
290
AI Assistants for Your Angular Solutions
manfredsteyer
PRO
0
150
守る「だけ」の優しいEMを抜けて、 事業とチームを両方見る視点を身につけた話
maroon8021
3
1.3k
Understanding Apache Lucene - More than just full-text search
spinscale
0
140
OTP を自動で入力する裏技
megabitsenmzq
0
120
Angular-Apps smarter machen mit Gen AI: Lokal und offlinefähig - Hands-on Workshop!
christianliebel
PRO
0
130
Featured
See All Featured
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
160
Done Done
chrislema
186
16k
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
230
The Cult of Friendly URLs
andyhume
79
6.8k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
11
860
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.2k
First, design no harm
axbom
PRO
2
1.1k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Bootstrapping a Software Product
garrettdimon
PRO
307
120k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
16k
Have SEOs Ruined the Internet? - User Awareness of SEO in 2025
akashhashmi
0
300
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ