Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
huydx
March 18, 2016
Programming
0
980
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2k
web audio api (htmlday osaka)
huydx
1
1.1k
Other Decks in Programming
See All in Programming
これでLambdaが不要に?!Step FunctionsのJSONata対応について
iwatatomoya
2
3.7k
20年もののレガシープロダクトに 0からPHPStanを入れるまで / phpcon2024
hirobe1999
0
500
useSyncExternalStoreを使いまくる
ssssota
6
1.1k
From Translations to Multi Dimension Entities
alexanderschranz
2
130
rails statsで大解剖 🔍 “B/43流” のRailsの育て方を歴史とともに振り返ります
shoheimitani
2
940
快速入門可觀測性
blueswen
0
370
アクターシステムに頼らずEvent Sourcingする方法について
j5ik2o
4
290
「とりあえず動く」コードはよい、「読みやすい」コードはもっとよい / Code that 'just works' is good, but code that is 'readable' is even better.
mkmk884
3
470
Semantic Kernelのネイティブプラグインで知識拡張をしてみる
tomokusaba
0
180
Scalaから始めるOpenFeature入門 / Scalaわいわい勉強会 #4
arthur1
1
340
今年のアップデートで振り返るCDKセキュリティのシフトレフト/2024-cdk-security-shift-left
tomoki10
0
210
php-conference-japan-2024
tasuku43
0
320
Featured
See All Featured
[RailsConf 2023] Rails as a piece of cake
palkan
53
5k
Faster Mobile Websites
deanohume
305
30k
Adopting Sorbet at Scale
ufuk
73
9.1k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
Why Our Code Smells
bkeepers
PRO
335
57k
It's Worth the Effort
3n
183
28k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
47
5.1k
Navigating Team Friction
lara
183
15k
How to Ace a Technical Interview
jacobian
276
23k
Thoughts on Productivity
jonyablonski
67
4.4k
Fireside Chat
paigeccino
34
3.1k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
32
2.7k
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ