Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.1k
web audio api (htmlday osaka)
huydx
1
1.1k
Other Decks in Programming
See All in Programming
0から始めるモジュラーモノリス-クリーンなモノリスを目指して
sushi0120
1
280
TROCCO×dbtで実現する人にもAIにもやさしいデータ基盤
nealle
0
240
バイブコーディング × 設計思考
nogu66
0
120
未来を拓くAI技術〜エージェント開発とAI駆動開発〜
leveragestech
2
150
ライブ配信サービスの インフラのジレンマ -マルチクラウドに至ったワケ-
mirrativ
1
230
あのころの iPod を どうにか再生させたい
orumin
2
2.5k
Understanding Kotlin Multiplatform
l2hyunwoo
0
260
What's new in Adaptive Android development
fornewid
0
140
画像コンペでのベースラインモデルの育て方
tattaka
3
1.7k
Constant integer division faster than compiler-generated code
herumi
2
660
STUNMESH-go: Wireguard NAT穿隧工具的源起與介紹
tjjh89017
0
380
Claude Codeで実装以外の開発フロー、どこまで自動化できるか?失敗と成功
ndadayo
2
160
Featured
See All Featured
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
The World Runs on Bad Software
bkeepers
PRO
70
11k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
We Have a Design System, Now What?
morganepeng
53
7.7k
Testing 201, or: Great Expectations
jmmastey
45
7.6k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
460
jQuery: Nuts, Bolts and Bling
dougneiner
64
7.8k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.4k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
36
2.5k
Visualization
eitanlees
146
16k
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ