Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
huydx
March 18, 2016
Programming
1k
0
Share
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.2k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
〜バイブコーディングを超えて〜 チームで実験し続けたAI駆動開発
tigertora7571
0
170
From Formal Specification to Property Based Test
ohbarye
0
470
Surviving Black Friday: 329 billion requests with Falcon!
ioquatix
0
1.9k
Road to RubyKaigi: Play Hard(ware)
makicamel
1
500
Agentic Elixir
whatyouhide
0
420
第3木曜LT会 #28
tinykitten
PRO
0
120
iOS機能開発のAI環境と起きた変化
ryunakayama
0
190
PHP で mp3 プレイヤーを実装しよう
m3m0r7
PRO
0
290
Spec Driven Development | AI Summit Vilnius
danielsogl
PRO
1
120
Spec-driven Development: How AI Changes Everything (And Nothing)
simas
PRO
0
410
PHPer、Cloudflare に引っ越す
suguruooki
1
120
tRPCの概要と少しだけパフォーマンス
misoton665
2
240
Featured
See All Featured
XXLCSS - How to scale CSS and keep your sanity
sugarenia
250
1.3M
How STYLIGHT went responsive
nonsquared
100
6.1k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
YesSQL, Process and Tooling at Scale
rocio
174
15k
Hiding What from Whom? A Critical Review of the History of Programming languages for Music
tomoyanonymous
2
780
Digital Projects Gone Horribly Wrong (And the UX Pros Who Still Save the Day) - Dean Schuster
uxyall
0
1.2k
Navigating Team Friction
lara
192
16k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
120
The B2B funnel & how to create a winning content strategy
katarinadahlin
PRO
1
340
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
140
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
23k
Unsuck your backbone
ammeep
672
58k
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ