Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.2k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
PHPで TLSのプロトコルを実装してみる
higaki_program
0
400
CSC307 Lecture 14
javiergs
PRO
0
480
What Spring Developers Should Know About Jakarta EE
ivargrimstad
0
560
Codexに役割を持たせる 他のAIエージェントと組み合わせる実務Tips
o8n
4
1.4k
maplibre-gl-layers - 地図に移動体たくさん表示したい
kekyo
PRO
0
410
Windows on Ryzen and I
seosoft
0
350
20260313 - Grafana & Friends Taipei #1 - Kubernetes v1.36 的開發雜記:那些困在 Alpha 加護病房太久的 Metrics
tico88612
0
230
Codex CLIのSubagentsによる並列API実装 / Parallel API Implementation with Codex CLI Subagents
takatty
2
240
Claude Codeセッション現状確認 2026福岡 / fukuoka-aicoding-00-beacon
monochromegane
4
450
ふつうのRubyist、ちいさなデバイス、大きな一年 / Ordinary Rubyists, Tiny Devices, Big Year
chobishiba
1
500
ふつうの Rubyist、ちいさなデバイス、大きな一年
bash0c7
0
1.1k
今からFlash開発できるわけないじゃん、ムリムリ! (※ムリじゃなかった!?)
arkw
0
140
Featured
See All Featured
Un-Boring Meetings
codingconduct
0
230
エンジニアに許された特別な時間の終わり
watany
106
240k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
AI in Enterprises - Java and Open Source to the Rescue
ivargrimstad
0
1.2k
Git: the NoSQL Database
bkeepers
PRO
432
67k
A designer walks into a library…
pauljervisheath
210
24k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
9k
How to Think Like a Performance Engineer
csswizardry
28
2.5k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
199
73k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.4k
Documentation Writing (for coders)
carmenintech
77
5.3k
Writing Fast Ruby
sferik
630
63k
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ