Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.2k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
AIに仕事を丸投げしたら、本当に楽になれるのか
dip_tech
PRO
0
180
encoding/json/v2のUnmarshalはこう変わった:内部実装で見る設計改善
kurakura0916
0
290
Python’s True Superpower
hynek
0
200
モジュラモノリスにおける境界をGoのinternalパッケージで守る
magavel
0
3.4k
Head of Engineeringが現場で回した生産性向上施策 2025→2026
gessy0129
0
210
atmaCup #23でAIコーディングを活用した話
ml_bear
4
730
株式会社 Sun terras カンパニーデック
sunterras
0
2k
Go 1.26でのsliceのメモリアロケーション最適化 / Go 1.26 リリースパーティ #go126party
mazrean
1
330
「ブロックテーマでは再現できない」は本当か?
inc2734
0
1.1k
ご飯食べながらエージェントが開発できる。そう、Agentic Engineeringならね。
yokomachi
1
280
CopilotKit + AG-UIを学ぶ
nearme_tech
PRO
1
120
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
190
Featured
See All Featured
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.4k
Self-Hosted WebAssembly Runtime for Runtime-Neutral Checkpoint/Restore in Edge–Cloud Continuum
chikuwait
0
380
Being A Developer After 40
akosma
91
590k
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
140
Collaborative Software Design: How to facilitate domain modelling decisions
baasie
0
150
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.1k
Mobile First: as difficult as doing things right
swwweet
225
10k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
14k
Speed Design
sergeychernyshev
33
1.6k
Information Architects: The Missing Link in Design Systems
soysaucechin
0
810
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.4k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
4k
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ