Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.1k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
10年もののAPIサーバーにおけるCI/CDの改善の奮闘
mbook
0
830
Things You Thought You Didn’t Need To Care About That Have a Big Impact On Your Job
hollycummins
0
230
Web フロントエンドエンジニアに開かれる AI Agent プロダクト開発 - Vercel AI SDK を観察して AI Agent と仲良くなろう! #FEC余熱NIGHT
izumin5210
3
530
Devoxx BE - Local Development in the AI Era
kdubois
0
130
そのpreloadは必要?見過ごされたpreloadが技術的負債として爆発した日
mugitti9
2
3.4k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
130
Android16 Migration Stories ~Building a Pattern for Android OS upgrades~
reoandroider
0
110
開発生産性を上げるための生成AI活用術
starfish719
3
1.1k
overlayPreferenceValue で実現する ピュア SwiftUI な AdMob ネイティブ広告
uhucream
0
180
NixOS + Kubernetesで構築する自宅サーバーのすべて
ichi_h3
0
860
アメ車でサンノゼを走ってきたよ!
s_shimotori
0
220
(Extension DC 2025) Actor境界を越える技術
teamhimeh
1
250
Featured
See All Featured
Leading Effective Engineering Teams in the AI Era
addyosmani
6
430
Testing 201, or: Great Expectations
jmmastey
45
7.7k
Learning to Love Humans: Emotional Interface Design
aarron
274
41k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
657
61k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
The Language of Interfaces
destraynor
162
25k
Building a Modern Day E-commerce SEO Strategy
aleyda
44
7.8k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Done Done
chrislema
185
16k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
189
55k
How to Ace a Technical Interview
jacobian
280
24k
Writing Fast Ruby
sferik
629
62k
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ