Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.1k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
Flutterで分数(Fraction)を表示する方法
koukimiura
0
140
React Nativeならぬ"Vue Native"が実現するかも?_新世代マルチプラットフォーム開発フレームワークのLynxとLynxのVue.js対応を追ってみよう_Vue Lynx
yut0naga1_fa
1
250
TFLintカスタムプラグインで始める Terraformコード品質管理
bells17
2
350
SODA - FACT BOOK(JP)
sodainc
1
8.6k
Domain-centric? Why Hexagonal, Onion, and Clean Architecture Are Answers to the Wrong Question
olivergierke
3
950
Android16 Migration Stories ~Building a Pattern for Android OS upgrades~
reoandroider
0
130
Pythonに漸進的に型をつける
nealle
1
120
釣り地図SNSにおける有料機能の実装
nokonoko1203
0
200
組込みだけじゃない!TinyGo で始める無料クラウド開発入門
otakakot
2
360
Go言語の特性を活かした公式MCP SDKの設計
hond0413
1
460
Claude Agent SDK を使ってみよう
hyshu
0
1.4k
contribution to astral-sh/uv
shunsock
0
500
Featured
See All Featured
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.2k
4 Signs Your Business is Dying
shpigford
185
22k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
253
22k
Navigating Team Friction
lara
190
15k
The Power of CSS Pseudo Elements
geoffreycrofte
79
6k
Being A Developer After 40
akosma
91
590k
Art, The Web, and Tiny UX
lynnandtonic
303
21k
The Pragmatic Product Professional
lauravandoore
36
7k
GitHub's CSS Performance
jonrohan
1032
470k
BBQ
matthewcrist
89
9.8k
Agile that works and the tools we love
rasmusluckow
331
21k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
630
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ