Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.2k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
まだ間に合う!Claude Code元年をふりかえる
nogu66
5
890
Cell-Based Architecture
larchanjo
0
140
Full-Cycle Reactivity in Angular: SignalStore mit Signal Forms und Resources
manfredsteyer
PRO
0
170
JETLS.jl ─ A New Language Server for Julia
abap34
2
440
0→1 フロントエンド開発 Tips🚀 #レバテックMeetup
bengo4com
0
350
ZJIT: The Ruby 4 JIT Compiler / Ruby Release 30th Anniversary Party
k0kubun
0
260
クラウドに依存しないS3を使った開発術
simesaba80
0
150
公共交通オープンデータ × モバイルUX 複雑な運行情報を 『直感』に変換する技術
tinykitten
PRO
0
160
認証・認可の基本を学ぼう前編
kouyuume
0
270
GISエンジニアから見たLINKSデータ
nokonoko1203
0
180
從冷知識到漏洞,你不懂的 Web,駭客懂 - Huli @ WebConf Taiwan 2025
aszx87410
2
3k
TestingOsaka6_Ozono
o3
0
170
Featured
See All Featured
Color Theory Basics | Prateek | Gurzu
gurzu
0
150
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
2
2.8k
Believing is Seeing
oripsolob
0
15
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
400
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Leading Effective Engineering Teams in the AI Era
addyosmani
9
1.4k
The Invisible Side of Design
smashingmag
302
51k
Unlocking the hidden potential of vector embeddings in international SEO
frankvandijk
0
130
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
37
Leveraging Curiosity to Care for An Aging Population
cassininazir
1
130
How to Get Subject Matter Experts Bought In and Actively Contributing to SEO & PR Initiatives.
livdayseo
0
29
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ