Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
spark shuffle 勉強会
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
huydx
March 18, 2016
Programming
0
1k
spark shuffle 勉強会
spark shuffle 勉強会
huydx
March 18, 2016
Tweet
Share
More Decks by huydx
See All by huydx
Finagle もろもろ
huydx
0
2.2k
web audio api (htmlday osaka)
huydx
1
1.2k
Other Decks in Programming
See All in Programming
なるべく楽してバックエンドに型をつけたい!(楽とは言ってない)
hibiki_cube
0
140
AI時代のキャリアプラン「技術の引力」からの脱出と「問い」へのいざない / tech-gravity
minodriven
21
7.4k
CSC307 Lecture 08
javiergs
PRO
0
670
360° Signals in Angular: Signal Forms with SignalStore & Resources @ngLondon 01/2026
manfredsteyer
PRO
0
130
AIによるイベントストーミング図からのコード生成 / AI-powered code generation from Event Storming diagrams
nrslib
2
1.9k
Fluid Templating in TYPO3 14
s2b
0
130
Best-Practices-for-Cortex-Analyst-and-AI-Agent
ryotaroikeda
1
110
LLM Observabilityによる 対話型音声AIアプリケーションの安定運用
gekko0114
2
430
今から始めるClaude Code超入門
448jp
8
9k
24時間止められないシステムを守る-医療ITにおけるランサムウェア対策の実際
koukimiura
1
120
CSC307 Lecture 05
javiergs
PRO
0
500
CSC307 Lecture 07
javiergs
PRO
1
560
Featured
See All Featured
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
150
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
130
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1k
Future Trends and Review - Lecture 12 - Web Technologies (1019888BNR)
signer
PRO
0
3.2k
The Cost Of JavaScript in 2023
addyosmani
55
9.5k
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
430
Mozcon NYC 2025: Stop Losing SEO Traffic
samtorres
0
140
Building the Perfect Custom Keyboard
takai
2
690
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.2k
Paper Plane
katiecoart
PRO
0
46k
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
83
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
120
Transcript
SparkͷShuffleपΓ @huydx
Shuffleʹ͍ͭͯ • Map - Reduce Ϟσϧ • Mapஈ֊͔ΒReduceஈ֊ͷதؒϨΠϠʔ ʮShuffleʯͱݺͿ •
ShuffleͰ • Spark͕PullϞσϧʢ·ͣσΟεΫʹ݁Ռॻ͍ ͯɺReduceδϣϒ͕औΓʹߦ͘ʣ • SparkReduceδϣϒʹඞཁͳσʔλϝϞϦ ϑΟοτ͠ͳ͍ͱ͍͚ͳ͍
͍ͭShuffle͕ൃੜ͢Δ • Join • Cogroup • *ByKeyΦϖϨʔγϣϯ
Shuffleͷ • ShuffleϑΝΠϧ • Mapͷ͕MɺReduceͷ͕Rͱͨ͠ΒσΟεΫʹॻ͘ ϑΝΠϧ͕ M * R (M
= 5000, R = 1024 ͩͱ 500ສϑΝ Πϧʂʣ • Reduce͢Δͱ͖ʹιʔτΞϧΰϦζϜ͕ඞཁ • ฒྻʹιʔτ͢Δඞཁ͕Ͱ͖Δͷ • ௨৴͕ॏ͍
Shuffleͷղܾ • ShuffleϑΝΠϧɿ • O(M * R) ͡Όͳͯ͘ O(R)·Ͱ͑ΒΕΔ •
Hashed base shuffle(ҰͭͷRͻͱͭͷϑΝΠϧʣ͡Όͳͯ͘ Sort base shuffle • ࢀߟɿhttps://issues.apache.org/jira/secure/attachment/ 12637642/Consolidating%20Shuffle%20Files%20in %20Spark.pdf • https://issues.apache.org/jira/browse/SPARK-2045
Shuffleͷղܾ • SortͷΞϧΰϦζϜબ • https://databricks.com/blog/2014/10/10/spark- petabyte-sort.html • TimsortΛ࣮͢Δ • ৭ʑͳιʔτΞϧΰϦζϜͷΉ߹ΘͤͰฏ
ۉWorst CaseύʔϑΥϚϯεΛݮΒ͢
Shuffleͷղܾ • ωοτϫʔΫϞδϡʔϧΛվળ • https://issues.apache.org/jira/browse/ SPARK-2468 • Netty ϕʔεσʔλసૹͷ࣮ (FileChannel.transferToͰzero
copyʣ