Spark shuffle study session
huydx
March 18, 2016
Transcript
About Spark's shuffle @huydx
About shuffle
• The Map-Reduce model
• The intermediate layer between the map stage and the reduce stage is called the "shuffle"
In the shuffle
• Spark uses a pull model (map results are first written to disk, then the reduce jobs go and fetch them)
• In Spark, the data a reduce job needs must fit in memory
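The pull model above can be sketched in plain Python. This is a minimal illustration, not Spark's implementation: the file naming `shuffle_<map>_<reduce>.json`, the CRC32 partitioner, and the function names are all assumptions made for the example.

```python
import json
import tempfile
import zlib
from pathlib import Path


def partition_for(key: str, num_reducers: int) -> int:
    # Stable hash so every map task routes a given key to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(map_id: int, records, num_reducers: int, shuffle_dir: Path) -> None:
    """Bucket (key, value) records by reducer and write each bucket to disk."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append([key, value])
    for reduce_id, bucket in enumerate(buckets):
        # Hypothetical naming scheme: one file per (map, reduce) pair.
        path = shuffle_dir / f"shuffle_{map_id}_{reduce_id}.json"
        path.write_text(json.dumps(bucket))


def run_reduce_task(reduce_id: int, num_maps: int, shuffle_dir: Path) -> dict:
    """Pull this reducer's bucket from every map output, then sum per key."""
    totals = {}  # the reducer's working set is held entirely in memory
    for map_id in range(num_maps):
        path = shuffle_dir / f"shuffle_{map_id}_{reduce_id}.json"
        for key, value in json.loads(path.read_text()):
            totals[key] = totals.get(key, 0) + value
    return totals


def word_count(splits, num_reducers: int) -> dict:
    """Run all map tasks first, then let each reducer pull its data."""
    with tempfile.TemporaryDirectory() as tmp:
        shuffle_dir = Path(tmp)
        for map_id, records in enumerate(splits):
            run_map_task(map_id, records, num_reducers, shuffle_dir)
        result = {}
        for reduce_id in range(num_reducers):
            result.update(run_reduce_task(reduce_id, len(splits), shuffle_dir))
        return result
```

The map side never pushes data to a reducer; it only leaves files behind, and each reducer decides when to read them.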
When does a shuffle occur?
• Join
• Cogroup
• *ByKey operations
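These operations all need every record with a given key to end up in the same place, which is why they trigger a shuffle. The following is a rough in-memory sketch of a join built on hash partitioning; the function names and the CRC32 partitioner are illustrative assumptions, not Spark APIs.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions: int) -> int:
    # Both datasets must use the same partitioner for the join to line up.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle_by_key(records, num_partitions: int):
    """Redistribute (key, value) records so equal keys share a partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)][key].append(value)
    return partitions


def join(left, right, num_partitions: int = 4):
    """Inner join: shuffle both sides, then join partition by partition."""
    left_parts = shuffle_by_key(left, num_partitions)
    right_parts = shuffle_by_key(right, num_partitions)
    joined = []
    for lpart, rpart in zip(left_parts, right_parts):
        for key, lvalues in lpart.items():
            for lv in lvalues:
                for rv in rpart.get(key, []):
                    joined.append((key, (lv, rv)))
    return joined
```

After the shuffle, each partition can be joined independently, since no key appears in more than one partition.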
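Problems with shuffle
• Number of shuffle files
• With M map tasks and R reduce tasks, the number of files written to disk is M * R (M = 5000, R = 1024 gives about 5 million files!)
• A sort algorithm is needed at reduce time
• The sort has to be done in parallel
• Network communication is heavy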
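The file count from the slide can be checked with a one-line calculation. The function name is only for illustration.

```python
def hash_shuffle_file_count(num_map_tasks: int, num_reduce_tasks: int) -> int:
    """Each map task writes one file per reduce task."""
    return num_map_tasks * num_reduce_tasks


print(hash_shuffle_file_count(5000, 1024))  # 5120000, i.e. roughly 5 million
```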
Solving the shuffle problems
• Number of shuffle files:
• Can be reduced from O(M * R) to O(R)
• Use sort-based shuffle instead of hash-based shuffle (where each map task writes one file per reducer)
• Reference: https://issues.apache.org/jira/secure/attachment/12637642/Consolidating%20Shuffle%20Files%20in%20Spark.pdf
• https://issues.apache.org/jira/browse/SPARK-2045
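One way to get a file count that no longer grows with M is to let map tasks share output files. The sketch below assumes a consolidation scheme in which each concurrently running map slot owns one file per reducer, and successive map tasks in that slot append to those files. The `num_slots` parameter and the `map_id % num_slots` assignment are assumptions for illustration, not a description of Spark's implementation.

```python
def hash_shuffle_files(num_maps: int, num_reduces: int) -> set:
    """One file per (map task, reducer) pair."""
    return {(m, r) for m in range(num_maps) for r in range(num_reduces)}


def consolidated_shuffle_files(num_maps: int, num_reduces: int, num_slots: int) -> set:
    """One file per (slot, reducer) pair; map tasks in a slot reuse its files."""
    return {(m % num_slots, r) for m in range(num_maps) for r in range(num_reduces)}


# With 50 map tasks, 8 reducers and 4 slots:
print(len(hash_shuffle_files(50, 8)))             # 400
print(len(consolidated_shuffle_files(50, 8, 4)))  # 32
```

For a fixed number of slots, the consolidated count depends only on R, while the hash-based count keeps growing with M.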
Solving the shuffle problems
• Choice of sort algorithm
• https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
• Implement Timsort
• Combining several sort algorithms reduces both the average and the worst-case cost
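To get a feel for why a hybrid sort like Timsort helps, the sketch below counts comparisons made by Python's built-in `sorted`, which in CPython is a Timsort-derived algorithm. It stands in for the JVM implementation the slide refers to; the `Counted` wrapper is a helper invented for this example.

```python
import random


class Counted:
    """Wraps a value and counts how often the sort compares it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values) -> int:
    """Sort the values and return the number of comparisons performed."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)

print("presorted:", count_comparisons(range(n)))
print("shuffled: ", count_comparisons(shuffled))
```

On presorted input the comparison count stays close to n, because the algorithm detects the existing run; on shuffled input it grows on the order of n log n.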
Solving the shuffle problems
• Improve the network module
• https://issues.apache.org/jira/browse/SPARK-2468
• Netty-based implementation of data transfer (zero copy via FileChannel.transferTo)
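`FileChannel.transferTo` is a Java NIO call, so the sketch below uses Python's `socket.sendfile` as a stand-in to show the same idea: handing a file to the kernel to send, rather than reading it into user-space buffers first. `socket.sendfile` uses `os.sendfile` where the platform supports it and falls back to ordinary `send()` calls otherwise. The function names and the payload are made up for the example.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected socket; returns bytes sent."""
    with open(path, "rb") as f:
        # No read() into a Python buffer here: the file object is handed
        # to the socket, which delegates to os.sendfile when it can.
        return sock.sendfile(f)


def demo() -> bytes:
    """Write a small 'shuffle block' to disk and ship it over a socket pair."""
    payload = b"shuffle block " * 64
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(tmp.name, sender)
        sender.close()
        received = b""
        while len(received) < sent:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        sender.close()
        receiver.close()
        os.unlink(tmp.name)
```

For large shuffle blocks, skipping the user-space copy saves both CPU time and memory bandwidth on the serving side.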