Spark Shuffle Study Session

huydx
March 18, 2016


Transcript

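  1. About Shuffle
     • Map-Reduce model
     • The intermediate layer between the map stage and the reduce stage is called the "shuffle"
     • In the shuffle
     • Spark uses a pull model (results are first written to disk, and the reduce jobs then go and fetch them)
     • In Spark, the data a reduce job needs has to fit in memory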
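  2. Problems with Shuffle
     • Number of shuffle files
     • With M map tasks and R reduce tasks, the number of files written to disk is M * R (M = 5000, R = 1024 gives about 5 million files!)
     • Reducing requires a sort algorithm
     • It has to be one that can sort in parallel
     • Communication is heavy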
  3. Solving the Shuffle problems
     • Number of shuffle files:
     • Can be held down to O(R) instead of O(M * R)
     • Sort-based shuffle instead of hash-based shuffle (one file for each reduce task R)
     • References: https://issues.apache.org/jira/secure/attachment/12637642/Consolidating%20Shuffle%20Files%20in%20Spark.pdf
     • https://issues.apache.org/jira/browse/SPARK-2045
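
The two map-side layouts contrasted in the slides can be sketched in plain Python. This is a toy model rather than Spark code: `partition_of`, `hash_based_map_output`, `sort_based_map_output` and `read_partition` are hypothetical helpers, and in-memory lists stand in for files on disk. It assumes the hash-based layout writes one bucket per reduce task for every map task (the M * R figure from slide 2), while the sort-based layout writes a single output per map task, ordered by partition id, together with an index of offsets that lets each reduce task pull only its own slice.

```python
import zlib

def partition_of(key: str, num_reducers: int) -> int:
    # Deterministic hash partitioner (hypothetical helper, not a Spark API).
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def hash_based_map_output(records, num_reducers):
    """One bucket ("file") per reduce task, so R files for this map task."""
    files = {r: [] for r in range(num_reducers)}
    for key, value in records:
        files[partition_of(key, num_reducers)].append((key, value))
    return files

def sort_based_map_output(records, num_reducers):
    """One data "file" ordered by partition id, plus an index of start offsets."""
    data = sorted(records, key=lambda kv: partition_of(kv[0], num_reducers))
    index = [0] * (num_reducers + 1)
    for key, _ in data:
        index[partition_of(key, num_reducers) + 1] += 1
    for r in range(num_reducers):
        index[r + 1] += index[r]  # prefix sums turn per-partition counts into offsets
    return data, index

def read_partition(data, index, reducer_id):
    """Reduce side pulls only its own slice of the map task's single output."""
    return data[index[reducer_id]:index[reducer_id + 1]]

# File count for the hash-based layout, using the numbers from slide 2.
M, R = 5000, 1024
print(f"hash-based shuffle files: {M * R:,}")

# Both layouts hand every reduce task the same records.
records = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
files = hash_based_map_output(records, 4)
data, index = sort_based_map_output(records, 4)
for r in range(4):
    assert files[r] == read_partition(data, index, r)
```

With M = 5000 and R = 1024 the printed count is 5,120,000, which matches the "about 5 million files" on slide 2. The sketch only shows how a single map task's output changes shape; it does not model disk spilling, merging, or network transfer.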