A Very Brief Overview of a Distributed Processing System Implemented in Scala

Takashi Funato

April 11, 2018

Transcript

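  1. That said • Consistency of the processing • Consistency of each individual computation's results • Making processing faster and more efficient • Splitting up the processing • Distributing, collecting, and aggregating the data to be processed • Communication protocols, scalability, fault tolerance… • This isn't my specialty, so I won't go into detail…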
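  2. Processing systems • The data handled is so-called big data • Tens of terabytes to petabytes • Sensors, documents, logs, astronomy, atmospheric chemistry, genomics, etc. • General-purpose RDBs could no longer keep up • Dedicated hardware and software • Extremely expensive • Google published MapReduce • Hadoop was built on that idea and released as OSS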
  3. Spark overview - Data input/output • Supports input and output for many kinds of storage • Hadoop Distributed File System (HDFS) • Cassandra • MongoDB • Couchbase • Amazon S3 • RDBs (any that can be connected via JDBC) • Custom-built I/O
  4. Spark overview - Data formats • Supports many data formats • CSV (TSV) • JSON • Text • Parquet, ORC • Columnar formats (column-oriented data) • You can also write your own Read/Write
  5. Spark overview - Processing basics 2 • An SQL-like interface is provided • Spark SQL • Transform the data you have read • Filter, GroupBy, Avg, OrderBy, Max, Min, Count • Join, Union • Across multiple datasets you have read • https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
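Slide 1 lists splitting the work, then collecting and aggregating the results while keeping each partial result consistent. The following is a minimal plain-Scala sketch of that idea, not how Spark actually schedules work. In-memory chunks stand in for worker nodes, and the names `SplitAggregateSketch`, `Partial`, and `distributedAverage` are hypothetical. The sketch shows why a mean should travel between workers as a (sum, count) pair: averaging the per-chunk averages gives a different answer whenever the chunks differ in size.

```scala
// Plain-Scala sketch of split -> partial aggregate -> merge.
// Hypothetical names; in-memory chunks stand in for worker nodes.
object SplitAggregateSketch {

  // Per-partition partial result. Carrying (sum, count) rather than an
  // average lets partials from any number of partitions merge correctly.
  final case class Partial(sum: Double, count: Long) {
    def merge(other: Partial): Partial = Partial(sum + other.sum, count + other.count)
    def average: Double = if (count == 0) 0.0 else sum / count
  }

  // "Distribute": cut the input into at most `partitions` chunks.
  def split[A](data: Seq[A], partitions: Int): Seq[Seq[A]] = {
    require(partitions > 0, "partitions must be positive")
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / partitions).toInt)
    data.grouped(chunkSize).toList
  }

  // "Process": each worker computes a partial result for its own chunk only.
  def partialOf(chunk: Seq[Double]): Partial = Partial(chunk.sum, chunk.size.toLong)

  // "Collect" and "aggregate": merge the partials into one final result.
  def distributedAverage(data: Seq[Double], partitions: Int): Double =
    split(data, partitions).map(partialOf).foldLeft(Partial(0.0, 0L))(_ merge _).average

  // Inconsistent variant for comparison: averaging the per-chunk averages.
  def averageOfAverages(data: Seq[Double], partitions: Int): Double = {
    val chunkAverages = split(data, partitions).map(c => c.sum / c.size)
    chunkAverages.sum / chunkAverages.size
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 3.0, 4.0, 10.0)
    println(distributedAverage(data, 2)) // 4.0, same as data.sum / data.size
    println(averageOfAverages(data, 2))  // 4.5, because the chunks have sizes 3 and 2
  }
}
```

Changing `partitions` anywhere from 1 to 5 leaves `distributedAverage` at 4.0. That independence from how the data was split is the kind of result consistency slide 1 is about. A real engine must also cope with slow or failed workers, which this sketch ignores.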
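For slide 3, here is a hedged sketch of how Spark's `DataFrameReader` keeps a single reading API across storage systems. It assumes Spark 3.x with the `spark-sql` module on the classpath and runs in local mode against a temporary CSV file. The `hdfs://` and `s3a://` paths in the comments are illustrative placeholders; `s3a` access additionally needs the `hadoop-aws` module and credentials. The hypothetical `readJdbc` helper takes the JDBC URL, table, and credentials from its caller. Cassandra, MongoDB, and Couchbase are reached through their own separately distributed connectors, which are not shown.

```scala
// Assumes Spark 3.x with spark-sql on the classpath; object/method names are hypothetical.
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

object DataSourceSketch {

  // The storage system is picked by the path's URI scheme; the call stays the same, e.g.
  //   "file:///tmp/sensors.csv", "hdfs://namenode:8020/data/sensors.csv", "s3a://bucket/sensors.csv"
  // (the hdfs/s3a paths are placeholders and need a reachable cluster or bucket).
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // infer column types instead of reading all strings
      .csv(path)

  // Any RDB whose JDBC driver is on the classpath can be loaded as a table.
  // url, table, user and password are placeholders supplied by the caller.
  def readJdbc(spark: SparkSession, url: String, table: String,
               user: String, password: String): DataFrame = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    spark.read.jdbc(url, table, props)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("data-source-sketch").master("local[*]").getOrCreate()
    try {
      // Write a tiny CSV to a temp dir so the example runs without any cluster.
      val csv = Files.createTempDirectory("spark-io").resolve("sensors.csv")
      Files.write(csv, "id,value\n1,10\n2,20\n".getBytes(StandardCharsets.UTF_8))
      readCsv(spark, csv.toUri.toString).show()
    } finally {
      spark.stop()
    }
  }
}
```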
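For slide 4, this sketch writes one small DataFrame in each of the listed formats and reads it back. It assumes Spark 3.x with `spark-sql` on the classpath; that module ships Parquet and ORC support. The `Reading` record and its values are invented for the example. CSV, JSON, and text are row-oriented. Parquet and ORC are the columnar formats the slide mentions: they store each column's values together, so a query that selects only a few columns can skip the rest.

```scala
// Assumes Spark 3.x with spark-sql on the classpath; names and data are hypothetical.
import java.nio.file.Files

import org.apache.spark.sql.{SaveMode, SparkSession}

object FormatSketch {

  final case class Reading(sensor: String, value: Double)

  // Writes the same rows in several formats under baseDir, reads each back,
  // and returns the row count per format.
  def roundTrip(spark: SparkSession, baseDir: String): Map[String, Long] = {
    import spark.implicits._
    val df = Seq(Reading("a", 1.5), Reading("b", 2.5), Reading("a", 3.0)).toDF()

    // Row-oriented text formats.
    df.write.mode(SaveMode.Overwrite).option("header", "true").csv(s"$baseDir/csv")
    df.write.mode(SaveMode.Overwrite).json(s"$baseDir/json")
    // Plain text needs exactly one string column; each row becomes one line.
    df.select("sensor").write.mode(SaveMode.Overwrite).text(s"$baseDir/text")
    // Columnar formats.
    df.write.mode(SaveMode.Overwrite).parquet(s"$baseDir/parquet")
    df.write.mode(SaveMode.Overwrite).orc(s"$baseDir/orc")

    Map(
      "csv"     -> spark.read.option("header", "true").csv(s"$baseDir/csv").count(),
      "json"    -> spark.read.json(s"$baseDir/json").count(),
      "text"    -> spark.read.text(s"$baseDir/text").count(),
      "parquet" -> spark.read.parquet(s"$baseDir/parquet").count(),
      "orc"     -> spark.read.orc(s"$baseDir/orc").count()
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("format-sketch").master("local[*]").getOrCreate()
    try {
      val base = Files.createTempDirectory("spark-formats").toString
      println(roundTrip(spark, base))
      // Columnar read: only the "value" column has to be read from the Parquet files.
      spark.read.parquet(s"$base/parquet").select("value").show()
    } finally {
      spark.stop()
    }
  }
}
```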
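For slide 5, this sketch applies the listed operations through Spark's `Dataset` API and then runs the same query through Spark SQL. It assumes Spark 3.x with `spark-sql` on the classpath. The `Reading` and `Sensor` records, sensor IDs, and locations are invented sample data. Here `union` combines two datasets that were read separately and `join` attaches a second source. `filter`, `groupBy`, `count`, `avg`, `min`, `max`, and `orderBy` map directly onto the slide's list.

```scala
// Assumes Spark 3.x with spark-sql on the classpath; records and values are hypothetical.
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{avg, count, max, min}

object SparkSqlSketch {

  final case class Reading(sensor: String, value: Double)
  final case class Sensor(sensor: String, location: String)

  // Two separately "read" batches of readings plus a sensor master table.
  def load(spark: SparkSession): (Dataset[Reading], Dataset[Reading], Dataset[Sensor]) = {
    import spark.implicits._
    val day1 = Seq(Reading("a", 1.0), Reading("b", 4.0), Reading("a", 3.0)).toDS()
    val day2 = Seq(Reading("b", 6.0), Reading("c", -1.0)).toDS()
    val sensors = Seq(Sensor("a", "Tokyo"), Sensor("b", "Osaka")).toDS()
    (day1, day2, sensors)
  }

  // Dataset API: Union -> Filter -> GroupBy + Count/Avg/Min/Max -> Join -> OrderBy.
  def summarize(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val (day1, day2, sensors) = load(spark)
    day1.union(day2)                  // combine the two batches
      .filter($"value" >= 0)          // drop the invalid negative reading
      .groupBy($"sensor")
      .agg(
        count("*").as("n"),
        avg($"value").as("avg_value"),
        min($"value").as("min_value"),
        max($"value").as("max_value"))
      .join(sensors, Seq("sensor"))   // inner join: rows without a matching sensor are dropped
      .orderBy($"sensor")
  }

  // The same query through the SQL interface (Spark SQL).
  def summarizeSql(spark: SparkSession): DataFrame = {
    val (day1, day2, sensors) = load(spark)
    day1.union(day2).createOrReplaceTempView("readings")
    sensors.createOrReplaceTempView("sensors")
    spark.sql(
      """SELECT r.sensor, COUNT(*) AS n, AVG(r.value) AS avg_value,
        |       MIN(r.value) AS min_value, MAX(r.value) AS max_value, s.location
        |FROM readings r JOIN sensors s ON r.sensor = s.sensor
        |WHERE r.value >= 0
        |GROUP BY r.sensor, s.location
        |ORDER BY r.sensor""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("spark-sql-sketch").master("local[*]").getOrCreate()
    try {
      summarize(spark).show()
      summarizeSql(spark).show()
    } finally {
      spark.stop()
    }
  }
}
```

Both forms are planned by the same Spark SQL engine, so the choice between them is mostly about readability. The `org.apache.spark.sql.Dataset` Scaladoc linked on the slide lists the full set of methods.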