A Very High-Level Overview of a Distributed Processing System Implemented in Scala

Takashi Funato

April 11, 2018

Transcript

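  1. That said...
    • Consistency of processing
    • Consistency of each computation result
    • Making processing faster
    • Splitting up the processing
    • Distributing, collecting, and aggregating the data to be processed
    • Communication protocols, scalability, fault tolerance...
    • This is not my specialty, so for the details…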
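The split, distribute, collect, and aggregate steps listed on this slide can be illustrated with a minimal single-machine sketch. It uses plain Scala `Future`s in place of real worker nodes, and the names `SplitAndAggregate`, `partialSum`, and `distributedSum` are hypothetical, not taken from the talk.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitAndAggregate {
  // Hypothetical "worker": computes a partial sum over one chunk of the input.
  def partialSum(chunk: Seq[Int]): Future[Long] =
    Future(chunk.foldLeft(0L)(_ + _))

  // Split the input, process the chunks concurrently,
  // then collect the partial results and aggregate them.
  def distributedSum(data: Seq[Int], chunkSize: Int): Long = {
    val chunks   = data.grouped(chunkSize).toList           // split
    val partials = Future.sequence(chunks.map(partialSum))  // distribute and collect
    Await.result(partials, 10.seconds).sum                  // aggregate
  }
}
```

A real system also has to deal with the remaining items on the slide (communication, scalability, fault tolerance), which this sketch deliberately leaves out.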
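  2. Processing systems
    • The data being handled is so-called big data
    • Tens of terabytes to petabytes
    • Sensors, documents, logs, astronomy, atmospheric chemistry, genomes, etc.
    • General-purpose RDBs could no longer keep up with the processing
    • Dedicated hardware and software
    • Extremely expensive
    • Google published MapReduce
    • Hadoop was built on that basis and released as OSS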
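  3. About Spark - Data input/output
    • Supports input and output for a wide range of data stores
    • Hadoop Distributed File System (HDFS)
    • Cassandra
    • MongoDB
    • Couchbase
    • Amazon S3
    • RDBs (anything that can be reached over JDBC)
    • Writing your own I/O layer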
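As a rough illustration of the input/output support listed above, the following sketch round-trips a tiny dataset through the local filesystem and shows the shape of a JDBC read. It assumes Spark SQL is on the classpath and runs in local mode. `SparkIoSketch`, `roundTrip`, and `readJdbc` are hypothetical names, and `readJdbc` is never called here because it needs a real database and JDBC driver.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object SparkIoSketch {
  // Assumption: local mode, so no cluster is needed to try this out.
  val spark: SparkSession = SparkSession.builder()
    .appName("io-sketch")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Writes two lines as text to a temporary local directory and reads them back.
  def roundTrip(): Long = {
    val dir = Files.createTempDirectory("spark-io").toFile.getAbsolutePath + "/out"
    Seq("first line", "second line").toDS().write.text(dir)
    spark.read.textFile(dir).count()
  }

  // Hypothetical helper: reads one table over JDBC.
  // The URL and table name are supplied by the caller.
  def readJdbc(url: String, table: String): DataFrame =
    spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", table)
      .load()
}
```

With the appropriate Hadoop connectors available, the same `read` and `write` calls are typically pointed at HDFS or S3 by passing an `hdfs://` or `s3a://` path. Stores such as Cassandra, MongoDB, and Couchbase are usually reached through separate connector libraries, which are not shown here.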
  4. About Spark - Data formats
    • Supports a variety of data formats
    • CSV (TSV)
    • JSON
    • Text
    • Parquet, ORC
    • Columnar formats (column-oriented data)
    • You can also write your own Read and Write
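A minimal sketch of how several of the formats above are selected through `DataFrameWriter` and `DataFrameReader`, assuming Spark SQL in local mode. The object name `FormatSketch`, the sample rows, and the temporary directory are illustrative assumptions. ORC is left out only to keep the sketch short.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object FormatSketch {
  val spark: SparkSession = SparkSession.builder()
    .appName("format-sketch")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Writes the same small DataFrame in several formats, reads each one back,
  // and returns the row count seen per format.
  def roundTripCounts(): Map[String, Long] = {
    val base = Files.createTempDirectory("spark-formats").toFile.getAbsolutePath
    val df   = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("id", "name")

    df.write.option("header", "true").csv(s"$base/csv")
    df.write.option("header", "true").option("sep", "\t").csv(s"$base/tsv") // TSV is CSV with a tab separator
    df.write.json(s"$base/json")
    df.write.parquet(s"$base/parquet")                                      // columnar format

    Map(
      "csv"     -> spark.read.option("header", "true").csv(s"$base/csv").count(),
      "tsv"     -> spark.read.option("header", "true").option("sep", "\t").csv(s"$base/tsv").count(),
      "json"    -> spark.read.json(s"$base/json").count(),
      "parquet" -> spark.read.parquet(s"$base/parquet").count()
    )
  }
}
```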
  5. About Spark - Processing basics 2
    • An SQL-like interface is provided
    • Spark SQL
    • Transform the data that was Read
    • Filter, GroupBy, Avg, OrderBy, Max, Min, Count
    • Join, Union
    • Applied across multiple datasets that have been read in
    • https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
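The operations named on this slide can be combined roughly as follows. This is a sketch under assumptions: Spark SQL in local mode, and made-up in-memory tables (`sales`, `moreSales`, `regions`) standing in for data that would normally be read from storage.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, count, max, min}

object SparkSqlSketch {
  val spark: SparkSession = SparkSession.builder()
    .appName("spark-sql-sketch")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample data; the column names are illustrative only.
  val sales: DataFrame =
    Seq(("tokyo", 100), ("tokyo", 300), ("osaka", 150), ("osaka", 50)).toDF("city", "amount")
  val moreSales: DataFrame = Seq(("nagoya", 400)).toDF("city", "amount")
  val regions: DataFrame =
    Seq(("tokyo", "kanto"), ("osaka", "kansai"), ("nagoya", "chubu")).toDF("city", "region")

  val summary: DataFrame = sales
    .union(moreSales)                    // Union: stack two datasets with the same schema
    .filter($"amount" >= 100)            // Filter: keep rows with amount >= 100
    .join(regions, "city")               // Join: attach the region for each city
    .groupBy("region")                   // GroupBy
    .agg(
      avg("amount").as("avg_amount"),    // Avg
      max("amount").as("max_amount"),    // Max
      min("amount").as("min_amount"),    // Min
      count("amount").as("n")            // Count
    )
    .orderBy($"avg_amount".desc)         // OrderBy: highest average first
}
```

Calling `SparkSqlSketch.summary.show()` prints one row per region. The same query could also be written as a SQL string after registering the inputs with `createOrReplaceTempView`.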