A Super-Brief Overview of a Distributed Processing System Implemented in Scala

Takashi Funato

April 11, 2018

Transcript

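  1. That said • Consistency of processing • Consistency of each computation's results • Making processing faster • Splitting up the processing

    • Distributing, collecting and aggregating the data to be processed • Communication protocols, scalability, fault tolerance… • I'm not a specialist, so I won't go into the details…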
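
The concerns on this slide (splitting the work, distributing and collecting the data, and keeping each computed result consistent) can be illustrated on a single machine. The following is a minimal sketch in plain Scala, assuming Futures on a thread pool as a stand-in for worker nodes. It is not how Spark or the author's system schedules work, and the names `SplitAggregate`, `Partial`, `parallelAverage` and `naiveAverage` are hypothetical. It computes an average by merging per-chunk (sum, count) pairs, and contrasts that with averaging the per-chunk averages, which loses consistency when chunks differ in size.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SplitAggregate {
  // Partial result for one chunk: a sum and a count are enough to rebuild the global average.
  final case class Partial(sum: Long, count: Long) {
    def merge(other: Partial): Partial = Partial(sum + other.sum, count + other.count)
  }

  // "Split / distribute": cut the input into at most `parts` chunks of roughly equal size.
  def split[A](data: Vector[A], parts: Int): Vector[Vector[A]] = {
    val size = math.max(1, math.ceil(data.size.toDouble / parts).toInt)
    data.grouped(size).toVector
  }

  // Each chunk is processed on its own Future (a stand-in for a worker node);
  // the partials are then "collected" and "aggregated" by merging them.
  def parallelAverage(data: Vector[Long], parts: Int): Double = {
    val futures = split(data, parts).map(chunk => Future(Partial(chunk.sum, chunk.size.toLong)))
    val total = Await.result(Future.sequence(futures), 10.seconds)
      .foldLeft(Partial(0L, 0L))(_ merge _)
    if (total.count == 0) 0.0 else total.sum.toDouble / total.count
  }

  // A tempting but inconsistent shortcut: averaging the per-chunk averages gives
  // the wrong answer whenever the chunks have different sizes.
  def naiveAverage(data: Vector[Long], parts: Int): Double = {
    val averages = split(data, parts).map(chunk => chunk.sum.toDouble / chunk.size)
    averages.sum / averages.size
  }
}
```

For the input 1 to 7 split into three chunks of sizes 3, 3 and 1, `parallelAverage` returns 4.0, whereas `naiveAverage` returns about 4.67 (the mean of 2.0, 5.0 and 7.0). Carrying sums and counts rather than averages is what keeps the merged result equal to the sequential one.
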
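  2. Processing systems • The data handled is so-called big data • Tens of terabytes to petabytes • Sensors, documents, logs, astronomy, atmospheric chemistry, genomics, etc.

    • Conventional RDBs could no longer handle it • Dedicated hardware and software • Extremely expensive • Google published MapReduce • Hadoop was built on that basis and released as OSS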
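
MapReduce structures a job as a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The sketch below walks through those phases for the classic word-count example using ordinary Scala collections in a single process, so it shows only the data flow. Distribution across machines, fault tolerance and Hadoop's actual API are not modeled, and `MiniMapReduce` and its methods are hypothetical names.

```scala
object MiniMapReduce {
  // Map: turn each input line into (word, 1) pairs.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(line => line.toLowerCase.split("\\s+").toSeq.filter(_.nonEmpty).map(word => (word, 1)))

  // Shuffle: bring together every value emitted for the same key.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, kvs) => word -> kvs.map(_._2) }

  // Reduce: combine the values for each key into a single result.
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (word, counts) => word -> counts.sum }

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(shuffle(mapPhase(lines)))
}
```

For example, `MiniMapReduce.wordCount(Seq("to be or", "not to be"))` yields `Map("to" -> 2, "be" -> 2, "or" -> 1, "not" -> 1)`.
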
  3. Spark explained - Data input/output • Supports many data sources and destinations • Hadoop Distributed File System (HDFS)

    • Cassandra • MongoDB • Couchbase • Amazon S3 • RDBs (any that can be connected via JDBC) • Custom-built I/O
  4. Spark explained - Supported data formats • Supports many data formats • CSV (TSV) • JSON

    • Text • Parquet, ORC • Columnar formats (column-oriented data) • Read and Write can also be implemented yourself
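
Columnar formats such as Parquet and ORC store each column's values together instead of storing whole records one after another, so a query that needs only a few columns can skip the rest. The following is a minimal in-memory sketch of that idea in plain Scala. It is not the Parquet or ORC file format, and the `Row`/`Columns` types and the tab-separated reader `parseTsv` are hypothetical; the reader loosely stands in for a hand-written Read implementation, not for Spark's data source API.

```scala
object ColumnarSketch {
  // Row-oriented layout: every record keeps all of its fields together.
  final case class Row(id: Int, name: String, score: Double)

  // Column-oriented layout: each field is stored in its own contiguous array.
  final case class Columns(ids: Array[Int], names: Array[String], scores: Array[Double])

  // Hypothetical hand-written reader: one record per line, tab-separated id, name, score.
  def parseTsv(lines: Seq[String]): Seq[Row] =
    lines.filter(_.nonEmpty).map { line =>
      line.split("\t", -1) match {
        case Array(id, name, score) => Row(id.trim.toInt, name, score.trim.toDouble)
        case other => throw new IllegalArgumentException(s"expected 3 fields, got ${other.length}: $line")
      }
    }

  // Transpose rows into columns.
  def toColumns(rows: Seq[Row]): Columns =
    Columns(rows.map(_.id).toArray, rows.map(_.name).toArray, rows.map(_.score).toArray)

  // Row layout: every record is visited even though only `score` is needed.
  def averageScoreRows(rows: Seq[Row]): Double =
    if (rows.isEmpty) 0.0 else rows.map(_.score).sum / rows.size

  // Columnar layout: only the `scores` array is read; `ids` and `names` are never touched.
  def averageScoreColumns(cols: Columns): Double =
    if (cols.scores.isEmpty) 0.0 else cols.scores.sum / cols.scores.length
}
```

Both averages come out the same (90.0 for scores of 80, 90 and 100); the difference is that the columnar version reads only the `scores` array.
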
  5. Spark explained - Processing basics 2 • An SQL-like interface is provided • Spark SQL • Transform the data you have read

    • Filter, GroupBy, Avg, OrderBy, Max, Min, Count • Join, Union • Across multiple datasets that have been read • https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
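
The operations listed here correspond closely to familiar collection methods. The sketch below reproduces Filter, GroupBy, Count/Avg/Max/Min, OrderBy, Join and Union with plain Scala collections so that it runs without Spark; the record types `Sale`, `Shop` and `Summary` are hypothetical. In Spark SQL the aggregation would look roughly like `sales.filter(col("amount") >= 50).groupBy("shop").agg(count("*"), avg("amount"), max("amount"), min("amount")).orderBy("shop")` with `org.apache.spark.sql.functions._` imported, but the block itself does not use the Spark API.

```scala
object SqlLikeOps {
  // Hypothetical example records.
  final case class Sale(shop: String, item: String, amount: Int)
  final case class Shop(shop: String, city: String)
  final case class Summary(shop: String, count: Int, avg: Double, max: Int, min: Int)

  // Filter (amount >= minAmount), GroupBy shop, Count/Avg/Max/Min, OrderBy shop.
  def summarize(sales: Seq[Sale], minAmount: Int): Seq[Summary] =
    sales
      .filter(_.amount >= minAmount)
      .groupBy(_.shop)
      .map { case (shop, group) =>
        val amounts = group.map(_.amount)
        Summary(shop, amounts.size, amounts.sum.toDouble / amounts.size, amounts.max, amounts.min)
      }
      .toSeq
      .sortBy(_.shop)

  // Inner join on `shop`: sales without a matching shop are dropped.
  def innerJoin(sales: Seq[Sale], shops: Seq[Shop]): Seq[(Sale, Shop)] = {
    val shopsByKey = shops.groupBy(_.shop)
    for {
      sale <- sales
      shop <- shopsByKey.getOrElse(sale.shop, Seq.empty)
    } yield (sale, shop)
  }

  // Union that keeps duplicates (UNION ALL semantics).
  def unionAll(a: Seq[Sale], b: Seq[Sale]): Seq[Sale] = a ++ b
}
```

Spark's `Dataset.union` likewise keeps duplicate rows, like SQL `UNION ALL`; a deduplicating union needs a following `distinct`.
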