A Very High-Level Overview of a Distributed Processing System Implemented in Scala

Takashi Funato

April 11, 2018

Transcript

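  1. That said • Consistency of processing • Consistency among the individual computation results • Making processing speed more efficient • Partitioning the processing • Distributing, collecting, and aggregating the data to be processed • Communication protocols, scalability, fault tolerance…. • This is not my specialty, so as for the details…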
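  2. Processing systems • The data handled is so-called big data • Tens of terabytes to petabytes • Sensors, documents, logs, astronomy, atmospheric chemistry, genomes, etc. • General-purpose RDBs could no longer process it all • Dedicated hardware and software • Extremely expensive • Google published MapReduce • Hadoop was built on that basis and released as OSS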
  3. About Spark - Data input/output • Supports input and output of a wide variety of data • Hadoop Distributed File System (HDFS) • Cassandra • MongoDB • Couchbase • Amazon S3 • RDBs (anything that can be connected to via JDBC) • Writing your own I/O
  4. About Spark - Data formats handled • Supports a variety of data formats • CSV (TSV) • JSON • Text • Parquet, ORC • Columnar formats (column-oriented data) • Read and Write can also be custom-built
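The formats listed on slide 4 can be exercised with Spark's built-in `DataFrameReader` and `DataFrameWriter`. The following is a minimal sketch, not code from the deck: it assumes Spark 2.3 or later with the `spark-sql` dependency on the classpath, and the object name `FormatRoundTrip`, the column names, and the sample rows are all hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not from the slides.
object FormatRoundTrip {

  // Writes `df` under `baseDir` in several formats and reads each copy back.
  // `baseDir` must exist; the per-format subdirectories must not exist yet.
  def roundTrip(spark: SparkSession, df: DataFrame, baseDir: String): Map[String, DataFrame] = {
    df.write.option("header", "true").csv(s"$baseDir/csv")
    // TSV is CSV with a tab separator.
    df.write.option("header", "true").option("sep", "\t").csv(s"$baseDir/tsv")
    df.write.json(s"$baseDir/json")
    // Parquet is columnar; write.orc / read.orc follow the same pattern.
    df.write.parquet(s"$baseDir/parquet")

    Map(
      "csv" -> spark.read.option("header", "true").option("inferSchema", "true")
        .csv(s"$baseDir/csv"),
      "tsv" -> spark.read.option("header", "true").option("inferSchema", "true")
        .option("sep", "\t").csv(s"$baseDir/tsv"),
      "json" -> spark.read.json(s"$baseDir/json"),
      "parquet" -> spark.read.parquet(s"$baseDir/parquet")
    )
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would target a cluster.
    val spark = SparkSession.builder()
      .appName("format-round-trip")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows.
    val logs = Seq(("web-1", 200, 12.5), ("web-2", 500, 48.0))
      .toDF("host", "status", "latencyMs")

    val baseDir = Files.createTempDirectory("spark-formats").toString
    roundTrip(spark, logs, baseDir).foreach { case (format, df) =>
      println(s"$format: ${df.count()} rows")
    }
    spark.stop()
  }
}
```

CSV and TSV carry no type information, so the sketch asks Spark to infer the schema on read, while Parquet stores the schema alongside the data.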
  5. About Spark - Processing basics 2 • A SQL-like interface is provided • Spark SQL • Transform the data after reading it • Filter, GroupBy, Avg, OrderBy, Max, Min, Count • Join, Union • Applied across multiple loaded datasets • https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
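The operations named on slide 5 map directly onto methods of Spark's `Dataset`/`DataFrame` API. The following is a minimal sketch, not code from the deck: it assumes the `spark-sql` dependency is on the classpath, and the object name `SparkSqlBasics`, the column names, and the sample sensor rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, count, max, min}

// Hypothetical example object, not from the slides.
object SparkSqlBasics {

  // Filter, GroupBy, Avg/Max/Min/Count, and OrderBy on a single DataFrame.
  def summarize(readings: DataFrame): DataFrame =
    readings
      .filter(readings("value") >= 0) // drop negative readings
      .groupBy("sensor")
      .agg(
        avg("value").as("avg_value"),
        max("value").as("max_value"),
        min("value").as("min_value"),
        count("value").as("n"))
      .orderBy("sensor")

  // Union two DataFrames with the same schema, then Join a lookup table
  // on the shared "sensor" column.
  def combine(day1: DataFrame, day2: DataFrame, sensors: DataFrame): DataFrame =
    day1.union(day2).join(sensors, Seq("sensor"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would target a cluster.
    val spark = SparkSession.builder()
      .appName("spark-sql-basics")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for several loaded datasets.
    val day1 = Seq(("a", 1.0), ("b", -5.0)).toDF("sensor", "value")
    val day2 = Seq(("a", 3.0), ("b", 4.0)).toDF("sensor", "value")
    val sensors = Seq(("a", "roof"), ("b", "basement")).toDF("sensor", "location")

    summarize(combine(day1, day2, sensors)).show()
    spark.stop()
  }
}
```

Keeping `summarize` and `combine` as plain functions from `DataFrame` to `DataFrame` means the same logic applies whatever source or format the inputs were read from.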