Towards an Unified API for Spark and the IIoT by Ángel Conde at Big Data Spain 2017

Structured Streaming is a game changer for Apache Spark, offering a unified API for both batch and real-time processing. Moreover, its support for “event time” and watermarking simplifies its deployment in IIoT-related projects. In this workshop, we will get hands-on with Spark's Structured Streaming API, focusing on its advantages for the IIoT domain.

https://www.bigdataspain.org/2017/talk/towards-an-unified-api-for-spark-and-the-iiot

Big Data Spain 2017
16th - 17th November Kinépolis Madrid

Big Data Spain

November 23, 2017

Transcript

  1. IKERLAN. WHERE TECHNOLOGY IS AN ATTITUDE
    TOWARDS AN UNIFIED API FOR SPARK AND THE IIOT
    Ángel Conde Manjón / 16-11-2017
  2. Outline
    • Spark Structured Streaming
    • Use Cases & Key Benefits
    • Key Issues Processing IIoT Data
    • The Industrial Internet of Things (IIoT)
    • Demo time
    © 2017. IKERLAN. All rights reserved
  3. The Industrial Internet of Things (IIoT)
    • […]: investment is expected to top $60 trillion during the next 15 years.
    • […]: could add $14.2T to the global economy by 2030.
    • McKinsey: will touch 43% of the global economy by 2025.
    • Gartner: 20 billion IoT things installed by 2020.
  4. Use cases & Key Benefits
  5. Key issues Processing IIoT Data
    Late Data & Ordering
    • Connectivity issues: 2G, 3G.
    • Protocol support:
    Data quality
    • Raw sensor values: broken sensors.
    • Deal with duplicates: local acquisition systems.
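The two data-quality issues on this slide can be sketched in plain Python before moving to Spark. This is only an illustration under stated assumptions: the `Reading` fields, the `clean` helper, and the valid range are hypothetical names and values, not taken from the talk or its repository.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set, Tuple


@dataclass(frozen=True)
class Reading:
    # Hypothetical record layout; the talk does not show its schema.
    sensor_id: str
    event_time: int   # seconds since epoch, stamped at the device
    value: float


def clean(readings: Iterable[Reading], low: float, high: float) -> Iterator[Reading]:
    """Yield readings whose value lies in [low, high], skipping exact re-sends."""
    seen: Set[Tuple[str, int]] = set()
    for r in readings:
        if not (low <= r.value <= high):
            continue  # out-of-range (or NaN) value, e.g. from a broken sensor
        key = (r.sensor_id, r.event_time)
        if key in seen:
            continue  # same sensor and timestamp already seen: a duplicate
        seen.add(key)
        yield r


raw = [
    Reading("s1", 100, 21.5),
    Reading("s1", 100, 21.5),    # re-sent by the local acquisition system
    Reading("s1", 160, -999.0),  # sentinel value from a broken sensor
    Reading("s2", 100, 19.0),
]
cleaned = list(clean(raw, low=-40.0, high=125.0))
```

Note that `seen` grows without bound in this sketch. On an unbounded stream the deduplication state needs an expiry rule, which is the role a watermark plays.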
  6. Structured Streaming
    • Stream processing on top of the Spark SQL engine.
    • Unified API for batch/stream processing.
    • Watermarking & deduplication.
    • Aggregations, UDFs, stateful ops.
    • Joins with static data (Spark 2.3 will support joins between streams).

    from pyspark.sql.functions import count

    # Count Kafka records per value and write the running counts back to Kafka.
    # The slide omits the kafka.bootstrap.servers and checkpointLocation
    # options, which are still required to run this.
    query = (
        spark.readStream
        .format("kafka")
        .option("subscribe", "in")
        .load()
        .groupBy("value")
        .agg(count("*"))
        .writeStream
        .format("kafka")
        .option("topic", "out")
        .trigger(processingTime="1 minute")
        .outputMode("update")
        .start()
    )
  7. Watermarking & Late Data
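The figure for this slide did not survive in the transcript, so here is a simplified plain-Python model of how an event-time watermark decides which late events are still counted. It is a sketch of the idea, not Spark code: `WINDOW`, `DELAY`, and `run` are hypothetical names, and the model assumes the watermark trails the largest event time seen so far by a fixed delay and only advances between batches. In Spark itself the delay is declared with `DataFrame.withWatermark`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW = 60   # tumbling window length in seconds (assumed value)
DELAY = 120   # how late an event may arrive, in seconds (assumed value)


def window_start(event_time: int) -> int:
    """Start of the tumbling window that contains event_time."""
    return event_time - event_time % WINDOW


def run(batches: Iterable[List[int]]) -> Tuple[Dict[int, int], List[int]]:
    """Count events per window; return (counts by window start, dropped events)."""
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[int] = []
    max_seen = 0
    watermark = 0
    for batch in batches:
        for t in batch:
            if window_start(t) + WINDOW <= watermark:
                dropped.append(t)   # its window closed before the event arrived
            else:
                counts[window_start(t)] += 1
            max_seen = max(max_seen, t)
        # The watermark moves only after the whole batch has been processed.
        watermark = max(watermark, max_seen - DELAY)
    return dict(counts), dropped


# Event times in arrival order, grouped into three micro-batches.
counts, dropped = run([[600, 610], [500, 470, 900], [530, 790]])
```

Events 500 and 530 belong to the same window (starting at 480). The first arrives while the watermark is still at 490 and is counted; the second arrives after the watermark has moved to 780 and is dropped.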
  8. Architecture
    Diagram labels: Digital Platform (PaaS); JSON; Filter and routing; Aggregates & Raw data; Real-time processing
  9. THANK YOU
    IKERLAN, P.º José María Arizmendiarrieta, 2 - 20500 Arrasate-Mondragón
    T. +34 943712400 F. +34 943796944
    https://github.com/Neuw84/bds2k17 [email protected] @neuw84