Taming IoT Data: Making Sense of Sensors with SQL Streaming @VoxxedDays Zurich

as presented at VoxxedDays Zürich 2019, Switzerland

Abstract:

We are living in a data streaming era, yet until recently it has been particularly hard to leverage existing stream processing technologies. On the one hand, dealing with data in motion has its inherent challenges. On the other hand, most frameworks and APIs that allow for stream processing are typically very hard to employ and/or operate. NOT so for KSQL, the newest kid in Apache Kafka's ecosystem. Based on a simplified version of an IoT use case, this session gives a gentle introduction to KSQL, Kafka's SQL streaming engine for the masses.

Join this whirlwind tour, during which we discuss a fully fledged streaming IoT architecture. Concretely, we are going to:

(1) ingest smart home energy data into Apache Kafka, (2) use KSQL for flexible, powerful and scalable SQL-only stream processing, (3) send raw data as well as pre-processed results to an operational NoSQL data store, (4) reactively serve data to clients in near real-time and (5) finally build informative live charts.
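
To make step (2) more tangible, here is a minimal sketch in plain Python (not KSQL, and not taken from the talk) of what a tumbling-window average over energy readings computes conceptually. The `Reading` type, its field names, and the sample values are hypothetical and chosen only for illustration. A streaming engine would compute this incrementally over an unbounded stream, whereas the sketch works on a small finite list.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Reading(NamedTuple):
    device_id: str      # hypothetical sensor identifier
    timestamp_ms: int   # event time in epoch milliseconds
    watts: float        # hypothetical power measurement


def tumbling_average(
    readings: Iterable[Reading], window_ms: int
) -> Dict[Tuple[str, int], float]:
    """Average watts per (device, window start) over fixed, non-overlapping windows."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for r in readings:
        # Align each event to the start of the window it falls into.
        window_start = r.timestamp_ms - (r.timestamp_ms % window_ms)
        key = (r.device_id, window_start)
        sums[key] += r.watts
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up sample data: three fridge readings and one oven reading.
readings = [
    Reading("fridge", 1_000, 100.0),
    Reading("fridge", 30_000, 140.0),
    Reading("fridge", 61_000, 90.0),
    Reading("oven", 5_000, 2000.0),
]
averages = tumbling_average(readings, window_ms=60_000)
```

With one-minute windows, the first two fridge readings share a window, while the third falls into the next one.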

Conference Page: https://voxxeddays.com/zurich

YouTube Recording: https://www.youtube.com/watch?v=fqqHG3pT5xQ

Hans-Peter Grahsl

March 19, 2019

Transcript

  1. $ whoami • Hans-Peter Grahsl • working & living in Graz • technical trainer at • independent consultant & engineer • associate lecturer • irregular conference speaker

    @hpgrahsl | #VDZ19 @VoxxedZurich, 19th March 2019, Switzerland
  2. WHAT IS STREAMING?
  3. Streaming == BIG DEAL • (1) unbounded data sets are prevalent ➡ never-ending data streams need purpose-built systems • (2) people crave timely information ➡ stream processing technology helps lower latencies
  4. KSQL's Nature • built on top of Kafka Streams • SQL only (not embedded) • NO(!) coding skills required • extremely low entry barrier • familiar syntax and semantics • concise and expressive • joins, aggregations, windowing • UD(A)Fs and UDTFs coming soon...
  5. KSQL Queries • per-record streaming with milliseconds latency • compiled into Kafka Streams applications • follow same execution model • distributed over multiple KSQL servers • two operation modes / deployment options: interactive vs. headless
  6. KSQL interactive mode • KSQL servers accessed via REST API • offers ad-hoc stream analytics • share streams & tables across users • used for exploration and during development
  7. KSQL headless mode • streaming queries given by a SQL file • KSQL servers process SQL file • use case specific isolation • "locked-down" ➡ NO REST API access • used for production deployments
  8. KSQL wrap-up • streaming with SQL ... and nothing but SQL • scalable & fault-tolerant • deployable anywhere: cloud or on prem • viable for use cases of any size (XS ... XXXL) • exactly-once delivery guarantee semantics
  9. "If I have been faster it's by streaming on the shoulders of Apache Kafka." - my other self
  10. Your obsession tells you to do batching. I tell you to walk away and stream with KSQL. The choice is yours, folks!
  11. THANK YOU • Q & A? • https://bit.ly/2FaLr7w
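
Slide 6 notes that in interactive mode the KSQL servers are accessed via a REST API. As a rough sketch of what such a call could look like from Python, the snippet below builds (but deliberately does not send) a request that submits a statement to a server. The server address, the `/ksql` endpoint path, the payload shape, and the content type are assumptions based on the KSQL REST API as commonly documented and should be checked against the version in use. The helper name `build_ksql_request` is hypothetical, and none of this comes from the slides.

```python
import json
import urllib.request

# Assumption: a KSQL server running in interactive mode at this address.
KSQL_SERVER_URL = "http://localhost:8088"


def build_ksql_request(
    statement: str, server_url: str = KSQL_SERVER_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one KSQL statement."""
    # Assumed payload shape: the statement text plus optional stream properties.
    payload = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=f"{server_url}/ksql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_ksql_request("SHOW STREAMS;")

# Sending is left out on purpose; with a reachable server it could look like:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

In headless mode (slide 7) no such call is possible, since the servers only process the statements from the SQL file they were started with.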