
Processing IoT data with Apache Kafka, KSQL, and Machine Learning

IoT devices generate large amounts of data that must be continuously processed and analyzed. Apache Kafka is a highly scalable open source streaming platform for reading, storing, processing and forwarding large amounts of data from thousands of IoT devices. KSQL is an open source streaming SQL engine built natively on Apache Kafka that opens up stream processing to everyone through simple SQL statements.

Using a scenario from the healthcare sector, this talk shows how Kafka and KSQL can help to run continuous health checks on patients. A live demo shows how machine learning models, trained with frameworks such as TensorFlow, DeepLearning4J or H2O, can be deployed into a time-critical and scalable real-time application.

Takeaways:
* Apache Kafka is a streaming platform for reading, storing, processing and forwarding large volumes of data from thousands of IoT devices.
* KSQL allows continuous data integration and analysis without external big data clusters and without writing application code.
* Machine learning models can be easily trained and used in the Apache Kafka environment.

Robin Moffatt

June 05, 2018

Transcript

  1. @rmoff
    https://speakerdeck.com/rmoff/
    Processing IoT data with Apache Kafka, KSQL, and Machine Learning
    building IoT, Köln
    5th June 2018 / Robin Moffatt

  2. @rmoff / Processing IoT data with Apache Kafka, KSQL, and Machine Learning
    $ whoami
    • Developer Advocate @ Confluent
    • Working in data & analytics since 2001
    • Oracle ACE Director & Dev Champion
    • Blogging: http://rmoff.net & http://cnfl.io/rmoff
    • Twitter: @rmoff
    • Geek stuff
    • Beer & Fried Breakfasts

  3. What we will talk about…
    • What is Apache Kafka?
    • IoT and Kafka
    • Machine Learning and Kafka
    • Stream Processing with Kafka
    • Put it all together: IoT + Stream Processing + ML + Kafka!

  5. What is Kafka?

  6. Kafka is a streaming platform

  7. All your data is event streams

  8. A Sensor Reading

  9. An Application Log Entry

  10. A Customer Experience

  11. A Sale

  12. Databases

  14. The Streaming Platform
    • Scalable
    • Decoupled
    • Realtime

  15. Simple Data Pipelines - storage in HDFS for batch analytics / processing
    [Diagram: HDFS / S3]

  16. Event Driven Applications
    [Diagram: Stream Processing, App]

  18. Real-time Enrichment and Analytics on What's Happening Now
    [Diagram: Event Streams, Lookup data, Stream Processing, HDFS / S3, App]

  19. Apache Kafka at Large Scale
    https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63921 (2018)
    https://qconlondon.com/london2018/presentation/cloud-native-and-scalable-kafka-architecture (2018)

  21. What we will talk about…
    • What is Apache Kafka?
    • IoT and Kafka
    • Machine Learning and Kafka
    • Stream Processing with Kafka
    • Put it all together: IoT + Stream Processing + ML + Kafka!

  22. Apache Kafka
    A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.
    • Writes are append only
    • Reads are a single seek & scan
    [Diagram: Kafka Cluster]
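
The commit-log idea on this slide (append-only writes, reads as a seek followed by a scan) can be illustrated with a toy in-memory log. This is only a sketch under stated assumptions: `CommitLog` is a hypothetical class standing in for a single partition, not Kafka's storage engine or client API, and the readings are made up.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy in-memory stand-in for one partition (hypothetical, not a Kafka API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        """Writes are append-only: add to the end and return the new offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Reads are a seek to an offset followed by a sequential scan."""
        return self.records[offset:offset + max_records]


log = CommitLog()
for reading in ["6.00", "6.04", "6.08"]:
    log.append(reading)

# Two independent readers keep their own offsets; reading does not remove data.
dashboard_batch = log.read(offset=0)
alerting_batch = log.read(offset=2)
print(dashboard_batch)  # ['6.00', '6.04', '6.08']
print(alerting_batch)   # ['6.08']
```

Because readers only track an offset, a slow or newly added reader can start from any retained position without affecting the others.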

  23. Producer and Consumer API
    • Send data directly to Kafka using the Producer API
    • Configurable for number of acks required etc
    • Open-source client libraries available for:
      • Java
      • Scala
      • Python
      • C / C++
      • .NET
      • Go

  24. MQTT
    • Very lightweight messaging protocol, often used with IoT devices
    • Integrates with Kafka through Kafka Connect
    • https://github.com/jcustenborder/kafka-connect-mqtt
    [Diagram: MQTT, Stream Processing, HDFS / S3, App]

  25. The Connect API of Apache Kafka®
    Reliable and scalable integration of Kafka with other systems, no coding required.
    ✓ Fault tolerant and automatically load balanced
    ✓ Extensible API
    ✓ Single Message Transforms
    ✓ Part of Apache Kafka, included in Confluent Open Source
    ✓ Centralized management and configuration
    ✓ Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3
    ✓ Supports CDC ingest of events from RDBMS
    ✓ Preserves data schema
    {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
      "table.whitelist": "sales,orders,customers"
    }
    https://docs.confluent.io/current/connect/
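
As a rough illustration of the "no coding required" point, the connector properties from this slide can be wrapped into the JSON body that the Kafka Connect REST interface accepts when a connector is created (`POST /connectors` with a `name` and a `config` object). The connector name `jdbc-source-demo` and the helper function are hypothetical, and a real JDBC source would usually need further properties that the slide does not show.

```python
import json

# Connector properties copied from the slide.
connector_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
    "table.whitelist": "sales,orders,customers",
}


def build_connector_request(name, config):
    """Wrap connector properties in the {"name", "config"} body for POST /connectors."""
    return json.dumps({"name": name, "config": config}, indent=2)


# "jdbc-source-demo" is a hypothetical connector name chosen for this sketch.
payload = build_connector_request("jdbc-source-demo", connector_config)
print(payload)
```

The resulting string would be sent with a `Content-Type: application/json` header to a Connect worker; the sketch stops short of the HTTP call so that it runs without a cluster.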

  26. Kafka Connect
    [Diagram: Kafka Brokers, Kafka Connect (Tasks, Workers), Sources, Sinks, Amazon S3, syslog, flat file, CSV, JSON, MQTT]

  27. REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
    • Provides a RESTful interface to a Kafka cluster
    • Simplifies message creation and consumption
    • Simplifies administrative actions
    [Diagram: Non-Java Applications, REST / HTTP, REST Proxy, Schema Registry, Native Kafka Java Applications]

  28. What we will talk about…
    • What is Apache Kafka?
    • IoT and Kafka
    • Machine Learning and Kafka
    • Stream Processing with Kafka
    • Put it all together: IoT + Stream Processing + ML + Kafka!

  31. Machine Learning
    ... allows computers to find hidden insights without being explicitly programmed where to look.
    Machine Learning
    • Decision Trees
    • Naïve Bayes
    • Clustering
    • Neural Networks
    • etc.
    Deep Learning
    • CNN
    • RNN
    • Autoencoder
    • etc.

  32. Real World Examples of Machine Learning
    • Email composition
    • Search Results + Product Recommendation
    • Healthcare screening
    • Automated dialogue

  36. Leverage Machine Learning to Analyze and Act on Critical Business Moments
    Windows of Opportunity
    • Seconds: Price Optimization, Fraud Detection, Cross Selling
    • Minutes: Transportation Rerouting, Customer Service
    • Hours: Predictive Maintenance, Inventory Management

  38. Languages, Frameworks, and Tools for Machine Learning
    There is no all-rounder! ML-independent infrastructure needed!
    Portable Format for Analytics (PFA)

  39. Apache Kafka’s Open Source Ecosystem as Infrastructure for Machine Learning

  42. Apache Kafka’s Open Source Ecosystem as Infrastructure for Machine Learning
    [Diagram: Kafka Streams, Kafka Connect, REST Proxy, Schema Registry, Go / .NET / Python, Kafka Producer, KSQL]

  43. Replay-ability – A log never forgets!
    • Different models with same data
    • Different ML Frameworks
    • AutoML compatible
    • A/B Testing
    [Diagram: Producer, Distributed Commit Log, Time, Model A, Model B, Model X]
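
The replay idea on this slide, different models evaluated against the same retained data, might look like the following sketch. The events, both model functions and the `replay` helper are hypothetical stand-ins invented for illustration; in practice each model would be a separate consumer reading the topic from an earlier offset.

```python
# Events as they might sit in a topic, oldest first (made-up readings).
log = [5.0, 5.1, 5.0, 7.9, 5.2]


def model_a(value):
    """Hypothetical model A: flag readings above a fixed limit."""
    return value > 7.0


def model_b(value):
    """Hypothetical model B: flag readings far from a fixed baseline."""
    return abs(value - 5.0) > 0.15


def replay(events, model, from_offset=0):
    """Re-read the retained events from an offset and apply a model to each one."""
    return [(offset, model(value))
            for offset, value in enumerate(events) if offset >= from_offset]


flags_a = [offset for offset, flagged in replay(log, model_a) if flagged]
flags_b = [offset for offset, flagged in replay(log, model_b) if flagged]
print(flags_a)  # [3]
print(flags_b)  # [3, 4]
```

Since both runs see identical input, any difference in the flagged offsets comes from the models themselves, which is what makes A/B comparison on a log straightforward.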

  44. What we will talk about…
    • What is Apache Kafka?
    • IoT and Kafka
    • Machine Learning and Kafka
    • Stream Processing with Kafka
    • Put it all together: IoT + Stream Processing + ML + Kafka!

  45. the KAFKA STREAMS API is a JAVA API to BUILD REAL-TIME APPLICATIONS to POWER THE BUSINESS

  53. Things Kafka Streams Does
    • Runs everywhere
    • Clustering done for you
    • Exactly-once processing
    • Event-time processing
    • Integrated database
    • Joins, windowing, aggregation
    • S/M/L/XL/XXL/XXXL sizes

  54. Not running inside brokers!
    [Diagram: App, Streams API]

  55. Brokers? Nope!
    Same app, many instances
    [Diagram: App with Streams API, shown three times]

  56. KSQL is a Declarative Stream Processing Language

  57. KSQL is the Streaming SQL Engine for Apache Kafka

  59. KSQL in Development and Production
    • Interactive KSQL for development and testing (REST): “Hmm, let me try out this idea...”
    • Headless KSQL for Production: Desired KSQL queries have been identified

  60. KSQL for Real-Time Monitoring
    • Log data monitoring, tracking and alerting
    • syslog data
    • Sensor / IoT data
    CREATE STREAM SYSLOG_INVALID_USERS AS
      SELECT HOST, MESSAGE
      FROM SYSLOG
      WHERE MESSAGE LIKE '%Invalid user%';
    http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting

  61. KSQL for Real-Time Monitoring
    http://cnfl.io/syslog-alerting

  62. KSQL for Streaming ETL
    Joining, filtering, and aggregating streams of event data
    CREATE STREAM vip_actions AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u
      ON c.userid = u.user_id
      WHERE u.level = 'Platinum';
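
To make the join-and-filter logic of the `vip_actions` query concrete, here is a minimal in-memory sketch of the same idea: each click event is enriched from a lookup table keyed by user id, and only Platinum users are kept. The sample rows and the `vip_actions` helper are made up for illustration; this is not how KSQL executes the query internally.

```python
# Lookup table keyed by user id (made-up rows).
users = {
    "u1": {"level": "Platinum"},
    "u2": {"level": "Gold"},
}

# Stream of click events (made-up rows); "u3" has no entry in the table.
clickstream = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u3", "page": "/home", "action": "view"},
    {"userid": "u1", "page": "/checkout", "action": "buy"},
]


def vip_actions(events, user_table):
    """Look up each event's user (left join), then keep only Platinum users."""
    for event in events:
        user = user_table.get(event["userid"])  # None when there is no match
        level = user["level"] if user else None
        if level == "Platinum":
            yield {"userid": event["userid"],
                   "page": event["page"],
                   "action": event["action"]}


result = list(vip_actions(clickstream, users))
print(result)
```

Events are processed one at a time as they arrive, which mirrors the continuous nature of the query: the output stream grows whenever a matching click comes in.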

  63. KSQL for Streaming ETL
    Denormalise streams of data from heterogeneous sources
    [Diagram: CSV files, Kafka Connect (kafka-connect-spooldir), MySQL, Debezium, KSQL, Kafka Connect (kafka-connect-s3), S3]

  64. KSQL for Anomaly Detection
    Identifying patterns or anomalies in real-time data, surfaced in milliseconds
    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
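
The tumbling-window aggregation in the `possible_fraud` query can be approximated in a few lines: each event is assigned to a fixed, non-overlapping 5-second window, counts are kept per window and card, and only counts above 3 survive. The timestamps, card numbers and function below are made-up assumptions for this sketch, which ignores late or out-of-order data.

```python
from collections import Counter

WINDOW_SECONDS = 5
THRESHOLD = 3

# (timestamp in seconds, card number) pairs; made-up sample data.
attempts = [
    (0.5, "card-1"), (1.0, "card-1"), (2.2, "card-1"), (4.9, "card-1"),
    (3.0, "card-2"),
    (5.1, "card-1"), (6.0, "card-2"),
]


def possible_fraud(events, size=WINDOW_SECONDS, threshold=THRESHOLD):
    """Count attempts per (window start, card) and keep counts above the threshold."""
    counts = Counter()
    for timestamp, card in events:
        window_start = int(timestamp // size) * size  # fixed, non-overlapping windows
        counts[(window_start, card)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


print(possible_fraud(attempts))  # {(0, 'card-1'): 4}
```

Only `card-1` in the window starting at second 0 exceeds the threshold; its fifth attempt at 5.1 seconds falls into the next window and starts a fresh count.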

  65. KSQL for ML scoring
    Apply Machine Learning models to live streams of data
    CREATE STREAM possible_fraud AS
      SELECT txn_id, card_number
      FROM credit_card_txn_stream
      WHERE ANOMALY(txn_data) > 0.8;

  66. What we will talk about…
    • What is Apache Kafka?
    • IoT and Kafka
    • Machine Learning and Kafka
    • Stream Processing with Kafka
    • Put it all together: IoT + Stream Processing + ML + Kafka!

  67. Live Demo – Prebuilt Model Embedded in KSQL Function
    • Use Case: Anomaly Detection (Sensor Healthcheck)
    • Machine Learning Algorithm: Autoencoder, built with H2O
    • Streaming Platform: Apache Kafka and KSQL

  72. Overview
    • ECG device sends sensor data with the Kafka Producer API
    • KSQL applies the ML model: Sensor data in, Scored data out
    • All data goes to Elasticsearch through Kafka Connect
    • Critical data goes to the Emergency system through the Kafka Consumer API

  77. Training the model
    2.10,2.13,2.19,2.28,2.44,2.62,2.80,3.04,3.36,3.69,3.97,4.24,4.53,4.80,5.02,5.21,5.40,5.57,5.7
    79,5.86,5.92,5.98,6.02,6.06,6.08,6.14,6.18,6.22,6.27,6.32,6.35,6.38,6.45,6.49,6.53,6.57,6.64,
    ,6.73,6.78,6.83,6.88,6.92,6.94,6.98,7.01,7.03,7.05,7.06,7.07,7.08,7.06,7.04,7.03,6.99,6.94,6.
    .83,6.77,6.69,6.60,6.53,6.45,6.36,6.27,6.19,6.11,6.03,5.94,5.88,5.81,5.75,5.68,5.62,5.61,5.54
    9,5.45,5.42,5.38,5.34,5.31,5.30,5.29,5.26,5.23,5.23,5.22,5.20,5.19,5.18,5.19,5.17,5.15,5.14,5
    5.16,5.15,5.15,5.15,5.14,5.14,5.14,5.15,5.14,5.14,5.13,5.15,5.15,5.15,5.14,5.16,5.15,5.15,5.1
    14,5.15,5.15,5.14,5.13,5.14,5.14,5.11,5.12,5.12,5.12,5.09,5.09,5.09,5.10,5.08,5.08,5.08,5.08,
    ,5.05,5.06,5.07,5.05,5.03,5.03,5.04,5.03,5.01,5.01,5.02,5.01,5.01,5.00,5.00,5.02,5.01,4.98,5.
    .00,5.00,4.99,5.00,5.01,5.02,5.01,5.03,5.03,5.02,5.02,5.04,5.04,5.04,5.02,5.02,5.01,4.99,4.98
    6,4.96,4.96,4.94,4.93,4.93,4.93,4.93,4.93,5.02,5.27,5.80,5.94,5.58,5.39,5.32,5.25,5.21,5.13,4
    4.71,4.39,4.05,3.69,3.32,3.05,2.99,2.74,2.61,2.47,2.35,2.26,2.20,2.15,2.10,2.08
    library(h2o)
    h2o.init()
    # train_ecg is assumed to be an H2OFrame that has already been loaded
    anomaly_model <- h2o.deeplearning(
      x = names(train_ecg),
      training_frame = train_ecg,
      activation = "Tanh",
      autoencoder = TRUE,
      hidden = c(50,20,50),
      sparse = TRUE,
      l1 = 1e-4,
      epochs = 100)
    [Diagram: Training data, R, Model]

  78. Applying the model to a stream of data
    CREATE STREAM ecg_scored AS
      SELECT event_id,
             ANOMALY(SENSORINPUT)
      FROM health_sensor;
    Sensor input:
    8 6.00#6.04#6.08#6.10#6.14…
    9 5.24#5.22#5.22#5.22#5.23…
    Scored output (KSQL applies the Model):
    8 1.24412
    9 2.12952
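
What a scoring function such as `ANOMALY()` might do with the `#`-delimited sensor string can be sketched as follows: parse the readings, ask the model for a reconstruction, and report the mean squared reconstruction error as the score. This is an assumption about the general autoencoder approach, not the demo's actual implementation; `stand_in_autoencoder` is a deliberately trivial placeholder for the trained H2O model, and all function names are hypothetical.

```python
def parse_sensor_input(raw):
    """Split a '#'-delimited reading string into a list of floats."""
    return [float(part) for part in raw.split("#") if part]


def reconstruction_error(original, reconstructed):
    """Mean squared error between the input and the model's reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def stand_in_autoencoder(values):
    """Placeholder for the trained model: 'reconstructs' every point as the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def anomaly(raw):
    """Hypothetical counterpart of ANOMALY(): a higher score means more unusual."""
    values = parse_sensor_input(raw)
    return reconstruction_error(values, stand_in_autoencoder(values))


flat = anomaly("5.0#5.0#5.0#5.0")
spiky = anomaly("5.0#5.0#9.0#5.0")
print(flat, spiky)
```

A real autoencoder reconstructs familiar heartbeat shapes well and unfamiliar ones poorly, so the error rises for unusual readings; the placeholder only imitates that behaviour for a flat versus a spiky series.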

  79. Stream the data to Elasticsearch with Kafka Connect
    CREATE STREAM ecg_scored AS
      SELECT event_id,
             ANOMALY(SENSORINPUT)
      FROM health_sensor;
    Sensor input:
    8 6.00#6.04#6.08#6.10#6.14…
    9 5.24#5.22#5.22#5.22#5.23…
    Scored output (KSQL applies the Model), passed on through Kafka Connect to Elasticsearch:
    8 1.24412
    9 2.12952

  83. Event-driven, real-time Alerts on IoT data with KSQL
    CREATE STREAM ecg_scored AS
      SELECT event_id,
             ANOMALY(SENSORINPUT) AS ANOMALY
      FROM health_sensor;
    CREATE STREAM alerts AS
      SELECT * FROM ecg_scored
      WHERE ANOMALY > 1.5;
    sensor_raw
    8 6.00#6.04#6.08#6.10#6.14…
    9 5.24#5.22#5.22#5.22#5.23…
    ecg_scored
    8 1.24412
    9 2.12952
    sensor_alerts
    9 2.12952
    ALERT APP
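
The alerting step on this slide is a plain threshold filter over the scored stream. The sketch below applies the same 1.5 cut-off to the two sample rows shown for `ecg_scored`; the `alerts` function is a hypothetical stand-in for the continuous query and for the consumer that feeds the alert app.

```python
ALERT_THRESHOLD = 1.5  # same cut-off as the WHERE clause on the slide

# (event_id, anomaly score) pairs taken from the slide's ecg_scored sample.
ecg_scored = [(8, 1.24412), (9, 2.12952)]


def alerts(scored_events, threshold=ALERT_THRESHOLD):
    """Yield only the events whose anomaly score exceeds the threshold."""
    for event_id, score in scored_events:
        if score > threshold:
            yield event_id, score


sensor_alerts = list(alerts(ecg_scored))
print(sensor_alerts)  # [(9, 2.12952)]
```

Only event 9 crosses the threshold, matching the single row shown under `sensor_alerts`.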

  84. Visualise the data in realtime

  85. Kafka + Deep Learning - resources
    https://github.com/kaiwaehner
    https://www.confluent.io/blog/

  86. Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
    [Diagram: Confluent Platform]
    • Data sources: Database Changes, Log Events, IoT Data, Web Events, …
    • Data Integration: CRM, Data Warehouse, Database, Hadoop
    • Real-time Applications: Monitoring, Analytics, Custom Apps, Transformations
    • Apache Kafka®: Core | Connect API | Streams API
    • Development and Connectivity: Clients | Connectors | REST Proxy | CLI
    • SQL Stream Processing: KSQL
    • Data Compatibility: Schema Registry
    • Monitoring & Administration: Confluent Control Center | Security
    • Operations: Replicator | Auto Data Balancing
    Legend: Apache Open Source, Confluent Open Source, Confluent Enterprise

  87. Free Books!
    https://www.confluent.io/apache-kafka-stream-processing-book-bundle

  88. @rmoff
    https://slackpass.io/confluentcommunity
    https://www.confluent.io/download/

  89. #EOF