IoT Data Processing and Analytics 101

Michael Hausenblas

November 03, 2015

Transcript

  1. IOT DATA PROCESSING & ANALYTICS 101
    Michael Hausenblas, Developer & Cloud Advocate | 2015-11-03 | EclipseCon
    © 2015 Mesosphere, Inc. All Rights Reserved.
  2. OVERALL FOCUS
    • Devices
    • IoT Gateways
    • Networks
    • Backend Systems
    iot.eclipse.org
  3. LET'S TALK ABOUT WORKLOADS* …
    • batch
    • streaming
    • PaaS
    • MapReduce
    *) kudos to Timothy St. Clair, @timothysc
  4. MESSAGE QUEUES & ROUTERS
    • Kafka
    • ØMQ, RabbitMQ, Disque (Redis-based), etc.
    • fluentd, Logstash, Flume, etc.
    • Akka streams
    • cloud-only: AWS SQS, Google Cloud Pub/Sub
    • see also queues.io
  5. APACHE KAFKA
    • High-throughput, distributed, persistent publish-subscribe messaging system
    • Originates from LinkedIn
    • Typically used as buffer/de-coupling layer in online stream processing
    Message queues & routers | kafka.apache.org
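The buffer/de-coupling role described on the Kafka slide can be sketched in a few lines. This is a toy, in-memory model only, not Kafka's client API: the class `ToyLog`, its methods, and the group names `dashboard` and `archiver` are hypothetical names introduced for illustration. It assumes the key idea is an append-only log where each consumer group tracks its own read offset, so producers and consumers never wait on each other.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a persistent, append-only topic log (not Kafka's API)."""

    def __init__(self):
        self._records = []                # messages stay in the log after being read
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next unread messages for `group` and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = ToyLog()
for reading in [21.5, 21.7, 22.0]:
    log.publish({"sensor": "s1", "temp_c": reading})

fast = log.poll("dashboard")                # takes everything available
slow = log.poll("archiver", max_records=1)  # reads at its own pace
```

Because reading does not remove messages, a slow consumer such as the hypothetical `archiver` can catch up later without affecting the faster one.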
  6. STREAM PROCESSING PLATFORMS
    • Storm
    • Spark
    • Samza
    • Flink
    • Concord
    • cloud-only: AWS Kinesis, Google Cloud Dataflow
    • see also my webinar on stream processing
  7. APACHE STORM
    • Distributed, fault-tolerant stream-processing platform
    • Guaranteed message processing (replaying messages on failure)
    • Concepts: tuples, streams, spouts, bolts, topologies
    Stream processing platforms | storm.apache.org
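To make "replaying messages on failure" concrete, here is a minimal single-process sketch. It borrows Storm's vocabulary (spout, bolt, topology) but does not use Storm's API: `ListSpout`, `run_topology`, and `flaky_double` are hypothetical names, and the failure is simulated. It assumes the simplest form of the guarantee: a tuple whose processing fails is handed back to the spout and emitted again later.

```python
class ListSpout:
    """Toy spout: emits tuples from a list and re-queues any tuple reported as failed."""

    def __init__(self, items):
        self._pending = list(items)

    def next_tuple(self):
        return self._pending.pop(0) if self._pending else None

    def fail(self, tup):
        self._pending.append(tup)  # schedule the tuple for replay


def run_topology(spout, bolt):
    """Feed each tuple from the spout to the bolt; replay tuples whose processing raises."""
    results = []
    while (tup := spout.next_tuple()) is not None:
        try:
            results.append(bolt(tup))
        except RuntimeError:
            spout.fail(tup)
    return results


attempts = {}


def flaky_double(tup):
    """Toy bolt: doubles the value, but fails the first time it sees the value 2."""
    attempts[tup] = attempts.get(tup, 0) + 1
    if tup == 2 and attempts[tup] == 1:
        raise RuntimeError("simulated worker failure")
    return tup * 2


output = run_topology(ListSpout([1, 2, 3]), flaky_double)
```

Every input is eventually processed, but the replayed tuple arrives out of order and was attempted twice, which is why bolts in such a scheme usually need to tolerate repeated delivery.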
  8. APACHE SPARK
    • Libraries: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph processing)
    • Spark core (RDD)
    • Cluster managers: Standalone, YARN, Mesos
    • Storage: filesystem (local, HDFS, S3) or data store (HBase, Cassandra, Elasticsearch, etc.)
    Stream processing platforms | spark.apache.org
  9. TIME SERIES DATASTORES
    • InfluxDB
    • OpenTSDB
    • KairosDB
    • Prometheus
    • see also iot-a.info
  10. OPENTSDB
    • Distributed time series database on top of HBase
    • Store, index, query & plot metrics
    • Extremely scalable
    • Low-level monitoring
    Time series datastores | opentsdb.net
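The "store, index, query" part of the OpenTSDB slide can be illustrated with a toy in-memory store. This is a sketch of the general data model (a metric name, a timestamp, a value, and key/value tags), not OpenTSDB's API or its HBase schema: `ToySeriesStore`, the metric `temp_c`, and the tag `device` are hypothetical names chosen for illustration.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class ToySeriesStore:
    """Toy time series store: one time-sorted list of points per (metric, tags) series."""

    def __init__(self):
        # (metric, sorted tag pairs) -> list of (timestamp, value), kept sorted by time
        self._series = defaultdict(list)

    def put(self, metric, timestamp, value, **tags):
        """Store one data point; out-of-order timestamps are inserted in place."""
        key = (metric, tuple(sorted(tags.items())))
        insort(self._series[key], (timestamp, value))

    def query(self, metric, start, end, **tags):
        """Return points with start <= timestamp <= end for one exact series."""
        points = self._series.get((metric, tuple(sorted(tags.items()))), [])
        lo = bisect_left(points, (start,))
        hi = bisect_right(points, (end, float("inf")))
        return points[lo:hi]


store = ToySeriesStore()
store.put("temp_c", 100, 21.5, device="s1")
store.put("temp_c", 160, 21.9, device="s1")
store.put("temp_c", 130, 21.7, device="s1")  # arrives late, still stored in time order
store.put("temp_c", 130, 30.0, device="s2")

window = store.query("temp_c", 100, 130, device="s1")
```

Keeping each series sorted by timestamp is what makes the range query a pair of binary searches instead of a full scan.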
  11. INFLUXDB
    • No-dependency time series database written in Go
    • SQLish query language (incl. regex, fan out)
    • Single-node or Raft-based distributed mode
    Time series datastores | influxdb.com
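A typical query against a time series store is an aggregate over fixed time windows, for example "mean value per minute". The sketch below shows that computation in plain Python; it does not use InfluxDB or its query language, and `mean_per_bucket` and the sample readings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_per_bucket(points, bucket_seconds):
    """Group (timestamp, value) points into fixed windows and average each window."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - (timestamp % bucket_seconds)
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up sample data: (seconds since start, temperature in °C)
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (119, 23.0), (120, 25.0)]
averages = mean_per_bucket(readings, bucket_seconds=60)
```

Each key in `averages` is the start of a 60-second window, so five raw readings collapse into three downsampled points.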
  12. LOCAL OS VS. DISTRIBUTED OS
    http://bitly.com/os-vs-dcos
  13. DCOS IS A DISTRIBUTED OPERATING SYSTEM
    • local OS per node (+container enabled)
    • scheduling (long-lived, batch)
    • networking
    • service discovery
    • stateful services
    • security
    • monitoring, logging, debugging
  14. DCOS BENEFITS
    • Run stateless services (nginx, a Java app server, etc.) and Big Data services (Spark, Kafka, Cassandra, etc.) together on one cluster
    • Dynamic partitioning of your cluster, depending on your needs (business requirements)
    • Increased utilization: from ca. 10% to 80%+
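What the utilization figures mean for cluster size can be worked out with simple arithmetic. In the sketch below only the 10% and 80% figures come from the slide; the workload (80 busy cores) and node size (8 cores) are made-up numbers, and `nodes_needed` is a hypothetical helper, not part of any DCOS or Mesos tooling.

```python
def nodes_needed(demand_cores, cores_per_node, utilization_pct):
    """Nodes required to serve `demand_cores` of work at a given average utilization.

    Uses integer ceiling division to avoid floating-point rounding surprises.
    """
    usable_per_node_x100 = cores_per_node * utilization_pct
    return -(-demand_cores * 100 // usable_per_node_x100)


# Assumed workload: 80 cores of actual work on 8-core nodes (illustrative numbers only).
before = nodes_needed(demand_cores=80, cores_per_node=8, utilization_pct=10)
after = nodes_needed(demand_cores=80, cores_per_node=8, utilization_pct=80)
```

Under these assumptions the same workload fits on 13 nodes instead of 100, which is the practical effect of sharing one cluster rather than statically partitioning it per service.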
  15. MESOSPHERE IS HIRING, WORLDWIDE …
    San Francisco | New York | Hamburg
    https://mesosphere.com/careers/
  16. Q & A
    • @mhausenblas
    • @mesosphere
    • mesosphere.com/infinity