Slide 1

Slide 1 text

IOT DATA PROCESSING & ANALYTICS 101
Michael Hausenblas, Developer & Cloud Advocate | 2015-11-03 | EclipseCon
© 2015 Mesosphere, Inc. All Rights Reserved.

Slide 2

Slide 2 text

WHY BOTHER?

Slide 3

Slide 3 text


Slide 4

Slide 4 text

AIRLINES

Slide 5

Slide 5 text

LOGISTICS

Slide 6

Slide 6 text

HEALTH CARE

Slide 7

Slide 7 text

TRADERS

Slide 8

Slide 8 text

FARMERS

Slide 9

Slide 9 text

CITIES
© 2014, Wired magazine

Slide 10

Slide 10 text

YOU

Slide 11

Slide 11 text

OVERALL FOCUS
• Devices
• IoT Gateways
• Networks
• Backend Systems
iot.eclipse.org

Slide 12

Slide 12 text

THE TOOLBOX

Slide 13

Slide 13 text

LET'S TALK ABOUT WORKLOADS* …
• batch
• streaming
• PaaS
• MapReduce
*) kudos to Timothy St. Clair, @timothysc

Slide 14

Slide 14 text

MESSAGE QUEUES & ROUTERS
• Kafka
• ØMQ, RabbitMQ, Disque (Redis-based), etc.
• fluentd, Logstash, Flume, etc.
• Akka streams
• cloud-only: AWS SQS, Google Cloud Pub/Sub
• see also queues.io

Slide 15

Slide 15 text

APACHE KAFKA
• High-throughput, distributed, persistent publish-subscribe messaging system
• Originates from LinkedIn
• Typically used as buffer/de-coupling layer in online stream processing
Message queues & routers | kafka.apache.org
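The buffer/de-coupling role can be illustrated with a toy, in-memory stand-in. This is a sketch of the idea only, not Kafka's client API: the class `ToyLog`, its methods, and the sensor readings are all hypothetical. It shows a persistent append-only log where each consumer group keeps its own read offset, so producers and consumers never wait on each other.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for one topic partition: an append-only list."""

    def __init__(self):
        self._records = []
        self._offsets = defaultdict(int)  # consumer group -> next offset to read

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread records for a group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = ToyLog()
for reading in ({"sensor": "s1", "temp": 21.5}, {"sensor": "s2", "temp": 19.0}):
    log.publish(reading)

# Two independent consumer groups read the same records at their own pace.
alerts = log.poll("alerting", max_records=1)
archive = log.poll("archiving")
```

Because records stay in the log after being read, a slow or restarted consumer can catch up later without the producer noticing.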

Slide 16

Slide 16 text

FLUENTD
Message queues & routers | www.fluentd.org

Slide 17

Slide 17 text

STREAM PROCESSING PLATFORMS
• Storm
• Spark
• Samza
• Flink
• Concord
• cloud-only: AWS Kinesis, Google Cloud Dataflow
• see also my webinar on stream processing

Slide 18

Slide 18 text

APACHE STORM
• Distributed, fault-tolerant stream-processing platform
• Guaranteed message processing (replaying messages on failure)
• Concepts: tuples, streams, spouts, bolts, topologies
Stream processing platforms | storm.apache.org
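The concepts listed above can be sketched with plain Python generators. This is an analogy under stated assumptions, not Storm's API: the function names and sensor values are hypothetical, and the sketch leaves out distribution and message replay. A spout emits a stream of tuples, bolts transform or aggregate that stream, and the topology is the wiring between them.

```python
from collections import Counter


def sensor_spout():
    """Spout: the source of a stream, here a fixed list of (device, celsius) tuples."""
    yield from [("dev-1", 21.0), ("dev-2", 35.5), ("dev-1", 36.2), ("dev-2", 20.1)]


def threshold_bolt(stream, limit=30.0):
    """Bolt: consume tuples and emit only the readings above the limit."""
    for device, celsius in stream:
        if celsius > limit:
            yield (device, celsius)


def count_bolt(stream):
    """Bolt: count how many tuples arrived per device."""
    return Counter(device for device, _ in stream)


# Topology: spout -> threshold bolt -> count bolt.
hot_counts = count_bolt(threshold_bolt(sensor_spout()))
```

In a real deployment each spout and bolt would run as parallel tasks across the cluster; the chaining of generators here only mirrors the data flow.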

Slide 19

Slide 19 text

APACHE SPARK
[Diagram: the Spark stack]
• Libraries: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph processing)
• Spark core (RDD)
• Cluster managers: Mesos, YARN, Standalone
• Storage: filesystem (local, HDFS, S3) or data store (HBase, Cassandra, Elasticsearch, etc.)
Stream processing platforms | spark.apache.org

Slide 20

Slide 20 text

TIME SERIES DATASTORES
• InfluxDB
• OpenTSDB
• KairosDB
• Prometheus
• see also iot-a.info

Slide 21

Slide 21 text

OPENTSDB
• Distributed time series database on top of HBase
• Store, index, query & plot metrics
• Extremely scalable
• Low-level monitoring
Time series datastores | opentsdb.net
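To make "store metrics" concrete, here is a minimal sketch that formats one data point as a line for OpenTSDB's telnet-style `put` command (metric, Unix timestamp, value, then `key=value` tags). The helper `opentsdb_put`, the metric name, and the tags are hypothetical; sending the line to a server is left out.

```python
def opentsdb_put(metric, timestamp, value, tags):
    """Format one data point as a telnet-style `put` line (hypothetical helper)."""
    if not tags:
        # OpenTSDB expects at least one tag per data point.
        raise ValueError("at least one tag is required")
    tag_str = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_str}"


line = opentsdb_put(
    "iot.temperature", 1446508800, 21.5, {"site": "hamburg", "device": "dev-1"}
)
print(line)
```

Tags such as `device` and `site` are what later allow a query to filter or group the same metric across many sensors.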

Slide 22

Slide 22 text

INFLUXDB
• Time series database written in Go, with no external dependencies
• SQLish query language (incl. regex, fan out)
• Single-node or Raft-based distributed mode
Time series datastores | influxdb.com
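As an illustration of how a reading could be written to InfluxDB, the sketch below formats a point in the line protocol (`measurement,tags fields timestamp`). It assumes float-only fields and tag values without spaces or commas, so it skips the protocol's escaping rules; the helper `influx_line` and the sample names are hypothetical.

```python
def influx_line(measurement, tags, fields, timestamp_ns):
    """Format one point in line protocol (hypothetical helper, no escaping)."""
    tag_part = "".join(f",{key}={val}" for key, val in sorted(tags.items()))
    # Assumption: every field is numeric and is written as a float.
    field_part = ",".join(f"{key}={float(val)}" for key, val in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


point = influx_line(
    "temperature", {"device": "dev-1"}, {"celsius": 21.5}, 1446508800000000000
)
print(point)
```

A matching query in the SQLish language might look like `SELECT mean(celsius) FROM temperature WHERE time > now() - 1h GROUP BY time(10m)`.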

Slide 23

Slide 23 text

MEET THE DATACENTER OPERATING SYSTEM (DCOS)

Slide 24

Slide 24 text

LOCAL OS VS. DISTRIBUTED OS
http://bitly.com/os-vs-dcos

Slide 25

Slide 25 text

DCOS IS A DISTRIBUTED OPERATING SYSTEM
• local OS per node (+container enabled)
• scheduling (long-lived, batch)
• networking
• service discovery
• stateful services
• security
• monitoring, logging, debugging

Slide 26

Slide 26 text


Slide 27

Slide 27 text


Slide 28

Slide 28 text

BENEFITS
• Run stateless services (nginx, Java app servers, etc.) and Big Data services (Spark, Kafka, Cassandra, etc.) together on one cluster
• Dynamic partitioning of your cluster, depending on your needs (business requirements)
• Increased utilization: ca. 10% → 80%+
DCOS
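A rough back-of-the-envelope calculation shows what the utilization figures could mean for cluster size. The workload (100 busy cores) and node size (16 cores) are made-up assumptions for illustration; only the 10% and 80% utilization levels come from the slide.

```python
import math


def nodes_needed(busy_cores, cores_per_node, utilization):
    """Nodes required so that `busy_cores` of real work fit at an average utilization."""
    return math.ceil(busy_cores / (cores_per_node * utilization))


# Hypothetical workload: 100 cores of actual work on 16-core nodes.
before = nodes_needed(busy_cores=100, cores_per_node=16, utilization=0.10)
after = nodes_needed(busy_cores=100, cores_per_node=16, utilization=0.80)
print(before, after)  # 63 8
```

Under these assumptions the same work fits on roughly one eighth of the machines once services share one cluster.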

Slide 29

Slide 29 text

DEMO TIME!

Slide 30

Slide 30 text

MESOSPHERE IS HIRING, WORLDWIDE …
• San Francisco
• New York
• Hamburg
https://mesosphere.com/careers/

Slide 31

Slide 31 text

Q & A
• @mhausenblas
• @mesosphere
• mesosphere.com/infinity