Slide 1

Slide 1 text

DEVELOPING ELASTIC DATA PIPELINES
Michael Hausenblas, Developer & Cloud Advocate | 2016-03-14 | Webinar
© 2016 Mesosphere, Inc. All Rights Reserved.

Slide 2

Slide 2 text

MOTIVATION

Slide 3

Slide 3 text

AIRLINES

Slide 4

Slide 4 text

LOGISTICS

Slide 5

Slide 5 text

HEALTH CARE

Slide 6

Slide 6 text

TRADERS

Slide 7

Slide 7 text

CITIES
© 2014, Wired magazine

Slide 8

Slide 8 text

YOU

Slide 9

Slide 9 text

THE TOOLBOX

Slide 10

Slide 10 text

LET'S TALK ABOUT WORKLOADS* …
batch | streaming | PaaS | MapReduce
*) kudos to Timothy St. Clair, @timothysc

Slide 11

Slide 11 text

MESSAGE QUEUES & ROUTERS
• Apache Kafka
• ØMQ, RabbitMQ, Disque (Redis-based), etc.
• fluentd, Logstash, Flume
• Akka streams
• cloud-only: AWS SQS, Google Cloud Pub/Sub
• see also queues.io

Slide 12

Slide 12 text

APACHE KAFKA
Message queues & routers
• High-throughput, distributed, persistent publish-subscribe messaging system
• Originates from LinkedIn
• Typically used as a buffer/decoupling layer in online stream processing
kafka.apache.org
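To make the buffer/decoupling idea concrete, here is a minimal sketch of a publish-subscribe log in plain Python. It is a toy model only, not Kafka's API: the class `ToyLog`, the topic name, and the consumer group names are all made up for illustration, and messages live in memory rather than on disk.

```python
from collections import defaultdict


class ToyLog:
    """In-memory, append-only log per topic (a toy model, not Kafka's API)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> list of messages
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message and return its offset in the topic."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for a consumer group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


log = ToyLog()
for reading in (21.5, 21.7, 22.1):          # hypothetical sensor readings
    log.publish("sensor-readings", reading)

# Two independent consumer groups read the same retained messages at their own pace.
fast = log.poll("dashboard", "sensor-readings")
slow = log.poll("archiver", "sensor-readings", max_messages=2)
rest = log.poll("archiver", "sensor-readings")
```

Because the log keeps messages after they are read and tracks an offset per consumer group, a slow consumer does not hold back the producer or the other consumers, which is the decoupling the slide refers to.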

Slide 13

Slide 13 text

FLUENTD
Message queues & routers
www.fluentd.org

Slide 14

Slide 14 text

STREAM PROCESSING PLATFORMS
• Apache Storm
• Apache Spark
• Apache Samza
• Apache Flink
• Concord
• cloud-only: AWS Kinesis, Google Cloud Dataflow
• see also my webinar on stream processing

Slide 15

Slide 15 text

APACHE STORM
Stream processing platforms
• Distributed, fault-tolerant stream-processing platform
• Guaranteed message processing (replaying messages on failure)
• Concepts: tuples, streams, spouts, bolts, topologies
storm.apache.org
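The concepts on this slide can be sketched in a few lines of plain Python. This is a toy, single-process illustration and not Storm's API: the function `run_topology`, the two bolts, and the `failures` table are hypothetical, and the topology is assumed to be a simple linear chain. It shows a spout emitting tuples, bolts transforming them, and a failed tuple being replayed.

```python
from collections import deque


def run_topology(source, bolts, max_attempts=3):
    """Push each tuple from the spout through the bolts; replay it on failure."""
    pending = deque((tup, 1) for tup in source)   # spout: (tuple, attempt number)
    results, dropped = [], []
    while pending:
        tup, attempt = pending.popleft()
        try:
            value = tup
            for bolt in bolts:        # the topology here is a linear chain of bolts
                value = bolt(value)
            results.append(value)     # fully processed, i.e. acknowledged
        except Exception:
            if attempt < max_attempts:
                pending.append((tup, attempt + 1))   # failed: replay from the spout
            else:
                dropped.append(tup)   # give up after max_attempts
    return results, dropped


failures = {"b": 1}   # hypothetical: tuple "b" fails once, then succeeds


def flaky_upper(word):
    """Bolt that upper-cases a word but fails transiently for some inputs."""
    if failures.get(word, 0) > 0:
        failures[word] -= 1
        raise RuntimeError("transient failure")
    return word.upper()


def exclaim(word):
    """Bolt that appends an exclamation mark."""
    return word + "!"


results, dropped = run_topology(["a", "b", "c"], [flaky_upper, exclaim])
```

Note that the replayed tuple finishes after the others, so replay-based guarantees of this kind give at-least-once processing without preserving order.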

Slide 16

Slide 16 text

APACHE SPARK
Stream processing platforms
Spark SQL | Spark Streaming | MLlib (machine learning) | GraphX (graph processing)
Spark core (RDD)
Mesos | YARN | Standalone
Filesystem (local, HDFS, S3) or data store (HBase, Cassandra, Elasticsearch, etc.)
spark.apache.org

Slide 17

Slide 17 text

TIME SERIES DATASTORES
• InfluxDB
• OpenTSDB
• KairosDB
• Prometheus
• see also iot-a.info

Slide 18

Slide 18 text

OPENTSDB
Time series datastores
• Distributed time series database on top of HBase
• Store, index, query & plot metrics
• Extremely scalable
• Low-level monitoring
opentsdb.net
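As a rough illustration of "store, index, query" for metrics, the following sketch keeps data points in memory, indexed by metric name and filtered by tags. It is a toy model, not OpenTSDB's API or storage layout: the class `ToyTSDB`, the metric name `sys.cpu.user`, and the host tags are made up for the example.

```python
from collections import defaultdict


class ToyTSDB:
    """Toy metric store: points are indexed by metric name and filtered by tags."""

    def __init__(self):
        self._series = defaultdict(list)   # metric -> [(timestamp, value, tags)]

    def put(self, metric, timestamp, value, **tags):
        """Store one data point with its tags."""
        self._series[metric].append((timestamp, value, tags))

    def query(self, metric, start, end, **tags):
        """Return (timestamp, value) pairs in [start, end] whose tags match."""
        return sorted(
            (ts, value)
            for ts, value, point_tags in self._series[metric]
            if start <= ts <= end
            and all(point_tags.get(k) == v for k, v in tags.items())
        )


db = ToyTSDB()
db.put("sys.cpu.user", 1457913600, 42.5, host="web01")
db.put("sys.cpu.user", 1457913660, 40.0, host="web01")
db.put("sys.cpu.user", 1457913600, 12.0, host="web02")

# Query one host over a time range, then aggregate the values.
web01 = db.query("sys.cpu.user", 1457913600, 1457913700, host="web01")
average = sum(value for _, value in web01) / len(web01)
```

A real time series database adds what this sketch leaves out: persistence, distribution across nodes, and downsampling or aggregation inside the query itself.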

Slide 19

Slide 19 text

INFLUXDB
Time series datastores
• Time series database written in Go, with no external dependencies
• SQLish query language (incl. regex, fan out)
• Single-node or Raft-based distributed mode
influxdb.com

Slide 20

Slide 20 text

CHALLENGES
The Toolbox
• Distributed systems are hard
• Setup and operation of components
• One (static) cluster per component
• Efficient usage of cluster resources (TCO)

Slide 21

Slide 21 text

MEET THE DATACENTER OPERATING SYSTEM

Slide 22

Slide 22 text


Slide 23

Slide 23 text

BENEFITS
DCOS
• Run stateless services (Web server, app server, etc.) and Big Data services like Kafka, Spark, or Cassandra together on one cluster
• Dynamic partitioning of your cluster, depending on your business requirements
• Increased utilization (10% → 80%++)
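The utilization argument can be checked with simple arithmetic. The sketch below compares one static cluster per service against a single shared cluster, using invented demand figures: the service names and core counts are assumptions chosen for illustration and do not reproduce the 10% → 80% figure from the slide.

```python
# Hypothetical hourly demand (in cores) for three services with different peaks.
demand = {
    "web":   [40, 10, 10, 10],
    "spark": [10, 40, 10, 10],
    "kafka": [10, 10, 40, 10],
}

hours = len(next(iter(demand.values())))
used = sum(sum(series) for series in demand.values())   # core-hours actually used

# Static partitioning: each service gets its own cluster sized for its own peak.
static_capacity = sum(max(series) for series in demand.values())

# Shared cluster: sized for the peak of the combined demand.
combined = [sum(series[h] for series in demand.values()) for h in range(hours)]
shared_capacity = max(combined)

static_utilization = used / (static_capacity * hours)
shared_utilization = used / (shared_capacity * hours)
```

With these numbers the static setup needs 120 cores and the shared cluster 60, because the services peak at different times. The same work therefore runs at twice the utilization on the shared cluster.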

Slide 24

Slide 24 text

INFINITY
DCOS
https://mesosphere.com/infinity/

Slide 25

Slide 25 text

ELASTIC DATA PIPELINES BY EXAMPLE

Slide 26

Slide 26 text

A SIMPLE DATA PIPELINE
Examples
https://github.com/mesosphere/cassandra-kairosdb-tutorial

Slide 27

Slide 27 text

HYBRID DATA PIPELINE
Examples
https://mesosphere.com/blog/2015/11/18/dcos-time-series-demo

Slide 28

Slide 28 text

HANDS-ON …
Examples

Slide 29

Slide 29 text

Q & A
• @mhausenblas
• mhausenblas.info
• @mesosphere
• mesosphere.com