Elastic Data Pipelines with DC/OS on Microsoft Azure

BigData.be meetup

Michael Hausenblas

May 17, 2016

Transcript

  1. © 2016 Mesosphere, Inc. All Rights Reserved.

    ELASTIC DATA PIPELINES WITH DC/OS ON AZURE
    Michael Hausenblas | 2016-05-17 | 37th BigData.be Meetup, Brussels
  2. sys admin, devops, developer, architect, data engineer, data scientist
  3. LET'S TALK ABOUT WORKLOADS* …

    batch, streaming, PaaS, MapReduce
    *) kudos to Timothy St. Clair, @timothysc
  4. MESSAGE QUEUES & ROUTERS

    • Apache Kafka
    • ØMQ, RabbitMQ, Disque (Redis-based), etc.
    • fluentd, Logstash, Flume
    • Akka streams
    • cloud-only: AWS SQS, Google Cloud Pub/Sub
    • see also queues.io
  5. STREAM PROCESSING PLATFORMS

    • Apache Storm
    • Apache Spark
    • Apache Samza
    • Apache Flink
    • Concord
    • cloud-only: AWS Kinesis, Google Cloud Dataflow
    • see also my webinar on stream processing
  6. TIME SERIES DATASTORES

    • InfluxDB
    • OpenTSDB
    • KairosDB
    • Prometheus
    • see also iot-a.info
  7. CHALLENGES

    • Set up and operation of components
    • Elasticity: static vs. dynamic partitioning
    • Efficient usage of resources (utilization/TCO)
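
The contrast between static and dynamic partitioning on the CHALLENGES slide can be made concrete with a toy calculation. The sketch below is not from the talk: the workload names, the per-slot node demand, and the helper functions are all invented for illustration. It compares the utilization of two fixed per-workload slices with that of one shared pool sized for the combined peak.

```python
# Toy model, all numbers hypothetical: node demand of two workloads
# over four consecutive time slots.
demand = {
    "batch":     [1, 5, 2, 1],
    "streaming": [4, 1, 5, 2],
}


def static_utilization(demand, nodes_per_workload):
    """Static partitioning: each workload owns a fixed slice of the cluster.

    Demand above the slice is simply not served in this model.
    """
    slots = len(next(iter(demand.values())))
    used = sum(
        min(d, nodes_per_workload)
        for series in demand.values()
        for d in series
    )
    capacity = nodes_per_workload * len(demand) * slots
    return used / capacity


def dynamic_nodes_needed(demand):
    """Dynamic partitioning: size one shared pool for the peak summed demand."""
    return max(sum(per_slot) for per_slot in zip(*demand.values()))


def dynamic_utilization(demand):
    """Utilization of the shared pool sized by dynamic_nodes_needed()."""
    per_slot_totals = [sum(per_slot) for per_slot in zip(*demand.values())]
    capacity = dynamic_nodes_needed(demand) * len(per_slot_totals)
    return sum(per_slot_totals) / capacity


if __name__ == "__main__":
    # Static slices must cover each workload's own peak (5 nodes each).
    print("static :", static_utilization(demand, nodes_per_workload=5))
    print("dynamic:", dynamic_utilization(demand),
          "on", dynamic_nodes_needed(demand), "nodes")
```

In this made-up scenario the two peaks do not coincide, so the shared pool gets by with fewer nodes than the two static slices combined, and a larger share of its capacity is in use.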
  8. DISTRIBUTED APPLICATION

    (diagram: seven hardware/OS stacks and an app)
  9. DISTRIBUTED OS + DISTRIBUTED APP

    (diagram: seven hardware/OS stacks, a distributed OS, and an app)
  10. LINUX CONTAINERS

    The why and the what:
    • Containers vs VMs
    • app-level dependency management
    • lightweight (startup time, footprint, average runtime)
    • isolation & security
  11. LINUX CONTAINERS

    • namespaces
      • Isolate PIDs between processes
      • Isolate process to network resources
      • Isolate the hostname to fake it out (UTS)
      • Isolate the filesystem mount points (chroot)
      • Isolate inter process communication (IPC)
      • Isolate specific users to specific processes
    • cgroups
      https://sysadmincasts.com/episodes/14-introduction-to-linux-control-groups-cgroups
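
On a Linux host, the namespaces and cgroups listed on this slide are visible under `/proc`. The following is a minimal sketch, not part of the talk, that reads them for the current process. The function names are made up for this example, and the code assumes `/proc` is mounted; on other systems `list_namespaces` returns an empty dictionary.

```python
import os


def list_namespaces(pid="self"):
    """Map each namespace type (e.g. 'pid', 'net', 'uts') to its kernel ID.

    Every entry in /proc/<pid>/ns is a symlink such as 'pid:[4026531836]';
    two processes share a namespace when these link targets are equal.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):  # not Linux, or /proc is not mounted
        return {}
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or is not readable
    return namespaces


def parse_cgroup_line(line):
    """Parse one line of /proc/<pid>/cgroup.

    The format is 'hierarchy-ID:controller-list:cgroup-path'; on cgroup v2
    the controller list is empty, for example '0::/user.slice'.
    """
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    return {
        "hierarchy": int(hierarchy_id),
        "controllers": controllers.split(",") if controllers else [],
        "path": path,
    }


if __name__ == "__main__":
    for ns_type, ns_id in list_namespaces().items():
        print(ns_type, "->", ns_id)
    if os.path.exists("/proc/self/cgroup"):
        with open("/proc/self/cgroup") as fh:
            for raw in fh:
                print(parse_cgroup_line(raw))
```

Running this once on the host and once inside a container should show different namespace IDs and cgroup paths, which is the isolation the slide refers to.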
  12. DC/OS BENEFITS

    • One cluster for
      • stateless services such as Web servers & app servers (via Marathon)
      • stateful services like PostgreSQL, MemSQL, Kafka, Cassandra, etc.
      • elastic data processing via Spark, Akka, etc.
      • CI/CD, for example Jenkins+Marathon
    • Dynamic partitioning of your cluster, depending on your needs
    • Increased utilization (10% → 80%+)
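
To give a feel for running a stateless service via Marathon, here is a hedged sketch of an app definition built in Python. It is not taken from the talk: the app ID `/web`, the command, and the resource figures are hypothetical, and the helper functions are invented for this example. The keys `id`, `cmd`, `instances`, `cpus` and `mem` are the basic fields of a Marathon app definition, which is typically submitted as JSON to Marathon's `/v2/apps` REST endpoint.

```python
import json


def marathon_app(app_id, cmd, instances=1, cpus=0.1, mem=64):
    """Build a minimal Marathon app definition as a plain dict.

    cpus is a fractional CPU share and mem is in MiB.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
    }


def scaled(app, instances):
    """Return a copy of the definition with a different instance count."""
    return {**app, "instances": instances}


if __name__ == "__main__":
    # Hypothetical stateless web server, started with two instances.
    web = marathon_app("/web", "python3 -m http.server 8080", instances=2)
    print(json.dumps(web, indent=2))

    # Scaling out means asking for more instances of the same definition;
    # the scheduler decides where in the cluster they run.
    print(json.dumps(scaled(web, 5), indent=2))
```

The sketch only prints the JSON documents; sending them to a cluster is left out on purpose, since the Marathon address and authentication depend on the installation.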
  13. A SLIGHTLY MORE COMPLEX EXAMPLE

    mesosphere.com/blog/2015/11/18/dcos-time-series-demo/
  14. WHERE CAN I LEARN MORE?

    http://shop.oreilly.com/product/9781939902184.do
    http://shop.oreilly.com/product/0636920035671.do
  15. WHERE CAN I LEARN MORE?

    https://www.nginx.com/resources/library/docker-networking/
  16. WHERE CAN I LEARN MORE?

    http://shop.oreilly.com/product/0636920039952.do
    https://manning.com/books/mesos-in-action
  17. Q & A

    • @mhausenblas
    • mhausenblas.info
    • [email protected]
    https://dcos.io