Heeren Sharma

Data collection and processing using AWS together with Docker to orchestrate different services and applications for data-centric solutions

Munich DataGeeks

April 30, 2015

Transcript

  1. Agenda
     • A little about AWS
     • Case Study - Data Collection and Processing
     • Docker (or rather, Fig) jumps in
     • Best of both worlds
  2. AWS

  3. Case Study
     • Data processing pipeline
     • In simple terms - a stream of incoming data that needs to be collected (intelligently) and processed further
     • Players - Data Providers, SQS, SNS, S3, EC2, EBS, Kinesis (optional, for very advanced use cases)
  4. How - Step By Step
     • SNS - Notification service -> Notifications get published here
     • SQS - Queuing service -> Subscribed to SNS notifications
     • S3 - Storage service -> SQS messages point to S3 files
     • EBS - Hard disk -> Stores data “locally” for processing
     • EC2 - Computing machine -> Data processing happens here
     • Store processed data in an S3 bucket + publish its path in an SNS notification
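
The hand-off in step 4 (an SQS message that carries an SNS notification, which in turn points to an S3 file) can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the talk: it assumes the queue is subscribed without raw message delivery, so the SQS body is the SNS JSON envelope with the published text in its `Message` field. The payload layout (`bucket`, `key`), the `processed/` prefix, and the names `S3Pointer`, `pointer_from_sqs_body` and `processed_notification` are hypothetical.

```python
import json
from typing import NamedTuple


class S3Pointer(NamedTuple):
    """Location of a data file on S3 (hypothetical helper type)."""
    bucket: str
    key: str


def pointer_from_sqs_body(body: str) -> S3Pointer:
    """Unwrap the SNS envelope found in an SQS message body."""
    envelope = json.loads(body)
    # Without raw message delivery, SNS wraps the published text in a JSON
    # envelope and places the original payload in the "Message" field.
    payload = json.loads(envelope["Message"])
    # Assumed payload layout: {"bucket": "...", "key": "..."}
    return S3Pointer(bucket=payload["bucket"], key=payload["key"])


def processed_notification(pointer: S3Pointer, prefix: str = "processed/") -> str:
    """Build the text to publish to SNS once the processed file is stored."""
    return json.dumps({"bucket": pointer.bucket, "key": prefix + pointer.key})
```

A worker on EC2 would receive the body from SQS, download the referenced file to its EBS volume, process it, upload the result, and publish the string returned by `processed_notification` so that downstream queues can pick it up.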
  5. Good Practices
     • Kinesis vs SQS - Kinesis wins. Why? Message ordering, low latency, handling of multiple consumers and producers
     • Nomenclature of S3 data files - timestamp based
     • Choose the resources based on the streaming data.
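
One possible reading of "timestamp based" file naming is sketched below. The key layout (`source/YYYY/MM/DD/<UTC timestamp>.<extension>`) and the function name `timestamped_key` are assumptions made for illustration; the slides do not specify a scheme. The idea is that zero-padded fields ordered from most to least significant make plain string order match chronological order.

```python
from datetime import datetime, timezone


def timestamped_key(source: str, when: datetime, extension: str = "json") -> str:
    """Return an S3 object key whose lexical order matches chronological order.

    `when` is expected to be a timezone-aware datetime.
    """
    utc = when.astimezone(timezone.utc)
    # Zero-padded, most-significant-first fields sort correctly as strings.
    return f"{source}/{utc:%Y/%m/%d}/{utc:%Y%m%dT%H%M%SZ}.{extension}"
```

With keys like these, listing a date prefix returns that day's files in arrival order, and a message only needs to carry the key to identify when the data was collected.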
  6. How to realize Docker
     • Start building containers and their respective Docker images
     • Various services can be coupled - but try to create as many independent services in containers as possible.
     • Push them to an S3 bucket or to your own Docker registry
     • Pull Docker images from the registry, build them, bring them up. :-)
  7. Resources
     • Awesome talk about data processing - https://www.youtube.com/watch?v=yO3SBU6vVKA
     • Docker - https://www.docker.com/
     • Docker Compose - https://docs.docker.com/compose/
     • Fig - http://www.fig.sh/ - Legacy
     • For Mac users - Boot2Docker - http://boot2docker.io/
     • For advanced data streaming - Amazon Kinesis - http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-sample-application.html
     • AWS - Google it, please!!
     • Email - [email protected]