Phoenix Data Conference 2014 - Shekhar Vemuri

Big Data in the Cloud

teamclairvoyant

October 25, 2014

Transcript

  1. ABOUT • PRINCIPAL at CLAIRVOYANT • PRODUCT, DATA, ANALYTICS and CLOUD • large-scale web and data systems • simple, lightweight solutions
  2. QUICK POLL • HADOOP, HIVE, PIG • PUBLIC CLOUD, IaaS, SaaS • AMAZON AWS, EC2 • ELASTICITY • S3, EMR, KINESIS • IoT
  3. USE CASES • RISK MODELING • PERSONALIZED MEDICINE • AD TARGETING • INTERNET OF THINGS • THREAT ANALYSIS • RECOMMENDATIONS • SURVEILLANCE • RETENTION • 360 CUSTOMER VIEW
  4. DRIVING FACTORS • variety in data • not just transactional data • potential for tremendous insight when combining transactional data with additional data sources • LinkedIn, Twitter, Facebook, Pinterest, Open Data • Internet of Things
  5. the CLOUD • IaaS, SaaS • on-demand subscription • subscription vs owning • tradeoff • ease of adoption • powering next-gen entrepreneurship
  6. AMAZON EMR • AMAZON S3 • AMAZON EC2 • LOG FILES • REST CLIENTS • WEB APP, REST APIs • AMAZON REDSHIFT • LOG FILES - STORED in S3 • MAP-REDUCE, HIVE, PIG, CASCADING jobs • STORE summarized data
  7. AMAZON EMR • AMAZON S3 • AMAZON EC2 • LOG FILES • REST CLIENTS • WEB APP, REST APIs • LOG FILES - STORED in S3 • MAP-REDUCE, HIVE, PIG, CASCADING jobs • CLOUDERA IMPALA
  8. BUILDING BLOCKS • Amazon AWS • Amazon EMR • Amazon S3 • Kinesis • Redshift • spot instances
  10. PROS • like other cloud solutions, reduces the barrier to adoption • especially if you are already in the cloud • can provide the ability to implement quick POCs
  12. CONS • depending on your current infrastructure, you may end up continually replicating data • data security, privacy
  13. LEARNINGS • build platforms once the need is strongly felt • prepare to fail fast a couple of times before the final version • what you think will happen will not
  14. LEARNINGS • COSTS can spiral out of control • leverage spot instances to reduce costs, especially for bursty workloads • large workloads that read from S3 can be very slow to initialize and run • especially in recovery scenarios • but data resiliency is not an issue
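
The architecture slides (6 and 7) describe raw log files landing in S3, batch jobs (MapReduce, Hive, Pig, Cascading) summarizing them, and the summarized data being stored for querying. The following is a minimal, local sketch of that map-then-reduce summarization step only. It makes no AWS calls, and the log format, sample data, and function names (`map_line`, `reduce_counts`, `summarize`) are hypothetical illustrations, not taken from the talk.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical log format (an assumption, not from the talk):
#   "<ISO timestamp> <HTTP status> <request path>"
SAMPLE_LOG = [
    "2014-10-25T10:00:01 200 /api/users",
    "2014-10-25T10:00:02 500 /api/orders",
    "2014-10-25T10:00:03 200 /api/users",
    "malformed line",
]

Key = Tuple[str, str]  # (request path, HTTP status)


def map_line(line: str) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((path, status), 1) for each well-formed log line."""
    parts = line.split()
    if len(parts) != 3:
        return  # skip lines that do not match the assumed format
    _timestamp, status, path = parts
    yield (path, status), 1


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the emitted values per key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def summarize(lines: Iterable[str]) -> Dict[Key, int]:
    """Run the map step over every line, then reduce the results."""
    mapped = (pair for line in lines for pair in map_line(line))
    return reduce_counts(mapped)


if __name__ == "__main__":
    # Print one summary row per (path, status) pair.
    for (path, status), count in sorted(summarize(SAMPLE_LOG).items()):
        print(path, status, count)
```

In a deployment like the one on the slides, the input lines would presumably be read from S3 by the cluster, and the small summarized output is what would be loaded into a query store; the sketch only shows why the stored result is much smaller than the raw logs.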