
Lean Enterprise with Microservices and Big Data

Johann Romefort

December 10, 2014

Transcript

  1. My Background • Seesmic - Co-founder & CTO: video conversation platform, social media clients… lots of pivots :) • Rainbow - Co-founder & CTO: Enterprise App Store
  2. Goal of this presentation • Understand what the Lean Enterprise is and how it relates to big data and to the software architecture you build • Get a basic understanding of the technologies and tools involved
  3. What is the Lean Enterprise? http://en.wikipedia.org/wiki/Lean_enterprise “Lean enterprise is a practice focused on value creation for the end customer with minimal waste and processes.”
  4. Enabling the OODA Loop: OBSERVE → ORIENT → DECIDE → ACT. USAF Colonel John Boyd on combat: “Get inside your adversaries' OODA loop to disorient them”
  5. OODA Loop • Observe (Innovation) and Decide (Culture) are mainly human-based • Orient (Big Data) and Act (Cloud) can be automated
  6. What is Big Data? It’s data at the intersection of the 3 V’s: • Velocity (batch / real-time / streaming) • Volume (terabytes/petabytes) • Variety (structured/semi-structured/unstructured)
  7. Why is everybody talking about it? • The cost of generating data has gone down • By 2015, 3B people will be online, pushing the volume of data created to 8 zettabytes • More data = more insights = better decisions • Processing keeps getting easier and cheaper thanks to cloud platforms
  8. Data flow and constraints: Generate → Ingest/Store → Process → Visualize/Share. The 3 V’s introduce heterogeneity and make each of these steps hard to achieve.
  9. What is AWS? • AWS is a cloud computing platform • On-demand delivery of IT resources • Pay-as-you-go pricing model
  10. Cloud Computing = Storage + Compute + Networking. Adapts dynamically to ever-changing needs, closely tracking users’ infrastructure and application requirements.
  11. How does AWS help with Big Data? • Removes constraints on the ingestion, storage, and processing layers and adapts closely to demand • Provides a collection of integrated tools that adapt to the 3 V’s of Big Data • Virtually unlimited storage capacity and processing power fit changing data storage and analysis requirements
  12. Computing Solutions for Big Data on AWS: EC2. All-purpose computing instances. Dynamic provisioning and resizing let you scale your infrastructure at low cost. Use Case: Well suited for running custom or proprietary applications (e.g., SAP HANA, Tableau…)
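
    To make this concrete, a minimal sketch of provisioning such an instance with boto3, the AWS SDK for Python; the AMI ID, instance type, and key pair name are placeholder assumptions rather than values from the talk:

      import boto3

      # Start one general-purpose instance; all values are placeholders.
      ec2 = boto3.client("ec2", region_name="us-east-1")
      response = ec2.run_instances(
          ImageId="ami-12345678",   # hypothetical stock Linux AMI
          InstanceType="m3.large",  # all-purpose instance class
          MinCount=1,
          MaxCount=1,
          KeyName="my-key-pair",    # assumed pre-existing SSH key pair
      )
      print(response["Instances"][0]["InstanceId"])
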
  13. Computing Solutions for Big Data on AWS: EMR. ‘Hadoop in the cloud’. Adapts to the complexity of the analysis and the volume of data to process. Use Case: Offline processing of very large volumes of data, possibly unstructured (the Variety variable)
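
    A hedged sketch of launching a transient EMR cluster with boto3; the job name, EMR release, and instance sizing below are assumptions for illustration:

      import boto3

      emr = boto3.client("emr", region_name="us-east-1")

      # Launch a transient Hadoop cluster that shuts down when its steps finish.
      cluster = emr.run_job_flow(
          Name="nightly-log-analysis",  # placeholder job name
          ReleaseLabel="emr-4.2.0",
          Applications=[{"Name": "Hadoop"}],
          Instances={
              "MasterInstanceType": "m3.xlarge",
              "SlaveInstanceType": "m3.xlarge",
              "InstanceCount": 4,
              "KeepJobFlowAliveWhenNoSteps": False,
          },
          JobFlowRole="EMR_EC2_DefaultRole",
          ServiceRole="EMR_DefaultRole",
      )
      print(cluster["JobFlowId"])
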
  14. Computing Solutions for Big Data on AWS: Kinesis. Stream processing of real-time data. Scales to adapt to the flow of inbound data. Use Case: Complex event processing, click streams, sensor data, computation over windows of time
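
    A minimal producer sketch with boto3, assuming a pre-created stream named sensor-events; the record shape and partition key choice are illustrative:

      import json
      import boto3

      kinesis = boto3.client("kinesis", region_name="us-east-1")

      # Push one event into the stream; the partition key decides the shard,
      # so keying by sensor keeps each sensor's readings ordered.
      record = {"sensor_id": "s-42", "temperature": 71.3}
      kinesis.put_record(
          StreamName="sensor-events",  # assumed existing stream
          Data=json.dumps(record),
          PartitionKey=record["sensor_id"],
      )
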
  15. Computing Solutions for Big Data on AWS: Redshift. Data warehouse in the cloud. Scales to petabytes. Supports SQL querying. Start small for just $0.25/h. Use Case: BI analysis; use of legacy ODBC/JDBC software to analyze or visualize data
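
    Since Redshift speaks the PostgreSQL wire protocol, ordinary drivers work against it; a sketch using psycopg2, where the cluster endpoint, credentials, and the readings table are all hypothetical:

      import psycopg2

      # Connect to the cluster endpoint on Redshift's default port 5439.
      conn = psycopg2.connect(
          host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
          port=5439,
          dbname="analytics",
          user="bi_user",
          password="REDACTED",
      )
      cur = conn.cursor()
      # Plain SQL: average temperature per sensor over the last hour.
      cur.execute("""
          SELECT sensor_id, AVG(temperature)
          FROM readings
          WHERE recorded_at > dateadd(hour, -1, getdate())
          GROUP BY sensor_id
      """)
      for sensor_id, avg_temp in cur.fetchall():
          print(sensor_id, avg_temp)
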
  16. Storage Solutions for Big Data on AWS: DynamoDB. NoSQL database. Consistent low-latency access. Column-based flexible data model. Use Case: Applications requiring consistent low-latency reads and writes at scale
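
    A small sketch of that access pattern with boto3, assuming a hypothetical SensorReadings table with sensor_id as hash key and timestamp as range key:

      from decimal import Decimal
      import boto3

      dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
      table = dynamodb.Table("SensorReadings")  # hypothetical table

      # Write one reading; numeric attributes are passed as Decimal.
      table.put_item(Item={
          "sensor_id": "s-42",
          "timestamp": 1418169600,
          "temperature": Decimal("71.3"),
      })

      # Low-latency key lookup, optionally strongly consistent.
      item = table.get_item(
          Key={"sensor_id": "s-42", "timestamp": 1418169600},
          ConsistentRead=True,
      )
      print(item.get("Item"))
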
  17. Storage Solutions for Big Data on AWS: S3. Versatile storage system. Low-cost. Fast retrieval of data. Use Case: Backups and disaster recovery, media storage, storage for data analysis
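
    A minimal boto3 sketch of storing and retrieving an object; the bucket name and key layout are assumptions:

      import boto3

      s3 = boto3.client("s3")

      # Store raw data for later analysis (bucket is a placeholder).
      s3.put_object(
          Bucket="my-data-bucket",
          Key="raw/2014/12/10/sensors.json",
          Body=b'{"sensor_id": "s-42", "temperature": 71.3}',
      )

      # Retrieve it back.
      obj = s3.get_object(Bucket="my-data-bucket", Key="raw/2014/12/10/sensors.json")
      print(obj["Body"].read())
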
  18. Storage Solutions for Big Data on AWS: Glacier. Archive storage for cold data. Extremely low-cost, optimized for infrequently accessed data. Magnetic tape replacement. Use Case: Storing raw data logs, storing media archives
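
    A sketch of archiving a log bundle with boto3; the vault name and file are hypothetical, and "-" tells the API to use the caller's own account ID:

      import boto3

      glacier = boto3.client("glacier", region_name="us-east-1")

      # Upload a cold archive; retrieval is slow and cheap by design.
      with open("raw-logs-2014-12.tar.gz", "rb") as archive:
          result = glacier.upload_archive(
              vaultName="cold-logs",  # assumed existing vault
              accountId="-",          # "-" means the caller's account
              archiveDescription="Raw sensor logs, December 2014",
              body=archive,
          )
      print(result["archiveId"])
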
  19. Integrated Environment for Big Data. Given the 3 V’s, a collection of tools is most of the time needed for your data processing and storage. AWS Big Data solutions come already integrated with each other, and they also integrate with the whole AWS ecosystem (Security, Identity Management, Logging, Backups, Management Console…)
  20. Tightly integrated, rich environment of tools + on-demand scaling that sticks to processing requirements = an extremely cost-effective and easy-to-deploy solution for big data needs
  21. Use Case: Real-time IoT Analytics. Gathering data in real time from sensors deployed in a factory and sending it for immediate processing • Error detection: real-time detection of hardware problems • Optimization and energy management
  22. First version of the infrastructure: sensor data is aggregated and fed to a Node.js stream processor on the customer site, which evaluates rules over a time window; raw data is written to an in-house Hadoop cluster for further processing, and MongoDB feeds the algorithm and holds backups
  23. Version of the infrastructure ported to AWS: sensor data is aggregated on the customer site and pushed to Kinesis; rules are evaluated over a time window, raw data is written to Glacier for archiving, and Redshift serves BI analysis
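
    To illustrate how the AWS version of this pipeline might hang together, a hedged end-to-end sketch: read sensor records from a Kinesis shard, apply a simple rule over the incoming window, and flush raw batches to an archive bucket (an S3 lifecycle rule could then transition them to Glacier). The stream, bucket, and temperature rule are all placeholder assumptions:

      import json
      import time
      import boto3

      kinesis = boto3.client("kinesis", region_name="us-east-1")
      s3 = boto3.client("s3")

      # Read from the first shard of the stream, starting at the newest record.
      stream = "sensor-events"
      shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
      iterator = kinesis.get_shard_iterator(
          StreamName=stream,
          ShardId=shard_id,
          ShardIteratorType="LATEST",
      )["ShardIterator"]

      window = []
      while True:
          out = kinesis.get_records(ShardIterator=iterator, Limit=100)
          iterator = out["NextShardIterator"]
          for rec in out["Records"]:
              reading = json.loads(rec["Data"])
              window.append(reading)
              if reading["temperature"] > 90.0:  # placeholder alerting rule
                  print("ALERT:", reading)
          if window:
              # Flush raw data for archiving; a lifecycle rule can move it to Glacier.
              s3.put_object(
                  Bucket="sensor-archive",  # placeholder bucket
                  Key="raw/%d.json" % int(time.time()),
                  Body=json.dumps(window),
              )
              window = []
          time.sleep(1)
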
  24. ACT

  25. First year @seesmic • Prototype becomes production • Monolithic architecture • No analytics/metrics • Little monitoring • Little automated testing
  26. Impact on dev team • Frustration with the slow release process • Lots of back and forth due to bugs and the need to re-test the app in full each time • Chain of command too long • Feeling of having no power in the process • Low trust
  27. Impact on product team • Frustration at not executing fast enough • Frustration at having to ask for everything (like metrics) • Feeling that engineers always have the last word
  28. What can you do? • Break down software into smaller autonomous units • Break down teams into smaller autonomous units • Automate and build tooling: CI / CD • Plan for the worst
  29. Amazon’s “two-pizza teams” • 6 to 10 people: you can feed them with two pizzas • It’s not about size, but about accountability and autonomy • Each team has its own fitness function
  30. Friction points • Full devops model: good tooling needed • Services still need to be designed for resiliency • Harder to test
  31. Continuous Integration (CI) is the software engineering practice of merging all developers’ working copies into a shared mainline several times a day
  32. Tools for Continuous Integration • Jenkins (open source, lots of plugins, hard to configure) • Travis CI (looks better, fewer plugins)
  33. Tools for Continuous Deployment • GoCD (open source) • shippable.com (SaaS, Docker support) • CodeDeploy (AWS) + Puppet, Chef, Ansible, Salt, Docker…
  34. Impact on dev • Autonomy • Not afraid to try new things • More confident in the codebase • No need to live with old bugs until the next big release
  35. Impact on product team • Iterate faster on features • Can make, bake, and break hypotheses faster • The product improves incrementally every day
  36. Docker • Enables a microservices architecture • Enables better testing • Enables the devops model • Come talk to the Docker team tomorrow!