Google-scale computing for the people

Presented at the DevOps Master Class NYC with Opbeat & DigitalOcean in November 2014

Joonas Bergius

November 17, 2014

Transcript

  1. A little bit of background
    • Lessons learned from building services at DigitalOcean
    • Observations from across the industry:
      • What are the most innovative and successful companies doing?
      • How can we leverage what they are doing to enable ourselves to innovate faster?
    @prometheus
  2. Challenges you may be familiar with:
    • Snowflakes in your infrastructure
    • Hardware & software failures
    • Scaling services is hard
  3. Snowflakes in your infrastructure
    • We all have (or had) these.
    • Easily forgotten, until the one business-critical thing they were responsible for does not happen.
    • When it comes to operations, they are finicky at best.
  4. Hardware & software failures
    Disk drives, for example, can exhibit annualized failure rates higher than 4%. Different deployments have reported between 1.2 and 16 average server-level restarts per year.
    Source: Barroso and Hölzle, “The Datacenter as a Computer”
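To make the rates on this slide concrete, here is a rough back-of-the-envelope sketch. The fleet sizes (1,000 disks, 500 servers) are hypothetical and not from the talk, and the 4% figure is used as a point estimate even though the slide only says "higher than 4%".

```python
def expected_annual_events(fleet_size: int, annual_rate: float) -> float:
    """Expected number of events per year across a fleet.

    annual_rate is the average number of events per unit per year
    (0.04 for a 4% annualized disk failure rate, 1.2 for 1.2 restarts).
    """
    return fleet_size * annual_rate


# Hypothetical fleet sizes, chosen only for illustration.
DISKS = 1000
SERVERS = 500

disk_failures = expected_annual_events(DISKS, 0.04)
restarts_low = expected_annual_events(SERVERS, 1.2)
restarts_high = expected_annual_events(SERVERS, 16)

print(f"Expected disk failures per year: {disk_failures:.0f}")
print(f"Expected server restarts per year: {restarts_low:.0f} to {restarts_high:.0f}")
print(f"Expected server restarts per day: {restarts_low / 365:.1f} to {restarts_high / 365:.1f}")
```

Even at this modest, assumed scale, failures stop being rare events and become part of daily operations.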
  5. Scaling services is hard
    • You start with a monolith deployed to a single server
    • Over time you break things down into different components
    • Coordinating them becomes increasingly involved
    • Is this really the business you are in?
  6. “The Datacenter as a Computer”
    [Resource Management] is perhaps the most indispensable component of the cluster-level infrastructure layer. It controls the mapping of user tasks to hardware resources, enforces priorities and quotas, and provides basic task management services. A more useful version should present a higher level of abstraction, automate allocation of resources, and allow resource sharing at a finer level of granularity.
    Source: Barroso and Hölzle, “The Datacenter as a Computer”
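The responsibilities named in this quote (mapping tasks to hardware, enforcing priorities and quotas) can be illustrated with a toy sketch. This is not how Mesos or any real cluster manager works; `Machine`, `Task`, `place`, the first-fit strategy, and all the numbers are hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpus: int
    mem_mb: int


@dataclass
class Task:
    name: str
    user: str
    cpus: int
    mem_mb: int
    priority: int = 0  # higher priority is placed first


def place(tasks, machines, cpu_quota):
    """First-fit placement with per-user CPU quotas (illustrative only).

    Returns (placements, rejected): a dict of task name -> machine name,
    and a list of task names that could not be placed.
    """
    free = {m.name: [m.cpus, m.mem_mb] for m in machines}
    used_by_user = {}
    placements, rejected = {}, []

    # Priorities: consider the most important tasks first.
    for task in sorted(tasks, key=lambda t: -t.priority):
        # Quotas: refuse tasks that would push a user over their CPU limit.
        used = used_by_user.get(task.user, 0)
        if used + task.cpus > cpu_quota.get(task.user, 0):
            rejected.append(task.name)
            continue
        # Mapping: pick the first machine with enough free CPU and memory.
        for machine in machines:
            cpus, mem = free[machine.name]
            if cpus >= task.cpus and mem >= task.mem_mb:
                free[machine.name] = [cpus - task.cpus, mem - task.mem_mb]
                used_by_user[task.user] = used + task.cpus
                placements[task.name] = machine.name
                break
        else:
            rejected.append(task.name)  # no machine had room
    return placements, rejected


machines = [Machine("node-a", 4, 8192), Machine("node-b", 4, 8192)]
tasks = [
    Task("web", "alice", cpus=3, mem_mb=2048, priority=10),
    Task("batch", "bob", cpus=4, mem_mb=4096, priority=1),
    Task("cache", "alice", cpus=2, mem_mb=4096, priority=5),
]
placements, rejected = place(tasks, machines, cpu_quota={"alice": 6, "bob": 2})
print(placements)
print(rejected)
```

Note that the tasks only state what they need, never which machine they want. The placement decision belongs entirely to the resource manager.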
  7. Shifting your mental model
    • Think in services instead of servers
      • Do you care about where it runs or that it just runs?
    • Think in resources instead of servers
      • Do you care about where it runs or what it needs to run?
  8. What is Mesos?
    Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively. Mesos runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments.
  9. “What I learned from my time at Netflix” (@adrianco)
    • Speed wins in the marketplace
    • Remove friction from product development
    • High trust, low process, no hand-offs between teams
    • Freedom and responsibility culture
    • Don’t do your own undifferentiated heavy lifting
    • Use simple patterns automated by tooling
    • Self service makes impossible things instant
  10. Resources to get you started:
    • For further information about Mesos:
      • https://mesos.apache.org
    • For tutorials, tools & developer resources:
      • https://mesosphere.io/learn/
    • For deploying Mesosphere on DigitalOcean:
      • https://digitalocean.mesosphere.com/