Google-scale computing for the people

Joonas Bergius @prometheus

$ whoami
Joonas Bergius, @prometheus
Director of Engineering, DigitalOcean

A little bit of background
• Lessons learned from building services at DigitalOcean
• Observations from across the industry:
  • What are the most innovative and successful companies doing?
  • How can we leverage what they are doing to enable ourselves to innovate faster?

Challenges you may be familiar with:
• Snowflakes in your infrastructure
• Hardware & software failures
• Scaling services is hard

Snowflakes in your infrastructure
• We all have (or had) these: one-off, hand-configured servers that nobody can easily reproduce.
• Easily forgotten, until the one business-critical task they were responsible for fails to run.
• When it comes to operations, they are finicky at best.


Hardware & software failures
Disk drives, for example, can exhibit annualized failure rates higher than 4%. Different deployments have reported between 1.2 and 16 average server-level restarts per year.
— Barroso and Hölzle in “The Datacenter as a Computer”
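To put those rates in perspective, the back-of-the-envelope arithmetic can be sketched in Python. The fleet sizes (10,000 disks and 1,000 servers) are hypothetical, and the sketch assumes events are spread evenly over the year; it computes expected counts only, not a failure model.

```python
# Back-of-the-envelope failure arithmetic using the rates quoted above.
# Fleet sizes are hypothetical examples, not figures from the talk.

DAYS_PER_YEAR = 365


def expected_per_year(units: int, rate_per_unit_per_year: float) -> float:
    """Expected number of events per year across a fleet of `units`."""
    return units * rate_per_unit_per_year


def expected_per_day(units: int, rate_per_unit_per_year: float) -> float:
    """The same expectation, assuming events are spread evenly over a year."""
    return expected_per_year(units, rate_per_unit_per_year) / DAYS_PER_YEAR


if __name__ == "__main__":
    disks = 10_000   # hypothetical fleet size
    disk_afr = 0.04  # 4% annualized failure rate
    print(f"{expected_per_year(disks, disk_afr):.0f} disk failures/year")
    print(f"{expected_per_day(disks, disk_afr):.1f} disk failures/day")

    servers = 1_000  # hypothetical fleet size
    for restarts in (1.2, 16):  # low and high end of the reported range
        print(f"{restarts} restarts/server/year -> "
              f"{expected_per_day(servers, restarts):.1f} restarts/day")
```

At these assumed fleet sizes, a disk fails roughly once a day and servers restart several to dozens of times a day, which is why recovery needs to be handled by software rather than by people.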

Scaling services is hard
• You start with a monolith deployed to a single server
• Over time you break things down into different components
• Coordinating them becomes increasingly involved
• Is this really the business you are in?

“The Datacenter as a Computer”
[Resource Management] is perhaps the most indispensable component of the cluster-level infrastructure layer. It controls the mapping of user tasks to hardware resources, enforces priorities and quotas, and provides basic task management services. A more useful version should present a higher level of abstraction, automate allocation of resources, and allow resource sharing at a finer level of granularity.
— Barroso and Hölzle in “The Datacenter as a Computer”
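As a rough illustration of the duties the quote lists (mapping tasks to hardware, enforcing priorities and quotas), here is a toy greedy scheduler in Python. It is a hypothetical sketch: the `Machine` and `Task` types, the first-fit placement policy, and CPU-only quotas are all assumptions made for illustration, not how Mesos or any production cluster manager works.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    owner: str
    priority: int  # hypothetical convention: higher value is scheduled first
    cpus: float
    mem_mb: int


def schedule(tasks, machines, cpu_quotas):
    """Toy resource manager: map tasks to machines, honoring priority and quotas.

    Tasks are considered highest-priority first and placed on the first
    machine with enough free CPU and memory (first fit). A task whose owner
    would exceed its CPU quota, or that fits nowhere, stays pending.
    Returns a dict of task name -> machine name.
    """
    free = {m.name: [m.cpus, m.mem_mb] for m in machines}
    used = {}  # owner -> CPUs already allocated
    placements = {}
    for task in sorted(tasks, key=lambda t: t.priority, reverse=True):
        owner_used = used.get(task.owner, 0)
        if owner_used + task.cpus > cpu_quotas.get(task.owner, 0):
            continue  # quota would be exceeded: leave the task pending
        for m in machines:
            cpus, mem = free[m.name]
            if task.cpus <= cpus and task.mem_mb <= mem:
                free[m.name] = [cpus - task.cpus, mem - task.mem_mb]
                used[task.owner] = owner_used + task.cpus
                placements[task.name] = m.name
                break
    return placements


if __name__ == "__main__":
    machines = [Machine("m1", 4, 8192), Machine("m2", 2, 4096)]
    tasks = [
        Task("web", "team-a", 10, 2, 2048),
        Task("batch", "team-b", 1, 4, 4096),
        Task("cache", "team-a", 5, 2, 4096),
        Task("report", "team-a", 1, 1, 512),
    ]
    print(schedule(tasks, machines, {"team-a": 4, "team-b": 4}))
```

With this example data, `web` and `cache` both land on `m1`; `batch` stays pending because no machine has 4 free CPUs left, and `report` stays pending because team-a has used its 4-CPU quota. Note that the caller reasons only about resources and priorities, never about which server a task ends up on.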

Shifting your mental model
• Think in services instead of servers
  • Do you care about where it runs, or just that it runs?
• Think in resources instead of servers
  • Do you care about where it runs, or what it needs to run?

What is Mesos?
Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively. Mesos runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenters and cloud environments.
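Mesos hands resources to applications (which Mesos calls frameworks) through resource offers: the master offers an agent's free resources to a framework's scheduler, which decides which of its own tasks to launch on them. The toy simulation below sketches that two-level idea in Python. `Offer`, `ToyFramework`, `handle_offer`, and `run_offer_round` are hypothetical names, not the Mesos scheduler API; offers carry only CPU and memory here, and offering to frameworks in a fixed order is a simplification (real Mesos decides which framework receives offers through an allocation policy, Dominant Resource Fairness by default).

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


class ToyFramework:
    """Hypothetical framework scheduler holding a queue of pending tasks."""

    def __init__(self, name, tasks):
        self.name = name
        self.pending = list(tasks)  # each task: (task_name, cpus, mem_mb)

    def handle_offer(self, offer):
        """Accept every pending task that still fits in the offer.

        Returns the accepted tasks; an empty list amounts to declining.
        """
        cpus, mem = offer.cpus, offer.mem_mb
        accepted, remaining = [], []
        for task in self.pending:
            _, t_cpus, t_mem = task
            if t_cpus <= cpus and t_mem <= mem:
                cpus -= t_cpus
                mem -= t_mem
                accepted.append(task)
            else:
                remaining.append(task)
        self.pending = remaining
        return accepted


def run_offer_round(agents, frameworks):
    """Toy master: offer each agent's free resources to frameworks in order.

    Resources one framework leaves unused are offered to the next one.
    Returns (framework, task, agent) launch records.
    """
    launches = []
    for agent, (cpus, mem) in agents.items():
        for fw in frameworks:
            if cpus <= 0 or mem <= 0:
                break  # nothing left on this agent to offer
            for task_name, t_cpus, t_mem in fw.handle_offer(Offer(agent, cpus, mem)):
                cpus -= t_cpus
                mem -= t_mem
                launches.append((fw.name, task_name, agent))
    return launches


if __name__ == "__main__":
    agents = {"agent-1": (4, 8192), "agent-2": (2, 2048)}
    spark = ToyFramework("spark", [("executor-1", 2, 4096), ("executor-2", 2, 4096)])
    kafka = ToyFramework("kafka", [("broker-1", 1, 2048)])
    for launch in run_offer_round(agents, [spark, kafka]):
        print(launch)
```

In this run, the "spark" framework uses all of `agent-1` for its two executors; `agent-2` is offered to "spark" first, which has nothing left to run, and then to "kafka", which launches its broker there. The master never needs to understand what either framework does, since each framework keeps its own placement logic.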

Mesos architecture

How it all works

Frameworks for Mesos

Putting it all together

So that’s great, but how can I use this to my benefit?

“What I learned from my time at Netflix” — @adrianco
• Speed wins in the marketplace
• Remove friction from product development
• High trust, low process, no hand-offs between teams
• Freedom and responsibility culture
• Don’t do your own undifferentiated heavy lifting
• Use simple patterns automated by tooling
• Self-service makes impossible things instant

Okay, so how do I get started?

Resources to get you started:
• For further information about Mesos:
  • https://mesos.apache.org
• For tutorials, tools & developer resources:
  • https://mesosphere.io/learn/
• For deploying Mesosphere on DigitalOcean:
  • https://digitalocean.mesosphere.com/

Thanks! Questions?
