From Vertical to Horizontal

A presentation about the challenges of scalability in the cloud, given at LinuxWochen Wien 2017

Pierre-Yves Ritschard

May 05, 2017

  1. @pyr Four-line bio: • CTO & co-founder at Exoscale • Open-source developer • Monitoring & distributed systems enthusiast • Linux since 1997
  2. Scalability: “The ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth” (Wikipedia)
  3. Quick notes: • “Cloud” is an umbrella term • Here it is conflated with public IaaS • Oriented toward web application design
  4. Moore’s law: “Over the history of computing, the number of transistors on integrated circuits doubles approximately every two years.”
  5. • Average core speed has been stable for several years • Consistent increase in cores per node
  6. • IT as a utility • Programmable resources • Decoupling of storage from system resources • Usage-based billing model
  7. • Much lower capacity-planning overhead • OPEX makes the accounting department happy • Nobody likes to change disks or rack servers
  8. • Switches? Gone. • VLANs? Gone. • IP allocation and translation? Gone. • OS partitioning? Gone. • OS RAID management? Gone.
  9. Example Terraform configuration:

         provider "exoscale" {
           api_key    = "${var.exoscale_api_key}"
           secret_key = "${var.exoscale_secret_key}"
         }

         resource "exoscale_instance" "web" {
           template  = "ubuntu 17.04"
           disk_size = "50g"
           profile   = "medium"
           ssh_key   = "production"
         }
  10. • It’s hard to break out of the big-iron mental model • It’s hard to change our trust model ◦ “I want to be able to see my servers!” • There is still an upper limit on node size • Horizontal-first approach to building infrastructure
  11. Distributed systems are subject to Brewer’s CAP theorem: they cannot enjoy all three of Consistency, Availability, and Partition tolerance at once.
  12. • Consistency: simultaneous requests see a consistent set of data • Availability: each incoming request is acknowledged and receives a success or failure response • Partition tolerance: the system continues to process incoming requests in the face of failures, such as network partitions that drop messages between nodes
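The CAP trade-off above can be made concrete with a toy model. It uses two replicas of one value, and a flag that simulates a network partition between them. This is a hypothetical, single-process sketch: `Replica`, `make_pair`, `set_partition` and the `"CP"`/`"AP"` modes are illustrative names, not any real system's API. A CP replica refuses requests it cannot keep consistent. An AP replica keeps answering from local, possibly stale, state.

```python
class Unavailable(Exception):
    """Failure response from a CP replica that cannot reach its peer."""


class Replica:
    """Toy replica of a single value, paired with exactly one peer."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the peer is unreachable

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Refuse rather than let the replicas diverge.
            raise Unavailable("write rejected: peer unreachable")
        self.value = value
        if not self.partitioned:
            self.peer.value = value  # synchronous replication
        # AP during a partition: accepted locally, the peer is now stale.

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("read rejected: cannot confirm latest value")
        return self.value  # an AP replica may return stale data


def make_pair(mode):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, cut):
    a.partitioned = b.partitioned = cut
```

Consider an AP pair: write `1`, cut the link, then write `2` on one side. The other side still reads `1`. On a CP pair, the same write after the cut raises `Unavailable` instead.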
  13. • Inspectable services • Queues over RPC • Degrade gracefully • Prefer concerned citizens • Configuration from a service registry • Nodes as immutable data structures
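"Queues over RPC" and "degrade gracefully" combine naturally. The producer hands work to a bounded queue and never waits on the consumer. When the queue is full, the producer reports the job as deferred rather than failing. The sketch below is an in-process stand-in that uses Python's standard `queue` module. A real deployment would put a broker such as Kafka or RabbitMQ in between. `make_submitter`, `worker` and the `"deferred"` status are hypothetical names.

```python
import queue
import threading


def make_submitter(q):
    """Producer side: enqueue without waiting on any consumer."""
    def submit(job):
        try:
            q.put_nowait(job)
            return "queued"
        except queue.Full:
            # Degrade gracefully: defer instead of failing the caller.
            return "deferred"
    return submit


def worker(q, results):
    """Consumer side: process jobs until a None sentinel arrives."""
    while True:
        job = q.get()
        try:
            if job is None:
                return
            results.append(job * 2)  # stand-in for real work
        finally:
            q.task_done()


if __name__ == "__main__":
    q = queue.Queue(maxsize=2)
    submit = make_submitter(q)
    print([submit(i) for i in range(3)])  # ['queued', 'queued', 'deferred']
    results = []
    t = threading.Thread(target=worker, args=(q, results))
    t.start()
    q.put(None)  # blocks until the worker frees a slot
    t.join()
    print(results)  # [0, 2]
```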
  14. Don’t give up: use connection pooling and retry policies. Best in class: Finagle, cassandra-driver.
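Here is a minimal sketch of the two techniques named above, written independently of Finagle or cassandra-driver. It is not their API. The first piece is a retry helper with capped exponential backoff and full jitter. The second is a fixed-size connection pool that reuses connections instead of opening one per request. `TransientError`, `retry`, `Pool` and the default timings are hypothetical choices. `sleep` and `rng` are injectable so the behavior can be tested.

```python
import queue
import random
import time
from contextlib import contextmanager


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, resets)."""


def retry(call, attempts=5, base=0.1, cap=2.0,
          sleep=time.sleep, rng=random.random):
    """Call `call()` until it succeeds, sleeping between attempts with
    capped exponential backoff and full jitter; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise
            delay = min(cap, base * 2 ** attempt)
            sleep(rng() * delay)  # jitter spreads out synchronized retries


class Pool:
    """Fixed-size pool: connections are reused, never opened per request."""

    def __init__(self, factory, size):
        self._idle = queue.LifoQueue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Raises queue.Empty if no connection frees up within `timeout`.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

Only errors marked as transient are retried. Anything else propagates immediately, and a call that keeps failing surfaces its last error instead of retrying forever.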
  15. All moving parts force new compromises. This is true of internal and external components.
  16. You probably want an AP queueing system, so please avoid using MySQL as one! Candidates: Apache Kafka, RabbitMQ, Redis (to a lesser extent).
  17. Keep track of node volatility: reprovision configuration on cluster topology changes. Load balancers make a great interaction point (concentrate changes there).
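One way to concentrate topology changes at the load balancer is to re-render its backend section from the current member list on every change. The balancer is then reloaded only when the rendered output actually differs. The sketch emits HAProxy-style `backend`/`server ... check` lines. The class, the `apply` hook (which stands in for writing the file and reloading HAProxy) and the address-based server names are hypothetical.

```python
def render_backend(name, nodes, port):
    """Render an HAProxy-style backend section from the current members."""
    lines = [f"backend {name}", "    balance roundrobin"]
    for addr in sorted(nodes):
        # Name servers after their address so names stay stable as nodes come and go.
        label = addr.replace(".", "-")
        lines.append(f"    server {label} {addr}:{port} check")
    return "\n".join(lines) + "\n"


class LoadBalancerConfig:
    """Re-render on every topology change; reload only when output differs."""

    def __init__(self, name, port, apply):
        self.name, self.port, self.apply = name, port, apply
        self.current = None

    def on_topology_change(self, nodes):
        rendered = render_backend(self.name, nodes, self.port)
        if rendered == self.current:
            return False
        self.current = rendered
        self.apply(rendered)  # hypothetical hook: write the file, reload HAProxy
        return True
```

Sorting the members makes the output independent of the order in which nodes are reported. A reshuffled but otherwise unchanged list therefore does not trigger a reload.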
  18. The service registry is critical. Ideally it needs to be a strongly consistent, distributed system. You already have an eventually consistent one: DNS!
  19. ZooKeeper and etcd: current best in class. They lend themselves to in-app usage as well as to distributed locks, barriers, etc.
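Registries such as ZooKeeper (ephemeral nodes) and etcd (leases) tie each registration to a liveness signal. A node that stops heartbeating drops out of the member list on its own. The sketch below is a toy, single-process model of that lease idea only. It is neither distributed nor strongly consistent, and it does not use either project's client API. `LeaseRegistry`, `register`, `keepalive` and `members` are hypothetical names. The clock is injectable for testing.

```python
import time


class LeaseRegistry:
    """Toy, single-process model of lease-based service registration."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._deadlines = {}  # member address -> lease expiry time

    def register(self, addr):
        self._deadlines[addr] = self.clock() + self.ttl

    def keepalive(self, addr):
        # Renewing a lease simply pushes its deadline out by one TTL.
        self.register(addr)

    def members(self):
        now = self.clock()
        # Members that stopped heartbeating disappear once their lease lapses.
        self._deadlines = {a: d for a, d in self._deadlines.items() if d > now}
        return sorted(self._deadlines)
```

A member list obtained this way is a natural input for re-rendering load balancer configuration whenever the topology changes.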
  20. • Configuration drift? Reprovision node. • New version of software? Reprovision node. • Configuration file change? Reprovision node.
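Treating nodes as immutable data structures means every case in the list above collapses into one rule: if a node's spec differs from the desired spec, replace the node rather than patching it. The sketch models a node spec as a frozen dataclass. It uses `dataclasses.replace` to derive a new spec instead of mutating one. `NodeSpec`, its fields and `plan` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeSpec:
    """Everything that defines a node; instances cannot be mutated."""
    image: str
    app_version: str
    config_hash: str


def plan(running, desired):
    """Split nodes (name -> spec) into those to keep and those to reprovision.

    Drift, a new software version, or a config change all look the same here:
    the spec differs from the desired one, so the node is replaced, not patched.
    """
    keep = sorted(name for name, spec in running.items() if spec == desired)
    reprovision = sorted(name for name, spec in running.items() if spec != desired)
    return keep, reprovision


if __name__ == "__main__":
    base = NodeSpec("ubuntu-17.04", "1.4.0", "c0ffee")
    desired = replace(base, app_version="1.5.0")  # a new spec; base is untouched
    print(plan({"web1": desired, "web2": base}, desired))  # (['web1'], ['web2'])
```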
  21. Depart from using the machine as the base unit of reasoning: all nodes in a cluster should be equivalent.
  22. Generic platform abstractions: PaaS solutions are a commodity (cf. OpenShift); generic scheduling and failover frameworks (Mesos, Kubernetes Operators).