High Availability PHP (Zendcon 2016)

Josh Butts
October 20, 2016

With the rise of containerized applications, more and more people are starting to consider running high-availability applications in production with Docker. This is not to be taken lightly and the path is fraught with peril. In this talk, we'll discuss what a highly-available Docker-powered PHP environment looks like and how to build one. We'll also look at strategies for using Docker and container concepts to avoid getting burned by "disposable" cloud hardware. We'll look at load balancing, service discovery, failover, and talk about tools that make these manageable. We'll also talk about the speed at which the Docker ecosystem is moving and how to cope with that when dealing with production applications.

Transcript

  1. High-Availability PHP
    Josh Butts
    Zendcon 2016

  2. About Me
    • VP of Engineering, offers.com
    • Austin PHP Organizer
    • I play competitive Skee Ball
    • github.com/jimbojsb
    • @jimbojsb

  3. Let’s Start With A SURVEY

  4. Agenda
    • What can we consider highly available?
    • Why are containers well-suited for this?
    • What technology choices do we have to
    make?
    • Recommendations
    • Lessons learned the hard way

  5. Opinion vs Fact
    • This talk is based on my opinions
    • There are many different ways to do things
    • If I trash your favorite, we can still have a
    beer later
    • Why am I even qualified to talk about this?

  6. This is not a tutorial
    • There’s no way I can show you enough in
    an hour to build this all from scratch
    • See what ideas might apply to your
    systems

  7. What is High Availability?
    • Your stuff just doesn’t go down
    • Like ever
    • And not just by happy coincidence!

  8. How Often Are You Down?
    • 99% Uptime = Down 7h a month
    • 99.9% Uptime = Down 45m a month
    • 99.99% Uptime = Down <5m a month
    • 99.999% Uptime = Down <30s a month
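
The figures above are rounded. A minimal sketch of the arithmetic, assuming a 30-day month (the slide does not say which month length it uses), gives roughly 7.2 hours, 43.2 minutes, 4.3 minutes and 26 seconds:

```python
# Downtime allowed per month for a given uptime target.
# Assumption: a 30-day month (720 hours).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def downtime_seconds(uptime_percent: float) -> float:
    """Seconds of downtime per month that still meet the uptime target."""
    return SECONDS_PER_MONTH * (1 - uptime_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_seconds(target) / 60
    print(f"{target}% uptime -> {minutes:.2f} minutes down per month")
```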

  9. What Should We Shoot For?
    • Minimum “4 9’s”
    • “5 9’s” is totally doable
    • HA costs real money
    • All about managing potential loss

  10. How To Calculate Your Risk Tolerance
    • Log in to your AWS account
    • Hand me your laptop
    • I will terminate one EC2 instance of my
    choosing, at random
    • How long will you let me sit there?

  11. But seriously…
    • Risk mitigation costs money
    • Consider battery backups as an example
    • Asking “how much reliability do you want”
    is a silly question
    • Make these decisions with hard numbers,
    not feelings
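
One way to put hard numbers on it is to compare the revenue lost to downtime with the extra spend needed to avoid it. This is a rough sketch only: the revenue and spend figures are hypothetical, the month is assumed to be 30 days, and the function names are made up for illustration.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month


def monthly_downtime_cost(uptime_percent: float, revenue_per_hour: float) -> float:
    """Expected revenue lost per month at a given uptime level."""
    down_hours = HOURS_PER_MONTH * (1 - uptime_percent / 100)
    return down_hours * revenue_per_hour


def ha_pays_for_itself(current: float, target: float,
                       revenue_per_hour: float, extra_monthly_spend: float) -> bool:
    """True when the downtime avoided is worth more than the added HA spend."""
    avoided = (monthly_downtime_cost(current, revenue_per_hour)
               - monthly_downtime_cost(target, revenue_per_hour))
    return avoided > extra_monthly_spend


# Hypothetical figures: $1,000/hour of revenue, $5,000/month of extra infrastructure.
print(ha_pays_for_itself(99.0, 99.99, 1000.0, 5000.0))
print(ha_pays_for_itself(99.9, 99.99, 1000.0, 5000.0))
```

With these made-up inputs, going from 99% to 99.99% avoids about $7,128 of loss a month, while going from 99.9% to 99.99% avoids only about $648, so the same spend is justified in the first case and not in the second.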

  12. Obligatory Metaphors
    • Until the late 2000s, we treated servers like pets
    • Then with Chef, Puppet, Ansible, etc., we treated them like cattle
    • Now we can treat them like ants

  13. Assumptions for Now
    • You have an AWS account
    • You have something to lose if your apps
    are down
    • You have a budget to solve this problem

  14. Example App Ecosystem
    (Diagram; components of the example app)
    • PHP Web App
    • API
    • Scheduled Jobs
    • Queue Workers
    • Database
    • Cache
    • Job Queue
    • Uploaded Files

  15. Let’s Start with Hardware
    • All this stuff works great in the cloud
    • It also works just fine on bare metal
    • You need at least 2 of everything
    • You need a plan for how to fail
    • You need a replacement plan

  16. Self-Healing Systems
    • If a server ceases to exist, it should be
    replaced without human interaction
    • Use AWS CloudFormation
    • Actually learn AWS CloudFormation
    • AWS Elastic Beanstalk is an option
    • Terraform is also decent
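
The self-healing idea boils down to a control loop: declare how many servers should exist and keep launching replacements until reality matches. The sketch below is a toy simulation of that loop, not the CloudFormation or Terraform API; the `Fleet` class and the instance IDs are hypothetical.

```python
import itertools


class Fleet:
    """Toy model of a group of servers with a desired size."""

    def __init__(self, desired: int):
        self.desired = desired
        self.running = set()
        self._ids = itertools.count(1)

    def launch(self) -> str:
        instance_id = f"i-{next(self._ids):04d}"
        self.running.add(instance_id)
        return instance_id

    def terminate(self, instance_id: str) -> None:
        self.running.discard(instance_id)

    def reconcile(self) -> list:
        """Launch replacements until the running count matches the desired count."""
        launched = []
        while len(self.running) < self.desired:
            launched.append(self.launch())
        return launched


fleet = Fleet(desired=3)
fleet.reconcile()                 # initial build-out
fleet.terminate("i-0002")         # a server ceases to exist
print(fleet.reconcile())          # replaced without human interaction
```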

  17. What about my DevOps Tools?
    • I’ve got all this _____ stuff already set up
    • Run EVERYTHING in Docker, and you
    don’t need it
    • You can still use it if you insist
    • Who runs the scripts?
    • Is the _____ server highly available?

  18. Why Docker Is Suitable
    • Immutable, disposable infrastructure
    • Requires no bootstrapping if using a
    docker-friendly OS
    • Don’t have to care what is running where,
    just that you have enough hardware

  19. Set up your hosts
    • I recommend CoreOS
    • Understand the CoreOS version scheme
    • CoreOS is self-upgrading (probably bad)

  20. Outsource Your State
    • Docker isn’t amazing at managing state
    • Local state is not fault-tolerant
    • AWS provides perfectly good, managed
    storage
    • Managed != High Availability

  21. Database
    • What does your I/O load look like?
    • Split writes and reads
    • Recommend Amazon Aurora
    • Always have at least 2 of your biggest
    server
    • Watch out for maintenance windows
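
Splitting writes and reads can start as a small routing layer in the app. This is a simplified sketch: the endpoint names are hypothetical, and classifying a statement by its first keyword is an assumption that ignores transactions and replication lag.

```python
import itertools


class ConnectionRouter:
    """Send SELECTs to read replicas in turn and everything else to the writer."""

    def __init__(self, writer: str, readers: list):
        self.writer = writer
        self._readers = itertools.cycle(readers) if readers else None

    def endpoint_for(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._readers is not None:
            return next(self._readers)
        return self.writer


router = ConnectionRouter(
    writer="writer.db.internal",                                # hypothetical endpoint
    readers=["reader-1.db.internal", "reader-2.db.internal"],   # hypothetical endpoints
)
print(router.endpoint_for("SELECT * FROM offers"))
print(router.endpoint_for("UPDATE offers SET active = 0"))
```

Reads that must see a write that was just made should still go to the writer, since replicas can lag behind.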

  22. Disk Storage
    • Try to avoid local disk storage of anything
    • Put PHP sessions in a cache of your
    choosing
    • Upload files directly to S3
    • Consider Flysystem to avoid development
    S3 buckets
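
Flysystem is a PHP library; the idea behind the recommendation is that application code writes to a storage interface, and the backend is swapped between environments. The sketch below shows that pattern in Python rather than Flysystem's actual API. All class and function names are hypothetical, and the in-memory backend only stands in for a real S3 adapter to keep the example dependency-free.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...


class LocalStorage:
    """Development backend: files under a local directory."""

    def __init__(self, root: str):
        self.root = Path(root)

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStorage:
    """Stand-in for a remote object store such as S3."""

    def __init__(self):
        self.objects = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def read(self, key: str) -> bytes:
        return self.objects[key]


def save_upload(storage: Storage, user_id: int, filename: str, data: bytes) -> str:
    """Application code depends only on the Storage interface."""
    key = f"uploads/{user_id}/{filename}"
    storage.write(key, data)
    return key


with tempfile.TemporaryDirectory() as tmp:
    for backend in (LocalStorage(tmp), InMemoryStorage()):
        key = save_upload(backend, 42, "avatar.png", b"fake image bytes")
        print(type(backend).__name__, key, len(backend.read(key)))
```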

  23. Job Queue
    • Pick one that can be load balanced
    • Ideally outsource this too
    • Apache Kafka
    • Amazon SQS
    • RabbitMQ

  24. Cache
    • How important is your cache?
    • Does your app work if the cache
    disappears?
    • Make sure it’s not the source of truth
    • Sharding vs. Replication for scale
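
A quick way to check that the app survives a missing cache is to treat every cache call as optional and fall back to the source of truth. This is a minimal cache-aside sketch; `FlakyCache` and the dict used as a database are hypothetical stand-ins.

```python
class FlakyCache:
    """Toy cache that can be switched off to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise ConnectionError("cache unreachable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise ConnectionError("cache unreachable")
        self.data[key] = value


def get_user(user_id, cache, database):
    """Read through the cache, but never depend on it."""
    key = f"user:{user_id}"
    try:
        cached = cache.get(key)
    except ConnectionError:
        cached = None              # cache is down: behave as a miss
    if cached is not None:
        return cached
    value = database[user_id]      # the database stays the source of truth
    try:
        cache.set(key, value)
    except ConnectionError:
        pass                       # failing to populate the cache is not an error
    return value


database = {1: "alice"}
cache = FlakyCache()
print(get_user(1, cache, database))   # miss, then cached
cache.available = False
print(get_user(1, cache, database))   # still answers with the cache gone
```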

  25. RUN EVERYTHING ELSE IN DOCKER
    Now that we’ve solved our state problems

  26. Containerize All The Things
    • This isn’t just about containers for the sake
    of containers
    • The container way of thinking leads you
    down the right path

  27. docker run Is Not Sufficient
    • Just like with building apps, you’re going to
    want a framework
    • Some sort of API and deployment system to run containers
    • Something to wrangle hardware with

  28. This is a solved problem
    • Kubernetes
    • Mesos / Marathon
    • Docker Swarm / Compose
    • Rancher

  29. My Recommendation
    • Rancher
        • Experimenting & learning
        • Small-to-mid scale
        • NoOps
    • Mesos & Marathon
        • Big scale (dozens of services & instances)
        • You have an ops team (or an expert dev)
        • Full AWS stack

  30. Apache Mesos
    • Your “interface” to hardware
    • You don’t speak to it directly
    • Clusters many instances into a pool of
    resources
    • Runs containers

  33. Marathon
    • Runs your long-running services /
    containers
    • Websites, APIs, etc
    • Job queue workers
    • May or may not expose ports
    • Works with haproxy
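
A long-running service is described to Marathon as a JSON app definition. The sketch below builds one in Python. The field names are meant to follow Marathon's app definition format of that era and should be checked against the version in use; the app ID, image name and resource sizes are hypothetical.

```python
import json


def marathon_app(app_id: str, image: str, service_port: int, instances: int = 2) -> dict:
    """Build an app definition for a Dockerized web service."""
    return {
        "id": app_id,
        "instances": instances,   # at least 2 of everything
        "cpus": 0.5,              # hypothetical sizing
        "mem": 256,               # MB, hypothetical sizing
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                "portMappings": [
                    # hostPort 0 asks for a random host port;
                    # servicePort is the fixed port the load balancer listens on
                    {"containerPort": 80, "hostPort": 0,
                     "servicePort": service_port, "protocol": "tcp"}
                ],
            },
        },
        "healthChecks": [
            {"protocol": "HTTP", "path": "/", "portIndex": 0,
             "gracePeriodSeconds": 30, "intervalSeconds": 10,
             "maxConsecutiveFailures": 3}
        ],
    }


app = marathon_app("/myapp", "registry.example.com/myapp:1.0.0", service_port=10090)
print(json.dumps(app, indent=2))
```

The random host port plus a fixed service port is what lets haproxy expose a stable address while containers move between hosts.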

  35. Scheduled Jobs
    • Remember, we agreed to run everything in
    Docker.
    • How do we load balance & fail over cron?

  36. This is also a solved problem
    • Chronos
        • Part of “Mesosphere”
        • Clunky UI
        • Kind of a pain to deploy to
    • Singularity
        • Not singly focused
        • Happens to do scheduled jobs really well

  38. Resource Allocation
    • Marathon and Singularity both support
    cgroups and docker resource offers
    • Commit ahead of time to what your app
    needs, don’t over-provision
    • Can result in fragmentation
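
Fragmentation here means the cluster has enough free resources in total, but no single host has enough to fit the next task. The sketch below shows this with a simple first-fit placement; real schedulers use more elaborate strategies, and the host and task sizes are hypothetical.

```python
def place(free_mem_per_host: list, task_mem: int):
    """First-fit placement: return the index of the chosen host, or None if nothing fits."""
    for index, free in enumerate(free_mem_per_host):
        if free >= task_mem:
            free_mem_per_host[index] -= task_mem
            return index
    return None


hosts = [4096, 4096, 4096]            # free memory in MB on three hosts
for _ in range(3):
    place(hosts, 3072)                # three 3 GB tasks, one lands on each host

print(hosts, sum(hosts))              # 1 GB left on each host, 3 GB free in total
print(place(hosts, 3072))             # a fourth 3 GB task fits nowhere
```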

  39. Load Balancers
    • Apply liberally
    • Also good for port translation on the public-facing side
    • Load balance all your internal services too
    • If you don’t need a load balancer, it’s a
    good sign that’s a risky service

  40. Service Discovery
    • Can you ever really know where anything
    is running?
    • No, no you cannot
    • Because of the way Docker works, there
    will be many “unknown” ports

  41. This is a solved problem
    • etcd
    • Consul
    • Zookeeper
    • Several others

  42. My Recommendation
    • Don’t use any of these
    • Service discovery costs time
    • How often do your services actually reconfigure themselves?
    • We use known-port discovery
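
One reading of known-port discovery is that every service gets a fixed, agreed-upon port on the internal load balancers, so clients need no lookup service at all. This is a sketch under that assumption: port 10090 appears in the slides, while the other ports, the service names and the load balancer hostname are hypothetical.

```python
# Fixed service-to-port assignments, kept in version control.
SERVICE_PORTS = {
    "web": 10090,      # port shown in the slides' diagram
    "api": 10091,      # hypothetical
    "search": 10092,   # hypothetical
}
INTERNAL_LB = "lb.internal.example"   # hypothetical load balancer hostname


def check_unique_ports(ports: dict) -> None:
    """Fail fast if two services were assigned the same port."""
    seen = {}
    for name, port in ports.items():
        if port in seen:
            raise ValueError(f"port {port} is assigned to both {seen[port]} and {name}")
        seen[port] = name


def service_url(name: str) -> str:
    """Every client reaches a service at the load balancer on its known port."""
    return f"http://{INTERNAL_LB}:{SERVICE_PORTS[name]}"


check_unique_ports(SERVICE_PORTS)
print(service_url("api"))
```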

  43. Full Network Diagram
    (Diagram; components shown, with the number of boxes drawn for each)
    • The Interweb
    • VPN
    • App ELBs (3)
    • Marathon LBs (2)
    • Marathon ELB, Marathon (2)
    • Mesos ELB, Mesos Master (3)
    • Singularity ELB, Singularity (2)
    • Mesos Slaves (5)
    • Zookeeper (3)

  44. IT’S BASICALLY MAGIC
    If you get this far

  45. Single App Network Diagram
    (Diagram; request path)
    Internet → myapp.com:80 → haproxy:10090 (shown twice) → container:31456, container:30437

  46. So Now What
    • You went and built out all this fancy stuff
    • You absolutely MUST test it
    • Go terminate an instance, see if everything
    fixes itself
    • If you haven’t tested each “HA”
    component, you’re not done yet

  47. We can version our infrastructure

  49. DevOps Benefits
    • Ops can focus on supporting the cluster
    and not the apps
    • Empower engineers to ship infrastructure in a “foolproof” way
    • Intangible benefit of spreading cool tech
    into the organization

  50. Learn from my mistakes
    • RTFM, specifically about minimum requirements
    • Is your DNS HA?
    • Upgrade your way out of trouble
    • It’s highly unlikely you need a bleeding edge
    version of Docker
    • If you deploy in Docker, you’d better be
    developing in it

  51. Downsides
    • Mesos/Marathon doesn’t expose some of
    the most modern Docker features
    • This stuff is not easy, and it’s moving fast
    • It can be hard to test the waters, lots of
    circular dependencies

  52. QUESTIONS?
    Phew. That was a lot.

  53. JOIND.IN/TALK/3E36D
    I’d love your feedback