High Availability PHP (Zendcon 2016)

Josh Butts
October 20, 2016

With the rise of containerized applications, more and more people are starting to consider running high-availability applications in production with Docker. This is not to be taken lightly and the path is fraught with peril. In this talk, we'll discuss what a highly-available Docker-powered PHP environment looks like and how to build one. We'll also look at strategies for using Docker and container concepts to avoid getting burned by "disposable" cloud hardware. We'll look at load balancing, service discovery, failover, and talk about tools that make these manageable. We'll also talk about the speed at which the Docker ecosystem is moving and how to cope with that when dealing with production applications.


Transcript

  1. High-Availability PHP Josh Butts Zendcon 2016

  2. About Me • VP of Engineering, offers.com • Austin PHP Organizer • I play competitive Skee Ball • github.com/jimbojsb • @jimbojsb
  3. Let’s Start With A SURVEY

  4. Agenda • What can we consider highly available? • Why are containers well-suited for this? • What technology choices do we have to make? • Recommendations • Lessons learned the hard way
  5. Opinion vs Fact • This talk is based on my opinions • There are many different ways to do things • If I trash your favorite, we can still have a beer later • Why am I even qualified to talk about this?
  6. This is not a tutorial • There’s no way I can show you enough in an hour to build this all from scratch • See what ideas might apply to your systems
  7. What is High Availability • Your stuff just doesn’t go down • Like ever • And not just by happy coincidence!
  8. How Often Are You Down? • 99% Uptime = Down 7h a month • 99.9% Uptime = Down 45m a month • 99.99% Uptime = Down <5m a month • 99.999% Uptime = Down <30s a month
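
The figures on this slide are rounded. A minimal sketch of the arithmetic behind them, assuming a 30-day month (the function name is illustrative, not from the talk):

```python
# Downtime budget per month for a given uptime percentage.
# Assumption: a "month" is 30 days; the slide does not say which length it uses.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def downtime_seconds(uptime_percent: float) -> float:
    """Return the allowed downtime, in seconds, per 30-day month."""
    return SECONDS_PER_MONTH * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        secs = downtime_seconds(pct)
        print(f"{pct}% uptime: {secs / 60:.1f} minutes ({secs:.0f} seconds) per month")
```

Under that assumption, 99.9% works out to a little over 43 minutes, which the slide rounds to 45m.
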
  9. What Should We Shoot For? • Minimum “4 9’s” • “5 9’s” is totally doable • HA costs real money • All about managing potential loss
  10. How To Calculate Your Risk Tolerance • Log in to your AWS account • Hand me your laptop • I will terminate one EC2 instance of my choosing, at random • How long will you let me sit there?
  11. But seriously… • Risk mitigation costs money • Consider battery backups as an example • Asking “how much reliability do you want” is a silly question • Make these decisions with hard numbers, not feelings
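
One way to put "hard numbers" on this decision is to compare the downtime loss that HA avoids with what HA costs. The sketch below is an illustration only: the revenue, downtime, and cost figures are made-up assumptions, and the function names do not come from the talk.

```python
def expected_monthly_loss(revenue_per_hour: float, downtime_hours: float) -> float:
    """Revenue lost in a month, assuming loss scales linearly with downtime."""
    return revenue_per_hour * downtime_hours


def ha_pays_for_itself(
    revenue_per_hour: float,
    downtime_hours_without_ha: float,
    downtime_hours_with_ha: float,
    ha_monthly_cost: float,
) -> bool:
    """True when the loss avoided by HA exceeds what HA costs per month."""
    avoided = expected_monthly_loss(
        revenue_per_hour, downtime_hours_without_ha
    ) - expected_monthly_loss(revenue_per_hour, downtime_hours_with_ha)
    return avoided > ha_monthly_cost


if __name__ == "__main__":
    # Hypothetical inputs: $2,000/hour revenue, 99% vs 99.99% uptime
    # (7.2h vs 0.072h of downtime per 30-day month), $5,000/month extra HA spend.
    print(ha_pays_for_itself(2000.0, 7.2, 0.072, 5000.0))
```

A linear loss model is the simplest possible choice; real losses (reputation, SLA penalties) may not scale that way.
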
  12. Obligatory Metaphors • Until the late 2000s, we treated servers like pets • Then with Chef, Puppet, Ansible, etc., we treated them like cattle • Now we can treat them like ants
  13. Assumptions for Now • You have an AWS account • You have something to lose if your apps are down • You have a budget to solve this problem
  14. Example App Ecosystem • PHP Web App • API • Scheduled Jobs • Queue Workers • Database • Cache • Job Queue • Uploaded Files
  15. Let’s Start with Hardware • All this stuff works great in the cloud • It also works just fine on bare metal • You need at least 2 of everything • You need a plan for how to fail • You need a replacement plan
  16. Self-Healing Systems • If a server ceases to exist, it should be replaced without human interaction • Use AWS CloudFormation • Actually learn AWS CloudFormation • AWS Elastic Beanstalk is an option • Terraform is also decent
  17. What about my DevOps Tools? • I’ve got all this _____ stuff already set up • Run EVERYTHING in Docker, and you don’t need it • You can still use it if you insist • Who runs the scripts? • Is the _____ server highly available?
  18. Why Docker Is Suitable • Immutable, disposable infrastructure • Requires no bootstrapping if using a Docker-friendly OS • Don’t have to care what is running where, just that you have enough hardware
  19. Set up your hosts • I recommend CoreOS • Understand the CoreOS version scheme • CoreOS is self-upgrading (probably bad)
  20. Outsource Your State • Docker isn’t amazing at managing state • Local state is not fault-tolerant • AWS provides perfectly good, managed storage • Managed != High Availability
  21. Database • What does your I/O load look like? • Split writes and reads • Recommend AWS Aurora • Always have at least 2 of your biggest server • Watch out for maintenance windows
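
"Split writes and reads" can be sketched as a small router that sends SELECT statements to read replicas and everything else to the writer. This is a minimal illustration, not the speaker's implementation; the class name and hostnames are hypothetical, and it ignores replica lag, transactions, and statements such as `SELECT ... FOR UPDATE`.

```python
class ReadWriteRouter:
    """Pick a database endpoint based on whether a statement is a read."""

    def __init__(self, writer: str, readers: list[str]):
        self.writer = writer
        self._readers = list(readers)
        self._index = 0  # round-robin position among the readers

    def endpoint_for(self, sql: str) -> str:
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self._readers:
            reader = self._readers[self._index % len(self._readers)]
            self._index += 1
            return reader
        # Writes, empty statements, and the no-replica case all go to the writer.
        return self.writer


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    router = ReadWriteRouter(
        "writer.db.example.internal",
        ["reader-1.db.example.internal", "reader-2.db.example.internal"],
    )
    print(router.endpoint_for("SELECT * FROM offers"))
    print(router.endpoint_for("UPDATE offers SET active = 1"))
```
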
  22. Disk Storage • Try to avoid local disk storage of anything • Put PHP sessions in a cache of your choosing • Upload files directly to S3 • Consider Flysystem to avoid development S3 buckets
  23. Job Queue • Pick one that can be load balanced • Ideally outsource this too • Apache Kafka • Amazon SQS • RabbitMQ
  24. Cache • How important is your cache? • Does your app work if the cache disappears? • Make sure it’s not the source of truth • Sharding vs. Replication for scale
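
One reading of "does your app work if the cache disappears?" is a cache-aside lookup that falls back to the source of truth when the cache is unreachable. The sketch below assumes hypothetical `cache_get`, `cache_set`, and `load_from_source` callables and a made-up `CacheUnavailable` exception; a real client would raise its own error types.

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cannot be reached."""


def get_with_fallback(key, cache_get, cache_set, load_from_source):
    """Return a value, treating the cache as optional and never authoritative."""
    try:
        cached = cache_get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        # Cache is gone: serve from the source of truth and skip the write-back.
        return load_from_source(key)

    value = load_from_source(key)
    try:
        cache_set(key, value)
    except CacheUnavailable:
        pass  # failing to populate the cache must not fail the request
    return value
```

The request gets slower when the cache is down, but it still succeeds, which is the property the slide asks about.
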
  25. RUN EVERYTHING ELSE IN DOCKER • Now that we’ve solved our state problems
  26. Containerize All The Things • This isn’t just about containers for the sake of containers • The container way of thinking leads you down the right path
  27. docker run Is Not Sufficient • Just like with building apps, you’re going to want a framework • Some sort of API and deployment system to run containers • Something to wrangle hardware with
  28. This is a solved problem • Kubernetes • Mesos / Marathon • Docker Swarm / Compose • Rancher
  29. My Recommendation • Rancher: experimenting & learning; small-to-mid scale; NoOps • Mesos & Marathon: big scale (dozens of services & instances); you have an ops team (or an expert dev); full AWS stack
  30. Apache Mesos • Your “interface” to hardware • You don’t speak to it directly • Clusters many instances into a pool of resources • Runs containers
  33. Marathon • Runs your long-running services / containers • Websites, APIs, etc. • Job queue workers • May or may not expose ports • Works with HAProxy
  35. Scheduled Jobs • Remember, we agreed to run everything in Docker. • How do we load balance & fail over cron?
  36. This is also a solved problem • Chronos: part of “Mesosphere”; clunky UI; kind of a pain to deploy to • Singularity: not singly focused; happens to do scheduled jobs really well
  38. Resource Allocation • Marathon and Singularity both support cgroups and docker resource offers • Commit ahead of time to what your app needs, don’t over-provision • Can result in fragmentation
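
The fragmentation point can be illustrated with a toy placement check: a task must fit on a single host, so a cluster can have enough free CPU in total and still be unable to run it. The host figures below are invented for illustration, and the check is a simplification rather than how Mesos actually matches offers.

```python
def can_place(task_cpus: float, task_mem_mb: int, free_hosts: list[dict]) -> bool:
    """True if at least one host has enough free CPU and memory for the task."""
    return any(
        host["cpus"] >= task_cpus and host["mem_mb"] >= task_mem_mb
        for host in free_hosts
    )


# Hypothetical leftover capacity on three hosts after other tasks were placed.
FREE = [
    {"cpus": 1.0, "mem_mb": 2048},
    {"cpus": 1.0, "mem_mb": 2048},
    {"cpus": 1.0, "mem_mb": 2048},
]

if __name__ == "__main__":
    total_cpus = sum(host["cpus"] for host in FREE)
    print(f"free CPUs in the cluster: {total_cpus}")
    print("2-CPU task fits:", can_place(2.0, 1024, FREE))
    print("1-CPU task fits:", can_place(1.0, 1024, FREE))
```
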
  39. Load Balancers • Apply liberally • Also good for port translation on public-facing side • Load balance all your internal services too • If you don’t need a load balancer, it’s a good sign that’s a risky service
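
As a rough picture of what a load balancer contributes to availability, here is a round-robin picker that skips backends marked unhealthy. It is a teaching sketch with hypothetical names, not a stand-in for HAProxy or an ELB, which also perform the health checks themselves.

```python
class RoundRobinBalancer:
    """Rotate through backends, skipping any that are marked down."""

    def __init__(self, backends: list[str]):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self) -> str:
        # Try each backend at most once per call, starting at the rotation point.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

With a single backend there is nothing to skip to, which is the slide's point about services that "don't need" a load balancer.
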
  40. Service Discovery • Can you ever really know where anything is running? • No, no you cannot • Because of the way Docker works, there will be many “unknown” ports
  41. This is a solved problem • etcd • Consul • Zookeeper • Several others
  42. My Recommendation • Don’t use any of these • Service discovery costs time • How often do your services actually reconfigure themselves? • We use known-port discovery
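
The talk does not define "known-port discovery" in detail. One plausible reading is that each service is assigned a fixed, agreed-upon port on the internal load balancers, so clients never look anything up at runtime. The sketch below follows that assumption; the load balancer hostname and the second service are hypothetical, and port 10090 is borrowed from the single-app diagram later in the deck.

```python
# Static service-to-port table, checked into version control.
# Assumption: every internal load balancer listens on these same ports.
KNOWN_PORTS = {
    "myapp": 10090,
    "billing-api": 10091,  # hypothetical second service
}

LB_HOST = "marathon-lb.internal"  # hypothetical internal load balancer address


def service_url(name: str, path: str = "/") -> str:
    """Build a URL for a service from its pre-assigned load balancer port."""
    port = KNOWN_PORTS[name]  # a KeyError here means the service was never registered
    return f"http://{LB_HOST}:{port}{path}"


if __name__ == "__main__":
    print(service_url("myapp", "/health"))
```

The trade-off is that adding a service means editing the table and redeploying, which is acceptable if services rarely reconfigure themselves.
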
  43. Full Network Diagram (figure; labels: The Interweb, VPN, App ELBs, Marathon LBs, Mesos ELB, Marathon ELB, Singularity ELB, Mesos Masters, Marathon, Singularity, Mesos Slaves, Zookeeper)
  44. IT’S BASICALLY MAGIC • If you get this far

  45. Single App Network Diagram (figure; labels: Internet, myapp.com:80, haproxy:10090 (two instances), container:31456, container:30437)
  46. So Now What • You went and built out all this fancy stuff • You absolutely MUST test it • Go terminate an instance, see if everything fixes itself • If you haven’t tested each “HA” component, you’re not done yet
  47. We can version our infrastructure

  48. We can version our infrastructure

  49. DevOps Benefits • Ops can focus on supporting the cluster and not the apps • Empower engineers to ship infrastructure in a “foolproof” way • Intangible benefit of spreading cool tech into the organization
  50. Learn from my mistakes • RTFM, specifically about minimum requirements • Is your DNS HA? • Upgrade your way out of trouble • It’s highly unlikely you need a bleeding edge version of Docker • If you deploy in Docker, you’d better be developing in it
  51. Downsides • Mesos/Marathon doesn’t expose some of the most modern Docker features • This stuff is not easy, and it’s moving fast • It can be hard to test the waters, lots of circular dependencies
  52. QUESTIONS? • Phew. That was a lot.

  53. JOIND.IN/TALK/3E36D • I’d love your feedback