High Availability PHP (Zendcon 2016)

High-Availability PHP Josh Butts Zendcon 2016

About Me
• VP of Engineering, offers.com
• Austin PHP Organizer
• I play competitive Skee Ball
• github.com/jimbojsb
• @jimbojsb

Let’s Start With A SURVEY

Agenda
• What can we consider highly available?
• Why are containers well-suited for this?
• What technology choices do we have to make?
• Recommendations
• Lessons learned the hard way

Opinion vs Fact
• This talk is based on my opinions
• There are many different ways to do things
• If I trash your favorite, we can still have a beer later
• Why am I even qualified to talk about this?

This is not a tutorial
• There’s no way I can show you enough in an hour to build this all from scratch
• See what ideas might apply to your systems

What is High Availability?
• Your stuff just doesn’t go down
• Like ever
• And not just by happy coincidence!

How Often Are You Down?
• 99% Uptime = Down 7h a month
• 99.9% Uptime = Down 45m a month
• 99.99% Uptime = Down <5m a month
• 99.999% Uptime = Down <30s a month
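
The figures on this slide can be reproduced with a few lines of arithmetic. A minimal sketch, assuming a 30-day month (the function name and the 30-day default are illustrative, not from the talk):

```python
def downtime_seconds_per_month(availability_pct: float, days_in_month: int = 30) -> float:
    """Return the allowed downtime, in seconds, for one month at a given uptime percentage."""
    seconds_in_month = days_in_month * 24 * 60 * 60
    return seconds_in_month * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        seconds = downtime_seconds_per_month(pct)
        # Print both units so the result can be compared with the rounded slide values.
        print(f"{pct}% uptime: {seconds / 60:.1f} minutes ({seconds:.0f} seconds) down per month")
```

Under that assumption the slide's values are rounded: 99% works out to roughly 7.2 hours, and 99.9% to a little over 43 minutes.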

What Should We Shoot For?
• Minimum “4 9’s”
• “5 9’s” is totally doable
• HA costs real money
• All about managing potential loss

How To Calculate Your Risk Tolerance
• Log in to your AWS account
• Hand me your laptop
• I will terminate one EC2 instance of my choosing, at random
• How long will you let me sit there?

But seriously…
• Risk mitigation costs money
• Consider battery backups as an example
• Asking “how much reliability do you want” is a silly question
• Make these decisions with hard numbers, not feelings
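
One way to put "hard numbers" on the decision is to compare the expected cost of downtime with the cost of avoiding it. This is a rough sketch, not a method from the talk: it assumes lost revenue is the only cost of downtime, and every figure in the usage example is hypothetical.

```python
def expected_monthly_loss(availability_pct: float, revenue_per_hour: float,
                          hours_in_month: float = 720.0) -> float:
    """Expected revenue lost per month at a given uptime percentage."""
    downtime_hours = hours_in_month * (1 - availability_pct / 100)
    return downtime_hours * revenue_per_hour


def mitigation_pays_off(current_pct: float, target_pct: float,
                        revenue_per_hour: float, monthly_cost: float) -> bool:
    """True when the loss avoided by reaching target_pct exceeds what it costs to get there."""
    avoided = (expected_monthly_loss(current_pct, revenue_per_hour)
               - expected_monthly_loss(target_pct, revenue_per_hour))
    return avoided > monthly_cost


if __name__ == "__main__":
    # Hypothetical: moving from 99% to 99.99% costs an extra 5,000 per month.
    print(mitigation_pays_off(99.0, 99.99, revenue_per_hour=2000, monthly_cost=5000))
    print(mitigation_pays_off(99.0, 99.99, revenue_per_hour=100, monthly_cost=5000))
```

The same spend is easy to justify for the first app and hard to justify for the second, which is the point of deciding with numbers.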

Obligatory Metaphors
• Until the late 2000s, we treated servers like pets
• Then with Chef, Puppet, Ansible, etc., we treated them like cattle
• Now we can treat them like ants

Assumptions for Now
• You have an AWS account
• You have something to lose if your apps are down
• You have a budget to solve this problem

Example App Ecosystem
• PHP Web App
• API
• Scheduled Jobs
• Queue Workers
• Database
• Cache
• Job Queue
• Uploaded Files

Let’s Start with Hardware
• All this stuff works great in the cloud
• It also works just fine on bare metal
• You need at least 2 of everything
• You need a plan for how to fail
• You need a replacement plan

Self-Healing Systems
• If a server ceases to exist, it should be replaced without human interaction
• Use AWS CloudFormation
• Actually learn AWS CloudFormation
• AWS Elastic Beanstalk is an option
• Terraform is also decent

What about my DevOps Tools?
• I’ve got all this _____ stuff already set up
• Run EVERYTHING in Docker, and you don’t need it
• You can still use it if you insist
• Who runs the scripts?
• Is the _____ server highly available?

Why Docker Is Suitable
• Immutable, disposable infrastructure
• Requires no bootstrapping if using a Docker-friendly OS
• Don’t have to care what is running where, just that you have enough hardware

Set up your hosts
• I recommend CoreOS
• Understand the CoreOS version scheme
• CoreOS is self-upgrading (probably bad)

Outsource Your State
• Docker isn’t amazing at managing state
• Local state is not fault-tolerant
• AWS provides perfectly good, managed storage
• Managed != High Availability

Database
• What does your I/O load look like?
• Split writes and reads
• Recommend AWS Aurora
• Always have at least 2 of your biggest server
• Watch out for maintenance windows
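
To make "split writes and reads" concrete, here is a minimal routing sketch. It assumes one writable primary endpoint and any number of read replicas; the class name and hostnames are hypothetical, and classifying a statement by its first keyword is a deliberate simplification.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the primary endpoint and spread reads across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._read_cycle = itertools.cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str) -> str:
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            # Round-robin over the read replicas.
            return next(self._read_cycle)
        # Everything that is not a plain SELECT goes to the primary.
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary.db.example", ["replica-1.db.example", "replica-2.db.example"])
    print(router.endpoint_for("SELECT * FROM offers"))
    print(router.endpoint_for("UPDATE offers SET clicks = clicks + 1"))
```

A real router would also need to send reads inside a transaction, and statements such as `SELECT ... FOR UPDATE`, to the primary; this sketch does not handle those cases.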

Disk Storage
• Try to avoid local disk storage of anything
• Put PHP sessions in a cache of your choosing
• Upload files directly to S3
• Consider Flysystem to avoid development S3 buckets
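
The idea behind the Flysystem suggestion is that application code talks to a storage interface, so development can use a local directory while production uses S3. The sketch below shows that pattern in Python; it is not Flysystem's API, and `FileStore`, `LocalFileStore` and `save_avatar` are hypothetical names. The S3-backed implementation is left out.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class FileStore(ABC):
    """Storage interface the application depends on."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class LocalFileStore(FileStore):
    """Development-only store backed by a local directory."""

    def __init__(self, root: Path):
        self.root = root

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def save_avatar(store: FileStore, user_id: int, image: bytes) -> str:
    """Application code: knows the key layout, not where the bytes end up."""
    key = f"avatars/{user_id}.png"
    store.write(key, image)
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        store = LocalFileStore(Path(directory))
        key = save_avatar(store, 42, b"fake image bytes")
        print(key, len(store.read(key)))
```

In production, a class implementing the same two methods against S3 would be passed in instead, and `save_avatar` would not change.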

Job Queue
• Pick one that can be load balanced
• Ideally outsource this too
• Apache Kafka
• Amazon SQS
• RabbitMQ

Cache
• How important is your cache?
• Does your app work if the cache disappears?
• Make sure it’s not the source of truth
• Sharding vs. Replication for scale
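
An app that "works if the cache disappears" treats every cache failure as a miss and falls back to the source of truth. A minimal cache-aside sketch, assuming a cache client with `get` and `set` methods that raises an exception when unreachable (all names here are hypothetical):

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when the cache cannot be reached."""


class DictCache:
    """In-memory stand-in for a working cache."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class DownCache:
    """Stand-in for a cache that has disappeared."""

    def get(self, key):
        raise CacheUnavailable(key)

    def set(self, key, value):
        raise CacheUnavailable(key)


def get_product(product_id, cache, load_from_database):
    """Read through the cache, but never depend on it."""
    key = f"product:{product_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        # Cache is down: go straight to the source of truth.
        return load_from_database(product_id)
    value = load_from_database(product_id)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # Failing to populate the cache must not fail the request.
    return value
```

The database stays the source of truth; the cache only saves work when it happens to be available.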

RUN EVERYTHING ELSE IN DOCKER
Now that we’ve solved our state problems

Containerize All The Things
• This isn’t just about containers for the sake of containers
• The container way of thinking leads you down the right path

docker run Is Not Sufficient
• Just like with building apps, you’re going to want a framework
• Some sort of API and deployment system to run containers
• Something to wrangle hardware with

This is a solved problem
• Kubernetes
• Mesos / Marathon
• Docker Swarm / Compose
• Rancher

My Recommendation
• Rancher
  • Experimenting & learning
  • Small-to-mid scale
  • NoOps
• Mesos & Marathon
  • Big scale (dozens of services & instances)
  • You have an ops team (or an expert dev)
  • Full AWS stack

Apache Mesos
• Your “interface” to hardware
• You don’t speak to it directly
• Clusters many instances into a pool of resources
• Runs containers

Marathon
• Runs your long-running services / containers
• Websites, APIs, etc.
• Job queue workers
• May or may not expose ports
• Works with HAProxy

Scheduled Jobs
• Remember, we agreed to run everything in Docker.
• How do we load balance & failover cron?

This is also a solved problem
• Chronos
  • Part of “Mesosphere”
  • Clunky UI
  • Kind of a pain to deploy to
• Singularity
  • Not singly focused
  • Happens to do scheduled jobs really well

Resource Allocation
• Marathon and Singularity both support cgroups and Docker resource offers
• Commit ahead of time to what your app needs, don’t over-provision
• Can result in fragmentation
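
Fragmentation here means the cluster has enough free capacity in total, but no single host has enough left for the next task. A toy first-fit placement shows the effect. This is a simplified illustration, not how Mesos actually makes resource offers, and the host and task sizes are made up.

```python
def first_fit(hosts_free_cpus, task_cpus):
    """Place a task on the first host with enough free CPUs.

    Mutates hosts_free_cpus and returns the chosen host index, or None if nothing fits.
    """
    for index, free in enumerate(hosts_free_cpus):
        if free >= task_cpus:
            hosts_free_cpus[index] = free - task_cpus
            return index
    return None


if __name__ == "__main__":
    hosts = [4.0, 4.0, 4.0]          # three hosts with 4 CPUs each
    for _ in range(3):
        first_fit(hosts, 3.0)        # three tasks that each reserve 3 CPUs
    print("free per host:", hosts)
    print("total free:", sum(hosts))
    # Total free capacity exceeds 2 CPUs, yet no single host can take a 2-CPU task.
    print("2-CPU task placed on:", first_fit(hosts, 2.0))
```

Sizing reservations so they pack evenly into a host, rather than leaving slivers on every machine, reduces this kind of waste.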

Load Balancers
• Apply liberally
• Also good for port translation on the public-facing side
• Load balance all your internal services too
• If you don’t need a load balancer, it’s a good sign that’s a risky service

Service Discovery
• Can you ever really know where anything is running?
• No, no you cannot
• Because of the way Docker works, there will be many “unknown” ports

This is a solved problem
• etcd
• Consul
• ZooKeeper
• Several others

My Recommendation
• Don’t use any of these
• Service discovery costs time
• How often do your services actually reconfigure themselves?
• We use known-port discovery
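
The slides do not spell out what known-port discovery looks like. One plausible reading is that each service gets a fixed, agreed-upon port on the internal load balancers, so clients need a static table rather than a discovery service. A sketch under that assumption: only port 10090 appears in the deck, and the other ports, the service names and the hostname are hypothetical.

```python
# Static service-to-port table, versioned alongside the application code.
KNOWN_PORTS = {
    "web": 10090,     # port shown for haproxy in the single-app diagram
    "api": 10091,     # hypothetical
    "search": 10092,  # hypothetical
}


def service_url(service: str, lb_host: str = "internal-lb.example") -> str:
    """Build the URL for a service from its agreed port on the internal load balancer."""
    port = KNOWN_PORTS[service]  # raises KeyError for an unregistered service
    return f"http://{lb_host}:{port}"


def ports_are_unique(table: dict) -> bool:
    """Guard against two services being assigned the same port."""
    return len(set(table.values())) == len(table)


if __name__ == "__main__":
    print(service_url("web"))
    print(ports_are_unique(KNOWN_PORTS))
```

The trade-off is that the table must be kept unique and up to date by hand, in exchange for having no discovery cluster to run or keep highly available.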

Full Network Diagram
• The Interweb
• VPN
• Mesos ELB, Marathon ELB, Singularity ELB
• App ELBs
• Marathon LBs
• Mesos Masters
• Marathon
• Singularity
• Mesos Slaves
• ZooKeeper

IT’S BASICALLY MAGIC
If you get this far

Single App Network Diagram
• Internet
• myapp.com:80
• haproxy:10090 (two instances)
• container:31456
• container:30437

So Now What
• You went and built out all this fancy stuff
• You absolutely MUST test it
• Go terminate an instance, see if everything fixes itself
• If you haven’t tested each “HA” component, you’re not done yet
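
When you terminate an instance as a test, it helps to record how long the service was actually unavailable. A small sketch, assuming you already collect health-probe results as (timestamp in seconds, healthy?) pairs in time order; the probing itself and the sample data are not part of the talk.

```python
def longest_outage(samples):
    """Return the longest continuous unhealthy span, in seconds.

    samples: list of (timestamp_seconds, is_healthy) tuples sorted by time.
    An outage runs from the first failed probe to the next successful one.
    """
    longest = 0.0
    outage_start = None
    for timestamp, healthy in samples:
        if not healthy and outage_start is None:
            outage_start = timestamp
        elif healthy and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        # Still down at the end of the observation window.
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


if __name__ == "__main__":
    probes = [(0, True), (5, False), (10, False), (15, True), (20, True)]
    print("longest outage (s):", longest_outage(probes))
```

Comparing this number with the downtime budget for your target uptime shows whether the "self-healing" setup is fast enough.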

We can version our infrastructure

DevOps Benefits
• Ops can focus on supporting the cluster and not the apps
• Empower engineers to ship infrastructure in a “foolproof” way
• Intangible benefit of spreading cool tech into the organization

Learn from my mistakes
• RTFM, specifically about minimum requirements
• Is your DNS HA?
• Upgrade your way out of trouble
• It’s highly unlikely you need a bleeding edge version of Docker
• If you deploy in Docker, you’d better be developing in it

Downsides
• Mesos/Marathon doesn’t expose some of the most modern Docker features
• This stuff is not easy, and it’s moving fast
• It can be hard to test the waters, lots of circular dependencies

QUESTIONS?
Phew. That was a lot.

JOIND.IN/TALK/3E36D
I’d love your feedback
