Agenda • What can we consider highly available? • How do we mitigate risk? • Why are containers well suited for HA? • Recommendations • Lessons learned the hard way 3
Opinion vs Fact • This talk is based on my opinions • There are many different ways to do things • If I trash your favorite, we can still be friends • Why am I even qualified to talk about this? 4
This is not a tutorial • There’s no way I can show you enough in an hour to build all this from scratch • See what ideas might apply to your systems • Commit to incremental improvements 5
How often are you down? • 99% Uptime = Down 7h/month • 99.9% Uptime = Down 45m/month • 99.99% Uptime = Down <5m/month • 99.999% Uptime = Down <30s/month 7
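The downtime figures above follow from simple arithmetic. A minimal sketch, assuming an average month of 730 hours (the function name is illustrative, not from the talk):

```python
# Convert an uptime target into the downtime it allows per month.
# Assumption: an average month is 365 * 24 / 12 = 730 hours.
HOURS_PER_MONTH = 365 * 24 / 12


def downtime_minutes_per_month(uptime_percent: float) -> float:
    """Return the minutes of downtime per month allowed by an uptime target."""
    down_fraction = 1 - uptime_percent / 100
    return down_fraction * HOURS_PER_MONTH * 60


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_month(target)
    print(f"{target}% uptime -> {minutes:.2f} min/month ({minutes * 60:.0f} s)")
```

Under that assumption the four targets come out to roughly 438, 43.8, 4.38, and 0.44 minutes per month, which matches the rounded numbers on the slide.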
How to calculate your risk tolerance • Log in to your AWS account • Hand me your laptop • I will terminate one EC2 instance of my choosing • How long will you let me sit there? 9
But seriously… • Risk mitigation costs money • Consider battery backups as an example • Asking “how much reliability do you want” is a silly question • Make these decisions with hard numbers, not feelings 10
Obligatory Metaphors • Until the late 2000s, we treated servers like pets • Then with Chef, Puppet, Ansible, etc., we treated them like cattle • Now we can treat them like ants! 11
Let’s start with hardware • All these tactics work great with cloud providers (doesn’t have to be AWS) • You need at least 2 of everything • You need a plan for how to fail • You need a replacement plan 13
Self-healing systems • If a server ceases to exist, it should be replaced without human interaction • AWS CloudFormation and Elastic Beanstalk are good options • Terraform for non-AWSers 14
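The idea behind a self-healing system can be sketched as a reconciliation loop: compare the servers you want with the servers you have, and launch replacements until the two match. This is a toy illustration only, not how any particular tool implements it; `make_launcher` and `reconcile` are hypothetical names, and the "launch" is simulated.

```python
from itertools import count


def make_launcher(prefix: str):
    """Return a fake launch function that hands out unique server ids."""
    ids = count(1)

    def launch() -> str:
        return f"{prefix}-{next(ids)}"

    return launch


def reconcile(desired_count: int, running: set[str], launch) -> tuple[set[str], list[str]]:
    """Launch replacements until the fleet is back at the desired size."""
    fleet = set(running)
    launched = []
    while len(fleet) < desired_count:
        new_id = launch()
        fleet.add(new_id)
        launched.append(new_id)
    return fleet, launched


fleet = {"web-a", "web-b", "web-c"}
fleet.discard("web-b")  # a server "ceases to exist"
fleet, launched = reconcile(3, fleet, make_launcher("web-new"))
print(sorted(fleet), launched)
```

In a real setup this loop runs inside the platform rather than in your own script, which is the point of "without human interaction".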
What about my devops tools? • I’ve already got all this ____ stuff set up • Docker can obviate all of that • You can still use these things if you must • Who runs the scripts? • Is the ____ server highly available? 15
Why Docker? • Immutable, disposable infrastructure • Requires no bootstrapping if using a Docker-friendly OS • Don’t have to care about what is running where, just that you have enough hardware 16
Containerize All The Things! • This isn’t just about containers for the sake of containers • The container way of thinking leads you down the right path 17
docker run Is Not Sufficient • Just like with building apps, you’re going to want a framework • API-based deployment and scheduling of containers • Something to wrangle hardware 18
Containers and Schedulers • Common to run multiple containers on one piece of hardware • What if that hardware goes down? • What if US-East-1D goes down? 20
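One way to picture what a scheduler does about these questions: spread replicas across availability zones so that losing a host, or a whole zone, still leaves copies running. The sketch below is a simplified illustration, not any real scheduler's algorithm. The host names are made up, and it assumes every zone has at least one host.

```python
from itertools import cycle


def place_replicas(replicas: int, hosts_by_zone: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Assign replicas round-robin across zones, then across hosts in each zone."""
    zone_cycle = cycle(sorted(hosts_by_zone))
    host_cycles = {zone: cycle(hosts) for zone, hosts in hosts_by_zone.items()}
    placements = []
    for _ in range(replicas):
        zone = next(zone_cycle)
        placements.append((zone, next(host_cycles[zone])))
    return placements


def survivors_after_zone_loss(placements, lost_zone: str):
    """Return the placements that are still running if one zone disappears."""
    return [p for p in placements if p[0] != lost_zone]


# Hypothetical hardware: three zones with a few hosts each.
hosts = {
    "us-east-1a": ["host-a1", "host-a2"],
    "us-east-1b": ["host-b1"],
    "us-east-1d": ["host-d1", "host-d2"],
}
placements = place_replicas(6, hosts)
remaining = survivors_after_zone_loss(placements, "us-east-1d")
print(f"{len(remaining)} of {len(placements)} replicas survive losing us-east-1d")
```

With six replicas over three zones, each zone holds two, so a single zone outage leaves four running while replacements are scheduled elsewhere.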
I’m so tired of hearing people talk about Docker • This is not a Docker talk, I promise • Containers breed immutable, repeatable infrastructure • Immutable infrastructure is disposable and replaceable • Containers breed 12-factor apps • 12-factor apps are modular enough to facilitate true HA 22
Database • What does your I/O load look like? • Split reads and writes • AWS Aurora if applicable • You really need at least 2 of your biggest server • Maintenance windows? 23
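Splitting reads and writes can be sketched as a small router that sends writes to the primary and reads to a replica. This is a minimal illustration under stated assumptions: `ConnectionRouter` and the endpoint names are hypothetical, detecting reads by a leading `SELECT` is deliberately naive, and it ignores replication lag (a read issued right after a write may need to go to the primary).

```python
import random


class ConnectionRouter:
    """Pick a database endpoint per statement (hypothetical helper)."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.replicas = list(replicas)

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: anything starting with SELECT counts as a read.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self.replicas:
            return random.choice(self.replicas)
        # Writes, and reads when no replica is available, go to the primary.
        return self.primary


router = ConnectionRouter(
    primary="db-primary.internal",
    replicas=["db-replica-1.internal", "db-replica-2.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders (id) VALUES (1)"))
```

Falling back to the primary when no replica is available keeps the application working, at the cost of extra load on the one server you least want to overload.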
Disk Storage • Try to avoid local disk storage of anything • Put PHP sessions in a memory cache • Upload files directly to S3 • Flysystem is your friend, especially for development 24
Cache • How important is your cache? • Does your app work if the cache disappears? • Make sure it’s not the source of truth • Sharding vs Replication for scale 25
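"Does your app work if the cache disappears?" can be tested with a cache-aside read that treats the cache as optional. In this minimal sketch, `FlakyCache`, `CacheUnavailable`, and `load_user` are hypothetical names, and a plain dict stands in for the database, which remains the source of truth.

```python
class CacheUnavailable(Exception):
    """Raised by the fake cache when it is down."""


class FlakyCache:
    """In-memory stand-in for a cache server that can be switched off."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise CacheUnavailable(key)
        self.data[key] = value


def load_user(user_id, cache, database):
    """Read through the cache, falling back to the database on any cache failure."""
    key = f"user:{user_id}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        pass  # cache is down: carry on without it
    value = database[user_id]  # the database is the source of truth
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # failing to populate the cache must not fail the request
    return value


database = {1: "alice"}
cache = FlakyCache()
print(load_user(1, cache, database))  # miss, then populates the cache
cache.available = False
print(load_user(1, cache, database))  # still works with the cache gone
```

The request gets slower when the cache is gone, but it does not fail, and nothing is lost because the cache never held the only copy.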
Service Discovery Overview • Distributed data stores that are a registry of what servers are where • Your code connects to these instead of using a config file • Even if you had to update it manually, it’d be faster than deploying 38
The problem with service discovery • Latency • Each lookup takes approximately 10ms • If you have to look up DB, Cache, Elasticsearch, SMTP, etc., it adds up • Try to organize services by logical application, so you can query for a whole namespace at once 41
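The namespace advice can be sketched with a fake registry that counts round trips. This assumes the registry can return every key under a prefix in one request. `FakeRegistry`, the `shop/` namespace, and the addresses are all made up for illustration.

```python
class FakeRegistry:
    """In-memory stand-in for a service registry that counts round trips."""

    def __init__(self, entries: dict[str, str]):
        self.entries = dict(entries)
        self.round_trips = 0

    def get(self, key: str) -> str:
        self.round_trips += 1
        return self.entries[key]

    def get_prefix(self, prefix: str) -> dict[str, str]:
        self.round_trips += 1
        return {k: v for k, v in self.entries.items() if k.startswith(prefix)}


LOOKUP_MS = 10  # the approximate per-lookup latency quoted on the slide

registry = FakeRegistry({
    "shop/db": "db.internal:3306",
    "shop/cache": "cache.internal:6379",
    "shop/search": "search.internal:9200",
    "shop/smtp": "smtp.internal:25",
    "blog/db": "blog-db.internal:3306",
})

# One lookup per service: four round trips.
one_by_one = {name: registry.get(f"shop/{name}") for name in ("db", "cache", "search", "smtp")}
individual_trips = registry.round_trips

# One lookup for the whole application namespace: a single round trip.
registry.round_trips = 0
batched = registry.get_prefix("shop/")
batched_trips = registry.round_trips

print(f"one by one: {individual_trips * LOOKUP_MS} ms, batched: {batched_trips * LOOKUP_MS} ms")
```

At roughly 10ms per lookup, four services cost about 40ms looked up one by one and about 10ms when fetched as a namespace, and the gap widens with every service the application adds.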