High Availability PHP (Nomad PHP January 2018)

High Availability PHP Josh Butts Nomad PHP - January 2018

About Me
• SVP of Engineering, Ziff Davis
• Austin PHP Organizer
• github.com/jimbojsb
• @jimbojsb

Agenda
• What can we consider highly available?
• How do we mitigate risk?
• Why are containers well suited for HA?
• Recommendations
• Lessons learned the hard way

Opinion vs Fact
• This talk is based on my opinions
• There are many different ways to do things
• If I trash your favorite, we can still be friends
• Why am I even qualified to talk about this?

This is not a tutorial
• There’s no way I can show you enough in an hour to build all this from scratch
• See what ideas might apply to your systems
• Commit to incremental improvements

What is high availability?
• Your stuff just doesn’t go down
• Like ever
• This is not just a happy coincidence

How often are you down?
• 99% uptime = down 7h/month
• 99.9% uptime = down 45m/month
• 99.99% uptime = down <5m/month
• 99.999% uptime = down <30s/month

What should we shoot for?
• Minimum of “4 9’s”
• “5 9’s” is totally doable
• HA costs real money
• Balance potential loss against costs

How to calculate your risk tolerance
• Log in to your AWS account
• Hand me your laptop
• I will terminate one EC2 instance of my choosing
• How long will you let me sit there?

But seriously…
• Risk mitigation costs money
• Consider battery backups as an example
• Asking “how much reliability do you want” is a silly question
• Make these decisions with hard numbers, not feelings

Obligatory Metaphors
• Until the late 2000s, we treated servers like pets
• Then with Chef, Puppet, Ansible, etc., we treated them like cattle
• Now we can treat them like ants!

Example App Ecosystem
• PHP Web App
• API
• Scheduled Jobs
• Queue Workers
• Database
• Cache
• Job Queue
• Uploaded Files

Let’s start with hardware
• All these tactics work great with cloud providers (doesn’t have to be AWS)
• You need at least 2 of everything
• You need a plan for how to fail
• You need a replacement plan

Self-healing systems
• If a server ceases to exist, it should be replaced without human interaction
• AWS CloudFormation and Elastic Beanstalk are good options
• Terraform for non-AWSers

What about my devops tools?
• I’ve already got all this ____ stuff set up
• Docker can obviate all of that
• You can still use these things if you must
• Who runs the scripts?
• Is the ____ server highly available?

Why Docker?
• Immutable, disposable infrastructure
• Requires no bootstrapping if using a Docker-friendly OS
• Don’t have to care about what is running where, just that you have enough hardware

Containerize All The Things!
• This isn’t just about containers for the sake of containers
• The container way of thinking leads you down the right path

docker run Is Not Sufficient
• Just like with building apps, you’re going to want a framework
• API-based deployment and scheduling of containers
• Something to wrangle hardware

Don’t worry, this is a solved problem
• Kubernetes
• Mesosphere / DC/OS
• Docker Swarm
• Rancher

Containers and Schedulers
• Common to run multiple containers on one piece of hardware
• What if that hardware goes down?
• What if US-East-1D goes down?

Example

I’m so tired of hearing people talk about Docker
• This is not a Docker talk, I promise
• Containers breed immutable, repeatable infrastructure
• Immutable infrastructure is disposable and replaceable
• Containers breed 12-factor apps
• 12-factor apps are modular enough to facilitate true HA

Database
• What does your I/O load look like?
• Split reads and writes
• AWS Aurora if applicable
• You really need at least 2 of your biggest server
• Maintenance windows?

Disk Storage
• Try to avoid local disk storage of anything
• Put PHP sessions in a memory cache
• Upload files directly to S3
• Flysystem is your friend, especially for development

Cache
• How important is your cache?
• Does your app work if the cache disappears?
• Make sure it’s not the source of truth
• Sharding vs Replication for scale

The Basics

Getting Started

Now IP addresses are broken
    $clientIp = $_SERVER['REMOTE_ADDR']; // 1.2.3.4 - address of the load balancer, not the client

Let’s fix IPs
    $clientIp = null;
    if (isset($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        // X-Forwarded-For may hold a comma-separated chain: client, proxy1, proxy2
        $ipList = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR']);
        foreach ($ipList as $ip) {
            $ip = trim($ip);
            // take the first entry that is a valid IP address
            if (filter_var($ip, FILTER_VALIDATE_IP)) {
                $clientIp = $ip;
                break;
            }
        }
    }
    if ($clientIp === null) {
        // no usable header: fall back to the directly connected peer
        $clientIp = $_SERVER['REMOTE_ADDR'] ?? null;
    }
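
One caveat with reading X-Forwarded-For: the client can send that header too, so the leftmost entry is only as trustworthy as whoever wrote it. A possible hardening, sketched here under the assumption that you know the addresses of your own load balancers, is to honor the header only when the request really came from one of them and to read the chain from the right. The function name `resolveClientIp`, the `$trustedProxies` parameter, and the addresses are all hypothetical:

```php
<?php
// Resolve the client IP, trusting X-Forwarded-For only behind known proxies.
// $server is shaped like $_SERVER; $trustedProxies lists your load balancer IPs.
function resolveClientIp(array $server, array $trustedProxies): ?string
{
    $remote = $server['REMOTE_ADDR'] ?? null;

    // A direct connection, or an unknown peer: do not trust the header at all.
    if (!in_array($remote, $trustedProxies, true) || empty($server['HTTP_X_FORWARDED_FOR'])) {
        return $remote;
    }

    // Walk the chain right to left, skipping our own proxies. The first
    // valid address that is not ours was appended by a proxy we trust.
    $chain = array_reverse(array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR'])));
    foreach ($chain as $ip) {
        if (filter_var($ip, FILTER_VALIDATE_IP) && !in_array($ip, $trustedProxies, true)) {
            return $ip;
        }
    }

    return $remote;
}

// The client claimed to be 6.6.6.6, but the load balancer saw 203.0.113.7.
echo resolveClientIp(
    ['REMOTE_ADDR' => '10.0.0.1', 'HTTP_X_FORWARDED_FOR' => '6.6.6.6, 203.0.113.7'],
    ['10.0.0.1']
), "\n"; // prints 203.0.113.7
```

Passing `$_SERVER` in as an argument, rather than reading the superglobal inside the function, keeps the logic easy to test.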

Now let’s fix the databases

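App Considerations
    $dbs = ["db1.site.com", "db2.site.com", "db3.site.com"];
    // pick a random server to spread the read load
    $slaveNum = mt_rand(0, count($dbs) - 1);
    // PDO expects a DSN rather than a bare hostname; the MySQL driver,
    // database name, and credentials below are placeholders
    $pdo = new \PDO("mysql:host={$dbs[$slaveNum]};dbname=myapp", $dbUser, $dbPassword);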

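Better Version
    $dbs = ["db1.site.com", "db2.site.com", "db3.site.com"];
    $slaveNum = mt_rand(0, count($dbs) - 1);
    $pdo = null;
    // the MySQL driver, database name, and credentials are placeholders
    try {
        $pdo = new \PDO("mysql:host={$dbs[$slaveNum]};dbname=myapp", $dbUser, $dbPassword);
    } catch (\PDOException $e) {
        // the random pick failed: try every other server in turn
        for ($i = 0; $i < count($dbs); $i++) {
            if ($i != $slaveNum) {
                try {
                    $pdo = new \PDO("mysql:host={$dbs[$i]};dbname=myapp", $dbUser, $dbPassword);
                    break;
                } catch (\PDOException $e) {
                    // this one is down too; keep going
                }
            }
        }
        if (!$pdo) {
            throw new \RuntimeException("all out of DBs");
        }
    }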
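
The same failover idea can be folded into one loop. This is a sketch rather than a drop-in: `connectToAny` is a made-up helper name, and the connection step is passed in as a callable so the example can run without a real database.

```php
<?php
// Try each host in random order and return the first connection that works.
// $connect is any callable that returns a connection or throws on failure.
function connectToAny(array $hosts, callable $connect)
{
    shuffle($hosts); // random order spreads load across the servers
    $lastError = null;
    foreach ($hosts as $host) {
        try {
            return $connect($host);
        } catch (\Exception $e) {
            $lastError = $e; // remember the failure and move on
        }
    }
    // Every host failed: surface the last error as the cause.
    throw new \RuntimeException("all out of DBs", 0, $lastError);
}

// Demo with a fake connector in which only db3 is reachable.
$fakeConnect = function (string $host) {
    if ($host !== "db3.site.com") {
        throw new \RuntimeException("cannot reach $host");
    }
    return "connection to $host";
};

echo connectToAny(["db1.site.com", "db2.site.com", "db3.site.com"], $fakeConnect), "\n";
// prints "connection to db3.site.com"
```

In a real app the callable would wrap `new \PDO(...)` with your own DSN and credentials. `\PDOException` extends `\Exception`, so the catch above covers it.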

Now we need more load balancers

Let’s not forget caching!

App Considerations
    require_once __DIR__ . '/vendor/autoload.php';

    $client = new \Predis\Client();
    $value = $client->get("myvalue");
    if (!$value) {
        // cache miss: compute the value and store it for next time
        $value = reallyExpensiveFunction();
        $client->set("myvalue", $value);
    }

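Better Version
    class Cache
    {
        private $redis;

        // the client is injected; a Predis client is assumed, as on the previous slide
        public function __construct(\Predis\Client $redis)
        {
            $this->redis = $redis;
        }

        public function get($key)
        {
            try {
                return $this->redis->get($key);
            } catch (\Exception $e) {
                // treat an unreachable cache as a miss
                return null;
            }
        }

        public function set($key, $value)
        {
            try {
                return $this->redis->set($key, $value);
            } catch (\Exception $e) {
                // the cache is optional, so a failed write is not fatal
                return false;
            }
        }
    }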
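
To see why swallowing cache errors matters, here is a small self-contained sketch of a read-through helper that treats a dead cache as a miss. `BrokenClient` and `remember` are made-up names for the demo; the stub stands in for a Redis client whose server has gone away.

```php
<?php
// Stand-in for a cache client whose server is unreachable (demo only).
class BrokenClient
{
    public function get($key)
    {
        throw new \RuntimeException("connection refused");
    }

    public function set($key, $value)
    {
        throw new \RuntimeException("connection refused");
    }
}

// Read-through helper: a cache failure is handled like a cache miss.
function remember($client, string $key, callable $compute)
{
    try {
        $value = $client->get($key);
        if ($value !== null) {
            return $value; // cache hit
        }
    } catch (\Exception $e) {
        // cache unavailable: fall through and compute the value
    }

    $value = $compute();

    try {
        $client->set($key, $value);
    } catch (\Exception $e) {
        // storing is best effort only
    }

    return $value;
}

// The cache is down, but the caller still gets an answer.
echo remember(new BrokenClient(), "myvalue", function () {
    return 42;
}), "\n"; // prints 42
```

The page gets slower while the cache is down, but it does not fall over. That only works if the cache is never the source of truth.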

Circuit Breaker Pattern
• Consider a circuit breaker pattern for other external services
• https://github.com/offers/rho
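
For anyone who hasn't met the pattern, here is a minimal sketch of the idea. It is not the rho library's API; the class, its method names, and the thresholds are invented for illustration. After a number of consecutive failures the breaker "opens" and callers get the fallback immediately instead of waiting on a dead service; once a cool-down has passed, one trial call is let through.

```php
<?php
class CircuitBreaker
{
    private $failureThreshold;
    private $retryAfterSeconds;
    private $failures = 0;
    private $openedAt = null;

    public function __construct(int $failureThreshold = 3, int $retryAfterSeconds = 30)
    {
        $this->failureThreshold = $failureThreshold;
        $this->retryAfterSeconds = $retryAfterSeconds;
    }

    public function call(callable $operation, callable $fallback)
    {
        // Open and still cooling down: fail fast without touching the service.
        if ($this->openedAt !== null && time() - $this->openedAt < $this->retryAfterSeconds) {
            return $fallback();
        }

        try {
            $result = $operation();
            // Success closes the breaker and resets the failure count.
            $this->failures = 0;
            $this->openedAt = null;
            return $result;
        } catch (\Exception $e) {
            $this->failures++;
            if ($this->failures >= $this->failureThreshold) {
                $this->openedAt = time(); // open (or re-open) the breaker
            }
            return $fallback();
        }
    }

    public function isOpen(): bool
    {
        return $this->openedAt !== null;
    }
}

// Demo: a service that always fails is only attempted twice.
$calls = 0;
$breaker = new CircuitBreaker(2, 60);
$flaky = function () use (&$calls) {
    $calls++;
    throw new \RuntimeException("service down");
};
$fallback = function () {
    return "fallback";
};

for ($i = 0; $i < 5; $i++) {
    $breaker->call($flaky, $fallback);
}
echo $calls, "\n"; // prints 2
```

This version keeps its state in memory, so in a typical PHP setup it would reset on every request. To be useful across requests the counters would need to live somewhere shared between workers.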

Service Discovery Overview
• Distributed data stores that are a registry of what servers are where
• Your code connects to these instead of using a config file
• Even if you had to update it manually, it’d be faster than deploying

Service Discovery
• Etcd
• Consul
• Zookeeper
• Oh by the way, these all need their own cluster of at least 3 servers

Quick Service Discovery Example
    $etcd = new LinkORB\Component\Etcd\Client($etcdClusterHostname);
    // assumes the key holds a JSON-encoded list of database DSNs
    $dbs = json_decode($etcd->get("/database/slaves"), true);
    $slaveNum = mt_rand(0, count($dbs) - 1);
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        //...
    }

The problem with service discovery
• Latency
• Each lookup takes approximately 10ms
• If you have to look up DB, Cache, ElasticSearch, SMTP, etc., it adds up
• Try to organize services by logical application, so you can query for a whole namespace at once

Updated Discovery Example
    $etcd = new LinkORB\Component\Etcd\Client($etcdClusterHostname);
    // one lookup for the whole app; assumes the key holds a JSON document
    $config = json_decode($etcd->get("/services/myapp"), true);
    $dbs = $config["databases"]["slaves"];
    $slaveNum = mt_rand(0, count($dbs) - 1);
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        //...
    }

Questions?
