High Availability PHP (Nomad PHP January 2018)

Josh Butts
January 26, 2018

Transcript

  1. About Me
    • SVP of Engineering, Ziff Davis
    • Austin PHP Organizer
    • github.com/jimbojsb
    • @jimbojsb
  2. Agenda
    • What can we consider highly available?
    • How do we mitigate risk?
    • Why are containers well suited for HA?
    • Recommendations
    • Lessons learned the hard way
  3. Opinion vs Fact
    • This talk is based on my opinions
    • There are many different ways to do things
    • If I trash your favorite, we can still be friends
    • Why am I even qualified to talk about this?
  4. This is not a tutorial
    • There’s no way I can show you enough in an hour to build all this from scratch
    • See what ideas might apply to your systems
    • Commit to incremental improvements
  5. What is high availability?
    • Your stuff just doesn’t go down
    • Like ever
    • This is not just a happy coincidence
  6. How often are you down?
    • 99% Uptime = Down 7h/month
    • 99.9% Uptime = Down 45m/month
    • 99.99% Uptime = Down <5m/month
    • 99.999% Uptime = Down <30s/month
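
The downtime figures on this slide follow from simple arithmetic. The sketch below is an illustration rather than part of the talk; the function name and the 30-day month are assumptions. With those assumptions the budgets work out to about 7.2 hours, 43.2 minutes, 4.3 minutes and 25.9 seconds, which the slide rounds.

```php
<?php
declare(strict_types=1);

/**
 * Allowed downtime in seconds per month for a given availability percentage.
 * Assumption: a 30-day month unless told otherwise.
 */
function downtimeSecondsPerMonth(float $availabilityPercent, int $daysInMonth = 30): float
{
    $secondsInMonth = $daysInMonth * 24 * 60 * 60;

    // The unavailable fraction of the month is the downtime budget.
    return $secondsInMonth * (1 - $availabilityPercent / 100);
}

foreach ([99.0, 99.9, 99.99, 99.999] as $percent) {
    printf(
        "%s%% uptime allows %.1f minutes of downtime per month\n",
        $percent,
        downtimeSecondsPerMonth($percent) / 60
    );
}
```

Working from the budget makes the cost discussion concrete: each extra nine shrinks the allowance by a factor of ten.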
  7. What should we shoot for?
    • Minimum of “4 9’s”
    • “5 9’s” is totally doable
    • HA costs real money
    • Balance potential loss against costs
  8. How to calculate your risk tolerance
    • Log in to your AWS account
    • Hand me your laptop
    • I will terminate one EC2 instance of my choosing
    • How long will you let me sit there?
  9. But seriously…
    • Risk mitigation costs money
    • Consider battery backups as an example
    • Asking “how much reliability do you want” is a silly question
    • Make these decisions with hard numbers, not feelings
  10. Obligatory Metaphors
    • Until the late 2000s, we treated servers like pets
    • Then with Chef, Puppet, Ansible, etc. we treated them like cattle
    • Now we can treat them like ants!
  11. Example App Ecosystem (diagram)
    • PHP Web App, API, Scheduled Jobs, Queue Workers
    • Database, Cache, Job Queue, Uploaded Files
  12. Let's start with hardware
    • All these tactics work great with cloud providers (doesn’t have to be AWS)
    • You need at least 2 of everything
    • You need a plan for how to fail
    • You need a replacement plan
  13. Self-healing systems
    • If a server ceases to exist, it should be replaced without human interaction
    • AWS CloudFormation and Elastic Beanstalk are good options
    • Terraform for non-AWSers
  14. What about my devops tools?
    • I’ve already got all this ____ stuff set up
    • Docker can obviate all of that
    • You can still use these things if you must
    • Who runs the scripts?
    • Is the ____ server highly available?
  15. Why Docker?
    • Immutable, disposable infrastructure
    • Requires no bootstrapping if using a Docker-friendly OS
    • Don’t have to care about what is running where, just that you have enough hardware
  16. Containerize All The Things!
    • This isn’t just about containers for the sake of containers
    • The container way of thinking leads you down the right path
  17. docker run Is Not Sufficient
    • Just like with building apps, you’re going to want a framework
    • API-based deployment and scheduling of containers
    • Something to wrangle hardware
  18. Don’t worry, this is a solved problem
    • Kubernetes
    • Mesosphere DC/OS
    • Docker Swarm
    • Rancher
  19. Containers and Schedulers
    • Common to run multiple containers on one piece of hardware
    • What if that hardware goes down?
    • What if US-East-1D goes down?
  20. I’m so tired of hearing people talk about Docker
    • This is not a Docker talk, I promise
    • Containers breed immutable, repeatable infrastructure
    • Immutable infrastructure is disposable and replaceable
    • Containers breed 12-factor apps
    • 12-factor apps are modular enough to facilitate true HA
  21. Database
    • What does your I/O load look like?
    • Split reads and writes
    • AWS Aurora if applicable
    • You really need at least 2 of your biggest server
    • Maintenance windows?
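
Splitting reads and writes can be pictured as a small routing decision made before each query. The sketch below is an illustration, not code from the talk: the `ConnectionRouter` class and the host names are hypothetical, and the `SELECT` check is deliberately naive.

```php
<?php
declare(strict_types=1);

/**
 * Chooses a database host for a statement: reads go to a random replica,
 * everything else goes to the primary. All names here are hypothetical.
 */
final class ConnectionRouter
{
    /** @var string */
    private $primary;

    /** @var string[] */
    private $replicas;

    public function __construct(string $primary, array $replicas)
    {
        $this->primary = $primary;
        $this->replicas = array_values($replicas);
    }

    public function hostFor(string $sql): string
    {
        // Naive classification: only statements starting with SELECT count as reads.
        $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;

        // With no replicas left, the primary has to serve reads as well.
        if (!$isRead || $this->replicas === []) {
            return $this->primary;
        }

        return $this->replicas[mt_rand(0, count($this->replicas) - 1)];
    }
}

$router = new ConnectionRouter('primary.db.example', ['replica1.db.example', 'replica2.db.example']);
echo $router->hostFor('UPDATE users SET name = ? WHERE id = ?'), "\n";
```

A real router would also need to keep reads inside a transaction, and locking reads such as `SELECT ... FOR UPDATE`, on the primary, and to account for replication lag.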
  22. Disk Storage
    • Try to avoid local disk storage of anything
    • Put PHP sessions in a memory cache
    • Upload files directly to S3
    • Flysystem is your friend, especially for development
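
The reason a filesystem abstraction helps is that application code stops caring where the bytes land. The sketch below shows the idea with a hand-rolled interface; `FileStore`, `LocalFileStore` and `saveAvatar` are hypothetical names and are not Flysystem's API. A library such as Flysystem offers the same kind of seam, with adapters for local disk and S3.

```php
<?php
declare(strict_types=1);

/** Hypothetical storage seam: the app depends on this, not on a disk path. */
interface FileStore
{
    public function write(string $path, string $contents): void;

    public function read(string $path): ?string;
}

/** Local-disk implementation, suitable for development only. */
final class LocalFileStore implements FileStore
{
    /** @var string */
    private $root;

    public function __construct(string $root)
    {
        $this->root = rtrim($root, '/');
    }

    public function write(string $path, string $contents): void
    {
        $target = $this->root . '/' . ltrim($path, '/');
        if (!is_dir(dirname($target))) {
            mkdir(dirname($target), 0777, true);
        }
        file_put_contents($target, $contents);
    }

    public function read(string $path): ?string
    {
        $target = $this->root . '/' . ltrim($path, '/');
        if (!is_file($target)) {
            return null;
        }
        $data = file_get_contents($target);

        return $data === false ? null : $data;
    }
}

/** Application code only sees the interface, so production can swap in object storage. */
function saveAvatar(FileStore $store, int $userId, string $bytes): string
{
    $path = "avatars/{$userId}.png";
    $store->write($path, $bytes);

    return $path;
}
```

In production the same `saveAvatar` call would receive an implementation backed by object storage, so no web server holds the only copy of an upload.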
  23. Cache
    • How important is your cache?
    • Does your app work if the cache disappears?
    • Make sure it’s not the source of truth
    • Sharding vs Replication for scale
  24. Let's fix IPs
    // Behind a load balancer, REMOTE_ADDR is the balancer's address;
    // the original client address arrives in X-Forwarded-For.
    $clientIp = 0;
    if (isset($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $possibleIp = $_SERVER['HTTP_X_FORWARDED_FOR'];
        if (strpos($possibleIp, ',') !== false) {
            // Several hops: take the first entry that is a valid IP.
            $ipList = explode(',', $possibleIp);
            foreach ($ipList as $ip) {
                $ip = trim($ip);
                if (filter_var($ip, FILTER_VALIDATE_IP)) {
                    $clientIp = $ip;
                    break;
                }
            }
        } else {
            $clientIp = $possibleIp;
        }
    } else {
        $clientIp = $_SERVER['REMOTE_ADDR'] ?? null;
    }
  25. App Considerations
    $dbs = ["db1.site.com", "db2.site.com", "db3.site.com"];
    $slaveNum = mt_rand(0, count($dbs) - 1);
    $pdo = new \PDO($dbs[$slaveNum]);
  26. Better Version
    $dbs = ["db1.site.com", "db2.site.com", "db3.site.com"];
    $slaveNum = mt_rand(0, count($dbs) - 1);
    $pdo = null;
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        // First choice is down: try every other server in turn.
        for ($i = 0; $i < count($dbs); $i++) {
            if ($i != $slaveNum) {
                try {
                    $pdo = new \PDO($dbs[$i]);
                    break;
                } catch (\PDOException $e) {
                    // This one is down too; keep going.
                }
            }
        }
        if (!$pdo) {
            throw new \RuntimeException("all out of DBs");
        }
    }
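
The retry logic on this slide can be folded into a small helper that is easier to test. This is a sketch under assumptions, not the speaker's code: `connectToAny` and its `$connect` factory parameter are hypothetical names. In real use the factory would be something like `function (string $dsn) { return new \PDO($dsn); }`.

```php
<?php
declare(strict_types=1);

/**
 * Tries the candidates in random order and returns the first connection
 * that succeeds. Throws only when every candidate has failed.
 *
 * @param string[] $dsns    candidate connection strings
 * @param callable $connect hypothetical factory that throws on failure
 */
function connectToAny(array $dsns, callable $connect)
{
    // Random order spreads load, like the mt_rand() pick on the slide.
    shuffle($dsns);

    $lastError = null;
    foreach ($dsns as $dsn) {
        try {
            return $connect($dsn);
        } catch (\Exception $e) {
            // Remember the failure and move on to the next candidate.
            $lastError = $e;
        }
    }

    throw new \RuntimeException('all out of DBs', 0, $lastError);
}
```

Passing the factory in keeps the helper independent of PDO, so the failover path can be exercised without a database.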
  27. App Considerations
    require_once __DIR__ . '/vendor/autoload.php';
    $client = new \Predis\Client();
    $value = $client->get("myvalue");
    if (!$value) {
        $value = reallyExpensiveFunction();
        $client->set("myvalue", $value);
    }
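  28. Better Version
    class Cache
    {
        private $redis;

        public function __construct($redis)
        {
            $this->redis = $redis;
        }

        public function get($key)
        {
            try {
                return $this->redis->get($key);
            } catch (\Exception $e) {
                // Treat a cache failure as a cache miss.
                return null;
            }
        }

        public function set($key, $value)
        {
            try {
                return $this->redis->set($key, $value);
            } catch (\Exception $e) {
                // Ignore: the cache is not the source of truth.
                return false;
            }
        }
    }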
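
Combining the wrapper idea with the earlier lookup gives a fail-soft cache-aside helper. The sketch below is an illustration; `remember` is a hypothetical name, and it assumes the client's `get()` returns `null` on a miss and that both methods throw when the cache is unreachable.

```php
<?php
declare(strict_types=1);

/**
 * Returns the cached value if there is one; otherwise computes it and
 * tries to store it. Cache errors never reach the caller.
 *
 * @param object   $client  anything with get($key) and set($key, $value)
 * @param callable $compute produces the value when the cache cannot
 */
function remember($client, string $key, callable $compute)
{
    try {
        $cached = $client->get($key);
        if ($cached !== null) {
            return $cached;
        }
    } catch (\Exception $e) {
        // Cache unreachable: behave as if it were a miss.
    }

    $value = $compute();

    try {
        $client->set($key, $value);
    } catch (\Exception $e) {
        // Storing is best effort; the computed value is still returned.
    }

    return $value;
}
```

With this shape the app keeps working when the cache disappears; it only gets slower, because every call falls through to the expensive computation.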
  29. Circuit Breaker Pattern
    • Consider a circuit breaker pattern for other external services
    • https://github.com/offers/rho
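
For readers who have not met the pattern, here is a minimal sketch of a circuit breaker. It is a generic illustration and not the API of the library linked on the slide; the class name, the threshold and the cool-down are assumptions. State lives in process memory here, so under PHP-FPM it would need shared storage to survive between requests.

```php
<?php
declare(strict_types=1);

/**
 * Minimal circuit breaker: after too many consecutive failures, calls are
 * short-circuited to the fallback until a cool-down period has passed.
 */
final class CircuitBreaker
{
    private $failureThreshold;
    private $retryAfterSeconds;
    private $failures = 0;
    private $openedAt = null;

    public function __construct(int $failureThreshold = 3, float $retryAfterSeconds = 30.0)
    {
        $this->failureThreshold = $failureThreshold;
        $this->retryAfterSeconds = $retryAfterSeconds;
    }

    public function call(callable $operation, callable $fallback)
    {
        if ($this->isOpen()) {
            // Open circuit: skip the failing service entirely.
            return $fallback();
        }

        try {
            $result = $operation();
            // Success closes the circuit again.
            $this->failures = 0;
            $this->openedAt = null;

            return $result;
        } catch (\Exception $e) {
            $this->failures++;
            if ($this->failures >= $this->failureThreshold) {
                // Open (or re-open) the circuit and restart the cool-down.
                $this->openedAt = microtime(true);
            }

            return $fallback();
        }
    }

    private function isOpen(): bool
    {
        if ($this->openedAt === null) {
            return false;
        }

        // Once the cool-down has elapsed, one trial call is allowed through.
        return (microtime(true) - $this->openedAt) < $this->retryAfterSeconds;
    }
}
```

The benefit over a plain try/catch is that a dead service stops costing a timeout on every request while the circuit is open.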
  30. Service Discovery Overview
    • Distributed data stores that are a registry of what servers are where
    • Your code connects to these instead of using a config file
    • Even if you had to update it manually, it’d be faster than deploying
  31. Service Discovery
    • etcd
    • Consul
    • ZooKeeper
    • Oh by the way, these all need their own cluster of at least 3 servers
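
The "at least 3 servers" requirement comes from majority quorum: these stores accept writes only while more than half of the members are reachable. A short sketch of the arithmetic, with hypothetical function names:

```php
<?php
declare(strict_types=1);

/** Smallest number of members that forms a majority of the cluster. */
function quorumSize(int $clusterSize): int
{
    return intdiv($clusterSize, 2) + 1;
}

/** How many members can fail while a majority is still reachable. */
function tolerableFailures(int $clusterSize): int
{
    return $clusterSize - quorumSize($clusterSize);
}

foreach ([1, 2, 3, 4, 5] as $size) {
    printf(
        "%d members: quorum %d, survives %d failure(s)\n",
        $size,
        quorumSize($size),
        tolerableFailures($size)
    );
}
```

Two members tolerate no failures at all, and four tolerate no more than three do, which is why clusters of three or five are the usual choice.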
  32. Quick Service Discovery Example
    $etcd = new LinkOrb\Component\Etcd\Client($etcdClusterHostname);
    $dbs = $etcd->get("/database/slaves");
    $slaveNum = mt_rand(0, count($dbs) - 1);
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        //...
    }
  33. The problem with service discovery
    • Latency
    • Each lookup takes approximately 10ms
    • If you have to look up DB, Cache, Elasticsearch, SMTP, etc., it adds up
    • Try to organize services by logical application, so you can query for a whole namespace at once
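
To put numbers on it: four separate lookups at roughly 10ms each cost about 40ms per request, while one namespace fetch costs about 10ms. The sketch below shows one way to guarantee a single fetch per request; `ServiceConfig` and the `$fetchNamespace` callable are hypothetical and stand in for whatever discovery client is in use.

```php
<?php
declare(strict_types=1);

/**
 * Fetches a whole namespace once and answers later questions from memory.
 * The memo only lives for the current process or request.
 */
final class ServiceConfig
{
    /** @var callable hypothetical lookup: namespace => array of services */
    private $fetchNamespace;

    /** @var array namespaces already fetched */
    private $cache = [];

    public function __construct(callable $fetchNamespace)
    {
        $this->fetchNamespace = $fetchNamespace;
    }

    public function get(string $namespace, string $service)
    {
        if (!array_key_exists($namespace, $this->cache)) {
            // One round trip for everything the application needs.
            $this->cache[$namespace] = ($this->fetchNamespace)($namespace);
        }

        return $this->cache[$namespace][$service] ?? null;
    }
}
```

Every service the app asks about after the first is answered without another network round trip.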
  34. Updated Discovery Example
    $etcd = new LinkOrb\Component\Etcd\Client($etcdClusterHostname);
    $config = $etcd->get("/services/myapp");
    $dbs = $config["databases"]["slaves"];
    $slaveNum = mt_rand(0, count($dbs) - 1);
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        //...
    }