High Availability PHP (Nomad PHP January 2018)

Josh Butts
January 26, 2018

Transcript

  1. High Availability PHP
    Josh Butts
    Nomad PHP - January 2018

  2. About Me
    • SVP of Engineering, Ziff Davis
    • Austin PHP Organizer
    • github.com/jimbojsb
    • @jimbojsb

  3. Agenda
    • What can we consider highly available?
    • How do we mitigate risk?
    • Why are containers well suited for HA?
    • Recommendations
    • Lessons learned the hard way

  4. Opinion vs Fact
    • This talk is based on my opinions
    • There are many different ways to do things
    • If I trash your favorite, we can still be friends
    • Why am I even qualified to talk about this?

  5. This is not a tutorial
    • There’s no way I can show you enough in an hour to build all this from scratch
    • See what ideas might apply to your systems
    • Commit to incremental improvements

  6. What is high availability?
    • Your stuff just doesn’t go down
    • Like ever
    • This is not just a happy coincidence

  7. How often are you down?
    • 99% Uptime = Down 7h/month
    • 99.9% Uptime = Down 45m/month
    • 99.99% Uptime = Down <5m/month
    • 99.999% Uptime = Down <30s/month
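    The figures above are rounded. A minimal sketch of the arithmetic behind them, assuming a 30-day month (the function name is made up for illustration):

```php
<?php
// Allowed downtime per month for a given uptime percentage.
// Assumes a 30-day month (43,200 minutes); the slide rounds its figures.
function downtimeMinutesPerMonth(float $uptimePercent, int $daysInMonth = 30): float
{
    $minutesInMonth = $daysInMonth * 24 * 60;
    return $minutesInMonth * (1 - $uptimePercent / 100);
}

foreach ([99.0, 99.9, 99.99, 99.999] as $uptime) {
    printf(
        "%s%% uptime allows about %.2f minutes of downtime per month\n",
        $uptime,
        downtimeMinutesPerMonth($uptime)
    );
}
```

    At 99% this works out to roughly 432 minutes (about 7 hours), and at 99.999% to well under one minute.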

  8. What should we shoot for?
    • Minimum of “4 9’s”
    • “5 9’s” is totally doable
    • HA costs real money
    • Balance potential loss against costs

  9. How to calculate your risk tolerance
    • Log in to your AWS account
    • Hand me your laptop
    • I will terminate one EC2 instance of my choosing
    • How long will you let me sit there?

  10. But seriously…
    • Risk mitigation costs money
    • Consider battery backups as an example
    • Asking “how much reliability do you want” is a silly question
    • Make these decisions with hard numbers, not feelings
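    One way to put hard numbers on the decision is to compare the expected cost of downtime at two uptime levels against the extra spend needed to move between them. This is only a sketch: the revenue and cost figures below are hypothetical placeholders, and the function name is invented for illustration.

```php
<?php
// All figures are hypothetical; substitute your own revenue and infrastructure costs.
function monthlyDowntimeCost(float $uptimePercent, float $revenuePerHour): float
{
    $hoursInMonth = 30 * 24; // assumes a 30-day month
    $hoursDown = $hoursInMonth * (1 - $uptimePercent / 100);
    return $hoursDown * $revenuePerHour;
}

$revenuePerHour = 5000.0; // hypothetical revenue lost per hour of downtime
$extraHaSpend   = 1500.0; // hypothetical extra monthly cost of redundancy

$lossAtThreeNines = monthlyDowntimeCost(99.9, $revenuePerHour);
$lossAtFourNines  = monthlyDowntimeCost(99.99, $revenuePerHour);
$avoidedLoss      = $lossAtThreeNines - $lossAtFourNines;

printf("Expected monthly loss at 99.9%%: %.2f\n", $lossAtThreeNines);
printf("Expected monthly loss at 99.99%%: %.2f\n", $lossAtFourNines);
printf("Extra spend pays for itself: %s\n", $avoidedLoss > $extraHaSpend ? "yes" : "no");
```

    With these made-up inputs the avoided loss is larger than the extra spend, so the upgrade would be worth it. With a lower revenue figure the answer flips.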

  11. Obligatory Metaphors
    • Until the late 2000’s, we treated servers like pets
    • Then with Chef, Puppet, Ansible, etc., we treated them like cattle
    • Now we can treat them like ants!

  12. Example App Ecosystem
    [Diagram: PHP Web App, API, Scheduled Jobs, Queue Workers, Database, Cache, Job Queue, Uploaded Files]

  13. Let's start with hardware
    • All these tactics work great with cloud providers (doesn’t have to be AWS)
    • You need at least 2 of everything
    • You need a plan for how to fail
    • You need a replacement plan

  14. Self-healing systems
    • If a server ceases to exist, it should be replaced without human interaction
    • AWS CloudFormation and Elastic Beanstalk are good options
    • Terraform for non-AWSers

  15. What about my devops tools?
    • I’ve already got all this ____ stuff set up
    • Docker can obviate all of that
    • You can still use these things if you must
    • Who runs the scripts?
    • Is the ____ server highly available?

  16. Why Docker?
    • Immutable, disposable infrastructure
    • Requires no bootstrapping if using a Docker-friendly OS
    • Don’t have to care about what is running where, just that you have enough hardware

  17. Containerize All The Things!
    • This isn’t just about containers for the sake of containers
    • The container way of thinking leads you down the right path

  18. docker run Is Not Sufficient
    • Just like with building apps, you’re going to want a framework
    • API-based deployment and scheduling of containers
    • Something to wrangle hardware

  19. Don’t worry, this is a solved problem
    • Kubernetes
    • Mesosphere / DC/OS
    • Docker Swarm
    • Rancher

  20. Containers and Schedulers
    • Common to run multiple containers on one piece of hardware
    • What if that hardware goes down?
    • What if US-East-1D goes down?

  21. Example

  22. I’m so tired of hearing people talk about Docker
    • This is not a Docker talk, I promise
    • Containers breed immutable, repeatable infrastructure
    • Immutable infrastructure is disposable and replaceable
    • Containers breed 12-factor apps
    • 12-factor apps are modular enough to facilitate true HA

  23. Database
    • What does your I/O load look like?
    • Split reads and writes
    • AWS Aurora if applicable
    • You really need at least 2 of your biggest server
    • Maintenance windows?
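    Splitting reads and writes can be sketched as a small routing decision made before connecting. The helper below is hypothetical (its name, the hostnames, and the regex heuristic are all assumptions for illustration); it sends statements that look like reads to a random replica and everything else to the primary.

```php
<?php
// Hypothetical helper: route a statement to the primary or to a random replica.
// Hostnames are placeholders; a real app would turn them into PDO DSNs.
function pickDatabaseHost(string $sql, string $primary, array $replicas): string
{
    // Naive heuristic: treat statements starting with SELECT or SHOW as reads.
    $isRead = preg_match('/^\s*(SELECT|SHOW)\b/i', $sql) === 1;
    if ($isRead && count($replicas) > 0) {
        return $replicas[array_rand($replicas)];
    }
    // Writes, and reads when no replica is configured, go to the primary.
    return $primary;
}

$primary  = "db-primary.site.com";
$replicas = ["db1.site.com", "db2.site.com", "db3.site.com"];

echo pickDatabaseHost("SELECT * FROM users", $primary, $replicas), "\n";
echo pickDatabaseHost("UPDATE users SET name = 'x'", $primary, $replicas), "\n";
```

    A heuristic this simple ignores replication lag, transactions, and locking reads such as SELECT ... FOR UPDATE, which would all need to stay on the primary.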

  24. Disk Storage
    • Try to avoid local disk storage of anything
    • Put PHP sessions in a memory cache
    • Upload files directly to S3
    • Flysystem is your friend, especially for development
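    As one possible way to move PHP sessions off local disk, the fragment below points the session handler at Redis. It assumes the phpredis extension is installed (it registers a "redis" save handler); the hostname is a placeholder, and other memory caches would use different settings.

```php
<?php
// Assumes the phpredis extension, which registers a "redis" session save handler.
// The hostname is a placeholder for your own cache endpoint.
ini_set('session.save_handler', 'redis');
ini_set('session.save_path', 'tcp://cache.internal.example:6379');
// Later calls to session_start() then read and write session data in Redis
// instead of files on the local disk.
```

    The same two settings can live in php.ini instead, which keeps them out of application code.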

  25. Cache
    • How important is your cache?
    • Does your app work if the cache disappears?
    • Make sure it’s not the source of truth
    • Sharding vs Replication for scale

  26. The Basics

  27. Getting Started

  28. Now IP addresses are broken
    $clientIp = $_SERVER['REMOTE_ADDR'];
    // e.g. 1.2.3.4 - this is now the address of the load balancer, not the client

  29. Let's fix IPs
    $clientIp = 0;
    if (isset($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $possibleIp = $_SERVER['HTTP_X_FORWARDED_FOR'];
        if (strpos($possibleIp, ',') !== false) {
            // Several proxies: the header is a comma-separated list,
            // so take the first entry that is a valid IP
            $ipList = explode(',', $possibleIp);
            foreach ($ipList as $ip) {
                $ip = trim($ip);
                if (filter_var($ip, FILTER_VALIDATE_IP)) {
                    $clientIp = $ip;
                    break;
                }
            }
        } else {
            $clientIp = $possibleIp;
        }
    } else {
        // No proxy header: fall back to the direct connection address
        $clientIp = $_SERVER['REMOTE_ADDR'] ?? null;
    }

  30. Now let's fix the databases

  31. App Considerations
    // Hostnames stand in for full PDO DSNs (plus credentials)
    $dbs = ["db1.site.com", "db2.site.com", "db3.site.com"];
    // Pick one read server at random
    $slaveNum = mt_rand(0, count($dbs) - 1);
    $pdo = new \PDO($dbs[$slaveNum]);

  32. Better Version
    $dbs = ["db1.site.com", "db2.site.com", "db3.site.com"];
    $slaveNum = mt_rand(0, count($dbs) - 1);
    $pdo = null;
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        // First choice is down: try each of the other servers in turn
        for ($i = 0; $i < count($dbs); $i++) {
            if ($i != $slaveNum) {
                try {
                    $pdo = new \PDO($dbs[$i]);
                    break;
                } catch (\PDOException $e) {
                    // This one is down too, keep going
                }
            }
        }
        if (!$pdo) {
            throw new \RuntimeException("all out of DBs");
        }
    }

  33. Now we need more load balancers

  34. Let's not forget caching!

  35. App Considerations
    require_once __DIR__ . '/vendor/autoload.php';
    $client = new \Predis\Client();
    $value = $client->get("myvalue");
    if (!$value) {
        $value = reallyExpensiveFunction();
        $client->set("myvalue", $value);
    }

  36. Better Version
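    class Cache
    {
        private $redis;

        public function __construct(\Predis\Client $redis)
        {
            $this->redis = $redis;
        }

        public function get($key)
        {
            try {
                return $this->redis->get($key);
            } catch (\Exception $e) {
                // Cache is unavailable: treat it as a miss
                return null;
            }
        }

        public function set($key, $value)
        {
            try {
                return $this->redis->set($key, $value);
            } catch (\Exception $e) {
                // Cache is unavailable: skip the write
            }
        }
    }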
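    To see why a wrapper like this matters, here is a minimal sketch of the earlier get-or-compute flow running against a cache server that is down. `UnreachableRedis` and the body of `reallyExpensiveFunction()` are hypothetical stand-ins, and the wrapper is repeated in simplified form so the example runs on its own.

```php
<?php
// Stand-in for a Redis client whose server is unreachable (hypothetical class).
class UnreachableRedis
{
    public function get($key) { throw new RuntimeException("connection refused"); }
    public function set($key, $value) { throw new RuntimeException("connection refused"); }
}

// Simplified copy of the wrapper idea: cache failures are swallowed.
class Cache
{
    private $redis;
    public function __construct($redis) { $this->redis = $redis; }
    public function get($key)
    {
        try { return $this->redis->get($key); } catch (Exception $e) { return null; }
    }
    public function set($key, $value)
    {
        try { return $this->redis->set($key, $value); } catch (Exception $e) { return null; }
    }
}

function reallyExpensiveFunction() { return 42; } // placeholder for the real computation

$cache = new Cache(new UnreachableRedis());
$value = $cache->get("myvalue");       // failure is reported as a miss
if ($value === null) {
    $value = reallyExpensiveFunction();
    $cache->set("myvalue", $value);    // failure is ignored
}
echo $value, "\n";
```

    The request is slower because the value is recomputed every time, but it still succeeds, which is the property you want when the cache is not the source of truth.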

  37. Circuit Breaker Pattern
    • Consider a circuit breaker pattern for other external services
    • https://github.com/offers/rho
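    For readers new to the pattern, a minimal circuit breaker can be sketched as follows. This is not the API of the library linked on the slide; every name here is hypothetical, and the optional `$now` argument exists only so the timing can be exercised without waiting.

```php
<?php
// Minimal circuit breaker sketch. All names are hypothetical.
class CircuitBreaker
{
    private $failureThreshold;
    private $retryAfterSeconds;
    private $failures = 0;
    private $openedAt = null;

    public function __construct(int $failureThreshold = 3, int $retryAfterSeconds = 30)
    {
        $this->failureThreshold = $failureThreshold;
        $this->retryAfterSeconds = $retryAfterSeconds;
    }

    // Runs $operation unless the circuit is open; returns $fallback on failure.
    public function call(callable $operation, $fallback = null, ?int $now = null)
    {
        $now = $now ?? time();
        if ($this->openedAt !== null && $now - $this->openedAt < $this->retryAfterSeconds) {
            return $fallback; // open: fail fast without touching the service
        }
        // Closed, or the retry window has elapsed and one trial call is allowed.
        try {
            $result = $operation();
            $this->failures = 0;
            $this->openedAt = null; // success closes the circuit
            return $result;
        } catch (Exception $e) {
            $this->failures++;
            if ($this->failures >= $this->failureThreshold) {
                $this->openedAt = $now; // trip (or re-trip) the breaker
            }
            return $fallback;
        }
    }

    public function isOpen(): bool
    {
        return $this->openedAt !== null;
    }
}

$breaker = new CircuitBreaker(2, 30);
$failing = function () { throw new RuntimeException("service down"); };
$breaker->call($failing, "fallback", 1000); // first failure
$breaker->call($failing, "fallback", 1001); // second failure trips the breaker
echo $breaker->isOpen() ? "open\n" : "closed\n";
```

    State here lives in one PHP process. Under a typical per-request PHP setup, the counters would need to be kept in shared storage for the breaker to have any effect across requests.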

  38. Service Discovery Overview
    • Distributed data stores that are a registry of what servers are where
    • Your code connects to these instead of using a config file
    • Even if you had to update it manually, it’d be faster than deploying

  39. Service Discovery
    • Etcd
    • Consul
    • Zookeeper
    • Oh by the way, these all need their own cluster of at least 3 servers

  40. Quick Service Discovery Example
    $etcd = new LinkOrb\Component\Etcd\Client($etcdClusterHostname);
    $dbs = $etcd->get("/database/slaves");
    $slaveNum = mt_rand(0, count($dbs) - 1);
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        //...
    }

  41. The problem with service discovery
    • Latency
    • Each lookup takes approximately 10ms
    • If you have to look up DB, Cache, ElasticSearch, SMTP, etc., it adds up
    • Try to organize services by logical application, so you can query for a whole namespace at once

  42. Updated Discovery Example
    $etcd = new LinkOrb\Component\Etcd\Client($etcdClusterHostname);
    // One lookup fetches the whole namespace for this app
    $config = $etcd->get("/services/myapp");
    $dbs = $config["databases"]["slaves"];
    $slaveNum = mt_rand(0, count($dbs) - 1);
    try {
        $pdo = new \PDO($dbs[$slaveNum]);
    } catch (\PDOException $e) {
        //...
    }

  43. Questions?