
Containers and Microservices make performance worse

A snarky but mainly light-hearted look at how introducing microservices and containers into your infrastructure might just make it less performant if you don't think about the new challenges.

Presented at the London Web Performance meetup

Gareth Rushgrove

August 04, 2015

Transcript

  1. Containers and Microservices
    Make everything performance worse
    Gareth Rushgrove
    Puppet Labs

  2. Gareth Rushgrove
    @garethr

  5. Quiz Time

  6. Apologies to the following.
    I love you really.

  7. This Talk

  9. snark /sna:k/ noun
    make snide and sharply critical comments

  10. Fast in-process communication
    vs slow and unreliable HTTP

  11. The performance cost of an
    overlay network

  12. What happened to ps and
    friends?

  13. Microservices and Performance
    It will get bad before it gets good

  14. Monoliths are pretty easy to
    understand at a high level

  15. Monolith

  16. Even adding in the supporting
    infrastructure it’s not so bad

  17. Diagram: Monolith, Database, Load Balancer, Network (x2)

  18. I can optimise the network, the
    load balancer, the database, the
    application or the client. Easy

  19. What about Microservices?
    Well, first start with a diagram
    like this…

  20. Diagram: Microservice (x8)

  21. Whatever protocol you’re using,
    let’s assume you’re communicating
    over the network…

  22. Diagram: Microservice (x8), Network (x8)

  23. And you probably want more than
    one instance of each service…

  24. Diagram: Microservice, Load Balancer, Network

  25. Diagram: Microservice (x8), Network (x8),
    Load Balancer + Network (x6)

  26. And you probably have a few
    different databases now too…

  27. Diagram: Microservice (x8), Network (x8),
    Load Balancer + Network (x6),
    Database + Network (x2)

  28. The Bad
    In my made-up 8 service
    architecture we went from 5 things
    to optimise up to 32
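
The jump from 5 to 32 can be checked by counting the boxes in the two diagrams. A minimal sketch in Python: the dictionary keys are descriptive labels chosen here for illustration, only the counts come from the slides.

```python
# Component counts read off the monolith and microservice diagrams.
# The key names are illustrative labels, not terms from the talk.
monolith = {
    "application": 1,
    "database": 1,
    "load balancer": 1,
    "network": 2,
}

microservices = {
    "microservice": 8,
    "service network": 8,
    "load balancer": 6,
    "load balancer network": 6,
    "database": 2,
    "database network": 2,
}


def things_to_optimise(components):
    """Total number of separately tunable pieces in an architecture."""
    return sum(components.values())


print(things_to_optimise(monolith))       # 5
print(things_to_optimise(microservices))  # 32
```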

  29. The Bad
    We went from 3 network hops
    to, er, more depending on
    the request

  30. The Bad
    We ignored the cost of
    serialisation/deserialisation
    (JSON can be expensive)
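
One rough way to see the serialisation cost is to time the same function called directly and called through a JSON encode/decode round trip. This is only a sketch: the `handle` function and the order payload are made up for illustration, no network is involved, and the numbers will vary by machine and payload.

```python
import json
import timeit


def handle(order):
    """Stand-in for the work a service does: total up an order."""
    return sum(item["price"] * item["qty"] for item in order["items"])


def call_in_process(order):
    # The monolith case: a plain function call.
    return handle(order)


def call_via_json(order):
    # What a remote call adds before any network time: encode on the
    # caller, decode on the callee, then the same again for the reply.
    wire_request = json.dumps(order)
    result = handle(json.loads(wire_request))
    wire_reply = json.dumps({"total": result})
    return json.loads(wire_reply)["total"]


# Hypothetical payload: 1000 line items.
order = {"items": [{"sku": i, "price": 2, "qty": 3} for i in range(1000)]}

if __name__ == "__main__":
    direct = timeit.timeit(lambda: call_in_process(order), number=200)
    encoded = timeit.timeit(lambda: call_via_json(order), number=200)
    print(f"in-process: {direct:.4f}s  with JSON round trip: {encoded:.4f}s")
```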

  31. The Bad
    The operational overhead
    just jumped considerably

  32. The Bad
    Lots more network traffic. Watch
    out for latency in particular
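
Two bits of arithmetic show why latency deserves the attention. Hops made one after another add up, and the more calls a request makes, the more likely at least one of them lands in the slow tail. The sketch below assumes calls are independent and uses a made-up 1% slow fraction, so treat the output as an illustration rather than a measurement.

```python
def sequential_latency_ms(hop_latencies_ms):
    """Latency of a request whose hops happen one after another."""
    return sum(hop_latencies_ms)


def chance_of_slow_request(calls, slow_fraction=0.01):
    """Probability that at least one of `calls` calls is slow.

    Assumes each call is independently slow with probability
    `slow_fraction` (an assumption, real systems are often correlated).
    """
    return 1 - (1 - slow_fraction) ** calls


if __name__ == "__main__":
    print(sequential_latency_ms([2, 3, 5]))
    for calls in (1, 10, 32):
        print(calls, f"{chance_of_slow_request(calls):.1%}")
```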

  33. The Bad
    Without request tracing
    you’re doomed
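
At its simplest, request tracing means giving each incoming request an ID and passing it along on every downstream call so that log lines can be joined up later. A minimal sketch, assuming an `X-Request-ID` header; the header name, function names and log format are all choices made here, not something from the talk.

```python
import uuid

TRACE_HEADER = "X-Request-ID"  # assumed header name, pick whatever your stack uses


def ensure_trace_id(incoming_headers):
    """Reuse the caller's ID if present, otherwise start a new trace."""
    return incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex


def outgoing_headers(trace_id, extra=None):
    """Headers for a downstream call, carrying the trace ID along."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers


def log(service, trace_id, message):
    """Emit a log line tagged with the trace ID and return it."""
    line = f"trace={trace_id} service={service} {message}"
    print(line)
    return line


if __name__ == "__main__":
    # The edge service sees no ID and starts a trace.
    trace_id = ensure_trace_id({})
    log("frontend", trace_id, "received GET /")
    # A downstream service picks the same ID out of the forwarded headers.
    forwarded = outgoing_headers(trace_id, {"Accept": "*/*"})
    log("orders", ensure_trace_id(forwarded), "looking up orders")
```

Searching the combined logs for one `trace=` value then shows every service a single request touched.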

  34. The Good
    Granular services are easier
    to optimise individually

  35. The Good
    Individual services can be
    scaled independently

  36. Debugging with Containers
    Is that process inside or outside a container?

  37. Problems with free and top

  38. $ free
                 total       used       free     shared    buffers     cached
    Mem:       1024444     864140     160304       5024      50008     637736
    -/+ buffers/cache:     176396     848048
    Swap:       473084         16     473068
    $ docker exec test-container free
                 total       used       free     shared    buffers     cached
    Mem:       1024444     866440     158004       5024      50000     637732
    -/+ buffers/cache:     178708     845736
    Swap:       473084         16     473068
    Can a container use that memory?

  39. Memory stats come from the proc
    filesystem: /proc/meminfo,
    /proc/vmstat, etc.

  40. /proc/meminfo and
    /proc/vmstat are not
    aware of cgroups
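
Since /proc/meminfo reports host-wide numbers, the limit that actually applies has to be read from the cgroup filesystem instead. A minimal sketch, assuming the cgroup filesystem is mounted at /sys/fs/cgroup and that the process can see its own memory cgroup there, as is typical inside a Docker container. The function name and the `root` parameter are illustrative.

```python
from pathlib import Path

# Limit files relative to the cgroup mount point.
CGROUP_V2_LIMIT = "memory.max"
CGROUP_V1_LIMIT = "memory/memory.limit_in_bytes"


def cgroup_memory_limit(root="/sys/fs/cgroup"):
    """Return the memory limit in bytes, or None if unlimited or not found."""
    for relative in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        path = Path(root) / relative
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the other layout
        if raw == "max":  # cgroup v2 spelling of "no limit"
            return None
        return int(raw)
    return None


if __name__ == "__main__":
    limit = cgroup_memory_limit()
    print("no cgroup memory limit found" if limit is None else f"{limit} bytes")
```

On cgroup v1 an unlimited group shows up as a very large number rather than a keyword, so callers may want to compare the result against the host's physical memory.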

  41. Problems with ps

  42. $ ps aux
    USER       PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
    ...
    999       1807  0.2 11.4 867624 464572 ?        Ssl  09:38   0:21 mysqld
    Is this process in a container?

  43. $ ps -eo ucmd,cgroup
    COMMAND         CGROUP
    ...
    mysqld          9:perf_event:/docker/61e76d2c39121282474ff895b9b3ba2addd775cdea6d2ba89ce76c28
    Which container is that?
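
The cgroup column does at least contain the container ID, so a small script can pull it out of /proc/<pid>/cgroup. A sketch that assumes the `/docker/<hex id>` path layout seen in the ps output above; other runtimes and cgroup drivers lay their paths out differently, and the function names are made up for illustration.

```python
import re

# Matches the "/docker/<hex id>" layout only. This is an assumption about
# how the cgroup path is laid out, not a general rule.
DOCKER_CGROUP = re.compile(r"/docker/([0-9a-f]{12,64})")


def container_id_from_cgroup(cgroup_text):
    """Return the Docker container ID found in cgroup file text, or None."""
    for line in cgroup_text.splitlines():
        match = DOCKER_CGROUP.search(line)
        if match:
            return match.group(1)
    return None


def container_id_for_pid(pid):
    """Look up the container ID for a running process, or None."""
    try:
        with open(f"/proc/{pid}/cgroup") as handle:
            return container_id_from_cgroup(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    import os
    print(container_id_for_pid(os.getpid()))
```

The first 12 characters of the ID are what `docker ps` prints in its CONTAINER ID column, which is one way to get from the ID back to a name.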

  44. Sysdig

  45. Sysdig provides a kernel module, which
    hooks into cgroups and
    namespaces

  46. $ sudo sysdig -c topcontainers_cpu
    CPU%      container.name
    -----------------------------------------------------------------------
    90.13%    mysql
    15.93%    wordpress1
    7.27%     haproxy
    3.46%     wordpress2
    CPU usage across containers

  47. $ sudo sysdig -pc -c topprocs_cpu container.name=client
    CPU%      Process   container.name
    ----------------------------------------------
    02.69%    bash      client
    31.04%    curl      client
    0.74%     sleep     client
    CPU usage in a single container

  48. $ sudo sysdig -pc -c topprocs_net
    Bytes     Process    Host_pid  Container_pid  container.name
    ---------------------------------------------------------------
    72.06KB   haproxy    7385      13             haproxy
    56.96KB   docker.io  1775      7039           host
    44.45KB   mysqld     6995      91             mysql
    44.45KB   mysqld     6995      99             mysql
    29.36KB   apache2    7893      124            wordpress1
    29.36KB   apache2    26895     126            wordpress4
    29.36KB   apache2    26622     131            wordpress2
    29.36KB   apache2    27935     132            wordpress3
    29.36KB   apache2    27306     125            wordpress4
    22.23KB   mysqld     6995      90             mysqlclient
    Network bandwidth

  49. $ sudo sysdig -pc -A -c echo_fds "fd.ip=172.17.0.3 and fd.ip=172.17.0.7"
    ------ Write 103B to [haproxy] [d468ee81543a] 172.17.0.7:37557->172.17.0.3:80 (hapr
    GET / HTTP/1.1
    User-Agent: curl/7.35.0
    Host: 172.17.0.7
    Accept: */*
    X-Forwarded-For: 172.17.0.8
    ------ Read 103B from [wordpress1] [12b8c6a04031] 172.17.0.7:37557->172.17.0.3:80 (
    GET / HTTP/1.1
    User-Agent: curl/7.35.0
    Host: 172.17.0.7
    Accept: */*
    X-Forwarded-For: 172.17.0.8
    ------ Write 346B to [wordpress1] [12b8c6a04031] 172.17.0.7:37557->172.17.0.3:80 (a
    HTTP/1.1 302 Found
    Date: Sat, 21 Feb 2015 22:19:18 GMT
    Traffic between containers

  50. The Bad
    Don’t expect existing debugging
    tools to work

  51. The Good
    New tools are emerging. Often
    with better interfaces

  52. Container Overhead
    Count the performance penalties

  54. The Good
    Containers add very little
    overhead

  57. The Bad
    Memory cgroups can be
    expensive

  58. By default, the memory subsystem
    uses 40 bytes of memory per
    physical page on x86_64 systems.
    These resources are consumed
    even if memory is not used in any
    hierarchy
    https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html
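
To put the 40 bytes per page in context, the sketch below multiplies it out for a few RAM sizes. It assumes standard 4 KiB pages, which the quote does not state, and the RAM sizes are arbitrary examples.

```python
BYTES_PER_PAGE_OVERHEAD = 40  # figure from the Red Hat guide quoted above
PAGE_SIZE = 4096              # assumption: standard 4 KiB x86_64 pages

GiB = 1024 ** 3
MiB = 1024 ** 2


def memory_cgroup_overhead_bytes(ram_bytes, page_size=PAGE_SIZE):
    """Bookkeeping memory used by the memory subsystem for `ram_bytes` of RAM."""
    pages = ram_bytes // page_size
    return pages * BYTES_PER_PAGE_OVERHEAD


for ram_gib in (1, 16, 64):
    overhead = memory_cgroup_overhead_bytes(ram_gib * GiB)
    print(f"{ram_gib} GiB RAM -> {overhead / MiB:.0f} MiB of bookkeeping")
```

Under these assumptions the overhead is a little under 1% of physical memory.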

  59. The Bad
    Container networking is hard.
    Overlay networks make it easy.
    But slow.

  60. http://www.generictestdomain.net/docker/weave/networking/stupidity/2015/04/05/weave-is-kinda-slow/

  61. https://arjanschaaf.github.io/is-the-network-the-limit/

  62. The Good

  63. $ qperf between two EC2 c3.8xlarge with 10Gb/s
    tcp_bw:
        bw = 1.2 GB/sec
    udp_lat:
        latency = 48.1 us
    $ qperf over weave network using ODP/VXLAN
    tcp_bw:
        bw = 1.09 GB/sec
    udp_lat:
        latency = 61.9 us
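
Expressed as relative change, the two qperf runs work out as follows. The helper name is made up here; the four numbers are the ones on the slide.

```python
def percent_change(baseline, measured):
    """Relative change from baseline to measured, in percent."""
    return (measured - baseline) / baseline * 100


bandwidth = percent_change(1.2, 1.09)  # GB/sec, host vs overlay
latency = percent_change(48.1, 61.9)   # microseconds, host vs overlay

print(f"bandwidth: {bandwidth:+.1f}%  latency: {latency:+.1f}%")
# bandwidth: -9.2%  latency: +28.7%
```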

  64. The Bad
    Linux containers don’t
    really contain

  66. The Bad
    To get strict isolation guarantees
    you’re going to wrap them in
    virtual machines anyway

  67. The Good
    Projects like Clear Linux from
    Intel are innovating in this space

  68. The Good
    User namespaces (host/container
    separation) and seccomp (limit
    syscalls) are coming to Docker

  69. Conclusions
    Snark free zone

  70. Containers and microservices
    pose new performance challenges

  71. Most problems aren’t
    performance problems

  72. If you’re not already a networking
    expert, start learning now

  73. Lots of opportunities for
    new tooling to improve things

  74. Questions?
    And thanks for listening