production: an owner's manual

from exec(ut) 2018

Igor Wiedler

April 23, 2018

Transcript

  1. production:
    an owner's manual

  2. hello!

  3. broken
    computers

  5. getting sidetracked now
    so sorry*
    * not sorry

  9. back to serious business

  10. !

  12. a production system is a
    system that serves real users

  13. the goal of operations is to
    ensure services are reliable

  14. in order to provide a
    good user experience

  16. failure

  17. app

  18. app
    linux kernel
    cpu, dram, disk, network
    power supply, switches, load balancer, dns
    submarine cables, routers, fiber

  19. app
    linux kernel
    the cloud

  21. • cosmic rays
    • disk failure
    • power outages
    • software bugs
    • ...

  22. entropy

  24. capacity

  28. cascading failure

  30. system
    design

  31. redundancy

  33. scale

  36. p1
    m3
    c1
    m2 m1
    p2 c2

  37. data storage

  40. protocols

  42. monitoring

  43. many components
    many req/s

  45. measure all the things?

  46. ✅ ⏱

  47. golden signals
    • latency
    • traffic
    • errors
    • saturation
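
The slide only names the four signals, so the following is a minimal sketch of how they might be computed from one window of request records. The `Request` record, the `golden_signals` function, and the exact definitions used (nearest-rank percentiles, requests per second, failed fraction, in-flight work over capacity) are illustrative assumptions, not something taken from the talk.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    """Hypothetical record of one served request."""
    duration_ms: float  # how long the request took
    ok: bool            # False if the request ended in an error


def percentile(sorted_values: List[float], p: float) -> Optional[float]:
    """Nearest-rank percentile of an already sorted list (None if empty)."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_signals(
    requests: List[Request],
    window_s: float,
    in_flight: int,
    capacity: int,
) -> Dict[str, Optional[float]]:
    """Summarise one observation window as the four golden signals.

    Assumed definitions:
      latency    -> p50 and p99 of request duration
      traffic    -> requests per second over the window
      errors     -> fraction of requests that failed
      saturation -> in-flight requests divided by a known capacity
    """
    durations = sorted(r.duration_ms for r in requests)
    failed = sum(1 for r in requests if not r.ok)
    return {
        "latency_p50_ms": percentile(durations, 50),
        "latency_p99_ms": percentile(durations, 99),
        "traffic_rps": len(requests) / window_s,
        "error_rate": failed / len(requests) if requests else 0.0,
        "saturation": in_flight / capacity,
    }
```

Percentiles are used here instead of a mean because a handful of slow requests can hide behind a healthy-looking average.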

  51. golden signals
    • latency
    • traffic
    • errors
    • saturation
    0 - 50    [1620]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (74.55%)
    50 - 100  [ 447]: ∎∎∎∎∎∎∎∎∎∎ (20.57%)
    100 - 150 [  49]: ∎ (2.25%)
    150 - 200 [  15]: (0.69%)
    200 - 250 [  15]: (0.69%)
    250 - 300 [  10]: (0.46%)
    300 - 350 [   6]: (0.28%)
    350 - 400 [   1]: (0.05%)
    400 - 450 [   0]: (0.00%)
    450 - 500 [   4]: (0.18%)
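
A histogram like the one on this slide can be produced with a few lines of code. The sketch below is an assumption about how such output might be generated, not the tool used in the talk: `render_histogram`, its parameters, and the bar scale of half a character per percentage point are all hypothetical. Percentages are taken over every sample, including any that fall beyond the last bucket.

```python
from typing import List


def render_histogram(
    samples: List[float],
    bucket_width: int = 50,
    max_value: int = 500,
    chars_per_percent: float = 0.5,  # assumed bar scale
) -> List[str]:
    """Render fixed-width buckets as text lines with counts and percentages."""
    n_buckets = max_value // bucket_width
    counts = [0] * n_buckets
    for value in samples:
        idx = int(value // bucket_width)
        # Samples outside [0, max_value) get no bucket of their own,
        # but they still count towards the total below.
        if 0 <= idx < n_buckets:
            counts[idx] += 1

    total = len(samples)
    lines = []
    for i, count in enumerate(counts):
        lo = i * bucket_width
        hi = lo + bucket_width
        pct = 100.0 * count / total if total else 0.0
        bar = "∎" * int(pct * chars_per_percent)
        lines.append(f"{lo} - {hi} [{count:>4}]: {bar} ({pct:.2f}%)")
    return lines


if __name__ == "__main__":
    # Made-up sample values, purely for demonstration.
    for line in render_histogram([12, 30, 48, 75, 90, 130, 480, 900]):
        print(line)
```

Seeing the whole distribution makes the long tail visible: most samples sit in the first bucket, while a few land far to the right.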

  53. saturation traffic latency errors

  55. humans

  57. oops, deleted the
    database

  58. bad human!

  59. why does this button even
    exist?

  60. app
    linux kernel
    cpu, dram, disk, network
    power supply, switches, load balancer, dns
    submarine cables, routers, fiber

  61. app
    linux kernel
    cpu, dram, disk, network
    power supply, switches, load balancer, dns
    submarine cables, routers, fiber
    humans

  62. app
    linux kernel
    cpu, dram, disk, network
    power supply, switches, load balancer, dns
    submarine cables, routers, fiber
    humans
    humans (set vertically beside the stack)

  63. epic failure is almost
    always systemic

  64. failure

  65. recap

  66. • a production system serves real users
    • users like things that work and are fast
    • epic failure is almost always systemic

  67. thx
    @igorwhilefalse
