Real World Ruby Performance

Aaron Quint
November 19, 2014

My talk from RubyConf 2014 about Ruby Performance and the philosophy of performance.

Transcript

  1. Real World
    Ruby Performance
    Aaron Quint / @aq / RubyConf 2014

  2. SHOUTOUT
    @tmm1 @SamSaffron @_ko1

  3. SKIPPING THE INTRO
    We’ll come back to who I am later.
    It’s [relatively] unimportant.

  4. This TALK was HARD TO WRITE
    I’ve learned so much over the past 5 years,
    what could I share?

  5. TIPS and tricks are the
    CLIFF NOTES of tech learning
    It’s a ⌘+C ⌘+V culture.

  6. As a mentor I want to teach
    philosophy, not snippets
    How to THINK about a problem is much
    more interesting than how to solve it.

  7. Today,
    take away the process
    The tools and tricks will change over time.

  8. Ruby Performance
    as therapy
    A multi-step process.

  9. Relax, Open up
    We’re going to go deep
    It’s a multi-step process

  10. Step 1:
    Acceptance

  11. It’s your Fault.

  12. Really?

  13. Yes.

  14. (image only)

  15. It’s not you,
    It’s me.

  16. It’s not you,
    It’s me.

  17. It’s not you,
    It’s me.
    — George Costanza
    (Inventor of “It’s not you, it’s me”)

  18. Performance is
    about context

  19. “X Doesn’t SCALE” IS BS
    Doesn’t scale for what? To what degree?
    With what hardware? …

  20. So when we talk about
    our Ruby being slow

  21. (image only)

  22. Rails

  23. Rails: 10ms

  24. Rails: 10ms
    Your application

  25. Rails: 10ms
    Your application
    DB

  26. Rails: 10ms
    Your application
    DB: 20ms

  27. Rails: 10ms
    Your application
    DB: 20ms
    Cache

  28. Rails: 10ms
    Your application
    DB: 20ms
    Cache: 10ms

  29. Rails: 10ms
    Your application: 250ms
    DB: 20ms
    Cache: 10ms

  30. IT’s MY FAULT.
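
Reading the final build as Rails 10ms, DB 20ms and cache 10ms, with the 250ms belonging to the application (the only layer left without a number, so that mapping is an inference), the arithmetic behind "it's my fault" is short. A small Ruby sketch of it; the timings come from the slides, while `LAYER_MS` and `layer_shares` are made-up names:

```ruby
# Per-layer time for one request, read off the slides (assigning the
# 250ms to the application is the inferred mapping described above).
LAYER_MS = {
  "Rails" => 10,
  "Your application" => 250,
  "DB" => 20,
  "Cache" => 10
}.freeze

# Percentage of the total request time spent in each layer.
def layer_shares(layers)
  total = layers.values.sum.to_f
  layers.transform_values { |ms| (ms / total * 100).round(1) }
end

puts "total: #{LAYER_MS.values.sum}ms"
layer_shares(LAYER_MS).sort_by { |_, pct| -pct }.each do |layer, pct|
  puts format("%-17s %5.1f%%", layer, pct)
end
```

Under that reading the application accounts for about 86% of a 290ms request, and Rails itself for under 4%.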

  31. Step 2:
    Diagnosis

  32. Where did I go wrong?

  33. METRICS!
    Measurement!
    MMMNUMBERS!
    Milliseconds MATTER!

  34. Tools abound!
    Use the right one for the job.
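
Getting numbers starts with a clock you can trust. A minimal sketch of timing a block in milliseconds with Ruby's monotonic clock, which is not affected by NTP or manual changes to the system time; `measure_ms` is a hypothetical helper, not part of any tool named in the talk:

```ruby
# Hypothetical helper: run the block and return [result, elapsed_ms].
def measure_ms
  clock = Process::CLOCK_MONOTONIC
  start = Process.clock_gettime(clock, :float_millisecond)
  result = yield
  [result, Process.clock_gettime(clock, :float_millisecond) - start]
end

squares, ms = measure_ms { (1..200_000).map { |i| i * i } }
puts format("built %d squares in %.2fms", squares.size, ms)
```

In production a number like this would be shipped to a metrics backend (the stack profiled later in the talk uses statsd) rather than printed.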

  35. Step 3:
    Treatment

  36. What are the steps to
    fix this problem?

  37. Playing golf.
    How many strokes for the lowest #?

  38. Two angles of
    optimization

  39. Proxies/Balancers
    Application
    Datastores
    Filesystem/OS/Hardware
    Individual Request Path (Controller#action)

  40. Vertical:
    Fix individual elements
    aka, speeding up a single query, controller
    action, or code path

  41. Horizontal:
    Address hardware or
    software across a cluster
    aka, adding more workers per node,
    buying better hardware

  42. Important Themes:

  43. Context is crucial to
    acceptance

  44. Visibility and introspectability
    are crucial to diagnosis

  45. Knowing your tools is
    crucial to treatment

  46. I’m Aaron Quint.
    I’m the Chief Scientist
    at Paperless Post.

  47. (image only)

  48. Features vs. speed
    Opposing forces.

  49. We realized that being
    fast meant being stable

  50. CASE STUDIES in
    performance therapy

  51. (image only)

  52. Case 1:
    JSON FOR DAYS

  53. (image only)

  54. (image only)

  55. (image only)

  56. package:7292:1123434234234

  57. package:7290:11234342343424
    partner:8:11234342343424
    partner:8:11234342343424

  58. package:7290:11234342343424
    partner:8:11234342343424
    partner:8:11234342343424
    package:7292:1123434234234

  59. package:7290:11234342343424
    partner:8:11234342343424
    partner:8:11234342343424
    package:7292:1123434234234
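
The keys above look like `name:id:number`. Assuming (the slides don't say) that the trailing number is a last-modified timestamp, this is key-based cache expiration: changing a record changes its key, so nothing is ever deleted explicitly and memcached eventually evicts the stale entries. A minimal sketch with a Hash standing in for the cache; `Record`, `STORE`, `cache_key_for` and `cache_fetch` are hypothetical names:

```ruby
# Hypothetical stand-in for a model row: kind, id, last-modified time.
Record = Struct.new(:kind, :id, :updated_at)

# A plain Hash standing in for memcached.
STORE = {}

# Key-based expiration (assumed scheme): the key embeds updated_at, so any
# update yields a new key and the old entry is simply never read again.
def cache_key_for(record)
  "#{record.kind}:#{record.id}:#{record.updated_at.to_i}"
end

# Return the cached value for key, computing and storing it on a miss.
def cache_fetch(key)
  STORE.fetch(key) { STORE[key] = yield }
end

renders = 0
package = Record.new("package", 7292, Time.at(1_400_000_000))
render_package = lambda do
  cache_fetch(cache_key_for(package)) do
    renders += 1
    %({"package":#{package.id}})
  end
end

render_package.call                         # miss: builds the JSON
render_package.call                         # hit: served from STORE
package.updated_at = Time.at(1_400_000_060) # the record changes...
render_package.call                         # ...so the key changes and it rebuilds
puts "renders: #{renders}"                  # prints "renders: 2"
```

The catch, and the point of the next slide, is that every miss still pays the full uncached cost.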

  60. Uncached performance
    is still a problem

  61. ppprofiler
    to the rescue

  62. ppprofiler

  63. ppprofiler
    • Auto-cache toggling
    • Benchmark
    • rblineprof
    • AS::Notifications counts (SQL/Cache, etc.)
    • MemoryProfiler (NEW!)
    • Gist-able (Markdown) output
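
The deck doesn't show ppprofiler's code. As a rough, hypothetical illustration of three of the listed features (cache toggling, Benchmark timing, Markdown output you can paste into a gist), here is a standard-library-only sketch; `ToggleCache`, `profile_with_cache_toggle` and `render_package_json` are made-up names, and the rblineprof, notification-count and MemoryProfiler parts are left out:

```ruby
require "benchmark"

# Hypothetical stand-in for Rails.cache: a Hash with an on/off switch.
class ToggleCache
  attr_accessor :enabled

  def initialize
    @store = {}
    @enabled = true
  end

  # On: return the cached value, computing and storing it on a miss.
  # Off: compute every time, as if there were no cache at all.
  def fetch(key)
    return yield unless enabled

    @store.fetch(key) { @store[key] = yield }
  end

  def clear
    @store.clear
  end
end

# Run the block `iterations` times with the cache off, then on, and return a
# Markdown table of the mean wall-clock time per iteration.
def profile_with_cache_toggle(cache, iterations: 100, &block)
  rows = [false, true].map do |enabled|
    cache.clear
    cache.enabled = enabled
    seconds = Benchmark.realtime { iterations.times(&block) }
    format("| %-8s | %10.3f |", enabled ? "on" : "off", seconds * 1000.0 / iterations)
  end
  ["| cache    | ms/iter    |", "|----------|------------|", *rows].join("\n")
end

CACHE = ToggleCache.new

# Simulated expensive serialization, cached under a package-style key.
def render_package_json(id)
  CACHE.fetch("package:#{id}") { (1..20_000).map(&:to_s).join(",") }
end

puts profile_with_cache_toggle(CACHE) { render_package_json(7292) }
```

The gap between the "off" and "on" rows is how much the cache is hiding, which is exactly the uncached cost that has to come down.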

  64. (image only)

  65. (image only)

  66. (image only)

  67. Rinse and Repeat
    Make the slowest lines
    faster

  68. (image only)

  69. (image only)

  70. (image only)

  71. (image only)

  72. (image only)

  73. Case 2:
    FINGER IN THE SOCKET

  74. Before Valentine’s Day we were
    looking for any wins

  75. IN BETWEEN THE LINES!
    stackprof +
    stackprof-remote

  76. (image only)

  77. Ruby Process (Unicorn)

  78. Ruby Process (Unicorn)

  79. Ruby Process (Unicorn)

  80. Ruby Process (Unicorn)
    AC::Dispatch

  81. Ruby Process (Unicorn)
    AC::Dispatch
    MyController::Create

  82. Ruby Process (Unicorn)
    AC::Dispatch
    MyController::Create
    Template::Render

  83. Ruby Process (Unicorn)
    AC::Dispatch
    MyController::Create
    Template::Render
    AR::Find

  84. Ruby Process (Unicorn)

  85. Ruby Process (Unicorn)
    StackProf.start
    rb_profile_frames()
    rb_profile_frames()
    rb_profile_frames()
    rb_profile_frames()
    StackProf.stop
    StackProf.dump
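
The slide shows the sampling idea: between StackProf.start and StackProf.stop, stackprof periodically captures the current stack with rb_profile_frames() and aggregates the samples. A toy version of the same idea in plain Ruby uses a different mechanism, a background thread reading Thread#backtrace_locations. It is not how stackprof works internally, and because the sampler thread needs the GVL it only gets to run at thread switches, so it collects far fewer samples than its interval suggests. `sample_stacks` and `busy_work` are hypothetical names:

```ruby
# Toy sampling profiler (hypothetical helper, not stackprof's implementation).
# A background thread records the target thread's stack every `interval`
# seconds while the block runs; each sample is a leaf-first list of labels.
def sample_stacks(interval: 0.001, target: Thread.current)
  samples = []
  sampler = Thread.new do
    loop do
      frames = target.backtrace_locations
      samples << frames.map(&:base_label) if frames
      sleep interval
    end
  end
  yield
  samples
ensure
  sampler&.kill
  sampler&.join
end

# CPU-bound work so there is something to sample.
def busy_work(seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  n = 0
  n += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  n
end

samples = sample_stacks { busy_work(0.5) }

# Like stackprof's SAMPLES column: how often each frame was on top of the stack.
samples.map(&:first).tally.sort_by { |_, count| -count }.first(5).each do |frame, count|
  puts "#{count}\t#{frame}"
end
```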

  86. [paperless@production-webapp10 current]$ stackprof tmp/stackprof-cpu-30715-1391204970.dump
    ==================================
    Mode: cpu(1000)
    Samples: 1761 (3.61% miss rate)
    GC: 128 (7.27%)
    ==================================
    TOTAL   (pct)  SAMPLES   (pct)  FRAME
      344 (19.5%)      342 (19.4%)  Statsd#send_to_socket
      393 (22.3%)       44  (2.5%)  Statsd#sampled
       44  (2.5%)       44  (2.5%)  block in ActiveRecord::ConnectionAdapters::PostgreSQLPoolAdapter#execute
       56  (3.2%)       29  (1.6%)  block in ActiveSupport::Notifications::Fanout#listeners_for
       29  (1.6%)       29  (1.6%)  ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#extract_pg_identifier_from_name
       26  (1.5%)       26  (1.5%)  ActiveSupport::Notifications::Fanout::Subscribers::Evented#subscribed_to?
       25  (1.4%)       25  (1.4%)  String#blank?
       25  (1.4%)       25  (1.4%)  block (2 levels) in ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#select
       24  (1.4%)       24  (1.4%)  ActiveRecord::Base.scoped_methods
       22  (1.2%)       22  (1.2%)  Dalli::Server::KSocket#kgio_wait_readable
       21  (1.2%)       21  (1.2%)  ActiveSupport::CoreExtensions::Hash::Keys#assert_valid_keys
       42  (2.4%)       20  (1.1%)  block in Dalli::Server::KSocket#readfull
       28  (1.6%)       19  (1.1%)  ActiveRecord::ConnectionAdapters::ConnectionHandler#retrieve_connection_pool
       18  (1.0%)       18  (1.0%)  #.instrumenter
       17  (1.0%)       16  (0.9%)  Dalli::Server#deserialize
       15  (0.9%)       15  (0.9%)  block (2 levels) in ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#select_raw
       14  (0.8%)       14  (0.8%)  #.decode_www_form_component
       13  (0.7%)       13  (0.7%)  Dalli::Server#write
       15  (0.9%)       11  (0.6%)  ActiveSupport::CoreExtensions::Time::Calculations#minus_with_coercion
       10  (0.6%)       10  (0.6%)  block in ActiveRecord::Base.with_scope
       10  (0.6%)       10  (0.6%)  block in ActiveRecord::ConnectionAdapters::QueryCache#cache_sql
       21  (1.2%)       10  (0.6%)  Yajl::Encoder.encode
       10  (0.6%)       10  (0.6%)  Set#add
       10  (0.6%)       10  (0.6%)  block (2 levels) in ActiveRecord::ConnectionAdapters::PostgreSQLAdapter#result_as_array
       10  (0.6%)       10  (0.6%)  ActiveSupport::CoreExtensions::Time::Calculations::ClassMethods#time_with_datetime_fallback
        9  (0.5%)        9  (0.5%)  ActiveRecord::DynamicFinderMatch#initialize
        9  (0.5%)        9  (0.5%)  ActiveSupport::LogSubscriber.logger
        9  (0.5%)        9  (0.5%)  block in ActionController::Base.action_methods
        9  (0.5%)        9  (0.5%)  block in ActionController::Base.action_methods
        9  (0.5%)        9  (0.5%)  block (2 levels) in ActiveRecord::Base.connection_handler=

  87. Hmm, why is
    statsd slow?

  88. Pull out good old
    benchmark

  89. $ ruby test/profile/statsd.rb
                                user     system      total        real
    udp with connect        0.010000   0.000000   0.010000 (  0.074522)
    udp without connect     0.120000   0.530000   0.650000 ( 13.096515)
    statsd with connect     0.000000   0.090000   0.090000 (  0.103520)
    statsd without connect  0.100000   0.620000   0.720000 ( 13.483539)
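
Why would connecting matter this much? One plausible reading, offered as an assumption since the slides don't spell it out: passing host and port to every UDPSocket#send makes Ruby resolve the address on each call, while connecting once resolves it up front. A hypothetical reproduction along those lines (not the original test/profile/statsd.rb); absolute timings will differ from the slide's and depend heavily on how the hostname resolves:

```ruby
require "benchmark"
require "socket"

# Hypothetical reproduction, not the original script.
# Assumes the metrics host is addressed by name rather than by IP.
HOST = "localhost"
N = 2_000
MESSAGE = "test.counter:1|c" # a statsd counter increment

# A local listener so datagrams have somewhere to go; with nothing
# listening, a connected UDP socket may see Errno::ECONNREFUSED.
receiver = UDPSocket.new
receiver.bind(HOST, 0)
PORT = receiver.addr[1]

# Unconnected: host and port are passed on every send, so Ruby resolves
# HOST again for each datagram.
def send_without_connect(count)
  socket = UDPSocket.new
  count.times { socket.send(MESSAGE, 0, HOST, PORT) }
ensure
  socket&.close
end

# Connected: the address is resolved once by #connect and reused by each send.
def send_with_connect(count)
  socket = UDPSocket.new
  socket.connect(HOST, PORT)
  count.times { socket.send(MESSAGE, 0) }
ensure
  socket&.close
end

Benchmark.bm(20) do |x|
  x.report("udp with connect") { send_with_connect(N) }
  x.report("udp without connect") { send_without_connect(N) }
end
```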

  90. WIN!

  91. (image only)

  92. Case 3:
    THE HOLIDAY SCALE

  93. (image only)

  94. Sometimes you can
    throw money at the
    problem

  95. (image only)

  96. Case 4:
    SHRINKING THE GAP

  97. Starting with a
    HITLIST
    Start at the top, work your way down.

  98. Number of Requests
    x 90th Percentile Response Time
    = Total Time
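
The hitlist score is request count times 90th-percentile response time, computed per endpoint (Controller#action) and sorted from the top down. A sketch over `[endpoint, duration_ms]` pairs such as you might parse from request logs; the input shape, the nearest-rank percentile method and the traffic numbers are all assumptions for illustration, and `percentile` and `hitlist` are made-up names:

```ruby
# Nearest-rank percentile: the smallest value such that at least pct% of the
# data is at or below it (assumed method; the talk doesn't specify one).
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.size / 100.0).ceil # 1-based rank
  sorted[[rank, 1].max - 1]
end

# requests: array of [endpoint, duration_ms] pairs, e.g. parsed from logs.
# Returns one row per endpoint, largest total_ms first.
def hitlist(requests)
  rows = requests.group_by(&:first).map do |endpoint, pairs|
    durations = pairs.map(&:last)
    p90 = percentile(durations, 90)
    { endpoint: endpoint, count: durations.size, p90_ms: p90, total_ms: durations.size * p90 }
  end
  rows.sort_by { |row| -row[:total_ms] }
end

# Made-up traffic: a fast, popular page and two slower, rarer actions.
sample = Array.new(1_000) { ["HomeController#index", 20] } +
         Array.new(200) { ["CardsController#show", 150] } +
         Array.new(50) { ["CardsController#create", 900] }

hitlist(sample).each do |row|
  puts format("%-24s %5d x %4dms = %6dms", row[:endpoint], row[:count], row[:p90_ms], row[:total_ms])
end
```

With these made-up numbers the busiest endpoint lands last: the score ranks by time spent in aggregate, not by how slow or how popular an endpoint is on its own.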

  99. (image only)

  100. (image only)

  101. Using stackprof flamegraphs in
    production.

  102. SET IT ON FIRE!
    Using stackprof flamegraphs in
    production.
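
A flamegraph is drawn from aggregated stack samples. stackprof has its own flamegraph support; as a tool-neutral illustration, here is a sketch that collapses samples into the "folded" text format read by Brendan Gregg's flamegraph.pl (frames root-first, joined by `;`, then a space and a count). The sample stacks are made up, loosely echoing frames from the stackprof output earlier in the deck, and `SAMPLES` and `fold_stacks` are hypothetical names:

```ruby
# Each sample is a stack listed leaf-first (innermost frame first), the way
# Ruby backtraces are ordered. Made-up data for illustration.
SAMPLES = [
  ["Statsd#send_to_socket", "Statsd#sampled", "MyController#create"],
  ["Statsd#send_to_socket", "Statsd#sampled", "MyController#create"],
  ["Dalli::Server#write", "MyController#create"],
  ["Statsd#send_to_socket", "Statsd#sampled", "MyController#create"]
].freeze

# Collapse samples into folded-stack lines: "root;...;leaf count",
# most frequent stacks first.
def fold_stacks(samples)
  samples
    .map { |stack| stack.reverse.join(";") }
    .tally
    .sort_by { |line, count| [-count, line] }
    .map { |line, count| "#{line} #{count}" }
end

puts fold_stacks(SAMPLES)
```

Saved as a hypothetical fold.rb, `ruby fold.rb | flamegraph.pl > flame.svg` would render it; the widest towers are where the samples, and so the time, went.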

  103. (image only)

  104. (image only)

  105. (image only)

  106. (image only)

  107. (image only)

  108. Big wins are
    not the point

  109. If you’re not failing,
    you’re not being honest

  110. Don’t just make tools,
    learn to use them

  111. Thanks!
    twitter: @aq
    github.com/quirkey
    github.com/paperlesspost