Fixing Performance and Memory Problems

Fixing performance & Memory problems @fglc2 / dressipi

Part 1: The follow events happen in real-time

Realtime recommender

Realtime recommender Batch Predictions

Realtime recommender Batch Predictions Event Stream Recommender Realtime Predictions

Realtime recommender Batch Predictions Event Stream Recommender Realtime Predictions Results

We deployed to production

API Combine Serialize Batch Predictions Realtime Predictions

class ScoreMap def serialize to_int_array.pack("l<*") end def to_int_array @scores.flat_map do
|key, value| [ key, (value/(scale_factor + 1).to_i] end end def scale_factor @scores.values.max_by {|n| n.abs } .abs / 1000 end end

|key, value| [ key, (value/(scale_factor + 1)).to_i] end end def scale_factor @scores.values.max_by {|n| n.abs }

|key, value| [ key, (value/(scale_factor + 1)).to_i] end end def scale_factor @scores.values.max_by {|n| n.abs } .abs / 1000

def to_int_array @scores.flat_map do |key, value| [ key, (value/(scale_factor +
1)).to_i] end end def scale_factor @scores.values.max_by {|n| n.abs } .abs / 1000 end end

Ruby-Prof is a proﬁler • Records behaviour of your code
• Produces reports on where your code spends time • Use to analyse problems not detect them

require 'ruby-prof'

require 'ruby-prof' scores = load_example_data

require 'ruby-prof' scores = load_example_data result = RubyProf.profile do ScoreMap.new(scores).serialize
end

require 'ruby-prof' scores = load_example_data result = RubyProf.profile do ScoreMap.new(scores).serialize
end printer = RubyProf::GraphHtmlPrinter.new(result) File.open('out.html', 'w') do |file| printer.print(file) end

1)).to_i] end end def scale_factor @scores.values.max_by {|n| n.abs } .abs / 1000 end

1)).to_i] end end def scale_factor @scale_factor ||= @scores.values.max_by {|n| n.abs } .abs / 1000 end

Slows down your code

Can distort things

Sampling proﬁlers

Sampling proﬁlers • ruby-prof's overhead is because it is exact

• Sampling proﬁlers are approximate

• Sampling proﬁlers are approximate • Sampling proﬁlers monitor (sample) call stack periodically and reports how often each method seen

• Sampling profilers are approximate • Sampling profilers monitor (sample) call stack periodically and reports how often each method seen • Changing sampling frequency allows accuracy vs overhead tradeoff

Stackprof StackProf.run(mode: :cpu, out: 'stackprof.dump') do ScoreMap.new(scores).serialize end

================================== Mode: cpu(1000) Samples: 435 (0.00% miss rate) GC: 2
(0.46%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 433 (99.5%) 433 (99.5%) ScoreMap#scale_factor 2 (0.5%) 2 (0.5%) (garbage collection) 433 (99.5%) 0 (0.0%) ScoreMap#to_int_array 433 (99.5%) 0 (0.0%) ScoreMap#serialize 433 (99.5%) 0 (0.0%) block in <main> 433 (99.5%) 0 (0.0%) <main> 433 (99.5%) 0 (0.0%) <main>

Overhead

Overhead • Code on its own: 6.05 s

Overhead • Code on its own: 6.05 s • Rubyprof:
~ 27s

Overhead • Code on its own: 6.05 s • Rubyprof:
~ 27s • Stackprof (default settings): 6.09s

Part 1 Summary

Part 1 Summary • Use monitoring (or angry users) to
detect problems

detect problems • Reﬁne with stackprof (if needed)

detect problems • Reﬁne with stackprof (if needed) • Reproduce the problem in a small benchmark

detect problems • Reﬁne with stackprof (if needed) • Reproduce the problem in a small benchmark • Use rubyprof to understand / ﬁx the problem

detect problems • Refine with stackprof (if needed) • Reproduce the problem in a small benchmark • Use rubyprof to understand / fix the problem • Profit!

Part 2: Memory

• Bitmap GC (1.9) • Generational GC (2.1) • Incremental
GC (2.2) • Symbol GC (2.2) • Transient Heap (2.6) • Compacting GC ( 2.7 )

You still have to worry about memory

NoMemoryError: failed to allocate

0 100 200 300 400

Memory leak

Where do they come from

Where do they come from • Leaks in native code
(C extensions)

(C extensions) • Ruby thinks you are still using it (long lived objects, caches, globals)

(C extensions) • Ruby thinks you are still using it (long lived objects, caches, globals) • Not understanding object lifecycles

class LeakingMiddleware def initialize(app) @app = app end def call(env)
... end def cached_thing(id) @things ||= {} @things[id] ||= Thing.find id end end

Leaks are scary because

Leaks are scary because • Manifest as global problem

Leaks are scary because • Manifest as global problem •
Memory usage is complicated and noisy (caches, threads, lazy load)

Leaks are scary because • Manifest as global problem •
Memory usage is complicated and noisy (caches, threads, lazy load) • Easily hidden in a dependency

Back to the problem

At least a year

At least a year • Slow or rare leak ~
2 weeks to exhaust memory at 3M requests / day

2 weeks to exhaust memory at 3M requests / day • App usually restarted often enough (deploys, autoscaling)

2 weeks to exhaust memory at 3M requests / day • App usually restarted often enough (deploys, autoscaling) • Failed instances automatically detected as unhealthy and replaced

But seriously

I couldn't reproduce the leak

What could be happening?

What could be happening? • Not picking the right endpoint

• Diﬀerent usage patterns

• Diﬀerent usage patterns • Environment diﬀerences

Debugging in production

rbtrace

rbtrace • Count / trace method calls, GCs

rbtrace • Count / trace method calls, GCs • Dump
backtraces

rbtrace • Count / trace method calls, GCs • Dump
backtraces • Execute arbitrary ruby

trace_object_allocations • Part of Objectspace (in stdlib) • Records details
about each memory allocation (when/ where/what) • Dump to a ﬁle for analysis ( heap dump )

Note that this feature introduces a huge performance decrease and
huge memory consumption.

The plan

The plan • Only work on one instance

The plan • Only work on one instance • Use
rbtrace to dump heap from single rails process

rbtrace to dump heap from single rails process • Analyse heap dump oﬄine

rbtrace to dump heap from single rails process • Analyse heap dump oﬄine • Proﬁt

bundle exec rbtrace --pid $PID -e ' Thread.new { require
'objspace' ObjectSpace.trace_object_allocations_start }' Start Tracing

Dump GC.start io=File.open("/tmp/heap.dump", "w"); ObjectSpace.dump_all(output: io); io.close

What's in a heap dump?

"address": "0x7ff5bbdfc3f8",

"address": "0x7ff5bbdfc3f8", "type": "OBJECT",  "class": "0x7ff5ba88b8e8",

"address": "0x7ff5bbdfc3f8", "type": "OBJECT",  "class": "0x7ff5ba88b8e8", "file": "...gems/mongoid-7.0.5/lib/mongoid/document.rb",  "line": 335, 
"method": "allocate",

"method": "allocate", "generation": 62,

"method": "allocate", "generation": 62, "ivars": 3,  "memsize": 40,

"method": "allocate", "generation": 62, "ivars": 3,  "memsize": 40, "references": [  "0x7ff5bbe1c4f0"  ],

"method": "allocate", "generation": 62, "ivars": 3,  "memsize": 40, "references": [  "0x7ff5bbe1c4f0"  ], "flags": {  "wb_protected": true  },

heapy $ heapy read good_heap.dump

heapy $ heapy read good_heap.dump Analyzing Heap  ==============

heapy $ heapy read good_heap.dump Analyzing Heap  ============== Generation: nil
object count: 475742, mem: 0.0 kb

object count: 475742, mem: 0.0 kb Generation: 61 object count: 5334, mem: 1782.5 kb  Generation: 62 object count: 918, mem: 1324.8 kb  Generation: 63 object count: 4, mem: 0.3 kb

object count: 475742, mem: 0.0 kb Generation: 61 object count: 5334, mem: 1782.5 kb  Generation: 62 object count: 918, mem: 1324.8 kb  Generation: 63 object count: 4, mem: 0.3 kb Generation: 2222 object count: 1, mem: 0.0 kb  Generation: 2223 object count: 420, mem: 44.6 kb  Generation: 2224 object count: 635, mem: 42.8 kb

Analyzing Heap ============== Generation: nil object count: 498011, mem: 0.0
kb Generation: 61 object count: 834, mem: 82.2 kb Generation: 62 object count: 36, mem: 3.9 kb Generation: 63 object count: 9, mem: 0.8 kb Generation: 64 object count: 3, mem: 0.4 kb Generation: 65 object count: 23, mem: 1.8 kb Generation: 66 object count: 16, mem: 1.4 kb Generation: 67 object count: 12, mem: 1.2 kb ...

Generation: 218 object count: 11, mem: 1.2 kb Generation: 219
object count: 10, mem: 1.1 kb Generation: 220 object count: 12, mem: 1.2 kb Generation: 221 object count: 12, mem: 1.2 kb Generation: 222 object count: 13, mem: 9.2 kb Generation: 223 object count: 10, mem: 1.0 kb Generation: 224 object count: 12, mem: 1.2 kb Generation: 225 object count: 12, mem: 1.2 kb Generation: 226 object count: 9, mem: 0.9 kb Generation: 227 object count: 11, mem: 1.2 kb ...

Generation: 1874 object count: 8, mem: 0.9 kb Generation: 1875
object count: 8, mem: 1.0 kb Generation: 1876 object count: 10, mem: 1.0 kb Generation: 1877 object count: 6, mem: 0.7 kb Generation: 1878 object count: 8, mem: 1.0 kb Generation: 1879 object count: 10, mem: 1.0 kb Generation: 1880 object count: 6, mem: 0.7 kb Generation: 1881 object count: 8, mem: 1.0 kb Generation: 1882 object count: 9, mem: 0.8 kb Generation: 1883 object count: 6, mem: 0.7 kb ...

$ heapy read heap.dump 1879

$ heapy read heap.dump 1879 allocated by memory (978) (in
bytes)  ==============================  858 .../lib/action_view/template.rb:296  120 .../lib/active_support/core_ext/module/remove_method.rb:4

bytes)  ==============================  858 .../lib/action_view/template.rb:296  120 .../lib/active_support/core_ext/module/remove_method.rb:4 _app_views_mobile_layouts_widgets_html_haml__3857206368409431_70209738354 _app_views_mobile_widgets_outfits_html_haml___3715565700258114_70209720618 _app_views_mobile_layouts_widgets_html_haml__3857206368409431_70209688428

bytes)  ==============================  858 .../lib/action_view/template.rb:296  120 .../lib/active_support/core_ext/module/remove_method.rb:4 _app_views_mobile_layouts_widgets_html_haml__3857206368409431_70209738354 _app_views_mobile_widgets_outfits_html_haml___3715565700258114_70209720618 _app_views_mobile_layouts_widgets_html_haml__3857206368409431_70209688428 "flags":{"wb_protected":true, "old":true, "uncollectible":true, "marked":true}

2 Questions

2 Questions • Why so many of them?

2 Questions • Why so many of them? • Why
is the method name uncollectible?

Ruby 2.2: Symbols can be garbage collected

Ruby 2.2: Symbols can be garbage collected *Terms and conditions
may apply Your statutory rights are not aﬀected

2 Kinds of Symbols

2 Kinds of Symbols • Mortal symbols: created from foo.to_sym

• Immortal symbols: C-level references (rb_intern)

• Immortal symbols: C-level references (rb_intern) Deﬁning a method creates an immortal symbol

Action View template caching

Action View template caching • Each template is compiled to
a method with random name

a method with random name • Each entry in the view search path is a resolver

a method with random name • Each entry in the view search path is a resolver • Resolvers return compiled template from cache or load from disk

before_action :set_view_path def set_view_path if mobile_site_requested? prepend_view_path "app/views/mobile" end end

devise/app/views app/views View Paths

devise/app/views app/views View Paths app/views/mobile

devise/app/views app/views View Paths

devise/app/views app/views View Paths app/views/mobile

New (empty) cache on each request!

Part 2 Summary

Part 2 Summary • Don't be afraid of production!

Part 2 Summary • Don't be afraid of production! •
Understand what/how using memory with heap dumps

Part 2 Summary • Don't be afraid of production! •
Understand what/how using memory with heap dumps • Pay attention to your objects' lifecycles

Thank you! @fglc2 / dressipi

Further Reading • https://samsaﬀron.com/archive/2015/03/31/debugging-memory- leaks-in-ruby • https://samsaﬀron.com/archive/2019/10/08/debugging-unmanaged- and-hidden-memory-leaks-in-ruby • https://blog.codeship.com/tracking-object-allocations-in-ruby/
• https://blog.codeship.com/debugging-a-memory-leak-on-heroku/ • https://www.slideshare.net/authorNari/symbol-gc • https://github.com/schneems/derailed_benchmarks • https://www.schneems.com/2019/11/07/why-does-my-apps- memory-usage-grow-asymptotically-over-time/

Fixing Performance and Memory Problems

Fixing Performance and Memory Problems

More Decks by Frederick Cheung

Other Decks in Technology

Featured

Transcript