Slide 1

Slide 1 text

张运政: Tracking Down Memory Leaks in Rails Applications

Slide 2

Slide 2 text

42thcoder ❖ 张运政 ❖ Near-newcomer to Ruby, old hand at Rails ❖ Onlooker of the front-end world ❖ @大搜车

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Founded in 2012

Slide 6

Slide 6 text

Founded in 2012, team of 500

Slide 7

Slide 7 text

Founded in 2012, team of 500, Series D funding of tens of millions of US dollars

Slide 8

Slide 8 text

Project overview

Slide 9

Slide 9 text

Auctions

Slide 10

Slide 10 text

Flash-sale app

Slide 11

Slide 11 text

Flash-sale app

Slide 12

Slide 12 text

ERP

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

15472

Slide 16

Slide 16 text

Application metrics

Slide 17

Slide 17 text

Application metrics
100+ API endpoints, 30+ screens

Slide 18

Slide 18 text

Application metrics
100+ API endpoints, 30+ screens
500 rpm on average, 6000 rpm at peak

Slide 19

Slide 19 text

Application metrics
100+ API endpoints, 30+ screens
500 rpm on average, 6000 rpm at peak
3 ECS instances (4 cores, 8 GB) + RDS

Slide 20

Slide 20 text

We went live!

Slide 21

Slide 21 text

The servers went down! A memory leak!

Slide 22

Slide 22 text

What now?

Slide 23

Slide 23 text

What follows is the story of how we tracked down a memory leak in the auction project.

Slide 24

Slide 24 text

Getting hands-on

Slide 25

Slide 25 text

"A craftsman who wants to do good work must first sharpen his tools." (Confucius)

Slide 26

Slide 26 text

Linux tools

Slide 27

Slide 27 text

passenger-memory-stats
passenger-status
top and htop
cat /proc/[pid]/status and cat /proc/[pid]/mem
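As a rough illustration of what /proc/[pid]/status offers, the following minimal sketch parses that file and pulls out the Tgid and VmRSS fields. The helper names parse_proc_status and rss_kib are made up for this example; the file itself only exists on Linux, so the final lines are skipped elsewhere.

```ruby
# Parse the text of /proc/<pid>/status into a Hash of field name => raw value.
# (Helper names are hypothetical, chosen for this sketch only.)
def parse_proc_status(text)
  text.each_line.with_object({}) do |line, fields|
    key, value = line.split(':', 2)
    next if value.nil? # skip lines without a "Key: value" shape
    fields[key.strip] = value.strip
  end
end

# Return the resident set size as an Integer (the kernel labels the unit "kB"),
# or nil when the VmRSS field is absent.
def rss_kib(fields)
  value = fields['VmRSS']
  value && value[/\d+/].to_i
end

# On Linux, inspect the current process; elsewhere this block does nothing.
status_path = "/proc/#{Process.pid}/status"
if File.readable?(status_path)
  fields = parse_proc_status(File.read(status_path))
  puts "Tgid=#{fields['Tgid']} VmRSS=#{rss_kib(fields)} kB"
end
```

Sampling this value periodically for a Passenger worker PID gives a crude view of whether its resident memory keeps climbing.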

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

RSS: resident set size, the non-swapped physical memory that a task has used.
VSZ: virtual memory size of the process in KiB. Device mappings are currently excluded; this is subject to change.

Slide 35

Slide 35 text

Tgid (Thread Group ID) is the process ID in the true sense, i.e. the value returned by getpid.

Slide 36

Slide 36 text

APM

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Let's get to work on the problem

Slide 39

Slide 39 text

Survive Address Fix Lesson

Slide 40

Slide 40 text

Survive: enterprise applications need enterprise-grade stability

Slide 41

Slide 41 text

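Watchdog: alerting
passenger_killer: kill a process after it has completed N requests
oom_killer: kill a process once its memory exceeds N; Passenger restarts it automatically
oob: run GC automatically every N requests a process handles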
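The "GC every N requests" idea can be sketched as a small Rack-style middleware. This is not the talk's actual setup: the class name PeriodicGC and its parameters are assumptions made for illustration, and unlike a true out-of-band GC it runs the collection inside the request cycle, right after the app returns.

```ruby
# Rack-style middleware (hypothetical name) that triggers a collection
# after every `every` requests handled by this process.
class PeriodicGC
  attr_reader :runs

  # collector is injectable so the behaviour can be checked without a real GC run.
  def initialize(app, every: 100, collector: -> { GC.start })
    @app = app
    @every = every
    @collector = collector
    @count = 0
    @runs = 0
    @lock = Mutex.new
  end

  def call(env)
    @app.call(env) # the response is the method's return value
  ensure
    collect_if_due # runs after the app, even if it raised
  end

  private

  def collect_if_due
    @lock.synchronize do
      @count += 1
      if (@count % @every).zero?
        @collector.call
        @runs += 1
      end
    end
  end
end

# Usage with a trivial Rack-compatible app.
app = ->(_env) { [200, { 'content-type' => 'text/plain' }, ['ok']] }
stack = PeriodicGC.new(app, every: 50)
200.times { stack.call({}) }
puts "GC triggered #{stack.runs} times"
```

A request-count or memory-threshold killer follows the same shape, with the collector replaced by a signal to the worker process.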

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Address: locating the problem

Slide 46

Slide 46 text

A brief aside

Slide 47

Slide 47 text

Is it Memory Bloat?

Slide 48

Slide 48 text

Memory Bloat VS Memory Leak

Slide 49

Slide 49 text

To add: a oneapm screenshot of the VM view

Slide 50

Slide 50 text

Monitor

Slide 51

Slide 51 text

To add: a scoutapp chart of memory allocation per endpoint; a chart of GC execution time

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

Profile

Slide 54

Slide 54 text

Boot App => Hit with Request => Profile Memory

Slide 55

Slide 55 text

Derailed Benchmarks https://github.com/schneems/derailed_benchmarks Go faster, off the Rails - Benchmarks for your whole Rails app

Slide 56

Slide 56 text

Memory Profiler https://github.com/SamSaffron/memory_profiler memory_profiler for ruby

Slide 57

Slide 57 text

Process memory grows with the number of requests
TEST_COUNT=10_000 PATH_TO_HIT=/api/v1/home/counts bundle exec derailed exec perf:mem_over_time

Slide 58

Slide 58 text

Memory allocation for a single request
TEST_COUNT=100 PATH_TO_HIT=/api/v1/home/counts?token=5c78a9adeec3613b7a3ac0d734475e06 bundle exec derailed exec perf:objects
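For a feel of what "objects allocated per request" means, here is a stdlib-only sketch that counts allocations around a block using GC.stat. It is a crude stand-in and not how derailed or memory_profiler work internally; handle_request is a hypothetical placeholder for the work one request performs.

```ruby
# Count the objects allocated while the block runs.
# :total_allocated_objects is a cumulative counter, so GC runs do not reset it.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical stand-in for one request: builds 100 strings in an array.
def handle_request
  Array.new(100) { |i| "row-#{i}" }
end

count = allocated_objects { handle_request }
puts "objects allocated by one simulated request: #{count}"
```

Comparing this number across endpoints, or before and after a change, points at allocation hotspots without telling which objects are retained.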

Slide 59

Slide 59 text

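Summary
• Endpoint X creates a large number of Timeout objects, which take up too much memory
• Profiling is more targeted than monitoring
• The memory leak is real: memory keeps growing over time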

Slide 60

Slide 60 text

Fix: fixing the problem

Slide 61

Slide 61 text

What does endpoint X actually do? Guessing gets us nowhere, so we keep digging.

Slide 62

Slide 62 text

Stackprof https://github.com/tmm1/stackprof a sampling call-stack profiler for ruby 2.1+

Slide 63

Slide 63 text

config.middleware.use(StackProf::Middleware, enabled: true, mode: :wall, interval: 1000, save_every: 5)

Slide 64

Slide 64 text

config.middleware.use(StackProf::Middleware, enabled: true, mode: :wall, interval: 1000, save_every: 5)

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

def write(*args)
  Timeout.timeout(@write_timeout, TimeoutError) { super }
end
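To see why wrapping every write in Timeout.timeout is costly, the following sketch compares allocations for a plain call and a timeout-wrapped call. It uses only the stdlib; fake_write is a hypothetical no-op standing in for the socket write, and the exact counts depend on the Ruby and timeout versions in use, so none are quoted here.

```ruby
require 'timeout'

# Count the objects allocated while the block runs.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical no-op standing in for the real socket write.
def fake_write
  nil
end

# Warm up so one-time setup inside Timeout is not counted.
Timeout.timeout(5) { fake_write }

plain   = count_allocations { 1_000.times { fake_write } }
wrapped = count_allocations { 1_000.times { Timeout.timeout(5) { fake_write } } }

puts "plain:   #{plain} objects"
puts "wrapped: #{wrapped} objects"
```

On a hot path such as every Redis command, the per-call bookkeeping that Timeout.timeout allocates adds up quickly.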

Slide 69

Slide 69 text

Gotcha! redis-rb is to blame, but we should still verify it.

Slide 70

Slide 70 text

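#!/usr/bin/env ruby
# encoding: utf-8

require 'memory_profiler'

# Activate the redis gem version given in RVERSION so releases can be compared.
gem 'redis', ENV['RVERSION']
require 'redis'

puts Process.pid
puts Redis::VERSION

# Report the objects allocated while issuing 100 SET commands.
MemoryProfiler.report {
  r = Redis.new
  100.times do |i|
    r.set "key#{i}", "value#{i}"
  end
}.pretty_print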

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

Lesson Learned: experience and lessons

Slide 75

Slide 75 text

Finding the memory hotspot (flowchart with YES / NO branches)
Can it be reproduced?
Can it be traced per endpoint?
Did the change help?

Slide 76

Slide 76 text

git diff, control group

Slide 77

Slide 77 text

Timeout is pure evil

Slide 78

Slide 78 text

Anyone can make mistakes

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

No content

Slide 82

Slide 82 text

Other gems that may come in handy

Slide 83

Slide 83 text

Rbkit & Rbkit Client https://github.com/code-mancers/rbkit A new profiler for Ruby. With a GUI http://rbkit.codemancers.com Does not work under 2.3.x

Slide 84

Slide 84 text

Oink https://github.com/noahd1/oink/ Log parser to identify actions which significantly increase VM heap size

Slide 85

Slide 85 text

Memory Logic https://github.com/binarylogic/memorylogic Adds process ID and memory usage to your Rails logs, great for tracking down memory leaks

Slide 86

Slide 86 text

References
• Ruby Under a Microscope
• 垃圾回收的算法与实现 (Garbage Collection: Algorithms and Implementations)
• Ruby Performance Optimization

Slide 87

Slide 87 text

THANKS

Slide 88

Slide 88 text

Q & A