Slide 1

Slide 1 text

Hacking and Profiling Ruby for Performance Daisuke Aritomo (@osyoyu)

Slide 2

Slide 2 text

pp @osyoyu • Daisuke Aritomo (osyoyu, おしょーゆ, おしょうゆ) • "osyoyu" is pronounced as "oh show you" • #rubykaigiNOC (Venue Wi-Fi Team) • Software Engineer at Cookpad Inc.

Slide 3

Slide 3 text

Get a free drink from Cookpad's fridge! Cookpad provides RUBY-POWERED FRIDGES to RubyKaigi 2023! Let's make everyday cooking fun together! https://cookpad.careers/

Slide 4

Slide 4 text

Cookpad is doing a look-back RubyKaigi event! 5/18 @ Tokyo • Look back on RubyKaigi with hands-on sessions! • Let's make the next RubyKaigi together Come to the Cookpad booth for details https://cookpad.connpass.com/event/282436/

Slide 5

Slide 5 text

About this Talk • How to profile and tune a Ruby webapp • ... you know nothing about, • ... within 8 hours, • ... as part of a performance tuning competition, • for fun!

Slide 6

Slide 6 text

Do you like performance? 
 🙋

Slide 7

Slide 7 text

ISUCON: the performance challenge • Contestants are given a VM and a super-slow webapp • Contestants may request a 60-second benchmark during the contest • Scores and standings are decided based on benchmark results • The goal is to get the best score [Diagram: the Benchmarker (scorer) sends benchmark requests to the Contestant VM (running a Ruby webapp) at an intensive rate for 60 seconds]

Slide 8

Slide 8 text

Day 2 11:00-

Slide 9

Slide 9 text

The Initial State • 3 VMs (Computing Instances) • Ruby • Sinatra • Puma • Nginx • MySQL • Implementations in other languages • Python, Go, Node.js, ... • Initial performance is designed to be not so different

Slide 10

Slide 10 text

"Make this webapp server as fast as possible - 
 but no scaling up or out."

Slide 11

Slide 11 text

Hack around with Ruby code Run a benchmark, get your score

Slide 12

Slide 12 text

Hack around with Ruby code Run a benchmark, get your score The highest score wins!

Slide 13

Slide 13 text

Will talks/Won't talks Wills • Why Ruby is a great language to compete in ISUCON • How to track down and profile slow code on the CRuby level • What future Rubies need to shine in ISUCON Won'ts • Configuring Nginx, Linux, etc. • Monitoring on the system level • Itamae recipes we made for ISUCON

Slide 14

Slide 14 text

(Almost) Everything is permitted • Add effective RDBMS indexes • Kill N+1 SQL Queries • → user_ids.each {|id| query("select * from users where id = ?", id) } • → query("select * from users where id in (?)", user_ids) • Replace suboptimal algorithms • Utilize Server Resources (cpu, memory) to the last drop • Adding Puma threads/processes • Caching • Upgrading to ruby/ruby master • (Adding VMs and scaling VMs up are prohibited)
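A minimal sketch of the N+1 fix named on this slide. There is no real database here: the Hash and the two lambdas are hypothetical stand-ins for the users table and the slide's `query` helper, and the counter only illustrates how many round trips each style would make.

```ruby
# In-memory stand-in for the users table; a real app would query MySQL.
users_table = {
  1 => { id: 1, name: "alice" },
  2 => { id: 2, name: "bob" },
  3 => { id: 3, name: "carol" },
}

round_trips = 0

# Simulates: select * from users where id = ?
find_user = lambda do |id|
  round_trips += 1
  users_table[id]
end

# Simulates: select * from users where id in (?)
find_users = lambda do |ids|
  round_trips += 1
  users_table.values_at(*ids).compact
end

user_ids = [1, 2, 3]

# N+1 style: one round trip per id.
round_trips = 0
one_by_one = user_ids.map { |id| find_user.call(id) }
n_plus_one_trips = round_trips

# Batched style: a single round trip for all ids.
round_trips = 0
batched = find_users.call(user_ids)
batched_trips = round_trips

puts "N+1: #{n_plus_one_trips} round trips, batched: #{batched_trips}"
```

Both styles return the same rows; only the number of round trips differs, which is what the benchmark punishes.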

Slide 15

Slide 15 text

(Almost) Everything is permitted • Add effective RDBMS indexes • Kill N+1 SQL Queries • → user_ids.each {|id| query("select * from users where id = ?", id) } • → query("select * from users where id in (?)", user_ids) • Replace suboptimal algorithms • Utilize Server Resources (cpu, memory) to the last drop • Adding Puma threads/processes • Caching • Upgrading to ruby/ruby master • (Adding VMs and scaling VMs up are prohibited) But where should we start from?

Slide 16

Slide 16 text

Hack around with Ruby code Run a benchmark, get your score See profiling results, think about what's next

Slide 17

Slide 17 text

Dancing with Ruby

Slide 18

Slide 18 text

Dancing with Ruby • You'll be given 500+ lines of Sinatra code • and need to make it really fast within 8 hours • This is where Ruby really shines

Slide 19

Slide 19 text

Ruby Go • Same code, different language • Save reading time and writing time!

Slide 20

Slide 20 text

The mighty Array, Hash, Enumerable • #map • #each_with_object • #sort_by! • Almost anything is possible • Comes in really handy when tackling N+1 queries (without ActiveRecord)
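As a rough illustration of how these methods help with N+1 queries when there is no ActiveRecord, the sketch below joins two result sets in memory. The row shapes and column names are made up for the example.

```ruby
# Rows as they might come back from two separate queries (sample data).
posts = [
  { id: 10, user_id: 2, title: "second" },
  { id: 11, user_id: 1, title: "first" },
  { id: 12, user_id: 2, title: "third" },
]
users = [
  { id: 1, name: "alice" },
  { id: 2, name: "bob" },
]

# Index users by id once, so each lookup below is a Hash access, not a query.
users_by_id = users.each_with_object({}) { |user, index| index[user[:id]] = user }

# Attach the author name to every post in memory.
posts_with_author = posts.map do |post|
  post.merge(author: users_by_id.fetch(post[:user_id])[:name])
end

# Sort in place by title.
posts_with_author.sort_by! { |post| post[:title] }
```

Two queries plus a Hash lookup replace one query per post.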

Slide 21

Slide 21 text

Monkey patching • Need something in the Standard Library? Build it on site!
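A small sketch of what building a missing helper on site could look like. `index_by` is chosen here only as an example: plain Ruby does not ship it, while ActiveSupport offers a method of the same name. The talk does not say which methods were actually patched.

```ruby
# Reopen Enumerable to add index_by, which plain Ruby does not provide.
module Enumerable
  def index_by
    each_with_object({}) { |item, index| index[yield(item)] = item }
  end
end

users = [{ id: 1, name: "alice" }, { id: 2, name: "bob" }]
users_by_id = users.index_by { |user| user[:id] }
```

Patching a core module like this is a contest shortcut; a refinement would limit the blast radius in long-lived code.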

Slide 22

Slide 22 text

Monkey patching • Need something in the Standard Library? Build it on site! binding.irb/pry in production • Debugging workflow: • Stop real benchmark requests • Write code in binding.irb/pry using real requests • Confirm it works • Copy to editor and save (Note: I don't do this at work - just a competition technique 😁)

Slide 23

Slide 23 text

Challenges in Ruby

Slide 24

Slide 24 text

START BENCHMARK

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

• This... happens • It's fixable, no worries • Let's hope RBS, TypeProf and other projects solve this problem

Slide 29

Slide 29 text

Performance Challenges in Ruby • Profiling bottlenecks accurately • Utilizing 100% cpu in Ruby • Achieving high concurrency in Ruby

Slide 30

Slide 30 text

Don't guess, measure! • Random improvements will take you nowhere • Spending time fixing problems that aren't real is the last thing you want to do in an 8-hour timeframe • Accurate profiling is the key

Slide 31

Slide 31 text

Profiler choices • Tracing profilers • Track everything, but with a huge performance impact • ruby-prof, based on TracePoint • Sampling profilers • Collect samples every 10-100ms, small performance impact • Stackprof (cpu, wall, memory) • based on the rb_profile_frames() API • rbspy (wall) • runs as a separate process and reads Ruby memory (process_vm_readv(2))
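To make the tracing side of this comparison concrete, here is a toy tracer built directly on Ruby's `TracePoint`. It is not ruby-prof's implementation; `handler` and `slow_helper` are hypothetical methods that stand in for application code.

```ruby
def slow_helper
  (1..100).sum
end

def handler
  3.times { slow_helper }
end

call_counts = Hash.new(0)

# :call fires on every Ruby-level method call, which is why tracing is costly.
tracer = TracePoint.new(:call) { |tp| call_counts[tp.method_id] += 1 }
tracer.enable { handler }
```

Every method call runs the hook, so the overhead grows with the call volume; a sampling profiler instead pays only once per sampling interval.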

Slide 32

Slide 32 text

Let's profile! Flamegraph Visualizer: jlfwong/speedscope

Slide 33

Slide 33 text

Let's profile! [Flamegraph annotations: benchmark window (60 seconds); time spent in a particular handler, POST /api/condition/... (room for optimization?)]

Slide 34

Slide 34 text

Let's profile! POST /api/condition/... N+1 INSERT query found!

Slide 35

Slide 35 text

Let's profile! POST /api/condition/... N+1 INSERT query found! 👍
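One common way to remove an N+1 INSERT is to send a single multi-row statement. The sketch below only builds the SQL string and the bind values; the table name, column names and sample rows are hypothetical, and the talk does not show the fix that was actually applied.

```ruby
# Build one multi-row INSERT with placeholders instead of one INSERT per row.
def build_bulk_insert(table, columns, rows)
  raise ArgumentError, "rows must not be empty" if rows.empty?

  row_placeholder = "(#{(['?'] * columns.size).join(', ')})"
  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES " +
        ([row_placeholder] * rows.size).join(", ")
  [sql, rows.flatten(1)]
end

rows = [
  ["isu-1", "2021-08-21 10:00:00", "ok"],
  ["isu-1", "2021-08-21 10:01:00", "ok"],
]
sql, binds = build_bulk_insert("conditions", %w[isu_id recorded_at message], rows)
puts sql
```

The statement and the flattened bind values can then be handed to whatever prepared-statement API the database driver offers.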

Slide 36

Slide 36 text

Flamegraph source: rb_profile_frames() • Stackprof utilizes the rb_profile_frames() API • Returns the call stack that was running when rb_profile_frames() was called • Stackprof calls rb_profile_frames() on SIGPROF (cpu mode) or SIGALRM (wall mode) timers [Diagram: a timeline of stack frames a(), b(), c(); a 📸 rb_profile_frames() call records the call stack [a(), b()]]
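The sampling idea can be imitated in pure Ruby. This sketch is only an analogy: it polls `Thread#backtrace_locations` from another thread, whereas Stackprof uses signal timers and the C-level rb_profile_frames() API. `burn_cpu` and `sample_stacks` are names invented for the example.

```ruby
def burn_cpu(seconds)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds
  counter = 0
  counter += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
  counter
end

# Take a stack sample of `target` every `interval` seconds until it finishes.
def sample_stacks(target, interval: 0.01)
  samples = Hash.new(0)
  while target.alive?
    stack = target.backtrace_locations
    samples[stack.map(&:label)] += 1 if stack && !stack.empty?
    sleep interval
  end
  samples
end

worker = Thread.new { burn_cpu(0.5) }
samples = sample_stacks(worker)
worker.join

puts "collected #{samples.values.sum} samples"
```

Counting identical stacks is the raw material for a flamegraph: the more often a stack is seen, the wider its bar.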

Slide 37

Slide 37 text

• rb_profile_frames() (Stackprof) is inaccurate when I/O comes into play [Flamegraph: only burn_cpu and Thread#join appear. No I/O!?]

Slide 38

Slide 38 text

Multithreading in Ruby / GVL • Only 1 Thread can use the CPU at a time • due to the Global VM Lock (GVL) • I/O (file reads/writes, network access, ...) can be performed in the background [Diagram: Threads 1-3 take turns: acquire the GVL and use the CPU, release the GVL to do I/O, or wait for the GVL]
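A quick experiment along these lines, assuming CRuby. `sleep` stands in for blocking I/O and a plain loop stands in for CPU work; both lambdas and the `elapsed` helper are made up for the sketch.

```ruby
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# sleep stands in for blocking I/O: a sleeping thread does not hold the GVL.
fake_io = -> { sleep 0.1 }
# A pure-Ruby loop stands in for CPU work: it needs the GVL the whole time.
cpu_work = -> { 200_000.times { |i| i * i } }

io_serial    = elapsed { 4.times { fake_io.call } }
io_threaded  = elapsed { 4.times.map { Thread.new(&fake_io) }.each(&:join) }
cpu_serial   = elapsed { 4.times { cpu_work.call } }
cpu_threaded = elapsed { 4.times.map { Thread.new(&cpu_work) }.each(&:join) }

puts format("I/O: serial %.2fs, threaded %.2fs", io_serial, io_threaded)
puts format("CPU: serial %.2fs, threaded %.2fs", cpu_serial, cpu_threaded)
```

The threaded I/O run should finish in roughly the time of one wait, while on CRuby the threaded CPU run is typically no faster than the serial one.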

Slide 39

Slide 39 text

Issues in rb_profile_frames() • The current implementation of rb_profile_frames() returns information about the last active Thread (which had the GVL) • Threads doing I/O are unlikely to be sampled • Statistics (flamegraphs) built from continuous rb_profile_frames() calls may be inaccurate, especially when many Threads are doing I/O • (even in wall mode!) [Diagram: Threads 1-3 alternately acquire the GVL and release it to do I/O; 💥 = Stack frame peeked by rb_profile_frames()]

Slide 40

Slide 40 text

Issues in rb_profile_frames() • Proposal: Add rb_thread_profile_frames() API, a per-thread version of rb_profile_frames() • Accepts VALUE thread as arg • https://github.com/ruby/ruby/pull/7784 [Diagram: Threads 1-3 alternately acquire the GVL and release it to do I/O; 💥 = Stack frame peeked by rb_profile_frames()]
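The per-thread idea has a rough Ruby-level counterpart in `Thread#backtrace_locations`, which can be asked about any thread, including one that is blocked. The sketch below is an analogy rather than the proposed C API; `sample_all_threads` and `wait_for_io` are hypothetical helpers, and a `Queue#pop` plays the role of a blocking read.

```ruby
# Snapshot the stack of every live thread, keyed by thread.
def sample_all_threads(threads = Thread.list)
  threads.each_with_object({}) do |thread, snapshot|
    locations = thread.backtrace_locations
    snapshot[thread] = locations.map(&:label) if locations
  end
end

def wait_for_io(queue)
  queue.pop # blocks without holding the GVL, like a socket read would
end

queue = Queue.new
io_thread = Thread.new { wait_for_io(queue) }
sleep 0.01 until io_thread.status == "sleep" # wait until it is actually blocked

snapshot = sample_all_threads
queue << :done
io_thread.join
```

Because every thread is asked explicitly, the blocked thread shows up in the snapshot even though it does not hold the GVL.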

Slide 41

Slide 41 text

Performance Challenges in Ruby • Profiling bottlenecks accurately • Utilizing 100% cpu in Ruby • Achieving high concurrency in Ruby

Slide 42

Slide 42 text

Utilizing 100% cpu in Ruby • Only 1 Thread can be active due to the GVL • CPU tasks and I/O can run simultaneously, but conditions are not always ideal • Resources are very limited in ISUCON; we want to squeeze everything out! [Screenshot: 4 Ruby processes, 32 threads isn't enough to burn all CPU] [Screenshot: we want to see this (1 process, 8 threads in Go)]

Slide 43

Slide 43 text

Reducing GVL wait • Fewer Threads = fewer races for the GVL • OK, let's reduce Threads! • In the ISUCON webapp case, Puma is creating Threads • We can create more processes in place of Threads to keep the number of workers the same

Slide 44

Slide 44 text

Tuning Puma for 100% cpu • Adding Server (Puma) processes is effective, but consumes more memory • Memory is precious in ISUCON as MySQL lives in the same VM • More Processes (better cpu utilization but more memory) • More Threads (less memory consumption but suboptimal cpu utilization)
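For reference, the process/thread trade-off maps onto two settings in a Puma config file. This is a config fragment, not a standalone script, and the numbers are purely illustrative rather than the values used in the contest.

```ruby
# config/puma.rb (illustrative numbers only)
workers 4     # processes: each has its own GVL, but costs more memory
threads 8, 8  # min, max threads per worker: cheap, but they share one GVL
preload_app!  # load the app before forking so workers can share memory pages
```

Moving capacity from `threads` to `workers` keeps the total number of request handlers similar while reducing contention for each GVL, at the price of memory.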

Slide 45

Slide 45 text

wip: Finding the process/thread balance using GVL stats • GVL event hooks were added in Ruby 3.2 • RUBY_INTERNAL_THREAD_EVENT_* • Shopify/gvltools, ivoanjo/gvl-tracing • I integrated this into profiler results • Usable for tuning # of Puma threads? [Flowchart: Try adding Puma threads → Check GVL wait → Low enough: Perfect! / Somewhat high: Reduce threads, add processes]

Slide 46

Slide 46 text

Performance Challenges in Ruby • Profiling bottlenecks accurately • Utilizing 100% cpu in Ruby • Achieving high concurrency in Ruby

Slide 47

Slide 47 text

Higher concurrency? • Falcon (socketry/async series) • Event-loop based async I/O for Ruby! • Sadly, we couldn't rewrite everything in 8 hours • TruffleRuby • ISUCON VMs didn't have enough memory for TruffleRuby 😔 • Arming Ractors for fewer GVL waits • Maybe my next challenge!

Slide 48

Slide 48 text

Does everyone use Ruby? (in ISUCON)

Slide 49

Slide 49 text

Does everyone use Ruby? (in ISUCON)

Slide 50

Slide 50 text

Quals Does everyone use Ruby? (in ISUCON)

Slide 51

Slide 51 text

Quals Finals My team! Does everyone use Ruby? (in ISUCON)

Slide 52

Slide 52 text

Ruby vs. Go • Go: Goroutines • A kind of lightweight thread - this system makes high concurrency very easy • Go: Concurrency deeply embedded in the ecosystem • Contexts: Ability to cancel MySQL queries that are no longer needed • Timeouts: Same

Slide 53

Slide 53 text

Wrapping up • It's fun to write Ruby 😉 • Profiling is important! • But it's more important to check if those profiles are accurate • Let's do ISUCON!

Slide 54

Slide 54 text

Acknowledgements • @s4ichi and @koba789, my long-time ISUCON teammates • @ko1 and @mame, who gave us much valuable advice

Slide 55

Slide 55 text

Thank you!