API Optimization Tale: Monitor, Fix and Deploy (on Friday). RailsConf 2021

mrzasa
April 12, 2021

Slides for a talk from RailsConf 2021: https://railsconf.com/program/sessions#session-1078

Abstract

I saw a green build on a Friday afternoon. I knew I needed to push it to production before the weekend. My gut told me it was a trap. I had already stayed late to revert a broken deploy. I knew the risk.

In the middle of a service extraction project, we decided to migrate from REST to GraphQL and optimize API usage. My deploy was a part of this radical change.

Why was I deploying so late? How did we measure the migration effects? And why was I testing on production? I'll tell you a tale of small steps, monitoring, and old tricks in a new setting. Hope, despair, and broken production included.

Transcript

  1. API OPTIMIZATION TALE: MONITOR, FIX AND DEPLOY (ON FRIDAY)

    MACIEK RZĄSA, TOPTAL, @MJRZASA. Photo by Tim Mossholder on Unsplash
  2. FRIDAY 16:03

  3. Photo by Max Baskakov on Unsplash

  4. BACKEND ENGINEER @ TOPTAL

    at work: Ruby & Postgres & Elasticsearch; service extraction
    after work: Rzeszów Ruby User Group (rrug.pl); Rzeszów University of Technology
    software that matters, agile
    text processing, distributed systems
  5. TOPTAL PLATFORM

  6. None
  7. EXTRACTION

  8. EXTRACTION

  9. MONITOR
      wait for it

    FIX: OPTIMIZE
      wait for it

    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags
  10. EXTRACTION

    FROM:
      class Product < ApplicationRecord
        has_many :billing_records
      end

      class BillingRecord < ApplicationRecord
        belongs_to :product
      end

    TO:
      class Product < ApplicationRecord
        def billing_records
          @billing_records ||= ::Billing::QueryService
            .billing_records_for_products(self)
        end
      end

      class BillingRecord
        def product
          @product ||= Product.find(product_id)
        end
      end
  11. EXTRACTION: FIRST ATTEMPT

  12. EXTRACTION: FIRST ATTEMPT

  13. EXTRACTION: GRAPHQL

  14. EXTRACTION: MONITORING

    standard: errors (Rollbar/Sentry), performance (NewRelic)
    custom request instrumentation (Kibana): method name, arguments, stacktrace, response time (elapsed), error

      {
        "payload": {
          "method": "records_for_product",
          "arguments": "[[\"gid://platform/Product/12345\"]]",
          "stacktrace": "[ \"app/models/product.rb:123\", \"app/services/sell_product.rb:43\" ]",
          "elapsed": 1.128494586795568,
          "error": null
        }
      }
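
The custom request instrumentation on this slide could be approximated with a small wrapper around each billing call. This is a minimal sketch, not the talk's actual code: the `RequestInstrumentation` module, its `instrument` method and the `logger:` parameter are hypothetical names, and the payload only mirrors the fields listed on the slide (method, arguments, stacktrace, elapsed, error). Shipping the JSON line to Kibana is left out.

```ruby
require "json"

# Hypothetical helper (an assumption, not code from the talk).
module RequestInstrumentation
  # Runs the block, then writes one JSON line describing the call.
  def self.instrument(method_name, arguments, logger: $stdout)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    error = nil
    begin
      yield
    rescue StandardError => e
      error = "#{e.class}: #{e.message}"
      raise # the caller still sees the original exception
    ensure
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      payload = {
        method: method_name.to_s,
        arguments: arguments.to_json,
        stacktrace: caller.first(5).to_json, # where the request came from
        elapsed: elapsed,
        error: error
      }
      logger.puts({ payload: payload }.to_json)
    end
  end
end

# Usage: wrap a (here simulated) billing request.
RequestInstrumentation.instrument(:records_for_product, [["gid://platform/Product/12345"]]) do
  sleep 0.01 # stands in for the remote call
end
```

Recording the stacktrace next to the elapsed time is what makes it possible to see which views or jobs trigger the most requests.
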
  15. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation

    FIX: OPTIMIZE
      wait for it

    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags
  16. OPTIMIZE

  17. FLOOD OF REQUESTS

    PROBLEM: SINGLE VIEW/JOB INITIATES MANY BILLING REQUESTS
    HOW MANY? THOUSANDS!
  18. FLOOD OF REQUESTS

    INITIAL:
      def perform(*)
        products = Product.eligible
        products.find_each do |product|
          # one billing request per call
          DoBusinessLogic.call(product)
        end
      end

      class DoBusinessLogic
        def self.call(product)
          product.billing_records.each {}
        end
      end

      class Product < ApplicationRecord
        def billing_records
          @billing_records ||= ::Billing::QueryService
            .billing_records_for_products(self)
        end
      end

    OPTIMIZED:
      def perform(*)
        products = Product.eligible
        products.find_in_batches do |batch|
          # one billing request per batch
          cache_billing_records(batch).each do |p|
            # no billing requests
            DoBusinessLogic.call(p)
          end
        end
      end

      def cache_billing_records(products)
        indexed_records = ::Billing::QueryService
          .billing_records_for_products(*products)
          .group_by(&:product_gid)
        products.each do |product|
          product.cache_billing_records!(
            indexed_records[product.gid].to_a
          )
        end
      end
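
To make the effect of batch preloading visible without Rails, here is a small plain-Ruby sketch that counts requests. Everything in it is a stand-in introduced for illustration: `FakeBillingService`, `preload_billing_records`, the `Struct`-based `Product` and `BillingRecord`, and the batch size of 2 are assumptions, not the talk's code.

```ruby
# Hypothetical stand-ins for the talk's models and billing client.
BillingRecord = Struct.new(:gid, :product_gid)
Product = Struct.new(:gid, :billing_records)

class FakeBillingService
  attr_reader :request_count

  def initialize(records)
    @records = records
    @request_count = 0
  end

  # Counts as one remote request no matter how many products are passed.
  def billing_records_for_products(*products)
    @request_count += 1
    gids = products.map(&:gid)
    @records.select { |record| gids.include?(record.product_gid) }
  end
end

# One request per batch; each product gets its own records (or an empty list).
def preload_billing_records(service, products, batch_size: 2)
  products.each_slice(batch_size) do |batch|
    by_product = service.billing_records_for_products(*batch).group_by(&:product_gid)
    batch.each { |product| product.billing_records = by_product.fetch(product.gid, []) }
  end
  products
end

service = FakeBillingService.new([
  BillingRecord.new("r1", "p1"),
  BillingRecord.new("r2", "p1"),
  BillingRecord.new("r3", "p3")
])
products = %w[p1 p2 p3].map { |gid| Product.new(gid) }
preload_billing_records(service, products)

puts service.request_count                               # prints 2 (two batches for three products)
puts products.map { |p| p.billing_records.size }.inspect # prints [2, 0, 1]
```

The request count now grows with the number of batches rather than the number of products, which is the point of the OPTIMIZED column.
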
  19. FLOOD OF REQUESTS? PRELOAD FOR A BATCH AND (MEM)CACHE

    ◀ Brad Fitzpatrick, Memcached. Image source: wikipedia.org
  20. FLOOD OF DB QUERIES

    PROBLEM: EVERY ELEMENT OF A FETCHED COLLECTION (BILLING RECORD) NEEDED PLATFORM DATA (PRODUCT)
  21. FLOOD OF DB QUERIES

    INITIAL:
      def business_logic
        billing_records = ::Billing::QueryService
          .billing_records_for_products(*products)
        billing_records.each do |r|
          # one query to products table per call
          BusinessLogic.call(r, r.product)
        end
      end

      class BillingRecord
        attr_writer :product

        def product
          @product ||= Product.find(product_id)
        end
      end

    OPTIMIZED:
      def business_logic
        # one query to products table here
        billing_records = ::Billing::QueryService
          .billing_records_for_products(*products)
        billing_records.each do |r|
          BusinessLogic.call(r, r.product)
        end
      end

      # Billing::QueryService
      def billing_records_for_products(*products)
        products_by_gid = products.index_by(&:gid)
        billing_records = fetch_billing_records(
          product_gids: products_by_gid.keys
        )
        billing_records.each do |billing_record|
          gid = billing_record.product_gid
          product = products_by_gid[gid]
          billing_record.product = product
        end
      end
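
The app-level hash join from this slide can be reduced to two phases: build a hash from the smaller collection, then probe it for each record. The sketch below shows that in plain Ruby; `attach_products` and the `Struct` definitions are hypothetical names, and `to_h` with a block stands in for Rails' `index_by`.

```ruby
# Hypothetical plain-Ruby stand-ins for the talk's models.
Product = Struct.new(:gid, :name)
BillingRecord = Struct.new(:gid, :product_gid, :product)

def attach_products(billing_records, products)
  # Build phase: index the already-loaded products by gid.
  products_by_gid = products.to_h { |product| [product.gid, product] }
  # Probe phase: one hash lookup per record instead of one DB query per record.
  billing_records.each do |record|
    record.product = products_by_gid[record.product_gid]
  end
end

products = [Product.new("p1", "Alpha"), Product.new("p2", "Beta")]
records = [
  BillingRecord.new("r1", "p2"),
  BillingRecord.new("r2", "p1"),
  BillingRecord.new("r3", "p2")
]
attach_products(records, products)

puts records.map { |r| r.product.name }.inspect # prints ["Beta", "Alpha", "Beta"]
```
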
  22. FLOOD OF DB QUERIES? HASH JOINS TO THE RESCUE!

    ◀ Ulf Michael Widenius, MySQL. Image source: wikipedia.org
  23. DATA FLOOD

    PROBLEM: GENERIC QUERIES FETCHING ALL THE DATA THAT MIGHT BE NEEDED
  24. DATA FLOOD

      # REST response
      {
        "gid": "gid://...",
        "clientGid": "gid://...",
        "productGid": "gid://...",
        "availability": true,
        "pending": false,
        "frequency": "weekly",
        "startDate": "2020-08-21",
        "endDate": "2020-10-28"
        # ...
        # 36 fields total
        # loading 3-4 associations
      }

      def billing_records_for_products(*products)
        fetch_billing_records(
          filter: {products: products}
        ).select(&:accessible?)
      end

      query($filter: RecordFilter!) {
        cycles(filter: $filter) {
          nodes {
            gid
            productGid
            pending
            frequency
          }
        }
      }

      def billing_records_for_products(*products)
        fetch_billing_records(
          filter: {
            products: products,
            accessible: true
          }
        )
      end
  25. WHAT COULD POSSIBLY GO WRONG?

  26. WHAT COULD POSSIBLY GO WRONG?

    ROOT CAUSE:
      # REST client
      get('/records', **params.slice(:product_gids))

      # DB query in billing
      def billing_records(product_gids: nil, gids: nil, client_gid: nil)
        scope = ::BillingRecord
        scope = scope.where(product_gid: product_gids) if product_gids
        scope = scope.where(gid: gids) if gids
        scope = scope.where(client_gid: client_gid) if client_gid
        scope.all
      end

    FIX:
      def billing_records(product_gids: nil, gids: nil, client_gid: nil)
        return [] if [product_gids, gids, client_gid].all?(&:blank?)
        # ...
      end
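
The failure mode on this slide is easy to reproduce outside Rails: when every filter is nil, the unguarded query falls through and returns all records. The sketch below is an in-memory illustration under stated assumptions: `Record`, `RECORDS`, `blank_filter?` and the `guard:` switch are hypothetical, and `blank_filter?` only approximates ActiveSupport's `blank?` for nil and empty values.

```ruby
# Hypothetical in-memory stand-in for the billing table.
Record = Struct.new(:gid, :product_gid, :client_gid)

RECORDS = [
  Record.new("r1", "p1", "c1"),
  Record.new("r2", "p2", "c1"),
  Record.new("r3", "p3", "c2")
].freeze

# Rough approximation of ActiveSupport's blank? for nil and empty values.
def blank_filter?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

def billing_records(records, product_gids: nil, gids: nil, client_gid: nil, guard: true)
  # The fix from the slide: no usable filter means no results.
  return [] if guard && [product_gids, gids, client_gid].all? { |f| blank_filter?(f) }

  scope = records
  scope = scope.select { |r| product_gids.include?(r.product_gid) } if product_gids
  scope = scope.select { |r| gids.include?(r.gid) } if gids
  scope = scope.select { |r| r.client_gid == client_gid } if client_gid
  scope
end

puts billing_records(RECORDS, guard: false).size        # prints 3: without the guard, everything comes back
puts billing_records(RECORDS).size                       # prints 0
puts billing_records(RECORDS, product_gids: ["p1"]).size # prints 1
```
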
  27. DATA FLOOD? QUERY CUSTOMIZATION & UNDERFETCHING

    FILTERING ON THE SERVER SIDE
  28. TIP? ALWAYS TEST MANUALLY. ALWAYS.

  29. FREQUENTLY NEEDED DATA

    PROBLEM: SINGLE FIELD WAS FREQUENTLY USED (~1K HITS PER DAY)
  30. FREQUENTLY NEEDED DATA

    PLAN
      add field to kafka
      build a read model
      backfill the data
      start using the read model
      remove billing query

    SOLUTION
      find that date in local DB
      verify if it's really the same date
      use it and remove billing query

      # 1k billing hits per day
      ::Billing::QueryService
        .first_successful_record_created_at(client)
        &.in_time_zone&.to_date

      # one local DB query
      client
        .products.successful
        .minimum(:start_date)
  31. DATA NEEDED FREQUENTLY? USE THE DOMAIN, LUKE!

    ◀ Image source: starwars.fandom.com
  32. 429 TOO MANY REQUESTS

    PROBLEM: SPIKE OF REQUESTS EVERY SUNDAY EVENING
  33. 429 TOO MANY REQUESTS

    PROBLEM
      # scheduling at talent's 5 PM on Sunday
      eligible_products.each do |product|
        WeeklyReminder.schedule(
          product, day: :sunday, time: '17:00'
        )
      end

    WEEK1. SOLUTION: PRELOADING
      eligible_products.find_in_batches do |batch|
        with_billing_records_preloaded(batch) do
          batch.each do |product|
            WeeklyReminder.schedule(
              product, day: :sunday, time: '17:00'
            )
          end
        end
      end

    WEEK2. PROPER SOLUTION: JITTER
      # class WeeklyReminder
      def scheduling_time(*)
        super + (SecureRandom.rand * 120 - 60).seconds
      end

    WEEK3. FINAL PROPER SOLUTION
      # class AnotherWeeklyReminder
      def scheduling_time(*)
        super + (SecureRandom.rand * 120 - 60).seconds
      end

    WEEK4. REALLY FINAL PROPER SOLUTION: RATE LIMITING
      Sidekiq::Limiter.window(
        'weekly-reminder',
        RATE_LIMIT_COUNT,
        RATE_LIMIT_INTERVAL,
        wait_timeout: 2
      )
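
The jitter step on this slide shifts each scheduled time by a random offset of up to a minute in either direction, so that jobs planned for exactly 17:00 do not all fire at once. A minimal plain-Ruby sketch of the same arithmetic follows. The `with_jitter` helper and its `spread:` and `random:` parameters are hypothetical. The slide uses `SecureRandom.rand`, while this sketch accepts an injectable `Random` instance only to make runs reproducible.

```ruby
# Hypothetical helper: returns base_time shifted by a random offset
# in the range [-spread, +spread) seconds.
def with_jitter(base_time, spread: 60, random: Random.new)
  base_time + (random.rand * 2 * spread - spread)
end

rng = Random.new(42)          # fixed seed, for a reproducible demo only
base = Time.at(0)             # stands in for "Sunday 17:00"
offsets = Array.new(1000) { with_jitter(base, random: rng) - base }

puts offsets.min.round(1)     # close to -60
puts offsets.max.round(1)     # close to +60
```

Jitter only spreads the spike over a two-minute window. It does not cap the request rate, which is presumably why the slide ends with a rate limiter.
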
  34. 429 TOO MANY REQUESTS?

    I DON'T ALWAYS TEST ON PRODUCTION BUT WHEN I DO, I RUN TESTS ON FRIDAY
  35. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation

    FIX: OPTIMIZE
      preloading to avoid N+1
      app-level hash joins
      server-side filtering
      using local data
      underfetching
      spreading the load

    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags

    NIHIL NOVI SUB SOLE
  36. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation

    FIX: OPTIMIZE
      preloading to avoid N+1: every ORM
      app-level hash joins: even MySQL has hash-join
      server-side filtering: find_all{} vs where()
      using local data: The Best Request Is No Request
      underfetching: SELECT * vs SELECT a, b
      spreading the load: highscalability.com post about YouTube, 2012

    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags

    NIHIL NOVI SUB SOLE
  37. FAIL OFTEN SO YOU CAN SUCCEED SOONER

    Tom Kelley. Photo: snikologiannis/Flickr; http://ow.ly/CHwhd