
API Optimization Tale: Monitor, Fix and Deploy (on Friday). Italian Ruby Day


Slides for a talk from Italian Ruby Day: https://2021.rubyday.it/.

Abstract

I saw a green build on a Friday afternoon. I knew I needed to push it to production before the weekend. My gut told me it was a trap. I had already stayed late to revert a broken deploy. I knew the risk.

In the middle of a service extraction project, we decided to migrate from REST to GraphQL and optimize API usage. My deploy was a part of this radical change.

Why was I deploying so late? How did we measure the migration effects? And why was I testing on production? I'll tell you a tale of small steps, monitoring, and old tricks in a new setting. Hope, despair, and broken production included.


mrzasa

April 07, 2021

Transcript

  1. API OPTIMIZATION TALE: MONITOR, FIX AND DEPLOY (ON FRIDAY)
    MACIEK RZĄSA, TOPTAL, @MJRZASA
    Photo by Tim Mossholder on Unsplash
  2. FRIDAY 16:03

  3. Photo by Max Baskakov on Unsplash

  4. Photo by Fabius Leibrock on Unsplash

  5. (no text on this slide)
  6. TOPTAL PLATFORM

  7. (no text on this slide)
  8. EXTRACTION

  9. EXTRACTION

  10. EXTRACTION
    FROM:
    class Product < ApplicationRecord
      has_many :billing_records
    end
    class BillingRecord < ApplicationRecord
      belongs_to :product
    end
    TO:
    class Product < ApplicationRecord
      def billing_records
        @billing_records ||= ::Billing::QueryService
          .billing_records_for_products(self)
      end
    end
    class BillingRecord
      def product
        @product ||= Product.find(product_id)
      end
    end
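The TO version keeps the model's public API but turns the association into a memoized remote call. A minimal plain-Ruby sketch of that behavior, assuming a fake `Billing::QueryService` that only counts calls (the real service client is not shown in the talk, and ActiveRecord is left out so the sketch runs on its own):

```ruby
# Hypothetical stand-in for ::Billing::QueryService that only counts calls.
module Billing
  class QueryService
    @calls = 0

    class << self
      attr_reader :calls

      # Pretends to fetch records from the billing service (one "request").
      def billing_records_for_products(*products)
        @calls += 1
        products.map { |product| { product_gid: product.gid } }
      end
    end
  end
end

# Plain-Ruby stand-in for the extracted Product model.
class Product
  attr_reader :gid

  def initialize(gid)
    @gid = gid
  end

  # Memoized remote "association": the first call on an instance sends a
  # request, later calls on the same instance reuse the cached array.
  def billing_records
    @billing_records ||= ::Billing::QueryService
      .billing_records_for_products(self)
  end
end

product = Product.new("gid://platform/Product/1")
product.billing_records
product.billing_records
puts Billing::QueryService.calls # prints 1
```

Memoization only helps within one instance: every freshly loaded product still sends its own request, which is exactly how a single view or job ends up with thousands of them.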
  11. MONITOR: wait for it
    FIX: OPTIMIZE: wait for it
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
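The talk does not show how the fallback and feature flags were wired. One possible shape, sketched under that assumption: the new remote path runs only when a flag is on, and any error falls back to the old local path. `FeatureFlags`, `BillingReader` and the `:billing_via_api` flag are hypothetical names.

```ruby
require "stringio"

# Hypothetical in-memory flag store; a real app would use its own flag system.
class FeatureFlags
  def initialize(enabled = {})
    @enabled = enabled
  end

  def enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Reads billing records through the new path only behind a flag, and falls
# back to the old path if the new one raises.
class BillingReader
  def initialize(flags:, local:, remote:, logger: $stderr)
    @flags = flags
    @local = local   # old path, e.g. the has_many association
    @remote = remote # new path, e.g. the billing service client
    @logger = logger
  end

  def records_for(product_gid)
    return @local.call(product_gid) unless @flags.enabled?(:billing_via_api)

    begin
      @remote.call(product_gid)
    rescue StandardError => e
      @logger.puts("billing API failed, using fallback: #{e.class}: #{e.message}")
      @local.call(product_gid)
    end
  end
end

local = ->(gid) { ["local record for #{gid}"] }
broken_remote = ->(_gid) { raise IOError, "timeout" }
log = StringIO.new
reader = BillingReader.new(
  flags: FeatureFlags.new({ billing_via_api: true }),
  local: local, remote: broken_remote, logger: log
)
reader.records_for("gid://platform/Product/1") # falls back to the local records
```

Turning the flag off is the rollback: no deploy is needed to return to the old path.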
  12. EXTRACTION: FIRST ATTEMPT

  13. EXTRACTION: FIRST ATTEMPT

  14. EXTRACTION: GRAPHQL

  15. EXTRACTION: MONITORING
    standard errors (Rollbar/Sentry)
    performance (NewRelic)
    custom request instrumentation (Kibana): method name, arguments, stacktrace, response time (elapsed), error
    {
      "payload": {
        "method": "records_for_product",
        "arguments": "[[\"gid://platform/Product/12345\"]]",
        "stacktrace": "[ \"app/models/product.rb:123\", \"app/services/sell_product.rb:43\" ]",
        "elapsed": 1.128494586795568,
        "error": null
      }
    }
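A payload like the one above can be produced by wrapping every billing call in one helper. A minimal sketch, assuming all calls go through such a wrapper and that log lines written as JSON are shipped to Kibana by some separate collector; `RequestInstrumentation` and its `sink:` parameter are hypothetical names, not the talk's code.

```ruby
require "json"
require "stringio"

# Hypothetical wrapper around every billing call; `sink` is any object
# with #puts (e.g. a log file that a log shipper forwards to Kibana).
module RequestInstrumentation
  module_function

  def instrument(method_name, arguments, sink: $stdout)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    error = nil
    begin
      yield
    rescue StandardError => e
      error = "#{e.class}: #{e.message}"
      raise # the caller still sees the original exception
    ensure
      payload = {
        "method" => method_name.to_s,
        "arguments" => JSON.generate(arguments),
        # first few caller frames outside installed gems
        "stacktrace" => JSON.generate(caller.reject { |f| f.include?("/gems/") }.first(5)),
        "elapsed" => Process.clock_gettime(Process::CLOCK_MONOTONIC) - started,
        "error" => error
      }
      sink.puts(JSON.generate({ "payload" => payload }))
    end
  end
end

log = StringIO.new
RequestInstrumentation.instrument(
  :records_for_product, [["gid://platform/Product/12345"]], sink: log
) { 42 } # returns the block's value and writes one JSON line to `log`
```

Logging the caller frames is what makes the data actionable: it points at the view or job that triggers the requests, not just at the billing client.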
  16. MONITOR: standard stack (Rollbar/NewRelic), custom request instrumentation
    FIX: OPTIMIZE: wait for it
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
  17. OPTIMIZE

  18. FLOOD OF REQUESTS
    PROBLEM: SINGLE VIEW/JOB INITIATES MANY BILLING REQUESTS
    HOW MANY? THOUSANDS!
  19. FLOOD OF REQUESTS
    INITIAL:
    def perform(*)
      products = Product.eligible
      products.find_each do |product|
        # one billing request per call
        DoBusinessLogic.call(product)
      end
    end
    class DoBusinessLogic
      def self.call(product)
        product.billing_records.each {}
      end
    end
    class Product < ApplicationRecord
      def billing_records
        @billing_records ||= ::Billing::QueryService
          .billing_records_for_products(self)
      end
    end
    OPTIMIZED:
    def perform(*)
      products = Product.eligible
      products.find_in_batches do |batch|
        # one billing request per batch
        cache_billing_records(batch).each do |p|
          # no billing requests
          DoBusinessLogic.call(p)
        end
      end
    end
    def cache_billing_records(products)
      indexed_records = ::Billing::QueryService
        .billing_records_for_products(*products)
        .group_by(&:product_gid)
      products.each do |product|
        product.cache_billing_records!(
          indexed_records[product.gid].to_a
        )
      end
    end
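The request counts claimed in the comments can be checked without a database. A minimal sketch, assuming a fake billing client that counts requests; `FakeBillingService` and `ProductStub` are hypothetical, and `each_slice` stands in for ActiveRecord's `find_in_batches`.

```ruby
# Hypothetical billing client that counts remote requests.
class FakeBillingService
  Record = Struct.new(:product_gid, :amount)

  attr_reader :requests

  def initialize(records)
    @records = records
    @requests = 0
  end

  # One request returns records for any number of products.
  def billing_records_for_products(*gids)
    @requests += 1
    @records.select { |record| gids.include?(record.product_gid) }
  end
end

# Stand-in for Product with the memoized lookup and a cache setter.
ProductStub = Struct.new(:gid) do
  def cache_billing_records!(records)
    @billing_records = records
  end

  def billing_records(service)
    @billing_records ||= service.billing_records_for_products(gid)
  end
end

# N+1: every product triggers its own request.
def process_one_by_one(products, service)
  products.each { |product| product.billing_records(service) }
end

# Preloading: one request per batch, then a hash join on product_gid
# (group_by builds the hash, each product probes it).
def process_in_batches(products, service, batch_size:)
  products.each_slice(batch_size) do |batch|
    indexed = service
      .billing_records_for_products(*batch.map(&:gid))
      .group_by(&:product_gid)
    batch.each do |product|
      product.cache_billing_records!(indexed[product.gid].to_a)
      product.billing_records(service) # served from the cache, no request
    end
  end
end
```

With 10 products, the first version sends 10 requests; the second, with batches of 4, sends 3. The `.to_a` matters: products without records get an empty array cached, so they do not fall back to a request of their own.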
  20. FLOOD OF REQUESTS? PRELOAD, CACHE, (HASH-)JOIN
    (Image: Ulf Michael Widenius, MySQL. Image source: wikipedia.org)
  21. FREQUENTLY NEEDED DATA
    PROBLEM: SINGLE FIELD WAS FREQUENTLY USED (~1K HITS PER DAY)
  22. FREQUENTLY NEEDED DATA
    PLAN: add field to Kafka, build a read model, backfill the data, start using the read model, remove billing query
    SOLUTION: find that date in the local DB, verify that it's really the same date, use it and remove the billing query
    # 1k billing hits per day
    ::Billing::QueryService
      .first_successful_record_created_at(client)
      &.in_time_zone&.to_date
    # one local DB query
    client
      .products.successful
      .minimum(:start_date)
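The "verify that it's really the same date" step can be done by running both lookups side by side for a while and recording mismatches, while still returning the trusted billing value. A minimal sketch, assuming both lookups are injected as lambdas; `DateComparison` is a hypothetical name. The idea is similar in spirit to GitHub's Scientist gem, but this is not that gem's API.

```ruby
require "date"

# Runs the trusted (billing) and candidate (local DB) lookups, records any
# difference, and keeps returning the trusted value until the candidate
# has proven equal.
class DateComparison
  attr_reader :mismatches

  def initialize(trusted:, candidate:)
    @trusted = trusted
    @candidate = candidate
    @mismatches = []
  end

  def call(client_id)
    expected = @trusted.call(client_id)
    actual =
      begin
        @candidate.call(client_id)
      rescue StandardError => e
        e # a failing candidate counts as a mismatch, not as an outage
      end
    unless expected == actual
      @mismatches << { client_id: client_id, expected: expected, actual: actual }
    end
    expected
  end
end

billing = { 1 => Date.new(2020, 8, 21), 2 => Date.new(2020, 9, 1) }
local   = { 1 => Date.new(2020, 8, 21), 2 => Date.new(2020, 9, 2) }
comparison = DateComparison.new(
  trusted: ->(id) { billing[id] },
  candidate: ->(id) { local[id] }
)
comparison.call(1)
comparison.call(2) # recorded as a mismatch, billing date still returned
```

Once the mismatch list stays empty on production traffic, the billing query can be removed and the local query used directly.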
  23. DATA NEEDED FREQUENTLY? USE THE DOMAIN, LUKE!
    (Image source: starwars.fandom.com)
  24. DATA FLOOD
    PROBLEM: GENERIC QUERIES FETCHING ALL THE DATA THAT MIGHT BE NEEDED
  25. DATA FLOOD
    # REST response
    {
      "gid": "gid://...",
      "clientGid": "gid://...",
      "productGid": "gid://...",
      "availability": true,
      "pending": false,
      "frequency": "weekly",
      "startDate": "2020-08-21",
      "endDate": "2020-10-28",
      # ...
      # 36 fields total
      # loading 3-4 associations
    }
    def billing_records_for_products(*products)
      fetch_billing_records(
        filter: { products: products }
      ).select(&:accessible?)
    end
    # GraphQL query
    query($filter: RecordFilter!) {
      cycles(filter: $filter) {
        nodes { gid productGid pending frequency }
      }
    }
    def billing_records_for_products(*products)
      fetch_billing_records(
        filter: { products: products, accessible: true }
      )
    end
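The GraphQL version combines underfetching (only four fields) with server-side filtering (`accessible: true` travels with the query instead of `select(&:accessible?)` running on the client). A minimal sketch of the request body, assuming the common GraphQL-over-HTTP layout with `query` and `variables` keys; the helper name and `BILLING_FIELDS` constant are hypothetical, while `RecordFilter`, `cycles` and the field names come from the slide.

```ruby
require "json"

# Only the fields the caller actually reads (taken from the slide's query).
BILLING_FIELDS = %w[gid productGid pending frequency].freeze

# Hypothetical helper: builds the request body. The filter is a variable,
# so the server drops inaccessible records before sending anything.
def billing_records_request(product_gids, fields: BILLING_FIELDS)
  query = <<~GRAPHQL
    query($filter: RecordFilter!) {
      cycles(filter: $filter) {
        nodes { #{fields.join(' ')} }
      }
    }
  GRAPHQL

  {
    "query" => query,
    "variables" => {
      "filter" => { "products" => product_gids, "accessible" => true }
    }
  }
end

body = JSON.generate(billing_records_request(["gid://platform/Product/12345"]))
```

Compared with the REST response, the server no longer serializes 36 fields and 3-4 associations per record, and records the client would discard never cross the network.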
  26. Photo by Erik-Jan Leusink on Unsplash

  27. WHAT COULD POSSIBLY GO WRONG?

  28. WHAT COULD POSSIBLY GO WRONG?
    ROOT CAUSE:
    # REST client
    get('/records', **params.slice(:product_gids))
    # DB query in billing
    def billing_records(product_gids: nil, gids: nil, client_gid: nil)
      scope = ::BillingRecord
      scope = scope.where(product_gid: product_gids) if product_gids
      scope = scope.where(gid: gids) if gids
      scope = scope.where(client_gid: client_gid) if client_gid
      scope.all
    end
    FIX:
    def billing_records(product_gids: nil, gids: nil, client_gid: nil)
      return [] if [product_gids, gids, client_gid].all?(&:blank?)
      # ...
    end
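The billing query applies each filter only when its parameter is present, so a request that arrives with no filter at all returns the whole table. A minimal in-memory sketch of that failure and of the guard clause, assuming an array stands in for the `BillingRecord` table; `BillingRow`, `TABLE` and `safe_billing_records` are hypothetical names.

```ruby
BillingRow = Struct.new(:gid, :product_gid, :client_gid)

# In-memory stand-in for the billing table.
TABLE = [
  BillingRow.new("r1", "p1", "c1"),
  BillingRow.new("r2", "p2", "c1"),
  BillingRow.new("r3", "p3", "c2")
].freeze

# Simplified stand-in for ActiveSupport's blank? (it does not treat
# whitespace-only strings as blank).
def blank?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

# Mirrors the billing query: each filter applies only if its param is
# present, so a call with no params returns every row.
def billing_records(product_gids: nil, gids: nil, client_gid: nil)
  scope = TABLE
  scope = scope.select { |r| product_gids.include?(r.product_gid) } if product_gids
  scope = scope.select { |r| gids.include?(r.gid) } if gids
  scope = scope.select { |r| r.client_gid == client_gid } if client_gid
  scope
end

# The fix from the slide: no filter at all means no records, not all records.
def safe_billing_records(product_gids: nil, gids: nil, client_gid: nil)
  return [] if [product_gids, gids, client_gid].all? { |v| blank?(v) }

  billing_records(product_gids: product_gids, gids: gids, client_gid: client_gid)
end

billing_records.size      # all 3 rows: the data flood
safe_billing_records.size # 0 rows
```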
  29. DATA FLOOD? QUERY CUSTOMIZATION & UNDERFETCHING, FILTERING ON THE SERVER SIDE
  30. TIP? ALWAYS TEST MANUALLY. ALWAYS.

  31. 429 TOO MANY REQUESTS
    PROBLEM: SPIKE OF REQUESTS EVERY SUNDAY EVENING
  32. 429 TOO MANY REQUESTS
    PROBLEM:
    # scheduling at talent's 5 PM on Sunday
    eligible_products.each do |product|
      WeeklyReminder.schedule(
        product, day: :sunday, time: '17:00'
      )
    end
    WEEK 1. SOLUTION: PRELOADING
    eligible_products.find_in_batches do |batch|
      with_billing_records_preloaded(batch) do
        batch.each do |product|
          WeeklyReminder.schedule(
            product, day: :sunday, time: '17:00'
          )
        end
      end
    end
    WEEK 2. PROPER SOLUTION: JITTER
    # class WeeklyReminder
    def scheduling_time(*)
      super + (SecureRandom.rand * 120 - 60).seconds
    end
    WEEK 3. FINAL PROPER SOLUTION
    # class AnotherWeeklyReminder
    def scheduling_time(*)
      super + (SecureRandom.rand * 120 - 60).seconds
    end
    WEEK 4. REALLY FINAL PROPER SOLUTION: RATE LIMITING
    Sidekiq::Limiter.window(
      'weekly-reminder', RATE_LIMIT_COUNT, RATE_LIMIT_INTERVAL,
      wait_timeout: 2
    )
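Jitter and rate limiting attack the same spike from two sides: jitter spreads job start times, a limiter caps how many calls pass per time window. A minimal plain-Ruby sketch of both, assuming a single process; `jittered` and `WindowLimiter` are hypothetical names. The limiter only illustrates fixed-window counting: it is not Sidekiq Enterprise's implementation, is not shared across processes, and has no equivalent of `wait_timeout` (it raises immediately).

```ruby
require "securerandom"

# Jitter: shift each job by a random offset in [-60, 60) seconds, like
# `SecureRandom.rand * 120 - 60` on the slide.
def jittered(time, spread_seconds: 60)
  time + (SecureRandom.rand * 2 * spread_seconds - spread_seconds)
end

# Hypothetical in-process fixed-window limiter: at most `limit` calls
# per `interval` seconds.
class WindowLimiter
  class OverLimit < StandardError; end

  def initialize(limit, interval, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @interval = interval
    @clock = clock
    @window_start = nil
    @count = 0
  end

  def within_limit
    now = @clock.call
    if @window_start.nil? || now - @window_start >= @interval
      @window_start = now # open a new window and reset the counter
      @count = 0
    end
    raise OverLimit, "more than #{@limit} calls per #{@interval}s" if @count >= @limit

    @count += 1
    yield
  end
end
```

Jitter alone only smooths one job's spike; as weeks 2 and 3 show, every other job scheduled for the same moment needs it too. A limiter caps the total regardless of which job generates the load, at the cost of delayed or retried work.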
  33. 429 TOO MANY REQUESTS? I DON'T ALWAYS TEST ON PRODUCTION, BUT WHEN I DO, I RUN TESTS ON FRIDAY
  34. MONITOR: standard stack (Rollbar/NewRelic), custom request instrumentation
    FIX: OPTIMIZE: preloading to avoid N+1, server-side filtering, using local data, underfetching, spreading the load
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
    NIHIL NOVI SUB SOLE
  35. MONITOR: standard stack (Rollbar/NewRelic), custom request instrumentation
    FIX: OPTIMIZE
      preloading to avoid N+1: every ORM
      server-side filtering: find_all{} vs where()
      using local data: The Best Request Is No Request
      underfetching: SELECT * vs SELECT a, b
      spreading the load: highscalability.com post about YouTube, 2012
    DEPLOY: CI checks, easy & reliable rollback, safe env with a fallback, feature flags
    NIHIL NOVI SUB SOLE
  36. FAIL OFTEN SO YOU CAN SUCCEED SOONER (Tom Kelley)
    Photo: snikologiannis/Flickr; http://ow.ly/CHwhd
  37. MACIEK RZĄSA, Q&A, @mjrzasa