API Optimization Tale: Monitor, Fix and Deploy (on Friday). RailsConf 2021

mrzasa
April 12, 2021

Slides for a talk from RailsConf 2021: https://railsconf.com/program/sessions#session-1078

Abstract

I saw a green build on a Friday afternoon. I knew I needed to push it to production before the weekend. My gut told me it was a trap. I had already stayed late to revert a broken deploy. I knew the risk.

In the middle of a service extraction project, we decided to migrate from REST to GraphQL and optimize API usage. My deploy was a part of this radical change.

Why was I deploying so late? How did we measure the migration effects? And why was I testing on production? I'll tell you a tale of small steps, monitoring, and old tricks in a new setting. Hope, despair, and broken production included.

Transcript

  1. API OPTIMIZATION TALE: MONITOR, FIX AND DEPLOY (ON FRIDAY)
    MACIEK RZĄSA
    TOPTAL
    @MJRZASA
    Photo by Tim Mossholder on Unsplash

  2. FRIDAY
    16:03

  3. Photo by Max Baskakov on Unsplash

  4. BACKEND ENGINEER @ TOPTAL
    at work
    Ruby & Postgres & Elasticsearch
    service extraction
    after work
    Rzeszów Ruby User Group (rrug.pl)
    Rzeszów University of Technology
    software that matters, agile
    text processing, distributed systems

  5. TOPTAL PLATFORM

  6. EXTRACTION

  7. EXTRACTION

  8. MONITOR
      wait for it
    FIX: OPTIMIZE
      wait for it
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags

  9. EXTRACTION
    FROM
      class Product < ApplicationRecord
        has_many :billing_records
      end
      class BillingRecord < ApplicationRecord
        belongs_to :product
      end
    TO
      class Product < ApplicationRecord
        def billing_records
          @billing_records ||=
            ::Billing::QueryService
              .billing_records_for_products(self)
        end
      end
      class BillingRecord
        def product
          @product ||= Product.find(product_id)
        end
      end

  10. EXTRACTION: FIRST ATTEMPT

  11. EXTRACTION: FIRST ATTEMPT

  12. EXTRACTION: GRAPHQL

  13. EXTRACTION: MONITORING
    standard
      errors (Rollbar/Sentry)
      performance (NewRelic)
    custom request instrumentation (Kibana)
      method name
      arguments
      stacktrace
      response time (elapsed)
      error
    {
      "payload": {
        "method": "records_for_product",
        "arguments": "[[\"gid://platform/Product/12345\"]]",
        "stacktrace": "[\"app/models/product.rb:123\", \"app/services/sell_product.rb:43\"]",
        "elapsed": 1.128494586795568,
        "error": null
      }
    }
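The custom request instrumentation on this slide records the method name, arguments, stacktrace, elapsed time and error for each billing call. A minimal sketch of how such a wrapper might look is below. `RequestInstrumentation`, its `instrument` method and the `logger:` parameter are hypothetical names chosen for illustration, not the code used in the talk, and the stacktrace is simply trimmed to the first few frames.

```ruby
require "json"

# Hypothetical helper (not from the talk): wraps a remote call and emits
# one JSON log line shaped like the payload shown on the slide.
module RequestInstrumentation
  def self.instrument(method_name, arguments, logger: $stdout)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    error = nil
    yield
  rescue StandardError => e
    error = e
    raise
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    payload = {
      method: method_name,
      arguments: arguments.to_json,
      # Keep only a few frames so log entries stay small.
      stacktrace: caller(1, 5).to_json,
      elapsed: elapsed,
      error: error && "#{error.class}: #{error.message}"
    }
    logger.puts({ payload: payload }.to_json)
  end
end

# The block stands in for the real remote call.
records = RequestInstrumentation.instrument(
  "records_for_product", [["gid://platform/Product/12345"]]
) { [:billing_record] }
```

Logging in `ensure` means that failed calls are recorded as well, so the same log entries can feed both latency and error dashboards.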

  14. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation
    FIX: OPTIMIZE
      wait for it
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags

  15. OPTIMIZE

  16. FLOOD OF REQUESTS
    PROBLEM: SINGLE VIEW/JOB INITIATES MANY BILLING REQUESTS
    HOW MANY? THOUSANDS!

  17. FLOOD OF REQUESTS
    INITIAL
      def perform(*)
        products = Product.eligible
        # find_each loads in batches but yields one product at a time
        products.find_each do |product|
          # one billing request per call
          DoBusinessLogic.call(product)
        end
      end
      class DoBusinessLogic
        def self.call(product)
          product.billing_records.each {}
        end
      end
      class Product < ApplicationRecord
        def billing_records
          @billing_records ||=
            ::Billing::QueryService
              .billing_records_for_products(self)
        end
      end
    OPTIMIZED
      def perform(*)
        products = Product.eligible
        products.find_in_batches do |batch|
          # one billing request per batch
          cache_billing_records(batch).each do |p|
            # no billing requests
            DoBusinessLogic.call(p)
          end
        end
      end
      def cache_billing_records(products)
        indexed_records =
          ::Billing::QueryService
            .billing_records_for_products(
              *products
            )
            .group_by(&:product_gid)
        products.each do |product|
          product.cache_billing_records!(
            indexed_records[product.gid].to_a
          )
        end
      end
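To make the effect of batch preloading concrete, here is a plain-Ruby sketch without Rails. `FakeBillingService` and its request counter are hypothetical stand-ins for the real billing service, and `each_slice` plays the role of `find_in_batches`. The method names `billing_records_for_products` and `cache_billing_records!` follow the slide.

```ruby
# Plain-Ruby stand-ins (hypothetical) that count requests to "billing".
BillingRecord = Struct.new(:gid, :product_gid)

class FakeBillingService
  attr_reader :request_count

  def initialize(records)
    @records = records
    @request_count = 0
  end

  # One "remote" request, regardless of how many products are passed.
  def billing_records_for_products(*products)
    @request_count += 1
    gids = products.map(&:gid)
    @records.select { |record| gids.include?(record.product_gid) }
  end
end

class Product
  attr_reader :gid

  def initialize(gid, service)
    @gid = gid
    @service = service
  end

  # Lazy path: one request per product unless the cache was primed.
  def billing_records
    @billing_records ||= @service.billing_records_for_products(self)
  end

  def cache_billing_records!(records)
    @billing_records = records
  end
end

# Fetch records for the whole batch once, then prime each product's cache.
def cache_billing_records(products, service)
  indexed = service.billing_records_for_products(*products)
                   .group_by(&:product_gid)
  products.each do |product|
    product.cache_billing_records!(indexed[product.gid].to_a)
  end
end

records = (1..6).map { |i| BillingRecord.new("r#{i}", "p#{(i % 3) + 1}") }

naive = FakeBillingService.new(records)
(1..3).map { |i| Product.new("p#{i}", naive) }.each(&:billing_records)

batched = FakeBillingService.new(records)
products = (1..3).map { |i| Product.new("p#{i}", batched) }
products.each_slice(100) do |batch|
  cache_billing_records(batch, batched).each(&:billing_records)
end

puts "naive: #{naive.request_count} requests, batched: #{batched.request_count}"
```

With three products in a single batch, the naive path issues one request per product while the batched path issues one request in total.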

  18. FLOOD OF REQUESTS?
    PRELOAD FOR A BATCH AND (MEM)CACHE
    ◀ Brad Fitzpatrick, Memcached. Image source: wikipedia.org

  19. FLOOD OF DB QUERIES
    PROBLEM: EVERY ELEMENT OF A FETCHED COLLECTION (BILLING RECORD) NEEDED PLATFORM DATA (PRODUCT)

  20. FLOOD OF DB QUERIES
    INITIAL
      def business_logic
        billing_records = ::Billing::QueryService
          .billing_records_for_products(*products)
        billing_records.each do |r|
          # one query to products table per call
          BusinessLogic.call(r, r.product)
        end
      end
      class BillingRecord
        # attr_writer lets the query service assign an already loaded product
        attr_writer :product
        def product
          @product ||= Product.find(product_id)
        end
      end
    OPTIMIZED
      def business_logic
        # one query to products table here
        billing_records = ::Billing::QueryService
          .billing_records_for_products(*products)
        billing_records.each do |r|
          BusinessLogic.call(r, r.product)
        end
      end
      # Billing::QueryService
      def billing_records_for_products(*products)
        products_by_gid =
          products.index_by(&:gid)
        billing_records =
          fetch_billing_records(
            product_gids: products_by_gid.keys
          )
        billing_records.each do |billing_record|
          gid = billing_record.product_gid
          product = products_by_gid[gid]
          billing_record.product = product
        end
      end
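The optimized version above is an application-level hash join: index one side by its key once, then probe the index for each row of the other side. A minimal plain-Ruby sketch follows. `attach_products` is a hypothetical name, and `to_h` with a block replaces ActiveSupport's `index_by` so the example runs without Rails.

```ruby
Product = Struct.new(:gid, :name)
BillingRecord = Struct.new(:gid, :product_gid, :product)

# Build phase: index products by gid in a single pass.
# Probe phase: one hash lookup per billing record instead of one query.
def attach_products(billing_records, products)
  products_by_gid = products.to_h { |product| [product.gid, product] }
  billing_records.each do |record|
    record.product = products_by_gid[record.product_gid]
  end
end

products = [Product.new("p1", "Alpha"), Product.new("p2", "Beta")]
records = [
  BillingRecord.new("r1", "p2"),
  BillingRecord.new("r2", "p1"),
  BillingRecord.new("r3", "p2")
]

attach_products(records, products).each do |record|
  puts "#{record.gid} -> #{record.product.name}"
end
```

Building the index is linear in the number of products and each lookup is constant time on average, so the join no longer costs one database query per billing record.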

  21. FLOOD OF DB QUERIES?
    HASH JOINS TO THE RESCUE!
    ◀ Ulf Michael Widenius, MySQL. Image source: wikipedia.org

  22. DATA FLOOD
    PROBLEM: GENERIC QUERIES FETCHING ALL THE DATA THAT MIGHT BE NEEDED

  23. DATA FLOOD
    # REST response
    {
      "gid": "gid://...",
      "clientGid": "gid://...",
      "productGid": "gid://...",
      "availability": true,
      "pending": false,
      "frequency": "weekly",
      "startDate": "2020-08-21",
      "endDate": "2020-10-28"
      # ...
      # 36 fields total
      # loading 3-4 associations
    }
    def billing_records_for_products(*products)
      fetch_billing_records(
        filter: {products: products}
      ).select(&:accessible?)
    end
    # GraphQL query
    query($filter: RecordFilter!) {
      cycles(filter: $filter) {
        nodes {
          gid
          productGid
          pending
          frequency
        }
      }
    }
    def billing_records_for_products(*products)
      fetch_billing_records(
        filter: {
          products: products,
          accessible: true
        }
      )
    end
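The difference between the two versions can be sketched in plain Ruby with an in-memory stand-in for the billing service. Everything here (`RECORDS`, the `fetch_billing_records` signature with `filter:` and `fields:`) is a hypothetical illustration rather than the real client: the point is only that filtering and field selection happen before data is returned, not after.

```ruby
# Hypothetical in-memory stand-in for the billing service's data.
RECORDS = [
  { gid: "r1", product_gid: "p1", pending: false, frequency: "weekly",
    accessible: true, end_date: "2020-10-28" },
  { gid: "r2", product_gid: "p1", pending: true, frequency: "weekly",
    accessible: false, end_date: "2020-10-28" },
  { gid: "r3", product_gid: "p2", pending: false, frequency: "monthly",
    accessible: true, end_date: "2020-10-28" }
].freeze

# "Server side": apply the filter and keep only the requested fields.
def fetch_billing_records(filter: {}, fields: nil)
  matching = RECORDS.select do |record|
    filter.all? { |key, value| record[key] == value }
  end
  fields ? matching.map { |record| record.slice(*fields) } : matching
end

# Client-side filtering: every record and every field is transferred first.
overfetched = fetch_billing_records.select { |record| record[:accessible] }

# Server-side filtering with an explicit field list.
trimmed = fetch_billing_records(
  filter: { accessible: true },
  fields: %i[gid product_gid pending frequency]
)

puts "client-side: #{RECORDS.size} transferred, #{overfetched.size} kept"
puts "server-side: #{trimmed.size} transferred, fields: #{trimmed.first.keys.join(', ')}"
```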

  24. WHAT COULD POSSIBLY GO WRONG?

  25. WHAT COULD POSSIBLY GO WRONG?
    ROOT CAUSE
      # REST client
      get('/records', **params.slice(:product_gids))
      # DB query in billing
      def billing_records(product_gids: nil, gids: nil, client_gid: nil)
        scope = ::BillingRecord
        scope = scope.where(product_gid: product_gids) if product_gids
        scope = scope.where(gid: gids) if gids
        scope = scope.where(client_gid: client_gid) if client_gid
        # with no filters given, this returns every billing record
        scope.all
      end
    FIX
      def billing_records(product_gids: nil, gids: nil, client_gid: nil)
        return [] if [product_gids, gids, client_gid].all?(&:blank?)
        # ...
      end
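The failure mode on this slide is easy to reproduce without a database: when no filter reaches the query, chaining optional conditions leaves the scope untouched and everything is returned. The sketch below uses an array in place of the `BillingRecord` table, and `blank_filter?` is a hypothetical replacement for ActiveSupport's `blank?` so the example runs on plain Ruby.

```ruby
BillingRecord = Struct.new(:gid, :product_gid, :client_gid)

ALL_RECORDS = [
  BillingRecord.new("r1", "p1", "c1"),
  BillingRecord.new("r2", "p1", "c2"),
  BillingRecord.new("r3", "p2", "c1")
].freeze

# Stand-in for ActiveSupport's blank? (nil or empty).
def blank_filter?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

# Root cause: with every filter nil, no condition is applied at all.
def billing_records_unguarded(product_gids: nil, gids: nil, client_gid: nil)
  scope = ALL_RECORDS
  scope = scope.select { |r| product_gids.include?(r.product_gid) } if product_gids
  scope = scope.select { |r| gids.include?(r.gid) } if gids
  scope = scope.select { |r| r.client_gid == client_gid } if client_gid
  scope
end

# Fix: refuse to run an unfiltered query.
def billing_records(product_gids: nil, gids: nil, client_gid: nil)
  filters = [product_gids, gids, client_gid]
  return [] if filters.all? { |value| blank_filter?(value) }

  billing_records_unguarded(
    product_gids: product_gids, gids: gids, client_gid: client_gid
  )
end

puts "unguarded, no filters: #{billing_records_unguarded.size} records"
puts "guarded, no filters: #{billing_records.size} records"
```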

  26. DATA FLOOD?
    QUERY CUSTOMIZATION & UNDERFETCHING
    FILTERING ON THE SERVER SIDE

  27. TIP?
    ALWAYS TEST MANUALLY. ALWAYS.

  28. FREQUENTLY NEEDED DATA
    PROBLEM: SINGLE FIELD WAS FREQUENTLY USED (~1K HITS PER DAY)

  29. FREQUENTLY NEEDED DATA
    PLAN
      add field to Kafka
      build a read model
      backfill the data
      start using the read model
      remove billing query
    SOLUTION
      find that date in local DB
      verify if it's really the same date
      use it and remove billing query
    # 1k billing hits per day
    ::Billing::QueryService
      .first_successful_record_created_at(client)
      &.in_time_zone&.to_date
    # one local DB query
    client
      .products.successful
      .minimum(:start_date)
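The "verify if it's really the same date" step could be done by running both lookups side by side for a sample of clients and collecting any differences before removing the billing query. The sketch below is one hypothetical way to do that: `date_mismatches` and the two lambdas are illustrative names, with hashes standing in for the billing service and the local database.

```ruby
require "date"

# Hypothetical check: compare the remote (billing) date with the locally
# derived one and return only the clients where they differ.
def date_mismatches(clients, remote:, local:)
  clients.filter_map do |client|
    remote_date = remote.call(client)
    local_date = local.call(client)
    [client, remote_date, local_date] unless remote_date == local_date
  end
end

# Hashes stand in for the billing service and the local DB.
billing_dates = { "c1" => Date.new(2020, 8, 21), "c2" => Date.new(2020, 9, 1) }
local_dates   = { "c1" => Date.new(2020, 8, 21), "c2" => Date.new(2020, 9, 2) }

mismatches = date_mismatches(
  billing_dates.keys,
  remote: ->(client) { billing_dates[client] },
  local: ->(client) { local_dates[client] }
)

mismatches.each do |client, remote_date, local_date|
  puts "#{client}: billing=#{remote_date} local=#{local_date}"
end
```

An empty result for a representative sample is the signal that the local value can replace the remote call.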

  30. DATA NEEDED FREQUENTLY?
    USE THE DOMAIN, LUKE!
    ◀ Image source: starwars.fandom.com

  31. 429 TOO MANY REQUESTS
    PROBLEM: SPIKE OF REQUESTS EVERY SUNDAY EVENING

  32. 429 TOO MANY REQUESTS
    PROBLEM
      # scheduling at talent's 5 PM on Sunday
      eligible_products.each do |product|
        WeeklyReminder.schedule(
          product, day: :sunday, time: '17:00'
        )
      end
    WEEK 1. SOLUTION: PRELOADING
      eligible_products.find_in_batches do |batch|
        with_billing_records_preloaded(batch) do
          batch.each do |product|
            WeeklyReminder.schedule(
              product, day: :sunday, time: '17:00'
            )
          end
        end
      end
    WEEK 2. PROPER SOLUTION: JITTER
      # class WeeklyReminder
      def scheduling_time(*)
        super +
          (SecureRandom.rand * 120 - 60).seconds
      end
    WEEK 3. FINAL PROPER SOLUTION
      # class AnotherWeeklyReminder
      def scheduling_time(*)
        super +
          (SecureRandom.rand * 120 - 60).seconds
      end
    WEEK 4. REALLY FINAL PROPER SOLUTION: RATE LIMITING
      Sidekiq::Limiter.window(
        'weekly-reminder',
        RATE_LIMIT_COUNT,
        RATE_LIMIT_INTERVAL,
        wait_timeout: 2
      )
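The jitter step above shifts each scheduled time by a random offset of up to a minute in either direction, so that jobs planned for the same instant spread out. A plain-Ruby sketch of the same idea, without ActiveSupport's `.seconds`, is shown below. `with_jitter` and its `spread:` parameter are hypothetical names, and `SecureRandom.random_number` is used as the random source.

```ruby
require "securerandom"

# Returns base_time shifted by a random offset in [-spread, +spread) seconds.
def with_jitter(base_time, spread: 60)
  base_time + (SecureRandom.random_number * 2 * spread - spread)
end

base = Time.utc(2021, 4, 11, 17, 0, 0) # a Sunday, 17:00 UTC
scheduled = Array.new(1000) { with_jitter(base) }

# Count how many jobs land in each 10-second bucket around the base time.
buckets = scheduled.group_by { |time| ((time - base) / 10).floor }
buckets.sort.each do |bucket, times|
  puts format("%+4ds: %d jobs", bucket * 10, times.size)
end
```

Jitter only flattens the spike within the chosen window. As the slide's week 3 and week 4 entries suggest, other jobs scheduled for the same time can still collide, which is where a rate limiter caps the overall request rate.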

  33. 429 TOO MANY REQUESTS?
    I DON'T ALWAYS TEST ON PRODUCTION
    BUT WHEN I DO, I RUN TESTS ON FRIDAY

  34. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation
    FIX: OPTIMIZE
      preloading to avoid N+1
      app-level hash joins
      server-side filtering
      using local data
      underfetching
      spreading the load
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags
    NIHIL NOVI SUB SOLE

  35. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation
    FIX: OPTIMIZE
      preloading to avoid N+1: every ORM
      app-level hash joins: even MySQL has hash-join
      server-side filtering: find_all{} vs where()
      using local data: The Best Request Is No Request
      underfetching: SELECT * vs SELECT a, b
      spreading the load: highscalability.com post about YouTube, 2012
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags
    NIHIL NOVI SUB SOLE

  36. FAIL OFTEN SO YOU CAN SUCCEED SOONER
    Tom Kelley
    Photo: snikologiannis/Flickr; http://ow.ly/CHwhd