
API Optimization Tale: Monitor, Fix and Deploy (on Friday). Italian Ruby Day

mrzasa
April 07, 2021

Slides for a talk from Italian Ruby Day: https://2021.rubyday.it/.

Abstract

I saw a green build on a Friday afternoon. I knew I needed to push it to production before the weekend. My gut told me it was a trap. I had already stayed late to revert a broken deploy. I knew the risk.

In the middle of a service extraction project, we decided to migrate from REST to GraphQL and optimize API usage. My deploy was a part of this radical change.

Why was I deploying so late? How did we measure the migration effects? And why was I testing on production? I'll tell you a tale of small steps, monitoring, and old tricks in a new setting. Hope, despair, and broken production included.


Transcript

  1. API OPTIMIZATION TALE:
    MONITOR, FIX
    AND DEPLOY
    (ON FRIDAY)
    MACIEK RZĄSA
    TOPTAL
    @MJRZASA
    Photo by Tim Mossholder on Unsplash

  2. FRIDAY
    16:03

  3. Photo by Max Baskakov on Unsplash

  4. Photo by Fabius Leibrock on Unsplash

  5. (no text on this slide)

  6. TOPTAL
    PLATFORM

  7. (no text on this slide)

  8. EXTRACTION

  9. EXTRACTION

  10. EXTRACTION
    FROM
        class Product < ApplicationRecord
          has_many :billing_records
        end

        class BillingRecord < ApplicationRecord
          belongs_to :product
        end
    TO
        class Product < ApplicationRecord
          def billing_records
            @billing_records ||=
              ::Billing::QueryService
                .billing_records_for_products(self)
          end
        end

        class BillingRecord
          def product
            @product ||= Product.find(product_id)
          end
        end

  11. MONITOR
      wait for it
    FIX: OPTIMIZE
      wait for it
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags

  12. EXTRACTION: FIRST ATTEMPT

  13. EXTRACTION: FIRST ATTEMPT

  14. EXTRACTION: GRAPHQL

  15. EXTRACTION: MONITORING
    standard
      errors (Rollbar/Sentry)
      performance (NewRelic)
    custom request instrumentation (Kibana)
      method name
      arguments
      stacktrace
      response time (elapsed)
      error

        {
          "payload": {
            "method": "records_for_product",
            "arguments": "[[\"gid://platform/Product/12345\"]]",
            "stacktrace": "[\"app/models/product.rb:123\", \"app/services/sell_product.rb:43\"]",
            "elapsed": 1.128494586795568,
            "error": null
          }
        }
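
The custom request instrumentation listed on this slide could be sketched roughly as follows. The talk does not show its implementation, so `Instrumentation.instrument` and the `sink` parameter are hypothetical names; the sketch only assumes that each billing call is wrapped and that one JSON payload with the slide's fields (method, arguments, stacktrace, elapsed, error) is written per call.

```ruby
require 'json'

# Hypothetical helper, not from the talk: wraps a call and emits one
# JSON payload with the fields listed on the slide.
module Instrumentation
  def self.instrument(method_name, arguments, sink: $stdout)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    error = nil
    yield
  rescue StandardError => e
    # Record the failure, then let it propagate to the caller.
    error = "#{e.class}: #{e.message}"
    raise
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    payload = {
      method: method_name,
      arguments: arguments.to_json,
      stacktrace: caller(1, 5).to_json, # a few frames above this helper
      elapsed: elapsed,
      error: error
    }
    sink.puts({ payload: payload }.to_json)
  end
end
```

A call site might look like `Instrumentation.instrument('records_for_product', [[gid]]) { client.fetch(gid) }`, where `client` stands for whatever performs the request. The block's return value is passed through unchanged, so wrapping a call does not alter its behavior.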

  16. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation
    FIX: OPTIMIZE
      wait for it
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags

  17. OPTIMIZE

  18. FLOOD OF REQUESTS
    PROBLEM: SINGLE VIEW/JOB INITIATES MANY BILLING REQUESTS
    HOW MANY? THOUSANDS!

  19. FLOOD OF REQUESTS
    INITIAL
        def perform(*)
          products = Product.eligible
          # find_each yields one product at a time
          products.find_each do |product|
            # one billing request per call
            DoBusinessLogic.call(product)
          end
        end

        class DoBusinessLogic
          def self.call(product)
            product.billing_records.each {}
          end
        end

        class Product < ApplicationRecord
          def billing_records
            @billing_records ||=
              ::Billing::QueryService
                .billing_records_for_products(self)
          end
        end
    OPTIMIZED
        def perform(*)
          products = Product.eligible
          products.find_in_batches do |batch|
            # one billing request per batch
            cache_billing_records(batch).each do |p|
              # no billing requests
              DoBusinessLogic.call(p)
            end
          end
        end

        def cache_billing_records(products)
          indexed_records =
            ::Billing::QueryService
              .billing_records_for_products(*products)
              .group_by(&:product_gid)
          # each returns the products, now with records cached
          products.each do |product|
            product.cache_billing_records!(
              indexed_records[product.gid].to_a
            )
          end
        end
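
The effect of the batched version can be checked with a small stand-alone sketch. Everything here is a stand-in: `FakeBillingService` replaces the real billing client and merely counts requests, and the plain `Product` class replaces the ActiveRecord model. Only the grouping step (`group_by(&:product_gid)` followed by a per-product lookup) mirrors the slide.

```ruby
# Stand-ins for the talk's classes; names and fields are assumptions.
BillingRecord = Struct.new(:gid, :product_gid)

class Product
  attr_reader :gid, :billing_records

  def initialize(gid)
    @gid = gid
    @billing_records = nil
  end

  def cache_billing_records!(records)
    @billing_records = records
  end
end

# Fake remote service that counts how many requests it receives.
class FakeBillingService
  attr_reader :request_count

  def initialize(records)
    @records = records
    @request_count = 0
  end

  def billing_records_for_products(*products)
    @request_count += 1
    gids = products.map(&:gid)
    @records.select { |record| gids.include?(record.product_gid) }
  end
end

# One request per batch, then an in-memory hash join on product_gid.
def cache_billing_records(service, products)
  indexed = service.billing_records_for_products(*products).group_by(&:product_gid)
  products.each do |product|
    product.cache_billing_records!(indexed.fetch(product.gid, []))
  end
end

products = %w[p1 p2 p3 p4 p5].map { |gid| Product.new(gid) }
service = FakeBillingService.new([
  BillingRecord.new('r1', 'p1'),
  BillingRecord.new('r2', 'p1'),
  BillingRecord.new('r3', 'p4')
])

# Batches of two stand in for find_in_batches.
products.each_slice(2) { |batch| cache_billing_records(service, batch) }
```

Five products in batches of two cost three requests instead of five, and products without records end up with an empty array rather than `nil`.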

  20. FLOOD OF REQUESTS?
    PRELOAD, CACHE, (HASH-)JOIN
    ◀ Ulf Michael Widenius, MySQL. Image source: wikipedia.org

  21. FREQUENTLY NEEDED DATA
    PROBLEM: SINGLE FIELD WAS FREQUENTLY USED (~1K HITS PER DAY)

  22. FREQUENTLY NEEDED DATA
    PLAN
      add field to Kafka
      build a read model
      backfill the data
      start using the read model
      remove billing query
    SOLUTION
      find that date in local DB
      verify if it's really the same date
      use it and remove billing query

        # 1k billing hits per day
        ::Billing::QueryService
          .first_successful_record_created_at(client)
          &.in_time_zone&.to_date

        # one local DB query
        client
          .products.successful
          .minimum(:start_date)

  23. DATA NEEDED FREQUENTLY?
    USE THE DOMAIN, LUKE!
    ◀ Image source: starwars.fandom.com

  24. DATA FLOOD
    PROBLEM: GENERIC QUERIES FETCHING ALL THE DATA THAT MIGHT BE NEEDED

  25. DATA FLOOD
        # REST response
        {
          "gid": "gid://...",
          "clientGid": "gid://...",
          "productGid": "gid://...",
          "availability": true,
          "pending": false,
          "frequency": "weekly",
          "startDate": "2020-08-21",
          "endDate": "2020-10-28",
          # ...
          # 36 fields total
          # loading 3-4 associations
        }

        # filtering on the client side
        def billing_records_for_products(*products)
          fetch_billing_records(
            filter: {products: products}
          ).select(&:accessible?)
        end

        # GraphQL query
        query($filter: RecordFilter!) {
          cycles(filter: $filter) {
            nodes {
              gid
              productGid
              pending
              frequency
            }
          }
        }

        # filtering on the server side
        def billing_records_for_products(*products)
          fetch_billing_records(
            filter: {
              products: products,
              accessible: true
            }
          )
        end
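
To make the GraphQL variant concrete, here is a minimal sketch of how the request body might be assembled. The query text is taken from the slide; `billing_records_request_body` is a hypothetical helper, and passing product GIDs as the `products` filter value is an assumption (the slide passes `products` without showing how they are serialized).

```ruby
require 'json'

# Query text from the slide: only the four fields the caller needs.
BILLING_RECORDS_QUERY = <<~GRAPHQL
  query($filter: RecordFilter!) {
    cycles(filter: $filter) {
      nodes {
        gid
        productGid
        pending
        frequency
      }
    }
  }
GRAPHQL

# Hypothetical helper: builds the JSON body of a GraphQL POST request.
# The accessible flag moves the filtering to the server side.
def billing_records_request_body(product_gids)
  JSON.generate(
    {
      query: BILLING_RECORDS_QUERY,
      variables: { filter: { products: product_gids, accessible: true } }
    }
  )
end

body = billing_records_request_body(['gid://platform/Product/12345'])
```

Compared with the REST response, the payload shrinks from 36 fields to 4, and inaccessible records are never sent over the wire.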

  26. Photo by Erik-Jan Leusink on Unsplash

  27. WHAT COULD POSSIBLY GO WRONG?

  28. WHAT COULD POSSIBLY GO WRONG?
    ROOT CAUSE
        # REST client
        get('/records', **params.slice(:product_gids))

        # DB query in billing
        def billing_records(product_gids: nil, gids: nil, client_gid: nil)
          scope = ::BillingRecord
          scope = scope.where(product_gid: product_gids) if product_gids
          scope = scope.where(gid: gids) if gids
          scope = scope.where(client_gid: client_gid) if client_gid
          scope.all
        end
    FIX
        def billing_records(product_gids: nil, gids: nil, client_gid: nil)
          return [] if [product_gids, gids, client_gid].all?(&:blank?)
          # ...
        end
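
The failure mode on this slide can be reproduced without Rails. In the sketch below an in-memory array stands in for the `BillingRecord` table, and `blank_filter?` is a plain-Ruby substitute for ActiveSupport's `blank?`; both are assumptions made for illustration. Presumably the client ended up sending no recognized filter at all, so the unguarded query returned every record.

```ruby
Record = Struct.new(:gid, :product_gid, :client_gid)

RECORDS = [
  Record.new('r1', 'p1', 'c1'),
  Record.new('r2', 'p2', 'c1'),
  Record.new('r3', 'p3', 'c2')
].freeze

# Plain-Ruby substitute for ActiveSupport's blank? (nil or empty).
def blank_filter?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

# Unguarded: with no filters nothing narrows the scope,
# so every record is returned.
def billing_records_unguarded(product_gids: nil, gids: nil, client_gid: nil)
  scope = RECORDS
  scope = scope.select { |r| product_gids.include?(r.product_gid) } if product_gids
  scope = scope.select { |r| gids.include?(r.gid) } if gids
  scope = scope.select { |r| r.client_gid == client_gid } if client_gid
  scope
end

# Guarded: return nothing when every filter is blank.
def billing_records(product_gids: nil, gids: nil, client_gid: nil)
  return [] if [product_gids, gids, client_gid].all? { |f| blank_filter?(f) }

  billing_records_unguarded(product_gids: product_gids, gids: gids, client_gid: client_gid)
end
```

Calling `billing_records_unguarded` with no arguments returns all three records, while the guarded `billing_records` returns an empty array for the same call.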

  29. DATA FLOOD?
    QUERY CUSTOMIZATION & UNDERFETCHING
    FILTERING ON THE SERVER SIDE

  30. TIP?
    ALWAYS TEST MANUALLY. ALWAYS.

  31. 429 TOO MANY REQUESTS
    PROBLEM: SPIKE OF REQUESTS EVERY SUNDAY EVENING

  32. 429 TOO MANY REQUESTS
    PROBLEM
        # scheduling at talent's 5 PM on Sunday
        eligible_products.each do |product|
          WeeklyReminder.schedule(
            product, day: :sunday, time: '17:00'
          )
        end
    WEEK1. SOLUTION: PRELOADING
        eligible_products.find_in_batches do |batch|
          with_billing_records_preloaded(batch) do
            batch.each do |product|
              WeeklyReminder.schedule(
                product, day: :sunday, time: '17:00'
              )
            end
          end
        end
    WEEK2. PROPER SOLUTION: JITTER
        # class WeeklyReminder
        def scheduling_time(*)
          super +
            (SecureRandom.rand * 120 - 60).seconds
        end
    WEEK3. FINAL PROPER SOLUTION
        # class AnotherWeeklyReminder
        def scheduling_time(*)
          super +
            (SecureRandom.rand * 120 - 60).seconds
        end
    WEEK4. REALLY FINAL PROPER SOLUTION: RATE LIMITING
        Sidekiq::Limiter.window(
          'weekly-reminder',
          RATE_LIMIT_COUNT,
          RATE_LIMIT_INTERVAL,
          wait_timeout: 2
        )
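
The jitter step can be tried in isolation. The sketch below is plain Ruby without ActiveSupport; `with_jitter`, its `spread` parameter, and the injectable `rng` are hypothetical names. With the default spread it is meant to match the slide's `SecureRandom.rand * 120 - 60`, that is, an offset of up to 60 seconds in either direction.

```ruby
# Hypothetical helper: shifts a base time by a random offset in
# [-spread, +spread) seconds so jobs do not all fire at the same instant.
def with_jitter(base_time, spread: 60, rng: Random.new)
  base_time + (rng.rand * 2 * spread - spread)
end

base = Time.utc(2021, 4, 11, 17, 0, 0) # a Sunday, 17:00 UTC
times = Array.new(1000) { with_jitter(base) }
```

All thousand times stay within a minute of 17:00 but no longer coincide. As the slide's week 4 suggests, jitter alone spreads a spike only over the chosen window, so a rate limit may still be needed when the number of jobs is large.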

  33. 429 TOO MANY REQUESTS?
    I DON'T ALWAYS TEST ON PRODUCTION
    BUT WHEN I DO, I RUN TESTS ON FRIDAY

  34. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation
    FIX: OPTIMIZE
      preloading to avoid N+1
      server-side filtering
      using local data
      underfetching
      spreading the load
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags
    NIHIL NOVI SUB SOLE

  35. MONITOR
      standard stack (Rollbar/NewRelic)
      custom request instrumentation
    FIX: OPTIMIZE
      preloading to avoid N+1: every ORM
      server-side filtering: find_all{} vs where()
      using local data: The Best Request Is No Request
      underfetching: SELECT * vs SELECT a, b
      spreading the load: highscalability.com post about YouTube, 2012
    DEPLOY
      CI checks
      easy & reliable rollback
      safe env with a fallback
      feature flags
    NIHIL NOVI SUB SOLE

  36. FAIL OFTEN SO YOU CAN SUCCEED SOONER
    Tom Kelley
    Photo: snikologiannis/Flickr; http://ow.ly/CHwhd

  37. MACIEK RZĄSA
    Q&A
    @mjrzasa