Scaling Riak to 25MM Ops/Day at Kiip

This talk goes over how we scaled one part of our technology stack at Kiip over the last 18 months, and how we ended up on Riak for this specific use case.

Mitchell Hashimoto

May 23, 2012

Transcript

  1. Scaling Riak to
    25MM Ops/Day at Kiip

  2. Armon Dadgar
    @armondadgar
    Mitchell Hashimoto
    @mitchellh

  5. API Flow
    Session Start
    Moment → Reward (repeated 0..n times per session)
    Session End

  6. The Numbers
    x million unique devices per day
    About 4 API calls per session
    = ~25 million API calls per day

  7. The Journey of Scale
    A Story of MongoDB
    * Let’s talk about our scaling journey, specifically with MongoDB.
    * We started with MySQL, but switched to MongoDB before we had any real traffic.

  8. 1. Write Limit Hit by Analytics
    * Analytics sent hundreds of atomic updates per second.
    * Hit the limit with MongoDB’s global write lock.
    * Solution: Aggregate over 10 seconds and send small bursts of updates, resulting in a lower lock % on average.
    Solution: Aggregate over 10 seconds.
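    A minimal sketch of the aggregation fix, assuming pymongo and a simple in-memory counter buffer; the collection, metric keys, and flush loop are illustrative, not Kiip’s actual code:

    import time
    from collections import Counter

    from pymongo import MongoClient

    stats = MongoClient("mongodb://localhost:27017").analytics.daily_stats

    pending = Counter()   # metric key -> count buffered in memory
    FLUSH_INTERVAL = 10   # seconds between bursts of updates

    def record(metric_key, n=1):
        # Buffer the increment instead of issuing an atomic $inc per event.
        pending[metric_key] += n

    def flush():
        # Send the buffered increments as one small burst of updates.
        for metric_key, count in pending.items():
            stats.update_one({"_id": metric_key},
                             {"$inc": {"count": count}},
                             upsert=True)
        pending.clear()

    if __name__ == "__main__":
        while True:
            time.sleep(FLUSH_INTERVAL)
            flush()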

  9. 2. Too Many Reads (1000s/s)
    * We were reading too much and hit MongoDB’s maximum read throughput.
    * Solution: Cache everywhere.
    Solution: Heavy caching.
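    A minimal cache-aside sketch of “cache everywhere”: check a cache before touching MongoDB and fill it on a miss. The in-process TTL dict stands in for whatever cache tier was actually used; all names here are illustrative.

    import time

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017").kiip

    _cache = {}      # key -> (expires_at, value)
    CACHE_TTL = 60   # seconds

    def get_game(game_id):
        entry = _cache.get(game_id)
        if entry and entry[0] > time.time():
            return entry[1]                        # cache hit: no DB read
        doc = db.games.find_one({"_id": game_id})  # cache miss: one DB read
        _cache[game_id] = (time.time() + CACHE_TTL, doc)
        return doc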

  10. 3. Slow, Uncacheable Queries
    Example: “Has device X played game Y today/last week/this month/in all time?”
    * Touches lots of data
    * Requires lots of index space
    * Not cacheable
    * MongoDB was just... slow.
    Solution: Bloom filters!
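    A hedged sketch of the bloom-filter approach: keep one small bit array per (game, time window) and test device membership without touching the database. False positives are possible, false negatives are not; the sizes and hashing scheme here are illustrative.

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=1 << 20, num_hashes=5):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item):
            digest = hashlib.sha1(item.encode("utf-8")).digest()
            for i in range(self.num_hashes):
                yield int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    played_today = BloomFilter()                  # one filter per (game, period)
    played_today.add("device-123:game-42")
    print("device-123:game-42" in played_today)   # True
    print("device-999:game-42" in played_today)   # almost certainly False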

  11. 4. Write Limit Hit, Again
    Basic model updates were hitting MongoDB’s write throughput limit.
    Solution: Use two distinct MongoDB clusters for disjoint datasets to avoid the global write lock:
    one for analytics (heavy writes), one for everything else.
    Solution: Two clusters (lol global write lock)

  12. 5. Index Size Hit Memory Limits
    We didn’t vertically scale because we’re pretty operationally frugal and the data was growing very fast.
    ETL = Extract/Transform/Load: archive data to S3, remove it from the main DB.
    Solution: ETL, reap old data.
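    A hedged sketch of the ETL/reap step, assuming pymongo and boto3: dump documents older than a cutoff to S3 as JSON, then delete them so the working set and its indexes stay small. The bucket name, collection, and cutoff are illustrative.

    from datetime import datetime, timedelta, timezone

    import boto3
    from bson import json_util
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017").kiip
    s3 = boto3.client("s3")

    def archive_and_reap(days_to_keep=30):
        cutoff = datetime.now(timezone.utc) - timedelta(days=days_to_keep)
        query = {"created_at": {"$lt": cutoff}}
        old_docs = list(db.sessions.find(query))
        if not old_docs:
            return
        s3.put_object(
            Bucket="example-archive-bucket",   # illustrative bucket name
            Key="sessions/archive-%s.json" % cutoff.date().isoformat(),
            Body=json_util.dumps(old_docs),    # handles ObjectId/datetime fields
        )
        db.sessions.delete_many(query)         # reap from the main DB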

  13. 6. ETL Overwhelmed
    ETL of 24 hours of data took longer than 24 hours to extract, limited by MongoDB read throughput.
    We decided to let it break and continue reaping data, and later solved it with a continuous ETL pipeline separate from our main DB.
    Solution: Punted, solved by a custom solution.

  14. 7. Central Bottleneck by Mongo
    Noticed that _all_ API response times were directly correlated with MongoDB’s write load.
    Our only remaining choice was to look into a new DB solution.
    Solution: Research new DBs!

  15. Researching a new DB

  16. RDBMS
    In the cloud, without horizontal scalability, I/O would hit a limit REAL fast.
    Didn’t want to deal with a custom sharding layer.

  17. Cassandra
    Our cofounders are from Digg.
    Enough said.

  18. HBase
    Saw a PyCodeConf talk about a system at Mozilla based on HBase. We talked to the speaker:
    * Operational nightmare
    * Took 1 year
    * No JVM experience at Kiip
    Not reasonable, for us.

  19. CouchDB
    * No auto horizontal scaling; you have to do it at the app level.
    * Features weren’t compelling (master/master syncing with phones, CouchApps, etc.).
    * We didn’t know anyone who used it.

  20. Riak
    * Attracted to the solid academic foundation.
    * Visited and talked with Basho developers.
    * 100% confident in the Basho team before even using the product.
    * Meetups showed real-world usage at scale + dev & ops happiness.

  21. Data Migration

  22. Identify Fast-Growing Data
    • Data we needed to be horizontally scalable
    • Session/device data grew at an exponential rate.
    • Move that data first, keep the rest in MongoDB (for now).

  23. Identify Fast-Growing Data
    Session Growth

  24. Session Migration

  25. Sessions First
    • Obviously K/V
    • Key: UUID, Value: JSON blob.
    • Larger and faster growing than devices.

  26. Data Access Patterns
    • By UUID (key) for all API calls
    • Fraud: By device ID and IP of session.
    • 2i compatible
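    A sketch of this session layout, assuming the Riak Python client’s bucket/object API: the session UUID is the key, the value is a JSON blob, and 2i entries on device ID and IP back the occasional fraud queries. Host, port, and field names are illustrative.

    import uuid

    import riak

    client = riak.RiakClient(protocol="pbc", host="127.0.0.1", pb_port=8087)
    sessions = client.bucket("sessions")

    # Write path: key = UUID, value = JSON blob, plus 2i entries for fraud lookups.
    session_id = str(uuid.uuid4())
    obj = sessions.new(session_id, data={"device_id": "device-123",
                                         "ip": "203.0.113.7",
                                         "game_id": "game-42"})
    obj.add_index("device_id_bin", "device-123")
    obj.add_index("ip_bin", "203.0.113.7")
    obj.store()

    # Read path for API calls: straight key lookup.
    print(sessions.get(session_id).data)

    # Fraud/background path: 2i lookup by device ID returns the matching keys.
    print(list(sessions.get_index("device_id_bin", "device-123")))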

  27. Update ORM
    • Added a Riak backing-store driver
    • No application-level changes were necessary
    • Riak Python client pains:
    * Buggy protocol buffers interface
    * No keep-alive (fixed)
    * Poor error handling (partially fixed, needs work)

  28. Migrate
    • Write new data to Riak
    • Read from Riak, fall back to MongoDB if missing
    • After one week, remove the MongoDB read-only fallback
    We didn’t migrate old data because ETL had already sent it to S3 anyway.
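    A minimal sketch of the migration read path, assuming the same Riak client as above and the legacy pymongo collection: new writes go only to Riak, and reads try Riak first with MongoDB as a temporary fallback until it is removed. Connection details are illustrative.

    import riak
    from pymongo import MongoClient

    riak_sessions = riak.RiakClient(protocol="pbc", pb_port=8087).bucket("sessions")
    mongo_sessions = MongoClient("mongodb://localhost:27017").kiip.sessions

    def save_session(session_id, blob):
        # Write path during migration: Riak only.
        riak_sessions.new(session_id, data=blob).store()

    def load_session(session_id):
        # Read path: Riak first, MongoDB as the temporary fallback.
        obj = riak_sessions.get(session_id)
        if obj.exists:
            return obj.data
        return mongo_sessions.find_one({"_id": session_id})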

  29. Device Migration

  30. Devices
    • Huge
    • Growing
    • But... not obviously K/V.

  31. Not Obviously K/V
    • Canonical ID (UUID), assigned by us.
    • Vendor ID (ADID, UDID, etc.), assigned by the device vendor.
    • Uniqueness constraint on each, so 2i is not possible.

  32. Uniqueness in Riak, Part 1
    Device bucket: Key = Canonical ID, Value = JSON blob
    Device_UUID bucket: Key = Vendor ID, Value = Canonical ID
    Simulate uniqueness using If-None-Match.
    Cross fingers and hope consistency isn’t too bad.
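    A hedged sketch of the Part 1 scheme, assuming the Riak Python client’s if_none_match store option: the Device_UUID bucket maps Vendor ID to Canonical ID, and a write that finds an existing key fails, “simulating” uniqueness. Bucket names follow the slide; everything else is illustrative.

    import uuid

    import riak
    from riak import RiakError

    client = riak.RiakClient(protocol="pbc", pb_port=8087)
    devices = client.bucket("Device")           # Canonical ID -> JSON blob
    device_uuid = client.bucket("Device_UUID")  # Vendor ID -> Canonical ID

    def register_device(vendor_id, attrs):
        canonical_id = str(uuid.uuid4())
        mapping = device_uuid.new(vendor_id, data=canonical_id)
        try:
            # Only succeeds if no object exists under this vendor ID yet.
            mapping.store(if_none_match=True)
        except RiakError:
            # Vendor ID already claimed: reuse the existing canonical ID.
            canonical_id = device_uuid.get(vendor_id).data
        devices.new(canonical_id, data=attrs).store()
        return canonical_id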

  33. Part 1: Results
    FAILURE

  34. Part 1: Results
    • Latency: At least 200ms, at most 2000ms
    • Map/Reduce JS VMs quickly overwhelmed
    • Hundreds of inconsistencies per hour

  35. Uniqueness, Part 2
    • Just don’t do it.
    • Canonical ID = SHA1(Vendor ID)
    • Backfill old data (30MM rows, days of backfill)
    • Success, use Riak as a K/V store!
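    The Part 2 fix in a few lines: derive the canonical ID deterministically from the vendor ID, so no uniqueness check (and no read-before-write) is needed. Hex encoding of the digest is an assumption.

    import hashlib

    def canonical_id(vendor_id):
        # The same vendor ID always maps to the same key, so uniqueness is free.
        return hashlib.sha1(vendor_id.encode("utf-8")).hexdigest()

    print(canonical_id("EXAMPLE-VENDOR-ID"))   # deterministic 40-char hex key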

  36. Riak In Production
    Our experience over 3 months.

  37. DISCLAIMER
    Riak has been extremely solid.
    However, there are minor pain points that could be, and in many cases have been, addressed.

  38. Scale Early
    * Latencies explode under heavy I/O, and attempting to add a new node adds even more I/O pressure for handoff.
    * Add new nodes early.
    * Hard to know when to add them when you’re just beginning; watch your FSM latencies carefully.
    Scaling at the red line is painful.
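    A small sketch of “watch your FSM latencies”: Riak exposes get/put FSM timings (in microseconds) on its HTTP /stats endpoint, so a periodic check can warn before you’re scaling at the red line. The threshold and host are illustrative.

    import requests

    STATS_URL = "http://127.0.0.1:8098/stats"
    THRESHOLD_US = 50000   # warn if the 95th-percentile FSM time exceeds 50ms

    def check_fsm_latencies():
        stats = requests.get(STATS_URL, timeout=5).json()
        for key in ("node_get_fsm_time_95", "node_put_fsm_time_95"):
            if stats.get(key, 0) > THRESHOLD_US:
                print("WARN %s = %dus, time to add nodes" % (key, stats[key]))

    if __name__ == "__main__":
        check_fsm_latencies()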

  39. 2i is slow, don’t use in real time
    * Normal EC2 get: 5ms
    * 2i EC2 get: 2000ms
    Fine for occasional background queries, not okay for queries on live requests.

  40. JS Map/Reduce is slow, easily overwhelmed
    Slow is to be expected, so don’t use it for live requests.
    JS VMs take a lot of RAM and come in limited quantity, so you can run out very quickly. Riak currently doesn’t handle this well, but they’re working on it.

  41. LevelDB: More Levels, More Pain
    Each additional level adds a disk seek, which is a killer in the cloud.
    We use it because we need 2i.
    On EC2 ephemeral disks, each additional seek adds about 10ms.

  42. Riak Control
    Unusable with a slow internet connection due to its heavy use of PJAX; it requires a low-latency connection.
    Really bad for ops people on the road (MiFis, international, etc.).
    Otherwise great. Basho is aware of the problem.

  43. Operational Issues, Part 1

  44. Operational Issues, Part 2
    • Cluster state doesn’t converge under exceptional conditions.
    • Adding/removing the same node many times (usually due to automation craziness)
    • EC2 partial node failures + LevelDB?

  45. Killing MongoDB
    So much fire.

  46. Non K/V Data
    • Not fast growing
    • Rich querying needed
    • Solution: PostgreSQL
    • Highly recommended.

  47. Geo
    • We actually still use MongoDB, for now.
    • Will move to PostGIS eventually.
    • Not high pressure, low priority.

  48. Closing Remarks
    • Scaling is hard
    • Nothing is a magic bullet
    • Look for easy wins that matter.
    • Rinse and repeat, converge to a scalable system.

  49. Closing Remarks
    For horizontally scalable key/value data,
    Riak is the right choice.

  50. Thanks!
    Q/A?
