Learning to Build Distributed Systems the Hard Way 

Theo Hultberg
September 20, 2012

I’ve learned how to build distributed systems the hard way: I’ve failed, and failed again. I’ve made many of the common mistakes and tried a few other things that turned out to be disappointments. You shouldn’t have to make those mistakes too. In this talk I’ll tell the story of how I built a real-time advertising analytics platform that tracks and reports on millions of impressions every day, and all the things I did wrong before I got it to work. I’ll also tell you what I did right, and the choices I don’t regret.

Transcript

  1. LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY @iconara

  2. speakerdeck.com/u/iconara (real time!)

  3. Theo / @iconara

  4. Chief Architect at

  5. let’s make online advertising a great experience

  6. MAKING THIS

  7. INTO THIS

  8. HOW HARD CAN IT BE?

  9. TRACKING AD IMPRESSIONS track page views and all their ads; track visibility and send updates on changes; track events, track activity, sync cookies, and track visits

  10. track page views and all their ads; track visibility and send updates on changes; track events, track activity, sync cookies, and track visits [diagram labels: LOADED, VISIBLE, HIDDEN, VISIBLE, LOADED]

  11. ASSEMBLING SESSIONS assemble ad impressions, page views and visits, to be able to calculate things like total visible duration; mix in demographics, revenue, and third-party data

  12. assemble ad impressions, page views and visits, to be able to calculate things like total visible duration; mix in demographics, revenue, and third-party data
    WAS LOADED, BECAME ACTIVE, BECAME VISIBLE, WAS HIDDEN, BECAME VISIBLE AGAIN, A CLICK!
    {
      "user_id": "M9L6R5TD0YXK",
      "session_id": "MAI3QAGNAIYT",
      "timestamp": 1347896675038,
      "placement_name": "example",
      "category": "frontpage",
      "embed_url": "http://example.com/",
      "visible_duration": 1340,
      "browser": "Chrome",
      "device_type": "computer",
      "click": true,
      "ad_dimensions": "980x300"
    }
    3rd PARTY DATA & OTHER GOODIES
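
As an illustration of the session assembly described on slides 11 and 12, here is a minimal sketch of how a total visible duration could be derived from a stream of visibility events. The event shape (`:type` and `:timestamp` in milliseconds) and the method name are assumptions made for this example; the talk does not show the actual implementation.

```ruby
# Hypothetical event shape: { type: Symbol, timestamp: Integer (milliseconds) }.
# Sums the time between each :visible event and the next :hidden event.
def visible_duration(events)
  total = 0
  visible_since = nil
  events.sort_by { |event| event[:timestamp] }.each do |event|
    case event[:type]
    when :visible
      # Ignore a repeated :visible while the ad is already visible.
      visible_since ||= event[:timestamp]
    when :hidden
      if visible_since
        total += event[:timestamp] - visible_since
        visible_since = nil
      end
    end
  end
  # A visible interval that is never closed is not counted in this sketch.
  total
end

events = [
  { type: :loaded,  timestamp: 1347896675038 },
  { type: :visible, timestamp: 1347896675500 },
  { type: :hidden,  timestamp: 1347896676500 },
  { type: :visible, timestamp: 1347896677000 },
  { type: :hidden,  timestamp: 1347896677340 }
]

visible_duration(events) # => 1340
```

Sorting by timestamp first means the result does not depend on the order in which the updates arrive.
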
  13. ANALYTICS precompute metrics, count uniques, build visitor histories for attribution

  14. precompute metrics, count uniques, build visitor histories for attribution

  15. HOW HARD CAN IT BE?

  16. 25K REQUESTS PER SECOND ~1 billion requests per day, 1 TB raw data

  17. ONE VISIT CAN CHANGE UP TO 100K COUNTERS hundreds of millions of individual counters per day, plus counting uniques and visitor histories

  18. IN REAL TIME or near real time, if you want to be pedantic

  19. START WITH TWO OF EVERYTHING going from one to two is the hardest

  20. GIVE A LOT OF THOUGHT TO YOUR KEYS AND IDS it will save you lots of pain

  21. MANLO0 (a timestamp) + JME57Z (something random): monotonically increasing, sorts nicely

  22. JME57Z (something random) + MANLO0 (a timestamp): uniformly distributed, works nicely with sharding
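
The two key layouts on slides 21 and 22 could be sketched as follows. The timestamp half uses the expression shown later in the deck (`Time.now.to_i.to_s(36).upcase`); the method names, the use of `SecureRandom` and the six-character random half are assumptions for illustration only.

```ruby
require "securerandom"

# The 36 symbols 0-9 and A-Z.
ALPHABET = (("0".."9").to_a + ("A".."Z").to_a).freeze

# Seconds since the epoch in base 36, padded to six characters.
def timestamp_part(time = Time.now)
  time.to_i.to_s(36).upcase.rjust(6, "0")
end

# Six random base-36 characters (hypothetical choice of length).
def random_part(length = 6)
  Array.new(length) { ALPHABET[SecureRandom.random_number(36)] }.join
end

# Timestamp first: IDs created later sort after IDs created earlier.
def sortable_id(time = Time.now)
  timestamp_part(time) + random_part
end

# Random part first: IDs spread evenly over prefix-based shards.
def shardable_id(time = Time.now)
  random_part + timestamp_part(time)
end

timestamp_part(Time.at(1347896675)) # => "MAI3QB"
```

The timestamp `1347896675` is the one from the slide 12 example (in seconds), and its encoding shares the `MAI3Q` prefix with the `session_id` shown there.
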
  23. PUT BUFFERS BETWEEN LAYERS queues can even out peaks, let you scale layers independently, and let you restart services without losing data
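
A minimal in-process sketch of a buffer between two layers, using Ruby's built-in `SizedQueue`. This is only a stand-in: the deck does not say which queueing system was used, and an in-memory queue does not survive a restart the way a durable queue between services would. The event names and the `:done` marker are made up for the example.

```ruby
# A bounded buffer: the producer blocks when it is full, which evens out peaks.
buffer = SizedQueue.new(1_000)
processed = []

# Consumer layer: drains the buffer at its own pace until it sees :done.
consumer = Thread.new do
  while (event = buffer.pop) != :done
    processed << event.upcase
  end
end

# Producer layer: pushes events without knowing anything about the consumer.
%w[loaded visible hidden].each { |event| buffer.push(event) }
buffer.push(:done)

consumer.join
processed # => ["LOADED", "VISIBLE", "HIDDEN"]
```

Because the two sides only share the queue, either one can be scaled or replaced without changing the other.
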
  24. SEPARATE PROCESSING FROM STORAGE that way you can scale each independently

  25. PLAN HOW TO GET RID OF YOUR DATA deleting stuff is harder than you might think

  26. NoDB keep things streaming

  27. STREAM PARTITIONING

  28. RANDOMLY when you have no interdependencies between things it’s easy to scale out (or round robin, it’s basically the same)

  29. CONSISTENTLY when there are interdependencies you need to route using some property of the objects, but make sure you get a uniform distribution
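
The two routing strategies on slides 28 and 29 could look roughly like this. The method names, the choice of `session_id` as the routing property and the use of CRC32 as the hash are assumptions for the sketch; it is plain hash-modulo routing, not a consistent-hashing ring.

```ruby
require "zlib"

PARTITIONS = 12

# Round robin: fine when events have no interdependencies.
def round_robin_router(partitions = PARTITIONS)
  counter = -1
  lambda { |_event| (counter += 1) % partitions }
end

# Route on a property of the event, so that all events belonging to the
# same session end up in the same partition.
def consistent_route(event, partitions = PARTITIONS)
  Zlib.crc32(event.fetch(:session_id)) % partitions
end

route = round_robin_router(3)
4.times.map { route.call({}) } # => [0, 1, 2, 0]

a = consistent_route({ session_id: "MAI3QAGNAIYT", type: "visible" })
b = consistent_route({ session_id: "MAI3QAGNAIYT", type: "hidden" })
a == b # => true
```

Hashing the property before taking the modulus is what aims for a uniform distribution even when the raw values share a prefix, as timestamp-first IDs do.
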
  30. NUMEROLOGY

  31. 12

  32. 2 | 12, 3 | 12, 4 | 12, 6 | 12

  33. 8 | 24, 5 | 60

  34. 12, 60, 120, 360 superior highly composite numbers

  42. for maximal flexibility partition with multiples of 12
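
To make the divisibility argument concrete: a partition count can be spread evenly over any number of nodes that divides it. The helper name below is made up for this sketch.

```ruby
# Node counts over which `partitions` can be spread with no remainder.
def even_cluster_sizes(partitions)
  (1..partitions).select { |nodes| (partitions % nodes).zero? }
end

even_cluster_sizes(10) # => [1, 2, 5, 10]
even_cluster_sizes(12) # => [1, 2, 3, 4, 6, 12]
even_cluster_sizes(16) # => [1, 2, 4, 8, 16]
even_cluster_sizes(60) # => [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]
```

With 12 partitions you can grow from 2 to 3 to 4 to 6 nodes and still give every node the same number of partitions, which 10 or 16 partitions do not allow.
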

  44. A SHORT DIVERSION ABOUT COUNTING TO 60 the reason why there’s 60 seconds to a minute, and 360 degrees to a circle

  45. 3 SEGMENTS ON EACH FINGER = 12

  46. 3 SEGMENTS ON EACH FINGER = 12, FIVE FINGERS ON OTHER HAND = 60

  47. log2(36^6) ≈ 31

  48. $ (ASCII code 36)

  49. log2(36^6) ≈ 31

  50. log2(36^6) ≈ 31 six characters 0-9, A-Z can represent 31 bits, which is kind of almost very close to four bytes
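
The arithmetic behind the six-character figure can be checked directly: six positions with 36 possible symbols each (0-9, A-Z) give 36^6 combinations. The variable names are only for this sketch.

```ruby
combinations = 36**6              # => 2176782336
bits = Math.log2(combinations)    # a little over 31
bits.round(2)                     # => 31.02

2**31 # => 2147483648, just below 36**6
2**32 # => 4294967296, what four full bytes would give

# Six base-36 characters can hold a timestamp in seconds until 2038.
Time.at(combinations - 1).utc.year # => 2038
```
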
  51. MANLO0

  52. MANLO0 a timestamp Time.now.to_i.to_s(36).upcase

  53. DO YOU REALLY NEED A BACKUP? if you’ve got 3x replication over multiple availability zones, is that backup really worth it?

  54. PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT when thousands of things happen every second, new, weird and unforeseen things happen all the time; your tests can only cover the foreseeable

  55. KTHXBAI @iconara github.com/iconara architecturalatrocities.com burtcorp.com

  56. COME TO SWEDEN IN MARCH AND TALK ABOUT BIG DATA scandevconf.se/2013/call-for-proposals

  57. IDEMPOTENCE

  58. f(f(x)) = f(x) doing something again doesn’t change the outcome

  59. IDEMPOTENCE if you don’t have to worry about things accidentally happening twice, everything becomes much simpler

  60. COUNTING UNIQUES when adding to a set it doesn’t matter how many times you do it, the end result is the same

  61. INC X VS SET X increments are not idempotent, and very scary; if you can avoid non-idempotent operations, try
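
A small sketch of the difference between the idempotent and non-idempotent operations on the last few slides, assuming the same event is accidentally delivered twice. The variable names and the in-memory `Set` and `Hash` are stand-ins for whatever store is actually used.

```ruby
require "set"

# The same impression delivered twice, for example after a retry.
deliveries = ["M9L6R5TD0YXK", "M9L6R5TD0YXK"]

uniques = Set.new
counter = 0
store = {}

deliveries.each do |user_id|
  uniques.add(user_id)       # idempotent: adding again changes nothing
  counter += 1               # INC X: every replay changes the result
  store[:last_user] = user_id # SET X: replaying leaves the same value
end

uniques.size       # => 1
counter            # => 2, one too many
store[:last_user]  # => "M9L6R5TD0YXK"
```
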