Lessons in Scalability

Manish Gill
September 09, 2017

In this talk I present some fundamental principles to follow on your journey towards scalability.

Transcript

  1. LESSONS GYAAN IN SCALABILITY

  2. ABOUT ME
    MANISH GILL
    @MGILL25

  3. LET’S CREATE AN ANALYTICS SYSTEM

  4. CREATING AN ANALYTICS SYSTEM
    ▸ Track and Report Website Traffic
    ▸ Identify users
    ▸ Define goals
    ▸ Behavior Analysis
    ▸ Retroactive queries
    ▸ Realtime?

  5. (no text on this slide)

  6. YOUR TRAFFIC IS NOW MY PROBLEM

  7. MODELING THE DATA
    ▸ Start with an RDBMS. Proven and reliable way of modeling schemas.
    ▸ Central facts table centered around the User
    ▸ Joined with other tables to do analytical queries
    ▸ Number of hits, Number of conversions and so on
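
A minimal sketch of the kind of query this slide describes, using Python's built-in sqlite3 module as a stand-in for any RDBMS. The table names (users, events) and the is_conversion column are assumptions made for illustration; the talk does not show its schema.

```python
import sqlite3


def hits_and_conversions(db):
    """Return (user name, hits, conversions) per user, ordered by name."""
    return db.execute(
        """
        SELECT u.name,
               COUNT(e.id) AS hits,
               COALESCE(SUM(e.is_conversion), 0) AS conversions
        FROM users AS u
        LEFT JOIN events AS e ON e.user_id = u.id
        GROUP BY u.id
        ORDER BY u.name
        """
    ).fetchall()


# Hypothetical schema: one row per user, one row per tracked event.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        url TEXT NOT NULL,
        is_conversion INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO events (user_id, url, is_conversion) VALUES
        (1, '/home', 0), (1, '/checkout', 1), (2, '/home', 0);
    """
)

report = hits_and_conversions(db)
```

The LEFT JOIN keeps users with no events in the report, with zero hits and zero conversions.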

  8. DEEP DIVING INTO USER BEHAVIOR
    ▸ Clickstream data - HTML + CSS data capture and visualization
    ▸ Record an entire user session on site and then replay it later.
    ▸ “Send an HTTP call when the user moves the mouse.”
    ▸ Considerable increase in volume

  9. VISUALIZING CLICKSTREAM

  10. YOUR TRAFFIC IS NOW MAKING ME MISERABLE

  11. ITERATION #1
    ▸ mkdir my_fancy_analytics_engine
    ▸ git init
    ▸ vim app.py
    ▸ “Database? Lemme just use sqlite for now…”

  12. PERILS OF SQLITE
    ▸ Minimal concurrency support: concurrent reads only, one writer at a time.
    ▸ Database locking
    ▸ “When any process wants to write, it must lock the entire
    database file for the duration of its update.”
    ▸ A write that hits the lock fails, and unless the app retries, that data is lost. RIP

  13. DATABASE LOCKS IN ACTION
    TRANSACTION IN PROGRESS
    FAILING CONCURRENT INSERT
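
The situation on this slide can be reproduced with a short sketch, assuming two connections to the same SQLite file in its default journal mode. The table name and the zero busy timeout are illustrative choices, not taken from the talk.

```python
import os
import sqlite3
import tempfile


def demo_write_lock():
    """Hold a write transaction open and attempt a concurrent insert."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hits.db")

        # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("CREATE TABLE hits (user_id INTEGER, url TEXT)")

        # timeout=0 makes the second connection fail at once instead of waiting.
        other = sqlite3.connect(path, timeout=0, isolation_level=None)

        writer.execute("BEGIN IMMEDIATE")  # takes the write lock
        writer.execute("INSERT INTO hits VALUES (1, '/home')")

        try:
            other.execute("INSERT INTO hits VALUES (2, '/pricing')")
            error = None
        except sqlite3.OperationalError as exc:
            error = str(exc)  # the concurrent insert is rejected

        writer.execute("COMMIT")  # releases the lock

        # Retrying after the lock is released succeeds.
        other.execute("INSERT INTO hits VALUES (2, '/pricing')")
        rows = other.execute("SELECT COUNT(*) FROM hits").fetchone()[0]

        writer.close()
        other.close()
        return error, rows


error, rows = demo_write_lock()
```

The second insert only lands because the sketch retries it; an application that drops the failed write loses that row.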

  14. SCALING SQLITE?
    ▸ Maybe I can have a lot of sqlite DB files instead of one.
    ▸ Divide them up based on customer_id
    ▸ Divide them up even further based on time
    ▸ Daily is too frequent, monthly too infrequent.
    ▸ account_id/week_{week_number}.db looks good
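
For illustration only, the naming scheme above could be computed as follows. Using the ISO week number is an assumption; the talk does not say how week_number is derived.

```python
from datetime import date


def db_path(account_id, day):
    """Build the per-account, per-week SQLite file path from the slide."""
    # Assumption: ISO week numbering. Note that the same week number
    # recurs every year, so this scheme alone would mix years together.
    week_number = day.isocalendar()[1]
    return f"{account_id}/week_{week_number}.db"


path = db_path(42, date(2017, 9, 9))
```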

  15. MEH
    ▸ My locking problem has still not been fixed
    ▸ Too primitive and bare-bones.
    ▸ I really don’t wanna manage 10k database files myself.
    ▸ Please don’t do this.

  16. THINK ABOUT PERF FROM THE START
    LESSON #1

  17. ITERATION #2
    ▸ PostgreSQL!
    ▸ SQL standard compliant, good JSON support.
    ▸ Parallel query execution
    ▸ MVCC - Each transaction sees its own snapshot of the DB
    ▸ Before implementing it, let’s do some benchmarking.

  18. DETOUR - BENCHMARKING
    ▸ Isn’t an exact science.
    ▸ Wrong: “My code takes n seconds to run”
    ▸ Kinda sorta right:
    ▸ “My code takes n seconds to run on a 4-core CPU, 12 GB
    RAM machine which has no other processes running, and
    has an SSD capable of reaching 4k maximum IOPS. I can
    optimize it to insert k messages per second.”
    ▸ TL;DR: It’s complicated.
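
One way to act on this is to record the machine context next to every number you measure. The sketch below is a rough harness under stated assumptions: it times batched inserts into an in-memory SQLite table, which is only a stand-in and does not exercise disk I/O the way a real PostgreSQL benchmark would. The function name and the reported fields are made up for illustration.

```python
import os
import platform
import sqlite3
import time


def benchmark_inserts(n_rows, batch_size=1000):
    """Time batched inserts and report throughput with machine context."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hits (user_id INTEGER, url TEXT)")
    rows = [(i, "/home") for i in range(n_rows)]

    start = time.perf_counter()
    for i in range(0, n_rows, batch_size):
        db.executemany("INSERT INTO hits VALUES (?, ?)", rows[i:i + batch_size])
        db.commit()  # one commit per batch
    elapsed = time.perf_counter() - start

    inserted = db.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
    db.close()
    return {
        "rows": inserted,
        "seconds": elapsed,
        "rows_per_second": inserted / elapsed if elapsed > 0 else float("inf"),
        # Context that makes the number meaningful to someone else.
        "cpu_count": os.cpu_count(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


result = benchmark_inserts(10_000)
```

Changing batch_size and re-running is a small instance of "tweak a config and see the impact".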

  19. SYSTEM I/O

  20. BENCHMARKING TIPS
    ▸ Keep it close to the real use case if you can.
    ▸ Utilities like iostat, iotop, htop, ftop are your friends.
    ▸ Tweak a config and see the impact.
    ▸ Don’t use resource-heavy monitoring tools.
    ▸ Don’t believe blog posts! Do your own benchmarking.

  21. DO EXTENSIVE BENCHMARKING
    LESSON #2

  22. PORTING LEGACY CODE
    ▸ “SQLite to Postgres migration should be simple!”
    ▸ Nope. SQL is just one part of the equation. Different
    systems work differently.
    ▸ Still easier than re-writing for a NoSQL system from
    scratch.

  23. BACK TO ITERATION #2
    ▸ We now have a system that is
    ▸ Concurrent. No more database locks. Yay!
    ▸ Keeps data consistent.
    ▸ Handles a decent amount of load.
    ▸ What can go wrong?

  24. PROBLEMS WITH ITERATION #2
    ▸ Data insertion performance will drop in a few days or
    weeks.
    ▸ Or even hours during your load testing.
    ▸ Huge tables with hundreds of millions of rows.
    ▸ Worse: Large indexes on these tables.

  25. INDEXES ARE A SCAM!
    ▸ Great for READ operations
    ▸ Suck for WRITE operations. 1 write = 1 table write + 1
    update for each index.
    ▸ Indexes are read from disk in PG when they can’t fit in its shared buffers.
    ▸ We’ll end up with higher disk I/O.

  26. ITERATION #3 - IMPROVING PG
    ▸ Avoid indexes in the beginning.
    ▸ Partition one huge table into many smaller tables. PG
    offers partitioning based on inheritance.
    ▸ Divide up your data -
    ▸ random(data)
    ▸ my_custom_algorithm(data)
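
As a sketch of what my_custom_algorithm(data) might look like, the helpers below partition by week and generate the DDL for an inheritance-based child table. Partitioning by week, the hits table, and the created_at column are all assumptions for illustration; the talk does not name its partition key.

```python
from datetime import date, timedelta


def week_bounds(day):
    """Return the Monday that starts day's week and the following Monday."""
    start = day - timedelta(days=day.weekday())
    return start, start + timedelta(days=7)


def child_table_name(parent, day):
    """Name of the child table that should hold rows for this day."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{parent}_{iso_year}_w{iso_week:02d}"


def child_table_ddl(parent, day, column="created_at"):
    """PostgreSQL DDL for one weekly child table using table inheritance."""
    start, end = week_bounds(day)
    return (
        f"CREATE TABLE {child_table_name(parent, day)} (\n"
        f"    CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')\n"
        f") INHERITS ({parent});"
    )


name = child_table_name("hits", date(2017, 9, 9))
ddl = child_table_ddl("hits", date(2017, 9, 9))
```

With this layout the application can insert straight into the child table returned by child_table_name, and the CHECK constraint documents which rows each child may hold.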

  27. TABLE PARTITIONING IN POSTGRESQL
    MASTER TABLE
    CHILD 3
    CHILD 2
    CHILD 1

  28. QUEUES TO MANAGE VOLUME - KAFKA
    https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

  29. USE QUEUES
    PARTITION YOUR DATA
    LESSON #3
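
The idea of putting a queue between ingestion and the database can be sketched with Python's standard library. Here queue.Queue is only an in-process stand-in for Kafka, and run_pipeline and its batch size are hypothetical names chosen for this example: producers enqueue events quickly, and a single writer drains them in batches.

```python
import queue
import threading


def run_pipeline(events, batch_size=3):
    """Feed events through a queue and collect the batches a writer would store."""
    buffer = queue.Queue()
    batches = []
    stop = object()  # sentinel that tells the writer to finish

    def writer():
        batch = []
        while True:
            item = buffer.get()
            if item is stop:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                batches.append(batch)  # a real writer would bulk-insert here
                batch = []
        if batch:
            batches.append(batch)  # flush the final partial batch

    thread = threading.Thread(target=writer)
    thread.start()
    for event in events:
        buffer.put(event)  # ingestion never waits on the database
    buffer.put(stop)
    thread.join()
    return batches


batches = run_pipeline(range(7))
```

Batching turns many small writes into fewer large ones, and the queue absorbs traffic spikes while the writer catches up.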

  30. ITERATION #4
    ▸ We now have our DB running perfectly well…for a while.
    ▸ What is going to happen really soon?

  31. ITERATION #4
    ▸ Run out of disk space!
    96%
    4%

  32. SHARDING
    ▸ Shard means “a small part of a whole”.
    ▸ How do we choose where an incoming message should go?
    ▸ database_node = (message_id % 2) + 1
    ▸ database_node = random(1, 2)
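
The two routing rules above, written out as a small sketch. The function names and the node_count parameter are additions for illustration; the slide hard-codes two nodes.

```python
import random


def shard_by_modulo(message_id, node_count=2):
    """Deterministic routing: the same message_id always maps to the same node."""
    return (message_id % node_count) + 1


def shard_at_random(node_count=2, rng=random):
    """Random routing: spreads load evenly, but a row's node is not predictable."""
    return rng.randint(1, node_count)  # randint includes both endpoints


node_for_7 = shard_by_modulo(7)
node_for_8 = shard_by_modulo(8)
```

Modulo routing lets reads find a message from its id alone, whereas random routing forces a read to ask every node. With modulo, changing node_count remaps most ids to a different node.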

  33. SHARDING
    NODE 1
    NODE 2
    MSG

  34. SHARDING IS UNAVOIDABLE AT SCALE
    LESSON #4

  35. ITERATION #5
    ▸ Our shard can still die.
    ▸ We should at least have read queries enabled for
    customer reporting.
    ▸ Replication (Hot Standby)

  36. REPLICATION
    ▸ Your primary DB should always have a slave running with
    it.
    ▸ The slave becomes master when the old master dies.
    ▸ Question: What happens when the old master comes back
    to life? We now have 2 masters and no slaves!
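
The slide leaves the question open. One common answer, sketched here as a toy model, is that a returning node must never resume as primary: it rejoins as a standby of the node that was promoted. The class and node names are hypothetical, and the model skips the data side entirely; in PostgreSQL the old primary also has to be resynchronized (for example with pg_rewind or a fresh base backup) before it can follow the new one.

```python
class ReplicatedShard:
    """Toy model of one primary with standbys; tracks roles only, not data."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standbys = [standby]
        self.down = []

    def fail_primary(self):
        """The primary dies, so the first standby is promoted."""
        if not self.standbys:
            raise RuntimeError("no standby available to promote")
        self.down.append(self.primary)
        self.primary = self.standbys.pop(0)

    def rejoin(self, node):
        """A recovered node comes back as a standby, never as a second primary."""
        if node not in self.down:
            raise ValueError(f"{node} is not marked as down")
        self.down.remove(node)
        self.standbys.append(node)


shard = ReplicatedShard("db1", "db1-replica")
shard.fail_primary()
shard.rejoin("db1")
```

After these calls the shard again has exactly one primary ("db1-replica") and one standby ("db1").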

  37. REPLICATION
    [Diagram labels: MSG; MASTER 1, MASTER 2; SLAVE 1, SLAVE 2]

  38. USE REPLICATION
    LESSON #5

  39. ITERATION #6
    ▸ We now have a cluster of DB nodes
    ▸ Each node has a replica
    ▸ We are HA or “Highly Available”.
    ▸ Can something still go wrong? Why do you hate me?

  40. ITERATION #6
    ▸ On the master machine, run the following:
    ▸ “DROP DATABASE”
    ▸ What happens to the slave? It friggin’ deletes everything too.
    ▸ Slaves are Dumb!

  41. BACKUPS!
    ▸ Replicas are not backups
    ▸ Take regular backups of your entire database.
    ▸ PG offers fantastic support for base backups + WAL
    archiving.
    ▸ An untested backup is no backup at all.
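
To make "an untested backup is no backup at all" concrete, here is a small sketch that takes a backup and immediately checks that it can be read. It uses SQLite's online backup API from Python's sqlite3 module (Python 3.7+) as a stand-in; for PostgreSQL the equivalent would be restoring a base backup plus archived WAL onto a scratch machine. The function name and the hits table are assumptions for illustration.

```python
import sqlite3


def backup_and_verify(source, table):
    """Copy the database, then check the copy is intact and complete."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)  # SQLite online backup API

    healthy = copy.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    count = f"SELECT COUNT(*) FROM {table}"
    same_rows = source.execute(count).fetchone() == copy.execute(count).fetchone()
    return copy, healthy and same_rows


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE hits (user_id INTEGER, url TEXT)")
live.executemany("INSERT INTO hits VALUES (?, ?)", [(1, "/home"), (2, "/pricing")])
live.commit()

backup, verified = backup_and_verify(live, "hits")

# The accident from the previous slide: the live data is dropped.
live.execute("DROP TABLE hits")
live.commit()

surviving = backup.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```

Unlike a replica, the backup does not follow the DROP, so the two rows are still there to restore.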

  42. BACKUPS ARE GOING TO SAVE YOU
    LESSON #6

  43. “PREMATURE OPTIMIZATION IS THE ROOT OF ALL EVIL”
    Donald Knuth

  44. “THAT KNUTH QUOTE IS USELESS”
    Manish Gill

  45. LESSONS IN SCALABILITY
    1. THINK ABOUT PERF FROM THE START
    2. DO EXTENSIVE BENCHMARKING
    3. USE QUEUES AND PARTITION YOUR DATA
    4. SHARDING IS UNAVOIDABLE AT SCALE
    5. USE REPLICATION
    6. BACKUPS ARE GOING TO SAVE YOU

  46. QUESTIONS?
