Lessons in Scalability - Speaker Deck

Slide 1

Slide 1 text

LESSONS GYAAN IN SCALABILITY

Slide 2

Slide 2 text

ABOUT ME MANISH GILL @MGILL25

Slide 3

Slide 3 text

LETS CREATE AN ANALYTICS SYSTEM

Slide 4

Slide 4 text

CREATING AN ANALYTICS SYSTEM ▸ Track and Report Website Trafﬁc ▸ Identify users ▸ Deﬁne goals ▸ Behavior Analysis ▸ Retroactive queries ▸ Realtime?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

YOUR TRAFFIC IS NOW MY PROBLEM

Slide 7

Slide 7 text

MODELING THE DATA ▸ Start with an RDBMS. Proven and reliable way of modeling schemas. ▸ Central facts table centered around the User ▸ Joined with other tables to do analytical queries ▸ Number of hits, Number of conversions and so on

Slide 8

Slide 8 text

DEEP DIVING INTO USER BEHAVIOR ▸ Clickstream data - HTML + CSS data capture and visualization ▸ Record an entire user session on site and then replay it later. ▸ “Send an HTTP call when the user moves the mouse.” ▸ Considerable increase in volume

Slide 9

Slide 9 text

VISUALIZING CLICKSTREAM

Slide 10

Slide 10 text

YOUR TRAFFIC IS NOW MAKING ME MISERABLE

Slide 11

Slide 11 text

ITERATION #1 ▸ mkdir my_fancy_analytics_engine ▸ git init ▸ vim app.py ▸ “Database? Lemme just use sqlite for now…”

Slide 12

Slide 12 text

PERILS OF SQLITE ▸ Minimal Concurrency Support. Only for Reading. ▸ Database locking ▸ “When any process wants to write, it must lock the entire database ﬁle for the duration of its update.” ▸ This results in direct data loss. RIP

Slide 13

Slide 13 text

DATABASE LOCKS IN ACTION TRANSACTION IN PROGRESS FAILING CONCURRENT INSERT

Slide 14

Slide 14 text

SCALING SQLITE? ▸ Maybe I can have a lot of sqlite DB ﬁles instead of one. ▸ Divide them up based on customer_id ▸ Divide them up even further based on time ▸ Daily is too frequent, monthly too infrequent. ▸ account_id/week_{week_number}.db looks good

Slide 15

Slide 15 text

MEH ▸ My locking problem has still not been ﬁxed ▸ Too primitive and bare-bones. ▸ I really don’t wanna manage 10k database ﬁles myself. ▸ Please don’t do this.

Slide 16

Slide 16 text

THINK ABOUT PERF FROM THE START LESSON #1

Slide 17

Slide 17 text

ITERATION #2 ▸ PostgreSQL! ▸ SQL standard compliant, good JSON support. ▸ Parallel query execution ▸ MVCC - Each process has its own snapshot of the DB ▸ Before implementing it, lets do some benchmarking.

Slide 18

Slide 18 text

DETOUR - BENCHMARKING ▸ Isn’t an exact science. ▸ Wrong: “My code takes n seconds to run” ▸ Kinda sorta right: ▸ “My code takes n seconds to run on a 4 core CPU, 12Gig RAM machine which has no other processes running, and has an SSD capable of reaching 4k maximum IOPS. I can optimize it to insert k messages per second.” ▸ TL;DR: It’s complicated.

Slide 19

Slide 19 text

SYSTEM I/O

Slide 20

Slide 20 text

BENCHMARKING TIPS ▸ Keep it close to the real use case if you can. ▸ Utilities like iostat, iotop, htop, ftop are your friends. ▸ Tweak a conﬁg and see the impact. ▸ Don’t use resource-heavy monitoring tools. ▸ Don’t believe blog posts! Do your own benchmarking.

Slide 21

Slide 21 text

DO EXTENSIVE BENCHMARKING LESSON #2

Slide 22

Slide 22 text

PORTING LEGACY CODE ▸ “Sqlite to Postgres migration should be simple!” ▸ Nope. SQL is just one part of the equation. Different systems work differently. ▸ Still easier than re-writing for a NoSQL system from scratch.

Slide 23

Slide 23 text

BACK TO ITERATION #2 ▸ We now have a system that is ▸ Concurrent. No more database locks. Yay! ▸ Keeps data consistent. ▸ Handles a decent amount of load. ▸ What can go wrong?

Slide 24

Slide 24 text

PROBLEMS WITH ITERATION #2 ▸ Data insertion performance will drop in a few days or weeks. ▸ Or even hours during your load testing. ▸ Huge tables with hundreds of millions of rows. ▸ Worse: Large indexes on these tables.

Slide 25

Slide 25 text

INDEXES ARE A SCAM! ▸ Great for READ operations ▸ Suck for WRITE operations. 1 Write = 1 table write + 1 index update. ▸ On disk in PG if can’t store in its shared buffers. ▸ We’ll end up with higher disk I/O.

Slide 26

Slide 26 text

ITERATION #3 - IMPROVING PG ▸ Avoid indexes in the beginning. ▸ Partition one huge table into many smaller tables. PG offers partitioning based on inheritance. ▸ Divide up your data - ▸ random(data) ▸ my_custom_algorithm(data)

Slide 27

Slide 27 text

TABLE PARTITIONING IN POSTGRESQL MASTER TABLE CHILD 3 CHILD 2 CHILD 1

Slide 28

Slide 28 text

QUEUES TO MANAGE VOLUME - KAFKA https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about- real-time-datas-unifying

Slide 29

Slide 29 text

USE QUEUES PARTITION YOUR DATA LESSON #3

Slide 30

Slide 30 text

ITERATION #4 ▸ We now have our DB running perfectly well…for a while. ▸ What is going to happen really soon?

Slide 31

Slide 31 text

ITERATION #4 ▸ Run out of disk space! 96% 4%

Slide 32

Slide 32 text

SHARDING ▸ Shard means “a small part of a whole”. ▸ How to choose where should an incoming message go? ▸ database_node = (message_id % 2) + 1 ▸ database_node = random(1, 2)

Slide 33

Slide 33 text

SHARDING NODE 1 NODE 2 MSG

Slide 34

Slide 34 text

SHARDING IS UNAVOIDABLE AT SCALE LESSON #4

Slide 35

Slide 35 text

ITERATION #5 ▸ Our shard can still die. ▸ We should at least have read queries enabled for customer reporting. ▸ Replication (Hot Standby)

Slide 36

Slide 36 text

REPLICATION ▸ Your primary DB should always have a slave running with it. ▸ The slave becomes master when the old master dies. ▸ Question: What happens when the old master comes back to life? We now have 2 masters and no slaves!

Slide 37

Slide 37 text

REPLICATION MASTER 1 MASTER 2 MSG SLAVE 2 SLAVE 1

Slide 38

Slide 38 text

USE REPLICATION LESSON #5

Slide 39

Slide 39 text

ITERATION #6 ▸ We now have a cluster of DB nodes ▸ Each node has a replica ▸ We are HA or “Highly Available”. ▸ Can something still go wrong? Why do you hate me?

Slide 40

Slide 40 text

ITERATION #6 ▸ On master machine, run the following: ▸ “DROP DATABASE” ▸ What happens to slave ? It friggin’ deletes everything too. ▸ Slaves are Dumb!

Slide 41

Slide 41 text

BACKUPS! ▸ Replicas are not backups ▸ Take regular backups of your entire database. ▸ PG offers fantastic support for base backups + WAL archiving. ▸ An untested backup is no backup at all.

Slide 42

Slide 42 text

BACKUPS ARE GOING TO SAVE YOU LESSON #6

Slide 43

Slide 43 text

“PREMATURE OPTIMIZATION IS THE ROOT OF ALL EVIL” Donald Knuth

Slide 44

Slide 44 text

“THAT KNUTH QUOTE IS USELESS” Manish Gill

Slide 45

Slide 45 text

1. THINK ABOUT PERF FROM THE START 2. DO EXTENSIVE BENCHMARKING 3. USE QUEUES AND PARTITION YOUR DATA 4. SHARDING IS UNAVOIDABLE AT SCALE 5. USE REPLICATION 6. BACKUPS ARE GOING TO SAVE YOU LESSONS IN SCALABILITY

Slide 46

Slide 46 text

QUESTIONS?