Lessons in Scalability

LESSONS GYAAN IN SCALABILITY

ABOUT ME MANISH GILL @MGILL25

LETS CREATE AN ANALYTICS SYSTEM

CREATING AN ANALYTICS SYSTEM ▸ Track and Report Website Trafﬁc
▸ Identify users ▸ Deﬁne goals ▸ Behavior Analysis ▸ Retroactive queries ▸ Realtime?

YOUR TRAFFIC IS NOW MY PROBLEM

MODELING THE DATA ▸ Start with an RDBMS. Proven and
reliable way of modeling schemas. ▸ Central facts table centered around the User ▸ Joined with other tables to do analytical queries ▸ Number of hits, Number of conversions and so on

DEEP DIVING INTO USER BEHAVIOR ▸ Clickstream data - HTML
+ CSS data capture and visualization ▸ Record an entire user session on site and then replay it later. ▸ “Send an HTTP call when the user moves the mouse.” ▸ Considerable increase in volume

VISUALIZING CLICKSTREAM

YOUR TRAFFIC IS NOW MAKING ME MISERABLE

ITERATION #1 ▸ mkdir my_fancy_analytics_engine ▸ git init ▸ vim
app.py ▸ “Database? Lemme just use sqlite for now…”

PERILS OF SQLITE ▸ Minimal Concurrency Support. Only for Reading.
▸ Database locking ▸ “When any process wants to write, it must lock the entire database ﬁle for the duration of its update.” ▸ This results in direct data loss. RIP

DATABASE LOCKS IN ACTION TRANSACTION IN PROGRESS FAILING CONCURRENT INSERT

SCALING SQLITE? ▸ Maybe I can have a lot of
sqlite DB ﬁles instead of one. ▸ Divide them up based on customer_id ▸ Divide them up even further based on time ▸ Daily is too frequent, monthly too infrequent. ▸ account_id/week_{week_number}.db looks good

MEH ▸ My locking problem has still not been ﬁxed
▸ Too primitive and bare-bones. ▸ I really don’t wanna manage 10k database ﬁles myself. ▸ Please don’t do this.

THINK ABOUT PERF FROM THE START LESSON #1

ITERATION #2 ▸ PostgreSQL! ▸ SQL standard compliant, good JSON
support. ▸ Parallel query execution ▸ MVCC - Each process has its own snapshot of the DB ▸ Before implementing it, lets do some benchmarking.

DETOUR - BENCHMARKING ▸ Isn’t an exact science. ▸ Wrong:
“My code takes n seconds to run” ▸ Kinda sorta right: ▸ “My code takes n seconds to run on a 4 core CPU, 12Gig RAM machine which has no other processes running, and has an SSD capable of reaching 4k maximum IOPS. I can optimize it to insert k messages per second.” ▸ TL;DR: It’s complicated.

SYSTEM I/O

BENCHMARKING TIPS ▸ Keep it close to the real use
case if you can. ▸ Utilities like iostat, iotop, htop, ftop are your friends. ▸ Tweak a conﬁg and see the impact. ▸ Don’t use resource-heavy monitoring tools. ▸ Don’t believe blog posts! Do your own benchmarking.

DO EXTENSIVE BENCHMARKING LESSON #2

PORTING LEGACY CODE ▸ “Sqlite to Postgres migration should be
simple!” ▸ Nope. SQL is just one part of the equation. Different systems work differently. ▸ Still easier than re-writing for a NoSQL system from scratch.

BACK TO ITERATION #2 ▸ We now have a system
that is ▸ Concurrent. No more database locks. Yay! ▸ Keeps data consistent. ▸ Handles a decent amount of load. ▸ What can go wrong?

PROBLEMS WITH ITERATION #2 ▸ Data insertion performance will drop
in a few days or weeks. ▸ Or even hours during your load testing. ▸ Huge tables with hundreds of millions of rows. ▸ Worse: Large indexes on these tables.

INDEXES ARE A SCAM! ▸ Great for READ operations ▸
Suck for WRITE operations. 1 Write = 1 table write + 1 index update. ▸ On disk in PG if can’t store in its shared buffers. ▸ We’ll end up with higher disk I/O.

ITERATION #3 - IMPROVING PG ▸ Avoid indexes in the
beginning. ▸ Partition one huge table into many smaller tables. PG offers partitioning based on inheritance. ▸ Divide up your data - ▸ random(data) ▸ my_custom_algorithm(data)

TABLE PARTITIONING IN POSTGRESQL MASTER TABLE CHILD 3 CHILD 2
CHILD 1

QUEUES TO MANAGE VOLUME - KAFKA https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about- real-time-datas-unifying

USE QUEUES PARTITION YOUR DATA LESSON #3

ITERATION #4 ▸ We now have our DB running perfectly
well…for a while. ▸ What is going to happen really soon?

ITERATION #4 ▸ Run out of disk space! 96% 4%

SHARDING ▸ Shard means “a small part of a whole”.
▸ How to choose where should an incoming message go? ▸ database_node = (message_id % 2) + 1 ▸ database_node = random(1, 2)

SHARDING NODE 1 NODE 2 MSG

SHARDING IS UNAVOIDABLE AT SCALE LESSON #4

ITERATION #5 ▸ Our shard can still die. ▸ We
should at least have read queries enabled for customer reporting. ▸ Replication (Hot Standby)

REPLICATION ▸ Your primary DB should always have a slave
running with it. ▸ The slave becomes master when the old master dies. ▸ Question: What happens when the old master comes back to life? We now have 2 masters and no slaves!

REPLICATION MASTER 1 MASTER 2 MSG SLAVE 2 SLAVE 1

USE REPLICATION LESSON #5

ITERATION #6 ▸ We now have a cluster of DB
nodes ▸ Each node has a replica ▸ We are HA or “Highly Available”. ▸ Can something still go wrong? Why do you hate me?

ITERATION #6 ▸ On master machine, run the following: ▸
“DROP DATABASE” ▸ What happens to slave ? It friggin’ deletes everything too. ▸ Slaves are Dumb!

BACKUPS! ▸ Replicas are not backups ▸ Take regular backups
of your entire database. ▸ PG offers fantastic support for base backups + WAL archiving. ▸ An untested backup is no backup at all.

BACKUPS ARE GOING TO SAVE YOU LESSON #6

“PREMATURE OPTIMIZATION IS THE ROOT OF ALL EVIL” Donald Knuth

“THAT KNUTH QUOTE IS USELESS” Manish Gill

1. THINK ABOUT PERF FROM THE START 2. DO EXTENSIVE
BENCHMARKING 3. USE QUEUES AND PARTITION YOUR DATA 4. SHARDING IS UNAVOIDABLE AT SCALE 5. USE REPLICATION 6. BACKUPS ARE GOING TO SAVE YOU LESSONS IN SCALABILITY

QUESTIONS?

Lessons in Scalability

Lessons in Scalability

Other Decks in Technology

Featured

Transcript