Slide 1

Slide 1 text

Scaling the Web: Databases & NoSQL Richard Schneeman @schneems works for @Gowalla Wed Nov 10 2011

Slide 2

Slide 2 text

whoami • @Schneems • BSME with Honors from Georgia Tech • 5+ years of experience with Ruby & Rails • Work for @Gowalla • Rails 3.1 contributor : ) • 3+ years of technical teaching

Slide 3

Slide 3 text

Traffic

Slide 4

Slide 4 text

Compounding Traffic ex. Wikipedia

Slide 5

Slide 5 text

Compounding Traffic ex. Wikipedia

Slide 6

Slide 6 text

Gowalla

Slide 7

Slide 7 text

Gowalla • 50 best websites NYTimes 2010 • Founded 2009 @ SXSW • 1 million+ Users • Undisclosed Visitors • Loves/highlights/comments/stories/guides • Facebook/Foursquare/Twitter integration • iPhone/Android/web apps • public API

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Gowalla Backend • Ruby on Rails • Uses the Ruby Language • Rails is the Framework

Slide 10

Slide 10 text

The Web is Data • Username => String • Birthday => Int/ Int/ Int • Blog Post => Text • Image => Binary-file/blob Data needs to be stored to be useful

Slide 11

Slide 11 text

Database

Slide 12

Slide 12 text

Gowalla Database • PostgreSQL • Relational (RDBMS) • Open Source • Competitor to MySQL • ACID compliant • Running on a Dedicated Managed Server

Slide 13

Slide 13 text

Need for Speed • Throughput: • The number of operations per minute that can be performed • Pure Speed: • How long an individual operation takes.
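The two definitions above are easy to mix up, so here is a small worked sketch. The query counts and timings are hypothetical numbers chosen only for illustration, and the two helper methods are not from the talk.

```ruby
# Throughput: how many operations complete per minute.
def throughput_per_minute(operations, elapsed_seconds)
  operations * 60.0 / elapsed_seconds
end

# Pure speed (latency): how long one operation takes on average.
def average_latency_ms(total_busy_seconds, operations)
  total_busy_seconds * 1000.0 / operations
end

# Hypothetical: one worker runs 1,200 queries back to back in 60 seconds.
single_throughput = throughput_per_minute(1_200, 60)
single_latency    = average_latency_ms(60, 1_200)

# Hypothetical: four workers each do the same in parallel, so 4,800 queries
# finish in the same 60 seconds, using 240 seconds of total busy time.
parallel_throughput = throughput_per_minute(4_800, 60)
parallel_latency    = average_latency_ms(240, 4_800)

puts "single:   #{single_throughput} ops/min at #{single_latency} ms each"
puts "parallel: #{parallel_throughput} ops/min at #{parallel_latency} ms each"
```

In this sketch throughput goes up four times while each individual query takes just as long as before, which is why the two numbers need to be tracked separately.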

Slide 14

Slide 14 text

Potential Problems • Hardware • Slow Network • Slow hard-drive • Insufficient CPU • Insufficient RAM • Software • too many Reads • too many Writes

Slide 15

Slide 15 text

Scaling Up versus Out • Scale Up: • More CPU, Bigger HD, More RAM etc. • Scale Out: • More machines • More machines • More machines • ...

Slide 16

Slide 16 text

Scale Up • Bigger faster machine • More RAM • More CPU • Bigger ethernet bus • ... • Moore's Law • Diminishing returns

Slide 17

Slide 17 text

Scale Out • Forget Moore's law... • Add more nodes • Master/Slave Database • Sharding

Slide 18

Slide 18 text

Master/Slave (diagram) • Write: goes to the Master DB • Copy: the Master DB copies data to four Slave DBs • Read: served by the Slave DBs

Slide 19

Slide 19 text

Master & Slave +/- • Pro • Increased read speed • Takes read load off of master • Allows us to Join across all tables • Con • Doesn’t buy increased write throughput • Single Point of Failure in Master Node
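A minimal in-memory sketch of the read/write split described above. Plain Ruby Hashes stand in for the databases, and the class name `ReplicatedStore` is hypothetical. The copy here happens immediately inside `write`; real replication is usually asynchronous, so treat this only as an illustration of where reads and writes go.

```ruby
# In-memory stand-in for master/slave replication.
class ReplicatedStore
  attr_reader :master, :slaves

  def initialize(slave_count)
    @master = {}
    @slaves = Array.new(slave_count) { {} }
    @next_slave = 0
  end

  # Every write goes to the single master, then is copied to each slave.
  def write(key, value)
    @master[key] = value
    @slaves.each { |slave| slave[key] = value }
  end

  # Reads rotate across the slaves, so the master serves no read traffic.
  def read(key)
    slave = @slaves[@next_slave]
    @next_slave = (@next_slave + 1) % @slaves.size
    slave[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3")
```

Adding slaves adds read capacity, but every write still passes through `@master`, which is the write bottleneck and single point of failure listed under the cons.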

Slide 20

Slide 20 text

Sharding (diagram) • Each shard takes both Reads and Writes for one group of users • Users in USA • Users in Europe • Users in Asia • Users in Africa

Slide 21

Slide 21 text

Sharding +/- • Pro • Increased Write & Read throughput • No Single Point of failure • Individual features can fail • Con • Cannot Join queries between shards
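A minimal sketch of region-based sharding, assuming one Hash per region stands in for one database. The class `ShardedUsers`, its method names, and the sample users are hypothetical.

```ruby
# Each region gets its own independent store (shard).
class ShardedUsers
  REGIONS = %w[usa europe asia africa].freeze

  def initialize
    @shards = REGIONS.map { |region| [region, {}] }.to_h
  end

  # Reads and writes for a user only ever touch that user's shard.
  def write(region, id, attributes)
    @shards.fetch(region)[id] = attributes
  end

  def read(region, id)
    @shards.fetch(region)[id]
  end

  # There is no cross-shard join: the application has to ask every
  # shard separately and merge the results itself.
  def find_everywhere
    @shards.values.flat_map { |shard| shard.values.select { |user| yield(user) } }
  end
end

users = ShardedUsers.new
users.write("usa", 3, { name: "schneems" })
users.write("europe", 7, { name: "anna" })

p users.read("usa", 3)
p users.find_everywhere { |user| user[:name].start_with?("a") }
```

Because each shard is independent, write load is spread out and one shard can fail without taking the others down; the cost shows up in `find_everywhere`, which must visit every shard.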

Slide 22

Slide 22 text

What is a Database? • Relational Database Management System (RDBMS) • Stores Data Using Schema • A.C.I.D. compliant • Atomic • Consistent • Isolated • Durable

Slide 23

Slide 23 text

RDBMS • Relational • Matches data on common characteristics in data • Enables “Join” & “Union” queries • Makes data modular

Slide 24

Slide 24 text

Relational +/- • Pros • Data is modular • Highly flexible data layout • Cons • Getting desired data can be tricky • Over modularization leads to many join queries • Trade off performance for search-ability

Slide 25

Slide 25 text

Schema Storage • Blueprint for data storage • Break data into tables/columns/rows • Give data types to your data • Integer • String • Text • Boolean • ...

Slide 26

Slide 26 text

Schema +/- • Pros • Regularize our data • Helps keep data consistent • Converts to programming “types” easily • Cons • Must separately manage schema • Adding columns & indexes to existing large tables can be painful & slow
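To make "a blueprint with data types" concrete, here is a rough sketch of a schema check in plain Ruby. The `USER_SCHEMA` columns and the `schema_errors` helper are hypothetical; a real RDBMS enforces this inside the database rather than in application code.

```ruby
# Hypothetical schema: column name => allowed Ruby classes.
USER_SCHEMA = {
  username: [String],
  age:      [Integer],
  bio:      [String],
  admin:    [TrueClass, FalseClass]
}.freeze

# Returns a list of problems; an empty list means the row fits the schema.
def schema_errors(row, schema)
  errors = []
  (row.keys - schema.keys).each { |column| errors << "unknown column #{column}" }
  schema.each do |column, types|
    value = row[column]
    errors << "#{column} has wrong type" unless types.any? { |type| value.is_a?(type) }
  end
  errors
end

good_row = { username: "schneems", age: 27, bio: "Rails contributor", admin: false }
bad_row  = { username: "schneems", age: "twenty-seven", bio: "hi", admin: false, nickname: "r" }

p schema_errors(good_row, USER_SCHEMA)
p schema_errors(bad_row, USER_SCHEMA)
```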

Slide 27

Slide 27 text

ACID • Properties that guarantee database transactions are processed reliably • Atomic • Consistent • Isolated • Durable

Slide 28

Slide 28 text

ACID • Atomic • Any database Transaction is all or nothing. • If one part of the transaction fails it all fails “An Incomplete Transaction Cannot Exist”
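The all-or-nothing rule can be sketched in memory. This is an illustration under simple assumptions (a Hash of balances, a hypothetical `Accounts` class), not how a database implements transactions: the block works on a copy, and the copy replaces the real data only if every step succeeds.

```ruby
class Accounts
  attr_reader :balances

  def initialize(balances)
    @balances = balances
  end

  # Run the block against a copy; keep the copy only if nothing raises.
  def transaction
    working_copy = @balances.dup
    yield working_copy
    @balances = working_copy
    true
  rescue StandardError
    false # the original balances were never touched
  end
end

accounts = Accounts.new({ "alice" => 100, "bob" => 0 })

# This transfer fails halfway, so neither balance changes.
failed = accounts.transaction do |balances|
  balances["alice"] -= 150
  raise "insufficient funds" if balances["alice"] < 0
  balances["bob"] += 150
end
after_failure = accounts.balances.dup

# This transfer succeeds, so both changes land together.
succeeded = accounts.transaction do |balances|
  balances["alice"] -= 40
  balances["bob"] += 40
end

p failed, after_failure, succeeded, accounts.balances
```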

Slide 29

Slide 29 text

ACID • Consistent • Any transaction will take the database from one consistent state to another “Only Consistent data is allowed to be written”

Slide 30

Slide 30 text

ACID • Isolated • No transaction should be able to interfere with another transaction “the same field cannot be updated by two sources at the exact same time” • Example: a = 0; one source runs a += 1 while another runs a += 2; a = ??
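The `a += 1` / `a += 2` question can be sketched with two Ruby threads. A `Mutex` here plays the role of isolation: each read-then-write happens as one unit. This is only an analogy for what a database does, and the `sleep` is an artificial delay added to widen the window in which the two writers could otherwise interfere.

```ruby
counter = 0
lock = Mutex.new

threads = [1, 2].map do |amount|
  Thread.new do
    # Only one thread at a time may run this read-modify-write.
    lock.synchronize do
      current = counter            # read
      sleep 0.01                   # artificial delay between read and write
      counter = current + amount   # write
    end
  end
end
threads.each(&:join)

puts counter # 3, because neither update could overwrite the other
```

Without the lock, both threads could read 0 before either writes, and the final value would be 1 or 2 depending on which write lands last.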

Slide 31

Slide 31 text

ACID • Durable • Once a transaction is committed it will stay that way “Save it once, read it forever”

Slide 32

Slide 32 text

What is a Database? • RDBMS • Relational • Flexible • Has a schema • Most likely ACID compliant • Typically fast under low load or when optimized

Slide 33

Slide 33 text

What is SQL? • Structured Query Language • The language databases speak • Based on relational algebra • Insert • Query • Update • Delete “SELECT Company, Country FROM Customers WHERE Country = 'USA' ”

Slide 34

Slide 34 text

Why people <3 SQL • Relational algebra is powerful • SQL is proven • well understood • well documented

Slide 35

Slide 35 text

Why people </3 SQL • Relational algebra is hard • Different databases support different SQL syntax • Yet another programming language to learn

Slide 36

Slide 36 text

SQL != Database • SQL is used to talk to an RDBMS (database) • SQL is not an RDBMS

Slide 37

Slide 37 text

What is NoSQL? Not A Relational Database

Slide 38

Slide 38 text

RDBMS

Slide 39

Slide 39 text

Types of NoSQL • Distributed Systems • Document Store • Graph Database • Key-Value Store • Eventually Consistent Systems Mix And Match ↑

Slide 40

Slide 40 text

Key Value Stores • Non Relational • Typically No Schema • Map one Key (a string) to a Value (some object) Example: Redis

Slide 41

Slide 41 text

Key Value Example
    require "redis"
    redis = Redis.new # connects to a local Redis server by default
    redis.set("foo", "bar")
    redis.get("foo") # => "bar"

Slide 42

Slide 42 text

Key Value Example
    require "redis"
    redis = Redis.new
    redis.set("foo", "bar") # "foo" is the Key, "bar" is the Value
    redis.get("foo") # => "bar" (look up by Key, get back the Value)

Slide 43

Slide 43 text

Key Value • Like a database that can only ever look things up by primary key (id) • YES: select * from users where id = '3'; • NO: select * from users where name = 'schneems';
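The YES/NO above can be sketched with a plain Ruby Hash standing in for a key-value store. The sample users and the `ids_by_name` secondary index are hypothetical.

```ruby
users_by_id = {
  "3" => { name: "schneems" },
  "4" => { name: "gowalla" }
}

# YES: looking up by key is one direct fetch.
by_id = users_by_id["3"]

# NO: there is no "where name = ..." query. You either scan every value...
scanned = users_by_id.find { |_id, user| user[:name] == "schneems" }

# ...or maintain a second key yourself that maps name to id.
ids_by_name = { "schneems" => "3", "gowalla" => "4" }
by_name = users_by_id[ids_by_name["schneems"]]

p by_id, scanned, by_name
```

The second index has to be kept in sync by the application on every write, which is work an RDBMS index would otherwise do for you.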

Slide 44

Slide 44 text

NoSQL @ Gowalla • Redis (key-value store) • Store “Likes” & Analytics • Memcache (key-value store) • Cache Database results • Cassandra • (eventually consistent, with-schema, key value store) • Store “feeds” or “timelines” • Solr (search index)

Slide 45

Slide 45 text

Memcache • Key-Value Store • Open Source • Distributed • In memory (ram) only • fast, but volatile • Not ACID • Memory object caching system

Slide 46

Slide 46 text

Memcache Example
    memcache = Memcache.new # Memcache stands in for your memcached client class
    memcache.set("foo", "bar")
    memcache.get("foo") # => "bar"

Slide 47

Slide 47 text

Memcache • Can store whole objects
    memcache = Memcache.new # Memcache stands in for your memcached client class
    user = User.where(:username => "schneems").first # .first returns the record, not the query
    memcache.set("user:3", user)
    user_from_cache = memcache.get("user:3")
    user_from_cache == user # => true
    user_from_cache.username # => "schneems"

Slide 48

Slide 48 text

Memcache @ Gowalla • Cache Common Queries • Decreases Load on DB (postgres) • Enables higher throughput from DB • Faster response than DB • Users see quicker page load time

Slide 49

Slide 49 text

What to Cache? • Objects that change infrequently • users • spots (places) • etc. • Expensive(ish) sql queries • Friend ids for users • User ids for people visiting spots • etc.

Slide 50

Slide 50 text

Memcache Distributed (diagram: nodes A, B, C)

Slide 51

Slide 51 text

Memcache Distributed (diagram: nodes A, B, C, plus new node D) • Easily add more nodes
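One simple way a client could spread keys over cache nodes is to hash the key and take it modulo the node count. This sketch assumes that scheme; it is not a description of any particular memcached client, and the keys and the `node_for` helper are hypothetical.

```ruby
require "zlib"

# Pick a node by hashing the key. Zlib.crc32 gives the same number on
# every run, unlike String#hash, which is seeded per process.
def node_for(key, nodes)
  nodes[Zlib.crc32(key) % nodes.size]
end

keys = %w[user:1 user:2 user:3 spot:1 spot:2 spot:3]
three_nodes = %w[A B C]
four_nodes  = %w[A B C D]

before = keys.map { |key| [key, node_for(key, three_nodes)] }.to_h
after  = keys.map { |key| [key, node_for(key, four_nodes)] }.to_h

# Keys whose node changed after adding D are cache misses until refilled.
moved = keys.reject { |key| before[key] == after[key] }

p before, after, moved
```

With plain modulo hashing, adding a node tends to move many keys at once. Consistent hashing is a common way to reduce how many keys move.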

Slide 52

Slide 52 text

Memcache <3’s DB • We use them Together • If memcache doesn’t have a value • Fetch it from the database • Set the key in memcache from the database result • Hard • Cache Invalidation : (
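The fetch-then-set steps above can be sketched as follows. Hashes stand in for both memcache and the database, and the `CacheAside` class and its method names are hypothetical. The `update` method shows the simplest form of invalidation: delete the cached key whenever the database row changes.

```ruby
class CacheAside
  attr_reader :database_hits

  def initialize(database)
    @database = database
    @cache = {}
    @database_hits = 0
  end

  def fetch(key)
    # 1. If the cache has the value, return it without touching the DB.
    return @cache[key] if @cache.key?(key)

    # 2. Otherwise fetch from the database...
    @database_hits += 1
    value = @database[key]

    # 3. ...and set the key in the cache for next time.
    @cache[key] = value
    value
  end

  # Invalidation: change the database, then drop the stale cached copy.
  def update(key, value)
    @database[key] = value
    @cache.delete(key)
  end
end

store = CacheAside.new({ "user:3" => "schneems" })
store.fetch("user:3")         # miss: goes to the database
store.fetch("user:3")         # hit: served from the cache
hits_before_update = store.database_hits

store.update("user:3", "richard")
fresh = store.fetch("user:3") # miss again, because the key was invalidated

p hits_before_update, fresh, store.database_hits
```

The hard part the slide points at is everything this sketch leaves out: finding every cached key that depends on a changed row, and doing so safely when several processes write at once.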

Slide 53

Slide 53 text

Redis • Key Value Store • Open Source • Not Distributed (yet) • Extremely Quick • “Data structure server”

Slide 54

Slide 54 text

Redis Example, again
    require "redis"
    redis = Redis.new
    redis.set("foo", "bar")
    redis.get("foo") # => "bar"

Slide 55

Slide 55 text

Redis - Has Data Types • Strings • Hashes • Lists • Sets • Sorted Sets

Slide 56

Slide 56 text

Redis Example, sets
    require "redis"
    redis = Redis.new
    redis.sadd("foo", "bar")
    redis.smembers("foo") # => ["bar"]
    redis.sadd("foo", "fly")
    redis.smembers("foo") # => ["bar", "fly"] (set order is not guaranteed)
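For readers without a Redis server at hand, the set behaviour can be imitated with Ruby's own `Set`. The `TinySetStore` class is a hypothetical in-memory stand-in that borrows the `sadd` and `smembers` names; it is not the Redis client.

```ruby
require "set"

class TinySetStore
  def initialize
    # Each key lazily gets its own empty Set.
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Returns true if the member was newly added, false if already present.
  def sadd(key, member)
    !@sets[key].add?(member).nil?
  end

  def smembers(key)
    @sets[key].to_a
  end
end

store = TinySetStore.new
first_add  = store.sadd("foo", "bar")
second_add = store.sadd("foo", "bar") # duplicate, ignored
store.sadd("foo", "fly")

p first_add, second_add, store.smembers("foo").sort
```

The point of a set is visible in `second_add`: adding the same member twice leaves one copy, which suits data such as "who liked this".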

Slide 57

Slide 57 text

Redis => Likeable • Very Fast response • ~ 50 queries per page view • ~ 1 ms per query • http://github.com/Gowalla/likeable

Slide 58

Slide 58 text

Cassandra • Open Source • Distributed • Key Value Store • Eventually Consistent • Sort of not ACID • Uses A Schema • ColumnFamilies

Slide 59

Slide 59 text

Cassandra Distributed, Eventual Consistency (diagram: nodes A, B, C, D) • Data In • Copied To Extra Nodes ... Eventually
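"Copied to extra nodes ... eventually" can be sketched in memory. This illustrates the idea only and is not how Cassandra implements replication: the `EventuallyConsistentCluster` class is hypothetical, and `replicate!` stands in for background copying that a real system does on its own.

```ruby
class EventuallyConsistentCluster
  def initialize(node_names)
    @nodes = node_names.map { |name| [name, {}] }.to_h
    @pending = []
  end

  # Accept the write on one node right away; queue the copies for later.
  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    (@nodes.keys - [node]).each { |other| @pending << [other, key, value] }
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Stand-in for background replication catching up.
  def replicate!
    @pending.each { |node, key, value| @nodes.fetch(node)[key] = value }
    @pending.clear
  end
end

cluster = EventuallyConsistentCluster.new(%w[A B C D])
cluster.write("A", "feed:3", "checked in at SXSW")

on_a_now  = cluster.read("A", "feed:3") # the node that took the write has it
on_d_now  = cluster.read("D", "feed:3") # not copied yet, so this is nil
cluster.replicate!
on_d_later = cluster.read("D", "feed:3") # eventually the copy arrives

p on_a_now, on_d_now, on_d_later
```

The window between `on_d_now` and `on_d_later` is the trade-off: writes are accepted quickly, but a read from another node may briefly return old data.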

Slide 60

Slide 60 text

Cassandra @ Gowalla: Activity Feeds

Slide 61

Slide 61 text

Cassandra @ Gowalla • Chronologic • http://github.com/Gowalla/chronologic

Slide 62

Slide 62 text

Should I use NoSQL?

Slide 63

Slide 63 text

Which One?

Slide 64

Slide 64 text

Pick the right tool

Slide 65

Slide 65 text

Tradeoffs • Every Data store has them • Know your data store • Strengths • Weaknesses

Slide 66

Slide 66 text

NoSQL vs. RDBMS • No Magic Bullet • Use Both!!! • Model data in a datastore you understand • Switch when/if you need to • Understand Your Options

Slide 67

Slide 67 text

Questions? Richard Schneeman @schneems works for @Gowalla