Scaling the Web: Databases & NoSQL
Richard Schneeman
@schneems works for @Gowalla
Wed Nov 10, 2011
Slide 2
Slide 2 text
whoami
• @Schneems
• BSME with Honors from Georgia Tech
• 5+ years of experience with Ruby & Rails
• Work for @Gowalla
• Rails 3.1 contributor : )
• 3+ years of technical teaching
Slide 3
Slide 3 text
Traffic
Slide 4
Slide 4 text
Compounding Traffic
ex. Wikipedia
Slide 6
Slide 6 text
Gowalla
Slide 7
Slide 7 text
Gowalla
• 50 best websites NYTimes 2010
• Founded 2009 @ SXSW
• 1 million+ Users
• Undisclosed Visitors
• Loves/highlights/comments/stories/guides
• Facebook/Foursquare/Twitter integration
• iPhone/Android/web apps
• public API
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
Gowalla Backend
• Ruby on Rails
• Uses the Ruby Language
• Rails is the Framework
Slide 10
Slide 10 text
The Web is Data
• Username => String
• Birthday => Int/ Int/ Int
• Blog Post => Text
• Image => Binary-file/blob
Data needs to be stored
to be useful
Slide 11
Slide 11 text
Database
Slide 12
Slide 12 text
Gowalla Database
• PostgreSQL
• Relational (RDBMS)
• Open Source
• Competitor to MySQL
• ACID compliant
• Running on a Dedicated Managed Server
Slide 13
Slide 13 text
Need for Speed
• Throughput:
• The number of operations per minute that
can be performed
• Pure Speed:
• How long an individual operation takes.
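The two measures can be sketched in plain Ruby. This is a minimal illustration only: `fake_query` is a hypothetical stand-in for a database operation, and the numbers it prints depend entirely on the machine running it.

```ruby
require "benchmark"

# Hypothetical stand-in for a database operation (assumption for illustration).
def fake_query
  (1..1_000).reduce(:+)
end

operations = 500

# Total wall-clock time for all operations, in seconds.
elapsed = Benchmark.realtime { operations.times { fake_query } }

# Pure speed: average time a single operation takes.
latency_seconds = elapsed / operations

# Throughput: operations completed per minute.
throughput_per_minute = operations / elapsed * 60

puts "pure speed: #{(latency_seconds * 1000).round(4)} ms per operation"
puts "throughput: #{throughput_per_minute.round} operations per minute"
```

A system can improve one measure without the other: adding machines tends to raise throughput while each individual operation takes just as long.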
Slide 14
Slide 14 text
Potential Problems
• Hardware
• Slow Network
• Slow hard-drive
• Insufficient CPU
• Insufficient Ram
• Software
• too many Reads
• too many Writes
Slide 15
Slide 15 text
Scaling Up versus Out
• Scale Up:
• More CPU, Bigger HD, More Ram etc.
• Scale Out:
• More machines
• More machines
• More machines
• ...
Slide 16
Slide 16 text
Scale Up
• Bigger faster machine
• More Ram
• More CPU
• Bigger ethernet bus
• ...
• Moore's Law
• Diminishing returns
Slide 17
Slide 17 text
Scale Out
• Forget Moore's Law...
• Add more nodes
• Master/ Slave Database
• Sharding
Slide 18
Slide 18 text
Master/Slave
[Diagram: writes go to the Master DB, which copies data to four Slave DBs; reads are served by the Slave DBs]
Slide 19
Slide 19 text
Master & Slave +/-
• Pro
• Increased read speed
• Takes read load off of master
• Allows us to Join across all tables
• Con
• Doesn’t buy increased write throughput
• Single Point of Failure in Master Node
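The read/write split can be sketched with in-memory hashes. This is a minimal sketch, not how PostgreSQL replication is implemented: the `ReplicatedStore` class and its methods are hypothetical, and the copy to the slaves happens immediately here, whereas real replication usually lags behind the master.

```ruby
# Minimal in-memory sketch of master/slave replication (all names hypothetical).
class ReplicatedStore
  def initialize(slave_count)
    @master = {}
    @slaves = Array.new(slave_count) { {} }
    @next_slave = 0
  end

  # Every write goes to the single master, then is copied to each slave.
  def write(key, value)
    @master[key] = value
    @slaves.each { |slave| slave[key] = value }
  end

  # Reads rotate across the slaves, so the master never serves them.
  def read(key)
    slave = @slaves[@next_slave]
    @next_slave = (@next_slave + 1) % @slaves.size
    slave[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3") # served by a slave, not the master
```

Adding slaves spreads out reads, but every write still passes through the one master, which is why write throughput does not improve.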
Slide 20
Slide 20 text
Sharding
[Diagram: separate shards for users in the USA, Europe, Asia, and Africa; each shard handles its own reads and writes]
Slide 21
Slide 21 text
Sharding +/-
• Pro
• Increased Write & Read throughput
• No Single Point of failure
• Individual features can fail
• Con
• Cannot Join queries between shards
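A region-based shard router might look like the sketch below. It assumes one shard per region, as in the diagram; the `SHARDS` hash and the helper methods are hypothetical, and each shard is just an in-memory hash.

```ruby
# Hypothetical shards keyed by region; each hash stands in for a database.
SHARDS = {
  "USA"    => {},
  "Europe" => {},
  "Asia"   => {},
  "Africa" => {}
}

# Pick the shard that owns a user's data, based on the user's region.
def shard_for(region)
  SHARDS.fetch(region)
end

def save_user(region, id, name)
  shard_for(region)[id] = name
end

def find_user(region, id)
  shard_for(region)[id]
end

save_user("USA", 3, "schneems")
save_user("Europe", 7, "anna")

puts find_user("USA", 3)
# A user saved in the USA shard is invisible to the Europe shard,
# which is why a single query cannot join across shards.
puts find_user("Europe", 3).inspect
```

Each shard takes only its own region's reads and writes, and losing one shard leaves the others working.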
Slide 22
Slide 22 text
What is a Database?
• Relational Database Management System (RDBMS)
• Stores Data Using Schema
• A.C.I.D. compliant
• Atomic
• Consistent
• Isolated
• Durable
Slide 23
Slide 23 text
RDBMS
• Relational
• Matches records on the characteristics they have in common
• Enables “Join” & “Union” queries
• Makes data modular
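What a join does can be sketched with plain Ruby arrays. The `users` and `checkins` tables below are hypothetical sample data; a real RDBMS does this matching inside the database, usually with the help of indexes.

```ruby
# Two hypothetical tables, each an array of rows.
users = [
  { id: 1, username: "schneems" },
  { id: 2, username: "anna" }
]
checkins = [
  { user_id: 1, spot: "Coffee Shop" },
  { user_id: 1, spot: "Airport" },
  { user_id: 2, spot: "Library" }
]

# Index users by id so each checkin can find its matching user.
users_by_id = users.map { |user| [user[:id], user] }.to_h

# Join: pair each checkin with the user whose id equals its user_id.
joined = checkins.map do |checkin|
  user = users_by_id.fetch(checkin[:user_id])
  { username: user[:username], spot: checkin[:spot] }
end

joined.each { |row| puts "#{row[:username]} checked in at #{row[:spot]}" }
```

The username is stored once in `users` and reused by every checkin row, which is the modularity the relational model buys.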
Slide 24
Slide 24 text
Relational +/-
• Pros
• Data is modular
• Highly flexible data layout
• Cons
• Getting desired data can be tricky
• Over modularization leads to many join
queries
• Trade off performance for search-ability
Slide 25
Slide 25 text
Schema Storage
• Blueprint for data storage
• Break data into tables/columns/rows
• Give data types to your data
• Integer
• String
• Text
• Boolean
• ...
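A schema can be pictured as a map from column names to types that every row must match. The sketch below is an illustration under that assumption; `USERS_SCHEMA` and `valid_row?` are hypothetical, and a real database enforces its schema itself.

```ruby
# Hypothetical schema for a users table: column name => required Ruby class.
USERS_SCHEMA = {
  id: Integer,
  username: String,
  bio: String
}

# A row is valid when it has exactly the schema's columns
# and every value has the type the schema asks for.
def valid_row?(schema, row)
  return false unless row.keys.sort == schema.keys.sort
  schema.all? { |column, type| row[column].is_a?(type) }
end

good_row = { id: 3, username: "schneems", bio: "Rails dev" }
bad_row  = { id: "3", username: "schneems", bio: "Rails dev" } # id is a String

puts valid_row?(USERS_SCHEMA, good_row)
puts valid_row?(USERS_SCHEMA, bad_row)
```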
Slide 26
Slide 26 text
Schema +/-
• Pros
• Regularize our data
• Helps keep data consistent
• Converts to programming “types” easily
• Cons
• Must separately manage schema
• Adding columns & indexes to existing
large tables can be painful & slow
Slide 27
Slide 27 text
ACID
• Properties that guarantee database
transactions are processed reliably
• Atomic
• Consistent
• Isolated
• Durable
Slide 28
Slide 28 text
ACID
• Atomic
• Any database Transaction is all or nothing.
• If one part of the transaction fails it all fails
“An Incomplete Transaction Cannot Exist”
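All-or-nothing behavior can be sketched by applying changes to a copy and keeping the copy only if every step succeeds. This is a simplified illustration, not how a real database implements transactions; `TinyDB` and the account names are hypothetical.

```ruby
# Hypothetical in-memory "database" of account balances.
class TinyDB
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Run the block against a copy of the data. The copy replaces the
  # real data only if the whole block finishes without raising.
  def transaction
    working_copy = @data.dup
    yield working_copy
    @data = working_copy
    true
  rescue StandardError
    false # the working copy is discarded, so nothing was written
  end
end

db = TinyDB.new({ "alice" => 100, "bob" => 0 })

# The transfer fails partway through, so neither change is kept.
committed = db.transaction do |accounts|
  accounts["alice"] -= 150
  raise "insufficient funds" if accounts["alice"] < 0
  accounts["bob"] += 150
end

puts committed # => false
puts db.data.inspect
```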
Slide 29
Slide 29 text
ACID
• Consistent
• Any transaction will take the database
from one consistent state to another
“Only Consistent data is allowed to be
written”
Slide 30
Slide 30 text
ACID
• Isolated
• No transaction should be able to interfere
with another transaction
“the same field cannot be updated by two
sources at the exact same time”
a = 0
a += 1 and a += 2 run at the exact same time
a = ??
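The question above can be worked through in Ruby. The first half spells out one possible interleaving by hand, so the outcome is deterministic; the second half uses a `Mutex` as a stand-in for the isolation a database provides. Variable names are chosen for illustration only.

```ruby
# Two updates read the same starting value and write back without
# coordination. The second write overwrites the first (a "lost update").
a = 0
seen_by_first  = a       # first update reads 0
seen_by_second = a       # second update also reads 0
a = seen_by_first + 1    # first update writes 1
a = seen_by_second + 2   # second update writes 2, losing the +1
unisolated_result = a

# With isolation, each update holds a lock for its whole read-modify-write,
# so the updates behave as if they ran one after the other.
lock = Mutex.new
b = 0
threads = [1, 2].map do |amount|
  Thread.new { lock.synchronize { b += amount } }
end
threads.each(&:join)
isolated_result = b

puts "without isolation: a = #{unisolated_result}" # a = 2
puts "with isolation:    b = #{isolated_result}"   # b = 3
```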
Slide 31
Slide 31 text
ACID
• Durable
• Once a transaction is committed it will stay
that way
“Save it once, read it forever”
Slide 32
Slide 32 text
What is a Database?
• RDBMS
• Relational
• Flexible
• Has a schema
• Most likely ACID compliant
• Typically fast under low load or when
optimized
Slide 33
Slide 33 text
What is SQL?
• Structured Query Language
• The language databases speak
• Based on relational algebra
• Insert
• Query
• Update
• Delete
“SELECT Company, Country FROM Customers
WHERE Country = 'USA' ”
Slide 34
Slide 34 text
Why people <3 SQL
• Relational algebra is powerful
• SQL is proven
• well understood
• well documented
Slide 35
Slide 35 text
Why people don't <3 SQL
• Relational algebra is hard
• Different databases support different SQL
syntax
• Yet another programming language to learn
Slide 36
Slide 36 text
SQL != Database
• SQL is used to talk to a RDBMS (database)
• SQL is not a RDBMS
Slide 37
Slide 37 text
What is NoSQL?
Not A
Relational
Database
Slide 38
Slide 38 text
RDBMS
Slide 39
Slide 39 text
Types of NoSQL
• Distributed Systems
• Document Store
• Graph Database
• Key-Value Store
• Eventually Consistent Systems
Mix And Match ↑
Slide 40
Slide 40 text
Key Value Stores
• Non Relational
• Typically No Schema
• Map one Key (a string) to a Value (some
object)
Example: Redis
Slide 41
Slide 41 text
Key Value Example
require "redis"
redis = Redis.new
redis.set("foo", "bar")
redis.get("foo")
# => "bar"
Slide 42
Slide 42 text
Key Value Example
require "redis"
redis = Redis.new
redis.set("foo", "bar") # "foo" is the key, "bar" is the value
redis.get("foo")
# => "bar"
Slide 43
Slide 43 text
Key Value
• Like a database that can only ever be queried by
primary key (id)
YES
select * from users where id = 3;
NO
select * from users where name = 'schneems';
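The same restriction shows up with a plain Ruby Hash, used here as an analogy for a key-value store rather than as Redis itself. The keys and user data are hypothetical.

```ruby
# A Ruby Hash standing in for a key-value store (an analogy, not Redis).
store = {
  "user:3" => { name: "schneems" },
  "user:4" => { name: "anna" }
}

# YES: look up by key, one direct operation.
by_key = store["user:3"]

# NO: there is no index on values, so finding a name means
# scanning every entry in the store.
by_name = store.find { |_key, user| user[:name] == "schneems" }

puts by_key[:name]
puts by_name.first # the key that had to be discovered by scanning
```

If lookups by name matter, a common workaround is to maintain a second key, such as `"username:schneems"`, that points at the id.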
Slide 44
Slide 44 text
NoSQL @ Gowalla
• Redis (key-value store)
• Store “Likes” & Analytics
• Memcache (key-value store)
• Cache Database results
• Cassandra
• (eventually consistent, with-schema, key
value store)
• Store “feeds” or “timelines”
• Solr (search index)
Slide 45
Slide 45 text
Memcache
• Key-Value Store
• Open Source
• Distributed
• In memory (ram) only
• fast, but volatile
• Not ACID
• Memory object caching system
Slide 46
Slide 46 text
Memcache Example
memcache = Memcache.new # stand-in for a memcache client object
memcache.set("foo", "bar")
memcache.get("foo")
# => "bar"
Slide 47
Slide 47 text
Memcache
• Can store whole objects
memcache = Memcache.new # stand-in for a memcache client object
user = User.where(:username => "schneems").first
memcache.set("user:3", user)
user_from_cache = memcache.get("user:3")
user_from_cache == user
# => true
user_from_cache.username
# => "schneems"
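Storing a whole object works because the client turns it into bytes on the way in and rebuilds it on the way out. The sketch below assumes Ruby's built-in `Marshal` for that step, which Ruby memcache clients commonly use; the `User` struct and the `cache` hash are hypothetical stand-ins.

```ruby
# Hypothetical user object; a Struct keeps the sketch self-contained.
User = Struct.new(:id, :username)

# A Hash standing in for the cache server, which only stores bytes.
cache = {}

user = User.new(3, "schneems")

# set: serialize the object to a byte string before storing it.
cache["user:3"] = Marshal.dump(user)

# get: deserialize the bytes back into an equal, but separate, object.
user_from_cache = Marshal.load(cache["user:3"])

puts user_from_cache == user      # true: same field values
puts user_from_cache.equal?(user) # false: not the same object in memory
puts user_from_cache.username
```

Because the cached copy is a separate object, later changes to the original are not reflected in the cache until it is set again.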
Slide 48
Slide 48 text
Memcache @ Gowalla
• Cache Common Queries
• Decreases Load on DB (postgres)
• Enables higher throughput from DB
• Faster response than DB
• Users see quicker page load time
Slide 49
Slide 49 text
What to Cache?
• Objects that change infrequently
• users
• spots (places)
• etc.
• Expensive(ish) sql queries
• Friend ids for users
• User ids for people visiting spots
• etc.
Slide 50
Slide 50 text
Memcache Distributed
[Diagram: three memcache nodes, A, B, and C]
Slide 51
Slide 51 text
Memcache Distributed
[Diagram: nodes A, B, and C, with a new node D added]
Easily add more nodes
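One way a client can spread keys across nodes is to hash each key and pick a node from the result. The sketch below uses simple modulo hashing, which is an assumption made for brevity: `node_for` is hypothetical, and real clients often use consistent hashing so that adding a node moves fewer keys.

```ruby
require "zlib"

# Pick the node that owns a key by hashing the key (simple modulo scheme).
def node_for(key, nodes)
  nodes[Zlib.crc32(key) % nodes.size]
end

nodes = ["A", "B", "C"]
keys = (1..100).map { |id| "user:#{id}" }

# Which node owns each key with three nodes.
before = keys.map { |key| [key, node_for(key, nodes)] }.to_h

# Which node owns each key once node D is added.
after = keys.map { |key| [key, node_for(key, nodes + ["D"])] }.to_h

# Keys whose owner changed will miss in the cache until they are set again.
moved = keys.count { |key| before[key] != after[key] }

puts "user:3 lives on node #{before["user:3"]}"
puts "#{moved} of #{keys.size} keys moved after adding node D"
```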
Slide 52
Slide 52 text
Memcache <3’s DB
• We use them Together
• If memcache doesn’t have a value
• Fetch from the database
• Set the key in memcache with the value from the database
• Hard
• Cache Invalidation : (
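The read path and the invalidation problem can be sketched with two hashes. `CACHE` and `DATABASE` are hypothetical stand-ins for memcache and PostgreSQL, and the `fetch` and `update` helpers are illustrative names, not any client's API.

```ruby
# Hashes standing in for memcache and the database (hypothetical data).
CACHE = {}
DATABASE = { "user:3" => "schneems" }

# Read path: try the cache, fall back to the database, then fill the cache.
def fetch(key)
  return CACHE[key] if CACHE.key?(key)
  value = DATABASE[key]
  CACHE[key] = value
  value
end

# Write path: update the database and invalidate the stale cache entry.
def update(key, value)
  DATABASE[key] = value
  CACHE.delete(key)
end

fetch("user:3")             # miss: loaded from the database, then cached
fetch("user:3")             # hit: served from the cache
update("user:3", "richard") # cached copy removed so it cannot go stale
puts fetch("user:3")
```

Forgetting the `CACHE.delete` in any one write path leaves readers seeing old data, which is what makes invalidation the hard part.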
Slide 53
Slide 53 text
Redis
• Key Value Store
• Open Source
• Not Distributed (yet)
• Extremely Quick
• “Data structure server”
Slide 54
Slide 54 text
Redis Example, again
require "redis"
redis = Redis.new
redis.set("foo", "bar")
redis.get("foo")
# => "bar"
Slide 55
Slide 55 text
Redis - Has Data Types
• Strings
• Hashes
• Lists
• Sets
• Sorted Sets