
Scaling the Web: Databases & NoSQL


This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.

This is a recording of a guest lecture I gave at the University of Texas School of Information.

In this talk I address the technologies and tools Gowalla (gowalla.com) uses, including Memcache, Redis, and Cassandra.

Find more on my blog:
http://schneems.com

Richard Schneeman

November 10, 2011

Transcript

  1. whoami • @Schneems • BSME with Honors from Georgia Tech • 5+ years experience Ruby & Rails • Work for @Gowalla • Rails 3.1 contributor : ) • 3+ years technical teaching
  2. Gowalla • 50 best websites NYTimes 2010 • Founded 2009 @ SXSW • 1 million+ Users • Undisclosed Visitors • Loves/highlights/comments/stories/guides • Facebook/Foursquare/Twitter integration • iPhone/Android/web apps • public API
  3. Gowalla Backend • Ruby on Rails • Uses the Ruby Language • Rails is the Framework
  4. The Web is Data • Username => String • Birthday => Int/Int/Int • Blog Post => Text • Image => Binary-file/blob • Data needs to be stored to be useful
  5. Gowalla Database • PostgreSQL • Relational (RDBMS) • Open Source • Competitor to MySQL • ACID compliant • Running on a Dedicated Managed Server
  6. Need for Speed • Throughput: • The number of operations per minute that can be performed • Pure Speed: • How long an individual operation takes
  7. Potential Problems • Hardware • Slow Network • Slow hard drive • Insufficient CPU • Insufficient RAM • Software • Too many Reads • Too many Writes
  8. Scaling Up versus Out • Scale Up: • More CPU, Bigger HD, More RAM etc. • Scale Out: • More machines • More machines • More machines • ...
  9. Scale Up • Bigger faster machine • More RAM • More CPU • Bigger ethernet bus • ... • Moore's Law • Diminishing returns
  10. Scale Out • Forget Moore's Law... • Add more nodes • Master/Slave Database • Sharding
  11. Master/Slave (diagram) • Master DB takes the Writes • Data is Copied to four Slave DBs • Slave DBs serve the Reads
  12. Master & Slave +/- • Pro • Increased read speed • Takes read load off of master • Allows us to Join across all tables • Con • Doesn't buy increased write throughput • Single Point of Failure in Master Node
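A minimal sketch of the routing described on slides 11 and 12, with plain Ruby hashes standing in for database servers. The class name `ReplicatedStore` and its methods are hypothetical, and copying to the slaves happens synchronously here only to keep the example short; real replicas usually lag behind the master.

```ruby
# Hypothetical in-memory model of master/slave replication.
# Hashes stand in for database servers; this is not a real driver API.
class ReplicatedStore
  attr_reader :master, :slaves

  def initialize(slave_count)
    @master = {}
    @slaves = Array.new(slave_count) { {} }
    @next_slave = 0
  end

  # Every write goes to the single master, then is copied to each slave.
  def write(key, value)
    @master[key] = value
    @slaves.each { |slave| slave[key] = value }
    value
  end

  # Reads rotate across the slaves (round robin), keeping load off the master.
  def read(key)
    slave = @slaves[@next_slave]
    @next_slave = (@next_slave + 1) % @slaves.size
    slave[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3")
```

Adding slaves spreads out reads, but every write still passes through the one master, which is why write throughput does not improve and the master remains a single point of failure.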
  13. Sharding +/- • Pro • Increased Write & Read throughput • No Single Point of Failure • Individual features can fail • Con • Cannot Join queries between shards
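One common way to shard is to hash each key and use the result to pick a server. The sketch below assumes that approach (the slides do not say how Gowalla shards), and `ShardedStore` is a hypothetical name with hashes standing in for separate databases.

```ruby
require "zlib"

# Hypothetical shard router: each shard is a plain Hash standing in for
# a separate database server.
class ShardedStore
  attr_reader :shards

  def initialize(shard_count)
    @shards = Array.new(shard_count) { {} }
  end

  # CRC32 gives a stable integer for a key, so the same key always
  # maps to the same shard.
  def shard_for(key)
    @shards[Zlib.crc32(key) % @shards.size]
  end

  def write(key, value)
    shard_for(key)[key] = value
  end

  def read(key)
    shard_for(key)[key]
  end
end

store = ShardedStore.new(3)
store.write("user:3", "schneems")
store.write("user:4", "gowalla")
puts store.read("user:3")
```

Because each key lives on exactly one shard, writes and reads spread across machines, but a query that needs rows from two shards has to be stitched together in application code.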
  14. What is a Database? • Relational Database Management System (RDBMS) • Stores Data Using Schema • A.C.I.D. compliant • Atomic • Consistent • Isolated • Durable
  15. RDBMS • Relational • Matches data on common characteristics in data • Enables "Join" & "Union" queries • Makes data modular
  16. Relational +/- • Pros • Data is modular • Highly flexible data layout • Cons • Getting desired data can be tricky • Over-modularization leads to many join queries • Trade off performance for searchability
  17. Schema Storage • Blueprint for data storage • Break data into tables/columns/rows • Give data types to your data • Integer • String • Text • Boolean • ...
  18. Schema +/- • Pros • Regularize our data • Helps keep data consistent • Converts to programming "types" easily • Cons • Must separately manage schema • Adding columns & indexes to existing large tables can be painful & slow
  19. ACID • Properties that guarantee database transactions are processed reliably • Atomic • Consistent • Isolated • Durable
  20. ACID • Atomic • Any database transaction is all or nothing • If one part of the transaction fails, it all fails • "An Incomplete Transaction Cannot Exist"
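To make "all or nothing" concrete, here is a small sketch of an atomic update on an in-memory hash. `TinyStore` and its `transaction` method are hypothetical names for illustration; a real RDBMS achieves this with logging and locking rather than by copying the data.

```ruby
# Hypothetical all-or-nothing update on an in-memory Hash.
# Changes are applied to a copy and only swapped in if the block succeeds.
class TinyStore
  attr_reader :data

  def initialize(data = {})
    @data = data
  end

  def transaction
    working_copy = @data.dup
    yield working_copy
    @data = working_copy   # commit: every change becomes visible together
    true
  rescue StandardError
    false                  # rollback: @data was never touched
  end
end

accounts = TinyStore.new({ "alice" => 100, "bob" => 0 })

# The transfer fails halfway through, so neither balance changes.
ok = accounts.transaction do |data|
  data["alice"] -= 150
  raise "insufficient funds" if data["alice"] < 0
  data["bob"] += 150
end

puts ok
puts accounts.data.inspect
```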
  21. ACID • Consistent • Any transaction will take the database from one consistent state to another • "Only Consistent data is allowed to be written"
  22. ACID • Isolated • No transaction should be able to interfere with another transaction • "The same field cannot be updated by two sources at the exact same time" • Example: a = 0, then a += 1 and a += 2 run at the same time: a = ??
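The `a = ??` question on slide 22 can be worked through by hand. The sketch below first replays one unlucky interleaving step by step, then models isolation with a `Mutex` so each read-modify-write runs as a unit. The variable names are illustrative, and a database would use transactions and locks rather than a Ruby mutex.

```ruby
# Two updates that each read a, then write back a new value.
# Interleaved without isolation, the second write clobbers the first.
a = 0
read_by_first  = a          # first update reads 0
read_by_second = a          # second update also reads 0
a = read_by_first + 1       # first writes 1
a = read_by_second + 2      # second writes 2, losing the +1
lost_update_result = a

# With isolation (modeled here by a Mutex), each read-modify-write
# runs as a unit, so both increments survive.
b = 0
lock = Mutex.new
threads = [1, 2].map do |amount|
  Thread.new { lock.synchronize { b += amount } }
end
threads.each(&:join)

puts lost_update_result   # 2: one update was lost
puts b                    # 3: both updates applied
```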
  23. ACID • Durable • Once a transaction is committed it will stay that way • "Save it once, read it forever"
  24. What is a Database? • RDBMS • Relational • Flexible • Has a schema • Most likely ACID compliant • Typically fast under low load or when optimized
  25. What is SQL? • Structured Query Language • The language databases speak • Based on relational algebra • Insert • Query • Update • Delete • Example: "SELECT Company, Country FROM Customers WHERE Country = 'USA'"
  26. Why people <3 SQL • Relational algebra is powerful • SQL is proven • Well understood • Well documented
  27. Why people </3 SQL • Relational algebra is hard • Different databases support different SQL syntax • Yet another programming language to learn
  28. SQL != Database • SQL is used to talk to an RDBMS (database) • SQL is not an RDBMS
  29. Types of NoSQL • Distributed Systems • Document Store • Graph Database • Key-Value Store • Eventually Consistent Systems • Mix and match the above
  30. Key Value Stores • Non Relational • Typically No Schema • Map one Key (a string) to a Value (some object) • Example: Redis
  31. Key Value • Like a database that can only ever use primary Key (id) • YES: select * from users where id = '3'; • NO: select * from users where name = 'schneems';
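The same YES/NO contrast can be shown with a Ruby `Hash` standing in for a key-value store. The key names (`user:3`, `username:schneems`) are illustrative assumptions, not keys from the talk.

```ruby
# A Ruby Hash as a stand-in for a key-value store: lookups by key are direct.
store = {
  "user:3" => { name: "schneems" },
  "user:4" => { name: "gowalla" }
}

# YES: fetch by key, the equivalent of "where id = '3'".
by_key = store["user:3"]

# NO (not natively): finding by value means scanning every entry...
by_scan = store.values.find { |user| user[:name] == "schneems" }

# ...or maintaining a second key by hand that acts as an index.
name_index = { "username:schneems" => "user:3" }
by_index = store[name_index["username:schneems"]]

puts by_key[:name]
```

The hand-built index works, but the application now has to keep it in sync on every write, which is the job a relational database does for you.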
  32. NoSQL @ Gowalla • Redis (key-value store) • Store "Likes" & Analytics • Memcache (key-value store) • Cache Database results • Cassandra (eventually consistent, with-schema, key-value store) • Store "feeds" or "timelines" • Solr (search index)
  33. Memcache • Key-Value Store • Open Source • Distributed • In memory (RAM) only • Fast, but volatile • Not ACID • Memory object caching system
  34. Memcache • Can store whole objects

      memcache = Memcache.new
      # .first turns the query into a single User record
      user = User.where(:username => "schneems").first
      memcache.set("user:3", user)
      user_from_cache = memcache.get("user:3")
      user_from_cache == user      # >> true
      user_from_cache.username     # >> "schneems"
  35. Memcache @ Gowalla • Cache Common Queries • Decreases Load on DB (Postgres) • Enables higher throughput from DB • Faster response than DB • Users see quicker page load time
  36. What to Cache? • Objects that change infrequently • users • spots (places) • etc. • Expensive(ish) SQL queries • Friend ids for users • User ids for people visiting spots • etc.
  37. Memcache <3's DB • We use them Together • If memcache doesn't have a value • Fetch from the database • Set the key from database • Hard • Cache Invalidation : (
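The steps on slide 37 (check the cache, fall back to the database, set the key, invalidate on change) can be sketched as follows. Hashes stand in for both memcache and the database, and `CachedUsers` with its method names is hypothetical rather than Gowalla's code.

```ruby
# Hashes stand in for memcache and the database; all names are hypothetical.
class CachedUsers
  attr_reader :db_reads

  def initialize(database)
    @database = database
    @cache = {}
    @db_reads = 0
  end

  def find(id)
    key = "user:#{id}"
    return @cache[key] if @cache.key?(key)   # cache hit
    @db_reads += 1
    user = @database[id]                     # cache miss: fetch from the database
    @cache[key] = user unless user.nil?      # set the key from the database
    user
  end

  def update(id, attributes)
    @database[id] = @database.fetch(id).merge(attributes)
    @cache.delete("user:#{id}")              # invalidate the stale entry
  end
end

users = CachedUsers.new({ 3 => { username: "schneems" } })
users.find(3)                  # miss: reads the database
users.find(3)                  # hit: served from the cache
users.update(3, { username: "richard" })
puts users.find(3)[:username]  # miss again, because the key was invalidated
puts users.db_reads
```

The hard part the slide points to is remembering every place that must call the invalidation step; a missed one leaves stale data in the cache.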
  38. Redis • Key Value Store • Open Source • Not Distributed (yet) • Extremely Quick • "Data structure server"
  39. Redis - Has Data Types • Strings • Hashes • Lists • Sets • Sorted Sets
  40. Redis Example, sets

      redis = Redis.new
      redis.sadd("foo", "bar")
      redis.smembers("foo")    # >> ["bar"]
      redis.sadd("foo", "fly")
      redis.smembers("foo")    # >> ["bar", "fly"] (set order is not guaranteed)
  41. Redis => Likeable • Very Fast response • ~ 50 queries per page view • ~ 1 ms per query • http://github.com/Gowalla/likeable
  42. Cassandra • Open Source • Distributed • Key Value Store • Eventually Consistent • Sort of not ACID • Uses a Schema • ColumnFamilies
  43. Cassandra • Distributed (diagram: nodes A, B, C, D) • Eventual Consistency • Data In • Copied To Extra Nodes ... Eventually
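A toy model of the "copied to extra nodes ... eventually" idea on slide 43. This is not Cassandra's actual replication protocol; `EventuallyConsistentCluster` and `replicate!` are hypothetical names, and a manual call stands in for background replication.

```ruby
# Hypothetical model: a write lands on one node immediately and is queued
# for the others; `replicate!` stands in for background replication.
class EventuallyConsistentCluster
  def initialize(node_names)
    @nodes = node_names.map { |name| [name, {}] }.to_h
    @pending = []
  end

  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    (@nodes.keys - [node]).each { |other| @pending << [other, key, value] }
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Deliver every queued copy, bringing all nodes up to date.
  def replicate!
    @pending.each { |node, key, value| @nodes[node][key] = value }
    @pending.clear
  end
end

cluster = EventuallyConsistentCluster.new(%w[A B C D])
cluster.write("A", "feed:3", "checked in")
stale = cluster.read("D", "feed:3")    # nil: D has not received the copy yet
cluster.replicate!
fresh = cluster.read("D", "feed:3")    # now matches what was written to A
puts fresh
```

Between the write and the replication step, a read from another node returns old data, which is the trade made in exchange for not having to coordinate every node on every write.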
  44. Tradeoffs • Every data store has them • Know your data store • Strengths • Weaknesses
  45. NoSQL vs. RDBMS • No Magic Bullet • Use Both!!! • Model data in a datastore you understand • Switch when/if you need to • Understand Your Options