Web Scale with NoSQL

Sergejus

April 09, 2011

Transcript

  1. Who Am I?
     - Architect at
     - Running NoSQL servers in production
     - Blogger (http://sergejus.blogas.lt, @sergejusb)
     - Community member (http://dotnetgroup.lt)
     - Contact me via [email protected]
  2. in numbers
     - 600 000 000 users
     - 30 000 servers
     - 20+ TB raw data per day
     - >20 PB stored data
  3. Why NoSQL
     - Limited SQL scalability
       - Sharding and vertical partitioning
     - Limited SQL availability
       - Master / slave configuration
     - Limited SQL speed of read operations
       - Multiple read replicas
     - SQL limitations for huge amounts of data
       - Key / value / type columns
  4. NoSQL history
     - 2009, Eric Evans, no:sql(est)
     - NoSQL: open source distributed databases, not relational SQL databases
     - NoSQL: not only SQL
     - NoSQL → Big Data
  5. NoSQL characteristics (1/2)
     - Scalability
       - The ability to horizontally scale simple-operation throughput over many servers
     - BASE
       - A “weaker” concurrency model than the ACID transactions in most SQL systems
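One way to picture "horizontally scaling simple-operation throughput over many servers" is hash partitioning of keys. The sketch below is illustrative only: the server names and the `pick_server` helper are assumptions, not something taken from the slides. Production systems often use consistent hashing instead, so that adding a server does not remap most keys.

```python
import hashlib
from typing import List


def pick_server(key: bytes, servers: List[str]) -> str:
    """Map a key to one server by hashing it (naive modulo partitioning)."""
    digest = hashlib.md5(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical server names, for illustration only.
servers = ["node-a", "node-b", "node-c"]
owner = pick_server(b"user:1", servers)
```

Every client that hashes the same key picks the same server, so simple get/put operations spread across machines without any central coordinator.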
  6. NoSQL characteristics (2/2)
     - Distributed
       - Efficient use of distributed indexes and RAM for data storage
     - Schema-less
       - The ability to dynamically define new attributes or data schema
  7. CAP theorem
     - 2000, Eric Brewer
     - It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
       - Consistency
       - Availability
       - Partition tolerance
  8. NoSQL categories
     - Key / value store
     - Document database
     - Graph database
     - Columnar database
  9. Key / value store
     - <key, value> or Tuple<key, v1, ..., vn>
     - Simple operations
       - Get
       - Put
       - Delete
     - Key (Byte[]) → Value (Byte[])
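The Get / Put / Delete interface above can be sketched as a tiny in-memory store. This is a toy illustration under stated assumptions (the class name `KeyValueStore` and the sample key are made up here), not the API of any particular product; keys and values are opaque bytes, mirroring the Byte[] → Byte[] mapping on the slide.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key/value store; keys and values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        """Insert or overwrite the value stored under key."""
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: bytes) -> bool:
        """Remove the key; return True if it existed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put(b"user:1", b"sergejus")
```

Because the store never inspects the value, anything beyond lookup by key (filtering, sorting, joining) has to happen in application code.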
  10. Key / value stores
      - Redis
        - (+) messaging
        - (-) no shards
      - Voldemort
      - Membase
        - (+) memcache interface
      - Riak
  11. Document database
      - Document == complex object
        - XML
        - YAML
        - JSON / BSON
      - Support for secondary indexes
      - Schema can be defined at runtime
      - Optional support for simple querying using Map / Reduce
  12. Graph database
      - Graph == network
      - Basic constructs
        - Node
        - Edge
        - Properties
      - Example: nodes sergejus (sergejus.blogas.lt) and tdagys, connected by "knows" edges
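The three constructs (node, edge, properties) can be sketched with a few lines of Python. This is a minimal illustration, not how any graph database is implemented; the `Graph` class, the `blog` property name, and the direction of the two "knows" edges are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


class Graph:
    """Toy property graph: named nodes plus labelled, directed edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Tuple[str, str, str]] = []  # (source, label, target)

    def add_node(self, name: str, **properties: str) -> None:
        self.nodes[name] = Node(name, dict(properties))

    def add_edge(self, source: str, label: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((source, label, target))

    def neighbours(self, source: str, label: str) -> List[str]:
        """Follow all edges with the given label that start at source."""
        return [t for s, l, t in self.edges if s == source and l == label]


graph = Graph()
graph.add_node("sergejus", blog="sergejus.blogas.lt")  # "blog" is a made-up property name
graph.add_node("tdagys")
graph.add_edge("sergejus", "knows", "tdagys")
graph.add_edge("tdagys", "knows", "sergejus")
```

Queries in this model are traversals: start at a node and follow edges by label, rather than joining tables.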
  13. Graph databases
      - Neo4j
        - (-) paid version required for scaling
      - FlockDB
        - (+) fast
        - (-) limited functionality
  14. Columnar database
      - For HUGE amounts of data
      - Columns are added at runtime
      - Great scalability
        - Horizontal
        - Vertical
  15. Columnar database
      - Unusual data model
      - Key Space → Database
      - Column Family → Table
      - Columns and Super Columns
        - Super Column → array of Columns
        - Column → Tuple<Key, Value, Timestamp, TTL>
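The nesting described above (Key Space → Column Family → row → Column) can be sketched with plain Python types. This is a simplified model for illustration: the `is_expired` helper, the sample row, and the choice to express both timestamp and TTL in seconds are assumptions, not the storage format of any specific database.

```python
from typing import Dict, List, NamedTuple, Optional


class Column(NamedTuple):
    """Column -> Tuple<Key, Value, Timestamp, TTL>."""
    key: str
    value: bytes
    timestamp: int              # write time, in seconds (assumption)
    ttl: Optional[int] = None   # lifetime in seconds; None means no expiry


def is_expired(column: Column, now: int) -> bool:
    """A column with a TTL disappears once its lifetime has elapsed."""
    return column.ttl is not None and now >= column.timestamp + column.ttl


# Super Column -> array of Columns
SuperColumn = List[Column]
# Column Family ("table"): row key -> column name -> Column
ColumnFamily = Dict[str, Dict[str, Column]]
# Key Space ("database"): column family name -> Column Family
Keyspace = Dict[str, ColumnFamily]

keyspace: Keyspace = {"users": {}}
row = keyspace["users"].setdefault("sergejus", {})
# Columns are added at runtime; different rows may hold different columns.
row["blog"] = Column("blog", b"sergejus.blogas.lt", timestamp=1000)
row["session"] = Column("session", b"abc", timestamp=1000, ttl=60)
```

The per-column timestamp is what lets replicas decide which write is newest, and the TTL gives each column its own expiry.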
  16. Columnar databases
      - Cassandra
        - (+) easily scalable
      - HBase
        - (+) consistent
        - (+) part of Hadoop
      - Hypertable
  17. NoSQL limitations
      - ORDER BY? → Natural key order
      - GROUP BY? → Map / Reduce*
      - JOIN? → Multiple Map / Reduce*
      - SELECT *? → Multi-machine Map / Reduce*

      *if possible
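To make the GROUP BY → Map / Reduce substitution concrete, here is a single-process sketch of the pattern. The `map_reduce` helper and the sample `visits` records are invented for illustration; a real deployment would run the map and reduce phases across many machines.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[dict],
    mapper: Callable[[dict], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Run mapper over every record, group emitted values by key, then reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Made-up sample data.
visits = [
    {"country": "LT", "hits": 3},
    {"country": "US", "hits": 5},
    {"country": "LT", "hits": 4},
]


def mapper(record: dict) -> Iterable[Tuple[str, int]]:
    # Emit (group key, value) pairs: the GROUP BY column and the measure.
    yield record["country"], record["hits"]


def reducer(key: str, values: List[int]) -> int:
    # Aggregate one group: the equivalent of SUM(hits).
    return sum(values)


# Roughly: SELECT country, SUM(hits) FROM visits GROUP BY country
totals = map_reduce(visits, mapper, reducer)
```

A JOIN follows the same shape but typically needs more than one pass, which is why the slide lists "Multiple Map / Reduce" for it.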