
Web Scale with NoSQL

Sergejus

April 09, 2011

Transcript

  1. Who Am I?
     - Architect at
     - Running NoSQL servers in production
     - Blogger (http://sergejus.blogas.lt, @sergejusb)
     - Community member (http://dotnetgroup.lt)
     - Contact me via [email protected]
  2. in numbers
     - 600 000 000 users
     - 30 000 servers
     - 20+ TB of raw data per day
     - >20 PB of stored data
  3. Why NoSQL
     - Limited SQL scalability
         - Sharding and vertical partitioning
     - Limited SQL availability
         - Master / slave configuration
     - Limited SQL speed of read operations
         - Multiple read replicas
     - SQL limitations for huge amounts of data
         - Key / value / type columns
  4. NoSQL history
     - 2009, Eric Evans, no:sql(east)
     - NoSQL: open-source, distributed, non-relational databases
     - NoSQL: "not only SQL"
     - NoSQL → Big Data
  5. NoSQL characteristics (1/2)
     - Scalability
         - The ability to scale simple-operation throughput horizontally over many servers
     - BASE (Basically Available, Soft state, Eventual consistency)
         - A "weaker" concurrency model than the ACID transactions in most SQL systems
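Horizontally scaling simple-operation throughput usually means partitioning the key space across servers. A minimal sketch, assuming plain hash-modulo sharding, with in-memory dicts standing in for servers; the `ShardedStore` class and its methods are hypothetical and not any product's API:

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Hypothetical sketch: spread keys over N 'servers' (plain dicts) by hashing the key."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> Dict[str, str]:
        # md5 gives a hash that is stable across processes
        # (Python's built-in hash() of str is salted per process).
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", f"name-{i}")

print(store.get("user:42"))            # name-42
print([len(s) for s in store.shards])  # number of keys held by each shard
```

With plain modulo hashing, changing the number of shards remaps most keys; systems such as Cassandra and Riak use consistent hashing on a ring to limit that movement.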
  6. NoSQL characteristics (2/2)
     - Distributed
         - Efficient use of distributed indexes and RAM for data storage
     - Schema-less
         - The ability to define new attributes or data schemas dynamically
  7. CAP theorem
     - 2000, Eric Brewer
     - It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
         - Consistency
         - Availability
         - Partition tolerance
  8. NoSQL categories
     - Key / value store
     - Document database
     - Graph database
     - Columnar database
  9. Key / value store
     - <key, value> or Tuple<key, v1, ..., vn>
     - Simple operations
         - Get
         - Put
         - Delete
     - [Diagram: Key (Byte[]) → Value (Byte[])]
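The Get / Put / Delete contract over opaque byte arrays can be sketched in a few lines. This is a minimal in-memory illustration, assuming a single process; `KeyValueStore` is a hypothetical name, not the API of Redis, Riak or any other store:

```python
from typing import Dict, Optional


class KeyValueStore:
    """Minimal in-memory key/value store: opaque byte[] keys mapped to opaque byte[] values."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Overwrites any existing value; the store never interprets the bytes.
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: bytes) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
kv.put(b"session:17", b'{"user": "sergejus"}')
print(kv.get(b"session:17"))  # b'{"user": "sergejus"}'
kv.delete(b"session:17")
print(kv.get(b"session:17"))  # None
```

Because values are opaque, any structure (such as the Tuple<key, v1, ..., vn> form) has to be serialized by the client before `put`.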
  10. Key / value stores
      - Redis: (+) messaging, (-) no shards
      - Voldemort
      - Membase: (+) memcache interface
      - Riak
  11. Document database
      - Document == complex object
          - XML
          - YAML
          - JSON / BSON
      - Support for secondary indexes
      - Schema can be defined at runtime
      - Optional support for simple querying using Map / Reduce
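As a rough illustration of schema-less documents queried through Map / Reduce, here is a plain-Python sketch. The helpers `map_reduce`, `tags_mapper` and `sum_reducer` are hypothetical and do not reflect the API of CouchDB, MongoDB or any other product; the sample documents are made up:

```python
import json
from collections import defaultdict

# JSON documents that do not share a fixed schema.
documents = [
    json.loads('{"_id": 1, "type": "post", "author": "sergejus", "tags": ["nosql", "redis"]}'),
    json.loads('{"_id": 2, "type": "post", "author": "tdagys", "tags": ["nosql"]}'),
    json.loads('{"_id": 3, "type": "comment", "author": "sergejus", "text": "Nice!"}'),
]


def map_reduce(docs, mapper, reducer):
    """Run mapper over every document, group emitted pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def tags_mapper(doc):
    # Emit (tag, 1) for each tag; documents without a "tags" attribute emit nothing.
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_reducer(key, values):
    return sum(values)


print(map_reduce(documents, tags_mapper, sum_reducer))  # {'nosql': 2, 'redis': 1}
```

The mapper simply skips documents that lack the attribute, which is how a query copes with a schema defined at runtime.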
  12. Graph database
      - Graph == network
      - Basic constructs
          - Node
          - Edge
          - Properties
      - [Diagram: nodes "sergejus", "sergejus.blogas.lt" and "tdagys" connected by two "knows" edges]
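The node / edge / properties model can be sketched as an in-memory property graph. This is a hypothetical illustration, not Neo4j's or FlockDB's API; the node names and edges below are made up rather than taken from the slide's diagram:

```python
from collections import defaultdict


class PropertyGraph:
    """Hypothetical in-memory property graph: nodes and edges both carry property dicts."""

    def __init__(self):
        self.nodes = {}                     # node id -> properties
        self.out_edges = defaultdict(list)  # node id -> [(label, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def neighbors(self, node_id, label):
        return [target for (lbl, target, _) in self.out_edges[node_id] if lbl == label]

    def friends_of_friends(self, node_id, label="knows"):
        # Two-hop traversal: a query that needs self-JOINs in a relational schema.
        direct = set(self.neighbors(node_id, label))
        result = set()
        for friend in direct:
            result.update(self.neighbors(friend, label))
        return result - direct - {node_id}


g = PropertyGraph()
g.add_node("alice", blog="alice.example.com")
g.add_node("bob")
g.add_node("carol")
g.add_edge("alice", "knows", "bob", since=2009)
g.add_edge("bob", "knows", "carol", since=2010)

print(g.neighbors("alice", "knows"))  # ['bob']
print(g.friends_of_friends("alice"))  # {'carol'}
```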
  13. Graph databases
      - Neo4j: (-) paid version required for scaling
      - FlockDB: (+) fast, (-) limited functionality
  14. Columnar database
      - For HUGE amounts of data
      - Columns are added at runtime
      - Great scalability
          - Horizontal
          - Vertical
  15. Columnar database
      - Unusual data model
          - Key Space → Database
          - Column Family → Table
          - Columns and Super Columns
              - Super Column → array of Columns
              - Column → Tuple<Key, Value, Timestamp, TTL>
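To make the Key Space → Column Family → Column hierarchy concrete, here is a plain-Python sketch of columns as Tuple<Key, Value, Timestamp, TTL>. It assumes last-write-wins conflict resolution by timestamp and hides expired columns on read; the `Keyspace` and `Column` classes and the row data are hypothetical, and super columns are left out for brevity:

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Column:
    """Column -> Tuple<Key, Value, Timestamp, TTL>."""
    name: str
    value: str
    timestamp: float
    ttl: Optional[float] = None  # seconds; None means the column never expires

    def is_live(self, now: float) -> bool:
        return self.ttl is None or now < self.timestamp + self.ttl


class Keyspace:
    """Keyspace (~ database) -> column family (~ table) -> row key -> columns."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.families: Dict[str, Dict[str, Dict[str, Column]]] = {}

    def insert(self, family: str, row_key: str, column: Column) -> None:
        row = self.families.setdefault(family, {}).setdefault(row_key, {})
        current = row.get(column.name)
        # Last write wins: keep whichever version carries the newer timestamp.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get_row(self, family: str, row_key: str, now: Optional[float] = None) -> Dict[str, str]:
        now = time.time() if now is None else now
        row = self.families.get(family, {}).get(row_key, {})
        # Columns whose TTL has elapsed are hidden from reads.
        return {name: col.value for name, col in row.items() if col.is_live(now)}


ks = Keyspace("Blog")
ks.insert("Users", "user1", Column("name", "Sergejus", timestamp=100.0))
ks.insert("Users", "user1", Column("session", "abc", timestamp=100.0, ttl=60.0))
# Rows need not share columns: new columns are simply added at runtime.
ks.insert("Users", "user2", Column("city", "Vilnius", timestamp=100.0))

print(ks.get_row("Users", "user1", now=130.0))  # {'name': 'Sergejus', 'session': 'abc'}
print(ks.get_row("Users", "user1", now=200.0))  # {'name': 'Sergejus'}
```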
  16. Columnar databases
      - Cassandra: (+) easily scalable
      - HBase: (+) consistent, (+) part of Hadoop
      - Hypertable
  17. NoSQL limitations
      - ORDER BY?
          - Natural key order
      - GROUP BY?
          - Map / Reduce*
      - JOIN?
          - Multiple Map / Reduce*
      - SELECT *?
          - Multi-machine Map / Reduce*
      * if possible
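The workarounds on the limitations slide can be sketched in plain Python: one Map / Reduce pass stands in for GROUP BY, a second pass over two tagged inputs stands in for JOIN (a reduce-side join), and returning keys sorted mimics natural key order in place of ORDER BY. The `map_reduce` helper and the `orders` / `users` data are hypothetical and not any database's actual API:

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Group mapper output by key and reduce each group; keys come back in sorted (natural) order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


orders = [
    {"order_id": 1, "user_id": "u2", "amount": 30},
    {"order_id": 2, "user_id": "u1", "amount": 10},
    {"order_id": 3, "user_id": "u2", "amount": 5},
]
users = [
    {"user_id": "u1", "name": "Ann"},
    {"user_id": "u2", "name": "Ben"},
]

# GROUP BY user_id with SUM(amount): a single Map / Reduce pass.
totals = map_reduce(
    orders,
    mapper=lambda o: [(o["user_id"], o["amount"])],
    reducer=lambda key, amounts: sum(amounts),
)
print(totals)  # {'u1': 10, 'u2': 35}


# JOIN users with the totals: a second pass keyed by user_id, tagging each value
# with its source so the reducer can combine the two sides.
def join_mapper(record):
    if "name" in record:
        yield record["user_id"], ("user", record["name"])
    else:
        yield record["user_id"], ("total", record["total"])


def join_reducer(key, tagged):
    values = dict(tagged)
    return (values.get("user"), values.get("total", 0))


total_records = [{"user_id": k, "total": v} for k, v in totals.items()]
joined = map_reduce(users + total_records, join_mapper, join_reducer)
print(joined)  # {'u1': ('Ann', 10), 'u2': ('Ben', 35)}
```

Each additional join or grouping step adds another full pass over the data, which is why these operations are marked "if possible".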