Slide 1

NoSQL (Not Only SQL): Next-Generation Web-Scale Databases
A Brief Look at Apache Cassandra, a Distributed Database

Slide 2

Who am I
• Joe Alex
  – Software Architect / Data Scientist; loves to code in Java and Scala
  – Areas of interest: Big Data, data analytics, machine learning, Hadoop, Cassandra
  – Currently working as Team Lead for the Managed Security Services Portal at Verizon

Slide 3

New Face of Data: Scale Out, Not Up
• Big Data
  – user generated: Amazon; social networks such as Twitter, Facebook, Foursquare
  – machine generated: credit cards, RFID, POS terminals, cell phones, GPS, firewalls, routers
  – more and more connected
  – less structured
  – data sets becoming larger and larger
  – joins and relationships are exploding
  – cloud computing brings scaling and fault-tolerance needs
  – backing up is replaced by having multiple active copies
  – nodes can crash, and applications should survive
  – nodes can be added or removed at any point in time

Slide 4

New Face of Data: Internet of Things (real-world objects connected to the Internet)
  – "'Internet of Things' will infuse intelligence into all our systems and present us with a whole new way to run a home, an enterprise, a community or an economy. In a 4G world, wireless will connect everything, and there's really no limit to the number of connections that can be part of the mobile grid: vehicles, appliances, buildings, roads, medical monitors."
  – recently announced a partnership with American Security Logistics (ASL) to "wirelessly connect a series of location based tracking devices that can be used to help keep tabs on an array of valuables, from people to pets to pallets."
  – By 2013, the number of devices connected to the Internet will reach 1 trillion, up from 500 million in 2007.

Slide 5

New Face of Data: Scale Out, Not Up
• Traditional RDBMS
  – neither economical nor capable
  – scaling up doesn't work
  – scaling out with a traditional DB is not easy
    • scaling reads to a relational DB is hard
    • scaling writes is almost impossible; when you try, it is not relational anymore
  – sharding scales
    • but you lose all the features that make an RDBMS useful
    • it is an operational nightmare
  – volumes of data strain commercial RDBMSs
  – cloud computing: rethink how we store data. Understand your data and find the most efficient model
  – de-normalization: normalization strives to remove duplication, but duplication is an interesting alternative to joins
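To make the de-normalization point concrete, here is a minimal Python sketch contrasting a read that joins at query time with a read of a pre-joined copy written ahead of time. The `users`/`tweets` data and all function names are hypothetical, chosen only for illustration.

```python
# Hypothetical sample data: users and their tweets.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
tweets = [
    {"id": "t1", "user_id": "u1", "text": "hello"},
    {"id": "t2", "user_id": "u2", "text": "hi"},
]


def read_with_join(tweets, users):
    """Normalized model: every read joins a tweet to its author."""
    return [(t["text"], users[t["user_id"]]["name"]) for t in tweets]


def denormalize(tweets, users):
    """Denormalized model: copy the author's name into each tweet at write time."""
    return [dict(t, user_name=users[t["user_id"]]["name"]) for t in tweets]


def read_without_join(rows):
    """Each denormalized row already carries everything the read needs."""
    return [(r["text"], r["user_name"]) for r in rows]


rows = denormalize(tweets, users)
print(read_without_join(rows))  # [('hello', 'Alice'), ('hi', 'Bob')]
```

The cost moves to the write side: if Alice changes her name, every copied `user_name` has to be rewritten.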

Slide 6

New Face of Data: What Is Wrong with RDBMS
• Pros
  – SQL lets you query all data at once
  – enforces data integrity
  – minimizes repetition
  – proven
  – familiar to DBAs and users
• Cons
  – rigidly schematic
  – joins rapidly become a bottleneck
  – difficult to scale up
  – gets in the way of parallelization
  – optimizations such as sharding may undo the benefits of normalization

Slide 7

New Face of Data: What Is Good with NRDBMS
• Pros
  – schemaless
  – master-master replication
  – scales well
  – everything runs in parallel
  – built for the web
• Cons
  – integrity enforcement migrates to application code
  – limited ORM tooling
  – significant learning curve
  – proven only in a subset of cases
  – unlearning normalization is difficult

Slide 8

New Face of Data: What Is Good with NRDBMS
– Relational databases do not fit every problem
  • stuffing files into an RDBMS? Maybe there is something better
  • using an RDBMS for caching? Perhaps a lighter-weight solution is better
  • cramming log data into an RDBMS? Perhaps a key-value store is better
  • trying to do parallel processing with a DB? Maybe Hadoop MapReduce is better
  • executing a long-running process that takes a few hours? Maybe MapReduce with Hadoop/HBase is better and gets it done in minutes
– Despite the hype, RDBMSs are not doomed, but their role and place will certainly change
– Scaling is a real challenge for relational DBs
  • sharding is a band-aid, not feasible beyond a few nodes
– There is a cost in overcoming the initial learning curve
  • it changes how you build applications (JSP, JSF, JPA)
– Drop ACID and think about data

Slide 9

New Face of Data: What Is Good with NRDBMS
– Web apps need
  • elastic scalability
  • flexible schemas
  • geographic distribution
  • high availability
  • reliable storage
– Web apps can do without
  • complicated queries
  • strong transactions (some form of consistency is still desirable)
– Relational DB vs. NoSQL
  • strong consistency vs. eventual consistency
  • big datasets vs. huge datasets
  • scaling is possible vs. scaling is easy
  • SQL vs. MapReduce, APIs, etc.
  • good availability vs. very high availability

Slide 10

CAP Theorem: You Can't Have It All
– What is ACID
  • Atomic
  • Consistent
  • Isolated
  • Durable
– ACID trips up when
  • downtime is unacceptable
  • reliability requires two or more nodes
  • transactions must span a network

Slide 11

CAP Theorem: You Can't Have It All
• What is the CAP theorem
  – A distributed system can provide at most two of:
    • Consistency (data is correct at all times): ACID transactions
    • Availability (read and write all the time): total redundancy
    • Partition tolerance (the system keeps working when the network splits nodes apart): infinite scale-out
  – CA: corruption is possible if live nodes can't communicate
  – CP: completely inaccessible if any nodes are dead
  – AP: always available, but reads may not return the most recent write
  – Cassandra chooses A and P, but lets you tune requests toward more C
  – RDBMSs are typically CA
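The "tunable" part comes from choosing, per request, how many of the N replicas must answer a read (R) and acknowledge a write (W). The usual rule of thumb for Dynamo-style systems is that if R + W > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. The sketch below is plain Python, not Cassandra code; it checks that rule by brute force for small N.

```python
from itertools import combinations


def overlap_guaranteed(n: int, r: int, w: int) -> bool:
    """Brute force: does every set of r replicas share a member with every set of w replicas?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def quorum(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


# With N = 3 replicas, majority reads and writes (2 + 2 > 3) always overlap,
# while single-replica reads and writes (1 + 1 <= 3) can miss the latest write.
n = 3
print(overlap_guaranteed(n, quorum(n), quorum(n)))  # True
print(overlap_guaranteed(n, 1, 1))                  # False
```

In Cassandra this choice surfaces as per-request consistency levels such as ONE, QUORUM and ALL.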

Slide 12

CAP Theorem: You Can't Have It All
• What is BASE: an ACID alternative
  – Basically Available (appears to work all the time)
  – Soft state (doesn't have to be consistent all the time)
  – Eventually consistent (but eventually it will be)
  – BASE (basically available, soft state, eventually consistent) rather than ACID (atomicity, consistency, isolation, durability)

Slide 13

NoSQL: It Is Really Not Only SQL
• What problems does it solve
  – reliable and simple scaling
  – no single point of failure (all nodes are identical)
  – high write throughput
  – large data sets
  – scale out, not up
  – online load balancing and cluster growth
  – flexible schema
  – key-oriented queries
  – CAP aware

Slide 14

NoSQL: It Is Really Not Only SQL
• Many choices
  – Key/value stores (distributed hash tables): store entities as key-value pairs in large hash tables
    • Voldemort, Redis, Riak, SimpleDB, Tokyo Cabinet, Dynomite, MemcacheDB
  – Column-oriented (semi-structured): store entities by column
    • Cassandra, Bigtable, HBase, Hypertable, Azure Table services
  – Document (semi-structured): store documents (JSON)
    • CouchDB, MongoDB
  – Graph: store entities as nodes and edges
    • Neo4j

Slide 15

NoSQL: It Is Really Not Only SQL

Slide 16

Cassandra: Highly Scalable Distributed Database
• Created at Facebook
  – designed by Avinash Lakshman and Prashant Malik
  – open sourced by Facebook in 2008
  – entered the Apache Incubator in March 2009; graduated to a top-level Apache project in February 2010
  – Dynamo's fully distributed design
  – Bigtable's column-family-based data model

Slide 17

Cassandra: Highly Scalable Distributed Database
– Proven
  • the largest production cluster has over 100 TB of data on over 150 machines
– Fault tolerant
  • data is automatically replicated to multiple nodes for fault tolerance
  • replication across multiple data centers is supported
  • failed nodes can be replaced with no downtime
– Decentralized
  • every node in the cluster is identical
  • no network bottlenecks
  • no single point of failure (SPOF)
– You're in control
  • choose between synchronous or asynchronous replication for each update
  • highly available asynchronous operations are optimized with features like hinted handoff and read repair
– Rich data model
  • allows efficient use for many applications beyond simple key/value
– Elastic
  • read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications
– Durable
  • Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down
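Hinted handoff can be pictured with a toy coordinator: when a replica is down, the write is still accepted and a "hint" is kept so the missed update can be replayed once the replica comes back. This is a simplified, single-process Python sketch; the `Coordinator` class and its `mark_down`/`mark_up` methods are hypothetical names, and real Cassandra persists hints and handles many more cases.

```python
from collections import defaultdict


class Coordinator:
    """Toy coordinator that accepts writes even when a replica is down."""

    def __init__(self, replicas):
        self.stores = {name: {} for name in replicas}  # replica -> its key/value data
        self.up = set(replicas)                         # replicas currently reachable
        self.hints = defaultdict(list)                  # replica -> missed (key, value) writes

    def write(self, key, value):
        for name, store in self.stores.items():
            if name in self.up:
                store[key] = value
            else:
                # Keep a hint instead of failing the write.
                self.hints[name].append((key, value))

    def mark_down(self, name):
        self.up.discard(name)

    def mark_up(self, name):
        self.up.add(name)
        # Replay the missed writes, in order, to the recovered replica.
        for key, value in self.hints.pop(name, []):
            self.stores[name][key] = value


c = Coordinator(["a", "b", "c"])
c.mark_down("c")
c.write("user:42", "reston")  # "a" and "b" store it; "c" gets a hint
c.mark_up("c")                # the hint is replayed to "c"
print(c.stores["c"])          # {'user:42': 'reston'}
```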

Slide 18

Cassandra: Highly Scalable Distributed Database
– High availability: writes never fail
– Incremental scalability
– Eventually consistent (hinted handoff, read repair)
– Tunable trade-offs between consistency and latency
– Partitioning, replication
– Minimal administration
– No single point of failure (SPOF)
– Key-value store (with some structure)
– Schemaless
– MapReduce support
– Two read paths available: high-performance weak reads and quorum reads
– Reads and writes are atomic within a single column family
– Versioning and conflict resolution (last update wins)
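Last-update-wins conflict resolution and read repair fit together: each column carries a timestamp, a read compares the copies returned by the replicas, the newest copy wins, and stale replicas are brought up to date. The Python below is a minimal sketch under that assumption, with hypothetical replica contents and placeholder e-mail values; it does not model Cassandra's tie-breaking, tombstones, or the background timing of repair.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the client; the highest timestamp wins


def reconcile(versions):
    """Last write wins: keep the version with the largest timestamp (ties not handled specially)."""
    return max(versions, key=lambda col: col.timestamp)


def read_with_repair(replicas, name) -> Optional[Column]:
    """Read `name` from every replica (dicts of name -> Column), reconcile,
    and write the winner back to replicas holding a stale or missing copy."""
    versions = [rep[name] for rep in replicas if name in rep]
    if not versions:
        return None
    winner = reconcile(versions)
    for rep in replicas:
        current = rep.get(name)
        if current is None or current.timestamp < winner.timestamp:
            rep[name] = winner  # the read-repair step
    return winner


replicas = [
    {"email": Column("email", "old@example.com", 1)},
    {"email": Column("email", "new@example.com", 2)},
    {},  # this replica missed the write entirely
]
print(read_with_repair(replicas, "email").value)  # new@example.com
```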

Slide 19

Cassandra: Who Is Using It
• Used by: Twitter, Facebook, Digg, Rackspace, Reddit, IBM, Cisco, SimpleGeo, Cloudkick, Comcast, Mahalo, Ooyala, OpenX

Slide 20

Dynamo Architecture & Lookup
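The Dynamo-style lookup behind this diagram is consistent hashing. Every node owns a token on a ring, and each key is hashed onto the same ring. The first node at or after the key's position (wrapping around) is responsible for the key, and in the simplest placement strategy the next N-1 nodes clockwise hold the other replicas. Here is a minimal Python sketch. It assumes MD5 as the hash (in the spirit of Cassandra's RandomPartitioner) and uses hand-picked node tokens; the `Ring` class is hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # MD5 produces 128-bit values


def token(key: str) -> int:
    """Map a key to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token}; kept sorted by token for binary search
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str, n: int):
        """First node at or after the key's token (wrapping around), then the next n-1 nodes."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        count = min(n, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring({"A": 0, "B": RING_SIZE // 3, "C": 2 * RING_SIZE // 3})
print(ring.replicas("user:42", 2))  # two distinct, ring-adjacent nodes
```

Adding a node only moves the keys in the range it takes over, which is what lets the cluster grow online.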

Slide 21

Cassandra: Highly Scalable Distributed Database

Slide 22

Memtable and SSTable

Slide 23

Cassandra: Highly Scalable Distributed Database
• Writes
  – no reads
  – no seeks
  – sequential disk access
  – atomic within a column family
  – fast
  – any node
  – always writable (hinted handoff)
  – writes go to a commit log and to in-memory storage (the memtable)
  – the memtable is periodically flushed to disk as an SSTable
  – SSTables are periodically compacted
  – partitioner
  – wait for W responses
  – the client issues a write request to a random node in the Cassandra cluster; the partitioner determines the nodes responsible for the data
  – no locks in the critical path
  – always writable: accepts writes during failure scenarios
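The write path listed above (commit log, memtable, flush to SSTable, compaction) is the log-structured design Cassandra takes from Bigtable. This single-node Python toy shows why a write never needs to read or seek. It uses a hypothetical `Node` class and a tiny memtable limit so that flushes happen quickly.

```python
class Node:
    """Toy single-node write path: commit log -> memtable -> SSTable, plus compaction."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only; stands in for the sequential on-disk log
        self.memtable = {}    # key -> (timestamp, value), held in memory
        self.sstables = []    # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value, ts):
        self.commit_log.append((key, value, ts))  # sequential append: no read, no seek
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:
            self.memtable[key] = (ts, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # simplification: the whole log is discarded once flushed

    def compact(self):
        # Merge all SSTables into one, keeping the newest timestamp per key.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table:
                if key not in merged or ts >= merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [sorted(merged.items())]


node = Node(memtable_limit=2)
node.write("a", "1", ts=1)
node.write("b", "2", ts=2)  # memtable full -> flushed to the first SSTable
node.write("a", "3", ts=3)
node.write("c", "4", ts=4)  # flushed to the second SSTable
node.compact()
print(node.sstables)  # [[('a', (3, '3')), ('b', (2, '2')), ('c', (4, '4'))]]
```

Real SSTables also carry indexes and Bloom filters, and deletes are written as tombstones rather than removing data in place.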

Slide 24

Cassandra: Highly Scalable Distributed Database
• Reads
  – any node
  – read repair
  – usual cache conventions apply
  – Bloom filters are checked before SSTables
  – reads consult the memtable and SSTables
  – partitioner
  – wait for R responses
  – wait for N - R responses in the background and perform read repair
  – may read multiple SSTables
  – slower than writes (but still fast)
  – scales to billions of rows
  – read repair when replicas are out of sync
  – row cache avoids SSTable lookups
  – key cache avoids index scans
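A Bloom filter lets a read skip SSTables that certainly do not hold a key. It can return false positives but never false negatives, so a "no" is always safe to trust. Below is a minimal Python sketch of a Bloom filter and of a read that checks the memtable first and then the SSTables, newest first. The hashing scheme (SHA-256 with a per-function prefix) and all names are assumptions for illustration, not Cassandra's. Real Cassandra also merges a row's columns from every SSTable that may contain it, by timestamp, instead of stopping at the first hit.

```python
import hashlib


class BloomFilter:
    """k hash positions per key in a bit array: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def read(key, memtable, sstables):
    """Check the memtable, then SSTables newest first, skipping any whose
    Bloom filter rules the key out. `sstables` holds (BloomFilter, dict) pairs."""
    if key in memtable:
        return memtable[key]
    for bloom, table in sstables:
        if bloom.might_contain(key) and key in table:
            return table[key]
    return None


bloom = BloomFilter()
bloom.add("alice")
sstables = [(bloom, {"alice": "reston"})]
print(read("alice", {}, sstables))  # reston
print(read("zed", {}, sstables))    # None
```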

Slide 25

Cassandra: Highly Scalable Distributed Database
• Components: messaging service, gossip, failure detection, cluster state, partitioner, replication, commit log, memtable, SSTable, indexes, compaction, tombstones, hinted handoff, read repair, bootstrap, monitoring, admin tools

Slide 26

Compared with MySQL (on 50 GB of data)
• MySQL
  – writes: 300 ms
  – reads: 350 ms
• Cassandra
  – writes: 0.12 ms
  – reads: 15 ms

Slide 27

Clients
• The most common way to access Cassandra is via the Thrift interface
• Other clients exist for most languages
  – http://wiki.apache.org/cassandra/ClientExamples
  – Fauna: Twitter's Ruby client
  – Lazyboy: Digg's Python library

Slide 28

Data Model
• Cluster: the machines (nodes) in a logical Cassandra instance. A cluster can contain multiple keyspaces.
• Keyspace: a namespace for column families (analogous to a DB schema)
• Column families: contain multiple columns, referenced by row keys (analogous to a table)
• Super columns: columns that themselves have subcolumns

Slide 29

Data Model

Slide 30

Column
• The smallest unit of data, analogous to a name/value pair or attribute. The key is the ID.
• { "name": "emailAddress", "value": "[email protected]", "timestamp": 123456789 }

Slide 31

SuperColumn
• The value is a map of columns
• {
    name: "address",
    value: {
      street: { name: "street", value: "888 anywhere", timestamp: 123456789 },
      city: { name: "city", value: "reston", timestamp: 123456789 },
      zip: { name: "zip", value: "20190", timestamp: 123456789 }
    }
  }

Slide 32

Column Families
• Analogous to tables. Rows can have different columns, and columns can be created dynamically. Within a row, columns are always sorted by column name.
• User = {
    keyhole: { username: "keyhole", email: "[email protected]" },
    spacer: { username: "spacer", email: "[email protected]", phone: "(888) 888-8888" }
  }
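The column-family properties above (rows with different columns, columns created on first write, names kept sorted within a row) can be modeled with a small Python class. The `ColumnFamily` class is hypothetical, the e-mail values are placeholders, and sorting here uses plain string order; in Cassandra, the column family's comparator determines the order.

```python
import bisect


class ColumnFamily:
    """Toy column family: each row maps column names to values, with names kept sorted.
    Rows need not share columns, and a column exists as soon as it is written."""

    def __init__(self):
        self.rows = {}  # row key -> (sorted column names, {name: value})

    def insert(self, row_key, name, value):
        names, values = self.rows.setdefault(row_key, ([], {}))
        if name not in values:
            bisect.insort(names, name)  # keep columns sorted by name
        values[name] = value

    def get_row(self, row_key):
        names, values = self.rows.get(row_key, ([], {}))
        return [(name, values[name]) for name in names]


# The User example from the slide; the e-mail values are placeholders.
user = ColumnFamily()
user.insert("keyhole", "username", "keyhole")
user.insert("keyhole", "email", "keyhole@example.com")
user.insert("spacer", "username", "spacer")
user.insert("spacer", "phone", "(888) 888-8888")
user.insert("spacer", "email", "spacer@example.com")
print(user.get_row("spacer"))
# [('email', 'spacer@example.com'), ('phone', '(888) 888-8888'), ('username', 'spacer')]
```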

Slide 33

Column Families

Slide 34

Super Column Families

Slide 35

Column Families

Slide 36

Types of Queries
• Single column
• Slice
• Key range
• Querying: get(), multiget(), get_slice(), multiget_slice(), get_count(), get_range_slice()
• Column comparators: TimeUUID, LexicalUUID, UTF8, Long, Bytes, ...
• Updating: insert(), batch_insert(), remove(), batch_mutate(), remove key range
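Slice and key-range queries rely on sorted order: a slice picks a contiguous run of column names within one row, and a key-range query walks contiguous row keys. The functions below are hypothetical Python stand-ins that only mimic those semantics. They are not the Thrift API, whose `get_slice()` takes a `SlicePredicate` wrapping a `SliceRange` (start, finish, reversed, count). The `events` row is made-up example data.

```python
import bisect


def get_slice(row, start="", finish="", count=100):
    """Columns of one row whose names fall in [start, finish], in name order,
    at most `count` of them. An empty start or finish leaves that end open."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start) if start else 0
    hi = bisect.bisect_right(names, finish) if finish else len(names)
    return [(name, row[name]) for name in names[lo:hi][:count]]


def get_key_range(rows, start_key, end_key, count=100):
    """Rows whose keys fall in [start_key, end_key], in key order. This assumes an
    order-preserving partitioner; with a random partitioner rows come back in token order."""
    keys = [k for k in sorted(rows) if start_key <= k <= end_key]
    return [(k, rows[k]) for k in keys[:count]]


# A hypothetical time-series row: column names are dates, so a slice is a date range.
events = {"2010-01-05": "login", "2010-02-11": "logout", "2010-03-02": "login", "2011-01-01": "login"}
print(get_slice(events, "2010-01-01", "2010-12-31", count=2))
# [('2010-01-05', 'login'), ('2010-02-11', 'logout')]
```

Choosing column names so that the query you need becomes a contiguous slice (dates, TimeUUIDs) is the usual way to model data for this API.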

Slide 37

Cassandra
• Conclusions
  – You probably do not need an NRDBMS now, but you ought to learn one anyway
  – It's not just for Twitter and bleeding-edge startups: Amazon, Facebook, Google, IBM, and Microsoft all get this
  – Sometimes it is simply the right tool for the job
  – If you are in the cloud, you are going to use them
  – Best of both worlds: an external mapping layer such as a JPA driver
  – Next big thing: in-memory elastic DBs
    • memory can be much more efficient than disk
    • RAMClouds become much more attractive for apps with high throughput requirements

Slide 38

More…
• Other articles/videos about Cassandra
  – http://wiki.apache.org/cassandra/
  – #cassandra on irc.freenode.net
  – http://wiki.apache.org/cassandra/ArticlesAndPresentations

Slide 39

Questions?
Twitter: @joealex
Email: [email protected]