An Introduction to Cassandra

Jeff Chao
April 07, 2014

Architecture, data modeling, and cross-datacenter operations.

Transcript

  1. Cassandra is… a distributed columnar data store with no single point of failure that is optimized for writes and availability
  2. Cassandra is… a distributed columnar data store with no single point of failure that is optimized for writes and availability, with tunable consistency and node auto-discovery
  3. When a node is down or unresponsive, the coordinator stores a hint locally. Hinted writes don't count toward consistency level requirements.
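
The hint behaviour on slide 3 can be sketched with a toy model. This is only an illustration under stated assumptions: `Replica`, `Coordinator`, `replay_hints`, and `quorum` are hypothetical names, not Cassandra classes, and the model applies the write to live replicas even when the consistency level ends up unmet, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a node that owns a copy of the data."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


class Coordinator:
    """Toy coordinator (assumed design, not Cassandra's implementation)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica name, key, value) stored locally

    def write(self, key, value, required_acks):
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                # A hint is recorded for the down replica,
                # but it does not count as an acknowledgement.
                self.hints.append((replica.name, key, value))
        return acks >= required_acks

    def replay_hints(self):
        """Deliver stored hints to replicas that have come back up."""
        by_name = {r.name: r for r in self.replicas}
        remaining = []
        for name, key, value in self.hints:
            replica = by_name[name]
            if replica.up:
                replica.data[key] = value
            else:
                remaining.append((name, key, value))
        self.hints = remaining
```

With three replicas and two of them down, `write("k", "v", quorum(3))` returns `False` in this model even though two hints were stored, which mirrors the statement that hinted writes don't satisfy the consistency level.
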
  4. CQL

  5. Example 1

    name   age  role
    alice  30   writer
    bob    35   legal
    steve  38   ceo

    SELECT * from employees;
    SELECT * from employees WHERE name = 'alice';
    SELECT * from employees WHERE age = '30';
  6. Example 1

           age  role
    alice  30   writer

           age  role
    bob    35   legal

           age  role
    steve  38   ceo
  7. Example 1

    alice  age: 30  role: writer
    bob    age: 35  role: legal
    steve  age: 38  role: ceo
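
The per-row view in Example 1 can be read as a map from row key to columns. The sketch below assumes `name` is the table's key, which the slides suggest but do not state; `by_name` and `by_age` are hypothetical helpers written only to contrast a key lookup with a lookup on a non-key column.

```python
# Rows keyed by name, each holding its own columns (as drawn in Example 1).
employees = {
    "alice": {"age": 30, "role": "writer"},
    "bob": {"age": 35, "role": "legal"},
    "steve": {"age": 38, "role": "ceo"},
}


def by_name(name):
    """Key lookup: a single access, analogous to WHERE name = ..."""
    return employees.get(name)


def by_age(age):
    """Non-key lookup: every row must be visited, analogous to a full scan."""
    return [name for name, cols in employees.items() if cols["age"] == age]
```

In this model the first lookup touches one row, while the second cost grows with the table. Cassandra rejects a filter on a non-key, non-indexed column by default for that reason, which is the contrast the three SELECT statements in Example 1 appear to set up.
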
  8. Example 2

    CREATE TABLE employees (
      company varchar,
      name varchar,
      age int,
      role varchar,
      primary key (company, name)
    );
  9. Example 2

    company  name   age  role
    CRM      alice  30   writer
    CRM      bob    35   legal
    GOOG     steve  38   ceo
    GOOG     ben    43   dev
    GOOG     mary   25   dev

    SELECT * from employees;
    SELECT * from employees WHERE company = 'CRM';
    SELECT * from employees WHERE company = 'CRM' and name = 'alice';
    SELECT * from employees WHERE name = 'alice';
  10. Example 2

          alice:age  alice:role  bob:age  bob:role
    CRM   30         writer      35       legal

          steve:age  steve:role  ben:age  ben:role  mary:age  mary:role
    GOOG  38         ceo         43       dev       25        dev
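
The wide-row layout in Example 2 can be reproduced with a short sketch: one partition per `company`, with each cell named `<name>:<column>`. The helpers `to_wide_rows` and `select` are hypothetical and only mimic the picture on the slide; Cassandra's actual on-disk encoding differs, and this toy keeps insertion order rather than sorting by clustering value.

```python
from collections import defaultdict

# Rows from the Example 2 table: (company, name, age, role).
rows = [
    ("CRM", "alice", 30, "writer"),
    ("CRM", "bob", 35, "legal"),
    ("GOOG", "steve", 38, "ceo"),
    ("GOOG", "ben", 43, "dev"),
    ("GOOG", "mary", 25, "dev"),
]


def to_wide_rows(rows):
    """Group rows by partition key and prefix each cell with the clustering value."""
    partitions = defaultdict(dict)
    for company, name, age, role in rows:
        partitions[company][f"{name}:age"] = age
        partitions[company][f"{name}:role"] = role
    return dict(partitions)


def select(partitions, company, name=None):
    """Read one partition, optionally narrowed to one clustering value."""
    cells = partitions.get(company, {})
    if name is None:
        return cells
    prefix = f"{name}:"
    return {k: v for k, v in cells.items() if k.startswith(prefix)}
```

Note that `select` always needs `company` first: in this model there is no way to find `alice` without knowing which partition to open, which matches the last query in Example 2 having no partition key to start from.
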
  11. Example 3

    CREATE TABLE employees (
      company varchar,
      status varchar,
      role varchar,
      loc varchar,
      name varchar,
      age int,
      primary key ((company, status), role, loc)
    );
  12. Example 3

    company  status  role    loc  name   age
    CRM      new     writer  CA   alice  30
    CRM      new     legal   TX   bob    35
    CRM      new     ceo     WA   steve  38
    CRM      old     dev     MD   ben    43
    GOOG     mid     dev     AZ   mary   25

    SELECT * from employees;
    SELECT * from employees WHERE company = 'CRM' and status = 'new';
    SELECT * from employees WHERE company = 'CRM' and status = 'new' AND role = 'ceo';
    SELECT * from employees WHERE company = 'CRM';
    SELECT * from employees WHERE name = 'alice';
    SELECT * from employees WHERE company = 'CRM' AND status = 'new' and loc = 'WA';
  13. Example 3

               dev:MD:name  dev:MD:age
    CRM, old   ben          43

               writer:CA:name  writer:CA:age  legal:TX:name  legal:TX:age  ceo:WA:name  ceo:WA:age
    CRM, new   alice           30             bob            35            steve        38

               dev:AZ:name  dev:AZ:age
    GOOG, mid  mary         25
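
The queries in Example 3 differ in whether the primary key can serve them. A rough sketch of the rule, assuming equality-only restrictions, no secondary indexes, and no ALLOW FILTERING: every partition key column must be given, and clustering columns may only be restricted as a leading prefix of their declared order. `is_servable` is a hypothetical helper, not a Cassandra API.

```python
# Key layout from the Example 3 CREATE TABLE statement.
PARTITION_KEY = ("company", "status")
CLUSTERING = ("role", "loc")


def is_servable(where_columns):
    """Return True if equality filters on these columns fit the primary key."""
    cols = set(where_columns)
    if not cols:
        return True  # no WHERE clause: a full scan of the table
    if not set(PARTITION_KEY) <= cols:
        return False  # the partition key must be fully specified
    rest = cols - set(PARTITION_KEY)
    # Count how many clustering columns are restricted, in declared order.
    prefix_len = 0
    for column in CLUSTERING:
        if column not in rest:
            break
        prefix_len += 1
    # Anything left over is a skipped clustering column or a non-key column.
    return rest == set(CLUSTERING[:prefix_len])
```

Applied to the six queries of Example 3 in order, this sketch accepts the first three and rejects the last three: `company` alone is only half of the partition key, `name` is not part of the key, and restricting `loc` skips `role`.
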
  14. Feature comparison

    Feature                   Cassandra              Dynamo
    TTL                       yes                    no
    Hadoop integration        m/r, hive, pig         m/r, hive
    multi-datacenter          full cross-region      multi availability zones only
    idempotent write batches  yes                    no
    largest value supported   2GB                    64KB
    conditional updates       no                     yes
    backups                   snapshot, incremental  manually with EMR, s3
  15. In closing…

    • Distributed store, no SPoF
    • Tunable consistency
    • Really fast writes
    • Fast reads (with good data modeling)
    • Trade-off between data modeling/app code and scaling and ops