Slide 1

Slide 1 text

©2014 DataStax Confidential. Do not distribute without consent. @rstml Rustam Aliyev Solution Architect Deep dive into CQL and CQL improvements in Cassandra 2.1 1

Slide 2

Slide 2 text

What is CQL? * Cassandra Query Language (CQL) * SQL-like language for communicating with Cassandra * Simpler than the Thrift API * An abstraction layer that hides implementation details This is what we want to understand

Slide 3

Slide 3 text

Use Case * Messaging Application * Group Conversations * Attachments

Slide 4

Slide 4 text

Simple CQL Table CREATE TABLE messages ( conversation_id uuid, message_id timeuuid, content text, sender text, PRIMARY KEY (conversation_id, message_id) );

Slide 5

Slide 5 text

TimeUUID * Also known as a Version 1 UUID * Sortable Timestamp to Microsecond + UUID = TimeUUID 04d580b0-9412-11e3-baa8-0800200c9a66 12 February 2014 13:18:06 GMT http://www.famkruithof.net/uuid/uuidgen" =

Slide 6

Slide 6 text

Primary Key CREATE TABLE messages ( conversation_id uuid, message_id timeuuid, content text, sender uuid, PRIMARY KEY (conversation_id, message_id) ); Partition Key Clustering Column * Also Primary Index

Slide 7

Slide 7 text

Partition Key conversation_id: 04d580b0-9412-…9a66 Replica * Determines partition (and replicas) * Remaining columns are stored on the determined partition RF=3

Slide 8

Slide 8 text

Clustering Column Merged, Sorted and Stored Sequentially 04d580b0-9412-…9a66 2013-04-03 07:01:00 content: Hi! sender: [email protected] 2013-04-03 07:03:20 content: Hello! Sender: tom@example… 2013-04-03 07:04:52 content: Where are you? sender: [email protected] 2013-04-03 07:05:01 content: in Istanbul sender: tom@example… 2013-04-03 07:06:32 content: wow! how come sender: [email protected] * Data on disk is ordered based on Clustering Column * Efficient retrieval with range queries (slice) SELECT * FROM messages WHERE conversation_id = '04d580b0-9412-…9a66' AND message_id > minTimeuuid('2013-04-03 07:04:00') AND message_id < maxTimeuuid('2013-04-03 07:10:00');

Slide 9

Slide 9 text

Data on Disk Partition Key (Row Key) Column Name 1 Column Value 1 Column Name 2 Column Value 2 Column Name 3 Column Value 3 ... Column Name N Column Value N

Slide 10

Slide 10 text

Data on Disk 04d580b0-9412-3a00-93d1-46196ee79a66 dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f: dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi! dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f: 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello! 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] ... Clustering Column (message_id) Column Name Column Value Partition Key (conversation_id)  INSERT  INTO  messages  (conversation_id,  message_id,  content,  sender)  VALUES      (04d580b0-­‐9412-­‐3a00-­‐93d1-­‐46196ee79a66,  2f3feb0f-­‐9c24-­‐11e2-­‐7f7f-­‐7f7f7f7f7f7f,          'Hello!',  '[email protected]');  

Slide 11

Slide 11 text

Order of Clustering Keys CREATE TABLE messages ( conversation_id uuid, message_id timeuuid, content text, sender text, PRIMARY KEY (conversation_id, message_id) ) WITH CLUSTERING ORDER BY (message_id DESC); * We need only most recent N messages * Storing messages in reverse TimeUUID order will speedup queries

Slide 12

Slide 12 text

Static Columns CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, PRIMARY KEY (conversation_id, message_id) ); * Let’s add conversation owner (admin) * Owner is related to conversation (Partition Key) not message (Clustering Key)

Slide 13

Slide 13 text

Static Columns UPDATE messages SET conversation_owner = '[email protected]' WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66; * Same UPDATE with non-static field will fail

Slide 14

Slide 14 text

Static Columns on Disk 04d580b0-9412-3a00-93d1-46196ee79a66 :null:conversation_owner [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f: dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi! dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f: 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello! 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] ... Static Column

Slide 15

Slide 15 text

Collections: Set CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set, PRIMARY KEY (conversation_id, message_id) ); * We want to keep message recipients * List of recipients may vary as people join and leave conversation

Slide 16

Slide 16 text

Collections: Set UPDATE messages SET recipients = {'[email protected]', '[email protected]'} WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Slide 17

Slide 17 text

Set on Disk 04d580b0-9412-3a00-93d1-46196ee79a66 :null:conversation_owner [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f: dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi! dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected] 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f: ... Set

Slide 18

Slide 18 text

Collections: Map CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set, attachments map, PRIMARY KEY (conversation_id, message_id) ); * Let’s add attachments to message * Each attachment would have name and location (URI)

Slide 19

Slide 19 text

Collections: Map UPDATE messages SET attachments = {'picture.png':'http://cdn.exmpl.com/1234.png', 'audio.wav':'http://cdn.exmpl.com/5678.wav'} WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Slide 20

Slide 20 text

Map on Disk 04d580b0-9412-3a00-93d1-46196ee79a66 :null:conversation_owner [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f: dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi! dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:picture.png http://cdn.exmpl.com/1234.png dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:audio.wav http://cdn.exmpl.com/5678.wav 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f: ... Map Name Key Value

Slide 21

Slide 21 text

Collections: List CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set, attachments map, seen_by list, PRIMARY KEY (conversation_id, message_id) ); * We want to know which participants have seen message and preserve order

Slide 22

Slide 22 text

Collections: List UPDATE messages SET seen_by = ['[email protected]', '[email protected]'] WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Slide 23

Slide 23 text

List on Disk 04d580b0-9412-3a00-93d1-46196ee79a66 :null:conversation_owner [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f: dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi! dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipient:[email protected] dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c10-f487-11e2-801f-df9895e5d0f8 [email protected] dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c11-f487-11e2-801f-df9895e5d0f8 [email protected] 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f: ... List Name Element ID (TimeUUID) Value

Slide 24

Slide 24 text

User Defined Types (UDT) CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set, seen_by list, attachments map, PRIMARY KEY (conversation_id, message_id) ); * New in Cassandra 2.1 * Let’s add more attributes to attachments CREATE TYPE attachment ( size int, mime text, uri text );

Slide 25

Slide 25 text

User Defined Types UPDATE messages SET attachments = attachments + { 'picture.png': { size: 10240, mime: 'image/png', uri: 'http://cdn.exmpl.com/1234.png' }} WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66 AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Slide 26

Slide 26 text

UDT on Disk 04d580b0-9412-3a00-93d1-46196ee79a66 :null:conversation_owner [email protected] dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f: dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi! dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected] dbcd9d0f-...-7f7f7f7f7f7f:recipient:[email protected] dbcd9d0f-...-7f7f7f7f7f7f:recipient:[email protected] dbcd9d0f-...-7f7f7f7f7f7f:attachments:picture.png 10240:'image/png':'http://cdn.exmpl.com/ 1234.png' 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f: ... Map Key UDT Value

Slide 27

Slide 27 text

Secondary Indexes CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set, seen_by list, attachments map, PRIMARY KEY (conversation_id, message_id) ); * What if we want to lookup messages by sender? CREATE INDEX sender_idx ON messages(sender); "

Slide 28

Slide 28 text

Secondary Indexes

Slide 29

Slide 29 text

Secondary Indexes Internally sender_idx { "[email protected]" { 54bbfd0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null, df04610f-9c02-11e2-7f7f-7f7f7f7f7f7f : null }, "[email protected]" { a82e4b0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null } } * Each node will keep reverse index for local data only

Slide 30

Slide 30 text

Indexes on Collections CREATE TABLE messages ( conversation_id uuid, conversation_owner text STATIC, message_id timeuuid, content text, sender text, recipients set, seen_by list, attachments map, PRIMARY KEY (conversation_id, message_id) ); * New in Cassandra 2.1 CREATE INDEX recipients_idx ON messages(recipients); "

Slide 31

Slide 31 text

Indexes on Collections

Slide 32

Slide 32 text

Way more information • 5 minute interviews • Use cases • Free training! www.planetcassandra.org

Slide 33

Slide 33 text

Questions?