Slide 1

Slide 1 text

Apache Cassandra and Drivers
Overview of Apache Cassandra and DataStax Drivers
Bulat Shakirzyanov (@avalanche123), Sandeep Tamhankar (@stamhankar999)
https://goo.gl/cBsRVv

Slide 2

Slide 2 text

Introduction: Cassandra Overview

Slide 3

Slide 3 text

© 2015 DataStax, All Rights Reserved.
Cassandra Topology
[Diagram: a Cluster spanning two Datacenters, each containing Nodes and Clients]

Slide 4

Slide 4 text

Request Coordinator
Coordinator node: forwards requests to the corresponding replicas.
[Diagram: two Datacenters with Nodes and Clients; one Node acts as the Coordinator]

Slide 5

Slide 5 text

Row Replica
Replica node: stores a slice of the total rows of each keyspace.
[Diagram: two Datacenters with Nodes and Clients; a Coordinator and the Replica nodes for a row]

Slide 6

Slide 6 text

Token Ring
[Diagram: ring with twelve positions, numbered 1 to 12]

Slide 7

Slide 7 text

Token Ring
Murmur3 Partitioner: tokens range from -2^63 to +2^63 - 1

Slide 8

Slide 8 text

Token Ring
Murmur3 Partitioner: tokens range from -2^63 to +2^63 - 1
[Diagram: twelve nodes on the ring, each owning one token range: Node 11…12, Node 12…1, Node 1…2, Node 2…3, Node 3…4, Node 4…5, Node 5…6, Node 6…7, Node 7…8, Node 8…9, Node 9…10, Node 10…11]
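The ring lookup can be sketched in a few lines of Ruby. This is a toy model, not Cassandra's code: the `Node` struct, `build_ring`, and `owner_for` are hypothetical names, the tokens are simply spaced evenly over the signed 64-bit range, and no real Murmur3 hashing is performed. It assumes the usual convention that a token belongs to the first node whose token is greater than or equal to it, wrapping around at the end of the ring.

```ruby
# Toy token ring (illustrative only): evenly spaced tokens over the
# signed 64-bit range used by the Murmur3 partitioner.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

Node = Struct.new(:name, :token)

# Builds a ring of node_count nodes, sorted by token.
def build_ring(node_count)
  step = 2**64 / node_count
  (0...node_count).map do |i|
    Node.new("node#{i + 1}", MIN_TOKEN + i * step)
  end
end

# The owner is the first node whose token is >= the given token.
# Tokens past the last node wrap around to the first node.
def owner_for(ring, token)
  ring.bsearch { |node| node.token >= token } || ring.first
end

ring = build_ring(12)
puts owner_for(ring, MIN_TOKEN).name
puts owner_for(ring, 0).name
puts owner_for(ring, MAX_TOKEN).name
```

The binary search keeps the lookup logarithmic in the number of nodes, which is why the ring is kept sorted by token.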

Slide 9

Slide 9 text

Keyspaces
CREATE KEYSPACE default
WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 3 }

Slide 10

Slide 10 text

C* Data Partitioning
Partitioner: gets a token by hashing the partition key (PK) of a row.
[Diagram: Keyspace with RF = 3; Row with token(PK) = 1]

Slide 11

Slide 11 text

C* Replication Strategy
Replication strategy: determines the first replica for the row.
[Diagram: Keyspace with RF = 3; Row with token(PK) = 1 placed on node 1]

Slide 12

Slide 12 text

C* Replication Factor
Replication factor: specifies the total number of replicas for each row.
[Diagram: Keyspace with RF = 3; Row with token(PK) = 1]
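Putting the partitioner, replication strategy, and replication factor together, replica selection can be sketched as below. This is a simplified model in the spirit of SimpleStrategy (first replica is the token's owner, the rest are the next nodes clockwise). The twelve-position ring mirrors the slides, and `RingNode`, `RING`, and `replicas_for` are hypothetical names, not Cassandra or driver APIs.

```ruby
# Toy replica placement (illustrative only): a ring of twelve nodes
# whose tokens are simply 1..12, as in the ring diagram.
RingNode = Struct.new(:name, :token)

RING = (1..12).map { |i| RingNode.new("node#{i}", i) }.freeze

# Returns the replicas for a row token: the owning node first, then the
# following nodes clockwise until replication_factor nodes are chosen.
def replicas_for(ring, token, replication_factor)
  # Index of the first node whose token is >= the row token (wrap to 0).
  first = ring.index { |node| node.token >= token } || 0
  count = [replication_factor, ring.size].min
  (0...count).map { |offset| ring[(first + offset) % ring.size] }
end

puts replicas_for(RING, 1, 3).map(&:name).join(', ')
puts replicas_for(RING, 12, 3).map(&:name).join(', ')
```

With token(PK) = 1 and RF = 3, the sketch picks the first three nodes on the ring; a token near the end of the ring wraps around to the start.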

Slide 13

Slide 13 text

Consistency Level
RF = 3, CL = Quorum
[Diagram: Application, Coordinator, three Replicas, and other Nodes]

Slide 14

Slide 14 text

Consistency Level
RF = 3, CL = Quorum
[Diagram: Application, Coordinator, three Replicas, and other Nodes; an INSERT request is shown]
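The arithmetic behind RF = 3, CL = Quorum is small enough to write down. A quorum is a majority of the replicas, that is floor(RF / 2) + 1. The helper names below are hypothetical and only model the counting, not the coordinator's actual behavior.

```ruby
# Quorum is a majority of replicas: floor(RF / 2) + 1.
def quorum(replication_factor)
  replication_factor / 2 + 1
end

# Toy check: a request at CL = Quorum is satisfied once a majority
# of the replicas have acknowledged it.
def quorum_satisfied?(replication_factor, acks)
  acks >= quorum(replication_factor)
end

puts quorum(3)
puts quorum_satisfied?(3, 1)
puts quorum_satisfied?(3, 2)
```

With RF = 3 the quorum is 2, so the INSERT can succeed even if one of the three replicas does not respond.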

Slide 18

Slide 18 text

DataStax Drivers: Smart clients for Apache Cassandra

Slide 19

Slide 19 text

Goals of DataStax Drivers
• Consistent set of features across languages
• Asynchronous execution of requests
• Load balancing
• Fault tolerance
• Address resolution (multi-region!)
• Automatic cluster discovery and reconnection
• Flexible to the core
• Consistent terminology
• Open source

Slide 21

Slide 21 text

Asynchronous Execution: IO Reactor, Request Pipelining and Future Composition

Slide 22

Slide 22 text

Asynchronous Core
[Diagram labels: Application, Thread, Business Logic, Driver, Background Thread, IO Reactor]

Slide 23

Slide 23 text

Request Pipelining
[Diagram: Client and Server exchanging requests 1, 2 and 3, shown once without request pipelining and once with request pipelining]

Slide 24

Slide 24 text

What is a Future?
• Represents the result of an asynchronous operation
• Returned by any *_async method in the Ruby driver
  • execute_async
  • prepare_async
• Will block if asked for the true result
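To make the idea of a future concrete, here is a toy version built on plain Ruby threads. It is a sketch for illustration, not the Ruby driver's `Cassandra::Future`: the class name `ToyFuture` and its internals are assumptions, and only the three behaviors used in this deck are modeled (`get` blocks, `then` composes, `all` combines).

```ruby
# A toy future (NOT the driver's implementation): the work runs on a
# background thread and the result is fetched later.
class ToyFuture
  def initialize(&work)
    @thread = Thread.new(&work)
  end

  # Blocks the caller until the work is done, then returns its value.
  def get
    @thread.value
  end

  # Returns a new future with the block applied to this future's value.
  # If the block returns another ToyFuture, it is flattened.
  def then(&block)
    ToyFuture.new do
      result = block.call(get)
      result.is_a?(ToyFuture) ? result.get : result
    end
  end

  # Combines several futures into one future of an array of values.
  def self.all(futures)
    ToyFuture.new { futures.map(&:get) }
  end
end

user = ToyFuture.new { { 'username' => 'alice' } }
page = user.then { |u| ToyFuture.new { "page for #{u['username']}" } }

puts page.get
puts ToyFuture.all([user, page]).get.inspect
```

Nothing blocks until `get` is called, which is the property the bullets above describe: the application thread can keep composing work and only waits when it needs the true result.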

Slide 25

Slide 25 text

Future Composition
select_user = session.prepare('SELECT * FROM users WHERE id = ?')
select_page = session.prepare('SELECT * FROM pages WHERE slug = ?')

user_ids = [1, 2, 3, 4]

futures = user_ids.map do |id|
  future = session.execute_async(select_user, arguments: [id])

  future.then do |users|
    user = users.first
    future = session.execute_async(select_page, arguments: [user['username']])

    future.then do |pages|
      page = pages.first

      User.new(user, Page.new(page))
    end
  end
end

Cassandra::Future.all(futures).get

Slide 31

Slide 31 text

Future Composition
[# ... >, ... ]

Slide 32

Slide 32 text

Pop Quiz: How to make this faster?
select_user = session.prepare('SELECT * FROM users WHERE id = ?')
select_page = session.prepare('SELECT * FROM pages WHERE slug = ?')

user_ids = [1, 2, 3, 4]

futures = user_ids.map do |id|
  future = session.execute_async(select_user, arguments: [id])

  future.then do |users|
    user = users.first
    future = session.execute_async(select_page, arguments: [user['username']])

    future.then do |pages|
      page = pages.first

      User.new(user, Page.new(page))
    end
  end
end

Cassandra::Future.all(futures).get

Slide 33

Slide 33 text

Pop Quiz: How to make this faster?
# Prepare both statements asynchronously so the two round trips overlap.
user_future = session.prepare_async('SELECT * FROM users WHERE id = ?')
page_future = session.prepare_async('SELECT * FROM pages WHERE slug = ?')

user_ids = [1, 2, 3, 4]

futures = user_ids.map do |id|
  future = session.execute_async(user_future.get, arguments: [id])

  future.then do |users|
    user = users.first
    future = session.execute_async(page_future.get, arguments: [user['username']])

    future.then do |pages|
      page = pages.first

      User.new(user, Page.new(page))
    end
  end
end

Cassandra::Future.all(futures).get

Slide 34

Slide 34 text

Load Balancing: Principles and Implementations

Slide 35

Slide 35 text

Load Balancing
[Diagram labels: Application, Application Threads, Driver, Client, Cluster, Session, Load Balancing Policy, Pools, Nodes]

Slide 38

Slide 38 text

DataCenter Aware Balancing
Local nodes are queried first; if none are available, the request can be sent to a remote node.
[Diagram: two Datacenters, each with Nodes and Clients]
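The "local first, remote as fallback" rule amounts to ordering the hosts in a query plan. The sketch below is a plain-Ruby model under that assumption; `ToyHost` and `dc_aware_plan` are hypothetical names and the addresses are made up, so this is not the driver's policy class.

```ruby
# Toy host record (illustrative): address, datacenter name, and liveness.
ToyHost = Struct.new(:address, :datacenter, :up)

# Builds a query plan: live hosts in the local datacenter come first,
# followed by live hosts in remote datacenters as a fallback.
def dc_aware_plan(hosts, local_datacenter)
  live = hosts.select(&:up)
  local, remote = live.partition { |host| host.datacenter == local_datacenter }
  local + remote
end

hosts = [
  ToyHost.new('10.0.0.1', 'dc1', true),
  ToyHost.new('10.0.0.2', 'dc1', false),
  ToyHost.new('10.1.0.1', 'dc2', true)
]

puts dc_aware_plan(hosts, 'dc1').map(&:address).inspect
```

If every local host is down, the plan contains only remote hosts, which matches the fallback described on the slide.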

Slide 39

Slide 39 text

Token Aware Balancing
Routes requests directly to replicas.
Uses prepared statement metadata to get the token.
[Diagram: Client sending a request straight to the Replica nodes, bypassing other Nodes]

Slide 40

Slide 40 text

Other built-in policies
• Round Robin Policy
  • ignores topology
• White List Policy
  • only connect with certain hosts
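Both of these policies are simple enough to model in a few lines. The classes below are toy stand-ins with hypothetical names (`ToyRoundRobin`, `ToyWhiteList`); they only illustrate the ordering and filtering ideas and are not the driver's policy implementations.

```ruby
# Toy round robin: every plan starts one host later than the previous
# plan, regardless of datacenter or token ownership.
class ToyRoundRobin
  def initialize(hosts)
    @hosts = hosts
    @position = 0
  end

  def plan
    return [] if @hosts.empty?

    rotated = @hosts.rotate(@position)
    @position = (@position + 1) % @hosts.size
    rotated
  end
end

# Toy white list: wraps another policy and keeps only allowed hosts.
class ToyWhiteList
  def initialize(allowed, wrapped_policy)
    @allowed = allowed
    @wrapped_policy = wrapped_policy
  end

  def plan
    @wrapped_policy.plan.select { |host| @allowed.include?(host) }
  end
end

round_robin = ToyRoundRobin.new(%w[a b c])
puts round_robin.plan.inspect
puts round_robin.plan.inspect

white_list = ToyWhiteList.new(%w[a c], ToyRoundRobin.new(%w[a b c]))
puts white_list.plan.inspect
```

Writing the white list as a wrapper around another policy keeps the two concerns separate: one decides the order, the other decides which hosts are eligible.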

Slide 41

Slide 41 text

Fault Tolerance: Sources of Failure and Error Handling

Slide 42

Slide 42 text

Fault Tolerance
[Diagram: Application (Business Logic, Driver), Coordinator, three Replicas, and other Nodes]

Slide 43

Slide 43 text

Possible Failures: Invalid Requests, Network Timeouts, Server Errors
[Diagram: Application (Business Logic, Driver), Coordinator, three Replicas, and other Nodes]

Slide 44

Slide 44 text

Automatic Retry of Server Errors
[Diagram labels: Application, Application Threads, Driver, Client, Cluster, Session, Load Balancing Policy, Pools, Nodes]

Slide 47

Slide 47 text

Unreachable Consistency
[Diagram: Application (Business Logic, Driver), Coordinator, three Replicas, and other Nodes]

Slide 48

Slide 48 text

Read / Write Timeout Error
[Diagram: Application (Business Logic, Driver), Coordinator, three Replicas, and other Nodes]

Slide 50

Slide 50 text

Read / Write Timeout Error
[Diagram: Application (Business Logic, Driver), Coordinator, three Replicas, and other Nodes; the response is labeled "read / write timeout"]

Slide 51

Slide 51 text

Unavailable Error
[Diagram: Application (Business Logic, Driver), Coordinator, three Replicas, and other Nodes]

Slide 52

Slide 52 text

Unavailable Error
[Diagram: Application (Business Logic, Driver), Coordinator, three Replicas, and other Nodes; the response is labeled "unavailable"]
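The difference between the two errors can be modeled as a counting rule. The sketch below assumes the usual Cassandra behavior: the coordinator reports "unavailable" when it already knows too few replicas are alive to meet the consistency level, and a read / write timeout when enough replicas were alive but too few answered in time. The method name and its keyword arguments are hypothetical.

```ruby
# Toy classification of a request outcome at the coordinator.
#   required: replicas needed by the consistency level (e.g. 2 for quorum, RF = 3)
#   alive:    replicas the coordinator believes are up
#   acked:    replicas that responded before the timeout
def request_outcome(required:, alive:, acked:)
  # Too few live replicas: fail fast without attempting the request.
  return :unavailable if alive < required
  # Enough replicas were alive, but too few responded in time.
  return :timeout if acked < required

  :ok
end

puts request_outcome(required: 2, alive: 1, acked: 0)
puts request_outcome(required: 2, alive: 3, acked: 1)
puts request_outcome(required: 2, alive: 3, acked: 2)
```

The distinction matters for retries: after "unavailable" nothing was attempted on the replicas, whereas after a timeout some replicas may already have applied the request.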

Slide 53

Slide 53 text

Error Handling

Slide 54

Slide 54 text

Address Resolution: Topology Aware Client

Slide 55

Slide 55 text

Multiple Addresses
Within Datacenter: Private IPs
Across Datacenters: Public IPs
[Diagram: two Datacenters, each with Nodes and Clients]

Slide 56

Slide 56 text

Address Resolution
[Diagram labels: Application, Application Threads, Driver, Client, Cluster]

Slide 57

Slide 57 text

Address Resolution
[Diagram labels: Application, Application Threads, Driver, Client, Cluster, Address Resolution Policy, Node]

Slide 58

Slide 58 text

Address Resolution
[Diagram labels: Application, Application Threads, Driver, Client, Cluster, Address Resolution Policy, Control Connection, Nodes]

Slide 60

Slide 60 text

Address Resolution
[Diagram labels: Application, Application Threads, Driver, Client, Cluster, Address Resolution Policy, Control Connection, Session, Pools, Nodes]

Slide 61

Slide 61 text

EC2 Multi-Region Address Resolution
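The private-versus-public rule from the Multiple Addresses slide can be sketched as a single decision. This is a toy model: `ToyEndpoint` and `resolve_address` are hypothetical names, the IP addresses are placeholders, and it does not reproduce how the driver's EC2 multi-region policy actually discovers the addresses.

```ruby
# Toy endpoint record (illustrative): each node advertises two addresses.
ToyEndpoint = Struct.new(:datacenter, :private_ip, :public_ip)

# Picks the address a client should connect to: the private IP when the
# node is in the client's datacenter, the public IP otherwise.
def resolve_address(endpoint, client_datacenter)
  if endpoint.datacenter == client_datacenter
    endpoint.private_ip
  else
    endpoint.public_ip
  end
end

local_node  = ToyEndpoint.new('us-east', '10.0.0.5', '203.0.113.5')
remote_node = ToyEndpoint.new('eu-west', '10.1.0.7', '203.0.113.7')

puts resolve_address(local_node, 'us-east')
puts resolve_address(remote_node, 'us-east')
```

Keeping this decision behind a policy object lets the same application code run unchanged inside one datacenter or across regions.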

Slide 62

Slide 62 text

More
• Request Tracing
• Execution Information
  • which node was used, # retries for query, etc.
• State Listeners
  • node goes down/comes up, schema changes, etc.
• Result Paging
• SSL and Authentication

Slide 63

Slide 63 text

Questions