
Apache Cassandra and Drivers

From Cassandra user group meetup:

This is a two-part talk in which we'll go over the architecture that enables Apache Cassandra’s linear scalability, as well as how the DataStax Drivers take full advantage of it to give developers well-designed, fast clients that are extensible to the core.

In the first part of this talk, Bulat will demystify Cassandra terms like replica, coordinator, replication factor, token map, conflict resolution and consistency level. And after that, Sandeep will teach you everything there is to know about various features of the DataStax Drivers, such as asynchronous APIs, futures, error handling, load balancing and retry policies, address resolution, schema metadata and host state listeners.

Bulat Shakirzyanov

January 28, 2016

Transcript

  1. Apache Cassandra and Drivers: Overview of Apache Cassandra and DataStax Drivers
    Bulat Shakirzyanov (@avalanche123), Sandeep Tamhankar (@stamhankar999)
    https://goo.gl/cBsRVv
  2. Cassandra Topology
    Diagram: a cluster of two datacenters, each with four nodes and two clients.
    © 2015 DataStax, All Rights Reserved.
  3. Request Coordinator
    Diagram: the same cluster, with one node acting as coordinator for a client's request.
    Coordinator node: forwards requests to the corresponding replicas.
  4. Row Replica
    Diagram: a coordinator and the replica nodes for one row, spread across two datacenters.
    Replica node: stores a slice of the total rows of each keyspace.
  5. Token Ring
    Diagram: twelve nodes on a ring, each owning one token range (1…2, 2…3, …, 11…12, and 12…1, which wraps around). With the Murmur3 Partitioner, tokens run from $-2^{63}$ to $2^{63} - 1$.
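
To make ring ownership concrete, here is a minimal Ruby sketch, assuming 12 nodes with evenly spaced tokens and the convention that each node owns the range ending at its own token. `ring_tokens` and `owner_index` are hypothetical helpers, not driver API, and real clusters usually assign tokens differently (for example with vnodes).

```ruby
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

# Hypothetical: evenly spaced tokens for node_count nodes, in ascending (ring) order.
def ring_tokens(node_count)
  step = 2**64 / node_count
  Array.new(node_count) { |i| MIN_TOKEN + (i + 1) * step }
end

# Hypothetical: index of the node owning `token`, i.e. the first node whose
# token is >= it. Tokens past the last node's token wrap around to node 0.
def owner_index(tokens, token)
  tokens.bsearch_index { |t| t >= token } || 0
end

tokens = ring_tokens(12)
owner_index(tokens, tokens[5])     # => 5
owner_index(tokens, tokens[5] + 1) # => 6
owner_index(tokens, MAX_TOKEN)     # => 0 (wraps around the ring)
```

Binary search works here because the node tokens are kept sorted, which is also what makes the "ring" a simple ordered list.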
  6. Keyspaces
    ```cql
    CREATE KEYSPACE default
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    ```
  7. C* Data Partitioning
    Diagram: a row in a keyspace with RF = 3 and token(PK) = 1.
    Partitioner: gets a token by hashing the partition key of a row (the first part of its primary key).
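
A sketch of the partitioner step under a loud assumption: Ruby's standard library has no Murmur3, so this stand-in hashes the partition key with MD5 and reads the first 8 bytes as a signed 64-bit token. The tokens it produces are not the ones Cassandra computes. The point is only the shape of the step: key in, 64-bit token out. `token_for` is hypothetical.

```ruby
require 'digest/md5'

# STAND-IN partitioner (hypothetical): MD5 instead of Murmur3, and the key is
# hashed as a string rather than in its CQL binary encoding. The result is a
# signed 64-bit integer, so it falls in -2**63 .. 2**63 - 1 like Murmur3 tokens.
def token_for(partition_key)
  Digest::MD5.digest(partition_key.to_s).unpack1('q>')
end

token_for(1) == token_for(1) # => true (same key, same token)
token_for(1) == token_for(2) # => false (different keys, different tokens here)
```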
  8. C* Replication Strategy
    Diagram: a row with token(PK) = 1 in a keyspace with RF = 3.
    Replication strategy: determines the first replica for the row.
  9. C* Replication Factor
    Diagram: the same row, with token(PK) = 1 and RF = 3.
    Replication factor: specifies the total number of replicas for each row.
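
Taken together, the strategy and the factor pick a row's replica set. A minimal sketch, assuming SimpleStrategy-style placement in which the node owning the token holds the first replica and the next RF - 1 nodes clockwise hold the rest. `replicas_for` is a hypothetical helper; NetworkTopologyStrategy additionally accounts for racks and datacenters.

```ruby
# Hypothetical SimpleStrategy-style placement: the owner of the row's token
# holds the first replica; the following nodes clockwise hold the others.
# A ring cannot hold more distinct replicas than it has nodes.
def replicas_for(nodes, owner_index, replication_factor)
  count = [replication_factor, nodes.size].min
  Array.new(count) { |i| nodes[(owner_index + i) % nodes.size] }
end

nodes = (1..12).map { |n| "node#{n}" }
replicas_for(nodes, 0, 3)  # => ["node1", "node2", "node3"]
replicas_for(nodes, 11, 3) # => ["node12", "node1", "node2"]
```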
  10. Consistency Level
    Diagram: an application, a coordinator node and three replicas; RF = 3, CL = Quorum.
  11. Consistency Level: INSERT
    Diagram: the application sends an INSERT through the coordinator to the three replicas; RF = 3, CL = Quorum.
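
The arithmetic behind CL = Quorum, as a short sketch with hypothetical helper names: a quorum is a strict majority of the RF replicas, so with RF = 3 the coordinator reports the INSERT as successful once 2 replicas have acknowledged it. By the usual rule of thumb, a read is guaranteed to touch at least one replica that took the latest acknowledged write when R + W > RF.

```ruby
# Replica acknowledgements a QUORUM request waits for: a strict majority.
def quorum(replication_factor)
  replication_factor / 2 + 1
end

# Read and write replica sets must share at least one replica when the
# number of replicas read plus the number written exceeds RF.
def overlapping?(read_acks, write_acks, replication_factor)
  read_acks + write_acks > replication_factor
end

rf = 3
quorum(rf)                               # => 2
overlapping?(quorum(rf), quorum(rf), rf) # => true  (QUORUM writes + QUORUM reads)
overlapping?(1, 1, rf)                   # => false (ONE writes + ONE reads)
```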
  15. Goals of DataStax Drivers
    • Consistent set of features across languages
    • Asynchronous execution of requests
    • Load balancing
    • Fault tolerance
    • Address resolution (multi-region!)
    • Automatic cluster discovery and reconnection
    • Flexible to the core
    • Consistent terminology
    • Open source
  16. Asynchronous Core
    Diagram: the application thread runs business logic, while a driver background thread runs the IO reactor.
  17. Request Pipelining
    Diagram: client/server timelines for requests 1, 2 and 3, without request pipelining (each request waits for the previous response) and with request pipelining (requests are sent back to back and responses return as they complete).
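
Pipelining works because each request on a connection carries a stream id, so the client can send several requests without waiting and still match every response to its request when responses arrive out of order. A minimal in-memory sketch of that bookkeeping; `PipelinedConnection` and its methods are hypothetical, no sockets or protocol framing are involved, and real stream ids come from a bounded pool that gets reused.

```ruby
# Hypothetical, in-memory model of one pipelined connection.
# Each request gets a stream id; responses are matched back to their
# callbacks by that id, in whatever order they arrive.
class PipelinedConnection
  def initialize
    @next_stream_id = 0
    @pending = {} # stream id => callback waiting for that response
  end

  # "Writes" a request without waiting for earlier responses.
  # (Simplification: ids only grow here; real drivers reuse a bounded set.)
  def send_request(_payload, &callback)
    stream_id = @next_stream_id
    @next_stream_id += 1
    @pending[stream_id] = callback
    stream_id
  end

  # Handles a response frame; responses may arrive in any order.
  def receive_response(stream_id, body)
    callback = @pending.delete(stream_id)
    raise ArgumentError, "unknown stream id #{stream_id}" unless callback

    callback.call(body)
  end

  def in_flight
    @pending.size
  end
end

conn = PipelinedConnection.new
results = {}
ids = %w[q1 q2 q3].map do |query|
  conn.send_request(query) { |body| results[query] = body }
end
conn.in_flight # => 3 (all sent before any reply)
conn.receive_response(ids[2], 'r3') # replies arrive out of order
conn.receive_response(ids[0], 'r1')
conn.receive_response(ids[1], 'r2')
results == { 'q1' => 'r1', 'q2' => 'r2', 'q3' => 'r3' } # => true
```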
  18. What is a Future?
    • Represents the result of an asynchronous operation
    • Returned by any *_async method in the Ruby driver
      • execute_async
      • prepare_async
    • Blocks if asked for its result before the operation has completed
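
A minimal sketch of what a future does, assuming nothing from the driver: `TinyFuture` is a hypothetical class, not `Cassandra::Future`. A background thread fulfills it, `get` blocks until that happens, and `then` chains work onto the result. Error propagation, which a real driver future has to handle, is left out.

```ruby
# Hypothetical minimal future (NOT Cassandra::Future). Error handling omitted.
class TinyFuture
  def initialize
    @mutex = Mutex.new
    @resolved = ConditionVariable.new
    @done = false
    @value = nil
    @callbacks = []
  end

  # Completes the future, wakes blocked #get callers, then runs chained
  # callbacks outside the lock.
  def fulfill(value)
    callbacks = @mutex.synchronize do
      raise 'future already fulfilled' if @done

      @value = value
      @done = true
      @resolved.broadcast
      @callbacks
    end
    callbacks.each { |callback| callback.call(value) }
  end

  # Blocks the calling thread until the value is available.
  def get
    @mutex.synchronize do
      @resolved.wait(@mutex) until @done
      @value
    end
  end

  # Returns a new future that resolves to the block's return value.
  def then(&block)
    chained = TinyFuture.new
    done, value = @mutex.synchronize do
      @callbacks << ->(v) { chained.fulfill(block.call(v)) } unless @done
      [@done, @value]
    end
    chained.fulfill(block.call(value)) if done
    chained
  end
end

future = TinyFuture.new
doubled = future.then { |v| v * 2 }
Thread.new { sleep 0.05; future.fulfill(21) } # simulated I/O finishing later
doubled.get # => 42 (blocks until the background thread fulfills `future`)
```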
  19. Future Composition
    ```ruby
    select_user = session.prepare('SELECT * FROM users WHERE id = ?')
    select_page = session.prepare('SELECT * FROM pages WHERE slug = ?')

    user_ids = [1, 2, 3, 4]

    futures = user_ids.map do |id|
      # Look up each user asynchronously...
      future = session.execute_async(select_user, arguments: [id])
      future.then do |users|
        user = users.first
        # ...then chain a page lookup keyed by that user's name.
        future = session.execute_async(select_page, arguments: [user['username']])
        future.then do |pages|
          page = pages.first
          User.new(user, Page.new(page))
        end
      end
    end

    # Block once, until every chained future has resolved.
    Cassandra::Future.all(futures).get
    ```
  25. Future Composition (result)
    [#<User @id=1 @username="avalanche123" @page=#<Page @slug="avalanche123" ... > ... >, ... ]
  26. Pop Quiz: How to make this faster?
    The Future Composition code from slide 19, shown again.
  27. Pop Quiz: How to make this faster?
    ```ruby
    # Prepare both statements concurrently instead of one after the other.
    user_future = session.prepare_async('SELECT * FROM users WHERE id = ?')
    page_future = session.prepare_async('SELECT * FROM pages WHERE slug = ?')

    user_ids = [1, 2, 3, 4]

    futures = user_ids.map do |id|
      future = session.execute_async(user_future.get, arguments: [id])
      future.then do |users|
        user = users.first
        future = session.execute_async(page_future.get, arguments: [user['username']])
        future.then do |pages|
          page = pages.first
          User.new(user, Page.new(page))
        end
      end
    end

    Cassandra::Future.all(futures).get
    ```
  28. Load Balancing
    Diagram: application threads send requests through a driver Session, which keeps a connection pool per node of the Cluster; the Load Balancing Policy decides which node handles each request.
  31. Datacenter-Aware Balancing
    Diagram: two datacenters, three nodes each, with clients.
    Local nodes are queried first; if none are available, the request may be sent to a remote node.
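
A sketch of how a datacenter-aware policy might order hosts for one request, assuming each host knows its datacenter: local hosts first, rotated per request to spread load, then remote hosts as a fallback. `DcHost` and `DcAwarePlan` are hypothetical names, and this is not the driver's implementation.

```ruby
DcHost = Struct.new(:address, :datacenter)

# Hypothetical datacenter-aware query plan.
class DcAwarePlan
  def initialize(local_dc, hosts)
    @local_dc = local_dc
    @hosts = hosts
    @position = 0
  end

  # Hosts in the order one request should try them: local hosts first,
  # rotated on each call to spread load, then remote hosts as a fallback.
  def plan
    local, remote = @hosts.partition { |host| host.datacenter == @local_dc }
    ordered = local.rotate(@position) + remote
    @position += 1
    ordered
  end
end

hosts = [
  DcHost.new('10.0.0.1', 'dc1'),
  DcHost.new('10.0.0.2', 'dc1'),
  DcHost.new('10.1.0.1', 'dc2')
]
policy = DcAwarePlan.new('dc1', hosts)
policy.plan.map(&:address) # => ["10.0.0.1", "10.0.0.2", "10.1.0.1"]
policy.plan.map(&:address) # => ["10.0.0.2", "10.0.0.1", "10.1.0.1"]
```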
  32. Token-Aware Balancing
    Routes requests directly to replicas, using prepared statement metadata to get the token.
    Diagram: a client sending a request straight to one of three replica nodes.
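
Token awareness can be layered on top of another policy: once the token, and therefore the replica set, is known, the replicas move to the front of the child policy's plan. A hypothetical sketch with hosts as plain strings; `token_aware_plan` is not a driver method.

```ruby
# Hypothetical token-aware wrapper: replicas for the statement's token are
# tried first (in the child plan's order), then everyone else.
def token_aware_plan(child_plan, replicas)
  preferred = child_plan.select { |host| replicas.include?(host) }
  preferred + (child_plan - preferred)
end

token_aware_plan(%w[a b c d e], %w[d b]) # => ["b", "d", "a", "c", "e"]
token_aware_plan(%w[a b], [])            # => ["a", "b"] (no token known)
```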
  33. Other built-in policies
    • Round Robin Policy: ignores topology
    • White List Policy: only connects to certain hosts
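
Minimal sketches of these two policies, with hosts as plain strings. `RoundRobinPlan` and `whitelist_plan` are hypothetical names, and thread safety is ignored.

```ruby
# Hypothetical round-robin plan: every host, starting one step further along
# on each call; datacenter and token ownership are ignored.
class RoundRobinPlan
  def initialize(hosts)
    @hosts = hosts
    @position = 0
  end

  def plan
    ordered = @hosts.rotate(@position)
    @position += 1
    ordered
  end
end

# Hypothetical whitelist filter: keeps only allowed hosts, in the child plan's order.
def whitelist_plan(child_plan, allowed)
  child_plan.select { |host| allowed.include?(host) }
end

round_robin = RoundRobinPlan.new(%w[a b c])
round_robin.plan                          # => ["a", "b", "c"]
round_robin.plan                          # => ["b", "c", "a"]
whitelist_plan(round_robin.plan, %w[a c]) # => ["c", "a"]
```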
  34. Fault Tolerance
    Diagram: an application (business logic plus driver) sends a request to a coordinator node, which forwards it to three replicas.
  35. Possible Failures
    Diagram: the same request path, annotated with where failures can occur:
    • Invalid requests
    • Network timeouts
    • Server errors
  36. Automatic Retry of Server Errors
    Diagram: application threads, the Session with one connection pool per node, and the Load Balancing Policy, as on slide 28.
  39. Unreachable Consistency
    Diagram: the application (business logic plus driver), a coordinator node and three replicas.
  40. Read / Write Timeout Error
    Diagram: the application (business logic plus driver), a coordinator node and three replicas. The replicas were believed to be alive, but not enough of them answered in time, so the coordinator returns a read / write timeout to the driver.
  43. Unavailable Error
    Diagram: the same request path. Here the coordinator knows, before forwarding the request, that too few replicas are alive to satisfy the consistency level, so it returns an unavailable error to the driver.
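
How a driver reacts to these errors is decided by a retry policy. The sketch below is hypothetical: `ConservativeRetry` and its rules are assumptions, modeled on the common conservative choice of retrying at most once and only where a retry is likely to succeed. Check the driver's documentation for its actual default policy.

```ruby
# Hypothetical retry decisions; each method returns :retry or :rethrow.
class ConservativeRetry
  # Read timeout: enough replicas answered, but the one asked for the actual
  # data did not; a single retry at the same consistency level is cheap.
  def read_timeout(received:, required:, data_retrieved:, retries:)
    return :rethrow if retries > 0

    (received >= required && !data_retrieved) ? :retry : :rethrow
  end

  # Write timeout: retry once only for a batch-log write, where a retry is
  # likely to succeed; other writes are rethrown because replaying them is
  # not always safe.
  def write_timeout(write_type:, retries:)
    return :rethrow if retries > 0

    write_type == :batch_log ? :retry : :rethrow
  end

  # Unavailable: the coordinator already knew too few replicas were alive,
  # so this sketch does not retry the same request.
  def unavailable(required:, alive:, retries:)
    :rethrow
  end
end

policy = ConservativeRetry.new
policy.read_timeout(received: 2, required: 2, data_retrieved: false, retries: 0) # => :retry
policy.write_timeout(write_type: :simple, retries: 0)                            # => :rethrow
```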
  45. Multiple Addresses
    Diagram: two datacenters, each with four nodes and two clients.
    • Within a datacenter: private IPs
    • Across datacenters: public IPs
  46. Address Resolution
    Diagram, built up over slides 46 to 50: application threads use the driver's Cluster object, which opens a Control Connection to one node, passes node addresses through the Address Resolution Policy, and finally creates a Session with a connection pool per node.
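
A sketch of the multi-address scenario from slide 45, assuming a hand-written table from private to public IPs (203.0.113.0/24 is a documentation-only range, and `resolve` and `PUBLIC_IPS` are hypothetical): a client in the node's own datacenter keeps the private IP, while a client elsewhere gets the public one.

```ruby
# Made-up mapping from the private IPs nodes advertise to their public IPs.
PUBLIC_IPS = {
  '10.0.0.1' => '203.0.113.1',
  '10.1.0.1' => '203.0.113.2'
}.freeze

# Hypothetical resolver. Same datacenter: keep the private IP.
# Different datacenter: use the node's public IP (KeyError if unknown).
def resolve(private_ip, node_dc, client_dc, public_ips = PUBLIC_IPS)
  return private_ip if node_dc == client_dc

  public_ips.fetch(private_ip)
end

resolve('10.0.0.1', 'us-east', 'us-east') # => "10.0.0.1"
resolve('10.0.0.1', 'us-east', 'eu-west') # => "203.0.113.1"
```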
  51. More
    • Request Tracing
    • Execution Information: which node was used, # retries for query, etc.
    • State Listeners: node goes down/comes up, schema changes, etc.
    • Result Paging
    • SSL and Authentication