Apache Cassandra and Drivers

From a Cassandra user group meetup:

This is a two-part talk. We'll go over the architecture that enables Apache Cassandra's linear scalability, and how the DataStax Drivers take full advantage of it to give developers well-designed, fast clients that are extensible to the core.

In the first part, Bulat will demystify Cassandra terms such as replica, coordinator, replication factor, token map, conflict resolution and consistency level. After that, Sandeep will cover the main features of the DataStax Drivers, including asynchronous APIs, futures, error handling, load balancing and retry policies, address resolution, schema metadata and host state listeners.

Bulat Shakirzyanov

January 28, 2016

Transcript

  1. Apache Cassandra and Drivers
    Overview of Apache Cassandra and DataStax Drivers
    Bulat Shakirzyanov
    @avalanche123
    Sandeep Tamhankar
    @stamhankar999
    https://goo.gl/cBsRVv

  2. Introduction
    Cassandra Overview

  3. Cassandra Topology
    © 2015 DataStax, All Rights Reserved.
    [Diagram: one cluster spanning two datacenters, each with four nodes and two clients]

  4. Request Coordinator
    [Diagram: two datacenters of nodes and clients; one node is labeled Coordinator]
    Coordinator node: forwards requests to the corresponding replicas

  5. Row Replica
    [Diagram: two datacenters of nodes and clients; one node is labeled Coordinator and three are labeled Replica]
    Replica node: stores a slice of the total rows of each keyspace

  6. Token Ring
    [Diagram: a ring with twelve token positions, numbered 1 through 12]

  7. Token Ring
    -2^63 … (+2^63 - 1)
    Murmur3 Partitioner

  8. Token Ring
    [Diagram: twelve nodes on the ring, each owning one token range: 11…12, 12…1, 1…2, 2…3, 3…4, 4…5, 5…6, 6…7, 7…8, 8…9, 9…10, 10…11]
    -2^63 … (+2^63 - 1)
    Murmur3 Partitioner
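
The ring on the slides above can be read as a lookup table. The following is a minimal sketch, not driver or server code: it uses the small tokens 1 to 12 from the diagram instead of real Murmur3 hashes, and the names `RING_TOKENS` and `owner_for` are made up for illustration. It assumes each node owns the range that ends at its own token, so the node with token 12 owns 11…12.

```ruby
# Toy token ring mirroring the slide: twelve nodes with tokens 1..12.
# Assumption: a node owns the range (previous_token, own_token], so the
# node with token 12 owns 11…12 and the node with token 1 owns 12…1.
RING_TOKENS = (1..12).to_a.freeze

# Returns the token of the node that owns +token+.
# Tokens here are plain numbers; a real cluster would use Murmur3 hashes.
def owner_for(token, ring = RING_TOKENS)
  # bsearch (find-minimum mode) returns the first node token >= token.
  # If there is none, we are past the last node and wrap to the first.
  ring.bsearch { |node_token| node_token >= token } || ring.first
end
```

For example, `owner_for(11.5)` lands on the node with token 12, while `owner_for(12.5)` wraps around to the node with token 1.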

  9. Keyspaces
    CREATE KEYSPACE default WITH replication = {
      'class': 'SimpleStrategy',
      'replication_factor': 3
    }

  10. Data Partitioning
    [Diagram: a row in a keyspace with RF = 3 is placed on the C* ring at token(PK) = 1]
    Partitioner: gets a token by hashing the partition key (the first part of the primary key) of a row

  11. Replication Strategy
    [Diagram: a row in Keyspace 1 with RF = 3 is placed on the C* ring at token(PK) = 1]
    Replication strategy: determines the first replica for the row

  12. Replication Factor
    [Diagram: a row in a keyspace with RF = 3 is placed on the C* ring at token(PK) = 1]
    Replication factor: specifies the total number of replicas for each row
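
Partitioner, replication strategy and replication factor combine into a single rule for picking replicas. The sketch below assumes `SimpleStrategy`, as in the keyspace example earlier, where the remaining replicas are simply the next nodes clockwise on the ring. The six-node ring, the node names and the `replicas_for` helper are hypothetical.

```ruby
# Hypothetical six-node ring; each entry is [token, node_name], sorted by token.
SKETCH_RING = [
  [2, 'node-a'], [4, 'node-b'], [6, 'node-c'],
  [8, 'node-d'], [10, 'node-e'], [12, 'node-f']
].freeze

# Picks +rf+ replicas for a row token the way SimpleStrategy does:
# start at the node that owns the token, then keep walking clockwise.
def replicas_for(token, rf, ring = SKETCH_RING)
  # Index of the first node whose token is >= the row token (wrap to 0).
  first = ring.index { |node_token, _name| node_token >= token } || 0
  # Never return more replicas than there are nodes.
  count = [rf, ring.size].min
  Array.new(count) { |i| ring[(first + i) % ring.size].last }
end
```

With `token(PK) = 1` and `RF = 3`, `replicas_for(1, 3)` returns the first three nodes on this ring. A token near the end of the ring, such as 11, wraps around and also picks nodes from the start.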

  13. Consistency Level
    RF = 3, CL = Quorum
    [Diagram, built up over several slides: an application, a coordinator node, two other nodes and three replicas; the application issues an INSERT]
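
The `RF = 3, CL = Quorum` setting on these slides comes down to simple arithmetic: a quorum is a majority of the replicas. The helper below is an illustrative sketch; the method name `required_acks` and the symbols it accepts are assumptions, not a driver API.

```ruby
# Number of replica acknowledgements a coordinator waits for.
# QUORUM is a majority of the replicas: floor(RF / 2) + 1.
def required_acks(consistency, replication_factor)
  case consistency
  when :one    then 1
  when :quorum then replication_factor / 2 + 1 # integer division
  when :all    then replication_factor
  else raise ArgumentError, "unsupported consistency level: #{consistency}"
  end
end
```

With `RF = 3`, `required_acks(:quorum, 3)` is 2, so the INSERT can succeed while one of the three replicas is down or slow.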

  18. DataStax Drivers
    Smart clients for Apache Cassandra

  19. Goals of DataStax Drivers
    • Consistent set of features across languages
    • Asynchronous execution of requests
    • Load balancing
    • Fault tolerance
    • Address resolution (multi-region!)
    • Automatic cluster discovery and reconnection
    • Flexible to the core
    • Consistent terminology
    • Open source

  21. Asynchronous Execution
    IO Reactor, Request Pipelining and Future Composition

  22. Asynchronous Core
    [Diagram: business logic runs on the application thread; the driver's IO reactor runs on a background thread]

  23. Request Pipelining
    [Diagram: a client and a server exchanging requests 1, 2 and 3, shown once without request pipelining and once with request pipelining]

  24. What is a Future?
    • Represents the result of an asynchronous operation
    • Returned by any *_async method in the Ruby driver
      • execute_async
      • prepare_async
    • Will block if asked for the true result
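
To make the "will block if asked for the true result" point concrete, here is a toy future in plain Ruby. It is not the driver's `Cassandra::Future` implementation; `ToyFuture` is a hypothetical class that runs its work on a background thread. Unlike the driver's futures, it does not unwrap a future returned from inside a `then` block.

```ruby
# Toy future, NOT the driver's implementation: it runs the block on a
# background thread and blocks the caller only when #get is called.
class ToyFuture
  def initialize(&work)
    @thread = Thread.new(&work)
  end

  # Blocks until the background work finishes, then returns its result
  # (or re-raises the error the work raised).
  def get
    @thread.value
  end

  # Returns a new future holding the block's result for this future's value.
  def then
    ToyFuture.new { yield get }
  end
end

future  = ToyFuture.new { sleep 0.1; 21 } # starts in the background right away
doubled = future.then { |n| n * 2 }       # composes without blocking the caller
puts doubled.get                          # blocks here until the value is ready
```

The caller stays free to do other work between creating the future and calling `get`, which is the property the `*_async` methods rely on.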

  25. Future Composition
    # Prepare both statements once, up front.
    select_user = session.prepare('SELECT * FROM users WHERE id = ?')
    select_page = session.prepare('SELECT * FROM pages WHERE slug = ?')
    user_ids = [1, 2, 3, 4]
    # Start one asynchronous lookup per user id.
    futures = user_ids.map do |id|
      future = session.execute_async(select_user, arguments: [id])
      future.then do |users|
        user = users.first
        # Chain a second query that depends on the first result.
        future = session.execute_async(select_page, arguments: [user['username']])
        future.then do |pages|
          page = pages.first
          User.new(user, Page.new(page))
        end
      end
    end
    # Block until every chained future has resolved.
    Cassandra::Future.all(futures).get

  31. Future Composition
    [# ... >, ... ]

  32. Pop Quiz: How to make this faster?
    select_user = session.prepare('SELECT * FROM users WHERE id = ?')
    select_page = session.prepare('SELECT * FROM pages WHERE slug = ?')
    user_ids = [1, 2, 3, 4]
    futures = user_ids.map do |id|
      future = session.execute_async(select_user, arguments: [id])
      future.then do |users|
        user = users.first
        future = session.execute_async(select_page,
                                       arguments: [user['username']])
        future.then do |pages|
          page = pages.first
          User.new(user, Page.new(page))
        end
      end
    end
    Cassandra::Future.all(futures).get

  33. Pop Quiz: How to make this faster?
    # prepare_async returns immediately, so both statements are prepared in parallel.
    user_future = session.prepare_async('SELECT * FROM users WHERE id = ?')
    page_future = session.prepare_async('SELECT * FROM pages WHERE slug = ?')
    user_ids = [1, 2, 3, 4]
    futures = user_ids.map do |id|
      # .get blocks only until that prepared statement is ready.
      future = session.execute_async(user_future.get, arguments: [id])
      future.then do |users|
        user = users.first
        future = session.execute_async(page_future.get,
                                       arguments: [user['username']])
        future.then do |pages|
          page = pages.first
          User.new(user, Page.new(page))
        end
      end
    end
    Cassandra::Future.all(futures).get

  34. Load Balancing
    Principles and Implementations

  35. Load Balancing
    [Diagram: on the client, three application threads use a driver session that holds one connection pool per node; a load balancing policy sits between the session and the four nodes of the cluster]

  38. DataCenter Aware Balancing
    [Diagram: two datacenters, each with three nodes, and their clients]
    Local nodes are queried first; if none are available, the request can be sent to a remote node.
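
The "local first, remote as a fallback" rule can be sketched as a function that orders hosts for one request. This is an illustration only: `SketchHost` and `dc_aware_plan` are hypothetical names, not the driver's policy classes.

```ruby
# Hypothetical host record: an address, its datacenter, and whether it is up.
SketchHost = Struct.new(:address, :datacenter, :up)

# Builds the order in which hosts would be tried for one request:
# live hosts in the local datacenter first, live remote hosts after them.
def dc_aware_plan(hosts, local_dc)
  live = hosts.select(&:up)
  local, remote = live.partition { |host| host.datacenter == local_dc }
  local + remote
end
```

A remote host is only reached once every local host in the plan has been tried or is down, which keeps cross-datacenter traffic to a minimum.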

  39. Token Aware Balancing
    [Diagram: a client connected directly to the three replica nodes rather than to the other nodes]
    Routes requests directly to replicas
    Uses prepared statement metadata to get the token
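
Token-aware routing amounts to putting the replicas at the front of the list of hosts to try, so that the coordinator is itself a replica and a network hop is saved. A minimal sketch follows, with the hypothetical helper `token_aware_plan`; in a real driver the replica list would come from the token map and the statement's routing key, whereas here it is passed in directly.

```ruby
# Orders hosts so that replicas of the statement's partition come first.
# +replicas+ would come from the token map; here it is passed in directly.
def token_aware_plan(all_hosts, replicas)
  preferred = all_hosts & replicas # keep only replicas that are known hosts
  preferred + (all_hosts - preferred)
end
```

Non-replica hosts stay in the plan as a fallback, so the request can still be served if every replica connection fails.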

  40. Other built-in policies
    • Round Robin Policy
      • ignores topology
    • White List Policy
      • only connects to certain hosts
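
Both policies are small enough to sketch in a few lines. The classes below are illustrative stand-ins with made-up names (`RoundRobinSketch`, `WhiteListSketch`), not the driver's own policy classes, and they are not thread-safe.

```ruby
# Round robin: every call to #plan starts one host further along the list,
# ignoring topology entirely.
class RoundRobinSketch
  def initialize(hosts)
    @hosts = hosts
    @position = 0
  end

  def plan
    return [] if @hosts.empty?
    rotated = @hosts.rotate(@position)
    @position = (@position + 1) % @hosts.size
    rotated
  end
end

# White list: wraps another policy and drops hosts that are not allowed.
class WhiteListSketch
  def initialize(allowed, wrapped_policy)
    @allowed = allowed
    @wrapped_policy = wrapped_policy
  end

  def plan
    @wrapped_policy.plan.select { |host| @allowed.include?(host) }
  end
end
```

Wrapping one policy in another, as `WhiteListSketch` does here, is one way to combine behaviors without changing either policy.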

  41. Fault Tolerance
    Sources of Failure and Error Handling

  42. Fault Tolerance
    [Diagram: the application (business logic and driver), a coordinator node, two other nodes and three replicas]

  43. Possible Failures
    [Diagram: the application (business logic and driver), a coordinator node, two other nodes and three replicas]
    Invalid Requests
    Network Timeouts
    Server Errors

  44. Automatic Retry of Server Errors
    [Diagram: on the client, three application threads use a driver session that holds one connection pool per node; a load balancing policy sits between the session and the four nodes of the cluster]
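
One plausible reading of this diagram is that a request which fails with a server error is retried on the next host from the load balancing plan. The sketch below illustrates that idea under that assumption; `execute_with_retry` is a hypothetical helper and the caller's block stands in for sending the request to a host.

```ruby
# Tries each host in the plan in order and returns the first successful
# result. The block stands in for "send the request to this host".
def execute_with_retry(plan)
  last_error = nil
  plan.each do |host|
    begin
      return yield(host)
    rescue StandardError => e
      last_error = e # remember the failure and move on to the next host
    end
  end
  # Every host failed (or the plan was empty): surface the last error.
  raise(last_error || RuntimeError.new('no hosts in query plan'))
end
```

The application only sees an error once every host in the plan has failed.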

  47. Unreachable Consistency
    [Diagram: the application (business logic and driver), a coordinator node, two other nodes and three replicas]

  48. Read / Write Timeout Error
    [Diagram, built up over several slides: the application (business logic and driver), a coordinator node, two other nodes and three replicas; the final step is labeled "read / write timeout"]

  51. Unavailable Error
    [Diagram, built up over two slides: the application (business logic and driver), a coordinator node, two other nodes and three replicas; the final step is labeled "unavailable"]
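
The difference between the two errors is when the coordinator gives up. It reports "unavailable" before contacting any replica if it already knows too few are alive, and a read or write timeout if enough replicas were contacted but too few answered in time. A minimal sketch with a hypothetical helper, `coordinator_outcome`:

```ruby
# Classifies the outcome of one request from the coordinator's point of view.
#   required - acknowledgements demanded by the consistency level
#   alive    - replicas the coordinator believes are up before sending
#   acked    - replicas that answered before the timeout expired
def coordinator_outcome(required:, alive:, acked:)
  # Not enough live replicas: fail fast without contacting any replica.
  return :unavailable if alive < required
  # Enough replicas were contacted, but too few answered in time.
  return :timeout if acked < required
  :success
end
```

With `RF = 3` and `CL = Quorum`, `required` is 2. One live replica gives `:unavailable`, while three live replicas with a single answer give `:timeout`.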

  53. Error Handling

  54. Address Resolution
    Topology Aware Client

  55. Multiple Addresses
    [Diagram: two datacenters, each with four nodes and two clients]
    Within a datacenter: private IPs
    Across datacenters: public IPs
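
The rule on this slide can be written as a one-line decision. The sketch below is illustrative only: `SketchNode` and `resolve_address` are hypothetical names, the IP addresses are documentation examples, and a real address resolution policy may discover the addresses differently.

```ruby
# Hypothetical node record with both of its addresses.
SketchNode = Struct.new(:private_ip, :public_ip, :datacenter)

# Chooses the address a client should dial for a node: the private IP when
# client and node share a datacenter, the public IP otherwise.
def resolve_address(node, client_datacenter)
  node.datacenter == client_datacenter ? node.private_ip : node.public_ip
end

node = SketchNode.new('10.0.0.5', '203.0.113.5', 'us-east')
puts resolve_address(node, 'us-east') # same datacenter: private address
puts resolve_address(node, 'eu-west') # different datacenter: public address
```

Keeping this decision in a pluggable policy lets the same application code run in single-datacenter and multi-region deployments.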

  56. Address Resolution
    [Diagram, built up over several slides: application threads on the client use the driver's Cluster object; an address resolution policy and a control connection link it to the nodes of the cluster; in the final step a session with one connection pool per node is added]

  61. EC2 Multi-Region Address Resolution

  62. More
    • Request Tracing
    • Execution Information
      • which node was used, number of retries for the query, etc.
    • State Listeners
      • node goes down/comes up, schema changes, etc.
    • Result Paging
    • SSL and Authentication

  63. Questions
