Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Apache Cassandra is a distributed, masterless column-store database that is becoming mainstream for analytics and IoT data.

https://www.bigdataspain.org/2017/talk/tuning-java-driver-for-apache-cassandra

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Big Data Spain

December 01, 2017

Transcript

  1. None
  2. None
  3. Tuning Java Driver for Apache Cassandra • November 2017 • Nenad Bozic • @NenadBozicNs • nenad.bozic@smartcat. • SmartCat • www.smartcat.io
  4. When people start with Apache Cassandra

  5. When people call us for help

  6. Agenda • intro to Apache Cassandra • tuning options in driver • use cases • takeaways and Q&A
  7. Apache Cassandra

  8. Cassandra Overview • partitioned data with tunable consistency • replication factor: how many replicas store each partition • masterless architecture • native multi-datacenter support
  9. Architecture Client contact

  10. Architecture Client request Consistency level 1 Replication factor 3

  11. Architecture Client request response Consistency level 1 Replication factor 3

  12. Architecture DC1 DC2 Cluster

  13. Data Modeling • query based modeling • data is denormalized • data is duplicated
  14. Use Cases • good fit: when high availability is crucial and eventual consistency is tolerable; event sourcing; logging continuous streams of data; deep visitor analytics • poor fit: early prototyping with significant query changes; referential integrity required; dynamic access patterns on data
  15. Tuning options in driver

  16. Drivers for Apache Cassandra

  17. Load balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers

  18. Data Center Aware Load Balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers

  19. Token Aware Load Balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers

  20. Latency Aware Load Balancing
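
To make the load balancing slides concrete, here is a minimal sketch of how a DC-aware, token-aware query plan could be ordered: replicas of the partition in the local data center come first, other local nodes follow, and remote data centers are left out. It is plain Java with no driver dependency. `QueryPlanSketch`, `Node`, `plan` and the node names are hypothetical, and the replica set is passed in by hand rather than computed from tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch of a DC-aware, token-aware query plan; all names are hypothetical. */
final class QueryPlanSketch {

    /** Minimal node description: a name and the data center it lives in. */
    static final class Node {
        final String name;
        final String dc;

        Node(String name, String dc) {
            this.name = name;
            this.dc = dc;
        }
    }

    /** Returns local replicas of the key first, then the remaining local nodes. */
    static List<String> plan(List<Node> nodes, String localDc, Set<String> replicasForKey) {
        List<String> replicas = new ArrayList<>();
        List<String> otherLocal = new ArrayList<>();
        for (Node node : nodes) {
            if (!node.dc.equals(localDc)) {
                continue; // DC-aware: skip nodes in remote data centers
            }
            if (replicasForKey.contains(node.name)) {
                replicas.add(node.name); // token-aware: this node holds the partition
            } else {
                otherLocal.add(node.name); // still usable as a coordinator
            }
        }
        replicas.addAll(otherLocal);
        return replicas;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("a1", "DC1"), new Node("a2", "DC1"), new Node("a3", "DC1"),
                new Node("b1", "DC2"), new Node("b2", "DC2"));
        // Assumed replica set for one partition key: a3 in DC1 and b1 in DC2.
        System.out.println(plan(nodes, "DC1", Set.of("a3", "b1")));
    }
}
```

A latency-aware variant would additionally push nodes with poor recent response times toward the end of the plan; that part is omitted here.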

  21. Pooling options • the driver communicates with the cluster through a pool of connections • defaults changed between protocol versions V2 and V3 (core connections lowered to 1) • allowing more requests per connection can put more load on the cluster • monitor in-flight queries on the driver side and tune for your use case
  22. Pooling options
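
The advice to monitor in-flight queries can be turned into a simple utilization check. The sketch below is plain Java and assumes the in-flight count has already been read from the driver; `PoolLoadSketch`, its methods, the 0.8 threshold and the sample numbers are all hypothetical, chosen only to illustrate the arithmetic.

```java
/** Sketch: how busy is the connection pool to one host? Names and numbers are hypothetical. */
final class PoolLoadSketch {

    /** Upper bound on simultaneous requests to one host. */
    static int capacity(int connections, int maxRequestsPerConnection) {
        return connections * maxRequestsPerConnection;
    }

    /** Fraction of the capacity currently in use. */
    static double utilization(int inFlight, int connections, int maxRequestsPerConnection) {
        return (double) inFlight / capacity(connections, maxRequestsPerConnection);
    }

    /** True when utilization has reached the given threshold. */
    static boolean needsAttention(int inFlight, int connections,
                                  int maxRequestsPerConnection, double threshold) {
        return utilization(inFlight, connections, maxRequestsPerConnection) >= threshold;
    }

    public static void main(String[] args) {
        // Assumed sample: one connection, 1024 requests allowed on it, 900 in flight.
        System.out.println("utilization: " + utilization(900, 1, 1024));
        System.out.println("needs attention: " + needsAttention(900, 1, 1024, 0.8));
    }
}
```

Sustained high utilization suggests either adding connections or raising the per-connection limit, but only if the cluster itself has headroom, which is why the slide pairs tuning with monitoring.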

  23. Speculative executions • spawn additional queries to other nodes after a configured delay http://docs.datastax.com/en/developer/java-driver/3.1/manual/speculative_execution/
  24. Speculative executions • constant speculative execution policy • percentile speculative execution policy
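
The idea behind a constant speculative execution policy can be sketched with `CompletableFuture` alone: send the query to the first node, and each time a fixed delay passes without an answer, send it to the next node as well; the first response wins. This is an illustration of the concept, not the driver's implementation. `SpeculativeExecutionSketch`, `execute` and `simulatedNode` are hypothetical names, the nodes are simulated with fixed latencies, and error handling is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Sketch of constant-delay speculative execution; all names are hypothetical. */
final class SpeculativeExecutionSketch {

    /**
     * Queries nodes.get(0) immediately and nodes.get(i) after i * delayMillis,
     * unless a response has already arrived. The first response completes the result.
     */
    static <T> CompletableFuture<T> execute(List<String> nodes,
                                            Function<String, CompletableFuture<T>> query,
                                            long delayMillis,
                                            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        for (int i = 0; i < nodes.size(); i++) {
            String node = nodes.get(i);
            scheduler.schedule(() -> {
                if (!result.isDone()) { // skip the extra query if someone already answered
                    query.apply(node).thenAccept(result::complete);
                }
            }, i * delayMillis, TimeUnit.MILLISECONDS);
        }
        return result;
    }

    /** Stand-in for a node: answers with its own name after a fixed latency. */
    static CompletableFuture<String> simulatedNode(String name, long latencyMillis,
                                                   ScheduledExecutorService scheduler) {
        CompletableFuture<String> response = new CompletableFuture<>();
        scheduler.schedule(() -> { response.complete(name); },
                latencyMillis, TimeUnit.MILLISECONDS);
        return response;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
        try {
            // Assumed latencies: node1 is slow, node2 is fast.
            Map<String, Long> latencies = Map.of("node1", 500L, "node2", 10L);
            String winner = execute(List.of("node1", "node2"),
                    node -> simulatedNode(node, latencies.get(node), scheduler),
                    50, scheduler).get(2, TimeUnit.SECONDS);
            System.out.println("answered by " + winner);
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

With these assumed latencies the speculative query to the second node should answer well before the first node does. The cost is visible too: every speculative attempt is an extra request the cluster has to serve, so the delay should be chosen against measured latencies.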
  25. Timeouts • driver read timeout vs. server read timeout • driver settings for all queries or per-query settings • setReadTimeoutMillis and setConnectTimeoutMillis
  26. Retry policies • fail early and retry • add a retry policy or speculative execution • use a downgrading retry policy if inconsistent data is better than no data
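
The trade-off behind a downgrading retry policy can be sketched as a loop that retries a failed read at the next weaker consistency level, accepting possibly stale data over no data. This is a simplified illustration, not the behaviour of the driver's own policy. `DowngradingRetrySketch`, its `Level` enum and the simulated cluster (replication factor 3, one replica alive) are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

/** Sketch of retrying at weaker consistency levels; all names are hypothetical. */
final class DowngradingRetrySketch {

    /** Simplified consistency levels, strongest first. */
    enum Level { ALL, QUORUM, ONE }

    /** Replicas that must answer for the given level. */
    static int required(Level level, int replicationFactor) {
        switch (level) {
            case ALL: return replicationFactor;
            case QUORUM: return replicationFactor / 2 + 1;
            default: return 1;
        }
    }

    /**
     * Tries the read at the starting level, then at each weaker level.
     * Returns the level that succeeded together with the (non-null) value read.
     */
    static <T> Map.Entry<Level, T> readWithDowngrade(Level start, Function<Level, T> read) {
        RuntimeException last = null;
        for (Level level : Level.values()) {
            if (level.ordinal() < start.ordinal()) {
                continue; // never retry at a stronger level than requested
            }
            try {
                return Map.entry(level, read.apply(level));
            } catch (RuntimeException e) {
                last = e; // fail early, then retry one level weaker
            }
        }
        throw new IllegalStateException("read failed at every consistency level", last);
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        int aliveReplicas = 1; // assumed: two of three replicas are down
        Function<Level, String> read = level -> {
            if (required(level, replicationFactor) > aliveReplicas) {
                throw new IllegalStateException("not enough replicas for " + level);
            }
            return "row read at " + level;
        };
        Map.Entry<Level, String> outcome = readWithDowngrade(Level.QUORUM, read);
        System.out.println(outcome.getKey() + " -> " + outcome.getValue());
    }
}
```

Returning the achieved level alongside the value lets the caller decide whether a downgraded answer is acceptable for that particular request.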
  27. Use cases

  28. Click stream and IoT measurements • visualize measurements from many devices • fast access with tolerable inconsistencies • DC-aware and token-aware policies to land on a local node that holds the data • lower consistency level (ONE) or a downgrading retry policy • speculative executions to query more nodes if the cluster can handle the load
  29. Mission critical data with tolerable performance • stock data in a warehouse, used to compare with the ERP system • high consistency (read replicas + write replicas > replication factor) • retry and reconnection policies are a must • choose a lower requests-per-connection limit so as not to overload the cluster • set a lower read timeout to fail early and retry
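
The "read + write > replication factor" rule from the slide above is simple arithmetic: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, the two sets must overlap in at least one replica. A minimal sketch follows; `ConsistencyMath` and its method names are hypothetical.

```java
/** Sketch of the read + write > replication factor rule; names are hypothetical. */
final class ConsistencyMath {

    /** Replicas needed for QUORUM: a strict majority of the replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read must touch at least one replica that acknowledged the latest write. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM for RF=3: " + q);
        System.out.println("QUORUM reads + QUORUM writes: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("ONE reads + ONE writes: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

For a replication factor of 3, QUORUM on both reads and writes satisfies the rule (2 + 2 > 3), while ONE on both does not (1 + 1 is not greater than 3), which matches the eventual-consistency trade-off of the click stream use case.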
  30. Write-heavy, low-latency read use case • ad serving (store user analytics and serve ads fast) • separate reads from writes so each can use different tuning options • latency-aware policy on reads to always choose fast-performing nodes • lower the read timeout on driver and server to fail early • increase maximum requests per connection
  31. Conclusion

  32. Conclusion and takeaways • know your use case and know your database • each tuning option requires good monitoring • TEST, ADJUST, MEASURE
  33. Links • SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 1 • SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 2 • Use case example - Tuning for heavy write and low latency read scenario
  34. Q&A

  35. Thank you • Nenad Bozic • @NenadBozicNs • SmartCat • www.smartcat.io