Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Apache Cassandra is a distributed, masterless column-store database that is becoming mainstream for analytics and IoT data.

https://www.bigdataspain.org/2017/talk/tuning-java-driver-for-apache-cassandra

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Big Data Spain

December 01, 2017

Transcript

  1. Tuning Java Driver for Apache Cassandra
    November 2017 • Nenad Bozic • @NenadBozicNs • nenad.bozic@smartcat. • SmartCat • www.smartcat.io
  2. Agenda
    • intro to Apache Cassandra
    • tuning options in the driver
    • use cases
    • takeaways and Q&A
  3. Cassandra Overview
    • partitioned data with tunable consistency
    • replication factor: how many replicas
    • masterless architecture
    • native multi-datacenter support
  4. Use Cases
    Good fit:
    • when high availability is crucial and eventual consistency is tolerable
    • event sourcing
    • logging continuous streams of data
    • deep visitor analytics
    Poor fit:
    • early prototyping with significant query changes
    • referential integrity required
    • dynamic access patterns on data
  5. Pooling options
    • the driver communicates with the cluster through a pool of connections
    • defaults changed between protocol V2 and V3 (core connections lowered to 1)
    • allowing more requests per connection can put more load on the cluster
    • monitor in-flight queries on the driver side and tune for your use case
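As a rough illustration of the pooling bullets above, the following sketch configures a pool and prints in-flight queries per host. It assumes the 3.x DataStax Java driver (package `com.datastax.driver.core`) on the classpath; the contact point `127.0.0.1`, the class name `PoolingSketch`, and the specific numbers are illustrative assumptions, not values from the talk.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;

public class PoolingSketch {
    public static void main(String[] args) {
        // Illustrative numbers only; tune them against your own monitoring.
        PoolingOptions pooling = new PoolingOptions()
                .setConnectionsPerHost(HostDistance.LOCAL, 1, 2)        // core, max
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 2048);

        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // assumed local node
                .withPoolingOptions(pooling)
                .build();
             Session session = cluster.connect()) {

            // Snapshot of pool usage; in practice this would be sampled periodically.
            Session.State state = session.getState();
            for (Host host : state.getConnectedHosts()) {
                System.out.printf("%s connections=%d inFlight=%d%n",
                        host,
                        state.getOpenConnections(host),
                        state.getInFlightQueries(host));
            }
        }
    }
}
```

If in-flight queries stay close to connections × max requests per connection, the pool is saturated and either the limits or the load need adjusting.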
  6. Speculative executions
    • spawn additional queries to other nodes after a configured delay
    • http://docs.datastax.com/en/developer/java-driver/3.1/manual/speculative_execution/
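A minimal sketch of enabling speculative executions, assuming the 3.x DataStax Java driver. The 200 ms delay, the limit of 2 extra executions, the contact point, and the class name `SpeculativeSketch` are assumptions for illustration. In the 3.x driver, speculative executions only apply to statements marked idempotent, so the sketch sets that flag explicitly.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.policies.ConstantSpeculativeExecutionPolicy;

public class SpeculativeSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // assumed local node
                // After 200 ms without a response, try another node; at most 2 extra tries.
                .withSpeculativeExecutionPolicy(new ConstantSpeculativeExecutionPolicy(200, 2))
                .build();
             Session session = cluster.connect()) {

            // Only idempotent statements are eligible for speculative execution.
            Statement read = new SimpleStatement("SELECT release_version FROM system.local")
                    .setIdempotent(true);
            System.out.println(session.execute(read).one().getString("release_version"));
        }
    }
}
```

Each speculative execution is an additional request, so this trades extra cluster load for lower tail latency.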
  7. Timeouts
    • driver read timeout vs server read timeout
    • driver settings for all queries or per-query settings
    • setReadTimeoutMillis and setConnectTimeoutMillis
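The two levels of timeout configuration can be sketched as follows, assuming the 3.x DataStax Java driver (the per-statement setter is available from 3.1). The millisecond values, the contact point, and the class name `TimeoutSketch` are illustrative assumptions.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;

public class TimeoutSketch {
    public static void main(String[] args) {
        // Driver-wide defaults that apply to every query.
        SocketOptions socketOptions = new SocketOptions()
                .setConnectTimeoutMillis(3000)  // opening a connection
                .setReadTimeoutMillis(7000);    // waiting for a response

        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // assumed local node
                .withSocketOptions(socketOptions)
                .build();
             Session session = cluster.connect()) {

            // Per-query override for a latency-sensitive read.
            Statement fastRead = new SimpleStatement("SELECT release_version FROM system.local")
                    .setReadTimeoutMillis(2000);
            session.execute(fastRead);
        }
    }
}
```

The server-side read timeout lives in `cassandra.yaml`; when the driver timeout is set lower than the server one, the driver gives up on a node before that node reports its own timeout.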
  8. Retry policies
    • fail early and retry
    • add a retry policy or speculative execution
    • use a downgrading retry policy when inconsistent data is preferable to no data
  9. Click stream and IoT measurements
    • visualize measurements from many devices
    • fast access with tolerable inconsistencies
    • DC-aware and token-aware policies to land on a local node that holds the data
    • lower the consistency level (ONE) or use a downgrading retry policy
    • use speculative executions to query more nodes if the cluster can manage the load
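One way the click-stream settings above might be wired together, assuming the 3.x DataStax Java driver. The class name `ClickStreamClusterSketch`, the method and parameter names, and the choice of LOCAL_QUORUM as the starting level for the downgrading variant are assumptions. Note that `DowngradingConsistencyRetryPolicy` is deprecated in later 3.x releases.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
import com.datastax.driver.core.policies.LoadBalancingPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClickStreamClusterSketch {

    /** Builds a Cluster tuned for fast reads that tolerate some inconsistency. */
    static Cluster build(String contactPoint, String localDc, boolean useDowngradingRetry) {
        // Token-aware on top of DC-aware: prefer a replica of the data in the local DC.
        LoadBalancingPolicy loadBalancing = new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder().withLocalDc(localDc).build());

        Cluster.Builder builder = Cluster.builder()
                .addContactPoint(contactPoint)
                .withLoadBalancingPolicy(loadBalancing);

        if (useDowngradingRetry) {
            // Start higher and let the retry policy lower the level on failure.
            builder.withQueryOptions(
                            new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                    .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE);
        } else {
            // Simply read and write at ONE.
            builder.withQueryOptions(
                    new QueryOptions().setConsistencyLevel(ConsistencyLevel.ONE));
        }
        return builder.build();
    }
}
```

A caller would use it as `Cluster cluster = ClickStreamClusterSketch.build("127.0.0.1", "dc1", false);`, where both arguments are placeholders for a real contact point and data center name.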
  10. Mission critical data with tolerable performance
    • stock data in a warehouse, compared against an ERP system
    • high consistency (read + write > replication factor)
    • retry and reconnection policies are a must
    • choose a lower maximum of requests per connection so as not to overload the cluster
    • set a lower read timeout to fail early and retry
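The rule "read + write > replication factor" from the slide above can be checked with a little arithmetic. The sketch below is plain Java with no driver dependency; the class and method names are hypothetical helpers, and "read" and "write" are taken to mean the number of replicas that must acknowledge each operation.

```java
final class ConsistencyMath {
    private ConsistencyMath() {}

    /** Number of replicas that must respond at QUORUM: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** True when every read set overlaps every write set on at least one replica. */
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM for RF=3: " + q);                                   // 2
        System.out.println("QUORUM + QUORUM > RF: " + readsSeeLatestWrite(q, q, rf));  // true
        System.out.println("ONE + ONE > RF: " + readsSeeLatestWrite(1, 1, rf));        // false
    }
}
```

With a replication factor of 3, reading and writing at QUORUM gives 2 + 2 > 3, while ONE for both gives 1 + 1, which does not exceed 3.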
  11. Write-heavy, low-latency read use case
    • ad serving (store user analytics and serve ads fast)
    • separate reads and writes to allow different tuning options
    • latency-aware policy on reads to always choose the fastest-performing nodes
    • lower the read timeout on both driver and server to fail early
    • increase the maximum requests per connection
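One possible reading of "separate reads and writes" is two `Cluster` instances with different settings, sketched below under the assumption of the 3.x DataStax Java driver. The class name `AdServingClustersSketch`, the method names, and all thresholds and timeouts are illustrative assumptions rather than values from the talk.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.LatencyAwarePolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;
import java.util.concurrent.TimeUnit;

public class AdServingClustersSketch {

    /** Read path: prefer fast nodes and fail early. */
    static Cluster readCluster(String contactPoint) {
        LatencyAwarePolicy latencyAware = LatencyAwarePolicy
                .builder(DCAwareRoundRobinPolicy.builder().build())
                .withExclusionThreshold(2.0)           // skip nodes 2x slower than the fastest
                .withRetryPeriod(10, TimeUnit.SECONDS) // give excluded nodes another chance
                .build();
        return Cluster.builder()
                .addContactPoint(contactPoint)
                .withLoadBalancingPolicy(new TokenAwarePolicy(latencyAware))
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(500))
                .build();
    }

    /** Write path: allow more concurrent requests per connection. */
    static Cluster writeCluster(String contactPoint) {
        PoolingOptions pooling = new PoolingOptions()
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 4096);
        return Cluster.builder()
                .addContactPoint(contactPoint)
                .withPoolingOptions(pooling)
                .build();
    }
}
```

Keeping two instances doubles the connections held open to each node, so the split is worth it only when the read and write paths really need different policies.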
  12. Conclusion and takeaways
    • know your use case and know your database
    • each tuning option requires good monitoring
    • test, adjust, measure
  13. Links
    • SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 1
    • SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 2
    • Use case example - Tuning for heavy write and low latency read scenario
  14. Q&A