Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Tuning Java Driver for Apache Cassandra November 2017 Nenad Bozic
@NenadBozicNs nenad.bozic@smartcat.io SmartCat www.smartcat.io

When people start with Apache Cassandra

When people call us for help

Agenda
• intro to Apache Cassandra
• tuning options in driver
• use cases
• takeaways and Q&A

Apache Cassandra

Cassandra Overview
• partitioned data with tunable consistency
• replication factor: how many replicas
• masterless architecture
• native multi-datacenter support

Architecture [diagram: client contacting the cluster]

Architecture [diagram: client request, consistency level ONE, replication factor 3]

Architecture [diagram: client request and response, consistency level ONE, replication factor 3]

Architecture [diagram: one cluster spanning data centers DC1 and DC2]
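The "consistency level ONE, replication factor 3" slides can be made concrete with a small calculation that needs no cluster. This is a conceptual sketch, not Cassandra or driver code. It assumes a write, which the coordinator sends to all RF replicas, and it ignores network hops and failures. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Conceptual sketch: how many replica acknowledgements a coordinator waits for,
// and how that determines when it can answer the client. Not driver code.
final class ConsistencySketch {
    // A subset of Cassandra consistency levels, single data center only.
    enum Level { ONE, TWO, THREE, QUORUM, ALL }

    // Number of replica acknowledgements required for the given replication factor.
    static int requiredAcks(Level level, int replicationFactor) {
        switch (level) {
            case ONE: return 1;
            case TWO: return 2;
            case THREE: return 3;
            case QUORUM: return replicationFactor / 2 + 1;
            case ALL: return replicationFactor;
            default: throw new IllegalArgumentException("unknown level " + level);
        }
    }

    // For a write sent to every replica, the coordinator can reply once the k-th
    // fastest replica has acknowledged, where k = requiredAcks(level, RF).
    static long responseLatencyMillis(Level level, long[] replicaLatenciesMillis) {
        int k = requiredAcks(level, replicaLatenciesMillis.length);
        if (k > replicaLatenciesMillis.length) {
            throw new IllegalStateException("not enough replicas for " + level);
        }
        long[] sorted = replicaLatenciesMillis.clone();
        Arrays.sort(sorted);
        return sorted[k - 1];
    }
}
```

With replica latencies of 12, 3 and 40 ms, ONE answers after the fastest replica, QUORUM after the second, and ALL after the slowest. This is the latency side of tunable consistency.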

Data Modeling
• query-based modeling
• data is denormalized
• data is duplicated

Use Cases
Good fit:
• high availability is crucial and eventual consistency is tolerable
• event sourcing
• logging continuous streams of data
• deep visitor analytics
Poor fit:
• early prototyping with significant query changes
• referential integrity required
• dynamic access patterns on data

Tuning options in driver

Drivers for Apache Cassandra

Load balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers

Data Center Aware Load Balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
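As a rough illustration of data center aware load balancing, the sketch below builds a query plan. It rotates through local-DC hosts and appends a fixed number of remote hosts only as a fallback. It is a simplified stand-in for the idea behind the driver's DCAwareRoundRobinPolicy, not its implementation. The class name, the host names and the remoteHostsToUse parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a DC-aware round-robin query plan (not the driver's class).
final class DcAwareRoundRobinSketch {
    private final List<String> localHosts;
    private final List<String> remoteHosts;
    private final int remoteHostsToUse; // how many remote hosts to append as a last resort
    private final AtomicInteger counter = new AtomicInteger();

    DcAwareRoundRobinSketch(List<String> localHosts, List<String> remoteHosts, int remoteHostsToUse) {
        this.localHosts = new ArrayList<>(localHosts);
        this.remoteHosts = new ArrayList<>(remoteHosts);
        this.remoteHostsToUse = remoteHostsToUse;
    }

    // Ordered list of hosts to try for the next query: all local hosts, starting
    // one position later each time (round robin), then a few remote hosts.
    // Simplification: remote hosts are always appended in the same order.
    List<String> newQueryPlan() {
        List<String> plan = new ArrayList<>();
        int n = localHosts.size();
        if (n > 0) {
            int start = Math.floorMod(counter.getAndIncrement(), n);
            for (int i = 0; i < n; i++) {
                plan.add(localHosts.get((start + i) % n));
            }
        }
        int remote = Math.min(remoteHostsToUse, remoteHosts.size());
        for (int i = 0; i < remote; i++) {
            plan.add(remoteHosts.get(i));
        }
        return plan;
    }
}
```

Keeping remote hosts at the end of the plan means a query only crosses data centers when every local host has failed.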

Token Aware Load Balancing https://www.slideshare.net/planetcassandra/apache-cassandra-and-drivers
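Token aware routing relies on knowing which nodes own a partition. The sketch below makes three simplifying assumptions:

• nodes have a single token each
• replicas are placed SimpleStrategy-style: the owning node, then the next nodes clockwise
• the partition token is given directly rather than computed by hashing the key

Cassandra's default Murmur3Partitioner and vnodes are left out. In the Java driver, TokenAwarePolicy wraps a child policy such as DCAwareRoundRobinPolicy and tries the replicas first. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of replica lookup on a token ring with single-token nodes.
final class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    // A node owns the range (previous token, its token]; the remaining replicas
    // are the next nodes clockwise, wrapping around the end of the ring.
    List<String> replicasFor(long partitionToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(partitionToken);
        if (entry == null) {
            entry = ring.firstEntry(); // token past the last node wraps to the first
        }
        int count = Math.min(replicationFactor, ring.size());
        for (int i = 0; i < count; i++) {
            replicas.add(entry.getValue());
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }
}
```

A token-aware client sends the request straight to one of these replicas. This saves the extra hop from a non-owning coordinator.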

Latency Aware Load Balancing
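Latency aware load balancing keeps a moving average of each host's latency and deprioritizes hosts that are much slower than the fastest one. The driver's LatencyAwarePolicy uses an exclusion threshold for this. The sketch below only approximates the idea: it assumes a plain fixed-weight moving average (alpha) instead of the policy's own averaging, and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of latency-aware host ordering (not the driver's LatencyAwarePolicy).
final class LatencyAwareSketch {
    private final double alpha;              // weight of the newest sample in the moving average
    private final double exclusionThreshold; // hosts slower than threshold x best average are pushed back
    private final Map<String, Double> averages = new HashMap<>();

    LatencyAwareSketch(double alpha, double exclusionThreshold) {
        this.alpha = alpha;
        this.exclusionThreshold = exclusionThreshold;
    }

    // Records one query latency; the first sample for a host becomes its average.
    void record(String host, double latencyMillis) {
        averages.merge(host, latencyMillis, (old, sample) -> alpha * sample + (1 - alpha) * old);
    }

    double averageMillis(String host) {
        return averages.getOrDefault(host, Double.NaN);
    }

    // Keeps fast hosts first (original order), moves excluded hosts to the end.
    // Hosts without measurements are treated as fast.
    List<String> order(List<String> candidates) {
        double best = Double.MAX_VALUE;
        for (String host : candidates) {
            Double avg = averages.get(host);
            if (avg != null && avg < best) {
                best = avg;
            }
        }
        List<String> fast = new ArrayList<>();
        List<String> slow = new ArrayList<>();
        for (String host : candidates) {
            Double avg = averages.get(host);
            if (avg != null && avg > exclusionThreshold * best) {
                slow.add(host);
            } else {
                fast.add(host);
            }
        }
        fast.addAll(slow);
        return fast;
    }
}
```

Slow hosts are moved to the end rather than dropped, so they are still used if every fast host fails.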

Pooling options
• the driver communicates with the cluster through a pool of connections per host
• pooling changed between protocol versions V2 and V3: V3 allows up to 32768 simultaneous requests per connection instead of 128, so the core number of connections per host was lowered to 1
• allowing more requests per connection can put more load on the cluster
• monitor in-flight queries on the driver side and tune for your use case
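The point about monitoring in-flight queries can be illustrated with a toy pool. It tracks outstanding requests per connection and refuses to exceed a per-connection maximum. This is a hypothetical sketch, not the driver's pool. In driver 3.x, the corresponding knobs are PoolingOptions.setConnectionsPerHost and setMaxRequestsPerConnection, and Session.getState() reports in-flight queries per host, which is what you would export to monitoring.

```java
// Hypothetical toy connection pool that tracks in-flight requests per connection.
final class PoolSketch {
    private final int[] inFlight;
    private final int maxRequestsPerConnection;

    PoolSketch(int connections, int maxRequestsPerConnection) {
        this.inFlight = new int[connections];
        this.maxRequestsPerConnection = maxRequestsPerConnection;
    }

    // Picks the least busy connection below the limit and counts the request on it.
    // Returns -1 when every connection is saturated (a real driver would queue or reject).
    synchronized int borrow() {
        int best = -1;
        for (int i = 0; i < inFlight.length; i++) {
            if (inFlight[i] < maxRequestsPerConnection && (best == -1 || inFlight[i] < inFlight[best])) {
                best = i;
            }
        }
        if (best >= 0) {
            inFlight[best]++;
        }
        return best;
    }

    // Called when the response for a request on this connection arrives.
    synchronized void release(int connection) {
        inFlight[connection]--;
    }

    // Total outstanding requests: the number worth graphing while tuning.
    synchronized int totalInFlight() {
        int total = 0;
        for (int count : inFlight) {
            total += count;
        }
        return total;
    }
}
```

Raising maxRequestsPerConnection lets more requests through before borrow() fails. The cost is more concurrent work landing on the same nodes, which is the trade-off the slide warns about.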

Pooling options

Speculative executions
• spawn additional queries to other nodes if the first node has not answered after a configured delay
• http://docs.datastax.com/en/developer/java-driver/3.1/manual/speculative_execution/
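The effect of speculative executions on response time can be simulated without a cluster. The sketch makes these assumptions:

• a constant delay between executions
• each execution goes to a different node with a known latency
• the client takes the first answer

In driver 3.x this corresponds to ConstantSpeculativeExecutionPolicy(constantDelayMillis, maxSpeculativeExecutions). The driver only speculates on statements marked idempotent. The simulation and its names are hypothetical.

```java
// Hypothetical simulation of constant-delay speculative executions.
final class SpeculativeExecutionSketch {
    // When the client got its answer and how many executions were actually started.
    static final class Outcome {
        final long responseTimeMillis;
        final int executionsStarted;

        Outcome(long responseTimeMillis, int executionsStarted) {
            this.responseTimeMillis = responseTimeMillis;
            this.executionsStarted = executionsStarted;
        }
    }

    // nodeLatenciesMillis[i] is the latency of the node tried by execution i.
    // Execution i starts at i * delayMillis, unless an earlier one has already answered
    // or the limit of 1 + maxSpeculativeExecutions executions is reached.
    static Outcome simulate(long[] nodeLatenciesMillis, long delayMillis, int maxSpeculativeExecutions) {
        long firstAnswer = Long.MAX_VALUE;
        int started = 0;
        int maxExecutions = Math.min(1 + maxSpeculativeExecutions, nodeLatenciesMillis.length);
        for (int i = 0; i < maxExecutions; i++) {
            long startAt = i * delayMillis;
            if (startAt >= firstAnswer) {
                break; // an earlier execution already answered, nothing more is sent
            }
            started++;
            firstAnswer = Math.min(firstAnswer, startAt + nodeLatenciesMillis[i]);
        }
        return new Outcome(firstAnswer, started);
    }
}
```

Consider a slow first node (300 ms), a 50 ms delay and a fast second node (20 ms). The answer arrives at 70 ms and only two executions are sent. Every extra execution is extra load on the cluster, which is why the use cases below say to use this only if the cluster can handle it.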

Speculative executions
• constant speculative execution policy
• percentile speculative execution policy
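With the percentile policy, the delay is not fixed. It is derived from recently observed latencies, so only queries slower than, say, the 99th percentile trigger an extra execution. The sketch below computes a nearest-rank percentile from a hypothetical array of recorded latencies. The driver's PercentileSpeculativeExecutionPolicy relies on its own latency tracker instead.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile of recorded latencies,
// usable as the delay before a speculative execution is sent.
final class PercentileDelaySketch {
    static long percentileMillis(long[] recordedLatenciesMillis, double percentile) {
        if (recordedLatenciesMillis.length == 0) {
            throw new IllegalArgumentException("no latency samples recorded");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = recordedLatenciesMillis.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample with at least `percentile` % of samples at or below it.
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

The delay adapts as latencies change: when the cluster slows down, the percentile grows and fewer speculative executions are sent.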

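Timeouts
• driver read timeout vs server read timeout: keep the driver's read timeout higher than the server-side timeouts (*_request_timeout_in_ms in cassandra.yaml) so the coordinator can reply with a timeout error before the driver gives up
• driver settings apply to all queries, or can be overridden per query
• setReadTimeoutMillis and setConnectTimeoutMillis (SocketOptions)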
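Plain CompletableFuture (Java 9+ for orTimeout) is enough to show what a driver-side read timeout does. The client stops waiting after its own timeout, while the simulated server work keeps running. simulatedQuery and the returned strings are hypothetical. In driver 3.x, the real settings are SocketOptions.setReadTimeoutMillis for all queries and Statement.setReadTimeoutMillis per query.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a client-side (driver) timeout around a slow server call.
final class TimeoutSketch {
    // Stand-in for a query whose server-side work takes serverMillis.
    static CompletableFuture<String> simulatedQuery(long serverMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(serverMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "row";
        });
    }

    // Returns the result, or "TIMEOUT" if the client gives up first.
    // The simulated server work is not cancelled: it keeps running after the timeout.
    static String queryWithClientTimeout(long serverMillis, long clientTimeoutMillis) {
        try {
            return simulatedQuery(serverMillis)
                    .orTimeout(clientTimeoutMillis, TimeUnit.MILLISECONDS)
                    .get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return "TIMEOUT";
            }
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }
}
```

A short client timeout fails fast and frees the caller to retry elsewhere. If the server timeout is longer, though, the abandoned request still consumes cluster resources.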

Retry policies
• fail early and retry
• add a retry policy or speculative execution
• use a downgrading retry policy if possibly inconsistent data is better than no data
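After a read timeout, a downgrading retry policy decides whether to retry at a lower consistency level that the replicas which did answer can still satisfy. The sketch below follows the general shape of that decision, under these simplifying assumptions:

• at most one retry
• downgrade to THREE, TWO or ONE based on the number of responses received
• retry once at the same level if enough replicas answered but the data did not come back

It is not the source of the driver's DowngradingConsistencyRetryPolicy, and the enum values are hypothetical.

```java
// Hypothetical, simplified downgrading retry decision for read timeouts.
final class DowngradingRetrySketch {
    enum Decision { RETRY_SAME_LEVEL, RETRY_AT_ONE, RETRY_AT_TWO, RETRY_AT_THREE, RETHROW }

    // requiredResponses: replicas the consistency level needed
    // receivedResponses: replicas that answered before the server-side timeout
    // dataRetrieved:     whether the replica asked for the actual data answered
    // retriesSoFar:      retries already made for this query
    static Decision onReadTimeout(int requiredResponses, int receivedResponses,
                                  boolean dataRetrieved, int retriesSoFar) {
        if (retriesSoFar > 0) {
            return Decision.RETHROW; // retry at most once, then surface the error
        }
        if (receivedResponses < requiredResponses) {
            // Downgrade to the highest level the answering replicas can satisfy.
            if (receivedResponses >= 3) return Decision.RETRY_AT_THREE;
            if (receivedResponses == 2) return Decision.RETRY_AT_TWO;
            if (receivedResponses == 1) return Decision.RETRY_AT_ONE;
            return Decision.RETHROW; // no replica answered, nothing to downgrade to
        }
        // Enough replicas answered, but not with the data: one retry at the same level.
        return dataRetrieved ? Decision.RETHROW : Decision.RETRY_SAME_LEVEL;
    }
}
```

The trade-off is visible in the RETRY_AT_ONE branch: the client gets an answer, but it may come from a single replica that has not seen the latest write.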

Use cases

Click stream and IoT measurements
• visualize measurements from many devices
• fast access with tolerable inconsistencies
• DC-aware and token-aware policies to land on a local node that holds the data
• lower consistency level (ONE) or use a downgrading retry policy
• use speculative executions to query more nodes if the cluster can handle the extra load

Mission critical data with tolerable performance
• warehouse stock data compared against an ERP system
• high consistency (replicas read + replicas written > replication factor)
• retry and reconnection policies are a must
• choose a lower number of requests per connection so as not to overload the cluster
• set a lower read timeout to fail early and retry
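The "read + write > replication factor" rule is a pigeonhole argument. Suppose a read waits for R replicas, a write was acknowledged by W replicas out of RF, and R + W > RF. Then the two sets must share at least one replica, so the read sees the latest acknowledged write. A minimal sketch with hypothetical names:

```java
// Hypothetical helpers for the overlap rule between read and write replica sets.
final class OverlapSketch {
    // Replicas needed for QUORUM in a single data center.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when any readAcks replicas must overlap any writeAcks replicas
    // out of replicationFactor, so a read includes at least one up-to-date replica.
    static boolean readsSeeLatestWrite(int readAcks, int writeAcks, int replicationFactor) {
        return readAcks + writeAcks > replicationFactor;
    }
}
```

With RF = 3, QUORUM reads and writes (2 + 2 > 3) satisfy the rule, ONE/ONE (1 + 1) does not, and ONE reads with ALL writes (1 + 3) do.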

Write-heavy, low-latency read use case
• ad serving (store user analytics and serve ads fast)
• separate reads and writes so each can use different tuning options
• latency-aware policy on reads to always choose the fastest-performing nodes
• lower the read timeout on driver and server to fail early
• increase the maximum number of requests per connection

Conclusion

Conclusion and takeaways
• know your use case and know your database
• each tuning option requires good monitoring
• TEST, ADJUST, MEASURE

Links
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 1
• SmartCat Blog post - Tuning Java driver for Apache Cassandra - part 2
• Use case example - Tuning for heavy write and low latency read scenario

Q&A

Thank you Nenad Bozic @NenadBozicNs SmartCat www.smartcat.io
