Cassandra To Infinity And Beyond

How Teads scales with Apache Cassandra.
Internet scale means tons of data, read-heavy workloads, massive data ingestion and low latency.
The French AdTech company Teads makes heavy use of Cassandra, a reliable and performant open source database.
Spawning Cassandra nodes in AWS is a piece of cake with Terraform and Chef.

Romain Hardouin

September 25, 2019
Transcript

  1. French AdTech
    • AWS / GCP
    • Scala / JS / Go
    • Machine learning
    • Docker / CoreOS
    • Terraform / Chef / Debian
    • Cassandra / Kafka / MySQL / Redis
    • Spark / Flink
  2. Workload
    Different kinds of workloads, but all of them are latency sensitive
    Internet scale
    • Massive amounts of data ingested from partners
    • We also create lots of data ourselves
    No more analytics
    • Tons of business-critical counters
    • Time series
    • TTL, TTL, TTL
  3. Gimme some figures!
    • 250 nodes
    • Mostly ephemeral data
    • 28 TB
    • 100 billion keys
    • 3 regions
    • Regional vs worldwide
    • 21 DCs
  4. Why not EBS? No more EBS
    • Cheap storage, great for STCS
    • Snapshots (S3 backup)
    • No coupling between disks and CPU/RAM
    • High latency, high I/O wait
    • Throughput: 160 MB/s
    • Unsteady performance
  5. Ops

  6. Monitoring
    Ratios across DCs/clusters to grasp workloads. Examples:
    • R/W spread: ((max qps - min qps) / max qps) * 100
    • P99/P95 jitter factor: (P99 - P95) / P99
    • Memory cached / disk ratio
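The ratios on the monitoring slide can be sketched in Python as below. This is a minimal illustration rather than the speaker's actual monitoring code: the function names, the zero-division handling and the sample values are all assumptions made for the example.

```python
def spread_pct(qps_values):
    """R/W spread in percent: ((max qps - min qps) / max qps) * 100."""
    highest, lowest = max(qps_values), min(qps_values)
    if highest == 0:
        return 0.0  # assumption: an idle cluster is reported as 0% spread
    return (highest - lowest) / highest * 100


def jitter_factor(p99, p95):
    """P99/P95 jitter factor: (P99 - P95) / P99."""
    if p99 == 0:
        return 0.0  # assumption: no latency measured means no jitter
    return (p99 - p95) / p99


def cached_to_disk_ratio(cached_bytes, disk_bytes):
    """Share of the on-disk data that fits in the OS page cache."""
    return cached_bytes / disk_bytes


# Hypothetical per-node read QPS and latencies (ms) for one datacenter.
print(spread_pct([1200, 1000, 900]))      # 25.0
print(jitter_factor(p99=20.0, p95=15.0))  # 0.25
print(cached_to_disk_ratio(32, 128))      # 0.25
```

Each value is a unitless ratio, which is what makes it comparable across datacenters and clusters of different sizes.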
  7. Monitoring YAML Configuration

        - include:
            bean_regex: org.apache.cassandra.metrics:type=ReadRepair,name=.*
            attribute:
              - Count
        - include:
            bean: org.apache.cassandra.metrics:type=CommitLog,name=TotalCommitLogSize
  8. Alerting
    • Down node
    • Exceptions
    • Commitlog size
    • High latency
    • High pending tasks
    • Many hints
    • Clock out of sync
    • IO wait
    • Disk space
    • ...
  9. Back in 2016
    Old analytics data model: How do you repair 45TB of data within gc grace period?
  10. Fork motivation
    1. Need to add a patch ASAP: High Blocked NTR (CASSANDRA-11363)
    2. Why not backport interesting tickets?
    3. Why not add small features/fixes? Expose tasks queue length via JMX (CASSANDRA-12758)
  11. Custom feature: Securing a legacy cluster without any downtime
    Allows roles to be created before switching to the standard auth stack
    • AllowAnyCredentialsAuthenticator
      ◦ "User '{}' has been authenticated without password checking"
    • UnsecureAutoLoginCassandraRoleManager
      ◦ Allows any non-existing role to log in: "Auto login of role '{}'"
      ◦ If a role exists, standard checking is done.
    • UnsecureCassandraAuthorizer
      ◦ Authorizes DML on any resource for any authenticated user.
      ◦ Superusers are given all permissions, i.e. DML, DDL and DCL.
    • No spam logs
  12. Fix Merkle tree size calculation
    “How do you repair 45TB of data within gc grace period?”
    For a subset of tables:
    • Before: 23 days
    • With CASSANDRA-12580: 16 hours
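The arithmetic behind these repair figures can be checked with a short Python sketch. The deck does not say which gc grace period Teads used, so the code assumes Cassandra's default `gc_grace_seconds` of 864000 (10 days). The helper name `fits_in_gc_grace` is hypothetical.

```python
SECONDS_PER_HOUR = 3600
# Assumption: Cassandra's default gc_grace_seconds (10 days); the deck does
# not state the value configured on the tables in question.
DEFAULT_GC_GRACE_SECONDS = 864_000


def fits_in_gc_grace(repair_hours, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True if a full repair completes within the gc grace period."""
    return repair_hours * SECONDS_PER_HOUR <= gc_grace_seconds


before_hours = 23 * 24  # "Before: 23 days"
after_hours = 16        # "With CASSANDRA-12580: 16 hours"

print(fits_in_gc_grace(before_hours))  # False
print(fits_in_gc_grace(after_hours))   # True
print(before_hours / after_hours)      # 34.5
```

Under that assumed default, a 23-day repair cannot finish inside the window, while a 16-hour repair leaves room to run it several times per gc grace period.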
  13. Can’t upgrade?
    • Quick reaction: the High Blocked NTR ticket has been released in 2.1, but you need to wait for a release
    • Stability, reliability: critical counters, e.g. CASSANDRA-14958
  14. References
    • cassandra.apache.org
    • thelastpickle.com
    • www.chef.io
    • www.datadoghq.com
    • www.packer.io
    • www.rundeck.com
    • www.terraform.io
    Buzz Lightyear is a character in the Toy Story franchise. © Disney/Pixar
    Back to the Future is a trademark and copyright of Universal Studios and U-Drive Joint Venture. Licensed by Universal Studios. All Rights Reserved.
    Crown icon designed by Good Ware from Flaticon