Cassandra advanced data modeling

RDBMSs have their own data modeling methodology and diagrams. What about Cassandra? Let's discover the key principles of Cassandra data modeling with the Chebotko methodology. Then we'll look at KDM, a modeling tool built on the Chebotko methodology. Finally, let's talk about the time dimension in Cassandra.

This presentation was made for the Lyon Cassandra Users meetup (France).

Romain Hardouin

May 31, 2016

Transcript

  1. $ who
    Romain
    $ pgrep -fl work
    Cassandra architect
    $ whatis teads
    No.1 Video Advertising Marketplace
  2. Know your domain
    Conceptual data model (E&R):
    • Entities
    • Relationships
    • Attributes / keys
    • Cardinalities
    • Constraints
    Principle: Know your data
  3. Query-driven model
    Application workflow
    New needs?
    • New queries => new tables
    • ALTER TABLE possible?
    Principles: Know your data, Know your queries
  4. Goal: one partition per query
    Anti-patterns:
    • Table scans
    • Client-side joins (a.k.a. multi-table queries)
    • Secondary indexes
    • ALLOW FILTERING
    Principles: Know your data, Know your queries
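To make "one partition per query" concrete, here is a minimal Python sketch, assuming a toy in-memory table where rows are grouped by partition key. The `Table` class and the `video_id`/`genre` columns are hypothetical; the point is only to count how many partitions each access pattern has to touch.

```python
from collections import defaultdict


class Table:
    """Hypothetical in-memory stand-in for a Cassandra table: partition key -> rows."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)

    def insert(self, row):
        # Rows sharing a partition key value live together in one partition.
        self.partitions[row[self.partition_key]].append(row)

    def select_by_partition(self, key):
        """Query-driven access: the partition key is known, so one partition is read."""
        return list(self.partitions.get(key, [])), 1

    def select_with_filtering(self, column, value):
        """Anti-pattern (cf. ALLOW FILTERING): every partition must be scanned."""
        rows = [r for part in self.partitions.values() for r in part if r[column] == value]
        return rows, len(self.partitions)


videos = Table("video_id")
for i in range(1000):
    videos.insert({"video_id": i, "genre": f"genre{i % 10}"})

rows, touched = videos.select_by_partition(42)
print(len(rows), touched)  # 1 1
rows, touched = videos.select_with_filtering("genre", "genre2")
print(len(rows), touched)  # 100 1000
```

The query-driven fix for the second access pattern is not a smarter filter but a new table partitioned by `genre`.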
  5. Nest data
    CREATE TABLE actors_by_video (
        video_id uuid,
        actor_name text,
        character_name text,
        PRIMARY KEY ((video_id), actor_name, character_name)
    );
    Principles: Know your data, Denormalize
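The nesting in `actors_by_video` can be pictured as a map of partitions, each holding rows sorted by the clustering columns. This Python sketch assumes plain strings in place of `uuid` values, and `insert_actor`/`actors_of` are hypothetical helpers; it also shows that re-inserting the same full primary key does not create a second row.

```python
from collections import defaultdict

# Sketch of actors_by_video, PRIMARY KEY ((video_id), actor_name, character_name).
# video_id -> {(actor_name, character_name): row}; strings stand in for uuids.
actors_by_video = defaultdict(dict)


def insert_actor(video_id, actor_name, character_name):
    # The full primary key identifies one row; writing the same key again
    # overwrites it instead of creating a duplicate.
    actors_by_video[video_id][(actor_name, character_name)] = {
        "actor_name": actor_name,
        "character_name": character_name,
    }


def actors_of(video_id):
    # Cassandra keeps a partition's rows sorted by clustering columns
    # (ascending here); this sketch sorts at read time to mimic that order.
    partition = actors_by_video.get(video_id, {})
    return [partition[key] for key in sorted(partition)]


insert_actor("lotr-1", "Viggo Mortensen", "Aragorn")
insert_actor("lotr-1", "Elijah Wood", "Frodo")
insert_actor("lotr-1", "Elijah Wood", "Frodo")  # same primary key: no new row
print([a["actor_name"] for a in actors_of("lotr-1")])
# ['Elijah Wood', 'Viggo Mortensen']
```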
  6. Duplicate data
    Writes are cheap: « joins on write »
    Duplication occurs at different levels:
    • Table: materialized views
    • Partition
    • Rows
    Principles: Know your data, Denormalize
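"Joins on write" means the application pays for the join once, at write time, by writing the same fact into every table that serves a query. A minimal Python sketch, assuming two hypothetical tables `actors_by_video` and `videos_by_actor` modeled as dicts of sets. In a real cluster these duplicated writes would typically be grouped (for example in a logged BATCH) or delegated to a materialized view; that part is not modeled here.

```python
from collections import defaultdict

# Two query-specific tables holding the same facts (hypothetical table names).
actors_by_video = defaultdict(set)  # serves: "who acts in video X?"
videos_by_actor = defaultdict(set)  # serves: "which videos feature actor Y?"


def add_casting(video_id, actor_name, character_name):
    """Join on write: one logical write fans out to every table that answers a query."""
    actors_by_video[video_id].add((actor_name, character_name))
    videos_by_actor[actor_name].add((video_id, character_name))


add_casting("lotr-1", "Elijah Wood", "Frodo")
add_casting("lotr-2", "Elijah Wood", "Frodo")
add_casting("lotr-1", "Viggo Mortensen", "Aragorn")

# Each read is a single lookup; no client-side join is needed.
print(sorted(videos_by_actor["Elijah Wood"]))
# [('lotr-1', 'Frodo'), ('lotr-2', 'Frodo')]
```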
  7. Application workflow
    Query workflow and query list (figure from « A Big Data Modeling Methodology for Apache Cassandra »)
  8. Chebotko diagram (figure from « A Big Data Modeling Methodology for Apache Cassandra »)
  9. Chebotko diagram
    actors_by_video
      video_id        uuid  K
      actor_name      text  C↑
      character_name  text  C↑
    (K = partition key column, C↑ = clustering column in ascending order)
    CREATE TABLE actors_by_video (
        video_id uuid,
        actor_name text,
        character_name text,
        PRIMARY KEY ((video_id), actor_name, character_name)
    );
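Because a Chebotko diagram encodes each column's role, turning it into CQL is mechanical. This Python sketch is a hypothetical helper, not KDM's actual code generator: it handles only the K, C↑ and C↓ markers (static columns, indexes and collections are ignored) and takes clustering order from the column order in the diagram. The `comments_by_video` table is an illustrative assumption.

```python
def chebotko_to_cql(table, columns):
    """columns: list of (name, cql_type, marker) with marker in {"K", "C↑", "C↓", ""}."""
    partition = [name for name, _, marker in columns if marker == "K"]
    clustering = [(name, marker) for name, _, marker in columns if marker in ("C↑", "C↓")]
    if not partition:
        raise ValueError("a Cassandra table needs at least one partition key column (K)")

    lines = [f"    {name} {cql_type}," for name, cql_type, _ in columns]
    key = "(" + ", ".join(partition) + ")"
    if clustering:
        key += ", " + ", ".join(name for name, _ in clustering)
    lines.append(f"    PRIMARY KEY ({key})")
    ddl = f"CREATE TABLE {table} (\n" + "\n".join(lines) + "\n)"

    # Ascending is the default; spell out the order only if a column is descending.
    if any(marker == "C↓" for _, marker in clustering):
        order = ", ".join(
            f"{name} {'ASC' if marker == 'C↑' else 'DESC'}" for name, marker in clustering
        )
        ddl += f" WITH CLUSTERING ORDER BY ({order})"
    return ddl + ";"


print(chebotko_to_cql("actors_by_video", [
    ("video_id", "uuid", "K"),
    ("actor_name", "text", "C↑"),
    ("character_name", "text", "C↑"),
]))
print(chebotko_to_cql("comments_by_video", [
    ("video_id", "uuid", "K"),
    ("created_at", "timeuuid", "C↓"),
    ("comment", "text", ""),
]))
```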
  10. Chebotko mapping rules
    • MR 1: Entities & relationships
    • MR 2: Equality search attributes (=)
    • MR 3: Inequality search attributes (<, >)
    • MR 4: Ordering attributes (↑↓)
    • MR 5: Key attributes, uniqueness
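Rules MR 2 to MR 5 can be read as a recipe for building a primary key from one query. The Python sketch below is a simplified, hypothetical reading of them, not code from the paper: it puts every equality attribute in the partition key (the methodology also allows some of them to become clustering columns), accepts at most one inequality attribute, and does not model MR 1.

```python
def derive_primary_key(equality, inequality=None, ordering=(), key_attrs=()):
    """Simplified sketch of Chebotko mapping rules MR 2 to MR 5 for a single query.

    equality:   attributes searched with '='                  (MR 2)
    inequality: at most one attribute searched with '<' / '>' (MR 3)
    ordering:   (attribute, "ASC" | "DESC") pairs             (MR 4)
    key_attrs:  attributes that make each row unique          (MR 5)
    """
    partition = list(equality)  # MR 2, simplified: all equality attributes
    clustering = []             # (column, direction) pairs, in key order
    used = set(partition)
    directions = dict(ordering)

    def add(column, direction):
        if column not in used:
            clustering.append((column, direction))
            used.add(column)

    if inequality is not None:          # MR 3: the range column clusters first
        add(inequality, directions.get(inequality, "ASC"))
    for column, direction in ordering:  # MR 4: result order comes from clustering
        add(column, direction)
    for column in key_attrs:            # MR 5: append remaining key attributes
        add(column, "ASC")
    return partition, clustering


# "Actors of a video, by name" reproduces the actors_by_video key.
print(derive_primary_key(
    ["video_id"],
    ordering=[("actor_name", "ASC")],
    key_attrs=["video_id", "actor_name", "character_name"],
))
# (['video_id'], [('actor_name', 'ASC'), ('character_name', 'ASC')])
```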
  11. Chebotko mapping rules (figure from « A Big Data Modeling Methodology for Apache Cassandra »)
  12. Eventual consistency
    • No instant deletes
    • Deletes are writes
    • SSTables are immutable files
    • Writes are spread across many files
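Since SSTables are never modified in place, a DELETE is just another write: a tombstone with a newer timestamp that shadows older values when a read merges every file holding the key. A minimal Python sketch, assuming each SSTable is a dict of `key -> (timestamp, value)` and a sentinel object stands for a tombstone; on a timestamp tie the deletion wins, as in Cassandra.

```python
TOMBSTONE = object()  # sentinel standing for a deletion marker in this sketch


def read(key, sstables):
    """Merge every SSTable's version of `key`; the newest timestamp wins."""
    newest = None
    for sstable in sstables:  # each SSTable: dict key -> (timestamp, value)
        cell = sstable.get(key)
        if cell is None:
            continue
        if (newest is None or cell[0] > newest[0]
                or (cell[0] == newest[0] and cell[1] is TOMBSTONE)):
            newest = cell
    if newest is None or newest[1] is TOMBSTONE:
        return None  # never written, or shadowed by a tombstone
    return newest[1]


older = {"video:42": (1, "draft")}
newer = {"video:42": (2, TOMBSTONE)}  # the DELETE landed in a new file

print(read("video:42", [older]))         # draft
print(read("video:42", [older, newer]))  # None
print(older["video:42"])                 # (1, 'draft'): the old file is untouched
```

The old value only disappears from disk when compaction merges the files, and Cassandra keeps the tombstone for at least `gc_grace_seconds` so that every replica learns about the delete.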
  13. Goal: avoid reading too many* tombstones
    * see tombstone_warn_threshold and tombstone_failure_threshold in cassandra.yaml
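The two settings act per query: Cassandra logs a warning when a read scans more than `tombstone_warn_threshold` tombstones (1000 by default) and aborts the read beyond `tombstone_failure_threshold` (100000 by default). This Python sketch mimics that behavior for one partition, assuming `None` marks a tombstone; the `scan_partition` function and its exception are hypothetical.

```python
TOMBSTONE_WARN_THRESHOLD = 1_000        # cassandra.yaml default
TOMBSTONE_FAILURE_THRESHOLD = 100_000   # cassandra.yaml default


class TombstoneOverwhelming(Exception):
    """Raised when a read scans too many tombstones (sketch only)."""


def scan_partition(cells, warn=TOMBSTONE_WARN_THRESHOLD, fail=TOMBSTONE_FAILURE_THRESHOLD):
    """Return (live cells, warnings); None stands for a tombstone in this sketch."""
    live, tombstones, warnings = [], 0, []
    for cell in cells:
        if cell is None:
            tombstones += 1  # tombstones are read even though nothing is returned
            if tombstones > fail:
                raise TombstoneOverwhelming(f"scanned more than {fail} tombstones")
        else:
            live.append(cell)
    if tombstones > warn:
        warnings.append(f"read {len(live)} live rows and {tombstones} tombstones")
    return live, warnings


# A queue-like partition where most rows were deleted: little data, much work.
live, warnings = scan_partition(["job-1"] + [None] * 1500)
print(live, warnings)
```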
  14. UPSERTs
    • Same INSERT over and over again? UPSERTs hide this behavior.
    • What if… one day you want to add time?
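A CQL INSERT does not check whether the row already exists (unless `IF NOT EXISTS` is used), so repeating it silently overwrites. Adding time as a clustering column turns those overwrites into history. A minimal Python sketch with two hypothetical tables, `status_by_video` keyed by `video_id` only and `status_history_by_video` keyed by `(video_id, updated_at)`.

```python
# status_by_video:         PRIMARY KEY ((video_id))             -> one row per video
# status_history_by_video: PRIMARY KEY ((video_id), updated_at) -> one row per change
status_by_video = {}
status_history_by_video = {}


def insert_status(video_id, status, updated_at):
    # Upsert: the previous value is overwritten without any error or warning.
    status_by_video[video_id] = status
    # With time in the primary key, each write lands in its own row.
    status_history_by_video.setdefault(video_id, {})[updated_at] = status


insert_status("v1", "uploaded", 1)
insert_status("v1", "encoding", 2)
insert_status("v1", "published", 3)

print(status_by_video["v1"])  # published
print(sorted(status_history_by_video["v1"].items()))
# [(1, 'uploaded'), (2, 'encoding'), (3, 'published')]
```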
  15. Resources
    • « A Big Data Modeling Methodology for Apache Cassandra », Artem Chebotko, Andrey Kashlev & Shiyong Lu: www.cs.wayne.edu/andrey/papers/TR-BIGDATA-05-2015-CKL.pdf
    • KDM, Andrey Kashlev: kdm.dataview.org