[Diagram: a row key with Column A, Column B and Column C, each holding a value and a timestamp]
Slide 33
CREATE TABLE IF NOT EXISTS account (
id text,
userid text,
created timestamp,
currency text,
country text,
description text,
type text,
PRIMARY KEY ((id))
);
Slide 34
[Diagram: a row key with Column A, Column B and Column C, each holding a value and a timestamp]
CREATE TABLE IF NOT EXISTS accounts_by_userid (
id text,
userid text,
created timestamp,
currency text,
country text,
description text,
type text,
PRIMARY KEY ((userid), id)
);
Slide 37
Partition Key
Row 1 Row 2 Row 3
Slide 40
Partition Key
Slide 41
Partition Key
Row 1 Row 2 Row 3
Slide 42
Partition Key
Row 1 Row infinity
Slide 43
User ID Account 1 Account 2 Account 3
No-one has infinite accounts!
Slide 44
CREATE TABLE IF NOT EXISTS transaction (
id text,
accountid text,
created timestamp,
currency text,
amount bigint,
description text,
PRIMARY KEY ((id))
);
Slide 45
Transaction ID
Column A Column B Column C
Slide 46
Transaction ID
Column A Column B Column C
Must know the primary key; can’t iterate
Slide 47
CREATE TABLE IF NOT EXISTS transaction_by_account (
id text,
accountid text,
created timestamp,
currency text,
amount bigint,
description text,
PRIMARY KEY ((accountid), id)
);
Slide 48
Account ID
Transaction 1 Transaction infinity
Slide 49
Timeseries:
Partition by Time ⏱
Slide 50
Partition 2 3 4 5 7
Slide 51
Bucket 2 3 4 5 7
Composite Partition Key
(Time range and Account ID)
PRIMARY KEY ((accountid, timebucket), created, id)
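As a rough illustration of the composite partition key above, the bucket value can be derived from the row's `created` timestamp at write time. The sketch below assumes one bucket per UTC day and a `YYYY-MM-DD` string format; the slides do not state the bucket size or format, and the helper names `time_bucket` and `partition_key` are hypothetical.

```python
from datetime import datetime, timezone


def time_bucket(created: datetime) -> str:
    """Return the day bucket (assumed format: YYYY-MM-DD, in UTC) for a timestamp."""
    if created.tzinfo is None:
        raise ValueError("created must be timezone-aware")
    return created.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(account_id: str, created: datetime) -> tuple:
    """Build the (accountid, timebucket) pair used as the partition key."""
    return (account_id, time_bucket(created))
```

For example, `partition_key("acc_123", datetime(2019, 3, 14, 23, 59, tzinfo=timezone.utc))` gives `("acc_123", "2019-03-14")`, where `acc_123` is a made-up account ID.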
Slide 52
Composite Partition Key
(Time range and Account ID)
PRIMARY KEY ((accountid, timebucket), created, id)
Day 1: 2 3 4 5 7
Day 2: 9 12 15 16 17 18 19 20 21 22 23 24
Hot Partition Key
Slide 61
eu-west-1a eu-west-1b eu-west-1c
Partition Key = Day 1
Slide 62
eu-west-1a eu-west-1b eu-west-1c
Partition Key = Day 2
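The two slides above suggest that when the day alone is the partition key, every write for that day lands in the same place in the cluster. A minimal sketch of that effect, assuming a toy three-node ring where the owner is chosen by hashing the partition key: `hashlib.md5` is only a stand-in for Cassandra's partitioner, the node names are the availability zones from the slide, replication is ignored, and the account IDs are made up.

```python
import hashlib
from collections import Counter

NODES = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]


def owner(partition_key: tuple) -> str:
    """Pick a node for a partition key (toy stand-in for token ownership)."""
    digest = hashlib.md5("|".join(partition_key).encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def write_distribution(keys) -> Counter:
    """Count how many writes each node receives for the given partition keys."""
    return Counter(owner(k) for k in keys)


accounts = [f"acc_{i}" for i in range(1000)]  # hypothetical account IDs

# Partitioning by day only: all 1000 writes share one key, so one node takes them all.
by_day = write_distribution(("day-1",) for _ in accounts)

# Partitioning by (accountid, day): keys differ per account, so writes spread out.
by_account_and_day = write_distribution((a, "day-1") for a in accounts)
```

Here `by_day` has a single entry holding all 1000 writes, while `by_account_and_day` splits the same writes across the nodes.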
Slide 63
Is your data predictable?
How do you choose your bucket size?
Ensure correct partitioning!
Slide 64
Flakeseries:
Partition by Time ⏱
Retrieve by ID
Slide 65
Bucket 2 3 4 5 7
Composite Partition Key
(Time range and Account ID)
PRIMARY KEY ((accountid, timebucket), created, id)
Slide 66
Bucket 2 3 4 5 7
Time range:
We need to know the timestamp to read
PRIMARY KEY ((accountid, timebucket), created, id)
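Because the bucket is part of the partition key, a reader has to name every bucket it wants to query. A sketch of listing the buckets that cover a time range, assuming daily UTC buckets in `YYYY-MM-DD` form (an assumption, not something the slides state); `buckets_between` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def buckets_between(start: datetime, end: datetime) -> list:
    """List the daily buckets (assumed YYYY-MM-DD, UTC) touched by [start, end].

    Both arguments are expected to be timezone-aware datetimes.
    """
    if end < start:
        raise ValueError("end must not be before start")
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets
```

Each returned bucket would then be paired with the `accountid` to read one partition at a time.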
Slide 67
Bucket 2 3 4 5 7
Time range:
We need to know the timestamp to read
PRIMARY KEY ((accountid, timebucket), created, id)
accountid = acc_00009Wd3Yeh2O329bFTVHF
Slide 68
accountid = acc_00009Wd3Yeh2O329bFTVHF
Slide 69
accountid = acc_00009Wd3Yeh2O329bFTVHF
Flake IDs = Time-based, lexically sortable IDs
Slide 70
accountid = acc_00009Wd3Yeh2O329bFTVHF
Flake IDs = Time-based, lexically sortable IDs
Base62-encoded 128-bit integer
e.g. 26341991268378369512474991263748
Slide 71
accountid = acc_00009Wd3Yeh2O329bFTVHF
Flake IDs = Time-based, lexically sortable IDs
Base62-encoded 128-bit integer
e.g. 26341991268378369512474991263748
64 bits - Time in ms since epoch
48 bits - Worker ID
16 bits - Sequence ID
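The layout above can be sketched as bit packing plus a fixed-width Base62 encoding. Several details here are assumptions the slides do not confirm: that the timestamp occupies the most significant bits, that the alphabet is `0-9A-Za-z` in ASCII order, and that IDs are zero-padded to 22 characters (the smallest width that holds 128 bits). All function names are hypothetical.

```python
# Assumed alphabet: ASCII order, so string order matches numeric order at fixed width.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
WIDTH = 22  # 62**22 > 2**128, so 22 characters always fit a 128-bit value


def pack_flake(time_ms: int, worker_id: int, sequence: int) -> int:
    """Pack 64-bit time, 48-bit worker ID and 16-bit sequence into one 128-bit int."""
    if not (0 <= time_ms < 1 << 64 and 0 <= worker_id < 1 << 48 and 0 <= sequence < 1 << 16):
        raise ValueError("field out of range")
    return (time_ms << 64) | (worker_id << 16) | sequence


def unpack_flake(n: int) -> tuple:
    """Split a 128-bit int back into (time_ms, worker_id, sequence)."""
    return (n >> 64, (n >> 16) & ((1 << 48) - 1), n & 0xFFFF)


def encode_base62(n: int) -> str:
    """Encode a 128-bit int as a zero-padded, fixed-width Base62 string."""
    if not 0 <= n < 1 << 128:
        raise ValueError("value does not fit in 128 bits")
    chars = []
    for _ in range(WIDTH):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(s: str) -> int:
    """Decode a Base62 string produced by encode_base62."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def flake_time_ms(flake_id: str) -> int:
    """Recover the creation time (ms since epoch) from an encoded ID."""
    return decode_base62(flake_id) >> 64
```

With the time in the high bits and a fixed width, an ID created later compares greater as a plain string, and `flake_time_ms` recovers the timestamp from the ID alone, which is what lets a reader work out the time bucket when it only has the ID.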
Time Window Compaction Strategy
sstables dropped once all data has expired
[Diagram: numbered writes grouped into Bucket 1, Bucket 2 and Bucket 3]
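The rule on the slide, that an sstable is dropped only once all of its data has expired, can be sketched as a check on every cell's expiry time. This is a deliberate simplification under stated assumptions: real Cassandra also takes other factors into account, such as overlapping sstables and `gc_grace_seconds`, and the `Cell` and `SSTable` types here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    written_at: int  # write time, seconds since epoch
    ttl: int         # time to live, seconds

    @property
    def expires_at(self) -> int:
        return self.written_at + self.ttl


@dataclass
class SSTable:
    cells: List[Cell]

    def fully_expired(self, now: int) -> bool:
        """True only when every cell in the file has passed its TTL."""
        return all(cell.expires_at <= now for cell in self.cells)
```

In this model a single late write with a long TTL keeps the whole file on disk, even when every other cell in it expired long ago.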
Slide 88
Data modelling takeaways
- Correct data modelling is incredibly important!
- Wide rows are OK up to a point
- Repairs on wide rows are problematic
- Make Timeseries buckets predictable
- Watch for Hot Keys!
- TTLs don’t always mean your data is deleted
Slide 89
Nov 2015
Mar 2019
What works here
Might not work here