Slide 1

Slide 1 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Distributed Materialized Views @duarte_nunes Duarte Nunes

Slide 2

Slide 2 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Duarte Nunes 2 Duarte Nunes is a Software Engineer working on ScyllaDB. He has a background in concurrent programming, distributed systems and low-latency software. Prior to ScyllaDB, he worked on distributed network virtualization.

Slide 3

Slide 3 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Agenda ▪ Introduction ▪ Problem Space ▪ Solution o Propagation o Consistency o Concurrent Updates o Building a MV ▪ Conclusions 3

Slide 4

Slide 4 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Introduction

Slide 5

Slide 5 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Materialized Views 5 ▪ A table containing a copy of the results of some query performed on a base table ▪ Updated when the underlying base table is modified ▪ Used as an index in Scylla ▪ Usually has a different PK than the base o Currently limited to including at most one base regular column o Must include the base’s PK

Slide 6

Slide 6 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Distributed Materialized Views 6 ▪ The view table is stored in different replicas than the base table ▪ No single master server to serialize updates ▪ View must be eventually consistent with the base table

Slide 7

Slide 7 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company CQL 7 ▪ CREATE TABLE buildings ( name text, city text, built int, meters int, PRIMARY KEY (name) ); ▪ CREATE MATERIALIZED VIEW building_by_city AS SELECT * FROM buildings WHERE city IS NOT NULL PRIMARY KEY(city, name);

Slide 8

Slide 8 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Problem Space

Slide 9

Slide 9 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Constraints 9 ▪ Cluster availability mustn’t be affected o Don’t stop serving base table writes ▪ Base write latency overhead must be minimized ▪ An update to a base table must propagate to the view table o Rows in the base will eventually appear in the view o Rows absent from the base will eventually be absent from the view

Slide 10

Slide 10 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); Read-before-write 10 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c);

Slide 11

Slide 11 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 Read-before-write 11 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1

Slide 12

Slide 12 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 ▪ UPDATE TABLE base_table SET v = 10 WHERE p = 0 AND c = 1 Read-before-write 12 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1

Slide 13

Slide 13 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 ▪ UPDATE TABLE base_table SET v = 10 WHERE p = 0 AND c = 1 Read-before-write 13 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1 ▪ Insert (10, 0, 1)

Slide 14

Slide 14 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 ▪ UPDATE TABLE base_table SET v = 10 WHERE p = 0 AND c = 1 Read-before-write 14 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1 ▪ Insert (10, 0, 1) ▪ Tombstone (8, 0, 1)

Slide 15

Slide 15 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v1 int, v2 int, PRIMARY KEY (p, c) ); Read-before-write 15 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v1 IS NOT NULL PRIMARY KEY(v1, p, c, v2);

Slide 16

Slide 16 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v1 int, v2 int, PRIMARY KEY (p, c) ); ▪ UPDATE TABLE base_table SET v2 = 9 WHERE p = 0 AND c = 1 Read-before-write 16 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v1 IS NOT NULL PRIMARY KEY(v1, p, c, v2);

Slide 17

Slide 17 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v1 int, v2 int, PRIMARY KEY (p, c) ); ▪ UPDATE TABLE base_table SET v2 = 9 WHERE p = 0 AND c = 1 Read-before-write 17 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v1 IS NOT NULL PRIMARY KEY(v1, p, c, v2); ▪ Update (8, 0, 1, 9)

Slide 18

Slide 18 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Concurrent updates don’t mix well with read-before-write o Two or more updates can read the same old value in the base o They would insert the same tombstone and miss each other’s updates o A view entry would have no corresponding base table entry Concurrent updates 18

Slide 19

Slide 19 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company A solution must 19 ▪ Include a mechanism for update propagation ▪ Solve consistency issues between base and view tables ▪ Concurrent updates must be safe o An algorithm should work locally and also provide emergent safety, as synchronizing a whole cluster is practically unattainable.

Slide 20

Slide 20 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Solution

Slide 21

Slide 21 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Coordinator-based propagation 21 C B V1 w(p: 1, v: 10) V2

Slide 22

Slide 22 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Coordinator-based propagation 22 C B V1 w(p: 1, v: 10) d(v: 5) w(p: 10, p: 1) V2

Slide 23

Slide 23 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Coordinator-based propagation 23 C B V1 w(p: 1, v: 10) d(v: 5) w(p: 10, p: 1) V2 d(v: 5) w(v:10, p: 1)

Slide 24

Slide 24 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Transactional guarantees? ▪ More roundtrips o Prepare and commit stages ▪ Added latency ▪ View replica availability impacting the base replica’s 24

Slide 25

Slide 25 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Replica-based propagation 25 C B V1 w(p: 1, v: 10) V2 d(v: 5) w(v:10, p: 1)

Slide 26

Slide 26 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Synchronous replica-based propagation 26 C B V1 w(p: 1, v: 10) V2 d(v: 5) w(v:10, p: 1) r: Ok r: Ok

Slide 27

Slide 27 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Asynchronous replica-based propagation 27 C B V1 w(p: 1, v: 10) V2 d(v: 5) w(v:10, p: 1) r: Ok r: Ok

Slide 28

Slide 28 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Pair one base replica to one view replica per view partition o The view replica receiving the update may not be the one receiving a tombstone Asynchronous replica-based propagation 28

Slide 29

Slide 29 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Pair one base replica to one view replica per view partition o The view replica receiving the update may not be the one receiving a tombstone ▪ View updating load distributed across base replicas Asynchronous replica-based propagation 29

Slide 30

Slide 30 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Pair one base replica to one view replica per view partition o The view replica receiving the update may not be the one receiving a tombstone ▪ View updating load distributed across base replicas ▪ In the absence of failures, RF view replicas will be updated Asynchronous replica-based propagation 30

Slide 31

Slide 31 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Base and view tables can diverge Consistency 31

Slide 32

Slide 32 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Base and view tables can diverge ▪ A base table update may fail to be propagated o If the view that receives the updated row becomes unavailable, the base row won’t be indexed (or will be stale) Consistency 32

Slide 33

Slide 33 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Base and view tables can diverge ▪ A base table update may fail to be propagated o If the view that receives the updated row becomes unavailable, the base row won’t be indexed (or will be stale) ▪ A ghost record can remain in the view o If the base table becomes unavailable after updating the view table, the updates becomes a ghost row in the view o If the view that receives the tombstone becomes unavailable, that row becomes a ghost row in the view Consistency 33

Slide 34

Slide 34 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Remember view table updates in the base o Use commit or batch log o Solves first consistency issue o … or just delegate to repair Remembering view updates 34

Slide 35

Slide 35 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ View read-repair o When querying a view, probabilistically check if the row exists in the base and whether it’s up to date Repair 35

Slide 36

Slide 36 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ View read-repair o When querying a view, probabilistically check if the row exists in the base and whether it’s up to date ▪ View repair o Scan view table rows o Applies just between a base replica and a view replica, but is expected to work globally Repair 36

Slide 37

Slide 37 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ View read-repair o When querying a view, probabilistically check if the row exists in the base and whether it’s up to date ▪ View repair o Scan view table rows o Applies just between a base replica and a view replica, but is expected to work globally ▪ Can’t repair views between themselves o Due to distributed concurrent updates Repair 37

Slide 38

Slide 38 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Optimistic o Gate on the memtable o Retry calculating view update if it is flushed or if a conflicting update happened Local concurrent updates 38

Slide 39

Slide 39 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Optimistic o Gate on the memtable o Retry calculating view update if it is flushed or if a conflicting update happened ▪ Locking o Row, range of rows or partition o Cheaper if sufficiently granular, due to Scylla’s TPC Local concurrent updates 39

Slide 40

Slide 40 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Applies to base rows created before a view ▪ Scan all of the base table partitions o Send updates to the paired view replicas o Keep progress in system tables View Building 40

Slide 41

Slide 41 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Conclusions

Slide 42

Slide 42 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company ▪ Powerful global indexes ▪ Under development ▪ Under-researched problem o Lots of subtleties (see State of Materialized Views) ▪ Performance impact o Locking and read-before-write incur overhead o Impossible to avoid with Scylla’s data model ▪ Future: support for aggregation views Closing thoughts 42

Slide 43

Slide 43 text

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company THANK YOU [email protected] @duarte_nunes Please stay in touch Any questions?