and last name Position, company Duarte Nunes 2 Duarte Nunes is a Software Engineer working on ScyllaDB. He has a background in concurrent programming, distributed systems and low-latency software. Prior to ScyllaDB, he worked on distributed network virtualization.
and last name Position, company Agenda ▪ Introduction ▪ Problem Space ▪ Solution o Propagation o Consistency o Concurrent Updates o Building a MV ▪ Conclusions 3
and last name Position, company Materialized Views 5 ▪ A table containing a copy of the results of some query performed on a base table ▪ Updated when the underlying base table is modified ▪ Used as an index in Scylla ▪ Usually has a different PK than the base o Currently limited to including at most one base regular column o Must include the base’s PK
and last name Position, company Distributed Materialized Views 6 ▪ The view table is stored in different replicas than the base table ▪ No single master server to serialize updates ▪ View must be eventually consistent with the base table
and last name Position, company CQL 7 ▪ CREATE TABLE buildings ( name text, city text, built int, meters int, PRIMARY KEY (name) ); ▪ CREATE MATERIALIZED VIEW building_by_city AS SELECT * FROM buildings WHERE city IS NOT NULL PRIMARY KEY(city, name);
and last name Position, company Constraints 9 ▪ Cluster availability mustn’t be affected o Don’t stop serving base table writes ▪ Base write latency overhead must be minimized ▪ An update to a base table must propagate to the view table o Rows in the base will eventually appear in the view o Rows absent from the base will eventually be absent from the view
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); Read-before-write 10 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c);
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 Read-before-write 11 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 ▪ UPDATE TABLE base_table SET v = 10 WHERE p = 0 AND c = 1 Read-before-write 12 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 ▪ UPDATE TABLE base_table SET v = 10 WHERE p = 0 AND c = 1 Read-before-write 13 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1 ▪ Insert (10, 0, 1)
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v int, PRIMARY KEY (p, c) ); ▪ p | c | v ---+---+--- 0 | 1 | 8 ▪ UPDATE TABLE base_table SET v = 10 WHERE p = 0 AND c = 1 Read-before-write 14 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v IS NOT NULL PRIMARY KEY(v, p, c); ▪ v | p | c ---+---+--- 8 | 0 | 1 ▪ Insert (10, 0, 1) ▪ Tombstone (8, 0, 1)
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v1 int, v2 int, PRIMARY KEY (p, c) ); Read-before-write 15 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v1 IS NOT NULL PRIMARY KEY(v1, p, c, v2);
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v1 int, v2 int, PRIMARY KEY (p, c) ); ▪ UPDATE TABLE base_table SET v2 = 9 WHERE p = 0 AND c = 1 Read-before-write 16 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v1 IS NOT NULL PRIMARY KEY(v1, p, c, v2);
and last name Position, company ▪ CREATE TABLE base_table ( p int, c int, v1 int, v2 int, PRIMARY KEY (p, c) ); ▪ UPDATE TABLE base_table SET v2 = 9 WHERE p = 0 AND c = 1 Read-before-write 17 ▪ CREATE MATERIALIZED VIEW view_table AS SELECT * FROM base_table WHERE v1 IS NOT NULL PRIMARY KEY(v1, p, c, v2); ▪ Update (8, 0, 1, 9)
and last name Position, company ▪ Concurrent updates don’t mix well with read-before-write o Two or more updates can read the same old value in the base o They would insert the same tombstone and miss each other’s updates o A view entry would have no corresponding base table entry Concurrent updates 18
and last name Position, company A solution must 19 ▪ Include a mechanism for update propagation ▪ Solve consistency issues between base and view tables ▪ Concurrent updates must be safe o An algorithm should work locally and also provide emergent safety, as synchronizing a whole cluster is practically unattainable.
and last name Position, company Transactional guarantees? ▪ More roundtrips o Prepare and commit stages ▪ Added latency ▪ View replica availability impacting the base replica’s 24
and last name Position, company ▪ Pair one base replica to one view replica per view partition o The view replica receiving the update may not be the one receiving a tombstone Asynchronous replica-based propagation 28
and last name Position, company ▪ Pair one base replica to one view replica per view partition o The view replica receiving the update may not be the one receiving a tombstone ▪ View updating load distributed across base replicas Asynchronous replica-based propagation 29
and last name Position, company ▪ Pair one base replica to one view replica per view partition o The view replica receiving the update may not be the one receiving a tombstone ▪ View updating load distributed across base replicas ▪ In the absence of failures, RF view replicas will be updated Asynchronous replica-based propagation 30
and last name Position, company ▪ Base and view tables can diverge ▪ A base table update may fail to be propagated o If the view that receives the updated row becomes unavailable, the base row won’t be indexed (or will be stale) Consistency 32
and last name Position, company ▪ Base and view tables can diverge ▪ A base table update may fail to be propagated o If the view that receives the updated row becomes unavailable, the base row won’t be indexed (or will be stale) ▪ A ghost record can remain in the view o If the base table becomes unavailable after updating the view table, the updates becomes a ghost row in the view o If the view that receives the tombstone becomes unavailable, that row becomes a ghost row in the view Consistency 33
and last name Position, company ▪ Remember view table updates in the base o Use commit or batch log o Solves first consistency issue o … or just delegate to repair Remembering view updates 34
and last name Position, company ▪ View read-repair o When querying a view, probabilistically check if the row exists in the base and whether it’s up to date Repair 35
and last name Position, company ▪ View read-repair o When querying a view, probabilistically check if the row exists in the base and whether it’s up to date ▪ View repair o Scan view table rows o Applies just between a base replica and a view replica, but is expected to work globally Repair 36
and last name Position, company ▪ View read-repair o When querying a view, probabilistically check if the row exists in the base and whether it’s up to date ▪ View repair o Scan view table rows o Applies just between a base replica and a view replica, but is expected to work globally ▪ Can’t repair views between themselves o Due to distributed concurrent updates Repair 37
and last name Position, company ▪ Optimistic o Gate on the memtable o Retry calculating view update if it is flushed or if a conflicting update happened Local concurrent updates 38
and last name Position, company ▪ Optimistic o Gate on the memtable o Retry calculating view update if it is flushed or if a conflicting update happened ▪ Locking o Row, range of rows or partition o Cheaper if sufficiently granular, due to Scylla’s TPC Local concurrent updates 39
and last name Position, company ▪ Applies to base rows created before a view ▪ Scan all of the base table partitions o Send updates to the paired view replicas o Keep progress in system tables View Building 40
and last name Position, company ▪ Powerful global indexes ▪ Under development ▪ Under-researched problem o Lots of subtleties (see State of Materialized Views) ▪ Performance impact o Locking and read-before-write incur overhead o Impossible to avoid with Scylla’s data model ▪ Future: support for aggregation views Closing thoughts 42