Deniz Altınbüken on Chain Replication (old and new)

Chain Replication Deniz Altinbuken Cornell University

Chain Replication Robbert van Renesse and Fred B. Schneider. 2004.
Chain replication for supporting high throughput and availability. In Proceedings of OSDI'04.

replication
failure models • fail-stop failure model • crash failure model • byzantine failure model
replication techniques • quorum replication • stake replication • broker replication • primary-backup replication • state machine replication • chain replication • etc.
consistency models • strong consistency • sequential consistency • eventual consistency • causal consistency • read-your-writes consistency • monotonic read consistency • etc.

primary-backup replication [diagram: a client, the primary Rprimary and the backups R1, R2, R3; the client sends an update to Rprimary and receives a reply]

primary-backup replication [diagram: the client sends a query to Rprimary and receives a reply]

chain replication [diagram: a client and the chain Rhead, R2, R3, Rtail; the client sends an update and receives a reply]

chain replication [diagram: clients send queries to the chain and receive replies]

chain replication [diagram: update, query and reply shown together on the chain Rhead, R2, R3, Rtail]

primary-backup replication [diagram: an update and its reply with R1, R2, R3, Rprimary; steps numbered 1 to 4]

chain replication [diagram: an update and its reply on the chain Rhead, R2, R3, Rtail; steps numbered 1 to 5]
Higher latency!
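
The step numbers in the two diagrams can be read as sequential message delays. The sketch below counts them under two assumptions that are not stated on the slides: the primary contacts all backups in parallel, and a chain update visits every node once before the tail replies. The function names are illustrative only.

```python
def primary_backup_update_delays(num_backups: int) -> int:
    """Sequential message delays for one update in primary-backup replication.

    Assumes the primary forwards to all backups in parallel and waits for
    every acknowledgment before replying to the client.
    """
    if num_backups == 0:
        return 2  # client -> primary, primary -> client
    # client -> primary, primary -> backups, backups -> primary, primary -> client
    return 4


def chain_update_delays(chain_length: int) -> int:
    """Sequential message delays for one update in chain replication.

    Assumes the update enters at the head, is passed node by node,
    and the tail sends the reply to the client.
    """
    if chain_length < 1:
        raise ValueError("a chain needs at least one node")
    # client -> head, (chain_length - 1) hops along the chain, tail -> client
    return 1 + (chain_length - 1) + 1


print(primary_backup_update_delays(3))  # Rprimary plus R1, R2, R3
print(chain_update_delays(4))           # Rhead, R2, R3, Rtail
```

With the four replicas drawn on the slides this gives 4 delays for primary-backup and 5 for the chain, matching the step numbers in the diagrams.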

primary-backup replication [diagram: the client sends a query to Rprimary and receives the reply]
Primary has to make sure that all updates prior to this query are done!

chain replication [diagram: the client sends a query to the chain Rhead, R2, R3, Rtail and receives the reply]
Tail can respond directly! Higher throughput!

related work
• Sage A. Weil, Andrew W. Leung, Scott A. Brandt, and Carlos Maltzahn. 2007. RADOS: a scalable, reliable storage service for petabyte-scale storage clusters. In Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing '07 (PDSW '07). ACM, New York, NY, USA, 35-44.
• Jeff Terrace and Michael J. Freedman. 2009. Object storage on CRAQ: high-throughput chain replication for read-mostly workloads. In Proceedings of the 2009 conference on USENIX Annual technical conference (USENIX'09). USENIX Association, Berkeley, CA, USA, 11-11.
• David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan. 2009. FAWN: a fast array of wimpy nodes. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles (SOSP '09). ACM, New York, NY, USA, 1-14.
• Scott Lystig Fritchie. 2010. Chain replication in theory and in practice. In Proceedings of the 9th ACM SIGPLAN workshop on Erlang (Erlang '10). ACM, New York, NY, USA, 33-44.
• Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. 2011. Don't settle for eventual: scalable causal consistency for wide-area storage with COPS. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP '11). ACM, New York, NY, USA, 401-416.
• Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, and John D. Davis. 2012. CORFU: a shared log design for flash clusters. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (NSDI'12). USENIX Association, Berkeley, CA, USA, 1-1.
• Guy Laden, Roie Melamed, and Ymir Vigfusson. 2012. Adaptive and dynamic funnel replication in clouds. SIGOPS Oper. Syst. Rev. 46, 1 (February 2012), 40-46.
• Sérgio Almeida, João Leitão, and Luís Rodrigues. 2013. ChainReaction: a causal+ consistent datastore based on chain replication. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13). ACM, New York, NY, USA, 85-98.
• Hussam Abu-Libdeh, Robbert van Renesse, and Ymir Vigfusson. 2013. Leveraging sharding in the design of scalable replication protocols. In Proceedings of the 4th annual Symposium on Cloud Computing (SOCC '13). ACM, New York, NY …

chain replication limitations
• tail is a bottleneck for queries.
  • CRAQ: read from “clean” nodes.
• supports only strong consistency.
  • CRAQ: eventual consistency
  • Chain Reaction: causal consistency
• requires a master to reconfigure.

motivation
• explain why suggested improvements work.
• find further improvements.
• make reconfiguration easier and cleaner.
• create complete specifications.
• prove chain replication works.

outline
• updates
• queries
• failures
• reconfiguration
• various consistency models

updates

[diagram: the chain Rhead, R2, R3, Rtail; each node keeps a speculative history and a stable history]

R2 is the predecessor of R3. R3 is the successor of R2.

[diagram sequence: an update arrives at the chain; a propagation message passes between nodes; a reply is sent to the client; an acknowledgment message passes between nodes]

Multiple updates are handled simultaneously.
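
The message sequence on these slides (update, propagation message, reply, acknowledgment message) can be sketched as a toy in-memory model. This is an illustration, not the specification from the talk: the `Node` and `Chain` classes are hypothetical, message passing is collapsed into simple loops, and it assumes that the tail stabilizes an update on receipt and that acknowledgments travel back toward the head.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.speculative = []  # updates received, not yet known to be stable
        self.stable = []       # updates known to have reached the tail


class Chain:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]  # ordered head to tail

    def propagate(self, update):
        """Head-to-tail pass: each node appends the update speculatively."""
        for node in self.nodes:
            node.speculative.append(update)
        # Assumption: the tail stabilizes the update and sends the reply.
        self.nodes[-1].stable.append(update)
        return f"reply({update})"

    def acknowledge(self, update):
        """Tail-to-head pass: each predecessor marks the update stable."""
        for node in reversed(self.nodes[:-1]):
            node.stable.append(update)


chain = Chain(["Rhead", "R2", "R3", "Rtail"])
chain.propagate("u1")
chain.propagate("u2")      # a second update is in flight before u1 is acknowledged
chain.acknowledge("u1")
for node in chain.nodes:
    print(node.name, node.speculative, node.stable)
```

After the two propagations and one acknowledgment, every node holds both updates speculatively, the tail holds both as stable, and the other nodes hold only `u1` as stable.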

[diagram: the speculative and stable histories of Rhead, R2, R3, Rtail related by ⊆ and ⊇, then the same relations shown for the pair R2, R3]

The speculative history of a node’s successor is a subset of that node’s speculative history.
The speculative history of a node is a superset of its stable history.
The stable history of a node’s successor is a superset of that node’s stable history.
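
The three relations above can be written as a small checker. This sketch assumes each history is an ordered list of updates and reads "subset" and "superset" as "prefix" and "extension", which is an interpretation rather than something the slides state. The function names are illustrative.

```python
def is_prefix(shorter, longer):
    """True if `shorter` is an initial segment of `longer`."""
    return longer[:len(shorter)] == shorter


def check_invariants(chain):
    """chain: list of (speculative, stable) pairs ordered head to tail."""
    for speculative, stable in chain:
        # A node's speculative history is a superset of its stable history.
        if not is_prefix(stable, speculative):
            return False
    for (spec, stable), (succ_spec, succ_stable) in zip(chain, chain[1:]):
        # The successor's speculative history is a subset of the node's.
        if not is_prefix(succ_spec, spec):
            return False
        # The successor's stable history is a superset of the node's.
        if not is_prefix(stable, succ_stable):
            return False
    return True


good = [
    (["a", "b", "c"], ["a"]),       # head: has seen c, knows only a is stable
    (["a", "b"], ["a"]),            # middle node
    (["a", "b"], ["a", "b"]),       # tail: everything it has seen is stable
]
bad = [
    (["a"], []),
    (["a", "b"], []),               # successor has seen more than its predecessor
]
print(check_invariants(good), check_invariants(bad))
```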

queries

[diagram sequence: a query arrives at the chain Rhead, R2, R3, Rtail and a reply is returned, first on its own and then while updates and their replies are in flight]

The tail is the point of linearization!

failures

head failure [diagram sequence: the chain Rhead, R2, R3, Rtail with stable histories; an update arrives]

middle node failure [diagram: the chain Rhead, R2, R3, Rtail with stable histories]

tail failure [diagram: the chain Rhead, R2, R3, Rtail with stable histories]

reconfiguration

adding a new node
[diagram: Rnew, with its own speculative and stable histories, next to the chain Rhead, R2, R3, Rtail and its current tail]
• new nodes are added to the chain with special configuration updates that are added to the history: add(nodeid)
• by looking at the order of these updates, a node can determine the configuration of the chain
[diagram: an add( ) update marks Rnew as the new tail]
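
Reading the chain configuration off the history can be sketched as follows. The encoding is an assumption made for illustration: configuration updates are stored as `("add", node_id)` tuples, every added node becomes the new tail, and all other entries are ordinary updates. `chain_configuration` is a hypothetical helper, not a function from the talk.

```python
def chain_configuration(history):
    """Return node ids in chain order (head first) implied by a history."""
    chain = []
    for entry in history:
        # Only configuration updates change the chain; skip everything else.
        if isinstance(entry, tuple) and len(entry) == 2 and entry[0] == "add":
            chain.append(entry[1])  # assumption: new nodes join as the tail
    return chain


history = [
    ("add", "Rhead"),
    ("add", "R2"),
    ("put", "x", 1),        # ordinary update, ignored for configuration
    ("add", "R3"),
    ("add", "Rtail"),
    ("put", "y", 2),
    ("add", "Rnew"),
]

print(chain_configuration(history[:5]))  # configuration before Rnew is added
print(chain_configuration(history))      # configuration after add(Rnew)
```

Because the configuration is derived from the ordered history, a node holding a shorter prefix of the history sees an older configuration of the chain.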

adding a new node
• stable history of new tail should be a superset of the stable history of tail.
• speculative history of new tail should be a superset of its stable history.
• speculative and stable histories of new tail should be equal to the speculative history of tail.
• old tail should not answer queries when the new tail should.
[diagram sequence: the histories of Rtail and Rnew annotated with ⊆, ⊇ and =; an add( ) update marks Rnew as the new tail of the chain Rhead, R2, R3, Rtail; a reply is sent]

various consistency models

strong consistency
after an update completes, any subsequent query by any client will return the updated value.
• tail can reply to queries.
• nodes that have their speculative and stable histories equal to each other can reply to queries. (clean vs dirty nodes at CRAQ)
• a node can record the speculative history when it received a query and reply to the client when its stable history becomes equal to it.
[diagram: a query and its reply at a node of the chain Rhead, R2, R3, Rtail, each node with a speculative and a stable history]
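
The two non-tail options for strongly consistent reads can be sketched like this. Histories are assumed to be ordered lists, and `can_reply_now` and `PendingQuery` are hypothetical names. The sketch replies with the snapshot recorded at query arrival once the stable history has caught up with it, which is one possible reading of the slide.

```python
def can_reply_now(speculative, stable):
    """A "clean" node (speculative == stable) may answer immediately."""
    return speculative == stable


class PendingQuery:
    """A query held at a "dirty" node until the node catches up."""

    def __init__(self, speculative_at_arrival):
        # Record the speculative history as it was when the query arrived.
        self.wait_for = list(speculative_at_arrival)

    def try_reply(self, stable):
        """Return the history to reply with, or None if still waiting."""
        if stable[:len(self.wait_for)] == self.wait_for:
            return list(self.wait_for)
        return None


speculative, stable = ["u1", "u2"], ["u1"]
print(can_reply_now(speculative, stable))   # dirty node: cannot reply yet

query = PendingQuery(speculative)
print(query.try_reply(stable))              # u2 not yet stable
stable.append("u2")                         # acknowledgment for u2 arrives
print(query.try_reply(stable))
```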

sequential consistency
queries might return stale values, as long as they are not reordered.
• any node can reply to query messages with their stable history.
• the stable history of any node is a prefix of history at tail.
[diagram: queries and replies at two different nodes of the chain Rhead, R2, R3, Rtail]

eventual consistency
if no new updates are made, eventually all queries will return a history including that last update.
• any node can reply to query messages with their speculative history
• the speculative history includes the history at tail and a sequence of updates that have been invoked but not yet stabilized (used in CRAQ)
[diagram: queries and replies at two different nodes of the chain Rhead, R2, R3, Rtail]

causal consistency
if client A has communicated to client B that it has completed an update, a subsequent query by client B will return that completed update.
• requires modeling communication between clients
• if a client receives a query reply from a node, same client can only read from this node’s predecessors until all updates in the reply are stabilized (used in Chain Reaction)
[diagram sequence: a client issues query1 and then query2 to nodes of the chain Rhead, R2, R3, Rtail and receives replies]

read-your-writes consistency
if a client’s update completes, that client will never see an older version of the history. this is a special case of the causal consistency model.
• requires modeling client-side.
• on the client-side, the proxy should ensure that a history returned by a query includes all updates that have been completed.
• the proxy keeps track of updates that are invoked and completed.
[diagram: a query and its reply at a node of the chain Rhead, R2, R3, Rtail]
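
A client-side proxy for read-your-writes might look like the sketch below. The class name, its methods, and the representation of a history as a list of update identifiers are all assumptions made for illustration. What the proxy does when it rejects a reply (for example, retrying at another node) is left out.

```python
class ReadYourWritesProxy:
    """Tracks this client's updates and vets histories returned by queries."""

    def __init__(self):
        self.invoked = []    # updates this client has sent
        self.completed = []  # updates for which a reply has been received

    def on_update_invoked(self, update):
        self.invoked.append(update)

    def on_update_completed(self, update):
        self.completed.append(update)

    def accept(self, history):
        """True if the history includes every update this client completed."""
        return all(update in history for update in self.completed)


proxy = ReadYourWritesProxy()
proxy.on_update_invoked("u1")
proxy.on_update_completed("u1")
proxy.on_update_invoked("u2")            # invoked but not yet completed

print(proxy.accept(["u0"]))              # stale: misses the completed u1
print(proxy.accept(["u0", "u1"]))        # includes u1; u2 is not required yet
```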

monotonic read consistency
if a client has issued a query and received h as a response, all following queries will receive a response with a history that has h as a prefix.
if a client has seen a particular update, any subsequent queries will never return any previous state.
• any given client only queries a single node (used in CRAQ)
• requires modeling client-side.
• on the client-side, the proxy should ensure that a history returned by a query always has the histories returned by previous queries as a prefix
• the proxy keeps track of queries that are invoked and completed
[diagram: a query and its reply at a node of the chain Rhead, R2, R3, Rtail]
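
The client-side check for monotonic reads follows directly from the definition that every later reply must have the earlier reply h as a prefix. The sketch below assumes histories are ordered lists. `MonotonicReadProxy` is a hypothetical name, and the handling of a rejected reply is left out.

```python
class MonotonicReadProxy:
    """Accepts a query reply only if it extends the last accepted history."""

    def __init__(self):
        self.last_seen = []  # the most recent history returned to the client

    def accept(self, history):
        history = list(history)
        # The previously seen history must be a prefix of the new one.
        if history[:len(self.last_seen)] != self.last_seen:
            return False
        self.last_seen = history
        return True


proxy = MonotonicReadProxy()
print(proxy.accept(["u1"]))              # first reply
print(proxy.accept(["u1", "u2"]))        # extends the previous reply
print(proxy.accept(["u1"]))              # older state: rejected
print(proxy.accept(["u1", "u2", "u3"]))  # still extends the last accepted reply
```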

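chain replication [diagram: a client sends an update and a query to the chain Rhead, R2, R3, Rtail and receives a reply]

chain replication [diagram: the same update, query and reply, with the individual nodes no longer shown]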

objective
• A linearizable data store replicated with chain replication should look like a centralized data store to all clients.
• A centralized data store has a single history.
• Make sure the data store replicated with chain replication looks like it has a single history.
• Prove it :)
• Write the specification for the centralized data store.
• Write the specification for the replicated data store.
• Show the replicated specification refines the centralized specification.

conclusion
• we have created a formal end-to-end specification of chain replication
• through this specification we can reason about how chain replication works
• chain replication is easy to understand and implement
• it can support different consistency models
• reconfiguration can be done without requiring a master

TODO
• open-source chain replication implementations
• java and python in progress
• chain replication wikipedia page :)
website: http://www.cs.cornell.edu/~deniz
e-mail: [email protected]
denizalti
