Distributed Systems Advanced Development Group
• IEEE ICDCS 1989: Preliminary version at the 9th International Conference on Distributed Computing Systems
• IEEE TPDS 1990: Final version in IEEE Transactions on Parallel and Distributed Systems
development of distributed applications
• Transfer of control: Between, instead of within, address spaces
• “Alleviate the need to be aware…”: The abstraction hides network protocols, parameter marshaling, external data representations…
automatically
• Single server: no other choice
• All servers are semantically equivalent
• Clerks: Application-specific subroutines or packages; each application must provide one
RPC that hides the explicit binding from the application developer
• Conceptual presentation: Presented conceptually, without the specifics of previous implementations
• Emerald: Distributed programming language with runtime system
• Eden: Distributed operating system
• Load balancing: For example, a fast Fourier transform service where all servers are equivalent and selection is purely for load balancing
• Application data: Selection based on correctness and performance
• Partitioned data and correctness: Not all servers can answer all requests
• Replicated data and performance (or availability): Choice can be based on desired performance, or on the inherent availability trade-off with consistency
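The two selection policies can be sketched as follows. This is a minimal illustration, not anything from the paper: the function names and the CRC-based partitioning scheme are assumptions.

```python
import zlib

def select_for_load(servers, load):
    """All servers are semantically equivalent (e.g. the FFT example):
    any choice is correct, so pick the least-loaded replica."""
    return min(servers, key=lambda s: load[s])

def select_for_partition(servers, key):
    """Data is partitioned: only one server holds the data for `key`,
    so selection is a matter of correctness, not just performance."""
    return servers[zlib.crc32(key.encode()) % len(servers)]

servers = ["fft-1", "fft-2", "fft-3"]
least_loaded = select_for_load(servers, {"fft-1": 9, "fft-2": 2, "fft-3": 5})
owner = select_for_partition(servers, "voucher-1234")  # deterministic per key
```

The load-based choice can pick any replica; the partition-based choice must always route the same key to the same server.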
a cluster
• Reducing communication costs: Co-location of processes that work together on a task
• Increasing availability: Replication or mobility to increase fault tolerance
• Reconfigurability: Reconfiguration of machines, application, or network topology
• Special capabilities: Access to specialized hardware only available on certain machines
can be found at its new home
• Recovery of object: In the event of a failure, the object must be recoverable safely
• Required by some applications: Objects may have to be local for some operations to succeed
• “Quality” improvement in interaction: The “cost is justified” when moving an object improves the quality of interaction
• Process
  • Vouchers filled in by employees
  • Approved or rejected by managers
  • If approved, a cash disbursement is made and the forms are archived
• Remote, geo-distributed, asynchronous process: Actions can occur on the order of minutes or days, at any location in Digital’s 36,000-node global internal network
Objects control:
• Persistence: How state should be persisted for each object
• Recovery: How objects can be recovered if they fail
• Placement: Where objects should be located on the network
• Remote invocation: Objects can invoke methods on other objects
Application Objects: Virtual memory containing the active objects in the system for each application
• Inter-object Communication System: Location-independent invocation system with an underlying RPC mechanism
• Recovery: Storesites support object recovery in the event of node failure
• Recovery safety and liveness: The object’s current location is stored at the storesite, and recovery is prevented until the object is confirmed dead
and their local Hermes node
• Processes: Objects as processes distributed across the network
• Modula-2+: RPC and multithreading support from Digital SRC
unique identifier for its lifetime
• Age: Each object has an age, a monotonic counter that advances when the object attempts to move between nodes
• Location and storesite: Each object contains a current location and a current storesite
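As a rough sketch in Python (the types, field names, and id generator are illustrative, not the paper’s Modula-2+ declarations), the per-object metadata might look like:

```python
from dataclasses import dataclass, field
from itertools import count

_guid_source = count(1)  # stand-in for a real globally unique id generator

@dataclass
class HermesObject:
    guid: int = field(default_factory=lambda: next(_guid_source))
    age: int = 0              # monotonic counter, advanced on each attempted move
    location: str = "node-a"
    storesite: str = "storesite-a"

    def migrate(self, destination: str) -> None:
        # Bumping the age before moving lets any two reports of this
        # object's location be ordered: the higher age is the fresher one.
        self.age += 1
        self.location = destination

obj = HermesObject()
obj.migrate("node-b")  # age 0 -> 1, location "node-a" -> "node-b"
```

The key design point is that the age orders location reports: a cached location with a lower age can always be safely overwritten by one with a higher age.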
node identifier and its age, representing an object’s location at some point in time
• Passed implicitly: Passed implicitly along with the guid when the object is passed by reference
• Cached: Supervisors cache tads locally for currently and previously referenced objects
Invocation if local; else, we must follow the tad to identify the current location
• Follow path of tads until object located: Follow tads, returning the most recent tad back to the forwarding node, until the object is found
• Update cache: Each node updates its local cache of tads to optimize subsequent invocations
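The chain-following step can be sketched as below, with in-memory dictionaries standing in for the per-node caches and RPCs; the structure of the `tads` map is an assumption for illustration.

```python
def locate(guid, start, hosting, tads):
    """Follow forwarding hints from `start` until the hosting node is found.

    hosting: the node that actually holds the object (ground truth here).
    tads: per-node caches, node -> {guid: (next_node, age)}.
    On success, the freshest tad is written back into every cache on the
    path so subsequent invocations can go straight to the object.
    """
    path, node = [], start
    while node != hosting:
        path.append(node)
        node, age = tads[node][guid]       # follow this node's hint
    for visited in path:                    # propagate the most recent tad back
        tads[visited][guid] = (node, age)
    return node

tads = {"a": {7: ("b", 1)}, "b": {7: ("c", 2)}}
where = locate(7, "a", "c", tads)           # follows a -> b -> c
```

After the call, node "a" no longer points at the stale intermediate hop: its cache holds the age-2 tad for node "c", so the next invocation skips the chain entirely.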
If we can’t find the object via forwarding, we must ask the storesite for the current location
• Forwarding pointers are still necessary: Since objects can move immediately after we contact the storesite, forwarding pointers are still necessary
• Resort to name service: If an object migrates storesites, it may be necessary to contact the global name service to identify the current storesite
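The overall lookup cascade (forwarding chain, then storesite, then name service) might be sketched as follows, with each callable standing in for a network query; the function shapes are assumptions.

```python
def resolve(guid, via_forwarding, via_storesite, via_name_service):
    """Try each location mechanism in order; each callable returns the
    hosting node, or None if that mechanism could not find the object."""
    for lookup in (via_forwarding, via_storesite, via_name_service):
        node = lookup(guid)
        if node is not None:
            return node
    raise LookupError(f"object {guid} could not be located")

# Forwarding fails (e.g. a broken chain), but the storesite knows the location.
node = resolve(7, lambda g: None, lambda g: "node-c", lambda g: None)
```

Because an object can move again right after any of these answers, the returned node is only a hint: the invocation itself may still need to chase a short forwarding chain.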
to response: Processing of an invocation and responding to it are not atomic
• Idempotence is one solution: Given that recovery may trigger duplicate invocations, ensuring idempotence in methods is essential
• Otherwise, sequencing: …or, simply put, you could just use consensus.
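One common way to make a non-idempotent operation safe to retry is duplicate suppression: record each invocation id and replay the saved reply on a retry. This mechanism is a general sketch, not something the paper specifies.

```python
class IdempotentAccount:
    """Memoize replies per invocation id so a retried invocation (after a
    crash between side effect and response) observes the original result
    instead of re-running the side effect."""
    def __init__(self):
        self.balance = 0
        self.replies = {}   # invocation id -> saved reply

    def deposit(self, invocation_id, amount):
        if invocation_id in self.replies:      # duplicate after recovery
            return self.replies[invocation_id]
        self.balance += amount                 # side effect runs exactly once
        self.replies[invocation_id] = self.balance
        return self.balance

acct = IdempotentAccount()
first = acct.deposit("inv-1", 10)
retry = acct.deposit("inv-1", 10)  # retried: no double deposit
```

The reply table itself must survive failures (e.g. be checkpointed with the object's state) for this to hold across recovery.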
date: If the temporal address descriptor is out of date, return an error immediately
• Simplifies failure handling: Control is returned to the invoker immediately; no thread of control is consumed
• Invoker must retry with new tad: The invoker must update its local information with the new tad and repeat the invocation
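A sketch of this alternative, with a tad reduced to a bare node name for illustration: the stale node answers immediately with a fresher tad, and the invoker loops instead of any node forwarding on its behalf.

```python
def invoke(tad, moved, body, max_retries=8):
    """`moved` maps a node to a fresher tad when the object has migrated
    away from it; `body(node)` performs the call at the hosting node."""
    for _ in range(max_retries):
        if tad in moved:        # callee rejects immediately with a newer tad;
            tad = moved[tad]    # no downstream thread is tied up waiting
            continue
        return body(tad)
    raise RuntimeError("retries exhausted")

# The object moved a -> b -> c; two quick retries land the call at c.
result = invoke("a", {"a": "b", "b": "c"}, lambda node: f"ran at {node}")
```

The retry bound matters: without it, a rapidly migrating object could make the invoker chase tads forever.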
tad: Propagation of the invocation to the location referenced by the node’s temporal address descriptor
• Ties up thread of control: Until the object is located, the thread of control is tied up
• More prone to disruption: Failures in the middle of the chain can cause loss of availability
• Allows cache maintenance: All nodes along the path can be updated with up-to-date tads as forwarding occurs
Propagation of the invocation for a finite number of hops, until a hop count is exceeded
• Return error to invoker: An error is returned to the invoker, who must try again
• Short forwarding chains: Done under the belief that forwarding chains will typically be short
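The bounded compromise can be sketched as forwarding for at most a fixed hop count, then surfacing an error so the invoker can fall back to another mechanism (e.g. the storesite). The names and the in-memory chain are illustrative.

```python
class HopCountExceeded(Exception):
    pass

def forward(start, hosting, next_hop, body, max_hops=3):
    """Chase the forwarding chain for at most max_hops hops; beyond that,
    give up rather than tie up the thread on an arbitrarily long chain."""
    node = start
    for _ in range(max_hops + 1):
        if node == hosting:
            return body(node)
        node = next_hop[node]     # one more forwarding hop
    raise HopCountExceeded(f"object not found within {max_hops} hops")

chain = {"a": "b", "b": "c", "c": "d", "d": "e"}
short = forward("a", "c", chain, lambda n: n)  # found within the hop budget
```

If chains really are typically short, the common case completes in one or two hops and the error path is rare.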
to using storesites for locating objects
• Initial storesite: When created, objects have an initial storesite; this location, along with 2PC, is used to track the object’s location after migrations
• Invocation can race with migration: Forwarding pointers are still required for finding the current location of a process
through the use of forwarding pointers
• Encode initial and track after first migration: Encode the storesite in the globally unique identifier, and register with the name service after the first migration
• Query in parallel: Query both the storesite and the global name service in parallel to reduce latency when locating objects
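The parallel query could look like the following sketch, with thread futures standing in for concurrent RPCs to the storesite and the name service; the callables and their shapes are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def locate_parallel(guid, ask_storesite, ask_name_service):
    """Issue both queries at once and take the first non-None answer,
    hiding the latency of the slower (or unreachable) service."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(ask_storesite, guid),
                   pool.submit(ask_name_service, guid)]
        for done in as_completed(futures):
            node = done.result()
            if node is not None:
                return node
    raise LookupError(f"object {guid} could not be located")

node = locate_parallel(7, lambda g: "node-c", lambda g: None)
```

Since either service alone may hold a stale storesite (the object can migrate storesites), taking whichever answers first trades a little redundant traffic for lower worst-case lookup latency.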
‘hints’; fallback to stable storage and network broadcast for identifying the current location.
• Emerald, language-level approach: Process migration supported with forwarding pointers; fallback to stable storage with broadcast and pairwise inspection for identifying the current location.
are extremely similar to temporal address descriptors
• Locus, MOS, R*: A “home” machine tracks the current location of processes that were created there, an idea similar to storesites
specific to Modula-2+
• RPC ‘stubs’: Local and remote stubs wrap calls with the code for maintaining the tad lifecycle: caching, forwarding, etc.
• Fix and unfix: Objects are ‘fixed’ for the duration of the call, so as to prevent object migration during invocation
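Fix/unfix can be sketched as a pin count that migration checks; here a Python context manager stands in for the stub-generated fix/unfix calls, and the class and method names are illustrative.

```python
from contextlib import contextmanager

class MobileObject:
    def __init__(self, location="node-a"):
        self.location = location
        self.pins = 0

    @contextmanager
    def fixed(self):
        self.pins += 1          # fix: the object may not migrate
        try:
            yield self
        finally:
            self.pins -= 1      # unfix once the invocation returns

    def move(self, destination):
        if self.pins:           # refuse migration while any call is in flight
            raise RuntimeError("object is fixed during an invocation")
        self.location = destination

obj = MobileObject()
with obj.fixed():
    pass                        # invocation body; obj.move(...) would raise here
obj.move("node-b")              # migration is legal again after unfix
```

Using a count rather than a flag lets concurrent invocations each hold a pin, with migration allowed only when every call has completed.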
system that had a working implementation! (albeit in a laboratory)
• Evaluation is somewhat questionable: Wide variance in latencies without explanation; long forwarding chains are never evaluated; hard to understand where certain latencies come from
• "cost of communication is outweighed by the gain in parallelism.": It is unclear where the parallelism gains originate in the system, or why they would outweigh the latency penalties of communication
temporarily unavailable under network partitions
• Disruption to forwarding chains: Partitions can be very disruptive to forwarding chains, where intermediary nodes may be unavailable
• Fallback: We can fall back to the object’s storesite or the name service, but these are susceptible to network partitions as well
invoker and invokee can fail at any point; the invokee can fail after performing a side effect but before responding
• Recovery from stable storage: Recovery does not guarantee “exactly-once” semantics; some invocations may be retried upon recovery if they were performed before a checkpoint
• Idempotence: Idempotence is the best strategy for mitigating these issues
mobile: Through the use of forwarding chains and “home” sites, the explicit binding step is eliminated
• Selection and placement are still up to the user: Developers still need to decide where to place objects, and how to select the target objects of invocation
• RPC is still a problematic paradigm: Duplicate invocation, idempotence, and sequencing of operations remain challenges for the developer
Satoshi Matsushita, and John Ousterhout. 2015. “Implementing Linearizability at Large Scale and Low Latency.”
• Helland, Pat. 2012. “Idempotence Is Not a Medical Condition.”
• Kendall, Samuel C., Jim Waldo, Ann Wollrath, and Geoff Wyant. 1994. “A Note on Distributed Computing.”
• Black, Andrew P., Norman C. Hutchinson, Eric Jul, and Henry M. Levy. 2007. “The Development of the Emerald Programming Language.”