Distributed Systems Advanced Development Group
• IEEE ICDCS 1989: Preliminary version at the 9th International Conference on Distributed Computing Systems
• IEEE TPDS 1990: Final version in IEEE Transactions on Parallel and Distributed Systems
development of distributed applications
• Transfer of control: Between, instead of within, address spaces
• “Alleviate the need to be aware…”: The abstraction hides network protocols, parameter marshaling, external data representations…
automatically
• Single server: no other choice
• All servers are semantically equivalent
• Clerks: Application-specific subroutines or packages; each application must provide one
RPC that hides the explicit binding from the application developer
• Conceptual presentation: Presented conceptually, without the specifics of previous implementations
• Emerald: Distributed programming language with runtime system
• Eden: Distributed operating system
• Load balancing: For example, a fast Fourier transform service where all servers are equivalent and selection is purely for load balancing
• Application data: Selection based on correctness and performance
• Partitioned data and correctness: Not all servers can answer all requests
• Replicated data and performance (or availability): Choice can be based on desired performance, or on the inherent availability trade-off with consistency
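The two selection policies can be sketched as follows. This is a minimal illustration, not anything from the paper: the function names and the CRC-based partitioning scheme are assumptions.

```python
import zlib

def select_for_load(servers, load):
    """All servers are semantically equivalent (e.g. the FFT example):
    any choice is correct, so pick the least-loaded replica."""
    return min(servers, key=lambda s: load[s])

def select_for_partition(servers, key):
    """Data is partitioned: only one server holds the data for `key`,
    so selection is a matter of correctness, not just performance."""
    return servers[zlib.crc32(key.encode()) % len(servers)]

servers = ["fft-1", "fft-2", "fft-3"]
least_loaded = select_for_load(servers, {"fft-1": 9, "fft-2": 2, "fft-3": 5})
owner = select_for_partition(servers, "voucher-1234")  # deterministic per key
```

The load-based choice can pick any replica; the partition-based choice must always route the same key to the same server.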
a cluster
• Reducing communication costs: Co-location of processes that work together on a task
• Increasing availability: Replication or mobility to increase fault tolerance
• Reconfigurability: Reconfiguration of machines, application, or network topology
• Special capabilities: Access to specialized hardware only available on certain machines
can be found at its new home
• Recovery of object: In the event of a failure, the object must be recoverable safely
• Required by some applications: Objects may have to be local for some operations to succeed
• “Quality” improvement in interaction: The “cost is justified” when moving an object improves the quality of interaction
• Process
  • Vouchers filled in by employees
  • Approved or rejected by managers
  • If approved, a cash disbursement is made and the forms are archived
• Remote, geo-distributed, asynchronous process: Actions can occur on the order of minutes or days, at any location in Digital’s 36,000-node global internal network
Objects control:
• Persistence: How state should be persisted for each object
• Recovery: How objects can be recovered if they fail
• Placement: Where objects should be located on the network
• Remote invocation: Objects can invoke methods on other objects
Application Objects: Virtual memory containing the active objects in the system for each application
• Inter-object Communication System: Location-independent invocation system with an underlying RPC mechanism
• Recovery: Storesites support object recovery in the event of node failure
• Recovery safety and liveness: The object’s current location is stored at the storesite, and recovery is prevented until the object is confirmed dead
and their local Hermes node
• Processes: Objects as processes distributed across the network
• Modula-2+: RPC and multithreading support from Digital SRC
unique identifier for its lifetime
• Age: Each object has an age, a monotonic counter that advances when the object attempts to move between nodes
• Location and storesite: Each object contains a current location and a current storesite
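As a rough sketch in Python (the types, field names, and id generator are illustrative, not the paper’s Modula-2+ declarations), the per-object metadata might look like:

```python
from dataclasses import dataclass, field
from itertools import count

_guid_source = count(1)  # stand-in for a real globally unique id generator

@dataclass
class HermesObject:
    guid: int = field(default_factory=lambda: next(_guid_source))
    age: int = 0              # monotonic counter, advanced on each attempted move
    location: str = "node-a"
    storesite: str = "storesite-a"

    def migrate(self, destination: str) -> None:
        # Bumping the age before moving lets any two reports of this
        # object's location be ordered: the higher age is the fresher one.
        self.age += 1
        self.location = destination

obj = HermesObject()
obj.migrate("node-b")  # age 0 -> 1, location "node-a" -> "node-b"
```

The key design point is that the age orders location reports: a cached location with a lower age can always be safely overwritten by one with a higher age.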
node identifier and its age, representing an object’s location at some point in time
• Passed implicitly: Passed implicitly along with the guid when the object is passed by reference
• Cached: Supervisors cache tads locally for currently and previously referenced objects
Invocation if local; else, we must follow the tad to identify the current location
• Follow path of tads until object located: Follow tads, returning the most recent tad back to the forwarding node, until the object is found
• Update cache: Each node updates its local cache of tads to optimize subsequent invocations
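The chain-following step can be sketched as below, with in-memory dictionaries standing in for the per-node caches and RPCs; the structure of the `tads` map is an assumption for illustration.

```python
def locate(guid, start, hosting, tads):
    """Follow forwarding hints from `start` until the hosting node is found.

    hosting: the node that actually holds the object (ground truth here).
    tads: per-node caches, node -> {guid: (next_node, age)}.
    On success, the freshest tad is written back into every cache on the
    path so subsequent invocations can go straight to the object.
    """
    path, node = [], start
    while node != hosting:
        path.append(node)
        node, age = tads[node][guid]       # follow this node's hint
    for visited in path:                    # propagate the most recent tad back
        tads[visited][guid] = (node, age)
    return node

tads = {"a": {7: ("b", 1)}, "b": {7: ("c", 2)}}
where = locate(7, "a", "c", tads)           # follows a -> b -> c
```

After the call, node "a" no longer points at the stale intermediate hop: its cache holds the age-2 tad for node "c", so the next invocation skips the chain entirely.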
If we can’t find the object via forwarding, we must ask the storesite for the current location
• Forwarding pointers are still necessary: Since objects can move immediately after we contact the storesite, forwarding pointers are still necessary
• Resort to name service: If an object migrates storesites, it may be necessary to contact the global name service to identify the current storesite
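The overall lookup cascade (forwarding chain, then storesite, then name service) might be sketched as follows, with each callable standing in for a network query; the function shapes are assumptions.

```python
def resolve(guid, via_forwarding, via_storesite, via_name_service):
    """Try each location mechanism in order; each callable returns the
    hosting node, or None if that mechanism could not find the object."""
    for lookup in (via_forwarding, via_storesite, via_name_service):
        node = lookup(guid)
        if node is not None:
            return node
    raise LookupError(f"object {guid} could not be located")

# Forwarding fails (e.g. a broken chain), but the storesite knows the location.
node = resolve(7, lambda g: None, lambda g: "node-c", lambda g: None)
```

Because an object can move again right after any of these answers, the returned node is only a hint: the invocation itself may still need to chase a short forwarding chain.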
to response: Processing of an invocation and responding to it are not atomic
• Idempotence is one solution: Given that recovery may trigger duplicate invocations, ensuring idempotence in methods is essential
• Otherwise, sequencing: …or, simply put, you could just use consensus.
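One common way to make a non-idempotent operation safe to retry is duplicate suppression: record each invocation id and replay the saved reply on a retry. This mechanism is a general sketch, not something the paper specifies.

```python
class IdempotentAccount:
    """Memoize replies per invocation id so a retried invocation (after a
    crash between side effect and response) observes the original result
    instead of re-running the side effect."""
    def __init__(self):
        self.balance = 0
        self.replies = {}   # invocation id -> saved reply

    def deposit(self, invocation_id, amount):
        if invocation_id in self.replies:      # duplicate after recovery
            return self.replies[invocation_id]
        self.balance += amount                 # side effect runs exactly once
        self.replies[invocation_id] = self.balance
        return self.balance

acct = IdempotentAccount()
first = acct.deposit("inv-1", 10)
retry = acct.deposit("inv-1", 10)  # retried: no double deposit
```

The reply table itself must survive failures (e.g. be checkpointed with the object's state) for this to hold across recovery.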
date: If the temporal address descriptor is out of date, return an error immediately
• Simplifies failure handling: Control is returned to the invoker immediately; no thread of control is consumed
• Invoker must retry with new tad: The invoker must update its local information with the new tad and repeat the invocation
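A sketch of this alternative, with a tad reduced to a bare node name for illustration: the stale node answers immediately with a fresher tad, and the invoker loops instead of any node forwarding on its behalf.

```python
def invoke(tad, moved, body, max_retries=8):
    """`moved` maps a node to a fresher tad when the object has migrated
    away from it; `body(node)` performs the call at the hosting node."""
    for _ in range(max_retries):
        if tad in moved:        # callee rejects immediately with a newer tad;
            tad = moved[tad]    # no downstream thread is tied up waiting
            continue
        return body(tad)
    raise RuntimeError("retries exhausted")

# The object moved a -> b -> c; two quick retries land the call at c.
result = invoke("a", {"a": "b", "b": "c"}, lambda node: f"ran at {node}")
```

The retry bound matters: without it, a rapidly migrating object could make the invoker chase tads forever.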
tad: Propagation of the invocation to the location referenced by the node’s temporal address descriptor
• Ties up thread of control: Until the object is located, the thread of control is tied up
• More prone to disruption: Failures in the middle of the chain can cause loss of availability
• Allows cache maintenance: All nodes along the path can be updated with up-to-date tads as forwarding occurs
Propagation of the invocation for a finite number of hops, until a hop count is exceeded
• Return error to invoker: An error is returned to the invoker, who must try again
• Short forwarding chains: Done under the belief that forwarding chains will typically be short
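The bounded compromise can be sketched as forwarding for at most a fixed hop count, then surfacing an error so the invoker can fall back to another mechanism (e.g. the storesite). The names and the in-memory chain are illustrative.

```python
class HopCountExceeded(Exception):
    pass

def forward(start, hosting, next_hop, body, max_hops=3):
    """Chase the forwarding chain for at most max_hops hops; beyond that,
    give up rather than tie up the thread on an arbitrarily long chain."""
    node = start
    for _ in range(max_hops + 1):
        if node == hosting:
            return body(node)
        node = next_hop[node]     # one more forwarding hop
    raise HopCountExceeded(f"object not found within {max_hops} hops")

chain = {"a": "b", "b": "c", "c": "d", "d": "e"}
short = forward("a", "c", chain, lambda n: n)  # found within the hop budget
```

If chains really are typically short, the common case completes in one or two hops and the error path is rare.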
to using storesites for locating objects
• Initial storesite: When created, objects have an initial storesite; this location, along with 2PC, is used to track the object’s location after migrations
• Invocation can race with migration: Forwarding pointers are still required for finding the current location of a process
through the use of forwarding pointers
• Encode initial and track after first migration: Encode the storesite in the globally unique identifier, and register with the name service after the first migration
• Query in parallel: Query both the storesite and the global name service in parallel to reduce latency when locating objects
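The parallel query could look like the following sketch, with thread futures standing in for concurrent RPCs to the storesite and the name service; the callables and their shapes are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def locate_parallel(guid, ask_storesite, ask_name_service):
    """Issue both queries at once and take the first non-None answer,
    hiding the latency of the slower (or unreachable) service."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(ask_storesite, guid),
                   pool.submit(ask_name_service, guid)]
        for done in as_completed(futures):
            node = done.result()
            if node is not None:
                return node
    raise LookupError(f"object {guid} could not be located")

node = locate_parallel(7, lambda g: "node-c", lambda g: None)
```

Since either service alone may hold a stale storesite (the object can migrate storesites), taking whichever answers first trades a little redundant traffic for lower worst-case lookup latency.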
‘hints’; fallback to stable storage and network broadcast for identifying the current location.
• Emerald, language-level approach: Process migration supported with forwarding pointers; fallback to stable storage with broadcast and pairwise inspection for identifying the current location.
are extremely similar to temporal address descriptors
• Locus, MOS, R*: A “home” machine tracks the current location of processes that were created there, an idea similar to storesites
specific to Modula-2+
• RPC ‘stubs’: Local and remote stubs wrap calls with the code for maintaining the tad lifecycle: caching, forwarding, etc.
• Fix and unfix: Objects are ‘fixed’ for the duration of the call, so as to prevent object migration during invocation
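Fix/unfix can be sketched as a pin count that migration checks; here a Python context manager stands in for the stub-generated fix/unfix calls, and the class and method names are illustrative.

```python
from contextlib import contextmanager

class MobileObject:
    def __init__(self, location="node-a"):
        self.location = location
        self.pins = 0

    @contextmanager
    def fixed(self):
        self.pins += 1          # fix: the object may not migrate
        try:
            yield self
        finally:
            self.pins -= 1      # unfix once the invocation returns

    def move(self, destination):
        if self.pins:           # refuse migration while any call is in flight
            raise RuntimeError("object is fixed during an invocation")
        self.location = destination

obj = MobileObject()
with obj.fixed():
    pass                        # invocation body; obj.move(...) would raise here
obj.move("node-b")              # migration is legal again after unfix
```

Using a count rather than a flag lets concurrent invocations each hold a pin, with migration allowed only when every call has completed.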
system that had a working implementation! (albeit in a laboratory)
• Evaluation is somewhat questionable: Wide variance in latencies without explanation; long forwarding chains are never evaluated; hard to understand where certain latencies come from
• "cost of communication is outweighed by the gain in parallelism.": It is unclear where the parallelism gains originate in the system, or why they would outweigh the latency penalties of communication
temporarily unavailable under network partitions
• Disruption to forwarding chains: Partitions can be very disruptive to forwarding chains, where intermediary nodes may be unavailable
• Fallback: We can fall back to the object’s storesite or the name service, but these are susceptible to network partitions as well
invoker and invokee can fail at any point; the invokee can fail after performing a side effect but before responding
• Recovery from stable storage: Recovery does not guarantee “exactly-once” semantics; some invocations may be retried upon recovery if they were performed before a checkpoint
• Idempotence: Idempotence is the best strategy for mitigating these issues
mobile: Through the use of forwarding chains and “home” sites, the explicit binding step is eliminated
• Selection and placement are still up to the user: Developers still need to decide where to place objects, and how to select the target objects of invocation
• RPC is still a problematic paradigm: Duplicate invocation, idempotence, and sequencing of operations remain challenges for the developer
Satoshi Matsushita, and John Ousterhout. 2015. “Implementing Linearizability at Large Scale and Low Latency.”
• Helland, Pat. 2012. “Idempotence Is Not a Medical Condition.”
• Kendall, Samuel C., Jim Waldo, Ann Wollrath, and Geoff Wyant. 1994. “A Note on Distributed Computing.”
• Black, Andrew P., Norman C. Hutchinson, Eric Jul, and Henry M. Levy. 2007. “The Development of the Emerald Programming Language.”