Implementing Location Independent Invocation

Papers We Love, Paris
April 2016

Christopher Meiklejohn

April 26, 2016

Transcript

  1. Implementing Location Independent Invocation
    Andrew P. Black and Yeshayahu Artsy
    Digital Equipment Corporation, Distributed Systems Advanced Development Group
    IEEE Transactions on Parallel and Distributed Systems, 1990
  4. History
    • Digital Equipment Corporation
      Work performed at the Distributed Systems Advanced Development Group
    • IEEE ICDCS 1989
      Preliminary version at the 9th International Conference on Distributed Computing Systems
    • IEEE TPDS 1990
      Final version in the IEEE Transactions on Parallel and Distributed Systems
  7. Remote Procedure Call
    • Remote Procedure Call (RPC)
      Eases the development of distributed applications
    • Transfer of control
      Between, instead of within, address spaces
    • “Alleviate the need to be aware…”
      Abstraction hides away network protocols, parameter marshaling, external data representations…
  9. Intrinsic Differences
    • Different address spaces
      Passing objects by reference is difficult, because a reference is no longer valid in the callee’s address space
    • Failure modes
      Both the caller and callee can fail independently
  13. Two Methods For Binding
    • Default binding
      Server is chosen automatically
      • Single server, no other choice
      • All servers are semantically equivalent
    • Clerks
      Application-specific subroutines or packages
      • Each application must provide one
  17. Location Independent Invocation
    • Removes the binding step
      Abstraction above RPC that hides the explicit binding from the application developer
    • Conceptual presentation
      Presents the concept without the specifics of previous implementations
      • Emerald
        Distributed programming language with runtime system
      • Eden
        Distributed operating system
  21. When is LII useful?
    • Pure Functions / Binding as Load Balancing
      Example: a fast Fourier transform where all servers are equivalent and selection is for load balancing
    • Application Data
      Selection based on correctness and performance
      • Partitioned data and correctness
        Not all servers can answer all requests
      • Replicated data and performance (or availability)
        Choice can be based on desired performance or the inherent trade-off between availability and consistency
  23. The Registry Problem
    • Mapping
      If service instances can move, we need to keep track of where they are running
    • Churn Rate
      Depending on the churn rate, different tracking mechanisms might be required
  24. “frequently”
    Automation when instances “move frequently enough to make the implementation of a static mapping function impractical.”
  28. Process Migration
    • Sharing memory
      Better utilization of memory across a cluster
    • Reducing communication costs
      Co-location of processes that work together on a task
    • Increasing availability
      Replication or mobility to increase fault-tolerance
    • Reconfigurability
      Reconfiguration of machines, application, or network topology
    • Special capabilities
      Access to specialized hardware only available on certain machines
  31. Object Migration
    • Locating the object
      Need to ensure the object can be found at its new home
    • Recovery of object
      In the event of a failure, the object must be recoverable safely
    • Required by some applications
      Objects may have to be local for some operations to succeed
    • “Quality” improvement in interaction
      “Cost is justified” when moving objects if interaction quality increases
  34. Overview
    • Expense voucher system at Digital
      Corporation-wide, real-life application
    • Process
      • Vouchers filled in by employees
      • Approved or rejected by managers
      • If approved, cash disbursement is made and forms are archived
    • Remote, geo-distributed, asynchronous process
      Actions can occur on the order of minutes or days, at any location in Digital’s 36,000-node global internal network
  35. “These requirements make it infeasible to store all the forms in a single centralized database, or even in a number of geographically dispersed databases.”
  36. Represent the data and code of each form as an object that can move around the network as the application demands.
  40. (Mobile) Objects
    • Objects have:
      • State
      • Methods
    • Objects control:
      • Persistence
        How state should be persisted for each object
      • Recovery
        How objects can be recovered, if they happen to fail
      • Placement
        Where objects should be located on the network
      • Remote invocation
        Objects can invoke methods on other objects
  41. “Thus, the view of the world presented to application programmers is a distributed ocean in which application-dependent objects of their own design can be floated.”
  42. Node Services
    • Active Objects
      Objects that are active are stored in virtual memory
    • Stable Storage
      Stable storage is used for objects that are not currently referenced
  44. Node Makeup
    • Supervisor
      Object creation, location, and relocation
    • Application Objects
      Virtual memory containing the active objects in the system for each application
    • Intra-object Communication System
      Location Independent Invocation system with underlying RPC mechanism
  46. Storesites
    • Checkpointing
      Objects periodically checkpoint their state at storesites
    • Recovery
      Storesites support object recovery in the event of node failure
    • Recovery safety and liveness
      The object’s current location is stored at the storesite; recovery is prevented until the object is confirmed dead
  47. Additional Services
    • Name Service
      Locating storesites, supervisors, and objects by name
    • Authentication Service
      For authentication between supervisors
  49. How’s it built?
    • Window-based interface
      For interacting with objects and their local Hermes node
    • Processes
      Objects as processes that are distributed across the network
    • Modula-2+
      RPC and multithreading support from Digital SRC
  51. Objects
    • Globally unique identifiers
      Each object has a globally unique identifier for its lifetime
    • Age
      Each object has an age, a monotonic counter that advances whenever the object attempts to move between nodes
    • Location and storesite
      Each object contains a current location and a current storesite
  54. Temporal Address Descriptors
    • Temporal Address Descriptors (tad)
      Pair of a node identifier and the object’s age, representing the object’s location at some point in time
    • Passed implicitly
      Passed along with the guid whenever the object is passed by reference
    • Cached
      Supervisors cache tads locally for currently and previously referenced objects
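
As a rough illustration of the descriptor on this slide, a tad can be modelled as a (node, age) pair, with a supervisor-side cache that only replaces an entry when a newer one arrives. This is a sketch under stated assumptions: the names `Tad`, `TadCache`, `observe`, and `lookup` are hypothetical and do not come from Hermes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Tad:
    """Temporal address descriptor: where an object was at a given age."""
    node: str
    age: int


class TadCache:
    """Per-supervisor cache keeping the most recent tad seen for each guid."""

    def __init__(self) -> None:
        self._tads: Dict[str, Tad] = {}

    def observe(self, guid: str, tad: Tad) -> Tad:
        """Record a tad, keeping whichever of old and new has the higher age."""
        current = self._tads.get(guid)
        if current is None or tad.age > current.age:
            self._tads[guid] = tad
        return self._tads[guid]

    def lookup(self, guid: str) -> Optional[Tad]:
        """Return the cached tad for a guid, or None if it was never seen."""
        return self._tads.get(guid)
```

Because the age only advances, comparing ages is enough to decide which of two descriptors for the same object is fresher.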
  57. Locating an Object for Invocation
    • Local invocation attempted first
      Invoke directly if the object is local; otherwise, follow the tad to identify the current location
    • Follow path of tads until object located
      Follow tads, returning the most recent tad back to the forwarding node, until the object is found
    • Update cache
      Each node updates its local cache of tads to optimize subsequent invocations
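
The lookup described on this slide can be sketched as a walk along cached tads, followed by a cache update on every node visited. This is a simplified, single-process model, not the Hermes implementation: `Node`, `locate`, and the convention that a hosting node also caches its own tad are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Tad = Tuple[str, int]  # (node identifier, age)


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.resident: Set[str] = set()  # guids of objects living here
        self.tads: Dict[str, Tad] = {}   # cached forwarding information


def locate(nodes: Dict[str, Node], start: str, guid: str) -> Tuple[Tad, List[str]]:
    """Follow tads from `start` until the object is found.

    Returns the freshest tad and the list of nodes visited. Every node on
    that path has its cache updated with the freshest tad.
    """
    path = [start]
    last_age = -1
    current = nodes[start]
    while guid not in current.resident:
        tad = current.tads.get(guid)
        # Ages must strictly increase along a chain; anything else is a
        # missing or stale pointer, so give up rather than loop forever.
        if tad is None or tad[1] <= last_age:
            raise LookupError(f"forwarding chain for {guid} broke at {current.name}")
        next_node, last_age = tad
        current = nodes[next_node]
        path.append(next_node)
    found = current.tads.get(guid, (current.name, last_age))
    for name in path:
        nodes[name].tads[guid] = found
    return found, path
```

With a chain A -> B -> C, a first call starting at A visits all three nodes; a second call from A goes straight to C because A's cache was updated on the way back.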
  60. End of Forwarding Chain
    • Forwarding chain doesn’t locate object
      If we can’t find the object via forwarding, we resort to asking the storesite for the current location
    • Forwarding pointers are still necessary
      Objects can move immediately after we access the storesite, so forwarding pointers are still needed
    • Resort to name service
      If objects migrate between storesites, we may need to contact the global name service to identify the current storesite
  61. Invocation and location are combined to prevent an object from moving after identifying its location.
  64. Idempotence and Sequencing
    • Invocations can succeed but fail prior to response
      Processing an invocation and responding to it are not atomic
    • Idempotence is one solution
      Since recovery may trigger duplicate invocations, ensuring idempotence in methods is essential
    • Otherwise, sequencing
      …or, simply put, you could just use consensus.
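
One common way to make a method safe to retry is to tag each invocation with a unique identifier and remember its result, so a duplicate returns the recorded answer instead of repeating the side effect. This is a generic technique shown for illustration, not something the slides attribute to Hermes; `Voucher`, `approve`, and `invocation_id` are hypothetical names.

```python
from typing import Dict


class Voucher:
    """Toy object whose `approve` must not take effect twice."""

    def __init__(self) -> None:
        self.approvals = 0
        self._completed: Dict[str, int] = {}  # invocation id -> recorded result

    def approve(self, invocation_id: str) -> int:
        # A retried invocation returns the recorded result instead of
        # repeating the side effect.
        if invocation_id in self._completed:
            return self._completed[invocation_id]
        self.approvals += 1
        self._completed[invocation_id] = self.approvals
        return self.approvals
```

For this to survive a crash, the table of completed invocations would have to be checkpointed together with the object's state; the sketch keeps it in memory only.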
  69. [Diagram: nodes A, B, C; objects X, Y]
    • Y migration (A, 0) -> (B, 1)
    • Y migration (B, 1) -> (C, 2)
    • Local invocation fails
    • Invoke on Y with tad (B, 1)
    • Updated tad (C, 2)
    • Invoke on Y with tad (C, 2)
    • Send response!
  72. Throw Error
    • Return error if tad is out of date
      If the temporal address descriptor is out of date, return an error immediately
    • Simplifies failure handling
      Control is returned to the invoker immediately; thread of control is not consumed
    • Invoker must retry with new tad
      Invoker must update local information with new tad and repeat invocation
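
The error-return strategy on this slide might look roughly like the following, where a node that no longer hosts the object fails fast and the invoker retries with the newer tad. This is a sketch only: the `forwarding` table, `StaleTad`, `invoke_at`, and `invoke_with_retry` are invented for the example, and it assumes the error carries the newer tad back to the invoker.

```python
from typing import Dict, Optional, Tuple

Tad = Tuple[str, int]  # (node identifier, age)


class StaleTad(Exception):
    """Raised by a node that no longer hosts the object."""

    def __init__(self, newer: Tad) -> None:
        super().__init__("tad is out of date")
        self.newer = newer


def invoke_at(forwarding: Dict[str, Optional[Tad]], node: str) -> str:
    """Pretend to invoke on `node`, failing fast if the object has moved on.

    `forwarding[node]` is the tad left behind on that node, or None if the
    object currently lives there.
    """
    newer = forwarding[node]
    if newer is not None:
        raise StaleTad(newer)
    return f"handled at {node}"


def invoke_with_retry(forwarding: Dict[str, Optional[Tad]], tad: Tad,
                      max_attempts: int = 8) -> Tuple[str, Tad, int]:
    """Retry from the invoker's side, adopting each newer tad in turn."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke_at(forwarding, tad[0]), tad, attempt
        except StaleTad as err:
            if err.newer[1] <= tad[1]:
                raise  # the "newer" tad is not actually newer; do not loop
            tad = err.newer
    raise RuntimeError("gave up after too many stale tads")
```

Each failed attempt returns control to the invoker at once, which is the property the slide highlights; the cost is one round trip per stale hop.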
  76. Invocation Propagation
    • Propagation of invocation to location referenced by tad
      The invocation is forwarded to the location referenced by the node’s temporal address descriptor
    • Ties up thread of control
      Until the object is located, the thread of control is tied up
    • More prone to disruption
      Failures in the middle of the chain can cause loss of availability
    • Allows cache maintenance
      All nodes along the path can be updated with up-to-date tads as forwarding occurs
  79. Hybrid Approach
    • Propagation is used for finite hop count
      The invocation is propagated for a bounded number of hops, until a hop count is exceeded
    • Return error to invoker
      Error is returned to the invoker, and the invoker must try again
    • Short forwarding chains
      Chosen in the belief that forwarding chains will typically be short
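
A minimal sketch of the hybrid scheme, assuming that the error returned to the invoker carries the freshest tad reached so far (the slides do not say what the error contains). `propagate`, `invoke`, `HopLimitExceeded`, and the `forwarding` table are hypothetical names for the example.

```python
from typing import Dict, Optional, Tuple

Tad = Tuple[str, int]  # (node identifier, age)


class HopLimitExceeded(Exception):
    """Forwarding stopped early; `latest` is the freshest tad reached."""

    def __init__(self, latest: Tad) -> None:
        super().__init__("hop count exceeded")
        self.latest = latest


def propagate(forwarding: Dict[str, Optional[Tad]], tad: Tad, max_hops: int) -> Tad:
    """Forward along tads for at most `max_hops` hops.

    `forwarding[node]` is the tad left on that node, or None if the object
    lives there. Returns the hosting node's tad or raises HopLimitExceeded.
    """
    for _ in range(max_hops):
        newer = forwarding[tad[0]]
        if newer is None:
            return tad
        tad = newer
    if forwarding[tad[0]] is None:
        return tad
    raise HopLimitExceeded(tad)


def invoke(forwarding: Dict[str, Optional[Tad]], tad: Tad,
           max_hops: int = 2) -> Tuple[Tad, int]:
    """Invoker-side loop: retry from the freshest tad after each error."""
    if max_hops < 1:
        raise ValueError("max_hops must be at least 1")
    rounds = 0
    while True:
        rounds += 1
        try:
            return propagate(forwarding, tad, max_hops), rounds
        except HopLimitExceeded as err:
            tad = err.latest
```

With a four-hop chain and `max_hops=2`, the invoker needs two rounds; with the short chains the slide expects, a single round suffices.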
  82. Tad Maintenance
    • Stash
      Local cache always keeps the most recent tad for a given object
    • Long chains reduce performance
      • Latency and throughput degrade linearly with chain length
      • Availability degrades exponentially with chain length
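
To see why the two degradations differ, consider a simple model that the slides do not state explicitly: every node on a chain of $n$ hops is available independently with probability $p$, and each hop adds a fixed latency $t$.

```latex
\[
  A_{\text{chain}}(n) = p^{\,n},
  \qquad
  T_{\text{chain}}(n) = n \cdot t
\]
```

Under these assumptions, with $p = 0.99$ a chain of 5 hops is available about 95.1% of the time, while a chain of 50 hops drops to roughly 60.5%, even though latency has only grown tenfold.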
  84. Locating through Storesites
    • Solution for broken forwarding chains
      Fall back to using storesites for locating objects
    • Initial storesite
      When created, objects have an initial storesite; this location, along with two-phase commit (2PC), is used to track object location after migrations
    • Invocation can race with migration
      Forwarding pointers are still required for finding the current location of a process
  86. Storesite Migration
    • Forwarding pointers
      Support object migration from storesites through the use of forwarding pointers
    • Encode initial and track after first migration
      Encode the storesite in the globally unique identifier and register with the name service after first migration
    • Query in parallel
      Query both the storesite and global name service in parallel to reduce latency for locating objects
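
The parallel query could be sketched with a thread pool, as below. The lookup callables stand in for the storesite and the name service; they, along with `locate_in_parallel`, are assumptions for the example. The sketch waits for both answers and keeps the tad with the higher age, so the latency is that of the slower service rather than the sum of the two.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence, Tuple

Tad = Tuple[str, int]  # (node identifier, age)
Lookup = Callable[[str], Optional[Tad]]


def locate_in_parallel(guid: str, lookups: Sequence[Lookup]) -> Optional[Tad]:
    """Ask every lookup service at once and keep the tad with the highest age."""
    best: Optional[Tad] = None
    with ThreadPoolExecutor(max_workers=len(lookups)) as pool:
        futures = [pool.submit(lookup, guid) for lookup in lookups]
        for future in as_completed(futures):
            try:
                tad = future.result()
            except Exception:
                continue  # one unreachable service should not fail the lookup
            if tad is not None and (best is None or tad[1] > best[1]):
                best = tad
    return best
```

A variant that returns as soon as the first service answers would lower latency further, at the risk of acting on the staler of the two answers.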
  88. Previous Systems
    • Eden: OS-level approach
      Process migration supported with ‘hints’; fallback to stable storage and network broadcast for identifying current location.
    • Emerald: language-level approach
      Process migration supported with forwarding pointers; fallback to stable storage with broadcast and pairwise inspection for identifying current location.
  89. “While an attractive paradigm for the future, we judge that we were unlikely to successfully introduce a new programming language for commercial distributed applications.”
  91. Related Systems
    • Demos/MP
      Unidirectional links for process migration, which are extremely similar to temporal address descriptors
    • Locus, MOS, R*
      A “home” machine tracks the current location of processes that were created there, an idea similar to storesites
  93. Implementation
    • Modula-2+ specific
      Many details in the implementation are specific to Modula-2+
    • RPC ‘stubs’
      Local and remote stubs wrap calls with the code that maintains the tad lifecycle: caching, forwarding, etc.
    • Fix and unfix
      Objects are ‘fixed’ for the duration of the call to prevent object migration during invocation
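
The fix/unfix idea can be illustrated with a counter that blocks migration while any invocation is in progress. This is a Python analogy rather than the Modula-2+ stub code; `MobileObject`, `fixed`, and `try_move` are hypothetical names.

```python
import threading
from contextlib import contextmanager


class MobileObject:
    def __init__(self, node: str) -> None:
        self.node = node
        self.age = 0
        self._fixed = 0  # number of invocations currently pinning the object
        self._lock = threading.Lock()

    @contextmanager
    def fixed(self):
        """Pin the object to its node for the duration of one invocation."""
        with self._lock:
            self._fixed += 1
        try:
            yield self
        finally:
            with self._lock:
                self._fixed -= 1

    def try_move(self, destination: str) -> bool:
        """Migrate unless an invocation currently has the object fixed."""
        with self._lock:
            if self._fixed:
                return False
            self.node = destination
            self.age += 1
            return True
```

A stub would enter `fixed()` before dispatching the call and leave it after the response is produced, so a migration request arriving in between is refused and can be retried later.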
  96. Questionable Evaluation
    • Actual, real-life system!
      Hermes is an actual system that had a working implementation! (albeit in a laboratory)
    • Evaluation is somewhat questionable
      Wide variance in latencies without explanation; long forwarding chains are never evaluated; hard to understand where certain latency is coming from
    • “cost of communication is outweighed by the gain in parallelism.”
      Unclear where the parallelism gains originate in the system, or why they would outweigh the latency cost of communication
  99. Partitions
    • Object unavailability
      Objects in the system will become temporarily unavailable under network partitions
    • Disruption to forwarding chains
      Partitions can also be very disruptive to forwarding chains, where intermediary nodes may be unavailable
    • Fallback
      We can fall back to the object’s storesite or to the name service, but these are also susceptible to network partitions
  102. “Exactly-Once” Semantics
    • Invocation can fail for many reasons
      Both invoker and invokee can fail at any point; invokee can fail after performing side-effect but before responding
    • Recovery from stable storage
      Recovery does not guarantee “exactly-once” semantics; some invocations may be retried upon recovery if they were performed before a checkpoint
    • Idempotence
      Idempotence is the best strategy for mitigation of these issues
  105. In Summary
    • Addresses how to locate objects that are mobile
      Through the use of forwarding chains and “home” sites, eliminate the explicit binding step
    • Selection and placement are still up to the user
      Developers still need to be concerned with where to place objects, and how to select the target objects of invocation
    • RPC is still a problematic paradigm
      Issues with duplicate invocation, idempotence, and sequencing of operations remain challenges for the developer
  109. Further Reading
    • Lee, Collin, Seo Jin Park, Ankita Kejriwal, Satoshi Matsushita, and John Ousterhout. 2015. “Implementing Linearizability at Large Scale and Low Latency.”
    • Helland, Pat. 2012. “Idempotence Is Not a Medical Condition.”
    • Kendall, Samuel C, Jim Waldo, Ann Wollrath, and Geoff Wyant. 1994. “A Note on Distributed Computing.”
    • Black, Andrew P, Norman C Hutchinson, Eric Jul, and Henry M Levy. 2007. “The Development of the Emerald Programming Language.”