Distributed Systems Are a UX Problem

Tyler Treat
October 30, 2018

Distributed systems are not strictly an engineering problem. It’s far too easy to assume they are a backend development concern, but the reality is that there are implications at every point in the stack. Often the trade-offs we make lower in the stack to buy responsiveness bubble up to the top, so much so that they almost always affect the application in some way.

Distributed systems affect the user. We need to shift the focus from system properties and guarantees to business rules and application behavior. We need to understand the limitations and trade-offs at each level in the stack and why they exist. We need to assume failure and plan for recovery. We need to start thinking of distributed systems as a UX problem.

Tyler Treat looks at distributed systems through the lens of user experience, observing how architecture, design patterns, and business problems all coalesce into UX. Tyler also shares system design anti-patterns and alternative patterns for building reliable and scalable systems with respect to business outcomes.

Topics include:

- The “truth” can be prohibitively expensive: When does strong consistency make sense, and when does it not? How do we reconcile this with application UX?
- Failure as an inevitability: If we can’t build perfect systems, what is “good enough”?
- Dealing with partial knowledge: Systems usually operate in the real world (e.g., an inventory application for a widget warehouse). How do we design for the “disconnect” between the real world and the system?

Transcript

  1. @tyler_treat Distributed Systems Are a UX Problem / Tyler Treat / O’Reilly Software Architecture Conference / October 30, 2018

  2. @tyler_treat Tyler Treat, tyler.treat@realkinetic.com

  3. @tyler_treat I like distributed systems.

  4. @tyler_treat

  5. @tyler_treat

  6. @tyler_treat Disclaimer: I know approximately nothing about UX…

  7. @tyler_treat …other than when I’m the user, I know when my experience is good and when it’s bad.

  8. @tyler_treat

  9. @tyler_treat UX

  10. @tyler_treat UX Systems

  11. @tyler_treat UX Systems

  12. @tyler_treat UX Systems Business

  13. @tyler_treat UX Systems Business This
 Talk

  14. @tyler_treat The Yin and Yang of UX and Architecture

  15. @tyler_treat Monolith

  16. @tyler_treat Monolith

  17. @tyler_treat Service Service Service Service Service Service Service Serv Service

  18. @tyler_treat Service Service Service Service Service Service Service Serv Service

  19. @tyler_treat Service Service Service Service Service Service Service Serv Service

  20. @tyler_treat Implications

  21. @tyler_treat

  22. @tyler_treat book trip Trip Service Trip Database transaction Good old

    days
  23. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service

    Trip Service transaction transaction transaction
  24. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service

    Trip Service transaction transaction transaction ACID ACID ACID
  25. @tyler_treat UX Implications of Microservices • Data consistency

  26. @tyler_treat Service Service Service Service Service Service Service Serv Service

  27. @tyler_treat Service Service Service Service Service Service Service Serv Service

  28. @tyler_treat UX Implications of Microservices • Data consistency • Race

    conditions
  29. @tyler_treat

  30. @tyler_treat UX Implications of Microservices • Data consistency • Race

    conditions • Performance
  31. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service

    Trip Service transaction transaction transaction
  32. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service

    Trip Service transaction transaction transaction
  33. @tyler_treat UX Implications of Microservices • Data consistency • Race

    conditions • Performance • Partial failure
  34. @tyler_treat So are microservices bad?

  35. @tyler_treat Microservices are about people scale.

  36. @tyler_treat Transparency

  37. @tyler_treat A Study of Transparency and Adaptability of Heterogeneous Computer Networks with TCP/IP and IPv6 Protocols (Das, 2012): “Any change in a computing system, such as a new feature or new component, is transparent if the system after change adheres to previous external interface as much as possible while changing its internal behavior.”

  38. @tyler_treat System

  39. @tyler_treat System

  40. @tyler_treat High Transparency Low Transparency

  41. @tyler_treat NFS High Transparency Low Transparency

  42. @tyler_treat NFS FTP High Transparency Low Transparency

  43. @tyler_treat Types of Transparencies • Access transparency • Location transparency • Migration transparency • Relocation transparency • Replication transparency • Concurrent transparency • Failure transparency • Persistence transparency • Security transparency

  44. @tyler_treat Transparency is about usability.

  45. @tyler_treat Usability Control

  46. @tyler_treat Usability Control

  47. @tyler_treat Usability Control

  48. @tyler_treat Simplicity Flexibility, Performance,
 Correctness RPC

  49. @tyler_treat Simplicity Flexibility, Performance,
 Correctness Erlang Message Passing

  50. @tyler_treat RPC Erlang
 Message Passing High Transparency Low Transparency

  51. @tyler_treat Translating UX for developers: APIs

  52. @tyler_treat Transparencies simplify the API of a system.

  53. @tyler_treat UX is about deciding what knobs to expose.

  54. @tyler_treat The Truth is Prohibitively Expensive: Balancing Consistency and UX

  55. @tyler_treat book trip Trip Service Trip Database transaction Good old

    days
  56. @tyler_treat book trip Trip Service Trip Database transaction Good old

    days Transparency
  57. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service

    Trip Service transaction transaction transaction Transparency
  58. @tyler_treat book trip Microservices Airline Service Hotel Service Car Service

    Trip Service transaction transaction transaction ACID ACID ACID Transparency
  59. @tyler_treat

  60. @tyler_treat

  61. @tyler_treat

  62. @tyler_treat Spreadsheet service

  63. @tyler_treat Spreadsheet service Document service

  64. @tyler_treat Spreadsheet service Document service Presentation service

  65. @tyler_treat Spreadsheet service Document service Presentation service IAM service

  66. @tyler_treat Spreadsheet service Document service Presentation service IAM service consistent

  67. @tyler_treat Consistency is about ordering of events in a distributed system.

  68. @tyler_treat Why is this hard?

  69. None
  70. @tyler_treat So what can we do?

  71. @tyler_treat Coordinate

  72. @tyler_treat Two-Phase Commit

  73. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car

    Service Trip Service propose propose propose
  74. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car

    Service Trip Service vote vote vote
  75. @tyler_treat book trip 2PC Commit Airline Service Hotel Service Car

    Service Trip Service commit/abort commit/abort commit/abort
  76. @tyler_treat book trip 2PC Commit Airline Service Hotel Service Car

    Service Trip Service done done done
  77. @tyler_treat Problems with 2PC • Chatty protocol: beholden to network latency • Limited throughput • Transaction coordinator: single point of failure • Blocking protocol: susceptible to deadlock

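
The prepare and commit phases shown on the preceding slides can be sketched in a few lines. This is a minimal, in-process illustration only: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names that do not come from the talk, and a real coordinator would also need timeouts, durable logging, and recovery, which is where the problems listed above come from.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical stand-in for a service such as Airline, Hotel, or Car."""
    name: str
    can_commit: bool = True   # assumption: whether this service can honor the proposal
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote on the proposal. A "yes" vote is a promise to commit later.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator logic: commit only if every participant votes yes."""
    # Prepare phase: send the proposal to everyone and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Commit phase: broadcast the single decision to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

Every participant waits on the coordinator between its vote and the decision, which is one way to see why the protocol is chatty and blocking.
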
  78. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car

    Service Trip Service propose propose propose
  79. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car

    Service Trip Service propose propose propose
  80. @tyler_treat book trip 2PC Prepare Airline Service Hotel Service Car

    Service Trip Service propose propose propose
  81. @tyler_treat Add more phases!

  82. @tyler_treat Three-Phase Commit

  83. @tyler_treat

  84. @tyler_treat atomic clocks NTP GPS TrueTime

  85. @tyler_treat Good news: we solved physics.

  86. @tyler_treat Bad news: it costs all the money.

  87. @tyler_treat Not exactly…

  88. @tyler_treat Spanner: Google’s Globally-Distributed Database (Corbett et al.)

  89. @tyler_treat TrueTime forces that uncertainty to the surface, and Spanner provides a transparency over it.

  90. @tyler_treat Spanner doesn’t avoid trade-offs, it just minimizes their probability.

  91. @tyler_treat Spanner is expensive and proprietary.

  92. @tyler_treat But it’s not the end of the story…

  93. @tyler_treat Unless every service is backed by the same database, you probably still have to deal with consistency problems.

  94. @tyler_treat Challenges to Adopting Stronger Consistency at Scale (Ajoux et al., 2015): “The biggest barrier to providing stronger consistency guarantees…is that the consistency mechanism must integrate consistency across many stateful services.”

  95. @tyler_treat Coordination is expensive because processes can’t make progress independently.

  96. @tyler_treat

  97. @tyler_treat

  98. @tyler_treat Peter Bailis, 2015 https://speakerdeck.com/pbailis/silence-is-golden-coordination-avoiding-systems-design

  99. @tyler_treat And what about partial failure?

  100. @tyler_treat

  101. @tyler_treat

  102. @tyler_treat

  103. @tyler_treat

  104. @tyler_treat

  105. @tyler_treat Memories, Guesses, and Apologies: Dealing with Partial Knowledge

  106. @tyler_treat The cost of knowing the “truth” can be prohibitively expensive.

  107. @tyler_treat And partial failure means the “truth” is also fragile.

  108. @tyler_treat Where does this leave us?

  109. @tyler_treat We could go back to the monolith.

  110. @tyler_treat We could build expensive data centers with fancy hardware…

  111. @tyler_treat …or we could rethink our transparencies.

  112. @tyler_treat @tyler_treat

  113. None
  114. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf

  115. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf

  116. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf

  117. @tyler_treat Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf

  118. @tyler_treat Exception Handling in Asynchronous Systems

  119. @tyler_treat

  120. @tyler_treat Exception Handling in Asynchronous Systems • Write-off

  121. @tyler_treat

  122. @tyler_treat Exception Handling in Asynchronous Systems • Write-off • Retry

  123. @tyler_treat

  124. @tyler_treat Exception Handling in Asynchronous Systems • Write-off • Retry • Compensating action

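
The first two strategies on this slide, retry and write-off, might be combined as in the sketch below. It is an illustration under assumptions: the `retry` helper, its `attempts` and `base_delay` parameters, and the choice to return `None` as the "write-off" outcome are all hypothetical and not taken from the talk. Retrying is only safe when the operation is idempotent.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 3, base_delay: float = 0.0) -> Optional[T]:
    """Call op up to `attempts` times with exponential backoff between tries.

    Returns the result of the first successful call, or None if every
    attempt fails (assumption: the caller treats None as a write-off).
    """
    for i in range(attempts):
        try:
            return op()
        except Exception:
            # Back off before the next try; skip the sleep after the final failure.
            if i < attempts - 1:
                time.sleep(base_delay * (2 ** i))
    return None
```

When neither a retry nor a write-off is acceptable, the remaining option is a compensating action that undoes work already done.
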
  125. @tyler_treat Revisiting Two-Phase Commit

  126. @tyler_treat Sagas

  127. @tyler_treat Sagas (Garcia-Molina & Salem, 1987): “A long-lived transaction is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions…Either all the transactions in a saga are successfully completed or compensating transactions are run to amend a partial execution.”

  128. @tyler_treat Sagas (Garcia-Molina & Salem, 1987): “A long-lived transaction is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions…Either all the transactions in a saga are successfully completed or compensating transactions are run to amend a partial execution.”

  129. @tyler_treat Sagas split long-lived transactions into individual, interleaved sub-transactions: T = T1, T2, ..., Tn

  130. @tyler_treat And each sub-transaction has a compensating transaction: C1, C2, ..., Cn

  131. @tyler_treat Sagas guarantee one of two execution sequences: T1, T2, ..., Tn or T1, T2, ..., Tj, Cj, ..., C2, C1

  132. @tyler_treat book trip Airline Service Hotel Service Car Service Trip Service transaction transaction transaction

  133. @tyler_treat T = T1, T2, ..., Tn • Book flight • Book hotel • Book car • Charge money

  134. @tyler_treat C1, C2, ..., Cn • Cancel flight • Cancel hotel • Cancel car • Refund money

  135. @tyler_treat Compensating transactions must be idempotent.

  136. @tyler_treat Sagas trade off isolation for availability.
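
The two execution sequences a saga guarantees can be sketched as a small runner. This is a simplified, single-process illustration: `run_saga`, the `Step` tuple, and the log format are hypothetical names introduced here, and the sketch assumes a failed sub-transaction leaves no effects of its own to undo. A production saga would also persist its progress so that compensation survives a crash.

```python
from typing import Callable

# Each step is (name, sub-transaction Ti, compensating transaction Ci).
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run T1..Tn in order. If a step fails, run the compensations for the
    steps that already completed, in reverse order, and stop.
    Returns a log of what was executed."""
    log: list[str] = []
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"T:{name}")
            done.append(step)
        except Exception:
            log.append(f"failed:{name}")
            # Undo completed work: Cj, ..., C2, C1.
            for comp_name, _, compensate in reversed(done):
                compensate()
                log.append(f"C:{comp_name}")
            break
    return log
```

For the trip example, the steps would pair "book flight" with "cancel flight", "book hotel" with "cancel hotel", and so on. Other transactions can observe the intermediate bookings before a compensation runs, which is the isolation being traded away.
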

  137. @tyler_treat Event-Driven

  138. @tyler_treat book trip Airline Service Hotel Service Car Service Trip

    Service transaction transaction transaction
  139. @tyler_treat event Airline Service Hotel Service Car Service Trip Service

    event event event
  140. @tyler_treat event Airline Service Hotel Service Car Service Trip Service

    event event event
  141. @tyler_treat System Properties Business Rules

  142. @tyler_treat Sean T. Allen: “People don’t want distributed transactions, they just want the guarantees that distributed transactions give them.”

  143. @tyler_treat CAP theorem

  144. @tyler_treat CAP Theorem • Consistency, Availability, Partition Tolerance • When a partition occurs, do we: • Choose availability and give up consistency? - or - • Choose consistency and give up availability?

  145. @tyler_treat CAP Theorem • Consistency, Availability, Partition Tolerance • When a partition occurs, do we: • Choose availability and give up consistency? - or - • Choose consistency and give up availability? (or YOLO it)

  146. @tyler_treat The CAP theorem is a UX question…

  147. @tyler_treat When a partial failure occurs, how do you want the application to behave?

  148. @tyler_treat

  149. @tyler_treat

  150. @tyler_treat We can choose consistency and sacrifice availability…

  151. @tyler_treat …or we can choose availability by making local decisions with the knowledge at hand and designing the UX accordingly.

  152. @tyler_treat Managing partial failure is a matter of dealing with partial knowledge…

  153. @tyler_treat …and managing risk.

  154. @tyler_treat Our risk appetite can drive business rules. Check value < $10,000? Yes: clear locally. No: double check with all replicas before clearing.

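
The check-clearing rule on this slide could be expressed as a small decision function. This is only a sketch: `clear_check`, the `replicas_agree` callback, and the "held" outcome for a failed replica check are assumptions made for illustration. The slide gives only the $10,000 threshold and the two branches.

```python
from decimal import Decimal
from typing import Callable

LOCAL_CLEAR_LIMIT = Decimal("10000")  # threshold taken from the slide


def clear_check(amount: Decimal, replicas_agree: Callable[[Decimal], bool]) -> str:
    """Decide how to clear a check based on how much risk it carries."""
    if amount < LOCAL_CLEAR_LIMIT:
        # Low stakes: stay available and decide with local knowledge.
        return "cleared locally"
    # High stakes: pay the coordination cost before clearing.
    if replicas_agree(amount):
        return "cleared after replica check"
    return "held"  # assumption: what happens when replicas cannot confirm
```

The threshold is a business decision, not a system property: raising it buys availability at the cost of more occasional apologies.
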
  155. @tyler_treat Memories, guesses, and apologies

  156. @tyler_treat Computers operate with partial knowledge.

  157. @tyler_treat Either there’s a disconnect with the “real world”…

  158. @tyler_treat …or there’s a disconnect between systems.

  159. @tyler_treat Systems don’t make decisions, they make guesses.

  160. @tyler_treat Systems have memory.

  161. @tyler_treat Memories help systems make better guesses in the future.

  162. @tyler_treat Forgetfulness is a business decision.

  163. @tyler_treat Sometimes the system guesses wrong.

  164. @tyler_treat Systems need the capacity to apologize.
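
One way to picture memories, guesses, and apologies together is two replicas that accept bookings independently and reconcile later. The sketch below is hypothetical throughout: the `Replica` class, the `reconcile` function, and the rule that the first replica's booking wins a conflict are assumptions chosen for illustration, not anything prescribed by the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical booking replica that decides with partial knowledge."""
    booked: dict[str, str] = field(default_factory=dict)  # room -> guest (its memory)

    def book(self, room: str, guest: str) -> bool:
        # A guess: accept if this replica has not seen a booking for the room.
        if room in self.booked:
            return False
        self.booked[room] = guest
        return True


def reconcile(a: Replica, b: Replica) -> list[str]:
    """Merge the two memories and return the guests who are owed an apology.

    Assumption: on a conflict, replica a's booking wins.
    """
    apologies: list[str] = []
    for room, guest in b.booked.items():
        winner = a.booked.setdefault(room, guest)
        if winner != guest:
            apologies.append(guest)  # b guessed wrong for this room
    b.booked = dict(a.booked)        # both replicas now share the same memory
    return apologies
```

What happens with the returned list, such as a refund, an upgrade, or a call from a person, is where the business rules come in.
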

  165. @tyler_treat Customers judge you not by your failures, but by how you handle your failures.

  166. @tyler_treat Are you building systems that never fail or systems that fail gracefully?

  167. @tyler_treat

  168. @tyler_treat Businesses need both code and people to manage apologies.

  169. @tyler_treat It becomes less about trying to build the perfect system and more about how we cope with an imperfect one.

  170. @tyler_treat Wrapping Up: Summary and Observations

  171. @tyler_treat

  172. @tyler_treat @tyler_treat

  173. @tyler_treat System Properties: ACID, distributed transactions, exactly-once delivery, ordered delivery, serializable isolation, linearizability

  174. @tyler_treat System Properties: ACID, distributed transactions, exactly-once delivery, ordered delivery, serializable isolation, linearizability. Business Rules / Application Invariants: negative account balance, two users sharing same ID, room double-booked, balance reconciles

  175. @tyler_treat

  176. @tyler_treat We put ourselves at the mercy of our infrastructure and hope it makes good on its promises.

  177. @tyler_treat It often doesn’t. (Kyle Kingsbury, 2015, http://jepsen.io)

  178. @tyler_treat When do we actually need consistency?

  179. @tyler_treat

  180. @tyler_treat We can use consistency when the stakes are high and the cost is worth it.

  181. @tyler_treat And design our transparencies accordingly.

  182. @tyler_treat We could try to build perfect systems.

  183. @tyler_treat Should we build perfect systems or pragmatic systems?

  184. @tyler_treat Systems that can compensate.

  185. @tyler_treat Systems that can recover.

  186. @tyler_treat Systems that can apologize.

  187. @tyler_treat UX Systems Business

  188. @tyler_treat Data Consistency Race Conditions Performance Partial Failure

  189. @tyler_treat Data Consistency Race Conditions Performance Partial Failure Transparency Informs

  190. @tyler_treat Thank You / bravenewgeek.com / realkinetic.com

  191. @tyler_treat References
    • https://gotocon.com/dl/goto-chicago-2015/slides/CaitieMcCaffrey_ApplyingTheSagaPattern.pdf
    • http://ijcsits.org/papers/vol2no62012/42vol2no6.pdf
    • http://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf
    • https://queue.acm.org/detail.cfm?id=2745385
    • https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf
    • http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_133.pdf
    • https://bravenewgeek.com/distributed-systems-are-a-ux-problem/
    • http://www.cs.princeton.edu/~wlloyd/papers/challenges-hotos15.pdf
    • https://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf
    • https://www.youtube.com/watch?v=lsKaNDj4TrE
    • Starbucks photo - https://www.geekwire.com/2015/starbucks-mobile-ordering-now-blankets-the-u-s-with-coverage-in-san-francisco-new-york-and-more-coming-today/
    • Friction image - https://byjus.com/physics/friction-in-automobiles/
    • Carbon copy forms - http://www.rainiercopy.com/forms.html
    • Rosetta Stone photo - https://en.wikipedia.org/wiki/Rosetta_Stone#/media/File:Rosetta_Stone.JPG