@tyler_treat
Disclaimer:
I know approximately nothing about UX…
Slide 7
…other than when I’m the user, I know when
my experience is good and when it’s bad.
Slide 8
Slide 9
UX
Slide 10
UX Systems
Slide 11
Slide 12
UX Systems
Business
Slide 13
UX Systems
Business
This Talk
Slide 14
The Yin and Yang of
UX and Architecture
Slide 15
Monolith
Slide 16
Slide 17
[Diagram: many boxes, each labeled "Service"]
Slide 18
Slide 19
Slide 20
Implications
Slide 21
Slide 22
Good old days
[Diagram: "book trip" goes to the Trip Service, which runs one transaction against the Trip Database]
Slide 23
Microservices
[Diagram: "book trip" goes to the Trip Service, which runs separate transactions against the Airline, Hotel, and Car services]
Slide 24
Microservices
[Diagram: "book trip" goes to the Trip Service, which runs separate transactions against the Airline, Hotel, and Car services; each transaction is labeled ACID]
Slide 25
UX Implications of Microservices
• Data consistency
Slide 26
[Diagram: many boxes, each labeled "Service"]
Slide 27
Slide 28
UX Implications of Microservices
• Data consistency
• Race conditions
Slide 29
Slide 30
UX Implications of Microservices
• Data consistency
• Race conditions
• Performance
Slide 31
Microservices
[Diagram: "book trip" goes to the Trip Service, which runs separate transactions against the Airline, Hotel, and Car services]
Slide 32
Slide 33
UX Implications of Microservices
• Data consistency
• Race conditions
• Performance
• Partial failure
Slide 34
So are microservices bad?
Slide 35
Microservices are about
people scale.
Slide 36
Transparency
Slide 37
A Study of Transparency and Adaptability of Heterogeneous
Computer Networks with TCP/IP and IPv6 Protocols
Das, 2012
“Any change in a computing system, such as a new feature or new
component, is transparent if the system after change adheres to
previous external interface as much as possible while changing its
internal behavior.”
Slide 38
System
Slide 39
Slide 40
[Diagram: a spectrum from High Transparency to Low Transparency]
Slide 41
[Diagram: a spectrum from High Transparency to Low Transparency, with NFS placed on it]
Slide 42
[Diagram: a spectrum from High Transparency to Low Transparency, with NFS and FTP placed on it]
[Diagram: a spectrum from High Transparency to Low Transparency, with RPC, Erlang, and Message Passing placed on it]
Slide 51
Translating UX for developers:
APIs
Slide 52
Transparencies simplify the API
of a system.
Slide 53
UX is about deciding what
knobs to expose.
Slide 54
The Truth is Prohibitively Expensive
Balancing Consistency and UX
Slide 55
Good old days
[Diagram: "book trip" goes to the Trip Service, which runs one transaction against the Trip Database]
Slide 56
Good old days
[Diagram: "book trip" goes to the Trip Service, which runs one transaction against the Trip Database; labeled "Transparency"]
Slide 57
Microservices
[Diagram: "book trip" goes to the Trip Service, which runs separate transactions against the Airline, Hotel, and Car services; labeled "Transparency"]
Slide 58
Microservices
[Diagram: "book trip" goes to the Trip Service, which runs separate transactions against the Airline, Hotel, and Car services; each transaction is labeled ACID, and the whole is labeled "Transparency"]
Slide 59
Slide 60
Slide 61
Slide 62
Spreadsheet service
Slide 63
Spreadsheet service
Document service
Slide 64
Spreadsheet service
Document service
Presentation service
Slide 65
Spreadsheet service
Document service
Presentation service
IAM service
Slide 66
Spreadsheet service
Document service
Presentation service
IAM service
consistent
Slide 67
Consistency is about ordering of
events in a distributed system.
Slide 68
Why is this hard?
Slide 69
Slide 70
So what can we do?
Slide 71
Coordinate
Slide 72
Two-Phase Commit
Slide 73
2PC Prepare
[Diagram: for "book trip", the Trip Service sends "propose" to the Airline, Hotel, and Car services]
Slide 74
2PC Prepare
[Diagram: the Airline, Hotel, and Car services each send a "vote" back to the Trip Service]
Slide 75
2PC Commit
[Diagram: the Trip Service sends "commit/abort" to the Airline, Hotel, and Car services]
Slide 76
2PC Commit
[Diagram: the Airline, Hotel, and Car services each reply "done" to the Trip Service]
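To make the two phases concrete, here is a minimal in-memory sketch in Python. The names Participant and two_phase_commit are hypothetical, not from the talk: the Trip Service plays coordinator, each participant votes during prepare, and one global commit or abort follows. It deliberately leaves out durable logging, timeouts, and coordinator crashes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical stand-in for the Airline, Hotel, or Car Service."""
    name: str
    can_commit: bool = True
    state: str = "init"  # init -> prepared/refused -> committed/aborted

    def prepare(self) -> bool:
        # Phase 1: receive "propose", reply with a vote. A real participant
        # would durably log its vote and keep holding locks until phase 2.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator logic, i.e. the Trip Service's role in the diagrams."""
    # Phase 1 (prepare): collect a vote from every participant;
    # the transaction commits only if every vote is yes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (commit/abort): broadcast one global decision to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


trip = [Participant("airline"), Participant("hotel"), Participant("car", can_commit=False)]
print(two_phase_commit(trip), [p.state for p in trip])
# False ['aborted', 'aborted', 'aborted']
```

Every participant waits on the slowest vote and on the coordinator's decision; that waiting is where the latency, throughput, and blocking costs of 2PC come from.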
Slide 77
Problems with 2PC
• Chatty protocol: beholden to network latency
• Limited throughput
• Transaction coordinator: single point of failure
• Blocking protocol: susceptible to deadlock
Slide 78
2PC Prepare
[Diagram: for "book trip", the Trip Service sends "propose" to the Airline, Hotel, and Car services]
Slide 79
Slide 80
Slide 81
Add more phases!
Slide 82
Three-Phase Commit
Slide 83
Slide 84
atomic clocks
NTP
GPS
TrueTime
Slide 85
Good news:
we solved physics.
Slide 86
Bad news:
it costs all the money.
Slide 87
Not exactly…
Slide 88
Spanner: Google’s Globally-Distributed Database
Corbett et al.
Slide 89
TrueTime forces clock uncertainty to the
surface, and Spanner provides a
transparency over it.
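The Spanner paper describes TT.now() as returning an interval [earliest, latest], and describes a "commit wait": a transaction's timestamp s is chosen no earlier than TT.now().latest, and its writes are not made visible until TT.after(s) holds, i.e. until TT.now().earliest > s. The sketch below simulates that rule in Python with a hypothetical clock whose uncertainty is a fixed plus or minus epsilon; real TrueTime derives its bound from GPS and atomic-clock time masters.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical TrueTime-like clock: the true time lies somewhere in now()."""

    def __init__(self, epsilon: float, start: float = 0.0) -> None:
        self.epsilon = epsilon  # assumed fixed uncertainty bound, in seconds
        self._local = start     # this node's local clock reading

    def now(self) -> TTInterval:
        return TTInterval(self._local - self.epsilon, self._local + self.epsilon)

    def after(self, t: float) -> bool:
        # True only once t has definitely passed, whatever the true time is.
        return self.now().earliest > t

    def advance(self, dt: float) -> None:
        # Simulated passage of time; a real server would sleep instead.
        self._local += dt


def commit_with_wait(tt: SimulatedTrueTime, tick: float = 0.001) -> Tuple[float, float]:
    """Choose a commit timestamp, then wait out the clock uncertainty."""
    s = tt.now().latest  # s is no earlier than the true time at commit
    waited = 0.0
    while not tt.after(s):  # commit wait: roughly 2 * epsilon
        tt.advance(tick)
        waited += tick
    return s, waited


tt = SimulatedTrueTime(epsilon=0.007)
s, waited = commit_with_wait(tt)
print(f"timestamp={s:.3f}s, commit wait={waited:.3f}s")
```

The uncertainty does not disappear: Spanner pays for it in latency, roughly 2 × epsilon per commit, which is why keeping epsilon small matters.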
Slide 90
Spanner doesn’t avoid trade-offs,
it just minimizes their probability.
Slide 91
Spanner is expensive and
proprietary.
Slide 92
But it’s not the end of the story…
Slide 93
Unless every service is backed by the
same database, you probably still have
to deal with consistency problems.
Slide 94
Challenges to Adopting Stronger Consistency at Scale
Ajoux et al., 2015
“The biggest barrier to providing stronger consistency guarantees…is
that the consistency mechanism must integrate consistency across
many stateful services.”
Slide 95
Coordination is expensive because
processes can’t make progress
independently.
Slide 96
Slide 97
Slide 98
Peter Bailis, 2015 https://speakerdeck.com/pbailis/silence-is-golden-coordination-avoiding-systems-design
Slide 99
And what about partial failure?
Slide 100
Slide 101
Slide 102
Slide 103
Slide 104
Slide 105
Memories, Guesses, and Apologies
Dealing with Partial Knowledge
Slide 106
The cost of knowing the “truth”
can be prohibitively expensive.
Slide 107
And partial failure means the
“truth” is also fragile.
Slide 108
Where does this leave us?
Slide 109
We could go
back to the
monolith.
Slide 110
We could build
expensive data centers
with fancy hardware…
Slide 111
…or we could
rethink our
transparencies.
Slide 112
Slide 113
Slide 114
Gregor Hohpe, 2005 https://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf
Slide 115
Slide 116
Slide 117
Slide 118
Exception Handling in
Asynchronous Systems
Slide 119
Slide 120
Exception Handling in Asynchronous Systems
• Write-off
Slide 121
Slide 122
Exception Handling in Asynchronous Systems
• Write-off
• Retry
Slide 123
Slide 124
Exception Handling in Asynchronous Systems
• Write-off
• Retry
• Compensating action
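A minimal Python sketch of how the three strategies can combine around one unreliable step. The helper run_with_strategy and its parameters are hypothetical: it retries a bounded number of times with exponential backoff, then runs a compensating action if one is supplied, and otherwise writes the failure off.

```python
import time
from typing import Callable, Optional


def run_with_strategy(
    step: Callable[[], object],
    *,
    retries: int = 3,
    backoff_s: float = 0.0,
    compensate: Optional[Callable[[], None]] = None,
) -> str:
    """Hypothetical helper: retry, then compensate, else write off."""
    for attempt in range(retries):
        try:
            step()
            return "ok"
        except Exception:
            if attempt < retries - 1:
                # Retry: wait a little longer each time (exponential backoff).
                time.sleep(backoff_s * (2 ** attempt))
    if compensate is not None:
        # Compensating action: undo whatever earlier steps already did.
        compensate()
        return "compensated"
    # Write-off: accept the loss (or handle it by hand) and move on.
    return "written-off"


attempts = []


def flaky_payment() -> None:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")  # simulated


print(run_with_strategy(flaky_payment), len(attempts))
# ok 3
```

Retrying is only safe when the step is idempotent; otherwise a retry after a lost reply can apply the same effect twice.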
Slide 125
Revisiting Two-Phase Commit
Slide 126
Sagas
Slide 127
Sagas
Garcia-Molina & Salem, 1987
“A long-lived transaction is a saga if it can be written as a sequence of
transactions that can be interleaved with other transactions…Either all
the transactions in a saga are successfully completed or
compensating transactions are run to amend a partial execution.”
Slide 128
Slide 129
Sagas split long-lived transactions into
individual, interleaved sub-transactions:
$T = T_1, T_2, \ldots, T_n$
Slide 130
And each sub-transaction has a
compensating transaction:
$C_1, C_2, \ldots, C_n$
Slide 131
Sagas guarantee one of two
execution sequences:
$T_1, T_2, \ldots, T_n$
$T_1, T_2, \ldots, T_j, C_j, \ldots, C_2, C_1$
Slide 132
[Diagram: "book trip" goes to the Trip Service, which runs separate transactions against the Airline, Hotel, and Car services]
Slide 133
• Book flight
• Book hotel
• Book car
• Charge money
$T = T_1, T_2, \ldots, T_n$
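A minimal orchestration sketch of the saga guarantee in Python, using the trip steps above. Each step is a hypothetical (name, Ti, Ci) tuple of plain functions; run_saga executes T1, T2, ... in order and, if some step fails, runs the compensations of the already-completed steps in reverse. A production saga would also persist its progress so it can resume after a crash; this sketch does not.

```python
from typing import Callable, List, Tuple

# (name, sub-transaction T_i, compensating transaction C_i)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run T1..Tn; if T(j+1) fails, run Cj..C1. Returns an execution log."""
    log: List[str] = []
    completed: List[Step] = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            log.append(f"fail {name}")
            # Undo completed steps newest-first. This sketch calls each
            # compensation once; a real saga would retry failed ones,
            # which is why compensations must be idempotent.
            for done_name, _, done_comp in reversed(completed):
                done_comp()
                log.append(f"undo {done_name}")
            return log
        log.append(f"do {name}")
        completed.append((name, action, compensation))
    return log


def noop() -> None:
    pass


def car_unavailable() -> None:
    raise RuntimeError("no cars left")  # hypothetical failure


trip = [
    ("book flight", noop, noop),
    ("book hotel", noop, noop),
    ("book car", car_unavailable, noop),
    ("charge money", noop, noop),
]
print(run_saga(trip))
# ['do book flight', 'do book hotel', 'fail book car', 'undo book hotel', 'undo book flight']
```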
Compensating transactions
must be idempotent.
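Why idempotent? A compensation may be delivered or retried more than once, for example when its acknowledgement is lost. A minimal Python sketch with a hypothetical HotelService: cancellation is keyed by booking ID, so repeating it is a no-op, and a cancel that arrives before its booking is remembered so the late booking is refused.

```python
from typing import Dict


class HotelService:
    """Hypothetical service whose compensation (cancel) is idempotent."""

    def __init__(self) -> None:
        self.bookings: Dict[str, str] = {}  # booking_id -> "booked" | "cancelled"
        self.refunds_issued = 0

    def book(self, booking_id: str) -> bool:
        if booking_id in self.bookings:
            # Duplicate request, or already cancelled: don't book again.
            return self.bookings[booking_id] == "booked"
        self.bookings[booking_id] = "booked"
        return True

    def cancel(self, booking_id: str) -> None:
        status = self.bookings.get(booking_id)
        if status == "cancelled":
            return  # already compensated: repeating it changes nothing
        # Recording "cancelled" even for an unknown ID means a booking
        # that arrives after its own cancellation is refused.
        self.bookings[booking_id] = "cancelled"
        if status == "booked":
            self.refunds_issued += 1


hotel = HotelService()
hotel.book("trip-42")
hotel.cancel("trip-42")
hotel.cancel("trip-42")  # duplicate delivery of the compensation
print(hotel.refunds_issued)  # 1
```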
Slide 136
Sagas trade off isolation for
availability.
Slide 137
Event-Driven
Slide 138
[Diagram: "book trip" goes to the Trip Service, which runs separate transactions against the Airline, Hotel, and Car services]
Slide 139
[Diagram: an "event" replaces "book trip", and the Trip Service exchanges events, rather than transactions, with the Airline, Hotel, and Car services]
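One way to sketch the event-driven version in Python, assuming a hypothetical in-process EventBus (a real deployment would use a durable log or message broker): the Trip Service publishes a request event, each service reacts on its own and publishes a result, and the Trip Service confirms the trip once all three results have arrived. No step holds a transaction open across services.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-process pub/sub bus that records every published topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.history.append(topic)
        for handler in self._subscribers[topic]:
            handler(event)  # delivered synchronously here; a broker would be async


bus = EventBus()
booked: Set[str] = set()  # Trip Service's memory of which bookings succeeded


def trip_service_on_booked(event: dict) -> None:
    booked.add(event["kind"])
    if booked == {"flight", "hotel", "car"}:
        bus.publish("TripConfirmed", {"trip_id": event["trip_id"]})


# Airline, Hotel, and Car services each react to the request independently.
for kind in ("flight", "hotel", "car"):
    bus.subscribe("TripRequested", lambda e, k=kind: bus.publish("Booked", {**e, "kind": k}))
bus.subscribe("Booked", trip_service_on_booked)

bus.publish("TripRequested", {"trip_id": "trip-42"})
print(bus.history)
# ['TripRequested', 'Booked', 'Booked', 'Booked', 'TripConfirmed']
```

Because nothing coordinates the services up front, a failed booking would need its own event and compensation, as with the sagas above.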
Slide 140
Slide 141
System Properties
Business Rules
Slide 142
Sean T. Allen
“People don’t want distributed transactions,
they just want the guarantees that distributed
transactions give them.”
Slide 143
CAP theorem
Slide 144
CAP Theorem
• Consistency, Availability, Partition Tolerance
• When a partition occurs, do we:
• Choose availability and give up consistency?
- or -
• Choose consistency and give up availability?
Slide 145
(or YOLO it)
Slide 146
The CAP theorem is a UX
question…
Slide 147
When a partial failure occurs, how do
you want the application to behave?
Slide 148
Slide 149
Slide 150
We can choose consistency and
sacrifice availability…
Slide 151
…or we can choose availability by making
local decisions with the knowledge at
hand and designing the UX accordingly.
Slide 152
Managing partial failure is a matter
of dealing with partial knowledge…
Slide 153
…and managing risk.
Slide 154
Our risk appetite can
drive business rules.
[Flowchart: Check value < $10,000? Yes: clear locally. No: double check with all replicas before clearing.]
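The flowchart's rule, sketched in Python. The $10,000 threshold is the slide's example; clear_check and ask_replicas are hypothetical names, and ask_replicas is assumed to return True or False, or None when the replicas cannot be reached.

```python
from typing import Callable, Optional

LOCAL_LIMIT = 10_000  # dollars; the slide's example threshold


def clear_check(amount: float, ask_replicas: Callable[[float], Optional[bool]]) -> str:
    """Risk-based clearing. ask_replicas is hypothetical: True/False from the
    replicas, or None if they cannot be reached (e.g. during a partition)."""
    if amount < LOCAL_LIMIT:
        # Low risk: decide locally with the knowledge at hand (availability).
        return "cleared locally"
    verdict = ask_replicas(amount)
    if verdict is None:
        # High risk and no answer: refuse to guess (consistency).
        return "pending"
    return "cleared" if verdict else "declined"


def partitioned(amount: float) -> Optional[bool]:
    return None  # simulate unreachable replicas


print(clear_check(250, partitioned))     # cleared locally
print(clear_check(50_000, partitioned))  # pending
```

Small checks favor availability by guessing locally; large checks favor consistency and wait rather than guess.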
Slide 155
Memories, guesses, and
apologies
Slide 156
Computers operate with partial
knowledge.
Slide 157
Either there’s a
disconnect with
the “real world”…
Slide 158
…or there’s a
disconnect
between systems.
Slide 159
Systems don’t make decisions,
they make guesses.
Slide 160
Systems have memory.
Slide 161
Memories help systems make
better guesses in the future.
Slide 162
Forgetfulness is a business
decision.
Slide 163
Sometimes the system guesses
wrong.
Slide 164
Systems need the capacity to
apologize.
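A toy Python sketch tying memories, guesses, and apologies together, assuming a hypothetical Store that sells from its own (possibly stale) memory of stock instead of asking the warehouse on every order. Later it reconciles with the real count and returns an apology for each order it cannot fill.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Store:
    """Hypothetical store that guesses from memory and apologizes when wrong."""
    believed_stock: int                              # memory: local, possibly stale
    orders: List[str] = field(default_factory=list)  # accepted orders, oldest first

    def place_order(self, customer: str) -> bool:
        # Guess: accept based on memory alone, without coordinating.
        if self.believed_stock <= 0:
            return False
        self.believed_stock -= 1
        self.orders.append(customer)
        return True

    def reconcile(self, actual_stock: int) -> List[str]:
        """Learn how many units really exist for the accepted orders and
        apologize for the newest orders that cannot be filled."""
        keep = min(len(self.orders), actual_stock)
        apologies = [f"Sorry {c}, your order was cancelled and refunded."
                     for c in self.orders[keep:]]
        self.orders = self.orders[:keep]
        self.believed_stock = actual_stock - keep  # update memory with the truth
        return apologies


store = Store(believed_stock=3)
for customer in ("ann", "bob", "cal"):
    store.place_order(customer)
print(store.reconcile(actual_stock=2))
# ['Sorry cal, your order was cancelled and refunded.']
```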
Slide 165
Customers judge you not by your
failures, but by how you handle your
failures.
Slide 166
Are you building systems that never
fail or systems that fail gracefully?
Slide 167
Slide 168
Businesses need both code and
people to manage apologies.
Slide 169
It becomes less about trying to build the
perfect system and more about how we
cope with an imperfect one.