Slide 1

Slide 1 text

@markhibberd programming in the large Architecture and Experimentation

Slide 2

Slide 2 text

“Simplicity is prerequisite for reliability.” Edsger W. Dijkstra, How do we tell truths that might hurt? (1975)

Slide 3

Slide 3 text

Legacy Systems and Organisations

Slide 4

Slide 4 text

How Did We Get Here

Slide 5

Slide 5 text

The Hand-me Down Code Last Touched

Slide 6

Slide 6 text

Code Last Touched You Started The Hand-me Down

Slide 7

Slide 7 text

Code Last Touched You Started Everyone Else Started The Hand-me Down

Slide 8

Slide 8 text

Code Last Touched You Started Everyone Else Started You’re The Expert The Hand-me Down

Slide 9

Slide 9 text

The Rush Job Start Work

Slide 10

Slide 10 text

Start Work A Working System The Rush Job

Slide 11

Slide 11 text

Start Work System Delivered A Working System The Rush Job

Slide 12

Slide 12 text

Start Work System Delivered The Rush Job

Slide 13

Slide 13 text

The Rewrite Someone Else’s Code

Slide 14

Slide 14 text

Someone Else’s Code System Delivered The Rewrite

Slide 15

Slide 15 text

Someone Else’s Code System Delivered Bob Knows Better The Rewrite

Slide 16

Slide 16 text

Someone Else’s Code System Delivered A New System Bob Knows Better The Rewrite

Slide 17

Slide 17 text

Someone Else’s Code System Delivered A New, Not Quite Working System Bob Knows Better The Rewrite

Slide 18

Slide 18 text

Someone Else’s Code System Delivered An Old, Not Quite Working System Bob Knows Better The Rewrite

Slide 19

Slide 19 text

The Greenfield Enthusiasm

Slide 20

Slide 20 text

Enthusiasm System Delivered The Greenfield

Slide 21

Slide 21 text

Enthusiasm Realisation and Despair System Delivered The Greenfield

Slide 22

Slide 22 text

An Idea Oh, Sorry, We Shipped That 30 Minutes Later The Prototype

Slide 23

Slide 23 text

The Bandwagon

Slide 24

Slide 24 text

How We Pick Our Technology The Bandwagon

Slide 25

Slide 25 text

Perhaps we need a microservice to deploy Docker The Bandwagon

Slide 26

Slide 26 text

So we can run a microservice The Bandwagon

Slide 27

Slide 27 text

To display some text The Bandwagon

Slide 28

Slide 28 text

legacy is the default

Slide 29

Slide 29 text

The Ideal New Ideas

Slide 30

Slide 30 text

The Ideal New Ideas Stable Ideas

Slide 31

Slide 31 text

The Ideal New Ideas Stable Ideas We Now Know Better

Slide 32

Slide 32 text

Taking Responsibility

Slide 33

Slide 33 text

Too Important to Ignore, Too Important to Change an anecdote

Slide 34

Slide 34 text

100 million+ active users
100 million+ transactions a day
millions of $$$
a couple of “simple” services

Slide 35

Slide 35 text

server client

Slide 36

Slide 36 text

/call server client on-demand

Slide 37

Slide 37 text

/call server client /check on-demand periodically

Slide 38

Slide 38 text

/call server client /check on-demand periodically

Slide 39

Slide 39 text

/call server client /check on-demand periodically /check2 /check2z /v3check

Slide 40

Slide 40 text

/call server client /check on-demand periodically /check2 /check2z /v3check

Slide 41

Slide 41 text

/call server /check /check2 /check2z /v3check

Slide 42

Slide 42 text

enter our protagonists…

Slide 43

Slide 43 text

/call server /check we spent a lot of time “fire fighting” /check2 /check2z /v3check

Slide 44

Slide 44 text

/call server /check we spent a lot of time “fire fighting” /check2 /check2z /v3check

Slide 45

Slide 45 text

/call server /check /check2 /check2z /v3check we spent a lot of time improving “quality”

Slide 46

Slide 46 text

/call server /check we spent a lot of time improving “quality”

Slide 47

Slide 47 text

/call server /check we spent a lot of time improving “quality”

Slide 48

Slide 48 text

Programmer Myth #1 It Is Someone Else’s Fault

Slide 49

Slide 49 text

we completely failed to adapt the system for change

Slide 50

Slide 50 text

we remained hostage to a fear of change

Slide 51

Slide 51 text

Autonomous Systems and Rates of Change

Slide 52

Slide 52 text

Systems

Slide 53

Slide 53 text

Code Search an example

Slide 54

Slide 54 text

code search

Slide 55

Slide 55 text

web du jour db ui

Slide 56

Slide 56 text

web du jour db ui indexer api

Slide 57

Slide 57 text

web du jour db ui indexer api

Slide 58

Slide 58 text

web du jour db ui indexer api

Slide 59

Slide 59 text

db ui indexer api

Slide 60

Slide 60 text

the thing about real systems is their autonomy

Slide 61

Slide 61 text

?

Slide 62

Slide 62 text

rules not boxes

Slide 63

Slide 63 text

architecture is the concepts on which we formulate our systems

Slide 64

Slide 64 text

architecture is the rules for how these systems interact

Slide 65

Slide 65 text

architecture is the rules for how these systems are implemented

Slide 66

Slide 66 text

indexer search independent problem domains

Slide 67

Slide 67 text

indexer search code ctags ctags application/html application/search.v1+json well defined interfaces

Slide 68

Slide 68 text

indexer search code ctags ctags application/html application/search.v1+json well defined interfaces
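A well-defined interface can carry its version in the media type itself, as `application/search.v1+json` does. The following is a minimal sketch of how a consumer might parse such a type; the regular expression, the `parse_media_type` and `can_consume` helpers, and the `SUPPORTED` set are illustrative assumptions, not part of the talk.

```python
import re

# Hypothetical pattern for vendor-style versioned media types such as
# "application/search.v1+json"; the naming scheme is inferred from the slide.
MEDIA_TYPE = re.compile(
    r"^application/(?P<name>[a-z]+)\.v(?P<version>\d+)\+(?P<format>[a-z]+)$"
)

# Assumption for illustration: this consumer understands search v1 only.
SUPPORTED = {("search", 1)}


def parse_media_type(value):
    """Return (name, version, format), or raise ValueError if unversioned."""
    match = MEDIA_TYPE.match(value)
    if match is None:
        raise ValueError(f"not a versioned media type: {value!r}")
    return match["name"], int(match["version"]), match["format"]


def can_consume(value):
    """True only when the interface name and version are explicitly supported."""
    try:
        name, version, _ = parse_media_type(value)
    except ValueError:
        return False
    return (name, version) in SUPPORTED
```

Because the version travels with every payload, the indexer and the search service can each decide independently which versions they accept.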

Slide 69

Slide 69 text

indexer independent technical decisions search shell scala

Slide 70

Slide 70 text

indexer independent technical decisions search shell scala git hook embedded

Slide 71

Slide 71 text

indexer independent technical decisions search shell scala git hook embedded os logging os logging

Slide 72

Slide 72 text

indexer consistency helps avoid chaos search shell scala git hook embedded os logging os logging

Slide 73

Slide 73 text

Autonomy

Slide 74

Slide 74 text

#1 individually deployable

Slide 75

Slide 75 text

indexer search

Slide 76

Slide 76 text

indexer search v1 v1

Slide 77

Slide 77 text

indexer search v2 v1

Slide 78

Slide 78 text

indexer search v3 v1

Slide 79

Slide 79 text

indexer search v3 v2

Slide 80

Slide 80 text

#2 independent domain models

Slide 81

Slide 81 text

indexer search

Slide 82

Slide 82 text

different notions of “index”

Slide 83

Slide 83 text

really don’t do this

Slide 84

Slide 84 text

really don’t do this

Slide 85

Slide 85 text

#3 standards for interchange formats

Slide 86

Slide 86 text

indexer search

Slide 87

Slide 87 text

indexer search

Slide 88

Slide 88 text

indexer search standard rules for these help avoid chaos

Slide 89

Slide 89 text

#4 no shared state

Slide 90

Slide 90 text

No content

Slide 91

Slide 91 text

really don’t do this

Slide 92

Slide 92 text

really don’t do this

Slide 93

Slide 93 text

autonomy builds in reliability

Slide 94

Slide 94 text

indexer search

Slide 95

Slide 95 text

x search x /\/\/\/\/\

Slide 96

Slide 96 text

x search x /\/\/\/\/\

Slide 97

Slide 97 text

autonomy builds in the ability to change

Slide 98

Slide 98 text

indexer search shell scala git hook embedded os logging os logging

Slide 99

Slide 99 text

indexer search haskell scala git hook embedded os logging os logging

Slide 100

Slide 100 text

How long does it take to get a 1 line change to production?

Slide 101

Slide 101 text

No content

Slide 102

Slide 102 text

No content

Slide 103

Slide 103 text

No content

Slide 104

Slide 104 text

No content

Slide 105

Slide 105 text

No content

Slide 106

Slide 106 text

warning signs an anecdote

Slide 107

Slide 107 text

multi database - multi data center replication
100 million+ transactions a day

Slide 108

Slide 108 text

No content

Slide 109

Slide 109 text

No content

Slide 110

Slide 110 text

No content

Slide 111

Slide 111 text

No content

Slide 112

Slide 112 text

No content

Slide 113

Slide 113 text

No content

Slide 114

Slide 114 text

No content

Slide 115

Slide 115 text

No content

Slide 116

Slide 116 text

x x

Slide 117

Slide 117 text

x x /\/\/\/\/\/\/\

Slide 118

Slide 118 text

the data model was entirely shared between the replication system and the otp system

Slide 119

Slide 119 text

it was ALL shared state

Slide 120

Slide 120 text

it was really only feasible to change if one team was working on both “systems”

Slide 121

Slide 121 text

if one system failed, they often both failed

Slide 122

Slide 122 text

as we patched failure modes, reliability never improved

Slide 123

Slide 123 text

x x /\/\/\/\/\/\/\

Slide 124

Slide 124 text

x /\/\/\/\/\/\/\

Slide 125

Slide 125 text

autonomy is far more important for reliability than code improvements

Slide 126

Slide 126 text

Programmer Myth #2 The Bad Code is to Blame

Slide 127

Slide 127 text

System Evolution

Slide 128

Slide 128 text

“... with proper design, the features come cheaply. This approach is arduous, but continues to succeed.” Dennis Ritchie

Slide 129

Slide 129 text

thinking ahead is not about avoiding change

Slide 130

Slide 130 text

indexer search shell scala git hook embedded os logging os logging

Slide 131

Slide 131 text

indexer search haskell scala git hook embedded os logging os logging

Slide 132

Slide 132 text

thinking ahead is about letting us change at different rates for different problems

Slide 133

Slide 133 text

thinking ahead is about letting us make short term decisions that don’t have long term effects

Slide 134

Slide 134 text

attempting change an anecdote

Slide 135

Slide 135 text

small company
analytics product
very quality focused team
inherited a small piece of code
very bad code

Slide 136

Slide 136 text

the product

Slide 137

Slide 137 text

the jsp

Slide 138

Slide 138 text

the rewrite heavy focus on quality

Slide 139

Slide 139 text

the rewrite but… rebuilt same structure

Slide 140

Slide 140 text

No content

Slide 141

Slide 141 text

the indivisible blob

Slide 142

Slide 142 text

websphere the indivisible blob

Slide 143

Slide 143 text

websphere the indivisible blob The Plan ui core split

Slide 144

Slide 144 text

websphere the indivisible blob The Plan ui core tech upgrade

Slide 145

Slide 145 text

websphere the indivisible blob The Plan ui core indexer websphere isolate

Slide 146

Slide 146 text

The Reality ui core indexer websphere

Slide 147

Slide 147 text

The Reality ui core indexer websphere data model + state

Slide 148

Slide 148 text

The Reality ui core indexer websphere data model + state WEBSPHERE

Slide 149

Slide 149 text

Programmer Myth #3 We Must Do Something Now

Slide 150

Slide 150 text

Rewrites

Slide 151

Slide 151 text

Programmer Myth #4 We Should Rewrite

Slide 152

Slide 152 text

(not) Rewrites

Slide 153

Slide 153 text

architecture is controlled by developers not architects

Slide 154

Slide 154 text

#1 version everything

Slide 155

Slide 155 text

indexer search

Slide 156

Slide 156 text

indexer v1 search v1

Slide 157

Slide 157 text

indexer v1 search v1 v1 v1 v1

Slide 158

Slide 158 text

the internet is broken an aside

Slide 159

Slide 159 text

MIME-Version: 1.0

Slide 160

Slide 160 text

what should a client do if it sees something that isn’t version 1.0?
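One possible answer is to decide the policy up front: a reader lists the versions it understands and refuses everything else loudly. This is a sketch only; the `version` field, the two payload layouts, and the `UnsupportedVersion` exception are hypothetical names introduced for illustration.

```python
class UnsupportedVersion(Exception):
    """Raised when a payload declares a version this reader does not know."""


def read_v1(payload):
    # Hypothetical v1 layout: results live under "hits".
    return payload["hits"]


def read_v2(payload):
    # Hypothetical v2 layout: results moved under "results".
    return payload["results"]


# The full set of versions this reader is willing to interpret.
READERS = {1: read_v1, 2: read_v2}


def read(payload):
    """Dispatch on the declared version; refuse anything missing or unknown."""
    version = payload.get("version")
    reader = READERS.get(version)
    if reader is None:
        raise UnsupportedVersion(f"cannot read version {version!r}")
    return reader(payload)
```

Failing explicitly is a design choice: guessing at an unknown version tends to hide the incompatibility until much later.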

Slide 161

Slide 161 text

#2 the wedge

Slide 162

Slide 162 text

the status quo

Slide 163

Slide 163 text

a wedge the status quo

Slide 164

Slide 164 text

a wedge the status quo

Slide 165

Slide 165 text

a wedge the status quo

Slide 166

Slide 166 text

a wedge

Slide 167

Slide 167 text

a wedge

Slide 168

Slide 168 text

mega-code-search-tool

Slide 169

Slide 169 text

mega-code-search-tool R

Slide 170

Slide 170 text

mega-code-search-tool external indexer support R

Slide 171

Slide 171 text

mega-code-search-tool R external indexer support

Slide 172

Slide 172 text

mega-code-search-tool R external indexer support scala

Slide 173

Slide 173 text

R scala haskell javascript search
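The wedge can be pictured as a thin router placed in front of the existing tool: everything still goes to the status quo by default, and individual languages are peeled off to external indexers one at a time. The sketch below assumes that reading of the diagrams; `Router`, `legacy_index`, and `scala_index` are hypothetical names.

```python
def legacy_index(language, source):
    # Stand-in for the indexing built into the existing tool (hypothetical).
    return {"indexer": "legacy", "language": language, "symbols": source.split()}


def scala_index(language, source):
    # Stand-in for a new external indexer (hypothetical).
    return {"indexer": "scala-external", "language": language, "symbols": source.split()}


class Router:
    """The wedge: every indexing request passes through this one point."""

    def __init__(self, fallback):
        self.fallback = fallback
        self.external = {}  # language -> external indexer callable

    def register(self, language, indexer):
        self.external[language] = indexer

    def unregister(self, language):
        # Rolling a move back is just removing the route.
        self.external.pop(language, None)

    def index(self, language, source):
        # Languages without an external indexer stay on the status quo.
        indexer = self.external.get(language, self.fallback)
        return indexer(language, source)
```

A short usage example: `router = Router(legacy_index)` followed by `router.register("scala", scala_index)` moves only Scala, while Haskell and JavaScript requests keep reaching the old code until their own indexers exist.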

Slide 174

Slide 174 text

#3 embrace partial moves

Slide 175

Slide 175 text

mega-code-search-tool

Slide 176

Slide 176 text

mega-code-search-tool {incomplete}

Slide 177

Slide 177 text

control in-progress moves at a single point

Slide 178

Slide 178 text

track and cap the number of moves in progress

Slide 179

Slide 179 text

plan for rollback as much as rollforward
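The three rules for partial moves (a single control point, a cap on moves in flight, and a rollback plan for each) can be sketched as a small registry. This is an illustration under assumptions: `MoveRegistry` and its methods are hypothetical, and a real system would persist this state rather than keep it in memory.

```python
class MoveRegistry:
    """Single place that records which partial moves are in flight."""

    def __init__(self, cap):
        self.cap = cap
        self.in_progress = {}  # move name -> rollback callable

    def start(self, name, rollback):
        # A move cannot start without a rollback plan or above the cap.
        if len(self.in_progress) >= self.cap:
            raise RuntimeError(
                f"cap of {self.cap} moves reached; finish or roll back one first"
            )
        self.in_progress[name] = rollback

    def finish(self, name):
        # Roll forward: the move is complete, so drop its rollback plan.
        del self.in_progress[name]

    def rollback(self, name):
        # Run the recorded rollback and stop tracking the move.
        self.in_progress.pop(name)()
```

Requiring the rollback callable at `start` time is the point of the sketch: the plan for going back exists before the move begins.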

Slide 180

Slide 180 text

#4 validate as you go

Slide 181

Slide 181 text

mega-code-search-tool

Slide 182

Slide 182 text

mega-code-search-tool external indexer support R

Slide 183

Slide 183 text

R make sure you can run this straight away external indexer support mega-code-search-tool

Slide 184

Slide 184 text

R make sure you can run this straight away external indexer support mega-code-search-tool

Slide 185

Slide 185 text

mega-code-search-tool R external indexer support scala

Slide 186

Slide 186 text

mega-code-search-tool R external indexer support scala

Slide 187

Slide 187 text

mega-code-search-tool R external indexer support scala

Slide 188

Slide 188 text

R scala haskell javascript search

Slide 189

Slide 189 text

Experimentation and Measurement

Slide 190

Slide 190 text

Change Without Fear

Slide 191

Slide 191 text

we need confidence that things don’t break when we ship code

Slide 192

Slide 192 text

confidence stems from knowing code works in production before it affects a customer

Slide 193

Slide 193 text

#1 move production to development

Slide 194

Slide 194 text

production quality data automation of environments lots of testing

Slide 195

Slide 195 text

production quality data automation of environments lots of testing Rather Old Hat

Slide 196

Slide 196 text

#2 move development to production

Slide 197

Slide 197 text

yes, really. i want to ship your worst, untried, experimental code to production

Slide 198

Slide 198 text

Programmer Myth #5 We Can’t Ship That

Slide 199

Slide 199 text

Safety First

Slide 200

Slide 200 text

@ambiata
we deal with ingesting and processing lots of data
100s TB / per day / per customer
scientific experiment and measurement is key
experiments affect users directly
researchers / non-specialist engineers produce code

Slide 201

Slide 201 text

ingest store the machine package publish

Slide 202

Slide 202 text

ingest store the machine package publish

Slide 203

Slide 203 text

ingest store package publish the machine

Slide 204

Slide 204 text

#1 split environments

Slide 205

Slide 205 text

ingest store package publish the machine

Slide 206

Slide 206 text

ingest store package publish the machine production:live

Slide 207

Slide 207 text

ingest store package publish the machine production:exp

Slide 208

Slide 208 text

ingest store package publish the machine production:* package publish

Slide 209

Slide 209 text

implemented through machine level acls experiment live control

Slide 210

Slide 210 text

implemented through machine level acls experiment live control write read

Slide 211

Slide 211 text

implemented through machine level acls experiment live control

Slide 212

Slide 212 text

implemented through machine level acls experiment live control write read

Slide 213

Slide 213 text

implemented through machine level acls experiment live control write read
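The slides do not spell out the exact ACL rules, so the table below is an assumed policy for illustration only: experiment machines may read live data but write only to their own area, and anything not listed is denied. The `control` environment is left out because its permissions are not stated. `ALLOWED` and `permitted` are hypothetical names.

```python
# Assumed policy (not from the talk): allowed
# (machine environment, action, data environment) triples.
ALLOWED = {
    ("live", "read", "live"),
    ("live", "write", "live"),
    ("experiment", "read", "live"),         # experiments see real data
    ("experiment", "read", "experiment"),
    ("experiment", "write", "experiment"),  # but write only to their own area
}


def permitted(machine_env, action, data_env):
    """Deny by default: only explicitly listed combinations are allowed."""
    return (machine_env, action, data_env) in ALLOWED
```

Under this assumed policy, untried code running on an experiment machine cannot overwrite anything a customer will see, which is what makes shipping it to production tolerable.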

Slide 214

Slide 214 text

#2 checkpoints

Slide 215

Slide 215 text

ingest store package publish the machine

Slide 216

Slide 216 text

ingest store package publish the machine x x

Slide 217

Slide 217 text

ingest store package publish the machine x x

Slide 218

Slide 218 text

ingest store package publish the machine x x

Slide 219

Slide 219 text

ingest store package publish the machine x x

Slide 220

Slide 220 text

deep implementation, intra- and inter-process crosschecks
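A checkpoint between two stages can be as simple as comparing what the upstream stage says it produced with what the downstream stage actually received. The sketch below assumes records are byte strings and uses a count plus a SHA-256 digest; `summarise`, `checkpoint`, and `CheckpointFailed` are hypothetical names.

```python
import hashlib


class CheckpointFailed(Exception):
    """Raised when a stage receives something other than what was sent."""


def summarise(records):
    """Count and order-sensitive checksum of a batch of byte-string records."""
    digest = hashlib.sha256()
    for record in records:
        # Length-prefix each record so b"ab", b"c" differs from b"a", b"bc".
        digest.update(len(record).to_bytes(8, "big"))
        digest.update(record)
    return {"count": len(records), "checksum": digest.hexdigest()}


def checkpoint(stage, expected, records):
    """Pass records through unchanged, or stop the pipeline at this stage."""
    actual = summarise(records)
    if actual != expected:
        raise CheckpointFailed(f"{stage}: expected {expected}, got {actual}")
    return records
```

The upstream stage would publish `summarise(records)` alongside its output, and the downstream stage would call `checkpoint` before doing any work, so a bad batch is stopped before it is published.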

Slide 221

Slide 221 text

#3 tandem deployments

Slide 222

Slide 222 text

ingest store package publish the machine

Slide 223

Slide 223 text

ingest store package publish the machine

Slide 224

Slide 224 text

ingest store package publish the machine x x x x

Slide 225

Slide 225 text

ingest store package publish the machine x x x x

Slide 226

Slide 226 text

ingest store package publish the machine x x x x

Slide 227

Slide 227 text

ingest store package publish the machine x x x x
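One reading of a tandem deployment is that the experimental version runs next to the live one on the same input, while only the live result is ever published. The sketch below assumes that reading; `run_tandem` and the report fields are hypothetical.

```python
def run_tandem(live, experiment, batch):
    """Return the live result plus a report; the experiment cannot alter the result."""
    live_result = live(batch)
    report = {"agrees": None, "error": None}
    try:
        report["agrees"] = experiment(batch) == live_result
    except Exception as exc:
        # An experiment failure is recorded, never propagated to customers.
        report["error"] = repr(exc)
    return live_result, report
```

The report is what gets measured over time: once the experimental version has agreed with live for long enough, swapping the two becomes a low-risk change.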

Slide 228

Slide 228 text

#4 measure everything

Slide 229

Slide 229 text

every result computed should have traceability back to the code & data

Slide 230

Slide 230 text

package publish the machine

Slide 231

Slide 231 text

package publish the machine publish-ab12f2e

Slide 232

Slide 232 text

package publish the machine publish-ab12f2e

Slide 233

Slide 233 text

package publish the machine publish-ab12f2e

Slide 234

Slide 234 text

package publish the machine package-ab12f2e

Slide 235

Slide 235 text

package publish the machine score-ab12f2e

Slide 236

Slide 236 text

package publish the machine

Slide 237

Slide 237 text

package publish the machine
size: 192GB
checksum: d32fe1a
created: 2014-03-02T10:01
loaded: store-a122fe3
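Traceability of this kind can be sketched as a manifest stored next to every artefact, with a link to whatever it was loaded from. The field names mirror the slide; treating `loaded` as a pointer to the upstream artefact is an assumption, and `Manifest` and `lineage` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    size: str         # e.g. "192GB"
    checksum: str     # e.g. "d32fe1a"
    created: str      # ISO 8601 timestamp
    loaded_from: str  # assumed to name the upstream artefact


def lineage(name, manifests):
    """Follow loaded_from links from an artefact back to its earliest known source."""
    chain = [name]
    while name in manifests:
        name = manifests[name].loaded_from
        if name in chain:  # guard against accidental cycles
            break
        chain.append(name)
    return chain
```

Given manifests for `publish-ab12f2e` and `package-ab12f2e`, `lineage("publish-ab12f2e", manifests)` walks back through each step, so any published result can be tied to the code and data that produced it.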

Slide 238

Slide 238 text

statistics work: measurements over time will find errors

Slide 239

Slide 239 text

package publish the machine

Slide 240

Slide 240 text

package publish the machine
wall-time: 13411s
cpu-time: 429130s
records: 19 million
histogram: a: 13 million, b: 2 million, c: 4 million

Slide 241

Slide 241 text

package publish the machine
wall-time: 13411s
cpu-time: 429130s
records: 19 million
histogram: a: 13 million, b: 2 million, c: 4 million
aggregate over time

Slide 242

Slide 242 text

package publish the machine
median: …
averages: cpu-time: 420030s
quantiles: …
aggregate over time

Slide 243

Slide 243 text

package publish the machine
cross check everything
wall-time: 13411s
cpu-time: 429130s
records: 19 million
histogram: a: 13 million, b: 2 million, c: 4 million

Slide 244

Slide 244 text

package publish the machine
cross check everything
wall-time: 13411s
cpu-time: 429130s
records: 19 million
histogram: a: 13 million, b: 2 million, c: 4 million
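Two kinds of check follow from these measurements: an internal cross check within one run (the histogram buckets should add up to the record count) and a comparison against the aggregate of earlier runs. The sketch below uses the figures from the slides; the function names, the history values other than 420030, and the 25% tolerance are assumptions for illustration.

```python
from statistics import median


def cross_check(run):
    """Internal consistency: histogram buckets must add up to the record count."""
    return sum(run["histogram"].values()) == run["records"]


def drifted(history, value, tolerance=0.25):
    """True when value differs from the historical median by more than the tolerance."""
    baseline = median(history)
    return abs(value - baseline) > tolerance * baseline


run = {
    "wall_time_s": 13411,
    "cpu_time_s": 429130,
    "records": 19_000_000,
    "histogram": {"a": 13_000_000, "b": 2_000_000, "c": 4_000_000},
}
```

With these numbers `cross_check(run)` holds, since 13 + 2 + 4 million equals 19 million. A run whose cpu-time lands far from the aggregated history would be flagged by `drifted` even if its output looked plausible.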

Slide 245

Slide 245 text

Programmer Myth #6 But We Can’t Do That In Our Situation

Slide 246

Slide 246 text

these techniques adapt

Slide 247

Slide 247 text

WebCloud (tm) live live

Slide 248

Slide 248 text

WebCloud (tm) live proxy live

Slide 249

Slide 249 text

WebCloud (tm) experiment live proxy experiment live

Slide 250

Slide 250 text

WebCloud (tm) experiment live proxy experiment live

Slide 251

Slide 251 text

Packaged Products live live live

Slide 252

Slide 252 text

Packaged Products live measurement live live

Slide 253

Slide 253 text

Packaged Products live measurement live live policy

Slide 254

Slide 254 text

Packaged Products experiment live measurement live live policy

Slide 255

Slide 255 text

change is the default

Slide 256

Slide 256 text

architecture is every day

Slide 257

Slide 257 text

experiment for reliability

Slide 258

Slide 258 text

measure always

Slide 259

Slide 259 text

end