Slide 1

Slide 1 text

Getting a System to Production ... and keeping it there

Slide 2

Slide 2 text

Who Am I?
Eoin Woods
• CTO at Endava
• 2005 - 2014 in capital markets (UBS, BGI)
• 2000 - 2004 in product engineering & consultancy (Bull, Sybase, InterTrust, independent)
• Author, editor, speaker, community-guy

Slide 3

Slide 3 text

Who are Endava?
Software Engineering & IT Services Firm
• 2800+ people
• UK, US, Germany, Romania, Moldova, Serbia, Macedonia
• Agile and Digital Transformation Consulting
• Architecture, Development, Testing
• Data and Analytics
• Application Management, Infrastructure, DevOps

Slide 4

Slide 4 text

Content
• Introducing Production Systems
• What Goes Wrong in Production?
• Solutions for Production Systems
• Conclusions

Slide 5

Slide 5 text

Production Systems

Slide 6

Slide 6 text

What is a production system?
Any system being used for real work

Slide 7

Slide 7 text

Why is Productionisation Hard?
• No one teaches you about production
  • who do you talk to?
  • what do they want?
  • what is the definition of “done”?
• Production is difficult for developers
  • hard to access, interrogate, debug, change, ...

Slide 8

Slide 8 text

A new cast of characters
• Developers
• Development Users

Slide 9

Slide 9 text

A new cast of characters
• Production Users
• Developers
• Auditors
• Operations
• Acquirers
• Infrastructure
• Business Management

Slide 10

Slide 10 text

Production is constrained
• Highly controlled
• Content is all valuable
• Change can be difficult

Slide 11

Slide 11 text

Production is unpredictable

Slide 12

Slide 12 text

Production is highly visible!

Slide 13

Slide 13 text

You don’t own production

Slide 14

Slide 14 text

What goes wrong?

Slide 15

Slide 15 text

Performance surprises
• Interactive load
• Batch time surprises
• System abusers! “all transactions this year”, “average since 1967”, ...

Slide 16

Slide 16 text

Environment bombshells
• Constraints and contention
• Unexpected behaviour
• Integration points

Slide 17

Slide 17 text

Failures happen
• Software defects
• Platform failures
• Environment failures

Slide 18

Slide 18 text

Security tangles
• Security is simple in Development
• Much more complex in Production!

Slide 19

Slide 19 text

Finding Solutions

Slide 20

Slide 20 text

Key requirements for production
• Functionally correct: does what the business process requires
• Stability: behaves predictably in all situations
• Capacity: can process the workload required (at all times)
• Security: limits access to those who are authorised to have it

Slide 25

Slide 25 text

Solution Framework
Requirements: Correctness, Stability, Capacity, Security
Areas: Design Principles, Technology, Practices
Examples: Simplicity, Resource Governor, Threat Modelling
Our focus today

Slide 26

Slide 26 text

General Principles
• One Team
• Automate
• Measure and Improve (feedback loops)
• Good Enough over Perfection
Timeless principles … that led to CD and DevOps

Slide 27

Slide 27 text

DevOps Principles
• Communication
• Automation
• Lean thinking
• Measurement
• Sharing
CALMS - itrevolution.com/devops-culture-part-1

Slide 28

Slide 28 text

Solutions: Achieving Stability

Slide 29

Slide 29 text

Stability - design principles
• Fail quickly: fail fast, timeouts
• Isolate problems: flow control, circuit breakers, bulkheads, asynchronous integration
• Ensure steady-state operation: housekeeping, predictable resource allocation, governors, throttling
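
The "fail quickly" principle can be illustrated with a minimal Python sketch of a call that gives up after a timeout instead of waiting indefinitely. It assumes the downstream call is an ordinary blocking function; `call_with_timeout`, `slow_dependency` and the pool size are hypothetical, not taken from the talk.

```python
import concurrent.futures
import time

# Hypothetical shared pool for calls to a slow downstream dependency.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args), but stop waiting after timeout_s seconds.

    Raises TimeoutError so the caller fails fast instead of hanging.
    The worker thread is not killed: fn keeps running in the background
    until it returns, which is one reason to keep the pool bounded.
    """
    future = _POOL.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only succeeds if fn has not started yet
        raise TimeoutError(f"no response within {timeout_s}s") from None


def slow_dependency(delay_s):
    """Stand-in for a remote call that takes delay_s seconds."""
    time.sleep(delay_s)
    return "ok"


print(call_with_timeout(slow_dependency, 1.0, 0.01))  # prints "ok"
try:
    call_with_timeout(slow_dependency, 0.05, 0.5)
except TimeoutError as exc:
    print("failed fast:", exc)
```

The caller gets a clear error after a bounded wait, which it can report, retry later or route around, rather than tying up its own threads behind an unresponsive dependency.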

Slide 36

Slide 36 text

Stability - technology solutions
• Timeouts
• Circuit Breaker
• Fail fast
• Bulkhead
• Governor
• Housekeeping
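
A bulkhead limits how much of the system one dependency can tie up. A minimal sketch, assuming each downstream system gets its own fixed number of concurrent call slots and that a full compartment should reject immediately (failing fast) rather than queue; the `Bulkhead` class and the `pricing` / `reference_data` compartments are hypothetical illustrations.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a compartment has no free slots."""


class Bulkhead:
    """Caps the number of concurrent calls into one downstream dependency."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: when the compartment is full, reject at once
        # instead of letting callers pile up behind a struggling dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One compartment per downstream system (hypothetical names and sizes), so a
# slow pricing service cannot use up capacity reserved for reference data.
pricing = Bulkhead("pricing", max_concurrent=2)
reference_data = Bulkhead("reference-data", max_concurrent=5)
```

If the pricing service hangs, at most two callers are stuck waiting on it; further pricing calls are rejected immediately, and calls through `reference_data` are unaffected.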

Slide 37

Slide 37 text

Example - Circuit Breaker
[State diagram: states Clear, Checking and Tripped; transitions labelled err_returned, timeout, and err_returned && err_count > 10]
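
The diagram's labels can be read as the widely used three-state circuit breaker: Clear (calls pass through), Tripped (calls are rejected at once) and Checking (one trial call after a timeout). The sketch below is one possible reading, assuming that errors in Clear are counted until err_count > 10, that a timeout moves Tripped to Checking, that an error in Checking re-trips the breaker, and that any success resets the count (so it counts consecutive errors). Class and method names are hypothetical, and the class is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised while the breaker is Tripped: the call is rejected without being attempted."""


class CircuitBreaker:
    CLEAR, CHECKING, TRIPPED = "Clear", "Checking", "Tripped"

    def __init__(self, max_errors=10, reset_timeout_s=30.0, clock=time.monotonic):
        self.max_errors = max_errors            # trip once err_count > max_errors
        self.reset_timeout_s = reset_timeout_s  # how long to stay Tripped
        self._clock = clock                     # injectable so tests can control time
        self.state = self.CLEAR
        self.err_count = 0
        self._tripped_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == self.TRIPPED:
            if self._clock() - self._tripped_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit tripped; failing fast")
            self.state = self.CHECKING          # timeout expired: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_error()                # err_returned
            raise
        self.state = self.CLEAR                 # success in Clear or Checking
        self.err_count = 0
        return result

    def _record_error(self):
        self.err_count += 1
        if self.state == self.CHECKING or self.err_count > self.max_errors:
            self.state = self.TRIPPED
            self._tripped_at = self._clock()
```

While Tripped, callers get an immediate `CircuitOpenError` instead of waiting on a failing dependency, which also gives that dependency room to recover before the Checking trial call.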

Slide 38

Slide 38 text

Stability - practices
• Repeatability: defined processes, practice scenarios, prelive environments
• Automation: automate the routine, automate the difficult, allow the human back in the loop on demand
• Transparency: logging, monitoring, alerts, trends

Slide 39

Slide 39 text

Stability - process automation
• Logging & Metrics
• Monitoring
• Automation

Slide 43

Slide 43 text

Stability - environments
• Development
• UAT
• Prelive
• Production
Zones: “Uncontrolled”, “Controlled”, The DevOps Zone

Slide 44

Slide 44 text

Stability - production runbooks
Stakeholders: Security, Audit, Compliance, ...; Production Operations; Developers
Inputs: System design, Experience, Constraints
Runbook contents:
• Overview
• Install
• Backout
• Op Procs
• Investigation
• Recovery

Slide 45

Slide 45 text

Solutions: Achieving Capacity

Slide 46

Slide 46 text

Capacity - design principles
• Minimise workload: efficiency is important
• Flatten the peaks: move workload around
• Design for the large (scalability): understand where the time goes, multiply by a million
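
"Multiply by a million" is simple arithmetic, but it is worth doing explicitly. A small sketch with a hypothetical per-item cost of 2 ms; `batch_duration_s` is an illustrative helper that assumes work divides perfectly across workers with no contention, so its result is an optimistic estimate.

```python
def batch_duration_s(per_item_ms, items, workers=1):
    """Rough elapsed time for a batch: per-item cost x volume, spread over workers.

    Assumes perfect parallelism and no contention, so real runs will be slower.
    """
    return per_item_ms * items / 1000.0 / workers


# A hypothetical 2 ms per item is invisible when processing one record...
print(batch_duration_s(2, 1))                      # 0.002
# ...but multiplied by a million it is over half an hour on one worker,
print(batch_duration_s(2, 1_000_000))              # 2000.0
# and still over four minutes spread across eight workers.
print(batch_duration_s(2, 1_000_000, workers=8))   # 250.0
```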

Slide 47

Slide 47 text

Capacity - technology solutions
• Measure and minimise: understand where the work is
• Caching and pre-computing: reduce the work to be done
• Sharding and partitioning: separate workload to allow scale
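
Sharding and partitioning need a stable rule for which partition owns each piece of work. A minimal hash-partitioning sketch, assuming string keys such as account IDs; `shard_for` and `partition` are hypothetical helpers. It uses `hashlib` because Python's built-in `hash()` is randomised per process for strings, which would scatter the same key to different shards after a restart.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number in range(shard_count) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(keys, shard_count):
    """Group keys by shard so each shard's workload can be processed independently."""
    shards = {shard: [] for shard in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


accounts = [f"account-{n}" for n in range(10)]
for shard, keys in partition(accounts, 3).items():
    print(shard, keys)
```

With plain modulo hashing, changing `shard_count` moves most keys to a different shard; consistent hashing is a common alternative when shards are added or removed often.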

Slide 54

Slide 54 text

Capacity - solutions
• Segment Timings
• Static cache
• Lookaside cache
• Precompute
• Result set caching
• Phased batch
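
A lookaside (cache-aside) cache leaves the application in charge: it checks the cache first and, on a miss, reads the system of record and stores the result itself. A minimal sketch, assuming a single process, a time-to-live to bound staleness, and a hypothetical `loader` function that reads the database; it has no size limit or eviction, which a production cache would need.

```python
import time


class LookasideCache:
    """Cache-aside with a time-to-live; the loader is only called on a miss."""

    def __init__(self, loader, ttl_s=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical function that reads the system of record
        self._ttl_s = ttl_s
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired: load from the source and populate the cache.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl_s)
        return value

    def invalidate(self, key):
        """Drop a key after its underlying data changes, so readers reload it."""
        self._entries.pop(key, None)
```

Tracking `hits` and `misses` also supports the "measure and minimise" point: a low hit rate suggests the cache is not removing much work.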

Slide 55

Slide 55 text

Moving Work Around
[Figure: two charts of Utilisation (0 to 100) by hour (0 to 23)]
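
One way to move deferrable work around is to schedule it in the quietest part of the day. A sketch, assuming an hourly utilisation profile like the ones charted (the numbers below are invented) and a batch that needs a fixed number of contiguous hours; `quietest_window` is a hypothetical helper.

```python
def quietest_window(hourly_utilisation, hours_needed):
    """Return the start hour of the contiguous window (wrapping past midnight)
    with the lowest total utilisation."""
    n = len(hourly_utilisation)
    best_start, best_load = 0, float("inf")
    for start in range(n):
        load = sum(hourly_utilisation[(start + i) % n] for i in range(hours_needed))
        if load < best_load:
            best_start, best_load = start, load
    return best_start


# Hypothetical utilisation (%) for hours 0..23, peaking during the business day.
profile = [20, 15, 10, 10, 20, 30, 50, 70, 85, 90, 90, 85,
           80, 85, 90, 90, 85, 70, 55, 45, 40, 35, 30, 25]
start = quietest_window(profile, hours_needed=3)
print(f"schedule the 3-hour batch at {start:02d}:00")  # schedule the 3-hour batch at 01:00
```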

Slide 56

Slide 56 text

Capacity - practices
• Model and estimate
• Test capacity on realistic environments: allows model calibration
• Monitoring and trend analysis: tests theory against reality, spots impending storms before they hit
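
For "model and estimate", a first-cut model can come from the Utilisation Law (utilisation = throughput × service time). A sketch, assuming steady-state load spread evenly over identical servers; the 200 requests/s, 50 ms and 70% target are invented figures, to be replaced by measured ones from capacity tests, which is how the model gets calibrated.

```python
import math


def utilisation(arrival_rate_per_s, service_time_s, servers=1):
    """Utilisation Law: U = X * S, shared across identical servers.

    X is throughput (requests/s) and S is service time per request (s).
    Only meaningful at steady state with work spread evenly.
    """
    return arrival_rate_per_s * service_time_s / servers


def servers_needed(arrival_rate_per_s, service_time_s, target_utilisation=0.7):
    """Smallest server count that keeps utilisation at or below the target."""
    return math.ceil(arrival_rate_per_s * service_time_s / target_utilisation)


# Hypothetical peak: 200 requests/s, each needing 50 ms of processing.
print(servers_needed(200, 0.05))  # 15
```

Keeping headroom below 100% matters because queueing delay grows sharply as utilisation approaches saturation.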

Slide 57

Slide 57 text

Solutions: Achieving Security

Slide 58

Slide 58 text

Security - key design principles
• What they don’t have won’t hurt you: least privilege, grant the minimum needed
• Security needs simplicity: what you can’t analyse you can’t be sure about
• Don’t put all your eggs in one basket: separate privileges to avoid total breaches
• Fail safely
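
Least privilege and failing safely combine naturally in a deny-by-default permission check. A minimal sketch; the roles, permission strings and `is_allowed` helper are hypothetical.

```python
# Hypothetical roles, each granted only the permissions it needs (least privilege).
ROLE_PERMISSIONS = {
    "viewer": frozenset({"report:read"}),
    "operator": frozenset({"report:read", "job:restart"}),
    "admin": frozenset({"report:read", "job:restart", "user:manage"}),
}


def is_allowed(roles, permission):
    """Return True only if one of the caller's roles explicitly grants permission."""
    try:
        # Unknown roles map to no permissions, so they are denied by default.
        return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
    except Exception:
        # Fail safely: if the check itself breaks (e.g. malformed input), deny.
        return False


print(is_allowed(["operator"], "job:restart"))  # True
print(is_allowed(["viewer"], "job:restart"))    # False
print(is_allowed(None, "report:read"))          # False
```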

Slide 64

Slide 64 text

Security - solutions
• Authentication & Roles
• Least privilege / separation
• Privacy (TLS)
• Isolation (firewalls & zones)
• Trust (certs)
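
For privacy and trust, a TLS client should verify the server's certificate chain and host name rather than accept any certificate. A sketch using Python's standard `ssl` module, assuming the platform's default trusted CA store; the TLS 1.2 minimum is an assumed policy choice and `open_tls_connection` is a hypothetical helper.

```python
import socket
import ssl


def make_client_context():
    # create_default_context() loads the default trusted CA certificates and
    # turns on certificate verification and host name checking.
    context = ssl.create_default_context()
    # Refuse older protocol versions (an assumed policy, not from the talk).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_tls_connection(host, port=443, timeout_s=5.0):
    """Connect over TLS; raises ssl.SSLError if the certificate is not trusted."""
    context = make_client_context()
    sock = socket.create_connection((host, port), timeout=timeout_s)
    try:
        # server_hostname enables SNI and is what the certificate is checked against.
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```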

Slide 65

Slide 65 text

Security - key practices
• Model threats to identify mitigation
• Define policy to know what to protect
• Apply mechanisms to mitigate threats
• Test security as well as functions

Slide 66

Slide 66 text

Security - techniques
• Security Model
• Threat Model

Slide 67

Slide 67 text

Summary

Slide 68

Slide 68 text

Summary
• Production is just different: it’s not yours and you need to respect that
• Production is demanding: Correctness, Stability, Capacity, Security

Slide 69

Slide 69 text

Summary (ii)
• Identify solutions by requirement & area: principles, technologies, practices

Slide 70

Slide 70 text

Summary (iii)
• Production requirements and principles go back to the age of the mainframe
• CD and DevOps are the latest incarnation
  • welcome attention from developers
  • new tech enabling new possibilities
  • breaking down silos to make it happen

Slide 71

Slide 71 text

Books
Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives, Second Edition. Nick Rozanski, Eoin Woods

Slide 72

Slide 72 text

Eoin Woods
 [email protected]
 www.eoinwoods.info
@eoinwoodz
Thank you. Questions?
Acknowledgements: http://www.icons-land.com, http://www.alamy.com/, http://www.42u.com