
Getting Your System to Production (and keeping it there)

Eoin Woods
December 01, 2015

It can be dispiriting to find that a well-designed system that has been carefully implemented runs into problems as soon as it hits production, but such things do happen. This session explores why this happens and discusses why good software development practice is important but ultimately isn't sufficient to create a reliable and effective enterprise system. We'll discuss what being "production ready" really means in order to allow us to understand the principles, patterns and practices that we need to be aware of and apply in order to get our systems into production safely and keep them there.

Transcript

  1. Getting a System to Production ... and keeping it there
  2. Who Am I?
    Eoin Woods, CTO at Endava
    2005 - 2014 in capital markets (UBS, BGI)
    2000 - 2004 in product engineering & consultancy (Bull, Sybase, InterTrust, independent)
    Author, editor, speaker, community-guy
  3. Who are Endava?
    Software Engineering & IT Services Firm
    2800+ people
    UK, US, Germany, Romania, Moldova, Serbia, Macedonia
    Agile and Digital Transformation Consulting, Architecture, Development, Testing
    Data and Analytics
    Application Management, Infrastructure, DevOps
  4. Content
    Introducing Production Systems
    What Goes Wrong in Production?
    Solutions for Production Systems
    Conclusions
  5. Production Systems

  6. What is a production system?
    Any system being used for real work
  7. Why is Productionisation Hard?
    No one teaches you about production: who do you talk to? what do they want? what is the definition of “done”?
    Production is difficult for developers: hard to access, interrogate, debug, change, ...
  8. A new cast of characters
    Developers, Development Users
  9. A new cast of characters
    Production Users, Developers, Auditors, Operations, Acquirers, Infrastructure, Business Management
  10. Production is constrained
    Highly controlled
    Content is all valuable
    Change can be difficult
  11. Production is unpredictable
  12. Production is highly visible!
  13. You don’t own production
  14. What goes wrong?

  15. Performance surprises
    Interactive load
    Batch time surprises
    System abusers! “all transactions this year”, “average since 1967”, ...
  16. Environment bombshells
    Constraints and contention
    Unexpected behaviour
    Integration points
  17. Failures happen
    Software defects
    Platform failures
    Environment failures

  18. Security tangles
    Security is simple in Development
    Much more complex in Production!
  19. Finding Solutions

  20. Key requirements for production
    Functionally correct: does what the business process requires
    Stability: behaves predictably in all situations
    Capacity: can process the workload required (at all times)
    Security: limits access to those who are authorised to have it
  21. Solution Framework
    Correctness, Stability, Capacity, Security
    Design Principles, Technology, Practices
  22. Solution Framework
    Correctness, Stability, Capacity, Security
    Design Principles, Technology, Practices
    Simplicity
  23. Solution Framework
    Correctness, Stability, Capacity, Security
    Design Principles, Technology, Practices
    Simplicity, Resource Governor
  24. Solution Framework
    Correctness, Stability, Capacity, Security
    Design Principles, Technology, Practices
    Simplicity, Resource Governor, Threat Modelling
  25. Solution Framework
    Correctness, Stability, Capacity, Security
    Design Principles, Technology, Practices
    Simplicity, Resource Governor, Threat Modelling
    Our focus today
  26. General Principles
    One Team
    Automate
    Measure and Improve (feedback loops)
    Good Enough over Perfection
    Timeless principles … that led to CD and DevOps
  27. DevOps Principles
    CALMS: Communication, Automation, Lean thinking, Measurement, Sharing
    itrevolution.com/devops-culture-part-1
  28. Solutions: Achieving Stability

  29. Stability - design principles
    Fail quickly: fail fast, timeouts
    Isolate problems: flow control, circuit breakers, bulkheads, asynchronous integration
    Ensure steady state operation: housekeeping, predictable resource allocation, governors, throttling
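As an illustration of the "fail quickly" and "isolate problems" principles, here is a minimal Python sketch of a bulkhead: a fixed number of slots guards a dependency, and callers that cannot get a slot are rejected at once instead of queueing. The names `Bulkhead`, `BulkheadFullError`, `max_concurrent` and `acquire_timeout` are hypothetical and do not come from the talk.

```python
import threading


class BulkheadFullError(RuntimeError):
    """Raised when no slot is free (hypothetical name, for illustration)."""


class Bulkhead:
    """Caps concurrent calls into one dependency so that a slow or failing
    dependency cannot absorb every worker in the process."""

    def __init__(self, max_concurrent, acquire_timeout=0.0):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._acquire_timeout = acquire_timeout

    def run(self, func, *args, **kwargs):
        if self._acquire_timeout > 0:
            # Wait a bounded time for a slot (the "timeouts" principle).
            acquired = self._slots.acquire(timeout=self._acquire_timeout)
        else:
            # Do not wait at all (the "fail fast" principle).
            acquired = self._slots.acquire(blocking=False)
        if not acquired:
            raise BulkheadFullError("no free slot; rejecting call")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

A caller would typically create one `Bulkhead` per downstream system, for example `reports = Bulkhead(max_concurrent=5)`, and wrap each outbound call in `reports.run(...)`, so that trouble in one integration point stays inside its own compartment.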
  30. Stability - technology solutions
  31. Stability - technology solutions
    Fail fast
  32. Stability - technology solutions
    Fail fast, Bulkhead
  33. Stability - technology solutions
    Timeouts, Fail fast, Bulkhead
  34. Stability - technology solutions
    Timeouts, Fail fast, Bulkhead, Governor
  35. Stability - technology solutions
    Timeouts, Circuit Breaker, Fail fast, Bulkhead, Governor
  36. Stability - technology solutions
    Timeouts, Circuit Breaker, Fail fast, Bulkhead, Governor, Housekeeping
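The slides name a governor without showing one. One common way to build it is a token bucket, sketched below in Python as an assumption rather than as the mechanism the talk had in mind. The names `TokenBucketGovernor`, `rate_per_sec`, `burst` and `try_acquire` are hypothetical.

```python
import time


class TokenBucketGovernor:
    """Admits work at a steady average rate with a bounded burst."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = float(burst)  # maximum tokens that can accumulate
        self.clock = clock            # injectable clock, useful for tests
        self.tokens = float(burst)
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Return True if the request may proceed, False if it should be
        rejected or deferred by the caller."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

What to do when `try_acquire` returns False (reject, queue, or degrade) is a policy decision that this sketch leaves to the caller.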
  37. Example - Circuit Breaker
    [State diagram] States: Clear, Checking, Tripped
    Transition labels: err_returned; timeout; err_returned && err_count > 10; err_returned
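The state diagram itself did not survive in the transcript, so the Python sketch below rests on one plausible reading of it: Clear moves to Tripped when an error is returned and the error count exceeds 10, Tripped moves to Checking after a timeout, and Checking returns to Tripped on another error or to Clear on success. Counting consecutive errors and resetting the count on success are assumptions, as are the names `CircuitBreaker`, `CircuitOpenError`, `max_errors` and `reset_timeout`.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is tripped (hypothetical name)."""


class CircuitBreaker:
    CLEAR, CHECKING, TRIPPED = "clear", "checking", "tripped"

    def __init__(self, max_errors=10, reset_timeout=30.0, clock=time.monotonic):
        self.max_errors = max_errors
        self.reset_timeout = reset_timeout
        self.clock = clock            # injectable clock, useful for tests
        self.state = self.CLEAR
        self.err_count = 0
        self.tripped_at = None

    def call(self, func, *args, **kwargs):
        if self.state == self.TRIPPED:
            if self.clock() - self.tripped_at < self.reset_timeout:
                # Fail fast without touching the troubled dependency.
                raise CircuitOpenError("circuit tripped; failing fast")
            # Timeout elapsed: let one trial call through.
            self.state = self.CHECKING
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_error()
            raise
        # Success: back to Clear with a fresh error count (assumption).
        self.state = self.CLEAR
        self.err_count = 0
        return result

    def _on_error(self):
        self.err_count += 1
        # A failed trial call, or too many errors while Clear, trips the breaker.
        if self.state == self.CHECKING or self.err_count > self.max_errors:
            self.state = self.TRIPPED
            self.tripped_at = self.clock()
```

Wrapping each call to a remote dependency in `breaker.call(...)` means that, once the breaker trips, callers get an immediate `CircuitOpenError` instead of waiting on a dependency that is already known to be failing.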
  38. Stability - practices
    Repeatability: defined processes, practice scenarios, prelive environments
    Automation: automate the routine, automate the difficult, allow the human back in the loop on demand
    Transparency: logging, monitoring, alerts, trends
  39. Stability - process automation
    Logging & Metrics, Monitoring, Automation
  40. Stability - environments
    Development, UAT, Prelive, Production
  41. Stability - environments
    Development, UAT, Prelive, Production
    “Uncontrolled”
  42. Stability - environments
    Development, UAT, Prelive, Production
    “Controlled”, “Uncontrolled”
  43. Stability - environments
    Development, UAT, Prelive, Production
    “Controlled”, “Uncontrolled”
    The DevOps Zone
  44. Stability - production runbooks
    Security, Audit, Compliance, ...; Production Operations; Developers
    System design, Experience, Constraints
    Overview, Install, Backout, Op Procs, Investigation, Recovery
  45. Solutions: Achieving Capacity

  46. Capacity - design principles
    Minimise workload: efficiency is important
    Flatten the peaks: move workload around
    Design for the large (scalability): understand where the time goes, multiply by a million
  47. Capacity - technology solutions
    Measure and minimise: understand where the work is
    Caching and pre-computing: reduce the work to be done
    Sharding and partitioning: separate workload to allow scale
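To make "reduce the work to be done" concrete, here is a minimal Python sketch of a lookaside (cache-aside) cache, one of the caching styles the deck lists. The caller asks the cache first and falls through to the slow source only on a miss or an expired entry. The expiry policy and the names `LookasideCache`, `loader` and `ttl_seconds` are assumptions for illustration.

```python
import time


class LookasideCache:
    """Checks the cache first and calls the loader only on a miss."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader   # slow function, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock     # injectable clock, useful for tests
        self._entries = {}      # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                    # hit: no work done at the source
        value = self._loader(key)              # miss or expired: do the real work
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop an entry after the underlying data changes."""
        self._entries.pop(key, None)
```

The trade-off is staleness: a longer `ttl_seconds` saves more work at the source but serves older data, so the value should follow how quickly the underlying data changes.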
  48. Capacity - solutions
  49. Capacity - solutions
    Segment Timings
  50. Capacity - solutions
    Segment Timings, Static cache
  51. Capacity - solutions
    Segment Timings, Static cache, Lookaside cache
  52. Capacity - solutions
    Segment Timings, Static cache, Lookaside cache, Result set caching
  53. Capacity - solutions
    Segment Timings, Static cache, Lookaside cache, Precompute, Result set caching
  54. Capacity - solutions
    Segment Timings, Static cache, Lookaside cache, Precompute, Result set caching, Phased batch
  55. Moving Work Around
    [Two charts: Utilisation (0 to 100) plotted against an axis running from 0 to 23]
  56. Capacity - practices
    Model and estimate
    Test capacity on realistic environments: allows model calibration
    Monitoring and trend analysis: tests theory against reality, spots impending storms before they hit
  57. Solutions: Achieving Security

  58. Security - key design principles
    What they don’t have won’t hurt you: least privilege - grant the minimum needed
    Security needs simplicity: what you can’t analyse you can’t be sure about
    Don’t put your eggs in one basket: separate privileges to avoid total breaches
    Fail safely
  59. Security - solutions
  60. Security - solutions
    Authentication & Roles
  61. Security - solutions
    Authentication & Roles, Least privilege / separation
  62. Security - solutions
    Authentication & Roles, Least privilege / separation, Privacy (TLS)
  63. Security - solutions
    Authentication & Roles, Least privilege / separation, Privacy (TLS), Trust (certs)
  64. Security - solutions
    Authentication & Roles, Least privilege / separation, Privacy (TLS), Isolation (firewalls & zones), Trust (certs)
  65. Security - key practices
    Model threats to identify mitigation
    Define policy to know what to protect
    Apply mechanisms to mitigate threats
    Test security as well as functions
  66. Security - techniques
    Security Model, Threat Model

  67. Summary

  68. Summary
    Production is just different: it’s not yours and you need to respect that
    Production is demanding: Correctness, Stability, Capacity, Security
  69. Summary (ii)
    Identify solutions by requirement & area: principles, technologies, practices
  70. Summary (iii)
    Production requirements and principles go back to the age of the mainframe
    CD and DevOps the latest incarnation: welcome attention from developers, new tech enabling new possibilities, breaking down silos to make it happen
  71. Books
    Nick Rozanski and Eoin Woods, Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives, Second Edition
  72. Thank you. Questions?
    Eoin Woods, eoin.woods@endava.com, www.eoinwoods.info, @eoinwoodz
    Acknowledgements: http://www.icons-land.com http://www.alamy.com/ http://www.42u.com