Towards Operational Excellence

Once systems are designed, implemented, and tested, we come to what is arguably one of the hardest aspects in the lifecycle of a system: bringing it to life and sustaining it in operations. In this series of posts, I’ll discuss Operational Excellence, focusing on the three essential interconnecting elements that enable you to successfully operate the technology you’ve built — Culture, Tools, and Processes.

Adrian Hornsby

June 17, 2020

Transcript

  1. None
  2. Towards Operational Excellence. Adrian Hornsby, Principal Evangelist - Architecture, Amazon Web Services. @adhorn. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  3. What is Operational Excellence?

  4. When your whole business is fundamentally dependent on technology, operational excellence is critical.
  5. 1995

  6. Internet, Web Server, customers, Inventory, Orders, Database, Customer Service Tools, Fulfillment Center Tools
  7. None
  8. None
  9. What is Operational Excellence?

  10. None
  11. What is Operational Excellence?
      • Happy customers!
      • Consistently exceeding operational goals
      • Anticipating and addressing problems
      • Effectively responding to operational issues
      • Continuously improving
      …and doing all of this at significant scale.
  12. How does a technology organization move toward OE?

  13. Achieving Operational Excellence: Tools, Processes, Culture, Technology

  14. Achieving Operational Excellence: Culture

  15. Culture: Amazon Leadership Principles
      1. Customer Obsession
      2. Ownership
      3. Invent and Simplify
      4. Are Right, A Lot
      5. Hire and Develop the Best
      6. Insist on the Highest Standards
      7. Think Big
      8. Bias for Action
      9. Frugality
      10. Learn and Be Curious
      11. Earn Trust
      12. Dive Deep
      13. Have Backbone; Disagree and Commit
      14. Deliver Results
      https://www.amazon.jobs/en/principles
  16. Culture: Amazon Leadership Principles
  17. Amazon Flywheel

  18. Innovation, Convenience, Fast Delivery, Reduce Customer’s Costs, Wide Selection of Products
  19. “What would Low-Flying-Hawk say?”

  20. Culture: Amazon Leadership Principles
  21. 2 Pizza Team Responsibilities
      Responsible for: Their product
      Deployment tools, CI/CD tools, Monitoring tools, Metrics tool, Logging tools, APM tools, Infrastructure provisioning tools, Security tools, Database management tools, Testing tools, …
      Not responsible for*
      *Unless their product belongs in the blue
  22. You build it; you ship it

  23. Achieving Operational Excellence: Tools

  24. Tools to Operate the Cloud
      • Test Automation
      • Configuration Management
      • Software Deployment
      • Monitoring and Visualization
      • Reporting
      • Change Management
      • Incident Management
      • Trouble Ticketing
      • Security Auditing
      • Forecasting and Planning
  25. Calling Houston… Website Deployment team “website-push” perl script

  26. Calling Houston… Website Deployment team; “website-push” perl script; command line tools; hand build; hand deploy to NFS
      % /opt/amazon/customer-service/bin/request-refund
  27. Breaking the monolith

  28. Breaking the monolith
      ✓ Small
      ✓ Focused
      ✓ Single-purpose
      ✓ Connected via HTTP API
  29. Conway’s law: Architecture, Organization
      THEIR PRODUCT; Deployment tools, CI/CD tools, Monitoring tools, Metrics tool, Logging tools, APM tools, Infrastructure provisioning tools, Security tools, Database management tools, Testing tools, …
      “Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.” M. Conway
  30. You measure. You collect data. You listen to anecdotes.

  31. Culture: Amazon Leadership Principles
  32. Wait Write Code Wait Build Code Wait Deploy to Test

    Deploy to Prod
  33. Brazil
      • Centralized and hosted build system
      • Generating artifacts to deploy
  34. • Deployment service
      • No downtime deployments
      • Health checking
      • Versioned artifacts and rollbacks
      https://www.allthingsdistributed.com/2014/11/apollo-amazon-deployment-engine.html
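The deployment-service properties listed on this slide (no downtime, health checking, versioned artifacts, rollbacks) can be illustrated with a small sketch. This is not how Amazon's deployment service works internally; `Host`, `Fleet`, `rolling_deploy`, and the `is_healthy` callback are hypothetical names invented for the example. It assumes a fleet is updated one host at a time so the remaining hosts keep serving, and that any failed health check restores every touched host to the version it ran before.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Host:
    name: str
    version: Optional[str] = None  # artifact version currently installed


@dataclass
class Fleet:
    hosts: List[Host]
    # Versions that were fully deployed, oldest first (hypothetical bookkeeping).
    history: List[str] = field(default_factory=list)


def rolling_deploy(fleet: Fleet, version: str,
                   is_healthy: Callable[[Host], bool]) -> bool:
    """Deploy `version` one host at a time; roll back on a failed health check.

    Returns True if every host ends up on `version`, False if rolled back.
    """
    updated: List[Tuple[Host, Optional[str]]] = []
    for host in fleet.hosts:
        previous = host.version
        host.version = version          # install the new versioned artifact
        updated.append((host, previous))
        if not is_healthy(host):        # health check gates further rollout
            for touched, old in updated:
                touched.version = old   # restore the prior version everywhere
            return False
    fleet.history.append(version)
    return True


fleet = Fleet(hosts=[Host("h1", "v1"), Host("h2", "v1")], history=["v1"])
rolling_deploy(fleet, "v2", is_healthy=lambda host: True)
```

Updating hosts sequentially is only one possible strategy; a real system would also drain traffic from a host before touching it, which this sketch leaves out.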
  35. Pipelines
      • Path code takes from check-in to production
      • Where automation, testing, and approvals happen
      • Enabler of continuous deployment
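As a rough illustration of the pipeline idea on this slide, the sketch below models a revision moving through ordered stages, each with automated checks and an optional approval gate. The `Stage` class, `run_pipeline` function, stage names, and `approver` callback are all assumptions made for the example, not Amazon's Pipelines tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    checks: List[Callable[[str], bool]]  # automated tests run against a revision
    needs_approval: bool = False         # whether a manual approval gates the stage


def run_pipeline(revision: str, stages: List[Stage],
                 approver: Optional[Callable[[str, str], bool]] = None) -> List[str]:
    """Promote `revision` through `stages` in order.

    Returns the names of the stages the revision passed. Promotion stops at
    the first stage whose checks fail or whose approval is withheld.
    """
    passed: List[str] = []
    for stage in stages:
        if not all(check(revision) for check in stage.checks):
            break
        if stage.needs_approval and not (approver and approver(stage.name, revision)):
            break
        passed.append(stage.name)
    return passed


stages = [
    Stage("build", [lambda rev: True]),
    Stage("test", [lambda rev: rev != "bad"]),
    Stage("prod", [], needs_approval=True),
]
run_pipeline("abc123", stages, approver=lambda stage, rev: True)
```

With no approver supplied, the revision stops before any stage that requires approval, which is the conservative default chosen here.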
  36. Example Pipeline and Stages (screenshot). Stages shown, each with a revision history: Packages; VersionSet; Gamma (status, approval status, diff); PDX-Prod (compliance verification, status, L1 approval, L2 approval, deploy when ready, approval workflow); Prod - Rest (whitelisting, status, approval workflow)
  37. Hundreds of millions of deployments a year (as of 2019)
  38. https://aws.amazon.com/devops/

  39. Achieving Operational Excellence: Processes

  40. “Oh! Those tables always come back, and they’re always damaged. They’re not packaged right, so the surface of the table always gets scratched.”
  41. People already have good intentions

  42. If good intentions don’t work, what does?

  43. Mechanisms

  44. 1902

  45. Toyota will not allow any defect that they know about to go down the manufacturing line.
  46. Image Source: https://www.autoguide.com/auto-news/2016/01/toyota-production-japan-may-stop-next-month-due-to-steel-shortage.html

  47. Andon Cord (Image Source: https://www.autoguide.com/auto-news/2016/01/toyota-production-japan-may-stop-next-month-due-to-steel-shortage.html)

  48. The Andon Cord

  49. Andon Customer Service

  50. None
  51. Jeff Bezos, 2012 Shareholder Letter: “We noticed that you experienced poor video playback while watching the following rental on Amazon Video On Demand: Casablanca. We’re sorry for the inconvenience and have issued you a refund for the following amount: $2.99. We hope to see you again soon.”
  52. "Good intentions never work, you need good mechanisms to make anything happen." Jeff Bezos
  53. Good Mechanisms ≈ Complete Processes: Tools, Adoption, Audit

  54. Correction of Errors (COE)
      Mechanism to learn from our mistakes
      • technical flaws
      • process flaws
      • documentation flaws
      • organizational flaws
      • other flaws
      Mechanism to identify contributing factors to failures
      Mechanism to drive CONTINUOUS IMPROVEMENT
  55. Anatomy of a COE
      • What happened?
      • What data do you have to support this?
          • Metrics and graphs
      • What was the impact on customers and your business?
      • What are the contributing factors?
          • Don’t stop at operators.
      • What lessons did you learn?
      • What corrective actions are you taking?
          • Action items
          • Related items (trouble tickets etc.)
      https://www.youtube.com/watch?v=yQiRli2ZPxU
  56. Culture: Amazon Leadership Principles
  57. Audit: Weekly Operational Metrics Review
      • Continuous inspection mechanism
      • Maintains focus on operations
      • Foundation of a healthy operations program
      Typical Agenda (~15min)
      • Share successes and failings
      • Action items follow up
      • Review COEs
      • Review key service metrics
      • Identify new best practices
      https://aws.amazon.com/blogs/opensource/the-wheel/
  58. Continuous Improvement

  59. Policy Engine
      • Automated risk and opportunity analyzer
      • Identifies potential risks to availability, infrastructure, security and more
      • Both inherited and direct
      • Highlights potential opportunities to optimize resource utilization
      • Extensible and configurable
      • Provides single-pane-of-glass view into policy compliance
      • Allows acknowledgment
      • Reports roll up the organization hierarchy
      Mechanism to propagate local learnings globally
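To make the policy-engine bullets more concrete, here is a minimal sketch of the general pattern: configurable rules evaluated against resources, findings that can be acknowledged, and counts rolled up an organization hierarchy. Every name in it (`Resource`, `Policy`, `evaluate`, `roll_up`, the `backups-enabled` rule, and the team names) is hypothetical, and nothing here reflects how Amazon's internal Policy Engine is actually built.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Finding = Tuple[str, str]  # (resource name, policy name)


@dataclass
class Resource:
    name: str
    team: str                  # owning org unit
    attrs: Dict[str, object]   # arbitrary configuration facts


@dataclass
class Policy:
    name: str
    violated_by: Callable[[Resource], bool]  # pluggable rule, so the set is extensible


def evaluate(resources: Iterable[Resource], policies: Iterable[Policy],
             acknowledged: Set[Finding] = frozenset()) -> List[Finding]:
    """Return findings for every violated policy, skipping acknowledged ones."""
    policies = list(policies)
    return [(r.name, p.name)
            for r in resources for p in policies
            if p.violated_by(r) and (r.name, p.name) not in acknowledged]


def roll_up(findings: Iterable[Finding], resources: Iterable[Resource],
            parent_of: Dict[str, str]) -> Dict[str, int]:
    """Count findings per org unit; each finding also counts for every ancestor."""
    team_of = {r.name: r.team for r in resources}
    counts: Counter = Counter()
    for resource_name, _policy in findings:
        unit = team_of[resource_name]
        while unit is not None:
            counts[unit] += 1
            unit = parent_of.get(unit)  # None once the top of the hierarchy is reached
    return dict(counts)


resources = [
    Resource("db-1", "payments", {"backups": False}),
    Resource("web-1", "checkout", {"backups": True}),
]
policies = [Policy("backups-enabled", lambda r: not r.attrs.get("backups", False))]
findings = evaluate(resources, policies)
report = roll_up(findings, resources, parent_of={"payments": "retail", "checkout": "retail"})
```

Keeping rules as plain callables is one simple way to get the "extensible and configurable" property; a production system would more likely load rules from configuration.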
  60. In conclusion... Achieving operational excellence requires: an operationally focused culture, a rich set of tools, the right processes
      • Good Intentions Don’t Work
      • Mechanisms Work
  61. “The world, thankfully, is full of many high-performing, highly distinctive corporate cultures. We never claim that our approach is the right one – just that it’s ours – and over the last two decades, we’ve collected a large group of like-minded people. Folks who find our approach energizing and meaningful.” Jeff Bezos - 2015 Amazon.com letter to shareholders
  62. Thank you! Adrian Hornsby, @adhorn, https://medium.com/@adhorn, https://dev.to/adhorn