Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Towards Operational Excellence Adrian Hornsby, Principal Evangelist - Architecture, Amazon Web Services @adhorn © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 3

Slide 3 text

What is Operational Excellence?

Slide 4

Slide 4 text

When your whole business is fundamentally dependent on technology, operational excellence is critical.

Slide 5

Slide 5 text

1995

Slide 6

Slide 6 text

[Architecture diagram labels: Internet; Web Server; Database (customers, inventory, orders); Customer Service Tools; Fulfillment Center Tools]

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

What is Operational Excellence?

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

What is Operational Excellence? • Happy customers! • Consistently exceeding operational goals • Anticipating and addressing problems • Effectively responding to operational issues • Continuously improving …and doing all of this at significant scale.

Slide 12

Slide 12 text

How does a technology organization move toward OE?

Slide 13

Slide 13 text

Achieving Operational Excellence Tools Processes Culture Technology

Slide 14

Slide 14 text

Achieving Operational Excellence Culture

Slide 15

Slide 15 text

Culture: Amazon Leadership Principles 1. Customer Obsession 2. Ownership 3. Invent and Simplify 4. Are Right, A Lot 5. Hire and Develop the Best 6. Insist on the Highest Standards 7. Think Big 8. Bias for Action 9. Frugality 10. Learn and Be Curious 11. Earn Trust 12. Dive Deep 13. Have Backbone; Disagree and Commit 14. Deliver Results https://www.amazon.jobs/en/principles

Slide 16

Slide 16 text

Culture: Amazon Leadership Principles 1. Customer Obsession 2. Ownership 3. Invent and Simplify 4. Are Right, A Lot 5. Hire and Develop the Best 6. Insist on the Highest Standards 7. Think Big 8. Bias for Action 9. Frugality 10. Learn and Be Curious 11. Earn Trust 12. Dive Deep 13. Have Backbone; Disagree and Commit 14. Deliver Results

Slide 17

Slide 17 text

Amazon Flywheel

Slide 18

Slide 18 text

Innovation Convenience Fast Delivery Reduce Customer’s Costs Wide Selection of Products

Slide 19

Slide 19 text

“What would Low-Flying-Hawk say?”

Slide 20

Slide 20 text

Culture: Amazon Leadership Principles 1. Customer Obsession 2. Ownership 3. Invent and Simplify 4. Are Right, A Lot 5. Hire and Develop the Best 6. Insist on the Highest Standards 7. Think Big 8. Bias for Action 9. Frugality 10. Learn and Be Curious 11. Earn Trust 12. Dive Deep 13. Have Backbone; Disagree and Commit 14. Deliver Results

Slide 21

Slide 21 text

2 Pizza Team Responsibilities. Responsible for: their product. Not responsible for*: deployment tools, CI/CD tools, monitoring tools, metrics tool, logging tools, APM tools, infrastructure provisioning tools, security tools, database management tools, testing tools, …. *Unless their product belongs in the blue

Slide 22

Slide 22 text

You build it; you ship it

Slide 23

Slide 23 text

Achieving Operational Excellence Tools

Slide 24

Slide 24 text

Tools to Operate the Cloud • Test Automation • Configuration Management • Software Deployment • Monitoring and Visualization • Reporting • Change Management • Incident Management • Trouble Ticketing • Security Auditing • Forecasting and Planning

Slide 25

Slide 25 text

Calling Houston… Website Deployment team “website-push” Perl script

Slide 26

Slide 26 text

Calling Houston… Website Deployment team “website-push” Perl script Command line tools Hand build Hand deploy to NFS % /opt/amazon/customer-service/bin/request-refund

Slide 27

Slide 27 text

Breaking the monolith

Slide 28

Slide 28 text

Breaking the monolith ✓ Small ✓ Focused ✓ Single-purpose ✓ Connected via HTTP API

Slide 29

Slide 29 text

Conway’s law Architecture Organization THEIR PRODUCT Deployment tools CI/CD tools Monitoring tools Metrics tool Logging tools APM tools Infrastructure provisioning tools Security tools Database management tools Testing tools …. “Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.” — M. Conway

Slide 30

Slide 30 text

You measure. You collect data. You listen to anecdotes.

Slide 31

Slide 31 text

Culture: Amazon Leadership Principles 1. Customer Obsession 2. Ownership 3. Invent and Simplify 4. Are Right, A Lot 5. Hire and Develop the Best 6. Insist on the Highest Standards 7. Think Big 8. Bias for Action 9. Frugality 10. Learn and Be Curious 11. Earn Trust 12. Dive Deep 13. Have Backbone; Disagree and Commit 14. Deliver Results

Slide 32

Slide 32 text

Write Code → Wait → Build Code → Wait → Deploy to Test → Wait → Deploy to Prod

Slide 33

Slide 33 text

Brazil • Centralized and hosted build system • Generating artifacts to deploy

Slide 34

Slide 34 text

• Deployment service • No downtime deployments • Health checking • Versioned artifacts and rollbacks https://www.allthingsdistributed.com/2014/11/apollo-amazon-deployment-engine.html
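
The bullets above (no-downtime deployments, health checking, versioned artifacts and rollbacks) can be illustrated with a minimal sketch. This is not Amazon's deployment service: the `Fleet` class, the `rolling_deploy` function, and the `is_healthy` callback are hypothetical names invented for illustration, and the fleet is simulated in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Fleet:
    """Hypothetical in-memory stand-in for the hosts running one service."""
    hosts: List[str]
    versions: Dict[str, str] = field(default_factory=dict)  # host -> artifact version


def rolling_deploy(
    fleet: Fleet,
    new_version: str,
    old_version: str,
    is_healthy: Callable[[str, str], bool],
    batch_size: int = 1,
) -> bool:
    """Deploy new_version one batch at a time so most hosts keep serving.

    After each batch, run the health check on the hosts just updated.
    If any host is unhealthy, restore old_version on every host touched
    so far and return False. Return True when the whole fleet is updated.
    """
    updated: List[str] = []
    for start in range(0, len(fleet.hosts), batch_size):
        batch = fleet.hosts[start:start + batch_size]
        for host in batch:
            fleet.versions[host] = new_version
            updated.append(host)
        if not all(is_healthy(host, new_version) for host in batch):
            # Rollback: versioned artifacts make the previous state recoverable.
            for host in updated:
                fleet.versions[host] = old_version
            return False
    return True
```

Updating in small batches is what keeps the service available during the rollout, and keeping the previous version identifier around is what makes the rollback a simple reassignment.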

Slide 35

Slide 35 text

Pipelines • Path code takes from check-in to production • Where automation, testing, and approvals happen • Enabler of continuous deployment
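
As a rough sketch of the idea above, a pipeline can be modeled as an ordered list of stages, each guarded by an automated check or an approval. The `Stage` class, `run_pipeline` function, and the stage names used in any example are assumptions made for illustration, not the internal tool described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One step on the path from check-in to production (hypothetical model)."""
    name: str
    gate: Callable[[str], bool]  # automated test or approval for a given revision


def run_pipeline(revision: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Promote a revision through the stages in order.

    Stops at the first gate that fails. Returns (reached_end, names of the
    stages the revision passed), so a caller can see where promotion stopped.
    """
    passed: List[str] = []
    for stage in stages:
        if not stage.gate(revision):
            return False, passed
        passed.append(stage.name)
    return True, passed
```

When every gate is automated, a check-in that passes all of them reaches production without manual steps, which is the sense in which a pipeline enables continuous deployment.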

Slide 36

Slide 36 text

Example Pipeline and Stages [Screenshot of a pipeline, stages in order: Packages → VersionSet → Gamma → PDX-Prod → Prod - Rest. Each stage shows its revision history and status. Approval steps shown include approval status and diff, compliance verification, L1 approval, L2 approval, deploy when ready, whitelisting, and an approval workflow with approve and cancel actions.]

Slide 37

Slide 37 text

Hundreds of millions of deployments a year - as of 2019

Slide 38

Slide 38 text

https://aws.amazon.com/devops/

Slide 39

Slide 39 text

Achieving Operational Excellence Processes

Slide 40

Slide 40 text

“Oh! Those tables always come back, and they’re always damaged. They’re not packaged right, so the surface of the table always gets scratched.”

Slide 41

Slide 41 text

People already have good intentions

Slide 42

Slide 42 text

If good intentions don’t work, what does?

Slide 43

Slide 43 text

Mechanisms

Slide 44

Slide 44 text

1902

Slide 45

Slide 45 text

Toyota will not allow any defect that they know about to go down the manufacturing line.

Slide 46

Slide 46 text

Image Source: https://www.autoguide.com/auto-news/2016/01/toyota-production-japan-may-stop-next-month-due-to-steel-shortage.html

Slide 47

Slide 47 text

Andon Cord. Image Source: https://www.autoguide.com/auto-news/2016/01/toyota-production-japan-may-stop-next-month-due-to-steel-shortage.html

Slide 48

Slide 48 text

The Andon Cord

Slide 49

Slide 49 text

Andon Customer Service

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Jeff Bezos 2012 Shareholder Letter We noticed that you experienced poor video playback while watching the following rental on Amazon Video On Demand: Casablanca. We’re sorry for the inconvenience and have issued you a refund for the following amount: $2.99. We hope to see you again soon.

Slide 52

Slide 52 text

"Good intentions never work, you need good mechanisms to make anything happen." Jeff Bezos

Slide 53

Slide 53 text

Good Mechanisms ≈ Complete Processes Tools Adoption Audit

Slide 54

Slide 54 text

Correction of Errors (COE) Mechanism to learn from our mistakes • technical flaws • process flaws • documentation flaws • organizational flaws • other flaws Mechanism to identify contributing factors to failures Mechanism to drive CONTINUOUS IMPROVEMENT

Slide 55

Slide 55 text

Anatomy of a COE • What happened? • What data do you have to support this? • Metrics and graphs • What was the impact on customers and your business? • What are the contributing factors? • Don’t stop at operators. • What lessons did you learn? • What corrective actions are you taking? • Action items • Related items (trouble tickets, etc.) https://www.youtube.com/watch?v=yQiRli2ZPxU
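
One way to make the questions above hard to skip is to capture a COE as a structured record and check it for empty sections before review. The sketch below is an assumption about how that could look: the `CorrectionOfErrors` class and its field names are invented here and are not Amazon's actual template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CorrectionOfErrors:
    """Hypothetical COE record; each field mirrors one question from the slide."""
    what_happened: str
    supporting_data: List[str]        # metrics and graphs
    customer_impact: str              # impact on customers and the business
    contributing_factors: List[str]   # should go beyond "operator error"
    lessons_learned: List[str]
    action_items: List[str]           # corrective actions
    related_items: List[str] = field(default_factory=list)  # trouble tickets etc.

    def missing_sections(self) -> List[str]:
        """Return the names of required sections that are still empty."""
        required = {
            "what_happened": self.what_happened,
            "supporting_data": self.supporting_data,
            "customer_impact": self.customer_impact,
            "contributing_factors": self.contributing_factors,
            "lessons_learned": self.lessons_learned,
            "action_items": self.action_items,
        }
        return [name for name, value in required.items() if not value]
```

A review meeting could then refuse to close any COE whose `missing_sections()` list is not empty, which turns the template into a mechanism rather than a suggestion.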

Slide 56

Slide 56 text

Culture: Amazon Leadership Principles 1. Customer Obsession 2. Ownership 3. Invent and Simplify 4. Are Right, A Lot 5. Hire and Develop the Best 6. Insist on the Highest Standards 7. Think Big 8. Bias for Action 9. Frugality 10. Learn and Be Curious 11. Earn Trust 12. Dive Deep 13. Have Backbone; Disagree and Commit 14. Deliver Results

Slide 57

Slide 57 text

Audit Weekly Operational Metrics Review • Continuous inspection mechanism • Maintains focus on operations • Foundation of a healthy operations program Typical Agenda (~15min) • Share successes and failings • Action items follow up • Review COEs • Review key service metrics • Identify new best practices https://aws.amazon.com/blogs/opensource/the-wheel/

Slide 58

Slide 58 text

Continuous Improvement

Slide 59

Slide 59 text

Policy Engine • Automated risk and opportunity analyzer • Identifies potential risks to availability, infrastructure, security and more • Both inherited and direct • Highlights potential opportunities to optimize resource utilization • Extensible and configurable • Provides single-pane-of-glass view into policy compliance • Allows acknowledgment • Reports roll up the organization hierarchy Mechanism to propagate local learnings globally
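
A minimal sketch of the policy engine idea, under stated assumptions: policies are plain predicates over resource descriptions, findings can be acknowledged, and open findings are counted for each team and for every organization above it. All names here (`Policy`, `Finding`, `evaluate`, `roll_up`, and the resource keys `id` and `team`) are hypothetical, and the organization hierarchy is assumed to be a tree with no cycles.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Policy:
    """A named rule; violates() returns True when a resource breaks it."""
    name: str
    violates: Callable[[Dict], bool]


@dataclass
class Finding:
    policy: str
    resource: str
    team: str
    acknowledged: bool = False  # owners can acknowledge a known risk


def evaluate(resources: List[Dict], policies: List[Policy]) -> List[Finding]:
    """Check every resource against every policy and collect the violations."""
    return [
        Finding(policy.name, resource["id"], resource["team"])
        for resource in resources
        for policy in policies
        if policy.violates(resource)
    ]


def roll_up(findings: List[Finding], org_parent: Dict[str, str]) -> Counter:
    """Count unacknowledged findings per team and per parent organization.

    org_parent maps a team or organization to its parent; a name with no
    entry is treated as the top of the hierarchy.
    """
    counts: Counter = Counter()
    for finding in findings:
        if finding.acknowledged:
            continue
        node: Optional[str] = finding.team
        while node is not None:
            counts[node] += 1
            node = org_parent.get(node)
    return counts
```

Adding a new `Policy` after an incident is how a lesson learned by one team gets checked against every other team's resources.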

Slide 60

Slide 60 text

In conclusion... Achieving operational excellence requires: an operationally focused culture a rich set of tools the right processes • Good Intentions Don’t Work • Mechanisms Work

Slide 61

Slide 61 text

“The world, thankfully, is full of many high-performing, highly distinctive corporate cultures. We never claim that our approach is the right one – just that it’s ours – and over the last two decades, we’ve collected a large group of like-minded people. Folks who find our approach energizing and meaningful.” Jeff Bezos - 2015 Amazon.com letter to shareholders

Slide 62

Slide 62 text

Thank you! Adrian Hornsby @adhorn https://medium.com/@adhorn https://dev.to/adhorn