Slide 1

Slide 1 text

THE IMPORTANCE OF BEING WELL-ARCHITECTED_ JON TOPPER | @jtopper | he/him/his

Slide 2

Slide 2 text

$ whoami Founder/CEO/CTO The Scale Factory Working in hosting/infrastructure for 20 years Infrastructure / AWS / DevOps

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Leading Well-Architected Partner Worldwide >200 Reviews Completed Since April 2018 Book a Well-Architected review today https://scalefactory.com/services/well-architected/ $5,000 funding available to support improvement work

Slide 5

Slide 5 text

REVIEWS RUN_ (chart: reviews run over time, Mar-2018 to Jan-2020; vertical axis 0 to 180)

Slide 6

Slide 6 text

THE TEAM_

Slide 7

Slide 7 text

OUR CLIENTS_

Slide 8

Slide 8 text

TODAY’S AGENDA_ What is Well-Architected? What is a Well-Architected Review? Pillar overview Common review findings

Slide 9

Slide 9 text

WHAT IS WELL-ARCHITECTED?_

Slide 10

Slide 10 text

WELL ARCHITECTED ORIGINS_ Catalogue of emergent good practices Observed by AWS Field Solutions Architects Codified and shared Platform agnostic*

Slide 11

Slide 11 text

White Papers Review Tool

Slide 12

Slide 12 text

Performance Efficiency Cost Optimisation Operational Excellence Reliability Security

Slide 13

Slide 13 text

IoT (Internet of Things) High Performance Computing Serverless Applications Lenses Machine Learning

Slide 14

Slide 14 text

USING WELL-ARCHITECTED_ Gap analysis / planning Teaching Team alignment

Slide 15

Slide 15 text

WHAT IS A WELL-ARCHITECTED REVIEW?_

Slide 16

Slide 16 text

WELL ARCHITECTED REVIEW_ Foundational questions Up to 4 hours Qualitative

Slide 17

Slide 17 text

Questions per pillar, by lens (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimisation; total):
Well Architected Core: 9, 11, 9, 8, 9; total 46
Serverless Applications: 2, 3, 2, 1, 1; total 9
High Performance Computing: 4, 3, 3, 4, 2; total 16
IoT (Internet of Things): 4, 11, 6, 10, 4; total 35
Machine Learning: 7, 3, 5, 2, 4; total 21

Slide 18

Slide 18 text

QUESTION OPS 1_ How do you determine what your priorities are?
• Evaluate external customer needs
• Evaluate internal customer needs
• Evaluate compliance requirements
• Evaluate threat landscape
• Evaluate tradeoffs
• Manage benefits and risks
• None of these

Slide 19

Slide 19 text

USING WELL-ARCHITECTED_ Gap analysis / planning Teaching Team alignment

Slide 20

Slide 20 text

PILLAR OVERVIEW_

Slide 21

Slide 21 text

Performance Efficiency Cost Optimisation Operational Excellence Reliability Security

Slide 22

Slide 22 text

MAIN AREAS_ Operational priorities Software Delivery Lifecycle Risk reduction Monitoring / Telemetry Documentation Continuous Improvement

Slide 23

Slide 23 text

CONTINUOUS DELIVERY PIPELINE_ Linting Unit Tests Artefact Build SAST Deploy to Test Integration Test Performance Test DAST Deploy to UAT Deploy to Live
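A minimal sketch of the pipeline idea on this slide: stages run in a fixed order and the run stops at the first failure, so nothing reaches Live without passing every earlier stage. The stage names come from the slide; `run_pipeline` and the placeholder checks are hypothetical and stand in for whatever a real CI system would execute.

```python
from typing import Callable, List, Tuple

# Stage names taken from the slide, in the order shown there.
STAGES = [
    "Linting", "Unit Tests", "Artefact Build", "SAST", "Deploy to Test",
    "Integration Test", "Performance Test", "DAST", "Deploy to UAT",
    "Deploy to Live",
]


def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, check in stages:
        if not check():
            break
        passed.append(name)
    return passed


# Hypothetical run in which the SAST stage fails (placeholder checks only).
demo = [(name, (lambda n=name: n != "SAST")) for name in STAGES]
completed = run_pipeline(demo)
print(completed)
```

In this run the build stops after the artefact build, so none of the deploy stages execute.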

Slide 24

Slide 24 text

https://services.google.com/fh/files/misc/state-of-devops-2019.pdf

Slide 25

Slide 25 text

CENTRAL MONITORING_ User monitoring Application monitoring (RED) Infrastructure monitoring Make dashboards available
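RED in application monitoring stands for Rate, Errors and Duration. As a rough illustration, the sketch below derives the three figures from a batch of request records. The `Request` type, the choice of HTTP 5xx as "error" and the 95th percentile for duration are assumptions made for the example, not something the slide prescribes.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def red_metrics(requests: List[Request], window_seconds: float) -> dict:
    """Compute Rate, Errors and Duration for one time window."""
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    # Nearest-rank 95th percentile of request duration.
    p95 = durations[math.ceil(0.95 * len(durations)) - 1] if durations else None
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests) if requests else 0.0,
        "p95_ms": p95,
    }


# Hypothetical sample: four requests observed in a two-second window.
sample = [Request(20, 200), Request(35, 200), Request(500, 503), Request(40, 200)]
metrics = red_metrics(sample, window_seconds=2)
print(metrics)
```

These three numbers per service are the kind of thing a shared dashboard would plot over time.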

Slide 26

Slide 26 text

CENTRAL LOGGING_ Structured logs Add relevant IDs (transaction, user, etc) Make dashboards available Represent “events” on dashboards
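One way to get structured logs with relevant IDs attached, sketched with Python's standard `logging` module. The field names (`transaction_id`, `user_id`), the logger name and the `JsonFormatter` class are illustrative assumptions; the in-memory stream stands in for stdout or a log shipper.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # IDs are attached by the caller through the `extra` argument.
            "transaction_id": getattr(record, "transaction_id", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(payload)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("payment accepted", extra={"transaction_id": "tx-123", "user_id": "u-42"})
entry = json.loads(stream.getvalue())
print(entry)
```

Because every line is a JSON object carrying the same ID fields, a central log store can filter or join on them without parsing free text.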

Slide 27

Slide 27 text

Performance Efficiency Cost Optimisation Operational Excellence Reliability Security

Slide 28

Slide 28 text

"Low performers take weeks to conduct security reviews and complete the changes identified. In contrast, elite performers build security in and can conduct security reviews and complete changes in just days." http://services.google.com/fh/files/misc/state-of-devops-2018.pdf

Slide 29

Slide 29 text

MAIN AREAS_ Identity and access management Detective controls Infrastructure protection Data protection Incident response

Slide 30

Slide 30 text

Somebody Else's Problem

Slide 31

Slide 31 text

QUESTION SEC 11_ How do you respond to a [security] incident?
• Identify key personnel and external resources
• Identify tooling
• Develop incident response plans
• Automate containment capability
• Identify forensic capabilities
• Pre-provision access
• Pre-deploy tools
• Run game days
• None of these
WA CI High Risk 75% NI 51% HRI Rank: 2 WA WA NI 27% 39% 0% 11% 27% 10% 3% 35% NI NI NI (93%)

Slide 32

Slide 32 text

QUESTION SEC 8_ How do you classify your data?
• Define data classification requirements
• Define data protection controls
• Implement data identification
• Automate identification and classification
• Identify the types of data
• None of these
WA CI High Risk 75% HRI Rank: 3 WA WA 61% 39% 17% 4% 59% 23% NI NI (88%)

Slide 33

Slide 33 text

TEAMS NEED TO DO BETTER AT SECURITY_ Poor hygiene around patching Limited data classification Mediocre human access control Bad programmatic access control Low adoption of security monitoring tools

Slide 34

Slide 34 text

Performance Efficiency Cost Optimisation Operational Excellence Reliability Security

Slide 35

Slide 35 text

MAIN AREAS_ Foundations Change Management Failure Management

Slide 36

Slide 36 text

AVAILABILITY DESIGN_ Clustering / Failover Autoscaling Multi-AZ (and Multi-Region) operation Caching Asynchronous processing Backpressure / Circuit breakers
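A minimal circuit breaker sketch, to make the last item on this slide concrete: after a number of consecutive failures the breaker "opens" and rejects calls immediately, then allows a trial call once a cool-down has passed. The class name, thresholds and injected clock are all assumptions for illustration; production libraries handle concurrency and metrics that this omits.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


# Demo with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("dependency down")


outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

now[0] = 11.0  # the cool-down has elapsed
outcomes.append(breaker.call(lambda: "ok"))
print(outcomes)
```

The third call is rejected without touching the failing dependency, which is the point: callers fail fast instead of piling more load onto something that is already down.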

Slide 37

Slide 37 text

PLAN FOR FAILURE_ Build a list of potential failure scenarios Understand how your platform will react Game days (tabletop) Mitigate / Document Game days (live failure injection)

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

QUESTION REL 8_ How do you test resilience?
• Use playbooks for unanticipated failures
• Conduct root cause analysis and share results
• Inject failures to test resiliency
• Conduct game days regularly
• None of these
WA CI High Risk 67% HRI Rank: 5 WA 25% NI (92%) NI 73% 6% 0% 16%

Slide 40

Slide 40 text

QUESTION REL 9_ How do you plan for disaster recovery?
• Define recovery objectives for downtime and data loss
• Use defined recovery strategies to meet the recovery objectives
• Test disaster recovery implementation to validate the implementation
• Manage configuration drift on all changes
• Automate recovery
• None of these
WA CI High Risk 79% NI 33% HRI Rank: 1 WA WA NI 33% 25% 39% 16% 31% (87%)
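Recovery objectives for downtime and data loss are commonly called the recovery time objective (RTO) and recovery point objective (RPO). The sketch below shows how the result of one recovery rehearsal could be checked against them. The function name and all the figures are hypothetical; real objectives come from business requirements.

```python
from datetime import datetime, timedelta


def check_recovery_objectives(last_backup, incident_time, measured_recovery, rpo, rto):
    """Compare one rehearsal's numbers against the stated objectives."""
    # Data written after the last backup would be lost in a restore.
    data_loss_window = incident_time - last_backup
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": measured_recovery <= rto,
    }


# Hypothetical objectives and rehearsal results.
result = check_recovery_objectives(
    last_backup=datetime(2020, 1, 10, 2, 0),
    incident_time=datetime(2020, 1, 10, 9, 30),
    measured_recovery=timedelta(hours=3),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=2),
)
print(result)
```

Here the nightly backup satisfies a 24-hour RPO, but a three-hour restore misses a two-hour RTO, which is the kind of gap that only shows up when the recovery is actually tested.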

Slide 41

Slide 41 text

TEAMS ARE BAD AT THINKING ABOUT FAILURE MODES_ Not considering business requirements No risk analysis of failure modes Poor documentation Almost no attempt to rehearse outages

Slide 42

Slide 42 text

Performance Efficiency Cost Optimisation Operational Excellence Reliability Security

Slide 43

Slide 43 text

MAIN AREAS_ Choosing the right components Configuring things correctly Reviewing these choices regularly Monitoring for performance

Slide 44

Slide 44 text

Compute: EC2 Instances, Containers on k8s or ECS, Containers on Fargate, Lambda
From EC2 Instances to Lambda:
Most Security effort required to Least Security effort required
Least Serverless to Most Serverless
Least Opinionated to Most Opinionated
Least suitable for microservices to Most suitable for microservices

Slide 45

Slide 45 text

Data
Categories: Relational, Key-Value, Document, Graph, Time Series, Ledger
Engines shown: RDS PostgreSQL, RDS MySQL, Aurora, RedShift, DynamoDB, MongoDB, DocumentDB, AWS Neptune, Neo4J, Cassandra, Amazon Timestream, InfluxDB, Quantum Ledger, Hyperledger Fabric, Ethereum, MySQL, PostgreSQL

Slide 46

Slide 46 text

Performance Efficiency Cost Optimisation Operational Excellence Reliability Security

Slide 47

Slide 47 text

QUESTION COST 9_ How do you evaluate new services?
• Establish a cost optimisation function
• Develop a workload review process
• Review and implement services in an unplanned way
• Review and analyse this workload regularly
• Keep up to date with new service releases
• None of these
WA CI High Risk 71% HRI Rank: 4 WA 34% 26% 84% NI NI (79%) NI 43% 63% 1%

Slide 48

Slide 48 text

MAIN AREAS_ Governance Monitoring for usage Decommissioning unused resources Matching supply/demand Using pricing models

Slide 49

Slide 49 text

WHAT NEXT?_ Read the white papers: https://aws.amazon.com/architecture/well-architected/ Run your own review(s): https://aws.amazon.com/well-architected-tool/ Engage an AWS Well-Architected partner: https://scalefactory.com/services/well-architected/ (funding available)

Slide 50

Slide 50 text

EVERYONE IS BETTER AT BUILDING PLATFORMS THAN THEY ARE AT SECURING OR RUNNING THEM_

Slide 51

Slide 51 text

TALK TO US ABOUT: CONSULTANCY TRAINING WELL-ARCHITECTED MIGRATION

Slide 52

Slide 52 text

Leading Well-Architected Partner Worldwide >200 Reviews Completed Since April 2018 Book a Well-Architected review today https://scalefactory.com/services/well-architected/ $5,000 funding available to support improvement work

Slide 53

Slide 53 text

BREAKFAST OPS_ Monthly hosted discussion For CTOs and tech decision makers

Slide 54

Slide 54 text

Q&A_

Slide 55

Slide 55 text

KEEP IN TOUCH_ http://www.scalefactory.com/ @scalefactory [email protected]