Slide 1

Slide 1 text

OPS213: GCP: Move Fast and Don’t Break Things
Ankit Mehta, Engineering Productivity Director, Google Cloud
Mithra Rajah, Software Engineer, Tools and Infrastructure, Google Cloud

Slide 2

Slide 2 text

What are we going to talk about...
● Engineering Productivity (EngProd) makes quality software development easier and faster.
● We do this by adhering to our Guiding Principles: Driving Test Health, Avoiding Performance Regressions, and Ensuring High Quality Releases.
● Anthos Engineering Productivity tools and how you can benefit from them.

Slide 3

Slide 3 text

Moving fast is good
● Innovate
● Address flaws quickly
● Better productivity
● Better code health

But breaking things isn’t
● Erodes user trust/satisfaction
● Diminishes the brand
● Hard to launch products
● Hard to set a high bar

Slide 4

Slide 4 text

Balancing Velocity and Quality

Slide 5

Slide 5 text

From Test Engineering to Engineering Productivity: Version 1.0
Make engineers more productive through tools, infrastructure, automation and analysis
Submit -> Test -> Release

Slide 6

Slide 6 text

From Test Engineering to Engineering Productivity: Version 2.0
Make engineers more productive through tools, infrastructure, automation and analysis
Design -> Author -> Review -> Submit -> Test -> Release -> Experiment -> Monitor -> Production/Stability
ML, Mobile, Frontends, Backends + Pipelines, Metrics/Insights

Slide 7

Slide 7 text

Development at Google
● 800k+ builds per day
● 2 billion+ lines of code
● 60,000+ code submissions per day
● 150M automated tests run per day

Slide 8

Slide 8 text

Releases
Biweekly (twice a week)

Slide 9

Slide 9 text

Local Development Workflow
Prevent bugs; don’t just catch them.
● Code: Create a workspace (the equivalent of a Git branch).
● Build/Test: Run Blaze locally; the builds and unit tests, along with their outputs and test results, run remotely.
● Continuous Integration: A remote execution service runs hermetic builds/tests.

Slide 10

Slide 10 text

Guiding Principles
● Debuggable
● Drive Test Health
● Avoid Performance Regressions
● Ensure High Quality Releases
● Anchor with actionable metrics

Slide 11

Slide 11 text

Cloud Integration Testing Framework

Slide 12

Slide 12 text

Performance Benchmarking Library

Slide 13

Slide 13 text

Predictable Releases
● Release velocity: the rate of pushing features from development to production
● Decouple feature release from binary release
● A release should be a train going out the door
● Roll back first, fix in a subsequent release
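A runtime feature flag is one common way to decouple feature release from binary release. The feature's code ships "dark" inside the binary and is turned on separately. The sketch below is a minimal, hypothetical illustration and not Google's tooling. The flag names, the in-memory `ROLLOUT_PERCENT` table and the hashing scheme are all assumptions. Users are bucketed deterministically, so a partial rollout stays stable from one request to the next.

```python
import hashlib

# Hypothetical rollout table. In a real system this would come from a config
# service so it can change without building or pushing a new binary.
ROLLOUT_PERCENT = {
    "new_checkout_flow": 0,  # code is shipped in the binary but stays dark
    "faster_search": 25,     # enabled for roughly 25% of users
}


def is_enabled(flag: str, user_id: str, rollout: dict = ROLLOUT_PERCENT) -> bool:
    """Return True if `flag` is on for `user_id`.

    The user is hashed (together with the flag name) into a stable bucket in
    0..99, so the same user gets the same answer on every request, and raising
    the percentage only ever adds users to the rollout.
    """
    percent = rollout.get(flag, 0)  # unknown flags default to off
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    for user in ["alice", "bob", "carol"]:
        print(user, is_enabled("faster_search", user))
```

Under this scheme, "roll back first" for a misbehaving feature can mean setting its percentage to 0. The actual fix then rides a later release train.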

Slide 14

Slide 14 text

Project Health
● Overall score = minimum of 10 dimensions -> focus
● Assessed on a 5-point scale
● Assessed daily, automatically
● Based on a trailing 90-day window

Dimension                | pH-Level-1 (configured) | pH-Level-2 (improving) | pH-Level-3 (acceptable) | pH-Level-4 (commendable) | pH-Level-5 (exemplary)
tap-greenness            | configured              | 60%                    | 74%                     | 90%                      | 95%
tap-flakiness            | configured              | < 20%                  | < 7%                    | < 2%                     | < 1%
absolute-coverage        | ?                       | configured             | 60%                     | 75%                      | 90%
incremental-coverage     | ?                       | configured             | 70%                     | 80%                      | 90%
presubmit-coverage       | ?                       | configured             | > 67%                   | > 98%                    | > 99%
presubmit-ignored        | configured              | 20 CLs                 | 6 CLs                   | 3 CLs                    | 1 CL
presubmit-latency        | configured              | 35m                    | 25m                     | 15m                      | 10m
release-duration         | configured              | 3w                     | 1w                      | 3d                       | 6h
release-cherrypick-count | configured              | 10 CLs                 | 5 CLs                   | 3 CLs                    | 1 CL
release-granularity      | configured              | 1000 CLs               | 500 CLs                 | 100 CLs                  | 10 CLs
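The "minimum of 10 dimensions" scoring rule can be sketched in a few lines. This is a hypothetical illustration, not the actual Project Health implementation, and it rests on three assumptions:
● It covers only four of the ten dimensions.
● The metric values are already aggregated over the trailing 90 days.
● Each threshold is inclusive, except where the slide writes "<".

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Dimension:
    # Bars for pH-Level-2 through pH-Level-5, in order.
    thresholds: Sequence[float]
    # meets(value, bar) is True when the metric value clears the bar.
    meets: Callable[[float, float], bool]


# A subset of the slide's dimensions. Threshold values come from the table;
# whether each bar is inclusive or strict is an assumption.
DIMENSIONS: Dict[str, Dimension] = {
    "tap-greenness": Dimension((60, 74, 90, 95), operator.ge),      # % green, higher is better
    "tap-flakiness": Dimension((20, 7, 2, 1), operator.lt),         # % flaky, "< X" on the slide
    "presubmit-latency": Dimension((35, 25, 15, 10), operator.le),  # minutes, lower is better
    "release-duration": Dimension((21 * 24, 7 * 24, 3 * 24, 6), operator.le),  # hours: 3w, 1w, 3d, 6h
}


def dimension_level(name: str, value: float) -> int:
    """Level 1 means 'configured'; each bar cleared, in order, adds one level."""
    dim = DIMENSIONS[name]
    level = 1
    for bar in dim.thresholds:
        if not dim.meets(value, bar):
            break
        level += 1
    return level


def project_health(metrics: Dict[str, float]) -> int:
    """Overall score is the minimum across dimensions: the weakest one sets the focus."""
    return min(dimension_level(name, value) for name, value in metrics.items())


if __name__ == "__main__":
    # Hypothetical trailing-90-day values for one project.
    metrics = {
        "tap-greenness": 92,
        "tap-flakiness": 1.5,
        "presubmit-latency": 18,
        "release-duration": 48,
    }
    print({n: dimension_level(n, v) for n, v in metrics.items()})
    print("overall:", project_health(metrics))
```

In this made-up example, three dimensions reach level 4, but an 18-minute presubmit latency only reaches level 3. Because the score is a minimum, that one dimension caps the overall score at 3 and becomes the thing to work on.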

Slide 15

Slide 15 text

Metrics

Slide 16

Slide 16 text

Guiding Principles
● Debuggable
● Drive Test Health
● Avoid Performance Regressions
● Ensure High Quality Releases
● Anchor with actionable metrics

Slide 17

Slide 17 text

Changing Landscape
Google’s software lives in a single shared repository, but the landscape everywhere else is changing. The world now uses VMs, Kubernetes, GitHub, scalable storage and so on. We need to evolve to match it.

Slide 18

Slide 18 text

Open Source Development

Slide 19

Slide 19 text

Prow
● Continuous Integration for Kubernetes
● Started in ~2016; fully replaced Jenkins in January 2018
  ○ No longer need a Jenkins master
  ○ Update the service daily without downtime
● Test k8s on k8s!
  ○ Easy to maintain/deploy
  ○ Job config is basically a podspec
  ○ Respond to GitHub webhooks rather than relying on a munger

Slide 20

Slide 20 text

TestGrid
● Grid of test results
● Fast access: pre-computed aggregation of test results
● Interactive: sorting, filtering, grouping, filing bugs, finding changes
● Highly configurable
● Integrated with Cloud Build
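TestGrid's fast access comes from aggregating results ahead of time rather than on every page load. The sketch below illustrates only that idea. The record shape, the `build_grid` and `currently_failing` helpers, and the PASS/FAIL status strings are hypothetical; they are not TestGrid's actual data model or API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw result record: (run_id, test_name, status).
# run_id is assumed to increase with time.
Result = Tuple[int, str, str]


def build_grid(results: List[Result]) -> Tuple[List[int], Dict[str, List[str]]]:
    """Pre-compute a grid with one row per test and one column per run, newest run first.

    Cells with no result for that run are left as ''.
    """
    runs = sorted({run for run, _, _ in results}, reverse=True)
    column = {run: i for i, run in enumerate(runs)}
    grid: Dict[str, List[str]] = defaultdict(lambda: [""] * len(runs))
    for run, test, status in results:
        grid[test][column[run]] = status
    return runs, dict(grid)


def currently_failing(grid: Dict[str, List[str]]) -> List[str]:
    """Tests whose most recent recorded result (leftmost non-empty cell) is FAIL."""
    failing = []
    for test, row in grid.items():
        latest = next((cell for cell in row if cell), "")
        if latest == "FAIL":
            failing.append(test)
    return sorted(failing)


if __name__ == "__main__":
    results = [
        (1, "TestA", "PASS"), (2, "TestA", "FAIL"),
        (1, "TestB", "PASS"), (2, "TestB", "PASS"),
        (2, "TestC", "FAIL"),
    ]
    runs, grid = build_grid(results)
    print("runs:", runs)
    for test in sorted(grid):
        print(test, grid[test])
    print("failing:", currently_failing(grid))
```

Once the grid is stored, sorting, filtering or grouping rows is cheap. None of those operations has to rescan the raw results.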

Slide 21

Slide 21 text

KIND: Kubernetes IN Docker
Hermetic testing
● Local K8s cluster
● No external dependencies
● Integration testing for K8s applications

Slide 22

Slide 22 text

KIND Demo

Slide 23

Slide 23 text

Mako
● Visualize performance data
● Automate identification of performance regressions
● Collaborate with developers around performance
http://mako.dev
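Mako's own detection logic isn't described here. As a stand-in, the sketch below shows one simple way to automate the identification of performance regressions. It compares a new benchmark sample against recent history and flags it only when the sample is both statistically unusual and large enough to matter. The function name, the 3-sigma rule and the 5% floor are assumptions chosen for illustration; none of them is Mako's API or algorithm.

```python
import statistics
from typing import Sequence


def is_regression(history: Sequence[float], new_value: float,
                  k: float = 3.0, min_rel_change: float = 0.05) -> bool:
    """Return True if new_value looks like a regression for a lower-is-better metric.

    Two conditions must both hold:
      * new_value is more than k sample standard deviations above the historical
        mean (statistically unusual given the benchmark's noise), and
      * new_value is at least min_rel_change worse than the mean (big enough to
        matter), which stops tiny shifts on a very quiet benchmark being flagged.
    """
    if len(history) < 2:
        return False  # too little data to estimate noise
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return new_value > mean + k * stdev and new_value > mean * (1 + min_rel_change)


if __name__ == "__main__":
    latency_ms = [100, 102, 98, 101, 99]
    print(is_regression(latency_ms, 110))
    print(is_regression(latency_ms, 103))
```

With this history (mean 100 ms, sample standard deviation about 1.6 ms), a 110 ms sample is flagged and a 103 ms sample is not. Requiring both conditions trades some sensitivity for fewer false alarms.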

Slide 24

Slide 24 text

Jenkins X: CI/CD using Prow and Knative
● Developer creates a PR
● GitHub sends the PR notification to Prow
● Prow creates a Knative build
● The Knative build runs the Jenkinsfile pipeline
● Prow uploads the results to ResultStore

Slide 25

Slide 25 text

Conformance Tooling

File                | master | 1.12  | 1.11
api/                | 49.8%  | 49.6% | 48.1%
apis/               | 32.8%  | 32.6% | 32.4%
auth/               | 9.8%   | 9.8%  | 9.8%
capabilities/       | 36.4%  | 36.4% | 36.4%
client/             | 1.3%   | 1.3%  | 1.3%
controller/         | 30.2%  | 30.3% | 28.1%
credentialprovider/ | 11.1%  | 11.1% | 11.1%
features/           | 0.0%   | 0.0%  | 0.0%
fieldpath/          | 51.4%  | 51.4% | 51.4%
kubeapiserver/      | 2.8%   | 2.8%  | 2.8%
kubectl/            | 0.0%   | 0.0%  | 0.0%
kubelet/            | 39.4%  | 39.0% | 38.3%
master/             | 16.1%  | 16.1% | 16.1%
printers/           | 2.1%   | 2.1%  | 2.1%
probe/              | 59.2%  | 59.2% | 59.2%
proxy/              | 16.1%  | 16.1% | 16.1%
quota/              | 6.7%   | 6.5%  | 6.9%
registry/           | 24.4%  | 24.2% | 25.4%

Slide 26

Slide 26 text

Knative Conformance Coverage
Results display on TestGrid

Slide 27

Slide 27 text

Anthos: On to a Hybrid Cloud

                  | Source          | Build/Test  | Verify      | Release
Validating GCP    | GitHub          | Cloud Build | GCP         |
Validating Anthos | GitHub, Google3 | Cloud Build | vSphere Lab |

Slide 28

Slide 28 text

Anthos Qualification
● Continuously and reliably validate Anthos binaries
● Enable teams to integrate easily
● Provide release readiness signals
[Diagram: Development (code, Git checkout, presubmit; submit if tests pass) -> Continuous Integration (presubmit and periodic conformance tests against the system under test: cluster management, cluster operators, LogMon, on a dynamically created on-prem GKE admin cluster in a vSphere lab) -> Reporting (test results on TestGrid)]

Slide 29

Slide 29 text

To Summarize
● Engineering Productivity (EngProd) makes quality software development easier and faster.
● We do this by adhering to our Guiding Principles: Driving Test Health, Avoiding Performance Regressions, and Ensuring High Quality Releases.
● Anthos Engineering Productivity tools and how you can benefit from them.

Slide 30

Slide 30 text

That’s a wrap.

Slide 31

Slide 31 text

Your Feedback is Greatly Appreciated!
Complete the session survey in the mobile app
● 1-5 star rating system
● Open field for comments
● Rate icon in status bar