GCP: Move Fast and Don’t Break Things

Tools developers should know how to leverage to drive productivity

Ankit Mehta

March 12, 2019

Transcript

  1. OPS213: GCP: Move Fast and Don’t Break Things

    Ankit Mehta, Engineering Productivity Director, Google Cloud
    Mithra Rajah, Software Engineer, Tools and Infrastructure, Google Cloud
  2. What are we going to talk about...

    • Engineering Productivity (EngProd) makes quality software development easier and faster.
    • We do this by adhering to our guiding principles: Driving Test Health, Avoiding Performance Regressions, and Ensuring High Quality Releases.
    • Anthos Engineering Productivity tools and how you can benefit from them.
  3. Moving fast is good

    • Innovate
    • Address flaws quickly
    • Better productivity
    • Better code health

    But breaking things isn’t
    • Erodes user safety and satisfaction
    • Diminishes the brand
    • Hard to launch products
    • Hard to set a high bar
  4. From Test Engineering to Engineering Productivity: Version 1.0

    Make engineers more productive through tools, infrastructure, automation and analysis.
    Scope: Submit → Test → Release
  5. From Test Engineering to Engineering Productivity: Version 2.0

    Make engineers more productive through tools, infrastructure, automation and analysis.
    Scope: Design → Author → Review → Submit → Test → Release → Experiment → Monitor → Production/Stability
    Areas: ML, Mobile, Frontends, Backends + Pipelines, Metrics/Insights
  6. Development at Google

    • 800k+ builds per day
    • 2 billion+ lines of code
    • 60,000+ code submissions per day
    • 150M automated tests run per day
  7. Local Development Workflow: Prevent bugs, don’t just catch them

    • Code: Create a workspace (the equivalent of a Git branch).
    • Build/Test: Run Blaze locally; the builds and unit tests execute remotely, and their outputs and test results come back from the remote service.
    • Continuous Integration: A remote execution service runs hermetic builds and tests.
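A build or test is hermetic when its output depends only on its declared inputs. That property is what lets a remote execution service reuse results safely. The sketch below is loosely modeled on content-addressed action caching. It is not Blaze's implementation, and `action_key`, `ActionCache` and `fake_compile` are hypothetical names. It shows that an identical command, environment and input bytes map to the same key, so the action runs only once.

```python
import hashlib
import json
from typing import Callable, Dict, List


def action_key(argv: List[str], inputs: Dict[str, bytes], env: Dict[str, str]) -> str:
    """Content-addressed key: same command, env and input bytes give the same key."""
    h = hashlib.sha256()
    h.update(json.dumps({"argv": argv, "env": sorted(env.items())}).encode())
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")  # separator keeps path boundaries unambiguous
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class ActionCache:
    """Hypothetical result cache; a real service would also execute actions remotely."""

    def __init__(self) -> None:
        self._results: Dict[str, bytes] = {}
        self.executions = 0

    def run(self, argv, inputs, env, execute: Callable[..., bytes]) -> bytes:
        key = action_key(argv, inputs, env)
        if key not in self._results:
            # Cache miss: execute once and remember the result under its key.
            self.executions += 1
            self._results[key] = execute(argv, inputs, env)
        return self._results[key]


def fake_compile(argv, inputs, env):
    # Output depends only on declared inputs, which is the hermetic property caching needs.
    return b"obj:" + b"".join(inputs[p] for p in sorted(inputs))


cache = ActionCache()
srcs = {"main.c": b"int main(){return 0;}"}
cache.run(["cc", "-c", "main.c"], srcs, {"CC": "clang"}, fake_compile)
cache.run(["cc", "-c", "main.c"], srcs, {"CC": "clang"}, fake_compile)  # cache hit
print(cache.executions)  # 1
```

If an action read an undeclared input, such as the wall clock or a file outside `inputs`, the key would no longer describe the output and the cache could return stale results. That is why hermeticity comes first.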
  8. Cloud Integration Testing Framework

    • Drive Test Health
    • Avoid Performance Regressions
    • Ensure High Quality Releases
    • Anchor with actionable metrics
  9. Predictable Releases

    • Release velocity: the rate of pushing features from development to production
    • Decouple feature release from binary release
    • A release should be a train going out the door
    • Rollback first, fix in a subsequent release
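A common way to decouple a feature release from a binary release is a runtime feature flag. The new code ships dark inside the binary and is switched on separately. The sketch below assumes a hypothetical in-process flag store (`FeatureFlags`) with a deterministic percentage rollout keyed on a user ID. The talk does not describe its flag tooling, so treat this as an illustration only.

```python
import hashlib
from typing import Dict


class FeatureFlags:
    """Hypothetical in-memory flag store; real systems load this from config."""

    def __init__(self) -> None:
        self._rollout: Dict[str, int] = {}  # flag name -> percent of users (0-100)

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be in [0, 100]")
        self._rollout[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        # Hash flag+user so each user lands in a stable bucket 0..99.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return bucket < percent


def checkout_page(flags: FeatureFlags, user_id: str) -> str:
    # Both code paths ship in the same binary; the flag picks one at runtime.
    if flags.is_enabled("new_checkout", user_id):
        return "new checkout"
    return "old checkout"


flags = FeatureFlags()
print(checkout_page(flags, "alice"))  # old checkout (flag off by default)
flags.set_rollout("new_checkout", 100)
print(checkout_page(flags, "alice"))  # new checkout
flags.set_rollout("new_checkout", 0)  # roll back without building a new binary
```

Setting the flag back to 0 acts as an immediate rollback. The actual fix can then ride the next release train, which matches "rollback first, fix in a subsequent release".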
  10. Project Health

    • Overall score = minimum of 10 dimensions -> focus
    • Assessed on a 5-point scale
    • Assessed daily, automatically
    • Based on the trailing 90 days

    Dimension | pH-Level-1 (configured) | pH-Level-2 (improving) | pH-Level-3 (acceptable) | pH-Level-4 (commendable) | pH-Level-5 (exemplary)
    tap-greenness | configured | 60% | 74% | 90% | 95%
    tap-flakiness | configured | < 20% | < 7% | < 2% | < 1%
    absolute-coverage | ? | configured | 60% | 75% | 90%
    incremental-coverage | ? | configured | 70% | 80% | 90%
    presubmit-coverage | ? | configured | > 67% | > 98% | > 99%
    presubmit-ignored | configured | 20 CLs | 6 CLs | 3 CLs | 1 CL
    presubmit-latency | configured | 35m | 25m | 15m | 10m
    release-duration | configured | 3w | 1w | 3d | 6h
    release-cherrypick-count | configured | 10 CLs | 5 CLs | 3 CLs | 1 CL
    release-granularity | configured | 1000 CLs | 500 CLs | 100 CLs | 10 CLs
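The scoring rule on slide 10 ranks each dimension on a 1-5 ladder and takes the overall score as the minimum. The sketch below applies that rule to three of the ten dimensions, using thresholds copied from the slide. Two simplifications are assumptions: every bound is treated as inclusive, although the slide uses strict "<" and ">" on some rows, and presubmit-latency is measured in minutes. `Dimension` and `project_health` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Dimension:
    name: str
    ladder: Tuple[Tuple[int, float], ...]  # (level, threshold) pairs, ascending level
    higher_is_better: bool = True
    base_level: int = 1  # level reached just by being configured


def dimension_level(dim: Dimension, value: float) -> int:
    """Climb the ladder until a threshold is missed (bounds treated as inclusive)."""
    level = dim.base_level
    for lvl, threshold in dim.ladder:
        met = value >= threshold if dim.higher_is_better else value <= threshold
        if not met:
            break
        level = lvl
    return level


def project_health(dims: Iterable[Dimension], values: Dict[str, float]) -> int:
    # The overall score is the minimum, so the weakest dimension sets the focus.
    return min(dimension_level(d, values[d.name]) for d in dims)


# Three of the ten dimensions from slide 10.
DIMS = (
    Dimension("tap-greenness", ((2, 60), (3, 74), (4, 90), (5, 95))),
    Dimension("presubmit-latency", ((2, 35), (3, 25), (4, 15), (5, 10)), higher_is_better=False),
    Dimension("absolute-coverage", ((3, 60), (4, 75), (5, 90)), base_level=2),
)

print(project_health(DIMS, {"tap-greenness": 92, "presubmit-latency": 20, "absolute-coverage": 80}))  # 3
```

In this example, greenness (level 4) and coverage (level 4) are both held back by a 20-minute presubmit. The minimum points the team at latency.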
  11. Changing Landscape

    Google’s software lives in a single shared repository. But the landscape everywhere is changing: the world now uses VMs, Kubernetes, GitHub, scalable storage, and so on. We need to evolve to match it.
  12. Prow

    • Continuous Integration for Kubernetes
    • Started in ~2016; fully replaced Jenkins in January 2018
      ◦ No longer need a Jenkins master
      ◦ Update the service daily without downtime
    • Test k8s on k8s!
      ◦ Easy to maintain/deploy
      ◦ Job config is basically a podspec
      ◦ Respond to GitHub webhooks rather than running a munger
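Responding to GitHub webhooks means the CI system reacts to events pushed by GitHub instead of running a munger. Each delivery carries an `X-GitHub-Event` header and, when a secret is configured, an `X-Hub-Signature-256` header holding an HMAC-SHA256 of the body. Prow itself is written in Go. The Python sketch below only illustrates the receive, verify and dispatch step, and the job names it returns are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header, formatted as "sha256=<hex digest>"."""
    if not signature_header.startswith("sha256="):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(expected, signature_header[len("sha256="):])


def handle_webhook(secret: bytes, headers: Dict[str, str], body: bytes) -> List[str]:
    """Return the (hypothetical) jobs to trigger for one webhook delivery."""
    if not verify_signature(secret, body, headers.get("X-Hub-Signature-256", "")):
        raise PermissionError("bad webhook signature")
    event = headers.get("X-GitHub-Event")
    payload = json.loads(body)
    if event == "pull_request" and payload.get("action") in ("opened", "synchronize"):
        pr = payload["number"]
        return [f"pull-unit-tests-{pr}", f"pull-e2e-{pr}"]  # hypothetical job names
    return []


secret = b"s3cret"
body = json.dumps({"action": "opened", "number": 42}).encode()
sig = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(handle_webhook(secret, {"X-GitHub-Event": "pull_request", "X-Hub-Signature-256": sig}, body))
# ['pull-unit-tests-42', 'pull-e2e-42']
```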
  13. TestGrid

    • Grid of test results
    • Fast access: pre-computed aggregation of test results
    • Interactive: sorting, filtering, grouping, filing bugs, finding changes
    • Highly configurable
    • Integrated with Cloud Build
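"Pre-computed aggregation" means the grid is built once from raw results, so the dashboard only renders it. The sketch below shows one way to lay out such a grid, with one row per test and one column per run. It also adds a crude flakiness signal. It does not reflect TestGrid's actual data model, and `build_grid` and `flaky_tests` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Grid = Dict[str, List[Optional[str]]]


def build_grid(results: Iterable[Tuple[str, str, str]]) -> Tuple[List[str], Grid]:
    """results: (run_id, test_name, status) tuples in run order.

    Returns the column order and one row per test. A cell is None when the
    test did not run in that column's run.
    """
    runs: List[str] = []
    seen = set()
    cells: Dict[str, Dict[str, str]] = defaultdict(dict)
    for run_id, test, status in results:
        if run_id not in seen:
            seen.add(run_id)
            runs.append(run_id)
        cells[test][run_id] = status
    grid = {test: [row.get(r) for r in runs] for test, row in sorted(cells.items())}
    return runs, grid


def flaky_tests(grid: Grid) -> List[str]:
    """Crude signal: tests that both passed and failed across the runs shown."""
    return [t for t, row in grid.items() if "PASS" in row and "FAIL" in row]


runs, grid = build_grid([
    ("r1", "a", "PASS"), ("r1", "b", "PASS"),
    ("r2", "a", "FAIL"), ("r2", "b", "PASS"),
    ("r3", "a", "PASS"),
])
for test, row in grid.items():
    print(test, row)
print(flaky_tests(grid))  # ['a']
```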
  14. KIND (Kubernetes IN Docker): Hermetic testing

    • Local K8s cluster
    • No external dependencies
    • Integration testing of K8s applications
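A typical hermetic integration test with KIND creates a throwaway local cluster, runs against it, and always tears it down. The sketch below wraps the real `kind create cluster` and `kind delete cluster` commands, with their `--name` and optional `--image` flags, in a Python context manager. It relies on kind's convention of naming the kubeconfig context `kind-<cluster name>`. The helper names are hypothetical, and actually running the clusters needs Docker, kind and kubectl installed.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, List, Optional


def kind_create_cmd(name: str, image: Optional[str] = None) -> List[str]:
    cmd = ["kind", "create", "cluster", "--name", name]
    if image:
        cmd += ["--image", image]  # pin the node image so test runs are reproducible
    return cmd


@contextmanager
def kind_cluster(name: str, image: Optional[str] = None) -> Iterator[str]:
    """Create a throwaway cluster, yield its kubeconfig context, and always delete it."""
    subprocess.run(kind_create_cmd(name, image), check=True)
    try:
        yield f"kind-{name}"  # kind names the context "kind-<cluster name>"
    finally:
        subprocess.run(["kind", "delete", "cluster", "--name", name], check=True)


# Requires Docker, kind and kubectl on PATH, so it is left commented out here:
# with kind_cluster("it-test") as ctx:
#     subprocess.run(["kubectl", "--context", ctx, "get", "nodes"], check=True)
```

The `finally` clause deletes the cluster even if a test fails, so one broken test does not leak a cluster into the next run.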
  15. KIND Demo
  16. Mako

    • Visualize performance data
    • Automate identification of performance regressions
    • Collaborate with developers around performance

    http://mako.dev
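Automated regression detection compares new performance samples against a historical baseline. The slide does not say how Mako's analyzers work. The sketch below swaps in the simplest possible rule: flag the change when the candidate mean moves more than `max_sigma` baseline standard deviations in the bad direction. The function name and default threshold are assumptions.

```python
import statistics
from typing import Sequence


def is_regression(baseline: Sequence[float], candidate: Sequence[float],
                  max_sigma: float = 3.0, higher_is_worse: bool = True) -> bool:
    """baseline: historical samples (at least 2); candidate: samples from the new build."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    delta = statistics.mean(candidate) - mean
    if not higher_is_worse:  # e.g. throughput, where a drop is the regression
        delta = -delta
    if stdev == 0:
        return delta > 0
    return delta / stdev > max_sigma


latency_ms = [100, 102, 98, 101, 99]
print(is_regression(latency_ms, [110, 112, 109]))  # True
print(is_regression(latency_ms, [101, 100, 102]))  # False
```

Real systems usually also handle noisy benchmarks, multiple comparisons and step changes. This rule only captures the basic idea of judging new data against a baseline.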
  17. Jenkins X CI/CD using Prow and Knative

    • Developer creates a PR
    • GitHub sends a PR notification to Prow
    • Prow creates a Knative build
    • The Knative build runs the Jenkinsfile pipeline
    • Prow uploads the results to ResultStore
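The flow on slide 17 is an event chain: a PR notification reaches Prow, Prow starts a Knative build, the build runs the pipeline stages, and Prow uploads the outcome. The toy model below mirrors that ordering with plain functions. Every name in it (`ResultStoreStub`, `run_pipeline`, `on_pull_request`) is a hypothetical stand-in, not an API of Prow, Knative, Jenkins X or ResultStore.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], bool]]


@dataclass
class ResultStoreStub:
    """Stand-in for the results backend; it only records uploads."""
    uploads: List[dict] = field(default_factory=list)


def run_pipeline(pr_number: int, steps: Sequence[Step]) -> dict:
    """Stand-in for the Knative build running a Jenkinsfile pipeline: stop at the first failure."""
    log = []
    for name, step in steps:
        ok = step()
        log.append((name, ok))
        if not ok:
            return {"pr": pr_number, "passed": False, "steps": log}
    return {"pr": pr_number, "passed": True, "steps": log}


def on_pull_request(pr_number: int, steps: Sequence[Step], store: ResultStoreStub) -> bool:
    """Prow's role in the flow: react to the PR event, start the build, upload results."""
    result = run_pipeline(pr_number, steps)
    store.uploads.append(result)
    return result["passed"]


store = ResultStoreStub()
steps = [("build", lambda: True), ("test", lambda: False), ("lint", lambda: True)]
print(on_pull_request(7, steps, store))  # False
print(store.uploads[0]["steps"])         # [('build', True), ('test', False)]
```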
  18. Conformance Tooling

    Directory | master | 1.12 | 1.11
    api/ | 49.8% | 49.6% | 48.1%
    apis/ | 32.8% | 32.6% | 32.4%
    auth/ | 9.8% | 9.8% | 9.8%
    capabilities/ | 36.4% | 36.4% | 36.4%
    client/ | 1.3% | 1.3% | 1.3%
    controller/ | 30.2% | 30.3% | 28.1%
    credentialprovider/ | 11.1% | 11.1% | 11.1%
    features/ | 0.0% | 0.0% | 0.0%
    fieldpath/ | 51.4% | 51.4% | 51.4%
    kubeapiserver/ | 2.8% | 2.8% | 2.8%
    kubectl/ | 0.0% | 0.0% | 0.0%
    kubelet/ | 39.4% | 39.0% | 38.3%
    master/ | 16.1% | 16.1% | 16.1%
    printers/ | 2.1% | 2.1% | 2.1%
    probe/ | 59.2% | 59.2% | 59.2%
    proxy/ | 16.1% | 16.1% | 16.1%
    quota/ | 6.7% | 6.5% | 6.9%
    registry/ | 24.4% | 24.2% | 25.4%
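A table like the one on slide 18 is most useful when it flags directories whose numbers went down between branches. The sketch below assumes the percentages are coverage figures per directory and branch. It copies five rows from the slide and reports every directory whose value on a newer branch is below its value on an older one. `coverage_drops` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Percent values from slide 18, as (master, 1.12, 1.11); a subset of the rows.
COVERAGE: Dict[str, Tuple[float, float, float]] = {
    "api/": (49.8, 49.6, 48.1),
    "controller/": (30.2, 30.3, 28.1),
    "kubelet/": (39.4, 39.0, 38.3),
    "quota/": (6.7, 6.5, 6.9),
    "registry/": (24.4, 24.2, 25.4),
}


def coverage_drops(table, newer: int = 0, older: int = 2, tolerance: float = 0.0) -> Dict[str, float]:
    """Directories whose value on the `newer` column is below the `older` column."""
    drops = {}
    for path, values in table.items():
        delta = round(values[newer] - values[older], 1)  # round away float noise
        if delta < -tolerance:
            drops[path] = delta
    return drops


print(coverage_drops(COVERAGE))                     # {'quota/': -0.2, 'registry/': -1.0}
print(coverage_drops(COVERAGE, newer=0, older=1))   # {'controller/': -0.1}
```

A non-zero `tolerance` lets small fluctuations pass, so only meaningful drops get flagged.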
  19. Knative Conformance Coverage

    Results are displayed on TestGrid.
  20. Validating GCP Anthos: On to a Hybrid Cloud

    [Diagram: Source → Build/Test → Verify → Release. Source: GitHub and Google3; Build/Test: Cloud Build; Verify: GCP and a vSphere lab.]
  21. Anthos Qualification

    • Continuously and reliably validate Anthos binaries
    • Enable teams to integrate easily
    • Provide release readiness signals

    [Diagram: Development (code, Git checkout, presubmit; submit only if tests pass) → Continuous Integration (presubmit and periodic conformance tests against a dynamically created admin and on-prem GKE cluster in a vSphere lab, with cluster management, cluster operators and LogMon) → Reporting (test results on TestGrid).]
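The qualification flow has two decision points: a presubmit gate ("tests passed? yes: submit, no: back to development") and a release readiness signal derived from periodic runs. The sketch below models both as small functions. The gate logic follows the slide. The readiness rule, which requires the last `window` periodic runs to pass, is an assumption, as are all the names.

```python
from typing import Callable, Dict, List, Sequence


def presubmit_gate(change_id: str, tests: Dict[str, Callable[[], bool]]) -> dict:
    """Run every presubmit test; the change may be submitted only if all pass."""
    results = {name: bool(test()) for name, test in tests.items()}
    failed: List[str] = sorted(name for name, ok in results.items() if not ok)
    return {"change": change_id, "submit": not failed, "failed": failed}


def release_ready(periodic_runs: Sequence[bool], window: int = 3) -> bool:
    """Hypothetical readiness signal: the last `window` periodic runs all passed."""
    recent = periodic_runs[-window:]
    return len(recent) == window and all(recent)


verdict = presubmit_gate("change-123", {"conformance": lambda: True, "upgrade": lambda: False})
print(verdict)  # {'change': 'change-123', 'submit': False, 'failed': ['upgrade']}
print(release_ready([True, False, True, True, True]))  # True
```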
  22. To Summarize

    • Engineering Productivity (EngProd) makes quality software development easier and faster.
    • We do this by adhering to our guiding principles: Driving Test Health, Avoiding Performance Regressions, and Ensuring High Quality Releases.
    • Anthos Engineering Productivity tools and how you can benefit from them.
  23. Your Feedback is Greatly Appreciated!

    • Complete the session survey in the mobile app
    • 1-5 star rating system
    • Open field for comments
    • Rate icon in the status bar