Building software at Google Scale

April 12. CSA event at Google.

Lee Boonstra

April 13, 2018


  1. Building software at Google Scale

    How does Google build software, and how can you benefit from this?
    By Lee Boonstra, Customer Engineer, Google Cloud (leeboonstra@google.com)
    Confidential & Proprietary, Google Cloud Platform
  2. Engineering at Google

    1. How the engineering processes at Google work
    2. Our learnings, how we contribute back to open source
    3. From open source to Google Cloud for enterprises
  3. What it takes to be a Google engineer

    Working on problems with SPEED AND SCALE is a challenge. Engineers keep raising the bar on the tools and infrastructure.
    Google Culture:
    • Collaboration and co-development
    • Sharing between products and teams (tools, libraries, services)
    • Engineers have autonomy
    • Agile/Scrum, daily stand-up meetings
  4. Google Repository statistics (as of Jan 2015)

    Total number of files: 1+ billion
    Number of source files: 9 million
    Lines of code: 2+ billion
    Depth of history: 35 million commits
    Size of content: 86 terabytes
  5. Advantages of monolithic repo

    • Unified versioning - One source of truth
    • Extensive code sharing and reuse
    • Collaboration across teams
    • Simplified dependency management
    • Large scale refactoring
    • Flexible team boundaries & code ownership
    • Code visibility
  6. Google uses its own version control system called: Piper

    Workflow steps shown on the slide: Sync workspace, Write code, Code Review, Commit
    Diagram labels:
    • Create personal copy
    • Automated Test / Analysis
    • Code Quality & Syntax Check (by humans and by tooling)
    • Read/Write Access per folder
    • Auto Rollback if needed
    • MANDATORY
    A single code tree, with fast access to the code through tooling. All programming languages. Everyone works in trunk; branches are for releases.
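
The slide only states that read/write access is granted per folder; it does not say how Piper resolves it. As a minimal sketch of the general idea, assuming a hypothetical `WRITERS` table and a "nearest folder decides" rule (both are illustrative choices, not Piper's actual mechanism):

```python
from pathlib import PurePosixPath

# Hypothetical write permissions: folder -> users allowed to commit there.
WRITERS = {
    "/": {"admin"},
    "/search": {"alice"},
    "/search/ranking": {"bob"},
}


def can_write(user, path):
    """Walk from the file's folder up to the root; the nearest folder
    that has an entry in WRITERS decides."""
    folder = PurePosixPath(path).parent
    while True:
        entry = WRITERS.get(str(folder))
        if entry is not None:
            return user in entry
        if folder == folder.parent:  # reached the root without a match
            return False
        folder = folder.parent
```

With this table, `can_write("alice", "/search/index/a.cc")` falls back to the `/search` entry, while `/search/ranking` is governed by its own, more specific entry.
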
  7. Testing at Google

    • Developing & Testing go hand in hand
    • 3 million tests a day
    • 20+ OS and Browser combos
  8. Build systems

    Why do we need build systems? Code has a lot of dependencies, and you don’t want to compile and link them all manually.
    The steps of a general build system:
    1. Loading
    2. Analysis
    3. Execution by build system
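
The three steps of a general build system can be sketched in a few lines. This is a toy illustration, not Google's build system: the `BUILD_FILE` dictionary, the target names, and the commands are all hypothetical, and "execution" only records the commands instead of running a compiler.

```python
from graphlib import TopologicalSorter

# Hypothetical build description: target -> dependencies and a command.
BUILD_FILE = {
    "lib_util": {"deps": [], "cmd": "compile util"},
    "lib_net": {"deps": ["lib_util"], "cmd": "compile net"},
    "app": {"deps": ["lib_net", "lib_util"], "cmd": "link app"},
}


def load(build_file):
    """Loading: read the target definitions and check every dependency exists."""
    for name, target in build_file.items():
        for dep in target["deps"]:
            if dep not in build_file:
                raise KeyError(f"{name} depends on unknown target {dep}")
    return build_file


def analyze(targets):
    """Analysis: build the dependency graph and order it, dependencies first."""
    graph = {name: set(t["deps"]) for name, t in targets.items()}
    return list(TopologicalSorter(graph).static_order())


def execute(targets, order):
    """Execution: run (here: record) each command in dependency order."""
    return [targets[name]["cmd"] for name in order]


targets = load(BUILD_FILE)
order = analyze(targets)
log = execute(targets, order)
```

Because `app` depends on `lib_net`, which depends on `lib_util`, the analysis step yields the order `lib_util`, `lib_net`, `app`, so nothing is linked before its dependencies are compiled.
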
  9. Google’s continuous build and test system

    Google has its own continuous build & test system. Remember, at Google we develop everything at HEAD in the repo. Thanks to cloud computing, it has near-endless CPU capacity and cross-user caching.
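
The slide does not explain how cross-user caching works. One common way to get that effect is to key each build action by a digest of its command and input contents, so a second user with identical inputs reuses the first user's output. A minimal sketch under that assumption; `ActionCache` and `fake_compile` are hypothetical names:

```python
import hashlib


class ActionCache:
    """Shared cache mapping a digest of (command, input contents) to an output."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command, inputs):
        # The digest covers the command and every input file's name and bytes.
        h = hashlib.sha256()
        h.update(command.encode())
        for name in sorted(inputs):
            h.update(b"\0" + name.encode() + b"\0" + inputs[name])
        return h.hexdigest()

    def run(self, command, inputs, action):
        k = self.key(command, inputs)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = action(inputs)
        self._store[k] = result
        return result


def fake_compile(inputs):
    # Stand-in for a real compiler: the output depends only on the inputs.
    return b"".join(inputs[n] for n in sorted(inputs)).upper()


cache = ActionCache()
src = {"main.c": b"int main(){}"}
out_alice = cache.run("cc main.c", src, fake_compile)        # first user: miss
out_bob = cache.run("cc main.c", dict(src), fake_compile)    # second user: hit
out_changed = cache.run("cc main.c", {"main.c": b"int main(){return 1;}"}, fake_compile)
```

The second call is served from the cache because the inputs are byte-identical; changing the source changes the digest and forces a rebuild.
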
  10. Devops at Google

    Product idea → Writing code → Testing → Building → Deploying
  11. Each week Google launches over 4 billion containers.

    Google has been using container technology for more than 10 years.
  12. Enter the container

    [Diagram: three deployment models (bare-metal server, virtual machine, container), each drawn as a stack of Application Code, Dependencies, OS and Hardware]
  13. So, you mean Docker?

    • Docker is a popular software container platform.
    • Containers are a way to package software in a format that can run isolated on a shared operating system.
  14. Enter the container… and new challenges

    • Scheduling, scaling across clusters of servers
    • Networking and connectivity
    • Security and Access control
    • Logging, Monitoring, and Debugging
    • Health checks and uptime preservation
    • ...
  15. Large-scale cluster management at Google with Borg

    • Borg is software that manages all production machines at Google and runs the jobs (binaries) that engineers give it on those machines.
    • Borg ran pretty much everything inside the company, including Google Search, Gmail, Google Maps, Google Docs...
    • These binaries are run in a container environment.
    • When tasks die, they are automatically started up again, and they may run on a different machine.
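
The restart behaviour described for Borg (dead tasks are started again, possibly on a different machine) can be illustrated with a toy reconcile step. This is only a sketch of the idea, not Borg's scheduler: the `Task` class, the task and machine names, and the "least-loaded healthy machine" placement rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    machine: str
    alive: bool = True


def reschedule(tasks, healthy_machines):
    """Restart every dead task on the healthy machine running the fewest tasks."""
    load = {m: 0 for m in healthy_machines}
    for t in tasks:
        if t.alive and t.machine in load:
            load[t.machine] += 1
    restarted = []
    for t in tasks:
        if not t.alive:
            # Ties are broken by machine name so the result is deterministic.
            target = min(sorted(load), key=load.get)
            t.machine, t.alive = target, True
            load[target] += 1
            restarted.append(t.name)
    return restarted


tasks = [
    Task("search-0", "m1"),
    Task("search-1", "m2", alive=False),  # its machine m2 has failed
    Task("gmail-0", "m1"),
]
restarted = reschedule(tasks, ["m1", "m3"])
```

Here `search-1` is brought back on `m3`, the idle healthy machine, rather than on the machine where it originally ran.
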
  16. Site Reliability Engineering

    Product idea → Writing code → Testing → Building → Deploying → SRE
  17. “Hope is not a strategy. Engineering solutions to design, build, and run large-scale systems scalably, reliably and efficiently is a strategy, and a good one.”
  18. Site Reliability Engineering

    • Site Reliability Engineering is a specialized job function that focuses on the reliability and maintainability of large systems.
    • SRE is also a mindset, and a set of engineering approaches to running better production systems.
    • Google has teams of site reliability engineers, each responsible for keeping a service available globally.
    https://landing.google.com/sre/book.html
  19. Google is a leader in Open Source

    • 287,024 commits by Googlers to open source projects on GitHub in 2016
    • 15,000+ projects contributed to in 2016
  20. Google has written many white papers that inspire the big data community.

    • Bigtable
    • GFS
    • Mapreduce
    • Chubby
    • Sawzall
    • Dapper
    • Dremel
    • Borg
    https://research.google.com/
  21. From Google to OSS

    Internal Google                  Open Source
    • Internal Build System          • Bazel
    • Borg Container Orchestration   • Kubernetes
    • Machine Learning               • Tensorflow
    • Go Lang                        • Go Lang
    • Google Chrome                  • Chromium
  22. Tensorflow

    Tensorflow is what we use for our own internal machine learning projects, and now it’s available to you! Google made it open source.
    • More than 480 contributions
    • 10,000 commits in a year
    • 53k star rating
    Tutorials to get started at https://www.tensorflow.org
  23. Bazel

    You will need a build system if you work in teams. Google’s build system is now available as open source. Google has been working on it for more than 10 years, and now you can benefit from it. https://bazel.build/
    • Scalable: Bazel helps you scale your organization, codebase and Continuous Integration system. It handles codebases of any size, in multiple repositories or a huge monorepo.
    • Platform independent: Works on Cloud or On Premise.
    • Any language: Build and test Java, C++, Android, iOS, Go and a wide variety of other language platforms (via extensions).
  24. Kubernetes

    Kubernetes abstracts away the hardware infrastructure and exposes your whole data center as a single enormous computing resource.
    • Multiple container engines (Docker, rkt, Windows)
    • Cloud and bare-metal environments
    • Container Engine = Managed Kubernetes in Google Cloud
    https://kubernetes.io
  25. Kubernetes Open Source Community

    • 50k+ commits in Kubernetes
    • 1,000+ unique contributors
    • Top 0.001% of all GitHub Projects
    • 4000+ External Projects Based on Kubernetes
    Companies Contributing
    Supported by a broad ecosystem of partners, offering you cloud provider flexibility.
  26. Istio

    • A complete framework for connecting, securing, managing and monitoring services
    • Secure and monitor traffic for microservices and legacy services without requiring any changes to application code
    • An open platform with key contributions from Google, IBM, Lyft and others
    • Allows developers to authenticate and secure the communications between different applications using a TLS connection
    • Multi-environment and multi-platform, but Kubernetes first
    https://istio.io
  27. Istio benefits: enabling hybrid

    • GKE on GCP
    • VMs on GCE (or elsewhere)
    • K8s on-prem
    • Vendor-managed K8s. EKS? AKS?
  28. From OSS to Google Cloud

    Open Source              Google Cloud
    • Kubernetes             • Google Kubernetes Engine
    • Istio                  • Managed Istio
    • Tensorflow             • ML Engine
    • MySQL / Postgresql     • Cloud SQL
    • Spark / Hadoop         • Dataproc
    • Apache Beam            • Dataflow
    • iPython                • Datalab
  29. Then we got serious. We built our own hardware for AI.

    Cloud Machine Learning Engine
  30. Learnings: From Google to Google Cloud

    Google
    • Build for Scalability
    • Build for Security
    Google Cloud
    • Build for Enterprise
      ◦ Secure
      ◦ Scalable
      ◦ Compliant
  31. 1+ Billion Users

    • 2 trillion Google searches annually
    • 65 billion app downloads from the Google Play store
    • More than 1 billion people use the Chrome browser on mobile devices every month
    • 200 million people per month use Google’s online photo service, Google Photos
  32. Assessing Threats

    Who is the attacker?
    • Lone-wolves
    • Script kiddies
    • Insider Risk
    • Hacktivist groups
    • Malicious users
    • Criminal organizations
    • Nation-state actors
    How are they attacking?
    • DDoS
    • Spear-phishing
    • Malware
    • XSS
    • Man-in-the-middle
    • User error
    • Social
    • 0-days
    What do they want?
    • $$$$$
    • Intellectual property
    • Espionage
    • Vandalism
    • Public perception
    • Notoriety
  33. Infrastructure security

    Usage: Audit Logging; Safe Browsing API; BeyondCorp; Security Key Enforcement
    Operations: Compliance & Certifications; Live Migration; Infra maintenance & patching; Threat analysis and intelligence; Open Source Forensics tools; Anomaly Detection (Infrastructure); Incident Response (Infrastructure)
    Deployment: Google Services TLS encryption with perfect forward secrecy; Certificate Authority; Free and automatic certificates; DDoS Mitigation (PaaS & SaaS)
    Application: Peer code review & Static Analysis (Infrastructure SDLC); Source code provenance (Infrastructure); Binary Verification (Infrastructure code); WAF (PaaS & SaaS Use cases); IDS/IPS (PaaS & SaaS Use cases); Web Application Scanner (Google Services)
    Network: Infrastructure RPC encryption in transit between data centres; DNS; Global Private Network; Andromeda SDN Controller; Jupiter Datacenter Network; B4 SDN Network
    Storage: Encryption at rest; Logging; Identity and Access Management; Global at scale Key Management Service
    OS + IPC: Hardened KVM Hypervisor; Authentication for each host and each job; Curated Host Images; Encryption of Interservice Communications
    Boot: Trusted Boot; Cryptographic Credentials
    Hardware: Purpose-built Chips; Purpose-built Servers; Purpose-built Storage; Purpose-built Network; Purpose-built Data Centers
  34. Secure yourself on Google Cloud

    Legend: By default; Google products; Partner tools; Other
    Usage: Cloud Audit Logging; Safe Browsing API; Identity-Aware Proxy; Security Key Enforcement
    Operations: Compliance and Certifications; Automatic Updates and Patching; Threat analysis and intelligence; Forensics; Anomaly detection; Incident Response
    Deployment: Google Services TLS encryption with perfect forward secrecy; Certificate Authority; Free and automatic certificates; DDoS Mitigation via GCLB; Alternative DDoS Mitigation Solutions
    Application: Code review & Static Analysis; Source code provenance; Binary verification; WAF; IDS/IPS; Vuln Management
    Network: Cloud DNS; Cloud VPN; Virtual Private Cloud (VPC); Cloud Router; Shared VPC; NGFW
    Storage: Encryption at rest; Logging; Identity and Access Management; Cloud Key Management Service; Customer-Supplied Encryption Keys; Data Loss Protection API
    OS + IPC: Hardened KVM Hypervisor; Authentication for each host and each job; Curated Host Images; Encryption of Interservice Communications
    Boot: Trusted Boot; Cryptographic Credentials
    Hardware: Purpose-built Chips; Purpose-built Servers; Purpose-built Storage; Purpose-built Network; Purpose-built Data Centers
    Other labels on the slide: Login anomalies for Google Identities; Google Managed Infrastructure Foundation; Threat Intelligence; CDN; Cloud Load Balancing; Web Application Scanning; DLP; Secure Config/Assessment/Enforcement
  35. Conclusion

    Google has over a decade of experience building secure software at large scale. Your company can make use of the same infrastructure that Google uses: scalable, secure and open. The learnings are shared through whitepapers and contributed back through open source.