Building software at Google Scale

April 12. CSA event at Google.

Lee Boonstra

April 13, 2018

Transcript

  1. Confidential & Proprietary
    Google Cloud Platform 1
    By Lee Boonstra, Customer Engineer Google Cloud
    [email protected]
    Building software at Google Scale
    How does Google build software & How can you benefit from this

  2. Engineering at Google
    1. How the engineering processes at Google work
    2. Our learnings, and how we contribute back to open source
    3. From open source to Google Cloud for enterprises

  3. Building software at Google

  4. From product to idea 10x
    Product idea X 10

  5. “To organize the world’s information and make it
    universally accessible and useful.”
    - Google

  6. Project Loon:
    Balloon-powered internet for everyone!

  7. Waymo:
    Self-driving car

  8. Prototyping: The first version of Google Glass was created in 90 minutes!

  9. Code Development
    Product idea → Writing code
    public class foo {}

  10. What it takes to be a Google engineer
    Working on problems with SPEED AND SCALE is a challenge.
    Engineers keep raising the bar on the tools and infrastructure.
    Google Culture:
    • Collaboration and co-development
    • Sharing between products and teams (tools, libraries, services)
    • Engineers have autonomy.
    • Agile/Scrum, daily stand-up meetings

  11. Google’s entire codebase is a
    giant single repository of more
    than 2 billion lines of code

  12. Google Repository statistics (as of Jan 2015)
    Total number of files: 1+ billion
    Number of source files: 9 million
    Lines of code: 2+ billion
    Depth of history: 35 million commits
    Size of content: 86 terabytes

  13. Advantages of monolithic repo
    ● Unified versioning: one source of truth
    ● Extensive code sharing and reuse
    ● Collaboration across teams
    ● Simplified dependency management
    ● Large-scale refactoring
    ● Flexible team boundaries & code ownership
    ● Code visibility

  14. Google uses its own version control system called Piper
    Workflow: Sync workspace → Write code → Code Review → Commit
    ● Create personal copy
    ● Read/Write access per folder
    ● Code quality & syntax check (by humans and by tooling)
    ● Automated test / analysis
    ● Auto rollback if needed
    ● MANDATORY
    A single code tree, with fast access to the code through tooling.
    All programming languages.
    Everyone works in trunk. Branches are for releases.

  15. Software testing
    Product idea → Writing code → Testing

  16. Testing at Google
    ● Developing & testing go hand in hand
    ● 3 million tests a day
    ● 20+ OS and browser combos

  17. Build processes
    Product idea → Writing code → Testing → Building

  18. Build systems
    Why do we need build systems?
    Code has a lot of dependencies, and you don’t want to
    compile and link them all manually.
    The steps of a general build system:
    1. Loading
    2. Analysis
    3. Execution by build system
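
The three steps listed above can be sketched in a few lines of Python. This is only an illustration, not how Google's build system or Bazel is implemented: the target names and the `BUILD_FILE` dictionary are hypothetical, and the "actions" are just strings instead of real compiler invocations.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical build description: target name -> list of dependencies.
BUILD_FILE = {
    "app": ["lib_net", "lib_util"],
    "lib_net": ["lib_util"],
    "lib_util": [],
}


def load(build_file):
    """Loading: read the target definitions (here they are already a dict)."""
    return {name: list(deps) for name, deps in build_file.items()}


def analyze(targets):
    """Analysis: order targets so every dependency precedes its dependents."""
    return list(TopologicalSorter(targets).static_order())


def execute(order):
    """Execution: run one (pretend) build action per target, in order."""
    return [f"build {name}" for name in order]


plan = analyze(load(BUILD_FILE))
log = execute(plan)
print(log)
```

Because `lib_util` has no dependencies and `app` depends on everything else, the plan builds `lib_util` first and `app` last.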

  19. Google’s continuous build and test system
    Google has its own continuous build & test system.
    Remember, at Google we develop everything at HEAD in the repo.
    Thanks to cloud computing, it has practically endless CPU and cross-user caching.
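
One way to picture cross-user caching is a shared cache keyed by a hash of everything that goes into a build action. The sketch below is an assumption about the general technique, not a description of Google's internal system: `ActionCache`, `compile_action` and the file names are hypothetical, and the "compiler" only counts characters.

```python
import hashlib


class ActionCache:
    """Toy shared cache: results are keyed by a hash of the action's inputs."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command, sources):
        # Hash the command plus every file name and its content.
        digest = hashlib.sha256(command.encode())
        for name in sorted(sources):
            digest.update(b"\0" + name.encode() + b"\0" + sources[name].encode())
        return digest.hexdigest()

    def run(self, command, sources, action):
        k = self.key(command, sources)
        if k in self._results:
            self.hits += 1  # someone already built exactly this
        else:
            self.misses += 1
            self._results[k] = action(sources)
        return self._results[k]


def compile_action(sources):
    # Stand-in for a real compiler: the "output" is the total source length.
    return sum(len(text) for text in sources.values())


cache = ActionCache()
alice = cache.run("compile", {"a.c": "int main(){}"}, compile_action)
bob = cache.run("compile", {"a.c": "int main(){}"}, compile_action)
changed = cache.run("compile", {"a.c": "int main(){return 1;}"}, compile_action)
print(cache.hits, cache.misses)
```

The second request reuses the first user's result because the inputs hash to the same key; the edited file produces a new key and is built again.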

  20. DevOps at Google
    Product idea → Writing code → Testing → Building → Deploying

  21. Each week Google launches over 4 billion containers.
    Google has been using container technology for more than 10 years.

  22. Enter the container
    Virtual machine: OS, Dependencies, Application Code, Hardware
    Bare-metal server: OS, Dependencies, Application Code, Hardware
    Container: OS, Dependencies, Application Code, Hardware

  23. So, you mean Docker?
    2004 2016
    ● Docker is a popular software container platform.
    ● Containers are a way to package software in a format that can run isolated on a shared operating system.

  24. Enter the container… and new challenges
    ● Scheduling, scaling across clusters of servers
    ● Networking and connectivity
    ● Security and Access control
    ● Logging, Monitoring, and Debugging
    ● Health checks and uptime preservation
    ● ...

  25. Large-scale cluster management at Google with Borg
    2004 2016
    ● Borg is software that manages all production machines at Google and runs the jobs (binaries) that engineers submit to it.
    ● Borg ran pretty much everything inside the company, including Google Search, Gmail, Google Maps, Google Docs...
    ● These binaries run in a container environment.
    ● When tasks die, they are automatically restarted, possibly on a different machine.
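
The restart behaviour in the last bullet can be illustrated with a small sketch. It is not Borg's actual scheduling algorithm: the `reschedule` function, the task names and the "least-loaded machine" rule are assumptions chosen only to show tasks moving off a failed machine.

```python
def reschedule(assignments, machines, failed_machine):
    """Move tasks from a failed machine to the least-loaded healthy machine."""
    healthy = [m for m in machines if m != failed_machine]
    if not healthy:
        raise RuntimeError("no healthy machines left")

    # Count how many tasks each healthy machine already runs.
    load = {m: 0 for m in healthy}
    for machine in assignments.values():
        if machine in load:
            load[machine] += 1

    new_assignments = dict(assignments)
    for task, machine in sorted(assignments.items()):
        if machine == failed_machine:
            # Pick the machine with the fewest tasks; ties broken by name.
            target = min(healthy, key=lambda m: (load[m], m))
            new_assignments[task] = target
            load[target] += 1
    return new_assignments


before = {"search-0": "m1", "gmail-0": "m1", "maps-0": "m2"}
after = reschedule(before, ["m1", "m2", "m3"], failed_machine="m1")
print(after)
```

Tasks that were on the failed machine end up spread over the remaining ones, while `maps-0` stays where it was.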

  26. Site Reliability Engineering
    Product idea → Writing code → Testing → Building → Deploying → SRE

  27. “Hope is not a strategy.
    Engineering solutions to design, build, and run large-scale
    systems scalably, reliably and efficiently is a strategy,
    and a good one.”

  28. Site Reliability Engineering
    ● Site Reliability Engineering is a specialized job function that focuses on the reliability and maintainability of large systems.
    ● SRE is also a mindset, and a set of engineering approaches to running better production systems.
    ● Google has teams of site reliability engineers responsible for keeping a service globally available.
    https://landing.google.com/sre/book.html

  29. Open Source
    Googlers contribute back to the community.

  30. Google is a leader in open source
    ● 287,024 commits by Googlers to open source projects on GitHub in 2016
    ● 15,000+ projects contributed to in 2016

  31. Popular Google open source projects
    https://opensource.google.com

  32. Contributions to other popular open source projects and
    standards by Google

  33. https://research.google.com/
    Google has written many white papers that inspired the big data community.
    ● Bigtable
    ● GFS
    ● MapReduce
    ● Chubby
    ● Sawzall
    ● Dapper
    ● Dremel
    ● Borg

  34. From Google to OSS
    2004 2016
    Internal Google → Open Source
    ● Internal Build System → Bazel
    ● Borg Container Orchestration → Kubernetes
    ● Machine Learning → TensorFlow
    ● Go Lang → Go Lang
    ● Google Chrome → Chromium

  35. TensorFlow
    TensorFlow is what we use for our own internal machine learning projects, and now it’s available to you!
    Google made it open source.
    ● More than 480 contributors
    ● 10,000 commits in a year
    ● 53k GitHub stars
    Tutorials to get started at
    https://www.tensorflow.org

  36. Bazel
    You will need a build system if you work in teams.
    Google’s build system is now available as open source.
    Google has been working on it for more than 10 years.
    Now you can benefit from this.
    https://bazel.build/
    ● Scalable: Bazel helps you scale your organization, codebase and Continuous Integration system. It handles codebases of any size, in multiple repositories or a huge monorepo.
    ● Platform independent: Works in the cloud or on premises.
    ● Any language: Build and test Java, C++, Android, iOS, Go and a wide variety of other language platforms (via extensions).

  37. Kubernetes
    Kubernetes abstracts away the hardware infrastructure and exposes your whole data center as a single enormous computing resource.
    ● Multiple container engines (Docker, rkt, Windows)
    ● Cloud and bare-metal environments
    ● Container Engine = Managed Kubernetes in Google Cloud
    https://kubernetes.io
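
To make the "single enormous computing resource" idea concrete, here is a toy placement routine. It is a simplified first-fit sketch, not the real Kubernetes scheduler: the `place` function, the node and pod names, and the CPU-only resource model are all assumptions for illustration.

```python
def place(pods, nodes):
    """First-fit: put each pod on the first node with enough free CPU."""
    free = dict(nodes)  # copy so the caller's view of the nodes is untouched
    placement = {}
    for pod, cpu in pods.items():
        for node in free:
            if free[node] >= cpu:
                free[node] -= cpu
                placement[pod] = node
                break
        else:
            placement[pod] = None  # nothing fits: the pod stays pending
    return placement, free


nodes = {"node-a": 4, "node-b": 2}       # free CPU cores per node
pods = {"web": 3, "db": 2, "batch": 2}   # requested CPU cores per pod
placement, free = place(pods, nodes)
print(placement)
```

The caller only states what each workload needs; which machine it lands on is decided by the placement logic, and a workload that does not fit anywhere simply waits.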

  38. Kubernetes Open Source Community
    ● 50k+ commits in Kubernetes
    ● 1,000+ unique contributors
    ● Top 0.001% of all GitHub projects
    ● 4,000+ external projects based on Kubernetes
    Companies contributing
    Supported by a broad ecosystem of partners, offering you cloud provider flexibility.

  39. Istio
    ● A complete framework for connecting, securing, managing and monitoring services
    ● Secure and monitor traffic for microservices and legacy services without requiring any changes to application code
    ● An open platform with key contributions from Google, IBM, Lyft and others
    ● Allows developers to authenticate and secure the communications between different applications using a TLS connection
    ● Multi-environment and multi-platform, but Kubernetes first
    https://istio.io

  40. Istio benefits: enabling hybrid
    ● GKE on GCP
    ● VMs on GCE (or elsewhere)
    ● K8s on-prem
    ● Vendor-managed K8s. EKS? AKS?

  41. Google Cloud
    Google infrastructure
    for your company.
    Open Source

  42. Storage Compute

  43. From OSS to Google Cloud
    2004 2016
    Open Source → Google Cloud
    ● Kubernetes → Google Kubernetes Engine
    ● Istio → Managed Istio
    ● TensorFlow → ML Engine
    ● MySQL / PostgreSQL → Cloud SQL
    ● Spark / Hadoop → Dataproc
    ● Apache Beam → Dataflow
    ● IPython → Datalab

  44. Then we got
    serious.
    We built our own
    hardware for AI.
    Cloud Machine
    Learning Engine

  45. Training a large-scale
    machine translation model
    on 32 GPUs
    on ⅛ of a TPU Pod

  46. Learnings From Google to Google Cloud
    2004 2016
    Google
    ● Build for Scalability
    ● Build for Security
    Google Cloud
    ● Build for Enterprise
    ○ Secure
    ○ Scalable
    ○ Compliant

  47. 1+ Billion Users
    ● 2 trillion Google searches annually
    ● 65 billion app downloads from the Google Play store
    ● More than 1 billion people use the Chrome browser on mobile devices every month
    ● 200 million people per month use the online photo service, Google Photos

  48. Underwater Fiber-optic Cables:
    Fast Network infrastructure

  49. Assessing Threats
    Who is the attacker?
    Lone-wolves
    Script kiddies
    Insider Risk
    Hacktivist groups
    Malicious users
    Criminal organizations
    Nation-state actors
    How are they attacking?
    DDoS
    Spear-phishing
    Malware
    XSS
    Man-in-the-middle
    User error
    Social
    0-days
    What do they want?
    $$$$$
    Intellectual property
    Espionage
    Vandalism
    Public perception
    Notoriety

  50. Infrastructure security
    Usage: Audit Logging; Safe Browsing API; BeyondCorp; Security Key Enforcement
    Operations: Compliance & Certifications; Live Migration; Infra maintenance & patching; Threat analysis and intelligence; Open Source Forensics tools; Anomaly Detection (Infrastructure); Incident Response (Infrastructure)
    Deployment: Google Services TLS encryption with perfect forward secrecy; Certificate Authority; Free and automatic certificates; DDoS Mitigation (PaaS & SaaS)
    Application: Peer code review & Static Analysis (Infrastructure SDLC); Source code provenance (Infrastructure); Binary Verification (Infrastructure code); WAF (PaaS & SaaS use cases); IDS/IPS (PaaS & SaaS use cases); Web Application Scanner (Google Services)
    Network: Infrastructure RPC encryption in transit between data centres; DNS; Global Private Network; Andromeda SDN Controller; Jupiter Datacenter Network; B4 SDN Network
    Storage: Encryption at rest; Logging; Identity and Access Management; Global at scale Key Management Service
    OS + IPC: Hardened KVM Hypervisor; Authentication for each host and each job; Curated Host Images; Encryption of Interservice Communications
    Boot: Trusted Boot; Cryptographic Credentials
    Hardware: Purpose-built Chips; Purpose-built Servers; Purpose-built Storage; Purpose-built Network; Purpose-built Data Centers

  51. Hardware
    Hardware Infrastructure: Titan

  53. Secure yourself on Google Cloud by default
    Legend: Google products; Partner tools; Other
    Usage: Cloud Audit Logging; Safe Browsing API; Identity-Aware Proxy; Security Key Enforcement
    Operations: Compliance and Certifications; Automatic Updates and Patching; Threat analysis and intelligence; Forensics; Anomaly detection; Incident Response
    Deployment: Google Services TLS encryption with perfect forward secrecy; Certificate Authority; Free and automatic certificates; DDoS Mitigation via GCLB; Alternative DDoS Mitigation Solutions
    Application: Code review & Static Analysis; Source code provenance; Binary verification; WAF; IDS/IPS; Vuln Management
    Network: Cloud DNS; Cloud VPN; Virtual Private Cloud (VPC); Cloud Router; Shared VPC; NGFW
    Storage: Encryption at rest; Logging; Identity and Access Management; Cloud Key Management Service; Customer-Supplied Encryption Keys; Data Loss Prevention API
    OS + IPC: Hardened KVM Hypervisor; Authentication for each host and each job; Curated Host Images; Encryption of Interservice Communications
    Boot: Trusted Boot; Cryptographic Credentials
    Hardware: Purpose-built Chips; Purpose-built Servers; Purpose-built Storage; Purpose-built Network; Purpose-built Data Centers
    Google Managed Infrastructure Foundation
    Additional boxes: Login anomalies for Google Identities; Threat Intelligence; CDN; Cloud Load Balancing; Web Application Scanning; DLP; Secure Config/Assessment/Enforcement

  54. Conclusion
    ● Google has over a decade of experience building secure software at large scale.
    ● Your company can make use of the same infrastructure that Google does: scalable, secure and open.
    ● The learnings are shared through whitepapers and contributed back through open source.