Building software at Google Scale

April 12. CSA event at Google.

Lee Boonstra

April 13, 2018

Transcript

  1. Confidential & Proprietary
    Google Cloud Platform 1
    By Lee Boonstra, Customer Engineer Google Cloud
    Building software at Google Scale
    How does Google build software, and how can you benefit from it?

  2. Engineering at Google
    1. How the engineering processes at Google work
    2. Our learnings, and how we contribute back to open source
    3. From open source to Google Cloud for enterprises

  3. Building software at Google

  4. From product to idea 10x
    Product idea X 10

  6. “To organize the world’s information and make it
    universally accessible and useful.”
    - Google

  7. Project Loon:
    Balloon-powered internet for everyone!

  8. Waymo:
    Self-driving car

  10. Prototyping: the first version of Google Glass was created in 90 minutes!

  11. Dogfood

  12. Code Development
    Product idea
    Writing code
    public class foo {}

  13. What it takes to be a Google engineer
    Working on problems with SPEED AND SCALE is a challenge.
    Engineers keep raising the bar on the tools and infrastructure.
    Google Culture:
    • Collaboration and co-development
    • Sharing between products and teams (tools, libraries, services)
    • Engineers have autonomy.
    • Agile/Scrum, daily stand-up meetings

  14. Google’s entire codebase is a
    giant single repository of more
    than 2 billion lines of code

  15. Google Repository statistics
    As of Jan 2015:
    Total number of files: 1+ billion
    Number of source files: 9 million
    Lines of code: 2+ billion
    Depth of history: 35 million commits
    Size of content: 86 terabytes

  17. Advantages of a monolithic repo
    ● Unified versioning - One source of truth
    ● Extensive code sharing and reuse
    ● Collaboration across teams
    ● Simplified dependency management
    ● Large-scale refactoring
    ● Flexible team boundaries & code ownership
    ● Code visibility

  18. Google uses its own version control system called Piper
    Workflow: Sync workspace → Write code → Code Review → Commit
    Labels on the workflow diagram:
    ● Create personal copy
    ● Automated Test / Analysis
    ● Code Quality & Syntax Check (by humans and by tooling)
    ● MANDATORY
    ● Read/Write Access per folder
    ● Auto Rollback if needed
    A single code tree, with fast access to the code through tooling.
    All types of programming languages.
    Everyone works in trunk. Branches are for releases.

  19. Software testing
    Product idea
    Writing code
    Testing

  20. Testing at Google
    ● Developing & Testing go hand in hand
    ● 3 million tests a day
    ● 20+ OS and Browser combos

  21. Build processes
    Product idea
    Writing code
    Testing
    Building

  22. Build systems
    Why do we need build systems?
    Code has a lot of dependencies, and you don’t want to
    compile and link them all manually.
    The steps of a general build system:
    1. Loading
    2. Analysis
    3. Execution by build system
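
    The three steps above can be sketched as a toy build tool. This is only
    an illustration in Python, not how Google's internal build system or
    Bazel is implemented; the TARGETS table, the target names and the
    run_build helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical build description: target name -> (dependencies, build action).
TARGETS = {
    "util.o": ([], "compile util.c"),
    "main.o": ([], "compile main.c"),
    "app": (["util.o", "main.o"], "link util.o main.o"),
}


def load(targets):
    """Loading: read the target definitions (here, a plain dict)."""
    return dict(targets)


def analyze(targets):
    """Analysis: turn the dependency graph into a valid build order."""
    graph = {name: set(deps) for name, (deps, _action) in targets.items()}
    return list(TopologicalSorter(graph).static_order())


def execute(targets, order):
    """Execution: run each action after all of its dependencies."""
    log = []
    for name in order:
        _deps, action = targets[name]
        log.append(f"{name}: {action}")  # a real tool would spawn a compiler here
    return log


def run_build(targets):
    loaded = load(targets)
    return execute(loaded, analyze(loaded))


if __name__ == "__main__":
    for line in run_build(TARGETS):
        print(line)
```

    The analysis phase is what makes manual compiling and linking
    unnecessary: the tool derives the order from the declared dependencies.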

  23. Google’s continuous build and test system
    Google has its own continuous build & test system.
    Remember, at Google we develop everything at HEAD in the repo.
    Cloud computing gives it effectively endless CPU and cross-user caching.
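
    Cross-user caching can be pictured as a shared cache keyed by a hash of
    an action's inputs: if anyone has already built the same inputs, the
    stored result is reused. The sketch below is an assumption about the
    general idea, not Google's implementation; ActionCache and build_action
    are hypothetical names.

```python
import hashlib


class ActionCache:
    """Hypothetical shared cache: results are keyed by a hash of the inputs."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command, sources):
        digest = hashlib.sha256(command.encode())
        for name in sorted(sources):  # sort so the key is order-independent
            digest.update(name.encode())
            digest.update(hashlib.sha256(sources[name]).digest())
        return digest.hexdigest()

    def run(self, command, sources, action):
        k = self.key(command, sources)
        if k in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[k] = action(sources)
        return self._results[k]


def build_action(sources):
    # Stand-in for a real compile step: concatenate the source bytes.
    return b"".join(sources[name] for name in sorted(sources))


if __name__ == "__main__":
    cache = ActionCache()
    files = {"a.c": b"int a;", "b.c": b"int b;"}
    cache.run("cc", files, build_action)        # first user: cache miss
    cache.run("cc", dict(files), build_action)  # second user: cache hit
    print(cache.hits, cache.misses)
```

    Because everyone develops at HEAD, many engineers build identical
    inputs, which is what makes a shared cache like this pay off.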

  24. DevOps at Google
    Product idea
    Writing code
    Testing
    Building
    Deploying

  25. Each week Google launches over
    4 billion containers.
    Google has been using container technology
    for more than 10 years.

  26. Enter the container
    Virtual machine:   OS, Dependencies, Application Code, Hardware
    Bare-metal server: OS, Dependencies, Application Code, Hardware
    Container:         OS, Dependencies, Application Code, Hardware

  27. So, you mean Docker?
    2004 2016
    ● Docker is a popular software container platform.
    ● Containers are a way to package software in a
    format that can run isolated on a shared operating
    system.

  28. Enter the container… and new challenges
    ● Scheduling, scaling across clusters of servers
    ● Networking and connectivity
    ● Security and Access control
    ● Logging, Monitoring, and Debugging
    ● Health checks and uptime preservation
    ● ...

  29. Large-scale cluster management at Google with Borg
    ● Borg is software that manages all production machines at Google and
    runs the jobs (binaries) that engineers submit to it on those machines.
    ● Borg runs pretty much everything inside the company, including
    Google Search, Gmail, Google Maps, Google Docs...
    ● These binaries are run in a container environment.
    ● When tasks die, they are automatically restarted, and they
    may run on a different machine.
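
    The restart behaviour in the last bullet can be illustrated with a toy
    scheduler in which tasks die because their machine fails. This is a
    minimal sketch, not Borg's actual algorithm; Cluster, place and
    fail_machine are hypothetical names, and the least-loaded placement
    rule is an assumption made for the example.

```python
class Cluster:
    """Toy cluster manager: keeps every task running on some healthy machine."""

    def __init__(self, machines):
        self.healthy = set(machines)
        self.placement = {}  # task name -> machine name

    def _pick_machine(self):
        if not self.healthy:
            raise RuntimeError("no healthy machines left")
        # Pick the least-loaded healthy machine; ties are broken by name.
        load = {m: 0 for m in self.healthy}
        for machine in self.placement.values():
            if machine in load:
                load[machine] += 1
        return min(sorted(load), key=load.get)

    def place(self, task):
        self.placement[task] = self._pick_machine()
        return self.placement[task]

    def fail_machine(self, machine):
        """Mark a machine as dead and restart its tasks elsewhere."""
        self.healthy.discard(machine)
        moved = [t for t, m in self.placement.items() if m == machine]
        for task in moved:
            del self.placement[task]
        for task in moved:
            self.place(task)
        return moved


if __name__ == "__main__":
    cluster = Cluster(["m1", "m2", "m3"])
    for task in ["search", "mail", "maps"]:
        cluster.place(task)
    cluster.fail_machine("m1")
    print(cluster.placement)
```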

  30. Site Reliability Engineering
    Product idea
    Writing code
    Testing
    Building
    Deploying
    SRE

  31. “Hope is not a strategy.
    Engineering solutions to design, build, and run large-scale
    systems scalably, reliably and efficiently is a strategy,
    and a good one.”

  32. Site Reliability Engineering
    ● Site Reliability Engineering is a specialized job
    function that focuses on the reliability and
    maintainability of large systems.
    ● SRE is also a mindset, and a set of engineering
    approaches to running better production systems.
    ● Google has teams of site reliability engineers
    responsible for keeping a service globally available.
    https://landing.google.com/sre/book.html

  33. Open Source
    Googlers contribute
    back to the community.

  34. Google is a leader in Open Source
    ● 287,024 commits by Googlers to open source projects on GitHub in 2016
    ● 15,000+ projects contributed to in 2016

  35. Popular Google open source projects
    https://opensource.google.com

  36. Contributions to other popular open source projects and
    standards by Google

  37. Google wrote lots of white papers, which inspire the
    big data community.
    https://research.google.com/
    ● Bigtable
    ● GFS
    ● MapReduce
    ● Chubby
    ● Sawzall
    ● Dapper
    ● Dremel
    ● Borg

  38. From Google to OSS
    Internal Google → Open Source
    ● Internal Build System → Bazel
    ● Borg Container Orchestration → Kubernetes
    ● Machine Learning → TensorFlow
    ● Go → Go
    ● Google Chrome → Chromium

  39. TensorFlow
    TensorFlow is what we use for our own internal
    machine learning projects, and now it’s available
    to you!
    Google made it open source.
    ● More than 480 contributors
    ● 10,000 commits in a year
    ● 53k stars on GitHub
    Tutorials to get started at
    https://www.tensorflow.org

  40. Bazel
    You need a build system if you work in teams.
    Google’s build system is now available as open source.
    Google has been working on it for more than 10 years.
    Now you can benefit from it.
    https://bazel.build/
    ● Scalable: Bazel helps you scale your organization,
    codebase and Continuous Integration system. It
    handles codebases of any size, in multiple
    repositories or a huge monorepo.
    ● Platform independent: Works on Cloud or On
    Premise.
    ● Any language: Build and test Java, C++, Android,
    iOS, Go and a wide variety of other language
    platforms (via extensions).

  41. Kubernetes
    Kubernetes abstracts away the hardware
    infrastructure and exposes your whole data center
    as a single enormous computing resource.
    ● Multiple container engines (Docker, rkt,
    Windows)
    ● Cloud and bare-metal environments
    ● Container Engine = Managed Kubernetes in
    Google Cloud
    https://kubernetes.io

  42. Kubernetes Open Source Community
    ● 50k+ commits in Kubernetes
    ● 1,000+ unique contributors
    ● Top 0.001% of all GitHub projects
    ● 4000+ external projects based on Kubernetes
    ● Companies contributing
    Supported by a broad ecosystem of partners, offering you cloud provider flexibility.

  43. Istio
    ● A complete framework for connecting, securing, managing and
    monitoring services
    ● Secure and monitor traffic for microservices and legacy services without
    requiring any changes to application code
    ● An open platform with key contributions from Google, IBM, Lyft and
    others
    ● Allows developers to authenticate and secure the communications
    between different applications using a TLS connection
    ● Multi-environment and multi-platform, but Kubernetes first
    https://istio.io

  44. Istio benefits: enabling hybrid
    ● GKE on GCP
    ● VMs on GCE (or elsewhere)
    ● K8s on-prem
    ● Vendor-managed K8s. EKS? AKS?

  45. Google Cloud
    Google infrastructure
    for your company.
    Open Source

  46. Storage, Compute

  47. From OSS to Google Cloud
    Open Source → Google Cloud
    ● Kubernetes → Google Kubernetes Engine
    ● Istio → Managed Istio
    ● TensorFlow → ML Engine
    ● MySQL / PostgreSQL → Cloud SQL
    ● Spark / Hadoop → Dataproc
    ● Apache Beam → Dataflow
    ● IPython → Datalab

  48. Then we got
    serious.
    We built our own
    hardware for AI.
    Cloud Machine
    Learning Engine

  49. Training a large-scale
    machine translation model
    on 32 GPUs
    on ⅛ of a TPU Pod

  50. Learnings From Google to Google Cloud
    Google
    ● Build for Scalability
    ● Build for Security
    Google Cloud
    ● Build for Enterprise
    ○ Secure
    ○ Scalable
    ○ Compliant

  51. 1+ Billion Users
    ● 2 trillion Google searches annually
    ● 65 billion app downloads from the Google Play store
    ● More than 1 billion people use the Chrome browser on mobile devices every month
    ● 200 million people per month use the online photo service, Google Photos

  52. Underwater Fiber-optic Cables:
    Fast Network infrastructure

  53. Assessing Threats
    Who is the attacker?
    Lone wolves
    Script kiddies
    Insider Risk
    Hacktivist groups
    Malicious users
    Criminal organizations
    Nation-state actors
    How are they attacking?
    DDoS
    Spear-phishing
    Malware
    XSS
    Man-in-the-middle
    User error
    Social
    0-days
    What do they want?
    $$$$$
    Intellectual property
    Espionage
    Vandalism
    Public perception
    Notoriety

  54. Infrastructure security
    Usage: Audit Logging; Safe Browsing API; BeyondCorp; Security Key Enforcement
    Operations: Compliance & Certifications; Live Migration Infra maintenance & patching; Threat analysis and intelligence; Open Source Forensics tools; Anomaly Detection (Infrastructure); Incident Response (Infrastructure)
    Deployment: Google Services TLS encryption with perfect forward secrecy; Certificate Authority; Free and automatic certificates; DDoS Mitigation (PaaS & SaaS)
    Application: Peer code review & Static Analysis (Infrastructure SDLC); Source code provenance (Infrastructure); Binary Verification (Infrastructure code); WAF (PaaS & SaaS Use cases); IDS/IPS (PaaS & SaaS Use cases); Web Application Scanner (Google Services)
    Network: Infrastructure RPC encryption in transit between data centres; DNS; Global Private Network; Andromeda SDN Controller; Jupiter Datacenter Network; B4 SDN Network
    Storage: Encryption at rest; Logging; Identity and Access Management; Global at scale Key Management Service
    OS + IPC: Hardened KVM Hypervisor; Authentication for each host and each job; Curated Host Images; Encryption of Interservice Communications
    Boot: Trusted Boot; Cryptographic Credentials
    Hardware: Purpose-built Chips; Purpose-built Servers; Purpose-built Storage; Purpose-built Network; Purpose-built Data Centers

  55. Hardware
    Hardware Infrastructure: Titan

  57. Secure yourself on Google Cloud By default
    Legend: Google products; Partner tools; Other
    Usage: Cloud Audit Logging; Safe Browsing API; Identity-Aware Proxy; Security Key Enforcement
    Operations: Compliance and Certifications; Automatic Updates and Patching; Threat analysis and intelligence; Forensics; Anomaly detection; Incident Response
    Deployment: Google Services TLS encryption with perfect forward secrecy; Certificate Authority; Free and automatic certificates; DDoS Mitigation via GCLB; Alternative DDoS Mitigation Solutions
    Application: Code review & Static Analysis; Source code provenance; Binary verification; WAF; IDS/IPS; Vuln Management
    Network: Cloud DNS; Cloud VPN; Virtual Private Cloud (VPC); Cloud Router; Shared VPC; NGFW
    Storage: Encryption at rest; Logging; Identity and Access Management; Cloud Key Management Service; Customer-Supplied Encryption Keys; Data Loss Prevention API
    OS + IPC: Hardened KVM Hypervisor; Authentication for each host and each job; Curated Host Images; Encryption of Interservice Communications
    Boot: Trusted Boot; Cryptographic Credentials
    Hardware: Purpose-built Chips; Purpose-built Servers; Purpose-built Storage; Purpose-built Network; Purpose-built Data Centers
    Other labels on the diagram (layer not recoverable from the transcript): Login anomalies for Google Identities; Google Managed Infrastructure Foundation; Threat Intelligence; CDN; Cloud Load Balancing; Web Application Scanning; DLP; Secure Config/Assessment/Enforcement

  58. Conclusion
    ● Google has over a decade of experience building
    secure software at large scale.
    ● Your company can make use of the same
    infrastructure as Google does. Scalable, Secure and Open.
    ● The learnings are shared through whitepapers and
    contributed back through open source.