Google's Production Environment

Google datacenters are very different from most conventional datacenters and small-scale server farms. These differences present both extra problems and opportunities. This talk discusses the challenges and opportunities that characterize Google datacenters. It is based on the Google SRE book: https://landing.google.com/sre/sre-book/chapters/production-environment/

Florian Rathgeber

February 08, 2020

Transcript

  1. #GoogleSandbox Florian, Site Reliability Engineer, Google Cloud
    SRE for 2+ years • On the Cloud Console SRE team • Spend most of my time on SLOs
    Previous life • Computational Scientist @ Imperial College • Data Engineer @ ECMWF
    Co-founded PyData London
  2. Google Data Centers
    • B4 • Edge Network • GSLB • Jupiter • GFE
    Map legend: Current Regions & Number of Zones; Future Regions & Number of Zones
    https://cloud.google.com/about/locations/
  3. Data Center Setup
    • Campus • Data center • Cluster • Row • Rack • Machine
  4. Cluster Management
    Diagram labels: Scheduler • BorgMaster • Persistent store • Cluster Config Files • Tools • Borglet (×3)
    BNS addresses: /bns/<cluster>/<user>/<job name>/<task number>
  5. Lock Service
    Chubby: consistent data, e.g. BNS paths -> IP addresses, master election
    Diagram labels: Chubby • Paxos • Cluster (each repeated)
  6. Monitoring
    Diagram labels: Server • Prober • Scraping • Borgmon Cluster (repeated) • Borgmon Global • Time Series Database • Alert Manager 1 • Data • Alerts
  7. Server Communication
    Diagram labels: Service • Client • Stubby Server • Stubby Stub (×2) • C++ • Java • Ruby • protobuf request • protobuf response
  8. Code Repository
    Diagram labels: Piper Code Repository • Author • Changelist • change • Reviewer: "Looks Good To Me" • Owner Approval • Presubmit Checks OK! • submit...done!
  9. Continuous Build and Deployment
    Diagram labels: Piper Code Repository • Blaze • Continuous Testing Framework • Binaries • Tests (PASS, FAIL, PASS, ...) • MPM • Rapid • Sisyphus • Production
  10. Tying it all together...
    • Develop the software: Piper, Blaze
    • Build the MPMs: Rapid
    • Run it in a cluster: Borg, which uses Chubby
    • Route requests/responses: GFE, GSLB, ProtoBuf, Stubby
    • Store and read messages: Colossus, Bigtable, Spanner
    • Monitor and fire alerts: Borgmon
    • Roll out new versions: Sisyphus
  11. List of related open-source projects
    • Cluster management: Kubernetes kubernetes.io
    • Lock service: ZooKeeper zookeeper.apache.org, etcd coreos.com/etcd
    • Storage: HDFS hadoop.apache.org, Cassandra cassandra.apache.org
    • Monitoring: Prometheus prometheus.io
    • RPC: gRPC grpc.io
    • Data serialization: Protocol Buffers developers.google.com/protocol-buffers
    • Google style guides github.com/google/styleguide
    • The Go programming language golang.org
    • Code repository: Git git-scm.com
    • Code review: Rietveld github.com/rietveld-codereview/rietveld
    • Building: Bazel bazel.io
  12. Cover images used with permission. These books can be found on shop.oreilly.com
    The full text of the Google SRE books is available at www.google.com/sre
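
Slide 4 gives the BNS address format as `/bns/<cluster>/<user>/<job name>/<task number>`. As a minimal sketch of how such a path breaks down into its four fields, the Python below parses and re-formats one. This is not Borg's actual tooling: the names `BnsAddress`, `parse_bns`, and `format_bns` are hypothetical, and the code assumes that no field contains a `/` and that the task number is a non-negative decimal integer, neither of which the slides state.

```python
from typing import NamedTuple


class BnsAddress(NamedTuple):
    """The four fields of a BNS path (hypothetical helper type)."""
    cluster: str
    user: str
    job_name: str
    task_number: int


def parse_bns(path: str) -> BnsAddress:
    """Split /bns/<cluster>/<user>/<job name>/<task number> into its fields."""
    parts = path.split("/")
    # A well-formed path splits into ["", "bns", cluster, user, job, task].
    if len(parts) != 6 or parts[0] != "" or parts[1] != "bns":
        raise ValueError(f"not a BNS path: {path!r}")
    cluster, user, job_name, task = parts[2:]
    if not (cluster and user and job_name):
        raise ValueError(f"empty field in BNS path: {path!r}")
    # Assumption: task numbers are plain non-negative decimal integers.
    if not (task.isascii() and task.isdigit()):
        raise ValueError(f"task number is not an integer: {task!r}")
    return BnsAddress(cluster, user, job_name, int(task))


def format_bns(addr: BnsAddress) -> str:
    """Rebuild the path string from its fields."""
    return f"/bns/{addr.cluster}/{addr.user}/{addr.job_name}/{addr.task_number}"
```

For example, with made-up field values, `parse_bns("/bns/cc/alice/webserver/3")` yields a `BnsAddress` whose `task_number` is the integer 3, and `format_bns` turns it back into the same string.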