
Deep Dive into Google Cloud Technology

for YAPC Asia Tokyo 2015

GoogleCloudPlatformJapan

August 21, 2015

Transcript

  1. Deep Dive into Google Cloud Technology YAPC Asia Tokyo 2015

    #yapcasia #yapcasiaD
  2. Kazunori Sato (+Kazunori Sato, @kazunori_279), Developer Advocate, Cloud Platform, Google Inc.

    Cloud community advocacy; Cloud product launch support
  3. is:

  4. None
  5. Enterprise

  6. [Timeline figure; years shown: 2003, 2004, 2006, 2010, 2011, 2012, 2015]

    GFS, MapReduce, Bigtable, Chubby, Borg, Colossus, Dremel, Spanner
  7. None
  8. Building what’s next

  9. The Google Cloud Technology

    1. Big Data 2. Container 3. Networking 4. The Future
  10. Confidential & Proprietary Google Cloud Platform 10 Big Data

  11. 1 B 1 B 100 B 900 M
  12. At Google, MapReduce is classic. We use Dremel, FlumeJava and

    MillWheel.
  13. The World Beyond MapReduce

    [Timeline figure; years shown: 2002, 2004, 2006, 2008, 2010, 2012, 2013] GFS, MapReduce, Dremel, FlumeJava, MillWheel, BigQuery, Cloud Dataflow
  14. None
  15. Google BigQuery Demo:

    RegEx + GROUP BY on 100 B rows response read RegEx 100 B shuffled 278 GB ~10 sec 4 TB
  16. SELECT your_data FROM billions_of_rows WHERE full_disk_scan_required = true; Scanning 1

    TB in 1 sec with 5,000 - 10,000 disk spindles
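
A quick back-of-the-envelope check of slide 16's figures, as a minimal sketch: if a 1 TB scan is spread evenly over the disks, each disk has to sustain roughly 100 to 200 MB/s of sequential reads. Decimal units (1 TB = 10^12 bytes) and a perfectly even spread are assumptions made here, not statements from the slide.

```python
# Back-of-the-envelope check of the slide's numbers.
# Assumption: decimal units and an even spread of data across disks.
TB = 10**12  # bytes
MB = 10**6   # bytes


def per_disk_throughput_mb_s(total_bytes: int, seconds: float, disks: int) -> float:
    """Sequential read rate (MB/s) each disk must sustain for the scan."""
    return total_bytes / seconds / disks / MB


if __name__ == "__main__":
    for disks in (5_000, 10_000):
        rate = per_disk_throughput_mb_s(1 * TB, 1.0, disks)
        print(f"{disks} disks: {rate:.0f} MB/s per disk")
```

Those per-disk rates are in the range of ordinary spinning disks, which is why the slide frames the scan speed as a matter of parallelism rather than of faster hardware.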
  17. [Query tree diagram: Mixer 0, Mixer 1, Mixer 1, Shard x 4, Colossus]

    Query fragments on the diagram: SELECT state, year; COUNT(*) GROUP BY state; WHERE year >= 1980 and year < 1990; ORDER BY count_babies DESC LIMIT 10; COUNT(*) GROUP BY state
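
The mixer and shard tree on slide 17 can be illustrated with a small sketch: each shard filters its own rows and computes a partial COUNT(*) GROUP BY state, mixers sum the partial counts, and the root applies the ordering and the limit. The rows, the two-level layout and the function names below are hypothetical, chosen only for illustration; this is not Dremel's actual implementation.

```python
from collections import Counter

# Hypothetical (state, year) rows; each inner list stands in for one shard's slice.
SHARDS = [
    [("CA", 1981), ("TX", 1985), ("CA", 1979)],
    [("CA", 1989), ("NY", 1983), ("TX", 1990)],
    [("NY", 1984), ("CA", 1982), ("TX", 1988)],
]


def shard_partial(rows, lo=1980, hi=1990):
    """Leaf: apply the WHERE filter, then a partial COUNT(*) GROUP BY state."""
    return Counter(state for state, year in rows if lo <= year < hi)


def mix(partials):
    """Mixer: merge partial counts by summing them per state."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def root(partials, limit=10):
    """Root mixer: final merge, then ORDER BY count DESC LIMIT n."""
    merged = mix(partials)
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Two intermediate mixers, each covering a subset of the shards.
level1 = [
    mix([shard_partial(s) for s in SHARDS[:2]]),
    mix([shard_partial(SHARDS[2])]),
]
result = root(level1)

if __name__ == "__main__":
    print(result)
```

Because counts are summed per key, the merge gives the same answer however the shards are grouped under the mixers.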
  18. Super fast Shuffling: 1B x 1B JOIN in 30 sec. How?

    [Diagram: three Dremel Shards shuffling to three Dremel Shards, marked "?"]
  19. But Dremel is not a silver bullet. What about complex

    batch or real-time stream processing?
  20. None
  21. None
  22. Cloud Dataflow: FlumeJava + MillWheel

    Fully Managed & Optimized; Super fast shuffling; Exactly-Once pipeline; Batch + Streaming
  23. Example: Autocomplete (Tweets in, Predictions out)

    read: #argentina scores, my #art project, watching #armenia vs #argentina
    ExtractTags: #argentina #art #armenia #argentina
    Count: (argentina, 5M) (art, 9M) (armenia, 2M)
    ExpandPrefixes: a->(argentina,5M) ar->(argentina,5M) arg->(argentina,5M) ar->(art, 9M) ...
    Top(3), write: a->[apple, art, argentina] ar->[art, argentina, armenia]
    Pipeline p = Pipeline.create(); p.begin() .apply(TextIO.Read.from(...)) .apply(ParDo.of(new ExtractTags())) .apply(Count.create()) .apply(ParDo.of(new ExpandPrefixes())) .apply(Top.largestPerKey(3)) .apply(TextIO.Write.to(...)); p.run();
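
The stages named on slide 23 (ExtractTags, Count, ExpandPrefixes, Top(3)) can be sketched in plain Python to show what each step produces. This is an in-memory illustration only, not the Dataflow SDK: the function names mirror the slide, while the regular expression, the lowercasing and the alphabetical tie-break are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def extract_tags(tweets):
    """ExtractTags: emit every hashtag (without '#') found in each tweet."""
    for tweet in tweets:
        for tag in re.findall(r"#(\w+)", tweet):
            yield tag.lower()


def expand_prefixes(counts):
    """ExpandPrefixes: emit (prefix, (tag, count)) for every prefix of each tag."""
    for tag, n in counts.items():
        for i in range(1, len(tag) + 1):
            yield tag[:i], (tag, n)


def top_per_key(pairs, k=3):
    """Top(k): keep the k highest-count tags per prefix (ties broken alphabetically)."""
    by_prefix = defaultdict(list)
    for prefix, candidate in pairs:
        by_prefix[prefix].append(candidate)
    return {
        prefix: [tag for tag, _ in sorted(cands, key=lambda c: (-c[1], c[0]))[:k]]
        for prefix, cands in by_prefix.items()
    }


tweets = [
    "#argentina scores",
    "my #art project",
    "watching #armenia vs #argentina",
]
counts = Counter(extract_tags(tweets))                    # Count
predictions = top_per_key(expand_prefixes(counts), k=3)   # ExpandPrefixes + Top(3)

if __name__ == "__main__":
    print(predictions["ar"])
```

On the slide the same stages are chained with `.apply(...)`, and the service distributes each step; here every step is an ordinary function over in-memory data.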
  24. Streaming made easy

    Pipeline p = Pipeline.create(new PipelineOptions()); p.begin() .apply(PubsubIO.Read.topic("input_topic")) .apply(Window.into(SlidingWindows.of(Duration.standardMinutes(60)))) .apply(ParDo.of(new ExtractTags())) .apply(Count.perElement()) .apply(ParDo.of(new ExpandPrefixes())) .apply(Top.largestPerKey(3)) .apply(PubsubIO.Write.topic("output_topic")); p.run();
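
The streaming version on slide 24 differs from the batch one mainly in the windowing step. As a rough sketch of what sliding windows do, the code below assigns each timestamped element to every window that contains it and counts tags per window. The slide only gives the 60-minute window size; the 10-minute period, the integer-second timestamps and the function names are assumptions for illustration, and this is not how the Dataflow SDK implements windowing.

```python
from collections import Counter, defaultdict


def sliding_windows(ts, size, period):
    """Return all [start, end) windows of length `size`, starting at
    multiples of `period`, that contain timestamp `ts`."""
    start = ts - (ts % period)  # latest window start at or before ts
    windows = []
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return windows


def count_per_window(events, size, period):
    """Count tags separately in every sliding window an event falls into."""
    counts = defaultdict(Counter)
    for ts, tag in events:
        for window in sliding_windows(ts, size, period):
            counts[window][tag] += 1
    return counts


# Hypothetical (timestamp in seconds, tag) events.
events = [(4000, "art"), (4300, "art"), (7500, "argentina")]
counts = count_per_window(events, size=3600, period=600)

if __name__ == "__main__":
    for window in sorted(counts):
        print(window, dict(counts[window]))
```

With these numbers each element lands in size / period = 6 overlapping windows, so the later stages run once per window rather than once over the whole input.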
  25. Container

  26. Google confidential | Do not distribute Every Google service runs

    on Borg 2 B containers every week
  27. None
  28. Borg: No VMs, pure containers; manages 10K machines / Cell

    DC-scale proactive job scheduling (CPU, mem, disk IO, TCP ports); Paxos-based metadata store
  29. one machine

  30. Google App Engine: Borg for Everyone

  31. Start up in ~40 ms, scales to 1 M req/s
  32. Kubernetes (k8s) 1.0 is open source and Docker based. Google Container Engine is a fully managed k8s.

    [Diagram: Kubernetes Master, API Server, Scheduler, Replication Controller, Kube-UI, four Kubelets]
  33. Google Cloud Storage Nearline is cheap, because disk IO (not

    space) is the cost.
  34. Networking

  35. We build our network from scratch.

  36. 82 Tbps 1.3 Pbps

  37. None
  38. Jupiter network: 40 G ports; 10 G x 100 K

    = 1 Pbps total; CLOS topology; Software Defined Network
  39. Google Compute Engine

    Inter-zone iperf speed: 9 Gbps; inter-region private network: by default
  40. GCE Load Balancer: One Global IP, Multi-region LB/fail-over, 1 M req/s

    [Diagram: global IP 11.22.33.44 in front of VMs in EU, Asia and US]
  41. The Future

  42. None
  43. None
  44. None
  45. None
  46. None
  47. Vision API for Google Play services

  48. ...and More

  49. The Google Cloud Technology: Summary

    1. Big Data: the World beyond MapReduce 2. Container: from Borg to k8s 3. Networking: Google is the Network 4. The Future: is Now
  50. Thank you