Deep Dive into Google Cloud Technology

for YAPC Asia Tokyo 2015

GoogleCloudPlatformJapan

August 21, 2015

Transcript

  1. 2.

    +Kazunori Sato, @kazunori_279
    Kazunori Sato, Developer Advocate, Cloud Platform, Google Inc.
    Cloud community advocacy
    Cloud product launch support
  2. 3.

    is:

  3. 4.
  4. 6.

    [Timeline figure. Years shown: 2003, 2004, 2006, 2010, 2011, 2012, 2015. Technologies shown: GFS, MapReduce, Bigtable, Chubby, Borg, Dremel, Colossus, Spanner]
  5. 7.
  6. 12.

    At Google, MapReduce is classic. We use Dremel, FlumeJava and MillWheel.

    Confidential & Proprietary, Google Cloud Platform
  7. 13.

    The World Beyond MapReduce

    [Timeline figure. Years shown: 2002, 2004, 2006, 2008, 2010, 2012, 2013. Technologies shown: GFS, MapReduce, Dremel, FlumeJava, MillWheel, BigQuery, Cloud Dataflow]
  8. 14.
  9. 15.

    Google BigQuery Demo: RegEx + GROUP BY on 100 B rows

    Response: ~10 sec
    Read: 4 TB
    RegEx: 100 B
    Shuffled: 278 GB
  10. 17.

    [Diagram: query tree with Mixer 0, two Mixer 1 nodes, four Shards, and Colossus]

    Query fragments shown on the tree:
      SELECT state, year
      COUNT(*) GROUP BY state
      WHERE year >= 1980 and year < 1990
      ORDER BY count_babies DESC LIMIT 10
      COUNT(*) GROUP BY state
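The mixer and shard tree on this slide suggests a two-stage aggregation: each shard filters and counts its own rows, and the mixers sum those partial counts before the final ORDER BY and LIMIT. The following is a minimal in-memory sketch of that idea in plain Java, not the actual Dremel or BigQuery implementation. The class `QueryTreeSketch`, the `Row` type, and all method names are hypothetical, and breaking ties alphabetically is an arbitrary choice made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QueryTreeSketch {
    // Hypothetical row type: one record with a state and a year.
    static final class Row {
        final String state;
        final int year;

        Row(String state, int year) {
            this.state = state;
            this.year = year;
        }
    }

    // Shard stage: apply the WHERE filter, then compute a partial COUNT(*) GROUP BY state.
    static Map<String, Long> shardCount(List<Row> rows) {
        Map<String, Long> partial = new HashMap<>();
        for (Row r : rows) {
            if (r.year >= 1980 && r.year < 1990) {
                partial.merge(r.state, 1L, Long::sum);
            }
        }
        return partial;
    }

    // Mixer stage: sum the partial counts coming from the children.
    static Map<String, Long> mix(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((state, count) -> merged.merge(state, count, Long::sum));
        }
        return merged;
    }

    // Root stage: ORDER BY count DESC (ties alphabetical), then LIMIT.
    static List<String> topStates(Map<String, Long> counts, int limit) {
        List<String> states = new ArrayList<>(counts.keySet());
        states.sort((a, b) -> {
            int byCount = Long.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(states.subList(0, Math.min(limit, states.size())));
    }

    public static void main(String[] args) {
        // Two made-up shards, purely for illustration.
        List<Row> shard1 = Arrays.asList(new Row("CA", 1985), new Row("TX", 1981));
        List<Row> shard2 = Arrays.asList(new Row("CA", 1989), new Row("NY", 1995));
        Map<String, Long> merged = mix(Arrays.asList(shardCount(shard1), shardCount(shard2)));
        System.out.println(topStates(merged, 10));
    }
}
```

Because counts are sums, a mixer never needs the raw rows, only the per-state partial counts from its children.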
  11. 18.

    Super fast shuffling: 1B x 1B JOIN in 30 sec. How?

    [Diagram: two groups of three Dremel Shards, marked with a question mark]
  12. 19.

    But Dremel is not a silver bullet. What about complex batch or real-time stream processing?
  13. 20.
  14. 21.
  15. 22.

    Cloud Dataflow: FlumeJava + MillWheel

    Fully Managed & Optimized
    Super fast shuffling
    Exactly-Once pipeline
    Batch + Streaming
  16. 23.

    Example: Autocomplete

    Tweets (read):   "#argentina scores", "my #art project", "watching #armenia vs #argentina"
    ExtractTags:     #argentina  #art  #armenia  #argentina
    Count:           (argentina, 5M)  (art, 9M)  (armenia, 2M)
    ExpandPrefixes:  a->(argentina,5M)  ar->(argentina,5M)  arg->(argentina,5M)  ar->(art, 9M)  ...
    Top(3):          a->[apple, art, argentina]  ar->[art, argentina, armenia]
    Predictions (write)

    Pipeline p = Pipeline.create();
    p.begin()
     .apply(TextIO.Read.from(...))
     .apply(ParDo.of(new ExtractTags()))
     .apply(Count.create())
     .apply(ParDo.of(new ExpandPrefixes()))
     .apply(Top.largestPerKey(3))
     .apply(TextIO.Write.to(...));
    p.run();
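The four transforms in the autocomplete example (ExtractTags, Count, ExpandPrefixes, Top) can be imitated without the Dataflow SDK to see what each step produces. The sketch below uses only the Java standard library and runs on a single machine, so it illustrates the data flow, not the distributed execution. The class `AutocompleteSketch`, its method names, the hashtag regex, and the alphabetical tie-break are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AutocompleteSketch {
    // Assumed hashtag syntax: '#' followed by one or more word characters.
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    // ExtractTags: emit every hashtag in a tweet, lower-cased and without '#'.
    static List<String> extractTags(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group(1).toLowerCase());
        }
        return tags;
    }

    // Count: number of occurrences of each tag across all tweets.
    static Map<String, Long> countTags(List<String> tweets) {
        Map<String, Long> counts = new TreeMap<>();
        for (String tweet : tweets) {
            for (String tag : extractTags(tweet)) {
                counts.merge(tag, 1L, Long::sum);
            }
        }
        return counts;
    }

    // ExpandPrefixes + Top(n): group tags under each of their prefixes,
    // then keep the n most frequent tags per prefix (ties alphabetical).
    static Map<String, List<String>> topPerPrefix(Map<String, Long> counts, int n) {
        Map<String, List<String>> byPrefix = new TreeMap<>();
        for (String tag : counts.keySet()) {
            for (int len = 1; len <= tag.length(); len++) {
                byPrefix.computeIfAbsent(tag.substring(0, len), k -> new ArrayList<>()).add(tag);
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : byPrefix.entrySet()) {
            List<String> tags = new ArrayList<>(e.getValue());
            tags.sort((a, b) -> {
                int byCount = Long.compare(counts.get(b), counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            });
            result.put(e.getKey(), new ArrayList<>(tags.subList(0, Math.min(n, tags.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        // The three sample tweets from the slide.
        List<String> tweets = Arrays.asList(
                "#argentina scores", "my #art project", "watching #armenia vs #argentina");
        System.out.println(topPerPrefix(countTags(tweets), 3));
    }
}
```

With only the three sample tweets as input, `argentina` occurs twice and therefore ranks first under every prefix it shares with the other tags; the counts in the millions on the slide presumably come from a much larger input.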
  17. 24.

    Streaming made easy

    Pipeline p = Pipeline.create(new PipelineOptions());
    p.begin()
     .apply(PubsubIO.Read.topic("input_topic"))
     .apply(Window.into(SlidingWindows.of(Duration.standardMinutes(60))))
     .apply(ParDo.of(new ExtractTags()))
     .apply(Count.perElement())
     .apply(ParDo.of(new ExpandPrefixes()))
     .apply(Top.largestPerKey(3))
     .apply(PubsubIO.Write.topic("output_topic"));
    p.run();
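The streaming version differs from the batch one mainly in the sliding window, which makes each element count toward several overlapping 60-minute windows. As a rough illustration of that assignment, the sketch below computes the start times of every window that contains a given timestamp. It is plain Java, not the Dataflow SDK; the class and method names are hypothetical, and the window period (how often a new window starts) is an assumed parameter, since the slide passes only the window size.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSketch {
    // Returns the start times (in ms) of all windows [start, start + size)
    // that contain the timestamp, assuming windows start at multiples of the period.
    static List<Long> windowStarts(long timestampMillis, long sizeMillis, long periodMillis) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMillis - Math.floorMod(timestampMillis, periodMillis);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMillis - sizeMillis; start -= periodMillis) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = Duration.ofMinutes(60).toMillis();
        long period = Duration.ofMinutes(15).toMillis(); // assumed value, not from the slide
        long timestamp = Duration.ofMinutes(125).toMillis();
        for (long start : windowStarts(timestamp, size, period)) {
            System.out.println("window starts at minute " + Duration.ofMillis(start).toMinutes());
        }
    }
}
```

With a 60-minute size and a 15-minute period, every element lands in four windows, so the downstream count and top-3 steps run once per window rather than once over all data.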
  18. 27.
  19. 28.

    Borg

    No VMs, pure containers
    Manages 10K machines / Cell
    DC-scale proactive job scheduling (CPU, mem, disk IO, TCP ports)
    Paxos-based metadata store
  20. 32.

    [Diagram: four Kubelets; Kubernetes Master with Replication Controller, Scheduler, and API Server; Kube-UI]

    Kubernetes (k8s) 1.0 is open source, Docker based
    Google Container Engine is a fully managed k8s
  21. 37.
  22. 38.

    Jupiter network

    40 G ports
    10 G x 100 K = 1 Pbps total
    CLOS topology
    Software Defined Network
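The total on this slide can be checked with a line of arithmetic, reading "10 G" as 10 Gbit/s per port and "100 K" as 100,000 ports (an interpretation of the slide's shorthand, not something the slide states outright):

```latex
\[
10\ \text{Gbit/s} \times 100{,}000
  = 10^{10}\ \text{bit/s} \times 10^{5}
  = 10^{15}\ \text{bit/s}
  = 1\ \text{Pbit/s}
\]
```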
  23. 40.

    GCE Load Balancer

    One Global IP, Multi-region LB/fail-over, 1 M req/s

    [Diagram: one IP address (11.22.33.44) in front of VMs in the EU, Asia, and US]
  24. 42.
  25. 43.
  26. 44.
  27. 45.
  28. 46.
  29. 49.

    The Google Cloud Technology: Summary

    1 Big Data: the World beyond MapReduce
    2 Container: from Borg to k8s
    3 Networking: Google is the Network
    4 The Future: is Now
  30. 50.