Slide 1

Slide 1 text

Deep Dive into Google Cloud Technology YAPC Asia Tokyo 2015 #yapcasia #yapcasiaD

Slide 2

Slide 2 text

Kazunori Sato (+Kazunori Sato, @kazunori_279)
Developer Advocate, Cloud Platform, Google Inc.
Cloud community advocacy
Cloud product launch support

Slide 3

Slide 3 text

is:

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Enterprise

Slide 6

Slide 6 text

(Timeline figure. Years marked: 2003, 2004, 2006, 2010, 2011, 2012, 2015. Technologies shown: GFS, MapReduce, Bigtable, Chubby, Borg, Dremel, Colossus, Spanner.)

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Building what’s next

Slide 9

Slide 9 text

The Google Cloud Technology
1. Big Data
2. Container
3. Networking
4. The Future

Slide 10

Slide 10 text

Confidential & Proprietary Google Cloud Platform 10 Big Data

Slide 11

Slide 11 text

1 B 1 B 100 B 900 M

Slide 12

Slide 12 text

At Google, MapReduce is classic. We use Dremel, FlumeJava and MillWheel.

Slide 13

Slide 13 text

The World Beyond MapReduce
(Timeline figure. Years marked: 2002, 2004, 2006, 2008, 2010, 2012, 2013. Technologies shown: GFS, MapReduce, Dremel, FlumeJava, MillWheel. Cloud products shown: BigQuery, Cloud Dataflow.)

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Google BigQuery Demo: RegEx + GROUP BY on 100 B rows
response: ~10 sec
read: 4 TB
RegEx: 100 B
shuffled: 278 GB

Slide 16

Slide 16 text

SELECT your_data FROM billions_of_rows WHERE full_disk_scan_required = true;
Scanning 1 TB in 1 sec with 5,000 - 10,000 disk spindles
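
A quick sanity check of the figure on this slide: if a 1 TB scan is spread evenly over 5,000 to 10,000 disks, each disk only has to deliver roughly 100 to 200 MB/s, which is about what a single spinning disk can sustain for sequential reads. The sketch below just does that division. The class and method names are made up for illustration, and it assumes a decimal terabyte and a perfectly even split across disks.

```java
/** Back-of-the-envelope check of the "1 TB in 1 sec" figure (illustrative only). */
final class ScanThroughput {

    /**
     * Read rate each disk must sustain, in MB/s, to scan totalBytes within
     * the given number of seconds when the work is split evenly across disks.
     */
    static double perDiskMBps(double totalBytes, int disks, double seconds) {
        return totalBytes / disks / seconds / 1e6;
    }

    public static void main(String[] args) {
        double oneTB = 1e12; // assumption: decimal terabyte (10^12 bytes)
        System.out.printf("5,000 disks:  %.0f MB/s per disk%n", perDiskMBps(oneTB, 5_000, 1.0));
        System.out.printf("10,000 disks: %.0f MB/s per disk%n", perDiskMBps(oneTB, 10_000, 1.0));
    }
}
```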

Slide 17

Slide 17 text

(Diagram of the Dremel serving tree: Mixer 0 -> Mixer 1, Mixer 1 -> Shard x 4 -> Colossus)
Query fragments shown on the tree:
SELECT state, year
COUNT(*) GROUP BY state
WHERE year >= 1980 and year < 1990
ORDER BY count_babies DESC LIMIT 10
COUNT(*) GROUP BY state
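
One plausible reading of this diagram, sketched in plain Java: each shard filters its own rows and computes a partial COUNT(*) GROUP BY state, a mixer sums the partial counts, and the root applies ORDER BY ... DESC LIMIT. This is not Dremel code. The class, the Row type, and the way the query is split across levels are all assumptions made for illustration.

```java
import java.util.*;
import java.util.stream.*;

/** Toy model of a mixer/shard aggregation tree (illustrative only). */
final class ServingTreeSketch {

    /** One input row; the field names mirror the columns in the slide's query. */
    static final class Row {
        final String state;
        final int year;
        Row(String state, int year) { this.state = state; this.year = year; }
    }

    /** Shard step: apply the WHERE filter, then a partial COUNT(*) GROUP BY state. */
    static Map<String, Long> shard(List<Row> rows) {
        return rows.stream()
                .filter(r -> r.year >= 1980 && r.year < 1990)
                .collect(Collectors.groupingBy(r -> r.state, Collectors.counting()));
    }

    /** Mixer step: merge the partial counts from several shards by summing per state. */
    static Map<String, Long> mix(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((state, n) -> merged.merge(state, n, Long::sum));
        }
        return merged;
    }

    /** Root step: ORDER BY count DESC LIMIT n (ties broken by state name for determinism). */
    static List<Map.Entry<String, Long>> top(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

The point of the split is that only small per-state count maps travel up the tree, never the raw rows.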

Slide 18

Slide 18 text

Super fast Shuffling: 1B x 1B JOIN in 30 sec. How?
(Diagram: two groups of three Dremel Shards and a "?")

Slide 19

Slide 19 text

But Dremel is not a silver bullet. What about complex batch or real-time stream processing?

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Cloud Dataflow: FlumeJava + MillWheel
Fully Managed & Optimized
Super fast shuffling
Exactly-Once pipeline
Batch + Streaming

Slide 23

Slide 23 text

Example: Autocomplete
Tweets -> read -> ExtractTags -> Count -> ExpandPrefixes -> Top(3) -> write -> Predictions
read: #argentina scores, my #art project, watching #armenia vs #argentina
ExtractTags: #argentina #art #armenia #argentina
Count: (argentina, 5M) (art, 9M) (armenia, 2M)
ExpandPrefixes: a->(argentina,5M) ar->(argentina,5M) arg->(argentina,5M) ar->(art, 9M) ...
Top(3): a->[apple, art, argentina] ar->[art, argentina, armenia]

Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.from(...))           // read tweets
    .apply(ParDo.of(new ExtractTags()))     // tweet -> hashtags
    .apply(Count.create())                  // hashtag -> (hashtag, count)
    .apply(ParDo.of(new ExpandPrefixes()))  // (hashtag, count) -> one entry per prefix
    .apply(Top.largestPerKey(3))            // keep the 3 most frequent tags per prefix
    .apply(TextIO.Write.to(...));           // write predictions
p.run();
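
To make the four stages concrete, here is a minimal in-memory sketch of the same logic using only the Java standard library. It does not use the Dataflow SDK and says nothing about how Dataflow executes the pipeline. The class name, method names, and the regex used to find hashtags are assumptions for illustration.

```java
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

/** In-memory version of ExtractTags -> Count -> ExpandPrefixes -> Top(n) (illustrative only). */
final class AutocompleteSketch {

    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    /** ExtractTags: pull every "#tag" out of one tweet, lowercased, without the '#'. */
    static List<String> extractTags(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return tags;
    }

    /** Count: number of occurrences of each tag. */
    static Map<String, Long> count(List<String> tags) {
        return tags.stream().collect(Collectors.groupingBy(t -> t, Collectors.counting()));
    }

    /** ExpandPrefixes + Top(n): for every prefix of every tag, keep the n most frequent tags. */
    static Map<String, List<String>> topPerPrefix(Map<String, Long> counts, int n) {
        Map<String, List<Map.Entry<String, Long>>> byPrefix = new TreeMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            String tag = e.getKey();
            for (int i = 1; i <= tag.length(); i++) {
                byPrefix.computeIfAbsent(tag.substring(0, i), k -> new ArrayList<>()).add(e);
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        byPrefix.forEach((prefix, candidates) -> result.put(prefix,
                candidates.stream()
                        .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                        .limit(n)
                        .map(Map.Entry::getKey)
                        .collect(Collectors.toList())));
        return result;
    }

    public static void main(String[] args) {
        List<String> tags = new ArrayList<>();
        tags.addAll(extractTags("#argentina scores, my #art project, watching #armenia vs #argentina"));
        tags.addAll(extractTags("#art #art"));
        System.out.println(topPerPrefix(count(tags), 3).get("ar"));
    }
}
```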

Slide 24

Slide 24 text

Streaming made easy

Pipeline p = Pipeline.create(new PipelineOptions());
p.begin()
    .apply(PubsubIO.Read.topic("input_topic"))                             // unbounded input
    .apply(Window.into(SlidingWindows.of(Duration.standardMinutes(60))))   // 60-minute sliding windows
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.perElement())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(PubsubIO.Write.topic("output_topic"));
p.run();
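
The only structural change from the batch version is the windowing step. As a rough illustration of what sliding windows mean, the sketch below lists the windows a single timestamp falls into, given a window size and a slide period. It is plain Java, not the SDK's implementation. The class and method names are hypothetical, and the period is an assumption: the slide only specifies the 60-minute size.

```java
import java.util.*;

/** Which sliding windows contain a given timestamp? (illustrative only) */
final class SlidingWindowSketch {

    /**
     * Returns the start times of every window [start, start + size) that
     * contains ts, assuming window starts are aligned to multiples of period.
     * All three arguments use the same time unit.
     */
    static List<Long> windowStarts(long ts, long size, long period) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before ts.
        long lastStart = ts - Math.floorMod(ts, period);
        // Walk backwards while the window still covers ts (window end is exclusive).
        for (long start = lastStart; start > ts - size; start -= period) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Units are minutes here: 60-minute windows sliding every 30 minutes (period is assumed).
        System.out.println(windowStarts(95, 60, 30));
    }
}
```

With a 60-minute size and a 30-minute period, each element belongs to two windows, so a tag seen once is counted in both.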

Slide 25

Slide 25 text

Container

Slide 26

Slide 26 text

Google confidential | Do not distribute
Every Google service runs on Borg
2 B containers every week

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Borg
No VMs, pure containers
Manages 10K machines / Cell
DC-scale proactive job sched (CPU, mem, disk IO, TCP ports)
Paxos-based metadata store
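
The scheduling bullet names four resource dimensions. As a toy illustration of what scheduling on those dimensions involves, here is a first-fit placement sketch in plain Java: a task goes on the first machine that has enough CPU, memory, and disk IO left and whose requested TCP port is still free. This is not Borg's algorithm. First-fit, the class names, and the units are all assumptions chosen to keep the example small.

```java
import java.util.*;

/** Toy first-fit placement over CPU, memory, disk IO and TCP ports (illustrative only). */
final class PlacementSketch {

    static final class Machine {
        final String name;
        double freeCpu;      // cores
        long freeMemMb;      // megabytes
        double freeDiskIo;   // arbitrary IO units
        final Set<Integer> usedPorts = new HashSet<>();

        Machine(String name, double cpu, long memMb, double diskIo) {
            this.name = name;
            this.freeCpu = cpu;
            this.freeMemMb = memMb;
            this.freeDiskIo = diskIo;
        }
    }

    static final class Task {
        final double cpu;
        final long memMb;
        final double diskIo;
        final int port;

        Task(double cpu, long memMb, double diskIo, int port) {
            this.cpu = cpu;
            this.memMb = memMb;
            this.diskIo = diskIo;
            this.port = port;
        }
    }

    /** A task fits only if every resource dimension is satisfied at once. */
    static boolean fits(Machine m, Task t) {
        return m.freeCpu >= t.cpu
                && m.freeMemMb >= t.memMb
                && m.freeDiskIo >= t.diskIo
                && !m.usedPorts.contains(t.port);
    }

    /** Places the task on the first machine that fits; returns its name, or null if none fits. */
    static String place(List<Machine> cell, Task t) {
        for (Machine m : cell) {
            if (fits(m, t)) {
                m.freeCpu -= t.cpu;
                m.freeMemMb -= t.memMb;
                m.freeDiskIo -= t.diskIo;
                m.usedPorts.add(t.port);
                return m.name;
            }
        }
        return null;
    }
}
```

Treating TCP ports as a resource means two tasks that want the same port cannot share a machine, even when CPU and memory would allow it.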

Slide 29

Slide 29 text

one machine

Slide 30

Slide 30 text

Google App Engine: Borg for Everyone

Slide 31

Slide 31 text

start up in ~40 ms, scales to 1 M req/s

Slide 32

Slide 32 text

Kubernetes (k8s) 1.0 is open source, Docker based
Google Container Engine is a fully managed k8s
(Diagram components: Kubernetes Master, API Server, Scheduler, Replication Controller, Kube-UI, Kubelet x 4)

Slide 33

Slide 33 text

Google Cloud Storage Nearline is cheap, because disk IO (not space) is the cost.

Slide 34

Slide 34 text

Networking

Slide 35

Slide 35 text

We build our network from scratch.

Slide 36

Slide 36 text

82 Tbps 1.3 Pbps

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

Jupiter network
40 G ports
10 G x 100 K = 1 Pbps total
Clos topology
Software Defined Network

Slide 39

Slide 39 text

Google Compute Engine
Inter-zone iperf speed: 9 Gbps
Inter-region private network: by default

Slide 40

Slide 40 text

GCE Load Balancer
One Global IP, Multi-region LB/fail-over, 1 M req/s
(Diagram: global IP 11.22.33.44 in front of VMs in the US, EU and Asia)

Slide 41

Slide 41 text

The Future

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

Vision API for Google Play services

Slide 48

Slide 48 text

...and More

Slide 49

Slide 49 text

The Google Cloud Technology: Summary
1. Big Data: the World beyond MapReduce
2. Container: from Borg to k8s
3. Networking: Google is the Network
4. The Future: is Now

Slide 50

Slide 50 text

Thank you