Slide 1

Slide 1 text

MongoDB and Google Cloud Platform
Sandeep Parikh
Head of Solutions, Americas East
@crcsmnky

Slide 2

Slide 2 text

Agenda
● MongoDB Deployment Architectures
● The Finer Points of Configuration
● Deploying MongoDB on Google Cloud Platform
● Integrating with Google Cloud Platform

Slide 3

Slide 3 text

Google Cloud Platform 3 MongoDB Deployment Architectures

Slide 4

Slide 4 text

Replica Sets Across Zones

Slide 5

Slide 5 text

Replica Sets Across Regions

Slide 6

Slide 6 text

Sharded Cluster Across Regions

Slide 7

Slide 7 text

The Finer Points of Configuration

Slide 8

Slide 8 text

Machine Types
● Standard: balanced CPU and memory configurations; 3.75 GB of RAM per vCPU
● High Memory: higher memory per core; 6.5 GB of RAM per vCPU
● High Compute: higher CPU relative to memory; 0.9 GB of RAM per vCPU
● Shared Core: mostly available single CPU; 3.75 GB of RAM per vCPU
● Custom: independently scale CPU and RAM; max 6.5 GB of RAM per vCPU

Slide 9

Slide 9 text

Machine Types
● Standard: good for getting started
● High Memory: best for MongoDB workloads
● High Compute: skip it, you probably don’t need the compute
● Shared Core: g1-small suitable for Arbiter, maybe
● Custom: configure one that fits your working set
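The advice to configure a custom machine type that fits the working set comes down to a little arithmetic. The sketch below is one way to estimate the smallest vCPU count, using the 6.5 GB of RAM per vCPU ceiling from the slide. The function name and the `headroom` parameter are hypothetical, and the rule that custom types take 1 vCPU or an even number of vCPUs is an assumption that does not come from the deck.

```python
import math

# From the slide: custom machine types allow at most 6.5 GB of RAM per vCPU.
MAX_GB_PER_VCPU = 6.5


def min_vcpus_for_working_set(working_set_gb, headroom=1.0):
    """Return the smallest vCPU count whose maximum RAM covers the working set.

    headroom is a hypothetical multiplier for the OS, connections, and growth.
    Assumption (not from the slide): custom types take 1 vCPU or an even
    number of vCPUs, so odd counts above 1 are rounded up.
    """
    needed_gb = working_set_gb * headroom
    vcpus = max(1, math.ceil(needed_gb / MAX_GB_PER_VCPU))
    if vcpus > 1 and vcpus % 2:
        vcpus += 1
    return vcpus


if __name__ == "__main__":
    for working_set in (5, 26, 30):
        print(working_set, "GB working set ->",
              min_vcpus_for_working_set(working_set), "vCPUs")
```

If the result comes out with far more vCPUs than the workload can use, the working set is the real constraint, which may be an argument for sharding rather than a larger instance.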

Slide 10

Slide 10 text

Disks
● Standard: persistent storage; max 3,000 read IOPS; max 15,000 write IOPS; $0.04 per GB; up to 64 TB
● SSD: persistent storage; max 15,000 read IOPS; max 15,000 write IOPS; $0.17 per GB; up to 64 TB
● Local SSD: ephemeral storage; max 680,000 read IOPS; max 360,000 write IOPS; $0.218 per GB; 375 GB only

Slide 11

Slide 11 text


Slide 12

Slide 12 text

Storage Bits
● IOPS scale with size: 500 GB PD-SSD is the sweet spot
● Better off with fewer, larger volumes: no separate data/journal/log
● Data is encrypted at rest: automatically, once it leaves the instance
(Chart labels: Standard, SSD)
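The "IOPS scale with size" point can be sketched as a simple capped linear model. The 15,000 IOPS cap is taken from the Disks slide. The 30 IOPS per GB rate is an assumption, chosen because it is the rate at which a 500 GB PD-SSD volume reaches that cap. Check current Compute Engine documentation before relying on either number.

```python
# Cap from the Disks slide: PD-SSD tops out at 15,000 read and write IOPS.
PD_SSD_MAX_IOPS = 15000

# Assumption (not stated in the deck): IOPS grow linearly at 30 per GB,
# the rate at which a 500 GB volume reaches the cap.
PD_SSD_IOPS_PER_GB = 30


def pd_ssd_iops(size_gb):
    """Estimate the IOPS limit of a PD-SSD volume of the given size."""
    return min(size_gb * PD_SSD_IOPS_PER_GB, PD_SSD_MAX_IOPS)


if __name__ == "__main__":
    for size in (100, 250, 500, 1000):
        print(size, "GB ->", pd_ssd_iops(size), "IOPS")
```

Under this model, volumes smaller than 500 GB leave performance on the table and larger ones add capacity but no further IOPS. That is consistent with the advice to use fewer, larger volumes instead of splitting data, journal, and log across small disks.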

Slide 13

Slide 13 text

Deploying MongoDB on Google Cloud Platform

Slide 14

Slide 14 text

Manually Deploying MongoDB

Slide 15

Slide 15 text

Google Cloud Launcher

Slide 16

Slide 16 text

MongoDB Cloud Manager

Slide 17

Slide 17 text

MongoDB Cloud Manager
How do you automate this?

Slide 18

Slide 18 text

Cloud Deployment Manager
● Provision, configure your deployment
● Configuration as code
● Declarative approach to configuration
● Template-driven
● Supports YAML, Jinja, and Python
● Use schemas to constrain parameters
● References control order and dependencies

Slide 19

Slide 19 text

Bootstrapping MongoDB Cloud Manager Deployment Manager Template

Slide 20

Slide 20 text

Bootstrapping Cloud Manager
● Schema, Configuration & Template posted on GitHub
● Three Compute Engine instances, each with 500 GB PD-SSD
● MongoDB Cloud Manager automation agent pre-installed and configured

$ gcloud deployment-manager deployments create mongodb-cloud-manager \
    --config mongodb-cloud-manager.jinja \
    --properties mmsGroupId=MMSGROUPID,mmsApiKey=MMSAPIKEY

Slide 21

Slide 21 text

Schema
● Defines required properties for deployment: machineType, zone, mmsGroupId, mmsApiKey
● Use supplied defaults or override at runtime
● Constrain input by type and regex or filter
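To make the schema's role concrete, here is a plain-Python sketch of the same three ideas: required properties, defaults that can be overridden, and constraints by type and regex. It is not the deck's schema file and does not use any Deployment Manager API. The default values and every regex below are hypothetical, picked only for illustration.

```python
import re

# Hypothetical stand-in for the deployment schema. Property names come from
# the slide; the defaults and patterns are assumptions for illustration.
SCHEMA = {
    "machineType": {"type": str, "default": "n1-highmem-2",
                    "pattern": r"[a-z0-9]+-[a-z0-9]+(-[0-9]+)?"},
    "zone": {"type": str, "default": "us-central1-f",
             "pattern": r"[a-z]+-[a-z]+[0-9]-[a-z]"},
    "mmsGroupId": {"type": str, "pattern": r"[0-9a-f]{24}"},
    "mmsApiKey": {"type": str, "pattern": r"[0-9a-f]{32}"},
}


def resolve_properties(supplied):
    """Apply defaults, then validate each property by type and regex."""
    resolved = {}
    for name, rule in SCHEMA.items():
        if name in supplied:
            value = supplied[name]          # runtime override
        elif "default" in rule:
            value = rule["default"]         # supplied default
        else:
            raise ValueError("missing required property: " + name)
        if not isinstance(value, rule["type"]):
            raise ValueError("wrong type for property: " + name)
        if not re.fullmatch(rule["pattern"], value):
            raise ValueError("invalid value for property: " + name)
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    print(resolve_properties({"mmsGroupId": "a" * 24, "mmsApiKey": "b" * 32}))
```

Rejecting a malformed group ID or API key at this stage is cheaper than discovering it after three instances have been created with an automation agent that cannot register.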

Slide 22

Slide 22 text

Configuration
● Imports instance template
● Three resources of instance template
● Pass in properties from schema

Slide 23

Slide 23 text

Instance Template
● Two resources, disk and instance
● Inherits properties from parent template
● References ensure creation order
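The deck's template is written in Jinja and is not reproduced here. As a rough illustration of the same structure, the sketch below uses Deployment Manager's Python template form, where a `GenerateConfig(context)` function returns the resources. It declares a 500 GB PD-SSD disk and an instance that attaches it through a `$(ref...)` reference, which is what makes Deployment Manager create the disk first. The `sourceImage` property, the disk naming scheme, and the use of the default network are assumptions for illustration.

```python
COMPUTE_URL = "https://www.googleapis.com/compute/v1"


def GenerateConfig(context):
    """Deployment Manager entry point: return the resources to create."""
    name = context.env["name"]
    project = context.env["project"]
    props = context.properties          # inherited from the parent configuration
    zone = props["zone"]
    zone_url = "{}/projects/{}/zones/{}".format(COMPUTE_URL, project, zone)
    disk_name = name + "-data"          # hypothetical naming scheme

    disk = {
        "name": disk_name,
        "type": "compute.v1.disk",
        "properties": {
            "zone": zone,
            "sizeGb": 500,
            "type": zone_url + "/diskTypes/pd-ssd",
        },
    }

    instance = {
        "name": name,
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": zone_url + "/machineTypes/" + props["machineType"],
            "disks": [
                {   # boot disk; sourceImage is a hypothetical property
                    "deviceName": "boot", "type": "PERSISTENT",
                    "boot": True, "autoDelete": True,
                    "initializeParams": {"sourceImage": props["sourceImage"]},
                },
                {   # data disk; the reference forces the disk to exist first
                    "deviceName": "data", "type": "PERSISTENT",
                    "autoDelete": False,
                    "source": "$(ref.{}.selfLink)".format(disk_name),
                },
            ],
            "networkInterfaces": [{
                "network": "{}/projects/{}/global/networks/default".format(
                    COMPUTE_URL, project),
            }],
        },
    }

    return {"resources": [disk, instance]}
```

A configuration that wants a three-member replica set would import a template like this and declare it three times with different names, passing the same schema properties to each.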

Slide 24

Slide 24 text

Integrating with Google Cloud Platform

Slide 25

Slide 25 text

MongoDB in Google Cloud Ecosystem

Slide 26

Slide 26 text

MongoDB in Google Cloud Ecosystem

Slide 27

Slide 27 text

Downstream Use Cases
● Backups
● Data Warehouse
● Analytics
● Applications
● ETL
● Machine Learning

Slide 28

Slide 28 text

Downstream Use Cases
● Backups
● Data Warehouse
● Analytics
● Applications
● ETL
● Machine Learning

Slide 29

Slide 29 text

Google Research in Data Technologies
Timeline (2002 to 2013): GFS, MapReduce, BigTable, Colossus, Dremel, Flume, Megastore, Spanner, Millwheel, PubSub, F1
Google Research Publications referenced are available here: The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009

Slide 30

Slide 30 text

Google Research in Data Technologies
Cloud products on the timeline (2002 to 2013): Cloud Storage, Dataproc, Bigtable, Cloud Storage, BigQuery, Dataflow, Datastore, Spanner, Dataflow, PubSub, F1
Google Research Publications referenced are available here: The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009

Slide 31

Slide 31 text

Backup Snapshots

> db.adminCommand({fsync:1, lock:true})
$ sudo sync
$ sudo fsfreeze -f /mnt/your-disk
$ gcloud compute disks snapshot DISK
$ sudo fsfreeze -u /mnt/your-disk
> db.fsyncUnlock()
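The order of the snapshot steps matters: writes are locked and the filesystem frozen before the snapshot, then the filesystem is thawed with `fsfreeze -u` and MongoDB unlocked afterwards, even if the snapshot fails. A minimal sketch of that sequencing follows. It assumes the legacy `mongo` shell, `sudo`, and `gcloud` are on the PATH of the instance. The function name, the `run` parameter, and the `--zone` argument are additions for illustration and are not part of the slide.

```python
import subprocess

# Lock and unlock through the mongo shell; the fsync lock is held by the
# server until fsyncUnlock is called.
LOCK = ["mongo", "--quiet", "--eval",
        "printjson(db.adminCommand({fsync: 1, lock: true}))"]
UNLOCK = ["mongo", "--quiet", "--eval", "printjson(db.fsyncUnlock())"]


def run_backup(disk, zone, mount_point, run=subprocess.check_call):
    """Snapshot a data disk with writes locked and the filesystem frozen.

    run is injectable so the sequence can be dry-run or tested; by default
    each step is executed and a non-zero exit raises CalledProcessError.
    """
    run(LOCK)
    try:
        run(["sudo", "sync"])
        run(["sudo", "fsfreeze", "-f", mount_point])    # freeze
        try:
            run(["gcloud", "compute", "disks", "snapshot", disk,
                 "--zone", zone])
        finally:
            run(["sudo", "fsfreeze", "-u", mount_point])  # always unfreeze
    finally:
        run(UNLOCK)                                       # always unlock


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_backup("DISK", "ZONE", "/mnt/your-disk",
               run=lambda argv: print(" ".join(argv)))
```

The nested `try`/`finally` blocks are the point of the sketch: a frozen filesystem or a locked `mongod` left behind by a failed snapshot is worse than a missing backup.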

Slide 32

Slide 32 text

Downstream Backups
● Cloud Storage: Standard (hot) & Nearline (warm)
● mongodump: BSON; pure backup/restore, Hadoop
● mongoexport: JSON, CSV; Cloud Dataflow, BigQuery

Slide 33

Slide 33 text

Analytics
● Managed Hadoop and Spark with Cloud Dataproc
● Separation of storage and compute
● Spin up clusters of any size in ~90 seconds
● Preemptible VMs are 70% cheaper
● Per-minute billing
● Run multiple clusters segregated by job or function
● Run against backups or via Hadoop Connector or Spark Connector

Slide 34

Slide 34 text

Extract, Transform, Load
● Batch and stream data processing with Cloud Dataflow
● Intuitive data-processing framework
● Fully managed, No-Ops
● Autoscaling mid-job
● Dynamic rebalancing mid-job
● Pull data from multiple sources for ETL jobs

Slide 35

Slide 35 text

Data Warehousing
● Petabyte-scale data warehousing with BigQuery
● Supports SQL and JSON fields
● Fast and independently scales storage and compute
● No setup or administration
● Stream in up to 100,000 rows/sec using mongobq
● Import JSON or CSV from Cloud Storage
● Run Dataflow jobs to transform and insert into BigQuery
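One practical wrinkle in the "import JSON from Cloud Storage" path is that `mongoexport` writes MongoDB Extended JSON, so values such as ObjectIds and dates arrive wrapped in `$oid` and `$date` objects, and field names that start with `$` are awkward in a warehouse schema. The sketch below shows one possible clean-up pass that unwraps a few common wrappers and emits newline-delimited JSON. The function names are hypothetical, only three wrapper types are handled, and this is not what mongobq does internally.

```python
import json


def simplify(value):
    """Recursively unwrap a few common Extended JSON wrappers."""
    if isinstance(value, dict):
        if len(value) == 1:
            (key, inner), = value.items()
            if key in ("$oid", "$date"):
                return simplify(inner)      # keep as string (or number)
            if key == "$numberLong":
                return int(inner)
        return {k: simplify(v) for k, v in value.items()}
    if isinstance(value, list):
        return [simplify(v) for v in value]
    return value


def to_ndjson(lines):
    """Yield one simplified JSON document per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(simplify(json.loads(line)), sort_keys=True)


if __name__ == "__main__":
    sample = ['{"_id": {"$oid": "5731f2a0c2a3b1d4e5f60718"}, '
              '"visits": {"$numberLong": "42"}}']
    for row in to_ndjson(sample):
        print(row)
```

For larger exports, the same transformation is a natural fit for a Dataflow job, which is the route the last bullet on the slide points to.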

Slide 36

Slide 36 text

Applications
● Run apps via multiple platforms
● Compute Engine using standard instances
● Container Engine for Kubernetes-native apps
● App Engine Flex for Dockerized apps

Slide 37

Slide 37 text

Machine Learning
● Machine learning at scale with Cloud ML
● Powerful image analysis
● Powerful speech recognition
● Fast, dynamic translation
● Trainable, scalable linear and logistic regression

Slide 38

Slide 38 text

Kubernetes
● MongoDB in Kubernetes is... non-trivial
● Possible today with shipping Kubernetes
● But some potential issues around Pod rescheduling and persistent volumes in 1.2
● Some good recipes out there to solve now
● PetSet: improved support for stateful services, coming in Kubernetes 1.3
(Diagram: Kubernetes master and nodes)

Slide 39

Slide 39 text

Build What’s Next

Slide 40

Slide 40 text

Questions, Comments, Resources @crcsmnky