MongoDB World 2016: MongoDB and Google Cloud

Sandeep Parikh

June 28, 2016

Transcript

  1. MongoDB and Google Cloud Platform

    Sandeep Parikh, Head of Solutions, Americas East, @crcsmnky
  2. Agenda

    • MongoDB Deployment Architectures
    • The Finer Points of Configuration
    • Deploying MongoDB on Google Cloud Platform
    • Integrating with Google Cloud Platform
  3. MongoDB Deployment Architectures

  4. Replica Sets Across Zones

  5. Replica Sets Across Regions

  6. Sharded Cluster Across Regions

  7. The Finer Points of Configuration

  8. Machine Types

    • Standard: balanced CPU and memory configurations, 3.75 GB of RAM per vCPU
    • High Memory: higher memory per core, 6.5 GB of RAM per vCPU
    • High Compute: higher CPU relative to memory, 0.9 GB of RAM per vCPU
    • Shared Core: mostly available single CPU
    • Custom: independently scale CPU and RAM, max 6.5 GB of RAM per vCPU
  9. Machine Types

    • Standard: balanced CPU and memory configurations, 3.75 GB of RAM per vCPU. Good for getting started.
    • High Memory: higher memory per core, 6.5 GB of RAM per vCPU. Best for MongoDB workloads.
    • High Compute: higher CPU relative to memory, 0.9 GB of RAM per vCPU. Skip it, you probably don't need the compute.
    • Shared Core: mostly available single CPU. g1-small suitable for an arbiter, maybe.
    • Custom: independently scale CPU and RAM, max 6.5 GB of RAM per vCPU. Configure one that fits your working set.
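The "configure one that fits your working set" advice can be turned into simple arithmetic. The sketch below is only an illustration: the 6.5 GB per vCPU cap is the figure from the slide, while the `custom-<vCPUs>-<memoryMB>` naming, the 256 MB memory step, and the "1 or an even number of vCPUs" rule are assumptions taken from Compute Engine documentation rather than from the talk. The function name and the `headroom` factor are hypothetical.

```python
import math

MAX_GB_PER_VCPU = 6.5   # cap quoted on the slide for custom machine types
MEMORY_STEP_MB = 256    # assumption: custom memory must be a multiple of 256 MB


def custom_machine_type(working_set_gb, headroom=1.25):
    """Return a custom machine type name with RAM >= working set * headroom.

    `headroom` is a hypothetical safety factor, not a figure from the talk.
    """
    if working_set_gb <= 0:
        raise ValueError("working_set_gb must be positive")
    # Round the memory target up to the next 256 MB step.
    target_mb = working_set_gb * headroom * 1024
    memory_mb = math.ceil(target_mb / MEMORY_STEP_MB) * MEMORY_STEP_MB
    # Smallest vCPU count that keeps memory within 6.5 GB per vCPU.
    vcpus = math.ceil(memory_mb / (MAX_GB_PER_VCPU * 1024))
    # Assumption: vCPU counts must be 1 or an even number.
    if vcpus > 1 and vcpus % 2:
        vcpus += 1
    return "custom-{}-{}".format(vcpus, memory_mb)
```

For a 20 GB working set this yields `custom-4-25600`, that is 4 vCPUs with 25 GB of RAM. The sketch does not check any upper limit on vCPUs per instance.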
  10. Disks

    • Standard: persistent storage, max 3,000 read IOPS, max 15,000 write IOPS, $0.04 per GB, up to 64 TB
    • SSD: persistent storage, max 15,000 read IOPS, max 15,000 write IOPS, $0.17 per GB, up to 64 TB
    • Local SSD: ephemeral storage, max 680,000 read IOPS, max 360,000 write IOPS, $0.218 per GB, 375 GB only
  12. Storage Bits

    • IOPS scale with size
    • 500 GB PD-SSD is the sweet spot
    • Better off with fewer, larger volumes
    • No separate data/journal/log
    • Data is encrypted at rest, automatically, once it leaves the instance

    (Slide labels: Standard, SSD)
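Why 500 GB is the sweet spot follows from "IOPS scale with size" plus the per-disk caps on the Disks slide. A minimal sketch of that arithmetic is below. The caps are the slide's "Max" figures; the per-GB rates are assumptions taken from Compute Engine documentation of the period, not from the talk, and the function name is hypothetical.

```python
# Per-GB rates are assumptions (Compute Engine docs, circa 2016);
# the caps are the "Max" read/write figures from the Disks slide.
DISK_PROFILES = {
    "pd-standard": {"read_per_gb": 0.75, "write_per_gb": 1.5,
                    "max_read": 3000, "max_write": 15000},
    "pd-ssd": {"read_per_gb": 30, "write_per_gb": 30,
               "max_read": 15000, "max_write": 15000},
}


def estimated_iops(disk_type, size_gb):
    """Return (read_iops, write_iops) for a persistent disk of the given size."""
    profile = DISK_PROFILES[disk_type]
    read = min(size_gb * profile["read_per_gb"], profile["max_read"])
    write = min(size_gb * profile["write_per_gb"], profile["max_write"])
    return read, write
```

Under these assumed rates a PD-SSD volume reaches the 15,000 IOPS cap at exactly 500 GB, so a larger volume adds capacity but no further IOPS, and a smaller one leaves IOPS on the table.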
  13. Deploying MongoDB on Google Cloud Platform

  14. Manually Deploying MongoDB

  15. Google Cloud Launcher

  16. MongoDB Cloud Manager

  17. MongoDB Cloud Manager

    How do you automate this?
  18. Cloud Deployment Manager

    • Provision and configure your deployment
    • Configuration as code
    • Declarative approach to configuration
    • Template-driven
    • Supports YAML, Jinja, and Python
    • Use schemas to constrain parameters
    • References control order and dependencies
  19. Bootstrapping MongoDB Cloud Manager Deployment Manager Template
  20. Bootstrapping Cloud Manager

    • Schema, Configuration & Template posted on GitHub: https://github.com/GoogleCloudPlatform/mongodb-cloud-manager
    • Three Compute Engine instances, each with 500 GB PD-SSD
    • MongoDB Cloud Manager automation agent pre-installed and configured

    $ gcloud deployment-manager deployments create mongodb-cloud-manager \
        --config mongodb-cloud-manager.jinja \
        --properties mmsGroupId=MMSGROUPID,mmsApiKey=MMSAPIKEY
  21. Schema

    • Defines required properties for deployment: machineType, zone, mmsGroupId, mmsApiKey
    • Use supplied defaults or override at runtime
    • Constrain input by type and regex or filter
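To make the schema's role concrete, here is a rough Python sketch of what "defaults, runtime overrides, and type/regex constraints" amount to. It is not the schema from the repository and it is not how Deployment Manager implements validation. The four property names come from the slide; the choice of which are required, the default values, the regex patterns, and the function name are all illustrative assumptions.

```python
import re

# Hypothetical schema mirroring the four properties named on the slide.
# Required set, defaults, and patterns are illustrative assumptions.
SCHEMA = {
    "required": ["mmsGroupId", "mmsApiKey"],
    "properties": {
        "machineType": {"type": str, "default": "n1-highmem-2"},
        "zone": {"type": str, "default": "us-central1-f",
                 "pattern": r"^[a-z]+-[a-z]+\d-[a-z]$"},
        "mmsGroupId": {"type": str, "pattern": r"^[0-9a-f]{24}$"},
        "mmsApiKey": {"type": str, "pattern": r"^[0-9a-f]{32}$"},
    },
}


def resolve_properties(schema, overrides):
    """Apply runtime overrides over defaults, then check required/type/regex."""
    unknown = set(overrides) - set(schema["properties"])
    if unknown:
        raise ValueError("unknown properties: " + ", ".join(sorted(unknown)))
    resolved = {}
    for name, spec in schema["properties"].items():
        if name in overrides:
            resolved[name] = overrides[name]      # override at runtime
        elif "default" in spec:
            resolved[name] = spec["default"]      # supplied default
    for name in schema["required"]:
        if name not in resolved:
            raise ValueError("missing required property: " + name)
    for name, value in resolved.items():
        spec = schema["properties"][name]
        if not isinstance(value, spec["type"]):
            raise ValueError("wrong type for " + name)
        pattern = spec.get("pattern")
        if pattern and not re.match(pattern, value):
            raise ValueError("invalid value for " + name)
    return resolved
```

Calling `resolve_properties(SCHEMA, {"mmsGroupId": ..., "mmsApiKey": ...})` mirrors the `--properties` flag in the `gcloud` command: the two keys are passed at runtime and the rest fall back to defaults.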
  22. Configuration

    • Imports instance template
    • Three resources of instance template
    • Pass in properties from schema
  23. Instance Template

    • Two resources, disk and instance
    • Inherits properties from parent template
    • References ensure creation order
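The configuration and instance template described on these two slides are Jinja files in the repository. As a rough illustration of the same shape, the sketch below uses Deployment Manager's Python template form instead; it is not taken from the repository. It collapses the configuration and the instance template into one file, and it leaves out the startup script that installs the Cloud Manager automation agent. The resource names, the boot image family, and the helper function name are assumptions.

```python
COMPUTE = "https://www.googleapis.com/compute/v1"


def instance_with_data_disk(name, project, props):
    """Two resources, as on the Instance Template slide: a disk and an instance."""
    zone = props["zone"]
    zone_url = "{}/projects/{}/zones/{}".format(COMPUTE, project, zone)
    disk_name = name + "-data"
    disk = {
        "name": disk_name,
        "type": "compute.v1.disk",
        "properties": {
            "zone": zone,
            "sizeGb": 500,  # 500 GB PD-SSD per instance, per the slides
            "type": zone_url + "/diskTypes/pd-ssd",
        },
    }
    instance = {
        "name": name,
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "{}/machineTypes/{}".format(zone_url, props["machineType"]),
            "disks": [
                {   # Boot disk; the image family is an assumption.
                    "boot": True, "autoDelete": True, "type": "PERSISTENT",
                    "initializeParams": {
                        "sourceImage": COMPUTE
                        + "/projects/debian-cloud/global/images/family/debian-12",
                    },
                },
                {   # The $(ref...) reference makes the disk get created first.
                    "deviceName": "mongodb-data", "type": "PERSISTENT",
                    "boot": False, "autoDelete": False,
                    "source": "$(ref.{}.selfLink)".format(disk_name),
                },
            ],
            "networkInterfaces": [{
                "network": "{}/projects/{}/global/networks/default".format(COMPUTE, project),
                "accessConfigs": [{"name": "External NAT", "type": "ONE_TO_ONE_NAT"}],
            }],
        },
    }
    return [disk, instance]


def GenerateConfig(context):
    """Deployment Manager entry point: three disk-plus-instance pairs."""
    resources = []
    for i in range(3):
        name = "{}-{}".format(context.env["name"], i)
        resources.extend(
            instance_with_data_disk(name, context.env["project"], context.properties))
    return {"resources": resources}
```

The `$(ref.<disk>.selfLink)` string is what "references ensure creation order" refers to: Deployment Manager resolves it only after the disk exists, so the instance is created second.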
  24. Integrating with Google Cloud Platform

  25. MongoDB in Google Cloud Ecosystem

  26. MongoDB in Google Cloud Ecosystem

  27. Downstream Use Cases

    Backups, Data Warehouse, Analytics, Applications, ETL, Machine Learning
  28. Downstream Use Cases

    Backups, Data Warehouse, Analytics, Applications, ETL, Machine Learning
  29. Google Research in Data Technologies

    • Years shown: 2002, 2004, 2006, 2008, 2010, 2012, 2013
    • Technologies: GFS, MapReduce, BigTable, Colossus, Dremel, Flume, Megastore, Spanner, Millwheel, PubSub, F1
    • Google Research publications referenced are available here: http://research.google.com/pubs/papers.html
    • The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009: http://research.google.com/pubs/pub35290.html
  30. Google Research in Data Technologies

    • Years shown: 2002, 2004, 2006, 2008, 2010, 2012, 2013
    • Products: Cloud Storage, Dataproc, Bigtable, Cloud Storage, BigQuery, Dataflow, Datastore, Spanner, Dataflow, PubSub, F1
    • Google Research publications referenced are available here: http://research.google.com/pubs/papers.html
    • The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009: http://research.google.com/pubs/pub35290.html
  31. Backup Snapshots

    > db.adminCommand({fsync: 1, lock: true})
    $ sudo sync
    $ sudo fsfreeze -f /mnt/your-disk
    $ gcloud compute disks snapshot DISK
    $ sudo fsfreeze -u /mnt/your-disk
    > db.fsyncUnlock()

    (Diagram labels: Disk, Snap, Snap)
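When this sequence is scripted, the main risk is a failed snapshot leaving the filesystem frozen or the database locked. The sketch below shows one way to order the same steps so the unfreeze and unlock always run. The function name and its parameters are hypothetical, and the database lock and unlock are passed in as callables so the sketch does not depend on a particular driver.

```python
import subprocess


def snapshot_with_freeze(lock_db, unlock_db, mount_point, disk, zone,
                         run=subprocess.check_call):
    """Lock MongoDB, freeze the filesystem, snapshot, then always undo both."""
    lock_db()  # equivalent of db.adminCommand({fsync: 1, lock: true})
    try:
        run(["sudo", "sync"])
        run(["sudo", "fsfreeze", "-f", mount_point])
        try:
            run(["gcloud", "compute", "disks", "snapshot", disk, "--zone", zone])
        finally:
            # Unfreeze even if the snapshot command failed.
            run(["sudo", "fsfreeze", "-u", mount_point])
    finally:
        # Unlock even if any step above failed; db.fsyncUnlock()
        unlock_db()
```

With PyMongo, `lock_db` and `unlock_db` could plausibly be `lambda: client.admin.command("fsync", lock=True)` and `lambda: client.admin.command("fsyncUnlock")`; treat that pairing as an assumption to verify against the driver and server versions in use.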
  32. Downstream Backups

    • Cloud Storage: Standard (hot) & Nearline (warm)
    • mongodump (BSON): pure backup/restore, Hadoop
    • mongoexport (JSON, CSV): Cloud Dataflow, BigQuery
  33. Analytics

    • Managed Hadoop and Spark with Cloud Dataproc
    • Separation of storage and compute
    • Spin up clusters of any size in ~90 seconds
    • Preemptible VMs are 70% cheaper
    • Per-minute billing
    • Run multiple clusters segregated by job or function
    • Run against backups or via Hadoop Connector or Spark Connector
  34. Extract, Transform, Load

    • Batch and stream data processing with Cloud Dataflow
    • Intuitive data-processing framework
    • Fully managed, no-ops
    • Autoscaling mid-job
    • Dynamic rebalancing mid-job
    • Pull data from multiple sources for ETL jobs
  35. Data Warehousing

    • Petabyte-scale data warehousing with BigQuery
    • Supports SQL and JSON fields
    • Fast, and independently scales storage and compute
    • No setup or administration
    • Stream in up to 100,000 rows/sec using mongobq
    • Import JSON or CSV from Cloud Storage
    • Run Dataflow jobs to transform and insert into BigQuery
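One practical wrinkle in the "import JSON from Cloud Storage" path: mongoexport writes MongoDB Extended JSON, where values such as ObjectIds appear as `{"$oid": ...}`, and keys like `$oid` generally do not work as BigQuery column names. The sketch below shows one possible pre-processing step that unwraps a few of these wrappers and emits newline-delimited JSON. It is not part of the talk; the function names are hypothetical and the list of wrappers handled is illustrative, not exhaustive.

```python
import json


def simplify(value):
    """Recursively replace single-key Extended JSON wrappers with plain values."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key == "$numberLong":
                return int(inner)
            if key in ("$oid", "$date"):
                # $date may itself wrap a $numberLong in some export modes.
                return simplify(inner)
        return {k: simplify(v) for k, v in value.items()}
    if isinstance(value, list):
        return [simplify(v) for v in value]
    return value


def to_ndjson(lines):
    """Yield one JSON document per non-empty mongoexport line."""
    for line in lines:
        if line.strip():
            yield json.dumps(simplify(json.loads(line)), sort_keys=True)
```

A typical use would be to stream a mongoexport file through `to_ndjson`, write the result to a new file, and copy that file to Cloud Storage before loading it into BigQuery.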
  36. Applications

    • Run apps via multiple platforms
    • Compute Engine using standard instances
    • Container Engine for Kubernetes-native apps
    • App Engine Flex for Dockerized apps
  37. Machine Learning

    • Machine learning at scale with Cloud ML
    • Powerful image analysis
    • Powerful speech recognition
    • Fast, dynamic translation
    • Trainable, scalable linear and logistic regression
  38. Kubernetes

    • MongoDB in Kubernetes is... non-trivial
    • Possible today with shipping Kubernetes
    • But some potential issues around Pod rescheduling and persistent volumes in 1.2
    • Some good recipes out there to solve now
    • PetSet: improved support for stateful services, coming in Kubernetes 1.3

    (Diagram labels: master, node)
  39. Build What’s Next

  40. Questions, Comments, Resources

    • @crcsmnky
    • https://cloud.google.com/solutions/deploy-mongodb
    • https://github.com/GoogleCloudPlatform/mongodb-cloud-manager