MongoDB World 2016: MongoDB and Google Cloud

Sandeep Parikh

June 28, 2016

Transcript

  1. 2.

    Agenda
    • MongoDB Deployment Architectures
    • The Finer Points of Configuration
    • Deploying MongoDB on Google Cloud Platform
    • Integrating with Google Cloud Platform
  2. 8.

    Google Cloud Platform: Machine Types
    • Standard: balanced CPU and memory configurations; 3.75 GB of RAM per vCPU
    • High Memory: higher memory per core; 6.5 GB of RAM per vCPU
    • High Compute: higher CPU relative to memory; 0.9 GB of RAM per vCPU
    • Shared Core: mostly available single CPU
    • Custom: independently scale CPU and RAM; max 6.5 GB of RAM per vCPU
  3. 9.

    Machine Types, with recommendations
    • Standard: balanced CPU and memory configurations; 3.75 GB of RAM per vCPU. Good for getting started.
    • High Memory: higher memory per core; 6.5 GB of RAM per vCPU. Best for MongoDB workloads.
    • High Compute: higher CPU relative to memory; 0.9 GB of RAM per vCPU. Skip it, you probably don’t need the compute.
    • Custom: independently scale CPU and RAM; max 6.5 GB of RAM per vCPU. Configure one that fits your working set.
    • Shared Core: mostly available single CPU. g1-small suitable for Arbiter, maybe.
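To make "configure one that fits your working set" concrete, here is a minimal sketch that uses only the RAM-per-vCPU ratios quoted on the slide. The list of vCPU counts and the function name are illustrative assumptions, not anything from the talk.

```python
# RAM per vCPU in GB, as quoted on the slide.
RAM_PER_VCPU_GB = {
    "standard": 3.75,
    "highmem": 6.5,
    "highcpu": 0.9,
}


def smallest_fit(working_set_gb, vcpu_options=(1, 2, 4, 8, 16, 32)):
    """Return (family, vcpus, ram_gb) for each family that can hold the working set.

    vcpu_options is an assumed list of sizes; for each family the smallest
    vCPU count whose RAM covers working_set_gb is chosen, and families that
    cannot fit it at any listed size are left out.
    """
    fits = []
    for family, ratio in RAM_PER_VCPU_GB.items():
        for vcpus in vcpu_options:
            ram_gb = vcpus * ratio
            if ram_gb >= working_set_gb:
                fits.append((family, vcpus, ram_gb))
                break
    return fits
```

For a 40 GB working set, this suggests 8 high-memory vCPUs where the standard family needs 16, which is one way to read the slide's "best for MongoDB workloads" note.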
  4. 10.

    Disks
    • Standard: persistent storage; max 3,000 read IOPS, max 15,000 write IOPS; $0.04 per GB per month; up to 64 TB
    • SSD: persistent storage; max 15,000 read IOPS, max 15,000 write IOPS; $0.17 per GB per month; up to 64 TB
    • Local SSD: ephemeral storage; max 680,000 read IOPS, max 360,000 write IOPS; $0.218 per GB per month; 375 GB only
  5. 12.

    Storage Bits
    • IOPS scale with size
    • 500 GB PD-SSD is the sweet spot
    • Better off with fewer, larger volumes
    • No need for separate data/journal/log volumes
    • Data is encrypted at rest, automatically, once it leaves the instance
    Figure labels: Standard, SSD
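A rough sketch of why 500 GB is the sweet spot for PD-SSD. The cap and the price come from the Disks slide; the 30 IOPS per GB scaling rate is an assumption that the slides do not state, so treat the result as an estimate.

```python
PD_SSD_IOPS_PER_GB = 30     # assumption: scaling rate not given on the slides
PD_SSD_MAX_IOPS = 15000     # from the Disks slide: max 15,000 read / write
PD_SSD_PRICE_PER_GB = 0.17  # from the Disks slide: $0.17 per GB


def pd_ssd_estimate(size_gb):
    """Return (estimated_iops, monthly_cost_usd) for a PD-SSD volume."""
    iops = min(size_gb * PD_SSD_IOPS_PER_GB, PD_SSD_MAX_IOPS)
    cost = round(size_gb * PD_SSD_PRICE_PER_GB, 2)
    return iops, cost
```

Under that assumed rate a 500 GB volume is the smallest size that reaches the 15,000 IOPS cap, so a larger volume adds capacity and cost but no further IOPS.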
  6. 18.

    Cloud Deployment Manager
    • Provision, configure your deployment
    • Configuration as code
    • Declarative approach to configuration
    • Template-driven
    • Supports YAML, Jinja, and Python
    • Use schemas to constrain parameters
    • References control order and dependencies
  7. 20.

    Bootstrapping Cloud Manager
    • Schema, Configuration & Template posted on GitHub: https://github.com/GoogleCloudPlatform/mongodb-cloud-manager
    • Three Compute Engine instances, each with 500 GB PD-SSD
    • MongoDB Cloud Manager automation agent pre-installed and configured

    $ gcloud deployment-manager deployments create mongodb-cloud-manager \
        --config mongodb-cloud-manager.jinja \
        --properties mmsGroupId=MMSGROUPID,mmsApiKey=MMSAPIKEY
  8. 21.

    Schema
    • Defines required properties for deployment: machineType, zone, mmsGroupId, mmsApiKey
    • Use supplied defaults or override at runtime
    • Constrain input by type and regex or filter
  9. 23.

    Instance Template
    • Two resources, disk and instance
    • Inherits properties from parent template
    • References ensure creation order
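The slide describes a template with two resources tied together by a reference. The sketch below shows that shape as a Deployment Manager Python template; it is not the template from the repository mentioned in the talk. The `diskSizeGb` and `sourceImage` properties, the `-data` disk suffix, and the use of the default network are assumptions made for illustration.

```python
COMPUTE_URL_BASE = "https://www.googleapis.com/compute/v1/"


def GenerateConfig(context):
    """Entry point that Deployment Manager calls for a Python template."""
    name = context.env["name"]
    project = context.env["project"]
    props = context.properties
    zone = props["zone"]
    zone_url = "%sprojects/%s/zones/%s" % (COMPUTE_URL_BASE, project, zone)
    disk_name = name + "-data"  # hypothetical naming scheme

    disk = {
        "name": disk_name,
        "type": "compute.v1.disk",
        "properties": {
            "zone": zone,
            "sizeGb": props.get("diskSizeGb", 500),  # hypothetical property
            "type": zone_url + "/diskTypes/pd-ssd",
        },
    }
    instance = {
        "name": name,
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": zone_url + "/machineTypes/" + props["machineType"],
            "disks": [
                {   # boot disk; sourceImage is a hypothetical property
                    "boot": True,
                    "autoDelete": True,
                    "initializeParams": {"sourceImage": props["sourceImage"]},
                },
                {   # data disk; the $(ref...) makes the instance wait for the disk
                    "deviceName": disk_name,
                    "source": "$(ref.%s.selfLink)" % disk_name,
                    "autoDelete": False,
                },
            ],
            "networkInterfaces": [{
                "network": COMPUTE_URL_BASE
                + "projects/%s/global/networks/default" % project,
                "accessConfigs": [{"name": "External NAT",
                                   "type": "ONE_TO_ONE_NAT"}],
            }],
        },
    }
    return {"resources": [disk, instance]}
```

Because the instance's `source` refers to the disk through `$(ref.<name>.selfLink)`, Deployment Manager creates the disk first; no explicit ordering is declared anywhere else.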
  10. 27.

    Downstream Use Cases
    • Backups
    • Data Warehouse
    • Analytics
    • Applications
    • ETL
    • Machine Learning
  12. 29.

    Google Research in Data Technologies
    Timeline: 2002, 2004, 2006, 2008, 2010, 2012, 2013
    Technologies: GFS, MapReduce, BigTable, Colossus, Dremel, Flume, Megastore, Spanner, Millwheel, PubSub, F1
    Google Research publications referenced are available here: http://research.google.com/pubs/papers.html
    The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 2009: http://research.google.com/pubs/pub35290.html
  13. 30.

    Google Research in Data Technologies
    Timeline: 2002, 2004, 2006, 2008, 2010, 2012, 2013
    Products shown on the timeline: Cloud Storage, Dataproc, Bigtable, Cloud Storage, BigQuery, Dataflow, Datastore, Spanner, Dataflow, PubSub, F1
  14. 31.

    Backup: Snapshots

    > db.adminCommand({fsync:1, lock:true})
    $ sudo sync
    $ sudo fsfreeze -f /mnt/your-disk
    $ gcloud compute disks snapshot DISK
    $ sudo fsfreeze -u /mnt/your-disk
    > db.fsyncUnlock()

    Diagram labels: Disk, Snap, Snap
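The snapshot sequence is order-sensitive, and the filesystem must be unfrozen and the database unlocked even if the snapshot fails. A minimal sketch of a wrapper that enforces this follows. It assumes the legacy `mongo` shell can reach a local mongod without authentication and uses the `db.fsyncLock()` helper in place of the slide's `adminCommand`; the function names and the injectable `run` parameter are illustrative.

```python
import subprocess


def snapshot_commands(disk, zone, mount_point, snapshot_name):
    """Return the six commands from the slide, in order, as argv lists."""
    return [
        ["mongo", "--quiet", "--eval", "printjson(db.fsyncLock())"],
        ["sudo", "sync"],
        ["sudo", "fsfreeze", "-f", mount_point],
        ["gcloud", "compute", "disks", "snapshot", disk,
         "--zone", zone, "--snapshot-names", snapshot_name],
        ["sudo", "fsfreeze", "-u", mount_point],
        ["mongo", "--quiet", "--eval", "printjson(db.fsyncUnlock())"],
    ]


def run_snapshot(disk, zone, mount_point, snapshot_name,
                 run=subprocess.check_call):
    """Lock, freeze, snapshot, then always unfreeze and unlock."""
    lock, sync, freeze, snapshot, unfreeze, unlock = snapshot_commands(
        disk, zone, mount_point, snapshot_name)
    run(lock)
    try:
        run(sync)
        run(freeze)
        try:
            run(snapshot)
        finally:
            run(unfreeze)  # undo the freeze even if the snapshot failed
    finally:
        run(unlock)        # undo the fsync lock in every case
```

Passing a different `run` callable (for example `print`) gives a dry run that shows the commands without executing them.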
  15. 32.

    Downstream: Backups
    • Cloud Storage: Standard (hot) & Nearline (warm)
    • mongodump produces BSON: pure backup/restore, Hadoop
    • mongoexport produces JSON or CSV: Cloud Dataflow, BigQuery
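As a sketch of the mongoexport path, the helper below builds the two commands involved: an export to a local file and a copy to a Cloud Storage bucket. The function name, the local file naming scheme, and the bucket layout are assumptions for illustration; the slides do not prescribe any of them.

```python
def export_commands(db, collection, bucket, fmt="json", fields=None):
    """Return [mongoexport argv, gsutil argv] for one collection."""
    if fmt not in ("json", "csv"):
        raise ValueError("fmt must be 'json' or 'csv'")
    if fmt == "csv" and not fields:
        # mongoexport needs an explicit field list for CSV output
        raise ValueError("CSV export needs a field list")
    local_file = "%s.%s.%s" % (db, collection, fmt)  # hypothetical naming
    export = ["mongoexport", "--db", db, "--collection", collection,
              "--type", fmt, "--out", local_file]
    if fields:
        export += ["--fields", ",".join(fields)]
    upload = ["gsutil", "cp", local_file,
              "gs://%s/%s" % (bucket, local_file)]
    return [export, upload]
```

Each returned list can be handed to `subprocess.check_call`; keeping command construction separate from execution makes the commands easy to inspect first.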
  16. 33.

    Analytics
    Managed Hadoop and Spark with Cloud Dataproc
    • Separation of storage and compute
    • Spin up clusters of any size in ~90 seconds
    • Preemptible VMs are 70% cheaper
    • Per-minute billing
    • Run multiple clusters segregated by job or function
    • Run against backups or via Hadoop Connector or Spark Connector
  17. 34.

    Extract, Transform, Load
    Batch and stream data processing with Cloud Dataflow
    • Intuitive data-processing framework
    • Fully managed, No-Ops
    • Autoscaling mid-job
    • Dynamic rebalancing mid-job
    • Pull data from multiple sources for ETL jobs
  18. 35.

    Data Warehousing
    Petabyte-scale data warehousing with BigQuery
    • Supports SQL and JSON fields
    • Fast, and independently scales storage and compute
    • No setup or administration
    • Stream in up to 100,000 rows/sec using mongobq
    • Import JSON or CSV from Cloud Storage
    • Run Dataflow jobs to transform and insert into BigQuery
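One practical wrinkle with the "import JSON from Cloud Storage" route: mongoexport writes extended JSON, where ObjectIds and dates appear as wrappers such as `{"$oid": ...}` and `{"$date": ...}`, and keys starting with `$` are awkward as warehouse column names. The sketch below is one possible pre-processing step, not something shown in the talk; the function names are illustrative and only three wrapper types are handled.

```python
import json


def to_plain(value):
    """Recursively unwrap $oid, $date and $numberLong extended-JSON wrappers."""
    if isinstance(value, dict):
        if len(value) == 1:
            (key, inner), = value.items()
            if key in ("$oid", "$date"):
                return to_plain(inner)
            if key == "$numberLong":
                return int(inner)
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_plain(v) for v in value]
    return value


def convert_lines(lines):
    """Yield one plain JSON document per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(to_plain(json.loads(line)), sort_keys=True)
```

The output stays newline-delimited, one document per line, which is the layout mongoexport produces by default and the one a newline-delimited JSON load expects.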
  19. 36.

    Applications
    Run apps via multiple platforms
    • Compute Engine using standard instances
    • Container Engine for Kubernetes-native apps
    • App Engine Flex for Dockerized apps
  20. 37.

    Machine Learning
    • Machine learning at scale with Cloud ML
    • Powerful image analysis
    • Powerful speech recognition
    • Fast, dynamic translation
    • Trainable, scalable linear and logistic regression
  21. 38.

    Kubernetes
    • MongoDB in Kubernetes is... non-trivial
    • Possible today with shipping Kubernetes
    • But some potential issues around Pod rescheduling and persistent volumes in 1.2
    • Some good recipes out there to solve now
    • PetSet: improved support for stateful services, coming in Kubernetes 1.3
    Diagram: a cluster with one master and seven nodes