
Optimizing your GCP Setup for Scale

Tips on how to optimize your GCP setup for scale while keeping spending to a minimum.

Harshit Dwivedi

December 08, 2019

Transcript

  1. Format of the talk
     1. The product
     2. Tips to effectively scale it
     3. Tips for reducing the pricing incurred
  3. App Engine
     • App Engine is a fully managed serverless platform that scales with your users.
     • Supports programming in a variety of languages, including Kotlin, Java, Go, Node.js, and Python.
  6. Scaling App Engine
     1. Reduce the frequency of health checks.
     2. The standard environment scales to 0 instances, so use it when your traffic has periods of inactivity.
     3. Tweak the autoscaling parameters to ensure you are utilizing your existing instances efficiently.
     4. Keep App Engine in the same region as your other GCP services.
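The autoscaling and health-check knobs above live in `app.yaml`. A minimal sketch for the flexible environment — field names follow the flexible environment's `app.yaml` reference, and all values are illustrative, not recommendations:

```yaml
runtime: nodejs
env: flex

# Tip 1: check health less often (each check costs a request to your app).
liveness_check:
  check_interval_sec: 30
readiness_check:
  check_interval_sec: 30

# Tip 3: tune autoscaling so existing instances are used efficiently
# before new ones spin up.
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 10
  cpu_utilization:
    target_utilization: 0.75
```

The standard environment uses a different `automatic_scaling` schema (e.g. `min_instances`, `max_instances`), so check the docs for the environment you deploy to.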
  8. Reducing the Pricing
     1. Disable unwanted Stackdriver logs.
     2. Use multiple smaller instances instead of a single larger instance.
     3. Network egress in App Engine is costly, so only send what you really need to!
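Tip 3 — trimming egress — can be as simple as dropping fields the client doesn't need and compressing the response before it leaves App Engine. A stdlib-only sketch (the field names are hypothetical):

```python
import gzip
import json

def shrink_payload(records, wanted_fields):
    """Keep only the fields the client actually needs, then gzip the JSON."""
    trimmed = [{k: r[k] for k in wanted_fields if k in r} for r in records]
    raw = json.dumps(trimmed).encode("utf-8")
    return gzip.compress(raw)

# A record dragging along a large field the client never reads.
records = [{"id": 1, "name": "a", "debug_blob": "x" * 10_000}] * 50
body = shrink_payload(records, ["id", "name"])
print(len(body))  # far smaller than the full, uncompressed payload
```

Most HTTP frameworks can apply the gzip step for you via middleware; the field trimming is the part you own.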
  11. Dataflow
     • Dataflow is a batch and stream event-processing pipeline.
     • Allows you to read data from an input source and modify that data at scale.
     • Built on the open-source Apache Beam SDK, which can be tweaked accordingly.
     • Works with Java, Python, Go, and Kotlin.
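Conceptually, a Beam pipeline is a chain of read → transform → write steps. A toy stdlib-only analogy of that shape (not actual Beam code — Beam distributes these same stages across many workers):

```python
def read(source):
    # "Read" step: yield raw events from an input source.
    for line in source:
        yield line.strip()

def transform(events):
    # "Transform" step: modify each event, e.g. drop empties and normalize case.
    for e in events:
        if e:
            yield e.upper()

def write(events, sink):
    # "Write" step: deliver results to an output sink.
    for e in events:
        sink.append(e)

source = ["click\n", "view\n", "\n", "click\n"]
sink = []
write(transform(read(source)), sink)
print(sink)  # ['CLICK', 'VIEW', 'CLICK']
```

Because everything is a generator, events stream through one at a time — the same shape that lets Dataflow process unbounded streams.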
  13. Scaling Dataflow
     1. Identify your use case and use an appropriate machine type.
     2. Use an SSD-enabled pipeline if your use case involves extensive disk I/O.
     3. Add deduplication support when reading from Pub/Sub.
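Pub/Sub is at-least-once delivery, so tip 3 matters: the same message can be redelivered. Beam can deduplicate by a message attribute (e.g. `withIdAttribute` in the Java PubsubIO connector); the underlying idea, sketched with the stdlib (message shape is hypothetical):

```python
def dedupe(messages, seen=None):
    """Drop messages whose id was already processed.

    Turns at-least-once delivery into effectively-once processing,
    as long as the `seen` set covers the redelivery window.
    """
    seen = set() if seen is None else seen
    for msg in messages:
        if msg["id"] in seen:
            continue  # duplicate redelivery; skip it
        seen.add(msg["id"])
        yield msg

msgs = [{"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "a", "v": 1}]
print([m["v"] for m in dedupe(msgs)])  # [1, 2]
```

In a real pipeline the `seen` state must be bounded (e.g. expired after the redelivery window), which is exactly what Beam manages for you.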
  16. Reducing the Pricing
     1. Reduce the disk size.
     2. Specify a custom machine type if the prebuilt ones are not specific enough.
     3. Disable public IPs if your pipeline doesn't need to be reachable from the public internet.
     4. Enable the Dataflow Streaming Engine.
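The four tips above map onto Dataflow pipeline options. A sketch of how launching a Beam Python pipeline might look — the exact flag names vary by SDK version, so verify them against your SDK's pipeline-options reference:

```
python pipeline.py \
  --runner DataflowRunner \
  --disk_size_gb 30 \
  --machine_type custom-2-4096 \
  --no_use_public_ips \
  --enable_streaming_engine
```

Here `disk_size_gb` is tip 1, the `custom-2-4096` machine type (2 vCPUs, 4 GB RAM) is tip 2, `no_use_public_ips` is tip 3, and `enable_streaming_engine` is tip 4.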
  18. BigQuery
     • BigQuery is a scalable data warehouse.
     • Backed by SQL, it allows you to perform complex manipulations on your stored data.
     • Has client libraries available in most commonly used languages to access the stored data.
  22. Reducing the Pricing
     1. Partition your table, for example by date or by an integer column.
     2. Use file loads instead of streaming data into your table.
     3. Pre-aggregate columns you need to access frequently.
     4. Be judicious with your queries.
     5. In an ingestion-time partitioned table, query the NULL partition (e.g. WHERE _PARTITIONTIME IS NULL) to read only recently streamed rows that haven't yet been assigned to a partition.
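Tip 2 matters because BigQuery load jobs are free, while streaming inserts are billed per row. "File loads" means batching rows into a file — e.g. newline-delimited JSON — and loading that instead. A stdlib-only sketch of preparing such a file (the load itself would use the BigQuery client library or `bq load`):

```python
import json
import tempfile

rows = [
    {"user_id": 1, "event": "click"},
    {"user_id": 2, "event": "view"},
]

# Write rows as newline-delimited JSON, a format BigQuery load jobs accept.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
    path = f.name

# Sanity check: each line round-trips to the original row.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
print(loaded == rows)  # True
```

Batching even a few minutes of events into one load job can replace thousands of billed streaming inserts.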
  23. Some Numbers
     Following the tips above, we cut our GCP spending by over 70% while maintaining the same scale.