
Optimizing your GCP Setup for Scale

Tips on how to optimize your GCP setup for scale while keeping spending to a minimum.

Harshit Dwivedi

December 08, 2019

Transcript

  1. Format of the talk
     1. The product
     2. Tips to effectively scale it
     3. Tips for reducing the pricing incurred
  3. App Engine
     • App Engine is a fully managed serverless platform that scales with your users.
     • Supports programming in a variety of languages, including Kotlin, Java, Go, Node.js, and Python.
  6. Scaling App Engine
     1. Reduce the frequency of health checks.
     2. The standard environment scales to 0 instances, so use it when your traffic has periods of inactivity.
     3. Tweak the autoscaling parameters to ensure you are utilizing your existing instances efficiently.
     4. Keep App Engine in the same region as your other GCP services.
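The autoscaling and health-check knobs above live in `app.yaml`. A minimal sketch for the flexible environment — field names follow the flexible environment's `app.yaml` reference, and all values are illustrative, not recommendations:

```yaml
runtime: nodejs
env: flex

# Tip 1: check health less often (each check costs a request to your app).
liveness_check:
  check_interval_sec: 30
readiness_check:
  check_interval_sec: 30

# Tip 3: tune autoscaling so existing instances are used efficiently
# before new ones spin up.
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 10
  cpu_utilization:
    target_utilization: 0.75
```

The standard environment uses a different `automatic_scaling` schema (e.g. `min_instances`, `max_instances`), so check the docs for the environment you deploy to.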
  8. Reducing the Pricing
     1. Disable unwanted Stackdriver logs.
     2. Use multiple smaller instances instead of a single larger instance.
     3. Network egress in App Engine is costly, so only send what you really need to!
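Tip 3 — trimming egress — can be as simple as dropping fields the client doesn't need and compressing the response before it leaves App Engine. A stdlib-only sketch (the field names are hypothetical):

```python
import gzip
import json

def shrink_payload(records, wanted_fields):
    """Keep only the fields the client actually needs, then gzip the JSON."""
    trimmed = [{k: r[k] for k in wanted_fields if k in r} for r in records]
    raw = json.dumps(trimmed).encode("utf-8")
    return gzip.compress(raw)

# A record dragging along a large field the client never reads.
records = [{"id": 1, "name": "a", "debug_blob": "x" * 10_000}] * 50
body = shrink_payload(records, ["id", "name"])
print(len(body))  # far smaller than the full, uncompressed payload
```

Most HTTP frameworks can apply the gzip step for you via middleware; the field trimming is the part you own.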
  11. Dataflow
     • Dataflow is a batch and stream event-processing pipeline.
     • Allows you to read data from an input source and modify that data at scale.
     • Built on the open-source Apache Beam SDK, which can be tweaked accordingly.
     • Works with Java, Python, Go, and Kotlin.
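Conceptually, a Beam pipeline is a chain of read → transform → write steps. A toy stdlib-only analogy of that shape (not actual Beam code — Beam distributes these same stages across many workers):

```python
def read(source):
    # "Read" step: yield raw events from an input source.
    for line in source:
        yield line.strip()

def transform(events):
    # "Transform" step: modify each event, e.g. drop empties and normalize case.
    for e in events:
        if e:
            yield e.upper()

def write(events, sink):
    # "Write" step: deliver results to an output sink.
    for e in events:
        sink.append(e)

source = ["click\n", "view\n", "\n", "click\n"]
sink = []
write(transform(read(source)), sink)
print(sink)  # ['CLICK', 'VIEW', 'CLICK']
```

Because everything is a generator, events stream through one at a time — the same shape that lets Dataflow process unbounded streams.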
  13. Scaling Dataflow
     1. Identify your use case and use an appropriate machine type.
     2. Use an SSD-enabled pipeline if your use case involves extensive disk I/O.
     3. Add deduplication support when reading from Pub/Sub.
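Pub/Sub is at-least-once delivery, so tip 3 matters: the same message can be redelivered. Beam can deduplicate by a message attribute (e.g. `withIdAttribute` in the Java PubsubIO connector); the underlying idea, sketched with the stdlib (message shape is hypothetical):

```python
def dedupe(messages, seen=None):
    """Drop messages whose id was already processed.

    Turns at-least-once delivery into effectively-once processing,
    as long as the `seen` set covers the redelivery window.
    """
    seen = set() if seen is None else seen
    for msg in messages:
        if msg["id"] in seen:
            continue  # duplicate redelivery; skip it
        seen.add(msg["id"])
        yield msg

msgs = [{"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "a", "v": 1}]
print([m["v"] for m in dedupe(msgs)])  # [1, 2]
```

In a real pipeline the `seen` state must be bounded (e.g. expired after the redelivery window), which is exactly what Beam manages for you.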
  16. Reducing the Pricing
     1. Reduce the disk size.
     2. Specify a custom machine type if the prebuilt ones are not specific enough.
     3. Disable public IPs if your pipeline doesn't need to be reachable from the public internet.
     4. Enable the Dataflow Streaming Engine.
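The four tips above map onto Dataflow pipeline options. A sketch of how launching a Beam Python pipeline might look — the exact flag names vary by SDK version, so verify them against your SDK's pipeline-options reference:

```
python pipeline.py \
  --runner DataflowRunner \
  --disk_size_gb 30 \
  --machine_type custom-2-4096 \
  --no_use_public_ips \
  --enable_streaming_engine
```

Here `disk_size_gb` is tip 1, the `custom-2-4096` machine type (2 vCPUs, 4 GB RAM) is tip 2, `no_use_public_ips` is tip 3, and `enable_streaming_engine` is tip 4.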
  18. BigQuery
     • BigQuery is a scalable data warehouse.
     • Backed by SQL, it allows you to perform complex manipulations on your stored data.
     • Has client libraries available in most commonly used languages to access the stored data.
  22. Reducing the Pricing
     1. Partition your table, for example by date or by an integer column.
     2. Use file loads instead of streaming data into your table.
     3. Pre-aggregate columns you need to access frequently.
     4. Be judicious with your queries.
     5. In an ingestion-time partitioned table, query the NULL partition (e.g. WHERE _PARTITIONTIME IS NULL) to read only recently streamed rows that haven't yet been assigned to a partition.
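Tip 2 matters because BigQuery load jobs are free, while streaming inserts are billed per row. "File loads" means batching rows into a file — e.g. newline-delimited JSON — and loading that instead. A stdlib-only sketch of preparing such a file (the load itself would use the BigQuery client library or `bq load`):

```python
import json
import tempfile

rows = [
    {"user_id": 1, "event": "click"},
    {"user_id": 2, "event": "view"},
]

# Write rows as newline-delimited JSON, a format BigQuery load jobs accept.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
    path = f.name

# Sanity check: each line round-trips to the original row.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
print(loaded == rows)  # True
```

Batching even a few minutes of events into one load job can replace thousands of billed streaming inserts.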
  23. Some Numbers
     Following the tips above, we cut our GCP spending by over 70% while maintaining the same scale.