Design_file_cost_optimisation__1_.pptx.pdf

Cloud Native Community

October 16, 2023

Transcript

  1. Strategies for Thriving in the Cloud: Sustainability and Scalability in a Changing Macroeconomic Environment
    A $400k/M heist
    By Sandeep Raghuwanshi, Lead DevOps | Razorpay
    Sandeepraghuwanshi
  2. The story begins …
    “Last year we created stacks of infra and scaled it to the moon, and Sandeep has been a pivotal part of the process.” (Proud Manager)
    Cranking It Up
    • The team expands and infra creation starts rapidly.
    • As consumption peaks, scaling happens.
    • At this stage the goal is to serve the requirements.
  3. Bills, Thrills, and Big Bills: Service Expense Categorization
    • AWS
      ◦ Compute
      ◦ Storage
      ◦ Network
      ◦ Database
    • K8S
      ◦ Over provisioning
      ◦ Request/Limit on resources
      ◦ Imbalanced cluster
    • Monitoring
    • Logging
  4. AWS: Compute, Storage, Network, Graviton
    • High expenditure on data transfer
    • Cluster Autoscaler & Auto Scaling groups
    • Priority: one AZ (availability zone)
    • Secondary AZ as fallback for availability assurance
    • 100% service cost visibility
    • 30% AWS cost reduction
    • 40% overall cost/txn reduction
    • Cost governance process implemented
    • AWS Graviton processor
    • Arm64 CPU architecture
    • 25% of worker nodes transitioned
    • 25% cost decrease vs. AMD
    • Scalability with Spot: 50% lower cost than On-Demand
    • Over 60% of production EC2 instances are on Spot
    • New applications use Spot instead of On-Demand
    • GitHub Actions running in K8S on Spot
  5. K8S: Bin packing, Over Provision, Imbalance, Tuning R/R
    • Goal: Reduce cost and increase instance size
    • Action items aligned with optimizing resource allocation
    • Descheduler implementation
    • Variety of node types
    • Weekly review and implementation of CRDs for policy enforcement
    • Increased number of nodes in the system
    • Elevated operational costs
    • Ephemeral cluster creation
    • Stage shutdown at night
    • Effectively manage request vs. usage
    • Get 100% visibility on the changes
    • Keep autoscaling on a requirement basis
    • Evaluate existing bin packing tools (e.g. Cast AI, Karpenter)
    • Assess their performance and capabilities
    • Make sure every node is well utilized
    • Keep vertical scaling in mind
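To make the bin packing idea concrete, here is a minimal sketch of first-fit decreasing, a textbook heuristic. It is not how Cast AI, Karpenter, or the Kubernetes scheduler work internally. It assumes equal-sized nodes and CPU requests only (in millicores), and every name and number in it is hypothetical.

```python
def first_fit_decreasing(pod_cpu_requests, node_capacity):
    """Pack pod CPU requests (millicores) onto equal-sized nodes.

    Returns a list of nodes, each a list of the requests placed on it.
    """
    nodes = []
    # Placing the largest pods first tends to leave fewer awkward gaps.
    for request in sorted(pod_cpu_requests, reverse=True):
        if request > node_capacity:
            raise ValueError(f"request {request}m exceeds node capacity")
        for node in nodes:
            if sum(node) + request <= node_capacity:
                node.append(request)
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([request])
    return nodes


def utilization(nodes, node_capacity):
    """Fraction of each node's CPU capacity covered by requests."""
    return [sum(node) / node_capacity for node in nodes]


# Hypothetical workload: seven pods on 4000m (4 vCPU) nodes.
requests = [500, 1500, 2000, 250, 750, 1000, 3000]
packed = first_fit_decreasing(requests, node_capacity=4000)
print(packed)                      # [[3000, 1000], [2000, 1500, 500], [750, 250]]
print(utilization(packed, 4000))   # [1.0, 1.0, 0.25]
```

The last node sits at 25% utilization. This is the kind of imbalance that a descheduler or a mix of node sizes is meant to reduce.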
  6. Monitoring
    Choose a monitoring system based on the following:
    • Rapid data retrieval and analysis
    • Embedded compression algorithm
    • Cardinality has to be managed when exposing any metrics
    • Monitor the monitoring system
    • Keep sample size in check
    • Reliable time series database
    • Read load segregation to enhance resilience during zone failures
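The cardinality point can be illustrated with a quick upper-bound estimate: a metric can produce at most one time series per combination of label values, so the worst case is the product of the distinct-value counts. The label names and counts below are hypothetical. In practice not every combination occurs, so the real series count is usually lower.

```python
from math import prod


def max_series(label_value_counts):
    """Upper bound on time series for one metric.

    label_value_counts maps each label name to its number of distinct values.
    """
    for name, count in label_value_counts.items():
        if count < 1:
            raise ValueError(f"label {name!r} needs at least one value")
    return prod(label_value_counts.values())


# Hypothetical labels on a request-counter metric.
labels = {"service": 20, "status_code": 5, "pod": 300}
print(max_series(labels))  # 30000

# Adding one unbounded label (e.g. a user ID) multiplies the bound.
labels_with_user = {**labels, "user_id": 10_000}
print(max_series(labels_with_user))  # 300000000
```

A single high-cardinality label dominates the total, so unbounded values such as user or request IDs are usually better kept in logs than in metric labels.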
  7. Logging
    • Use a lightweight log exporter (Fluent Bit over Fluentd)
    • Keep tabs on log verbosity (whenever possible, switch off debug or the whole trail)
    • Divide logs into Hot, Cold, and Archive tiers
    • Ship logs within the same zone to curtail network cost
    • Make use of the log shipper's compression algorithms
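One way to picture the Hot / Cold / Archive split is as a simple age-based rule. The slide does not state retention windows, so the 7-day and 90-day thresholds below are assumptions chosen for illustration, and `log_tier` is a hypothetical helper rather than part of any logging tool.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention windows; the deck does not specify any.
HOT_WINDOW = timedelta(days=7)
COLD_WINDOW = timedelta(days=90)


def log_tier(written_at: datetime, now: datetime) -> str:
    """Classify a log record as 'hot', 'cold', or 'archive' by its age."""
    age = now - written_at
    if age < timedelta(0):
        raise ValueError("written_at is in the future")
    if age <= HOT_WINDOW:
        return "hot"      # fast, searchable storage
    if age <= COLD_WINDOW:
        return "cold"     # cheaper storage, slower queries
    return "archive"      # cheapest storage, rarely read


now = datetime(2023, 10, 16, tzinfo=timezone.utc)
print(log_tier(now - timedelta(days=2), now))    # hot
print(log_tier(now - timedelta(days=30), now))   # cold
print(log_tier(now - timedelta(days=365), now))  # archive
```

In a real setup this decision is normally expressed as a storage lifecycle or index policy rather than application code. The sketch only shows the rule such a policy encodes.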
  8. Database
    • DAM stands for database activity monitoring
    • Involves observing, identifying, and reporting a database’s activities
    • A database firewall acts as a protective layer for the database
    • Helps detect and prevent database-specific attacks