Design_file_cost_optimisation__1_.pptx.pdf

Cloud Native Community

October 16, 2023

Transcript

  1. Strategies for Thriving in the Cloud: Sustainability and Scalability in a Changing Macroeconomic Environment
    A $400k/M heist
    By Sandeep Raghuwanshi, Lead DevOps | Razorpay
    Sandeepraghuwanshi
  2. Cranking It Up
    “Last year we created stacks of infra and scaled it to the moon, and Sandeep has been a pivotal part of the process.” (Proud Manager)
    • The team expands and infra creation starts rapidly.
    • As consumption peaks, scaling happens.
    • At this stage the goal is to serve the requirements.
    The story begins …
  3. Bills, Thrills, and Big Bills: Service Expense Categorization
    • AWS
      ◦ Compute
      ◦ Storage
      ◦ Network
      ◦ Database
    • K8S
      ◦ Over provisioning
      ◦ Request/Limit on resources
      ◦ Imbalanced cluster
    • Monitoring
    • Logging
  4. AWS: Compute, Storage, Network, Graviton
    • High expenditure on data transfer
    • Cluster Autoscaler & AutoScaling groups
    • Priority: One AZ (availability zone)
    • Secondary AZ fallback for availability assurance
    • 100% service cost visibility
    • 30% AWS cost reduction
    • 40% overall cost/txn reduction
    • Cost governance process implemented
    • AWS Graviton processor
    • Arm64 CPU architecture
    • 25% worker node transition
    • 25% cost decrease vs. AMD
    • Scalability with Spot: 50% lower cost than On-demand
    • Over 60% of production EC2 instances are on Spot
    • New applications utilize Spot instead of On-demand
    • GitHub Actions running in K8S on Spot
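The Spot figures on this slide can be turned into a back-of-the-envelope estimate. The sketch below is illustrative only: the `blended_cost` helper and the baseline spend of 100 are hypothetical, and it assumes the share of instances on Spot equals the share of compute spend moved to Spot, which the slide does not state.

```python
def blended_cost(on_demand_cost: float, spot_fraction: float, spot_discount: float) -> float:
    """Cost after moving `spot_fraction` of the spend to Spot at `spot_discount` off."""
    if not 0.0 <= spot_fraction <= 1.0 or not 0.0 <= spot_discount <= 1.0:
        raise ValueError("fractions must be between 0 and 1")
    on_demand_part = on_demand_cost * (1.0 - spot_fraction)
    spot_part = on_demand_cost * spot_fraction * (1.0 - spot_discount)
    return on_demand_part + spot_part


baseline = 100.0  # hypothetical monthly compute spend, in arbitrary units
# Slide figures: over 60% of production instances on Spot, Spot about 50% cheaper.
after = blended_cost(baseline, spot_fraction=0.60, spot_discount=0.50)
saving_pct = round(100.0 * (baseline - after) / baseline, 1)
print(f"compute cost after Spot: {after:.1f}, saving: {saving_pct}%")
```

Under these assumptions the compute bill drops by roughly 30%. This is a compute-only estimate and should not be read as the source of the slide's overall AWS reduction figure.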
  5. K8S: Bin packing, Over Provision, Imbalance, Tuning R/R
    • Goal: Reduce cost and increase instance size
    • Action items aligned with optimizing resource allocation
    • Descheduler implementation
    • Variety of node types
    • Weekly review and implementation of CRDs for policy enforcement
    • Increased number of nodes in the system
    • Elevated operational costs
    • Ephemeral cluster creation
    • Night stage shutdown
    • Effectively manage request vs usage
    • Get 100% visibility on the changes
    • Keep auto scaling on a requirement basis
    • Evaluate existing bin packing algorithm tools (e.g. Cast AI, Karpenter)
    • Assess their performance and capabilities
    • Make sure every node is well utilized
    • Keep vertical scaling in mind
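To make the bin packing idea concrete, here is a minimal first-fit-decreasing sketch that places pod CPU requests (in millicores) onto identically sized nodes. It is a textbook heuristic chosen for illustration, not the algorithm used by Cast AI, Karpenter, or the Kubernetes scheduler; the function name, pod sizes, and node capacity are all hypothetical.

```python
def first_fit_decreasing(pod_requests: list[int], node_capacity: int) -> list[list[int]]:
    """Pack requests onto as few equal-sized nodes as the heuristic finds."""
    nodes: list[list[int]] = []  # requests placed on each node
    free: list[int] = []         # remaining capacity of each node
    # Placing the largest requests first tends to leave fewer unusable gaps.
    for req in sorted(pod_requests, reverse=True):
        if req > node_capacity:
            raise ValueError(f"request {req} exceeds node capacity {node_capacity}")
        for i, remaining in enumerate(free):
            if req <= remaining:
                nodes[i].append(req)
                free[i] -= req
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append([req])
            free.append(node_capacity - req)
    return nodes


# Hypothetical CPU requests in millicores on 4000m nodes.
placement = first_fit_decreasing([500, 1500, 2000, 1000, 250, 750], node_capacity=4000)
for index, pods in enumerate(placement):
    print(f"node {index}: {pods} -> {sum(pods)}m of 4000m requested")
```

In this example the six pods fit on two nodes, one fully requested and one half requested, which is the kind of utilization picture the "make sure every node is well utilized" bullet asks for.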
  6. Monitoring
    Choose a monitoring system based on the following:
    • Rapid data retrieval and analysis
    • Embedded compression algorithm
    • Cardinality has to be managed while exposing any metrics
    • Monitor the monitoring system
    • Keep sample size in check
    • Reliable time series database
    • Read load segregation to enhance resilience during zone failures
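The cardinality bullet is easiest to see with numbers. In a label-based time series database, the worst-case number of series for one metric is the product of the distinct values of each label. The sketch below assumes that model; the label names and counts are hypothetical.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical labels on a request-count metric.
bounded = {"method": 5, "status": 6, "pod": 50}
# Adding one unbounded label, such as a per-user identifier, multiplies everything.
unbounded = {**bounded, "user_id": 10_000}

print(series_count(bounded))
print(series_count(unbounded))
```

A single high-cardinality label turns 1,500 potential series into 15,000,000, which is why labels with unbounded values are usually kept out of metrics.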
  7. Logging
    • Use a lightweight log exporter (Fluent Bit rather than Fluentd)
    • Keep tabs on log verbosity (whenever possible, switch off debug logging or the whole trail)
    • Divide logs into Hot, Cold, and Archive tiers
    • Ship logs within the same zone to curtail network cost
    • Make use of the log shippers' compression algorithms
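The Hot, Cold, and Archive split can be expressed as a simple age-based rule. The thresholds below (7 and 90 days) and the `log_tier` helper are hypothetical, picked only to illustrate the idea; the slide does not give retention periods.

```python
from datetime import date

HOT_DAYS = 7    # hypothetical: recent logs kept on fast, searchable storage
COLD_DAYS = 90  # hypothetical: older logs kept on cheaper storage


def log_tier(log_date: date, today: date) -> str:
    """Return the storage tier for a log written on `log_date`."""
    age_days = (today - log_date).days
    if age_days < 0:
        raise ValueError("log_date is in the future")
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= COLD_DAYS:
        return "cold"
    return "archive"


today = date(2023, 10, 16)
for written in (date(2023, 10, 14), date(2023, 9, 1), date(2023, 1, 1)):
    print(written, log_tier(written, today))
```

In practice the thresholds would follow from how often logs of each age are actually queried and from any retention requirements that apply.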
  8. Database
    • DAM stands for Database Activity Monitoring
    • Involves observing, identifying and reporting a database’s activities
    • Database firewall acts as a protective layer for the database
    • Helps detect and prevent database-specific attacks