AWS SQS queues & Kubernetes Autoscaling Pitfalls Stories

Eric Khun
October 26, 2020

Talk at the Cloud Native Computing Foundation meetup @dcard.tw

Transcript

  1. Make it work, make it right, make it fast

    Kent Beck (Agile Manifesto - Extreme Programming)
  4. A bit of history...

    - 2010 -> 2012: Joel (founder/CEO), 1 cronjob on a Linode server, $20/mo, 512 MB of RAM
    - 2012 -> 2017: Sunil (ex-CTO), crons running on AWS Elastic Beanstalk / supervisord
    - 2017 -> now: Kubernetes / CronJob controller
  5. Empty messages?

    > Workers try to pull messages from SQS, but receive “nothing” to process
  6. 1,000,000 API calls to AWS cost $0.40

    We have 7.2B calls/month for “empty messages”
    It costs ~$25k/year
    > Me:
  7. AWS

  8. Daily cost: $120 > $50

    About $70 saved daily > ~$2,000 / month > ~$25,000 / year (it’s USD, not TWD)
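The savings figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that reads the slide as a daily bill dropping from $120 to $50. The function name and the 30-day month are assumptions made for illustration, not something the talk specifies.

```python
DAYS_PER_MONTH = 30   # assumption: a flat 30-day month
DAYS_PER_YEAR = 365


def projected_savings(daily_cost_before, daily_cost_after):
    """Project a daily cost reduction over a month and a year (USD)."""
    daily = daily_cost_before - daily_cost_after
    return {
        "daily": daily,
        "monthly": daily * DAYS_PER_MONTH,
        "yearly": daily * DAYS_PER_YEAR,
    }


# Figures from the slide: $120/day before, $50/day after.
print(projected_savings(120, 50))
# → {'daily': 70, 'monthly': 2100, 'yearly': 25550}
```

The results round to the slide's ~$2,000 / month and ~$25,000 / year.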
  9. Benefits

    - Saving money
    - Less CPU usage (fewer empty requests)
    - Less throttling (misleading)
    - Fewer containers > better resource allocation: memory/CPU requests
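The transcript does not say which change removed the empty receives. One standard SQS option that has this effect is long polling, where a receive call waits for messages to arrive instead of returning immediately. The sketch below assumes that approach. `receive_batch` is a hypothetical helper, and `sqs_client` is assumed to be a boto3 SQS client (for example `boto3.client("sqs")`). It is passed in, so the block itself has no AWS dependency.

```python
def receive_batch(sqs_client, queue_url, wait_seconds=20, max_messages=10):
    """Receive up to `max_messages`, waiting up to `wait_seconds` for any to arrive.

    `sqs_client` is assumed to expose boto3's `receive_message` call.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=wait_seconds,      # 0 means short polling; 20 is the SQS maximum
        MaxNumberOfMessages=max_messages,  # 10 is the SQS maximum per call
    )
    # The response has no "Messages" key when nothing was received.
    return response.get("Messages", [])
```

With a 20-second wait, an idle worker makes at most about three receive calls per minute. A tight short-polling loop makes as many calls as it can issue.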
  10. What could have helped?

    - Infra as code (explicit options / standardization)
    - SLIs/SLOs (keep re-evaluating what’s important)
    - AWS architecture reviews (tagging / recommendations from AWS solutions architects)
  11. Resources allocated and not doing anything most of the time

    Developers trying to find compromises on the number of workers
  12. What went wrong

    - Workers didn’t handle the SIGTERM sent by k8s
    - Kept processing messages
    - Messages were halfway processed and killed
    - Messages were sent back to the queue again
    - Fewer workers because of downscaling
  13. Solution

    - When receiving SIGTERM, stop processing new messages
    - Set a grace period long enough to process the current message

    if (SIGTERM) {
      // finish current processing and stop receiving new messages
    }
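The pseudocode on this slide could look roughly like the following in a Python worker. This is a sketch under assumptions, not the speaker's implementation. `receive` and `process` are hypothetical callables standing in for the SQS receive call and the job handler, and the worker language used in the talk is not stated. On the Kubernetes side, the grace period is the pod's `terminationGracePeriodSeconds` (30 seconds by default). It would need to exceed the longest time a single message takes to process.

```python
import signal
import threading

# Set once SIGTERM arrives; the worker loop checks it between messages.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request without interrupting the message in flight."""
    shutdown_requested.set()


def install_handler():
    """Register the handler; call this from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(receive, process):
    """Process messages until shutdown is requested.

    `receive` returns a message or None (empty receive); `process` handles
    one message. Both are hypothetical stand-ins. Returns the number of
    messages processed.
    """
    processed = 0
    while not shutdown_requested.is_set():
        message = receive()
        if message is None:
            continue  # nothing to do; re-check the shutdown flag
        process(message)  # runs to completion even if SIGTERM arrives meanwhile
        processed += 1
    return processed
```

A worker would call `install_handler()` once at startup and then `run_worker(...)`. After SIGTERM, the current message finishes, no new message is pulled, and the process can exit before Kubernetes sends SIGKILL.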