◦ ~10K Desktop applications (.NET) talking to the Cloud ◦ Control the apps via commands ▪ Sync Logs -- most common ask ▪ Refresh Configurations file. ▪ Sync the Telemetry data on demand. • Scheduled push already happening.
and Desktop App understand. ✅ • Sync Approach: ◦ Poll? ▪ 5 sec ⇒ 2000 reqs/sec. Really? ❌ ** ◦ Long Polling Connections? ✔ ◦ Web-Sockets ✅ • But we want fancy tech!! =) ** This number was incorrectly presented as 10K in the talk.
box. ◦ XMPP Extension Protocols (XEP’s) ▪ Ad-hoc commands (https://xmpp.org/extensions/xep-0050.html) ◦ What more? ▪ We can embed our own chat clients - That’s so cool !! ▪ Inventing a problem that doesn’t exist! • eJabberd? → We don’t know Erlang? • OpenFire Server -- Java ◦ Allows User Plugins => We can hack Custom Protocols. ◦ What’s the big deal? • .NET client libs for XMPP
⇒ Sparse! • New Technology! • Basic plugins for reference. Reverse engineer from OpenFire code. • Scaling challenges ◦ Concurrency constructs not very obvious ⇒ Poor documentation kills you! • Unknown Unknowns galore! • Took ~ 3 months to stabilize for production • The developer had already lost the steam towards the end. • It worked until it stopped • Facing an Unknown Enemy! • It was a mess!
Web-Sockets could have solved this for us. • Less than 200-300 lines of JS code • Could have gone live in less than 3 weeks. • Low maintenance headache & Fewer unknowns! • Didn’t have to solve for problems that didn’t exist!
◦ Separate Service deployments for each product. • MySQL • ElasticSearch • Redis • Message Queues • Amazon Web Services. ◦ 30 EC2 instances - 16 instances on at the application services layer. ◦ RDS ◦ ElastiCache ◦ Elastic Load Balancers • Cloudinary & Akamai
extra traffic, scale up the application services infra to meet the 5X growth in traffic. • Constraints: ◦ You have limited $$ at your disposal to spend on Infra. ◦ You have only 1 Senior Engineer to spare. :) • Welcome to a start-up! =)
(Docker Swarm, Kubernetes - kops)? • Auto-Scaling Groups on EC2? • New Technologies ◦ Learning Curve - Engineers love to tinker with “Cool” tech. ◦ Maintenance cost ?? ▪ Known Unknowns vs Unknown Unknowns!
• Elasticity Factor ◦ Seconds? -- Lambda or Cloud Functions? ❌ ◦ Mins? -- Containers? ❌ ◦ Hour? -- Virtual Machines perhaps? ✅ • Cost Savings ◦ Do I need to save costs for extra seconds/mins? ❌ ◦ Do I need to save costs for extra hours? ✅
15 mins (Jenkins already used heavily within the team) • AWS SimpleDB ◦ JSON Documents: Service => Time & Server Count Mapping ◦ Easy GUI to manipulate JSON documents. • Ruby Scripts ◦ Scale-up ▪ Capistrano for deploying latest code builds. ▪ Register under the Load Balancer ◦ Scale down ▪ Fail the liveness probe for the Load Balancer ▪ Wait for the request queue to drain - Ruby scripts to check Passenger queue lengths. ▪ De-register & shutdown.
to 7X traffic volumes • Solution worked for 18 months before we explored containers. • Saved 40% cost over on-demand instances. ◦ Rough Calculations showed we could have saved an additional ~10% with container-based scaling. ◦ At what cost? • Templatized the solution ◦ QA environment moved to this model. ◦ Regression & Sanity Suites invoke the scaling jobs as a prerequisite. ◦ Each QA group could spawn their environment at will.