Scaling Applications at theScore using AWS

Luke Reeves
January 16, 2014


Transcript

  1. What is theScore?

    Mobile sports app for Android, iOS, and BlackBerry.
    Millions of unique users and installations.
    Billions of push notifications sent.
  2. Technology Stack

    Ruby on Rails, Python, Memcached, Redis, MySQL, Nginx, HAProxy, Varnish, Ansible, Logstash, Elasticsearch, Graphite, Netflix Ice, PagerDuty, Airbrake, GitHub, New Relic, stock Keynote templates.
  3. AWS Technologies

    EC2; Autoscaling via AMIs; ELB, sometimes; Redshift, sometimes; CloudFront; RDS; ElastiCache; S3; CloudWatch; Route 53; IAM; CloudTrail.
  4. Do I want to use AWS?

    Yes… probably!
    AWS is a massive toolkit that spans platform and infrastructure as a service.
    It has advantages at every stage, from when your app is first being built through to when it requires massive scale.
    Don’t be scared of the cloud being fragile.
  5. App Basics

    Use appropriate instance types, e.g. don’t perform heavy IO on compute- or memory-optimized instances.
    Design applications to be as stateless as possible.
    Know which components are easy to scale (web servers) and which are hard or impossible to scale (database masters).
    Scaling up and down should have no external dependencies.
    Plan for or learn the various limits; this is usually done the hard way.
    Bonus: team members who can build tooling specific to your scenarios.
  6. How do Rails apps perform on AWS?

    As badly as everywhere else!
    Specifically, AWS is not built around pure performance, and bottlenecks will surface quickly.
    Seriously though, MRI is both memory intensive and very slow at certain tasks.
    Garbage collection can be a very slow process in MRI.
    The forking model of most Rails servers gives you copy-on-write, but the savings are usually not huge for larger applications.
  7. Workarounds

    See if your application works on JRuby (perhaps Rubinius, if you’re brave). The JVM has almost 20 years of work behind it.
    Use Rack-level caching with stores such as memcached.
    Use HTTP-based caches in front of your application, such as Varnish (okay, let’s be fair, only Varnish).
    Have metrics and tooling to identify bottlenecks, and address them as soon as you can.
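
The caching advice on this slide follows the cache-aside pattern: check the cache first and only run the expensive render on a miss. Below is a minimal Python sketch of that idea, not theScore's implementation. `TTLCache` is a hypothetical in-memory stand-in for a memcached client, and `cached_response` and `render` are illustrative names.

```python
import time


class TTLCache:
    """In-memory stand-in for a memcached client (an assumption for this sketch)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries behave like misses and are evicted lazily.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def cached_response(cache, path, render, ttl=30):
    """Return the cached body for `path`, calling `render` only on a miss."""
    body = cache.get(path)
    if body is None:
        body = render(path)
        cache.set(path, body, ttl)
    return body
```

A short TTL keeps hot endpoints off the application servers while bounding how stale a response can get; the 30-second default here is arbitrary.
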
  8. So why use this?

    Two words: resiliency and scaling.
    You can succeed with any platform: physical hardware, colo, PaaS, VMs, or IaaS.
    Amazon provides the tools out of the box to address resiliency and scaling. Full spectrum from PaaS to IaaS.
    Depending on your needs, you may not care about either of these!
  9. Resiliency is Scaling

    Most of the techniques that make your application scalable also apply to resiliency:
    Database replicas allow you to spread the load across systems, and can replace a failed primary system.
    Scaling groups can be spread across availability zones, which means that if a zone fails they can be scaled up in the remaining ones.
    Data can be replicated across regions (finally).
    Instance health checks are used both for scaling and for terminating instances with problems.
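
The availability-zone point can be illustrated with a little arithmetic: the same desired capacity is simply re-spread over whichever zones are still healthy. This is only a sketch of the idea, assuming an even split; the function name and the zone names in the usage note are hypothetical, and the scaling service handles this placement itself in practice.

```python
def spread_capacity(desired, zones, failed=()):
    """Split `desired` instances as evenly as possible across healthy zones."""
    failed = set(failed)
    healthy = [zone for zone in zones if zone not in failed]
    if not healthy:
        raise ValueError("no healthy availability zones left")
    base, extra = divmod(desired, len(healthy))
    # The first `extra` healthy zones each take one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(healthy)}
```

With three zones and a desired capacity of 6, each zone runs 2 instances; if one zone is marked failed, the remaining two run 3 each, so total capacity is unchanged.
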
  10. Provisioning

    Use the same Ansible playbooks for each role.
    After any deployment (change) to running instances, create an AMI snapshot of the host.
    Update the scaling group for the changed role to point to the new AMI.
    Use the EC2 APIs to configure load balancers on a schedule (every minute works fine).
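
The snapshot-then-repoint steps above could look roughly like the following. This is a hedged sketch, not the tooling from the talk: it assumes `ec2` and `autoscaling` are clients exposing the boto3 method names used here, that the group is driven by a launch configuration, and the function name, naming scheme, and parameters are made up for illustration.

```python
def roll_out_ami(ec2, autoscaling, instance_id, group_name, instance_type, version):
    """Snapshot a freshly deployed instance and point its scaling group at the image."""
    name = f"{group_name}-{version}"

    # 1. Create an AMI from the instance that just received the deployment.
    #    (By default this reboots the instance to get a consistent image.)
    image_id = ec2.create_image(InstanceId=instance_id, Name=name)["ImageId"]

    # 2. Block until the image is usable by new launches.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # 3. Launch configurations cannot be edited, so create a new one per image.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName=name,
        ImageId=image_id,
        InstanceType=instance_type,
    )

    # 4. Switch the group over; instances launched from now on use the new AMI.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        LaunchConfigurationName=name,
    )
    return image_id
```

Passing the clients in as arguments keeps the function testable with fakes. Because each rollout leaves behind an AMI and a launch configuration, old ones need pruning to stay under the account limits mentioned on the next slide.
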
  11. Limits and Bottlenecks

    You will find a ton of unexpected limitations in both technical and non-technical aspects of your stack.
    AWS limitations: instances allowed by type, number of AMIs, number of launch configurations, etc.
    File and socket limits per instance (use keepalives where possible to alleviate this).
    Bandwidth limits per instance: scale your caches.
    IO throughput limitations: AWS lets you provision IOPS for databases and EC2 instances; use this any time you need guaranteed throughput rates.
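
File and socket limits are easy to check before they bite. A small sketch using Python's standard `resource` module (Unix only) reads the per-process open-file limit; the `fd_headroom` helper and the idea of passing in a connection count are assumptions for illustration.

```python
import resource


def fd_headroom(open_connections):
    """Return how many more descriptors fit under the soft limit, or None if uncapped."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None  # No soft cap is configured for this process.
    return soft - open_connections
```

Every socket counts against this limit, which is why reusing connections with keepalives leaves more headroom than opening a new connection per request.
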
  12. Cost

    Amazon is definitely not cheap, but you get exactly what you pay for.
    Every resource you provision should be used all the time.
    Reserve any resources that are static and not going away.
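
Whether a reservation pays off comes down to a break-even calculation. The sketch below assumes a simplified pricing model (one upfront fee plus a discounted hourly rate, compared against a flat on-demand rate); the function name is hypothetical, and the figures in the usage note are made-up placeholders, not real AWS prices.

```python
def break_even_hours(upfront, reserved_hourly, on_demand_hourly):
    """Hours of use after which a reservation becomes cheaper than on-demand."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved rate must be below the on-demand rate")
    return upfront / saving_per_hour
```

For example, `break_even_hours(600, 0.25, 0.5)` returns `2400.0`. A year has 8,760 hours, so under those placeholder numbers an always-on instance recovers the upfront fee well within the first year, while an instance that only runs during bursts might never do so.
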
  13. Conclusion

    AWS has enabled us to grow very quickly and respond to massive anticipated and unanticipated bursts.
    Using AWS has definitely been a success story for theScore.
    Questions?
    Luke Reeves @lukereeves