How to Build a Scalable API

Slide 1

How to Build a Scalable API
Travis Reeder
CTO/Founder, Iron.io

Slide 2

Who am I?
Travis Reeder, CTO and Co-Founder of Iron.io.
Iron.io provides scalable and elastic cloud infrastructure services: IronMQ, IronWorker and IronCache.
Building things to scale is our business.

Slide 3

Iron.io APIs
● 100M+ API requests per day
● 300K+ jobs executed per day on IronWorker
● And growing...
● 100% uptime over the past 30 days
● 99.98% uptime over the past 6 months
As of Feb. 12

Slide 4

What is Scalability?
"Scalability is the ability of a system, network, or process, to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added."
Source: Wikipedia

Slide 5

In other words...
You must be able to grow by throwing more hardware at it.
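A back-of-the-envelope version of "throw more hardware at it": if the system scales roughly linearly (an assumption that only holds while no shared piece, such as the data store, becomes the bottleneck), the number of servers you need grows in proportion to load. A minimal Python sketch; the per-server throughput, the 3x peak factor, and the 30% headroom are hypothetical values, and in practice they come from your own load tests.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, headroom: float = 0.3) -> int:
    """Estimate how many servers are needed to serve peak_rps.

    Assumes linear scaling: N servers handle N * per_server_rps.
    headroom keeps a fraction of each server's capacity free (0.3 = 30%)
    for traffic spikes and for losing a server. All inputs are illustrative.
    """
    if per_server_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("per_server_rps must be > 0 and 0 <= headroom < 1")
    usable_rps = per_server_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


# 100M requests/day averages out to roughly 1,157 requests/second.
avg_rps = 100_000_000 / 86_400
# Hypothetical numbers: peak is 3x the average, one server sustains 500 req/s.
print(servers_needed(peak_rps=3 * avg_rps, per_server_rps=500))  # 10
```

Double the peak and the estimate roughly doubles (20 servers with the same assumptions). If adding servers stops adding throughput, something shared has become the bottleneck and the system is not scaling in the sense defined above.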

Slide 6

Bonus: Increased reliability
Building to scale == more reliable:
● Redundancy
● Easy to provision new resources
● Easier transition to HA (high availability)

Slide 7

Choose the right tools

Slide 8

Choose the right infrastructure
● Use the cloud
● Pick a cloud that can truly launch servers on demand and will let you launch a lot of them
  ○ AWS comes to mind

Slide 9

Choose the right load balancer
● Using Amazon? Use ELB.
● Using Rackspace? Use the Rackspace Load Balancer.
● Using something else? Put a good load balancer like nginx on a few boxes and point DNS at them.
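Whichever one you pick, the core job is the same: spread requests across several app servers and stop sending traffic to ones that fail health checks. nginx's default upstream method, for example, is round-robin. A minimal Python sketch of that idea, not how ELB or nginx are implemented internally; the class name and backend addresses are made up.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends: Iterable[str]):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._rotation = itertools.cycle(self.backends)

    def mark_down(self, backend: str) -> None:
        # Called when a health check fails.
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        # Called when a previously failing backend passes its health check again.
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self) -> str:
        # Try each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])  # hypothetical hosts
lb.mark_down("app2:8080")
print([lb.next_backend() for _ in range(4)])
# ['app1:8080', 'app3:8080', 'app1:8080', 'app3:8080']
```

This is also why the load balancer itself should be redundant: managed balancers handle that for you, while the nginx route means running it on a few boxes behind DNS.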

Slide 10

Choose the right data store
● Probably the most important decision
● Choose one that scales. For reals.
● MongoDB, Riak, etc. are built to scale.
● MySQL/Postgres/etc. don't truly scale horizontally.
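Part of what makes stores like Riak "built to scale" is that they partition data across machines by key, so adding a node spreads both storage and load. Riak uses consistent hashing for this. Here is a simplified Python sketch of a consistent-hash ring with virtual nodes, meant to show the idea rather than Riak's actual implementation; the node names, key format, and vnode count are arbitrary.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._positions = []  # sorted hash positions on the ring
        self._owner = {}      # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # Used only to spread keys evenly, not for security.
        return int.from_bytes(hashlib.blake2b(value.encode(), digest_size=8).digest(), "big")

    def add_node(self, node: str) -> None:
        # Each node owns many small arcs of the ring for a more even split.
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def node_for(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # Walk clockwise to the first vnode at or after the key's position.
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("db4")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")
```

When a fourth node joins, only the keys that now land on it move (around a quarter of them), whereas naive `hash(key) % N` would reshuffle about three quarters of all keys going from 3 to 4 nodes.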

Slide 11

Choose the right language and framework
● Ruby on Rails == bad
● Go == good
● Not strictly a scalability issue, because you can always throw more hardware at it.
● But if you like money, choose the right language.
  ○ We cut our server requirements by 90% by switching from Ruby to Go.

Slide 12

KASS
● Keep your API as simple as possible.
● The more features you add, the harder it is to scale.
● Consider every feature and how it will affect things, most importantly your data store.
  ○ i.e., inspect the queries
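One concrete way to "inspect the queries" is to ask the data store how it will execute them before you ship a feature. MongoDB exposes this through `explain()`; to keep the sketch self-contained it uses SQLite from Python's standard library as a stand-in, and the `messages` table and index name are hypothetical. The same query goes from a full table scan to an index lookup once a suitable index exists.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Return SQLite's human-readable query-plan lines for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, queue_id TEXT, body TEXT)")
sql = "SELECT * FROM messages WHERE queue_id = ?"  # a query a new feature might add

before = query_plan(conn, sql, ("q1",))  # a SCAN: every row is read
conn.execute("CREATE INDEX idx_messages_queue ON messages (queue_id)")
after = query_plan(conn, sql, ("q1",))   # a SEARCH ... USING INDEX
print(before)
print(after)
```

A full scan is fine on a tiny table and painful at 100M+ requests a day, which is why it pays to check the plan for every query a feature introduces.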

Slide 13

Do everything in 3's (or more)

Slide 14

3 or more servers for every layer
● 1: No!
● 2: better, but not worth the risk.
● 3+: bingo
Bonus points: put one in a different zone to go for high availability.
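One way to see why 2 is "not worth the risk": with two servers, a single failure drops you back to one, the "No!" case. If each server is independently down with probability p, the chance that a layer still has at least two servers up (i.e., is still redundant) is a binomial sum. A minimal Python sketch, assuming independent failures (real failures are often correlated, which is what the different-zone bonus point addresses) and an illustrative p of 1%; the function name is hypothetical.

```python
from math import comb


def p_at_least_k_up(n: int, k: int, p_down: float) -> float:
    """P(at least k of n servers are up), assuming independent failures."""
    p_up = 1 - p_down
    return sum(comb(n, i) * p_up**i * p_down**(n - i) for i in range(k, n + 1))


p = 0.01  # illustrative per-server probability of being down
for n in (1, 2, 3):
    print(
        f"{n} server(s): any up = {p_at_least_k_up(n, 1, p):.4%}, "
        f"still redundant (2+ up) = {p_at_least_k_up(n, 2, p):.4%}"
    )
```

With p = 1%, two servers are both up about 98.01% of the time, while three servers have at least two up about 99.97% of the time. The chance that nothing at all is up also drops from 0.01% to 0.0001%.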

Slide 15

Cache stuff to take load off the data store and improve performance
Your database does a lot of work. Take some load off it by caching things. If it's something that is looked up often, like checking authentication, cache it for a short period of time, even if it's just 30 seconds.
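A minimal in-process sketch of that idea: look the value up once, then serve it from memory until a TTL (30 seconds here, per the slide) runs out. In a real deployment you'd more likely use a shared cache such as memcached, Redis, or IronCache so every API server benefits; `TTLCache` and `check_token_in_db` below are hypothetical stand-ins.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny cache whose entries expire ttl seconds after they were loaded."""

    def __init__(self, ttl: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the data store is not touched
        value = loader(key)  # miss or expired: hit the data store once
        self._entries[key] = (now + self.ttl, value)
        return value


db_calls = []


def check_token_in_db(token):
    """Hypothetical auth lookup; a real one would query your data store."""
    db_calls.append(token)
    return token == "valid-token"


auth_cache = TTLCache(ttl=30)
for _ in range(1000):
    auth_cache.get_or_load("valid-token", check_token_in_db)
print(len(db_calls))  # 1: one data-store hit for 1,000 auth checks within 30 s
```

The trade-off: a revoked token can keep working for up to one TTL, which is a good reason to keep it short. This sketch also never evicts entries beyond overwriting expired ones on access; a production cache would bound its size.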

Slide 16

Queue up everything that doesn't need to be done synchronously
● Return the required information to the client as fast as possible; queue up the rest.
● Stats, logs, notifications, etc.
● Put messages in a queue and let other servers (worker servers) deal with them on their own time.
Note: there's this really awesome message queue I heard about called IronMQ.
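The shape of this pattern, sketched in Python with an in-process `queue.Queue` and a worker thread standing in for a real message queue (such as IronMQ) and separate worker servers. The handler name and message fields are hypothetical; the point is that the request returns as soon as the client-facing work is done.

```python
import queue
import threading

jobs = queue.Queue()  # in-process stand-in for a real message queue
processed = []


def handle_request(user_id):
    """API handler: do the required work, enqueue the rest, return fast."""
    result = {"status": "ok", "user": user_id}  # what the client needs right now
    jobs.put({"type": "stats", "user": user_id})  # deferred work
    jobs.put({"type": "log", "msg": f"request from {user_id}"})
    return result


def worker():
    """Worker loop: handle messages on its own time until it sees None."""
    while True:
        msg = jobs.get()
        try:
            if msg is None:
                return
            processed.append(msg["type"])  # e.g. write stats, send a notification
        finally:
            jobs.task_done()


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()
response = handle_request("u1")
print(response)
jobs.join()     # demo only: wait for the queue to drain
jobs.put(None)  # tell the worker to stop
worker_thread.join()
```

A real deployment also has to cope with workers crashing mid-message; networked queues typically handle this by only deleting a message once it has been processed.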

Slide 17

Automate everything
● If you have to SSH into your boxes, you're doing it wrong.
● You should be able to launch servers that configure themselves and get added to the resource pool automatically.
● If you find yourself firing up your SSH client, fix your scripts instead and then launch new servers.

Slide 18

Practice scaling
● Don't expect things to just work.
● Set up a staging environment.
● Launch servers often.
● Terminate servers often.
● You should be VERY comfortable with killing and launching servers.

Slide 19

Test everything
Scale testing:
● Run load tests to figure out what you can handle.
● Add more resources.
● Repeat...
Production testing:
● Test your APIs all the time; CI is not enough.
  ○ Anything in your system can fail, and you should find it before your users do.
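Dedicated load-testing tools run against a staging copy of your API are the usual choice for serious tests. This Python sketch only shows the loop the slide describes: fire concurrent requests, record throughput and latency percentiles, then add resources and run it again. It targets a throwaway local HTTP server so it is self-contained; the handler, request count, and concurrency are arbitrary.

```python
import statistics
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Bypass any proxy environment variables for this local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class OkHandler(BaseHTTPRequestHandler):
    """Throwaway endpoint standing in for the API under test."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


class DemoServer(ThreadingHTTPServer):
    request_queue_size = 128  # larger accept backlog for concurrent clients


def timed_get(url: str):
    start = time.perf_counter()
    with opener.open(url, timeout=5) as resp:
        resp.read()
        ok = resp.status == 200
    return ok, time.perf_counter() - start


def load_test(url: str, total: int = 200, concurrency: int = 10) -> dict:
    """Fire `total` GETs from `concurrency` parallel clients; report the results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_get(url), range(total)))
    elapsed = time.perf_counter() - start
    latencies = sorted(t for _, t in results)
    return {
        "ok": sum(ok for ok, _ in results),
        "rps": round(total / elapsed, 1),
        "p50_ms": round(statistics.median(latencies) * 1000, 2),
        "p99_ms": round(latencies[int(0.99 * (len(latencies) - 1))] * 1000, 2),
    }


server = DemoServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
report = load_test(f"http://127.0.0.1:{server.server_address[1]}/", total=100, concurrency=8)
print(report)
server.shutdown()
server.server_close()
```

Compare the reports as you add servers: if requests per second don't rise with resources, something shared (often the data store) is the bottleneck.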

Slide 20

Monitor everything
● Set up Pingdom to check your API.
  ○ Make sure it hits an endpoint that touches your database.
● Install monitoring daemons on your servers and collect that data somewhere.
  ○ Librato or Datadog are good.
● Collect key metrics.
  ○ Throw them at StatHat or Librato.
● SET UP ALERTS!!!
  ○ The graphs are nice, but you need to know immediately when something goes wrong.
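The server side of that Pingdom check can be small: an endpoint that does a real round trip to the database and returns 503 if it fails, so the external checker notices a database outage and not just a live web process. Pairing it with an alert rule that fires after a few consecutive failures avoids paging on a single blip. A minimal Python sketch with SQLite standing in for your database; `health_check`, `Alerter`, and the threshold of 3 are hypothetical.

```python
import sqlite3
import time
from typing import Callable, Tuple


def health_check(conn: sqlite3.Connection) -> Tuple[int, dict]:
    """Return (http_status, body) for a /health endpoint that touches the DB."""
    start = time.perf_counter()
    try:
        conn.execute("SELECT 1").fetchone()  # cheap query, but a real DB round trip
    except sqlite3.Error as exc:
        return 503, {"status": "error", "detail": str(exc)}
    elapsed_ms = (time.perf_counter() - start) * 1000
    return 200, {"status": "ok", "db_ms": round(elapsed_ms, 2)}


class Alerter:
    """Fires once after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3, notify: Callable[[str], None] = print):
        self.threshold = threshold
        self.notify = notify  # e.g. send an email, SMS, or page
        self.failures = 0

    def record(self, status: int) -> None:
        if status == 200:
            self.failures = 0
            return
        self.failures += 1
        if self.failures == self.threshold:
            self.notify(f"health check failed {self.failures} times in a row")


conn = sqlite3.connect(":memory:")
alerter = Alerter(threshold=3)
print(health_check(conn)[0])  # 200
conn.close()  # simulate the database going away
for _ in range(3):
    status, body = health_check(conn)
    alerter.record(status)  # the third consecutive failure triggers the alert
```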

Slide 21

Thank you
Feel free to contact me: [email protected]