on Rails codebase. At that time, approximately 200 engineers were pushing code.
• Needed a way to isolate failures and to isolate feature development
Previously at Twitter
needs
2. Must scale to tens of thousands of nodes running hundreds of jobs with millions of tasks
3. Must be fault-tolerant and highly available
http://static.usenix.org/events/nsdi11/tech/full_papers/Hindman_new.pdf
and scheduling of jobs
• Rich DSL for defining services
• Health checking and SLA monitoring
• Battle-tested in production at Twitter for multiple years
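Aurora job definitions are written in a Python-based DSL in `.aurora` files. The fragment below is a minimal sketch of a replicated service, modeled on the Aurora configuration tutorial. The file name, cluster, role, job name and the `./my_server` binary are hypothetical placeholders. The file is evaluated by the Aurora client, which provides `Process`, `Task`, `Resources`, `Service`, `UpdateConfig` and `MB`, so it is a config fragment rather than standalone Python.

```python
# hello_service.aurora (hypothetical file name)
# Evaluated by the Aurora client, not by a plain Python interpreter.

# A Process is a single command run inside the task sandbox.
server = Process(
  name = 'server',
  # './my_server' is a hypothetical binary; {{thermos.ports[http]}} asks
  # Aurora to allocate a port named 'http' and substitute its number.
  cmdline = './my_server --port={{thermos.ports[http]}}')

# A Task bundles processes with the resources each instance reserves.
server_task = Task(
  name = 'server',
  processes = [server],
  resources = Resources(cpu = 1.0, ram = 256*MB, disk = 512*MB))

# A Service is a long-running Job; 'instances' replicas run the same task.
jobs = [
  Service(
    cluster = 'devcluster',   # hypothetical cluster name
    role = 'www-data',        # hypothetical role
    environment = 'devel',
    name = 'hello_service',
    task = server_task,
    instances = 3,
    # Rolling updates: replace one instance at a time.
    update_config = UpdateConfig(batch_size = 1))
]
```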
Key Aurora Features
• Job abstraction to bundle tasks
• Ability to run multiple replicas of an application and manage them through a single point
• Rolling deploys
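To make the job/replica abstraction and rolling deploys concrete, here is a small in-memory simulation. It is not Aurora's updater; `Job`, `rolling_deploy` and the `is_healthy` callback are hypothetical names. The sketch assumes a job is just a list of per-instance versions, that instances are updated `batch_size` at a time, and that a failed health check in any batch rolls back every instance touched so far.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical job: one name managing N identical task replicas."""
    name: str
    instances: int
    # Version running on each instance, indexed by instance id.
    versions: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.versions:
            self.versions = ["v1"] * self.instances


def rolling_deploy(
    job: Job,
    new_version: str,
    batch_size: int,
    is_healthy: Callable[[int, str], bool],
) -> bool:
    """Move instances to new_version batch_size at a time.

    is_healthy(instance_id, version) stands in for a real health check.
    If any instance in a batch is unhealthy, every instance touched so far
    is rolled back to its previous version and False is returned.
    """
    previous = list(job.versions)
    touched: List[int] = []
    for start in range(0, job.instances, batch_size):
        batch = range(start, min(start + batch_size, job.instances))
        for i in batch:
            job.versions[i] = new_version
            touched.append(i)
        if not all(is_healthy(i, new_version) for i in batch):
            for i in touched:
                job.versions[i] = previous[i]
            return False
    return True


if __name__ == "__main__":
    job = Job(name="hello", instances=5)
    ok = rolling_deploy(job, "v2", batch_size=2, is_healthy=lambda i, v: True)
    print(ok, job.versions)  # True ['v2', 'v2', 'v2', 'v2', 'v2']

    flaky = Job(name="flaky", instances=5)
    # Instance 3 fails its check, so the second batch triggers a rollback.
    ok = rolling_deploy(flaky, "v2", batch_size=2, is_healthy=lambda i, v: i != 3)
    print(ok, flaky.versions)  # False ['v1', 'v1', 'v1', 'v1', 'v1']
```

Updating in small batches limits how many replicas are on an untested version at once, so a bad release is caught by health checks before it reaches the whole job.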