Twitter is powered by thousands of Applications that run on our internal Cloud platform, a suite of multi-tenant platform services that offer Compute, Storage, Messaging, Monitoring, etc as a service. These platforms have thousands of tenants and run atop hundreds of thousands of servers, across multiple zones. This scale makes it difficult to evaluate resource utilization, cost & efficiency across platforms in a canonical way.
We share how we built a platform agnostic metering & chargeback infrastructure for Twitter's complex platform topology. We use our Compute platform (powered by Apache Aurora/Mesos) as a case-study to show how both the platform owner and users of the platform used it to not only measure resource utilization & cost (across private/public Cloud in a canonical way) but also improve overall resource utilization & drive the cost-per-core down leading to huge savings.