Twitter is powered by thousands of microservices running on an internal cloud platform, a suite of multitenant platform services that offer compute, storage, messaging, monitoring, and more as a service. These platforms have thousands of tenants and run atop hundreds of thousands of servers, both on-premises and in the public cloud. The scale and diversity of Twitter’s multitenant infrastructure services make it extremely difficult to effectively forecast capacity, compute resource utilization, and cost, and to drive efficiency.
Vinu Charanya explains how she and her team are building a system that captures, defines, provisions, meters, and charges infrastructure resources, redefining how systems are built atop Twitter infrastructure. The infrastructure resources include primitive bare metal servers and VMs in the public cloud and abstract resources offered by multitenant services such as a compute platform (powered by Apache Aurora and Mesos), storage (Manhattan for key-value, cache, RDBMS), and observability. Along the way, Vinu shares how Twitter used this data to better plan capacity and drive a cultural change in engineering that helped improve overall resource utilization and led to significant savings in infrastructure spending.