
Managing Services on Multi-tenant Infrastructure for Twitter Scale

Twitter is powered by thousands of microservices that consume resources from an array of multi-tenant infrastructure systems. At our scale, we focus on two things:
1. Developer productivity to ensure engineers are able to ship and scale services quickly and easily
2. Efficient utilization of our multi-tenant infrastructure systems

In this talk, we will share what we have learned from managing microservices and the complexities of multi-tenant infrastructure management at Twitter scale.

Micheal Benedict (@micheal)

February 18, 2016

Transcript

  1. MONOLITH SERVICES

    FENCING & OWNERSHIP: Clear isolation of services and their ownership.
    RELIABILITY: Failure isolation and graceful degradation.
    SCALABILITY & EFFICIENCY: Scale independently, ensuring efficient use of infrastructure.
    DEVELOPER PRODUCTIVITY: Make it simple for engineers to build and launch services quickly and easily.
  2. TWITTER COMPUTE SCALE

    O(10^5): instances/containers
    O(10^4): hosts
    O(10^3): jobs/services
    O(10^2): users/teams
    O(10^1): SWEs
    Credit: Bill Farner, Ian Downes
  3. TWITTER COMPUTE SCALE

    O(10^5): instances/containers
    O(10^4): hosts
    O(10^3): jobs/services
    O(10^2): users/teams
    O(10^1): SWEs
    O(10^0): SREs
    Credit: Bill Farner, Ian Downes
  4. LIFECYCLE OF A SERVICE

    - Setup security accounts
    - Manage infrastructure resources
    - Build, Test, Package and Deploy
    - Monitor health
    - Track infrastructure usage & cost
    - Setup service metadata & configuration
    - Deprecation and Teardown
  5. PROBLEM DOMAINS for SERVICE LIFECYCLE MANAGEMENT

    PROBLEM: SOLUTION AT TWITTER
    Service Identity Manager: ?
    Service Directory (and Metadata): ?
    Containerization: Mesos Containerizer
    Build & Package: Jenkins & Packer
    Delivery & Deploy: Packer & Aurora CLI/Workflows
    Service Discovery: Wily (Zookeeper)
    Monitoring: Observability Stack
    Infrastructure Services: Compute: Aurora/Mesos & Hadoop; Storage: Manhattan, Blobstore, MySQL, Vertica, HDFS
    Infrastructure Resource Manager: ?
    Console/CLI: ?
  7. [Architecture diagram]

    Components shown: DASHBOARD (SINGLE PANE OF GLASS), REPORTING, WORKFLOWS, SERVICE IDENTITY MANAGER (IDENTITY, METADATA), INFRASTRUCTURE RESOURCE MANAGER (CATALOG & PROVISIONING, METERING, CHARGEBACK), INFRASTRUCTURE SERVICE ADAPTERS, INFRASTRUCTURE SERVICE (six blocks).
  8. PROBLEM DOMAIN DEEP DIVE #1 - SERVICE IDENTITY MANAGER

    Infrastructure services employ native mechanisms to provision and manage identifiers:
    COMPUTE: role: foo, job_name: foo_bar, owner: Team XYZ
    STORAGE: app_id: foo_bar, owner: XYZ
    PACKAGE MANAGER: role: foo, build: d7dbf11, version: 291, owner: Team XYZ
    name: service_foo, owner: XYZ, source code: /path

    Need for federation of a service identity across infrastructure services for use-cases such as:
    - Service Ownership
    - Service Discovery
    - Service to Service Auth and ACL
    - Resource management
    - Provisioning & Configuration
    - Usage Metering & Chargeback
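
The federation idea on this slide can be illustrated with a minimal Python sketch. It assumes a hypothetical registry that stores one canonical identity per service and maps each system's native identifier back to it. The class names, method names, and schema are illustrative assumptions, not Twitter's actual Service Identity Manager; only the example identifier values come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServiceIdentity:
    """One canonical identity for a service (hypothetical schema)."""
    name: str
    owner: str
    # Maps an infrastructure system name to the identifier it uses natively.
    native_ids: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, system: str, **identifier: str) -> None:
        """Record the native identifier a given system uses for this service."""
        self.native_ids[system] = dict(identifier)


class IdentityRegistry:
    """In-memory stand-in for a federated identity store (assumption)."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceIdentity] = {}

    def add(self, service: ServiceIdentity) -> None:
        self._services[service.name] = service

    def resolve(self, system: str, **identifier: str) -> Optional[ServiceIdentity]:
        """Find the canonical service behind a system-native identifier."""
        for service in self._services.values():
            if service.native_ids.get(system) == identifier:
                return service
        return None


registry = IdentityRegistry()
svc = ServiceIdentity(name="service_foo", owner="XYZ")
svc.register("compute", role="foo", job_name="foo_bar")
svc.register("storage", app_id="foo_bar")
registry.add(svc)

# A storage-side identifier resolves to the same service and owner.
match = registry.resolve("storage", app_id="foo_bar")
print(match.name, match.owner)
```

With a mapping like this, ownership lookups, ACL checks, and usage metering can all key on the canonical name rather than on each system's own identifier.
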
  9. IMPACT AT TWITTER - RESOURCE USAGE AND CHARGEBACK

    We could now answer the following questions, driving targeted optimization and efficiency programs saving $$$.
    Q. What is the overall use of resources across Twitter’s services?
    Q. How does the overall cost of running Twitter’s services map to the organization?
    Q. How well utilized is our infrastructure?
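
As a rough illustration of how metering data could answer these questions, the sketch below rolls up per-service usage into cost per service and per owning team, and computes a simple utilization ratio. The unit prices, record layout, resource names, and function names are all hypothetical; the talk does not describe the actual chargeback model.

```python
from collections import defaultdict

# Hypothetical unit prices per resource-day; real rates are not given in the talk.
UNIT_COST = {"core_days": 2.0, "gb_ram_days": 0.5, "tb_storage_days": 1.0}

# Hypothetical metering records: (service, owning team, resource, quantity).
usage = [
    ("service_foo", "XYZ", "core_days", 100.0),
    ("service_foo", "XYZ", "gb_ram_days", 400.0),
    ("service_bar", "ABC", "core_days", 50.0),
    ("service_bar", "ABC", "tb_storage_days", 20.0),
]


def chargeback(records, unit_cost):
    """Sum cost per service and per owning team from metering records."""
    per_service = defaultdict(float)
    per_team = defaultdict(float)
    for service, team, resource, quantity in records:
        cost = quantity * unit_cost[resource]
        per_service[service] += cost
        per_team[team] += cost
    return dict(per_service), dict(per_team)


def utilization(used, allocated):
    """Fraction of allocated capacity actually used (0.0 if nothing is allocated)."""
    return used / allocated if allocated else 0.0


per_service, per_team = chargeback(usage, UNIT_COST)
print(per_service)
print(per_team)
print(utilization(used=30.0, allocated=120.0))
```

Attributing cost to an owning team depends on every usage record carrying a service identity, which is why the identity and metadata pieces come first.
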
  10. KEY TAKEAWAYS FOR DEVELOPERS

    DASHBOARD: A single “pane of glass” that facilitates both admin functionalities and interactions with infra services.
    SERVICE METADATA & CONFIGURATION WORKFLOWS: Store service related metadata (for ex, ownership).
    RESOURCE PROVISIONING WORKFLOWS: Request “instant” access/quota to resources such as compute, storage, etc.
    RESOURCE USAGE AND BILLING: Reports that show per service resource usage and its cost.
  11. KEY TAKEAWAYS FOR INFRASTRUCTURE SERVICE OWNERS

    IDENTITY MANAGER: Tenant identifier provisioning and management for authorization and access control.
    RESOURCE PROVISIONING AND CONFIGURATION: Tenant specific resource provisioning & config (for ex, quota on resources like cores, memory).
    RESOURCE USAGE AND CHARGEBACK: Track and measure resource usage. Charge tenant accordingly.
    DASHBOARD: Self-service portal for developers to interact with the infrastructure service. For example, managing resources, viewing reports, etc.
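
Tenant-specific quota on cores and memory, as mentioned above, could look something like the following minimal sketch. It assumes a hypothetical in-memory quota manager that admits a request only if the tenant stays within its limits; the names `Quota` and `QuotaManager` and the admission rule are illustrative assumptions, not an API from Aurora, Mesos, or any Twitter system.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Quota:
    """Resource amounts for one tenant (hypothetical units)."""
    cores: float
    memory_gb: float


class QuotaManager:
    """Tracks per-tenant limits and current usage in memory (assumption)."""

    def __init__(self) -> None:
        self._limits: Dict[str, Quota] = {}
        self._used: Dict[str, Quota] = {}

    def set_quota(self, tenant: str, quota: Quota) -> None:
        self._limits[tenant] = quota
        self._used.setdefault(tenant, Quota(cores=0.0, memory_gb=0.0))

    def usage(self, tenant: str) -> Quota:
        return self._used[tenant]

    def request(self, tenant: str, cores: float, memory_gb: float) -> bool:
        """Grant the request only if it fits within the tenant's quota."""
        limit = self._limits.get(tenant)
        if limit is None:
            return False  # Unknown tenants have no quota provisioned.
        used = self._used[tenant]
        if used.cores + cores > limit.cores:
            return False
        if used.memory_gb + memory_gb > limit.memory_gb:
            return False
        used.cores += cores
        used.memory_gb += memory_gb
        return True


manager = QuotaManager()
manager.set_quota("service_foo", Quota(cores=10, memory_gb=32))
granted = manager.request("service_foo", cores=8, memory_gb=16)
denied = manager.request("service_foo", cores=4, memory_gb=8)  # would exceed 10 cores
print(granted, denied, manager.usage("service_foo"))
```

The usage tracked here is the same data a metering and chargeback pipeline would consume, so provisioning and billing can share one record of what each tenant holds.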