The Future Data Center

Strategy and Technology Workshop held at Huawei R&D, Santa Clara

Vinod Kone

April 05, 2016

Transcript

  1. INTRO: $ whoami

    • Vinod Kone
    • Tech Lead @ Mesosphere
    • Former Tech Lead @ Twitter
    • PhD @ UC Santa Barbara

    © 2015 Mesosphere, Inc. All Rights Reserved.
  2. A REINFORCING TREND

    • Microservices
    • Containerization
    • Container/cluster/resource management
  3. MICROSERVICES

    Traditional application architecture:
    • Many functions in a single process
    • Hard to scale, wasting resources
    • Siloed teams

    Today's microservices application architecture:
    • Each element of functionality defined as a “microservice”, communicating over REST APIs
    • Scalable, efficient and fully dynamic
    • Cross-functional teams organized around capabilities
  4. VIRTUAL MACHINES VS CONTAINERS

    [Diagram: layer stacks (user code, libraries, virtual processor, operating system, physical processor) for virtual machines and for containers, marking which layers are a private copy per instance and which are shared; each VM carries its own operating system, while containers share the host's.]

                          Virtual Machines    Containers
    Start time            30-45 seconds       < 50 ms
    Stop time             5-10 seconds        < 50 ms
    Workload density      1x                  10-100x
  5. THE DATACENTER IS THE NEW SERVER

    Era: unit of interaction; definitive apps; OS
    • Mainframe: partition (LPAR); data / transaction processing; UNIX, IBM OS/360
    • Physical (x86): server; ERP, CRM, productivity, mail & web server; Linux, Windows
    • Virtual: virtual machine; ERP, CRM, productivity, mail & web server; Linux, Windows + hypervisor
    • Unified hyperscale: full datacenter; big data, Internet of Things, mobile apps; datacenter operating system

    The datacenter is the new form factor for developing apps and running IT.
  6. TRADITIONAL DATACENTER

    [Diagram: separate silos for Docker, App 1, a Hadoop cluster, a Spark cluster, Cloud Foundry, App 2, a VMware cluster and OpenStack]
    • Many “snowflakes”
    • Management nightmare
    • Lengthy cycles to deploy code
    • Low utilization
  7. MODERN DATACENTER

    • High-performance and efficient resource isolation
    • Easy scalability and multi-tenancy
    • Fault-tolerant and highly available
    • Highly efficient, with the highest utilization
    • Complete workload portability

    [Diagram: Mesos running Docker, big data analytics (Hadoop, Spark, etc.), Cloud Foundry and stateful services (all); deploys on-premise, in the cloud, or both]
  8. IMPROVED UTILIZATION

    • Typical datacenter: siloed, over-provisioned servers and low utilization; the industry average is 12-15% utilization
    • Mesos datacenter: automated schedulers multiplex workloads onto the same machines, reaching 30-40% utilization and up to 96% at some customers
    • Headline figure on the slide: 4X
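The utilization gap on this slide comes from multiplexing. Here is a minimal sketch of the effect. It assumes each workload needs a fixed fraction of one machine's CPU and that the siloed model dedicates one machine per workload. The demand numbers are hypothetical. First-fit-decreasing packing stands in for "automated schedulers"; it is not what Mesos or any particular framework actually runs.

```python
from typing import List


def siloed_machines(demands: List[float]) -> int:
    """Siloed model: every workload gets a dedicated machine of its own."""
    return len(demands)


def multiplexed_machines(demands: List[float], capacity: float = 1.0) -> int:
    """Multiplexed model: first-fit-decreasing packing onto shared machines."""
    free: List[float] = []  # remaining capacity on each machine in use
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError("a single workload exceeds one machine")
        for i, remaining in enumerate(free):
            if demand <= remaining + 1e-9:  # small tolerance for float rounding
                free[i] = remaining - demand
                break
        else:
            free.append(capacity - demand)  # no machine had room: add one
    return len(free)


def utilization(demands: List[float], machines: int, capacity: float = 1.0) -> float:
    """Fraction of the provisioned capacity that the workloads actually use."""
    return sum(demands) / (machines * capacity)


if __name__ == "__main__":
    # Hypothetical demands, as fractions of one machine's CPU.
    demands = [0.15, 0.10, 0.12, 0.20, 0.08, 0.15, 0.10, 0.10, 0.14, 0.12]
    for label, n in (("siloed", siloed_machines(demands)),
                     ("multiplexed", multiplexed_machines(demands))):
        print(f"{label}: {n} machines, {utilization(demands, n):.0%} utilization")
```

The ten small workloads use about 12.6% of ten dedicated machines but fit on two shared ones. This toy ignores memory, headroom and interference, which is why real multiplexed clusters land well below full utilization.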
  9. A BRIEF HISTORY OF MESOS

    Production-proven web-scale cluster managers:
    • Borg/Omega (~2001): proprietary
    • Tupperware/Bistro (~2007): proprietary
    • Apache Mesos (2010+): open source (Apache License)
      ◦ Built at UC Berkeley AMPLab by Ben Hindman et al.
      ◦ Built in collaboration with Google to overcome some Borg challenges
      ◦ Top-level project at the Apache Software Foundation
  10. MESOS IS THE DATACENTER KERNEL

    Designed to be flexible:
    • Aggregates all resources in the datacenter for modern apps
    • Intentionally simple, to enable massive scalability
    • Handles different types of tasks: long-running, batch and real-time
    • Two-level scheduler architecture lets multiple scheduling policies coexist (a key challenge at Google)
    • Extensible to work with new technologies

    [Chart: Mesos daily downloads, July 2014 - November 2015, captioned “Gaining massive adoption”]
  11. MESOS ARCHITECTURE

    • Masters
    • Agents
    • Frameworks

    [Diagram: a Mesos master quorum of three masters (one LEADER, two STANDBY) coordinated through ZooKeeper (ZK); the leading master sends resource offers to the schedulers of Framework A and Framework B; Agents 1 to N run each framework's executor, which runs its tasks]
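The master quorum in this diagram follows the usual majority rule. Mesos's `--quorum` flag for the replicated-log registry must be set to more than half the number of masters. The small worked example below assumes only that rule. It computes the quorum size and how many masters can fail while a majority survives; the helper names are illustrative, not Mesos APIs.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest majority of num_masters: strictly more than half."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Masters that can be lost while a majority is still reachable."""
    return num_masters - quorum_size(num_masters)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} masters: quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Four masters survive only one failure, the same as three. That is why master counts are normally odd.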
  12. MESOS MASTER

    • Allocates resources to frameworks (first-level scheduling)
    • Manages the task life cycle
    • Signals failures

    [Diagram: the Mesos master's allocator handing resources to multiple framework schedulers]
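For first-level scheduling, Mesos's built-in allocator is based on Dominant Resource Fairness (DRF). Roughly, it favours the framework whose largest share of any single resource (its dominant share) is smallest. Below is a minimal progressive-filling sketch of DRF. It assumes two resources (CPUs, memory in GB), fixed per-task demands for two hypothetical frameworks, and an invented cluster size. The real allocator also deals with roles, weights, quotas and reservations.

```python
from typing import Dict, Tuple

Resources = Tuple[float, float]  # (cpus, mem_gb)


def dominant_share(used: Resources, total: Resources) -> float:
    """Largest fraction of any single cluster resource held by a framework."""
    return max(u / t for u, t in zip(used, total))


def drf_allocate(total: Resources, demands: Dict[str, Resources]) -> Dict[str, int]:
    """Progressive filling: launch one task at a time for the framework with the
    lowest dominant share whose next task still fits; stop when nothing fits."""
    for name, need in demands.items():
        if not any(d > 0 for d in need):
            raise ValueError(f"{name} must demand some resource per task")
    used: Dict[str, Resources] = {name: (0.0, 0.0) for name in demands}
    tasks = {name: 0 for name in demands}
    free = list(total)
    while True:
        # Lowest dominant share first; ties broken by framework name.
        order = sorted(demands, key=lambda n: (dominant_share(used[n], total), n))
        for name in order:
            need = demands[name]
            if all(d <= f for d, f in zip(need, free)):
                free = [f - d for f, d in zip(free, need)]
                used[name] = (used[name][0] + need[0], used[name][1] + need[1])
                tasks[name] += 1
                break
        else:
            return tasks  # no framework's next task fits in what is left


if __name__ == "__main__":
    # Hypothetical cluster of 9 CPUs and 18 GB; A's tasks are memory-heavy,
    # B's are CPU-heavy.
    print(drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)}))
```

With these numbers A ends up with three tasks (3 CPUs, 12 GB) and B with two (6 CPUs, 2 GB). Both frameworks then hold a dominant share of 2/3: memory for A, CPU for B.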
  13. MESOS AGENT

    • The Mesos agent is a process running on each node in the cluster
    • Mesos agents have two primary functions:
      ◦ Manage and offer the local resources of the agent node
      ◦ Launch and manage executors, using containers, to run tasks

    [Diagram: an agent hosting two executors, one running three tasks and the other two]
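To offer its local resources, an agent first needs an inventory of them. The agent's `--resources` flag takes semicolon-separated `name:value` pairs such as `cpus:4;mem:8192`. Below is a minimal parser for just the scalar case, offered as an illustration. The function name is hypothetical. It deliberately rejects ranges like `ports:[31000-32000]`, sets and the JSON form, all of which the real flag also accepts.

```python
from typing import Dict


def parse_scalar_resources(spec: str) -> Dict[str, float]:
    """Parse a string like 'cpus:4;mem:8192' into {'cpus': 4.0, 'mem': 8192.0}.

    Only scalar entries are handled; ranges such as 'ports:[31000-32000]',
    sets and the JSON form are rejected with ValueError in this sketch."""
    resources: Dict[str, float] = {}
    for entry in (part.strip() for part in spec.split(";")):
        if not entry:
            continue  # tolerate empty segments, e.g. a trailing ';'
        name, sep, value = entry.partition(":")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"malformed resource entry: {entry!r}")
        try:
            amount = float(value)
        except ValueError:
            raise ValueError(f"non-scalar resource not supported: {entry!r}") from None
        if name in resources:
            raise ValueError(f"duplicate resource: {name!r}")
        resources[name] = amount
    return resources


if __name__ == "__main__":
    print(parse_scalar_resources("cpus:4;mem:8192;disk:40960"))
```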
  14. FRAMEWORKS

    • Frameworks are distributed applications that run on Mesos. Each comprises:
      ◦ A scheduler
      ◦ An executor
  15. FRAMEWORK SCHEDULER

    • A framework scheduler is the component that decides which Mesos resource offers to accept or decline in order to complete that framework's work
    • The scheduler makes these decisions by:
      ◦ Examining the offer's
        ▪ Resources
        ▪ Attributes
      ◦ Matching its own resource needs and placement constraints to the offer

    [Diagram: the leading master sending offers 1 to N to the schedulers of Framework A (Marathon) and Framework B (Cassandra)]
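The accept-or-decline decision described above can be sketched as a pure function over offers. This is a minimal sketch with several simplifications. The `Offer` and `TaskRequirement` types are hypothetical, not the Mesos protobufs or any scheduler library's API. Resources are scalar only. Placement constraints are required attribute values such as a `rack` name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Offer:  # hypothetical stand-in for a Mesos resource offer
    offer_id: str
    agent_id: str
    resources: Dict[str, float]  # e.g. {"cpus": 4, "mem": 8192}
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. {"rack": "r1"}


@dataclass
class TaskRequirement:  # hypothetical description of one task's needs
    resources: Dict[str, float]
    constraints: Dict[str, str] = field(default_factory=dict)  # attribute == value


def matches(offer: Offer, need: TaskRequirement) -> bool:
    """True if the offer has enough of every resource and satisfies every constraint."""
    enough = all(offer.resources.get(name, 0.0) >= amount
                 for name, amount in need.resources.items())
    placed = all(offer.attributes.get(key) == value
                 for key, value in need.constraints.items())
    return enough and placed


def decide(offers: List[Offer], need: TaskRequirement) -> Dict[str, List[str]]:
    """Split offer ids into those to accept (one task each) and those to decline."""
    decision: Dict[str, List[str]] = {"accept": [], "decline": []}
    for offer in offers:
        decision["accept" if matches(offer, need) else "decline"].append(offer.offer_id)
    return decision
```

Declined resources go back to the master, which can offer them to other frameworks. That hand-back is what makes the scheduling two-level: the master decides who is offered what, and each framework decides what to use.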
  16. FRAMEWORK EXECUTOR

    • The executor does the work on behalf of the framework on the agent nodes
    • An executor runs within a container
    • An executor can run multiple tasks

    [Diagram: an agent node whose Mesos agent process manages Container 1, holding a Hadoop executor running Task 1 and Task 2, and Container 2, holding the Mesos executor running the command "python -m SimpleHTTPServer"]
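An executor that runs several tasks has to track each one separately and report state changes back. Those status updates are how the master manages the task life cycle. Below is a minimal sketch of that bookkeeping. The `ToyExecutor` class and its callback are hypothetical. `TaskState` uses only a simplified subset of the Mesos task state names. The real executor API and state machine have more states and rules.

```python
from enum import Enum
from typing import Callable, Dict, Set


class TaskState(Enum):
    """Simplified subset of Mesos task states (the real set is larger)."""
    TASK_STAGING = "staging"
    TASK_RUNNING = "running"
    TASK_FINISHED = "finished"
    TASK_FAILED = "failed"
    TASK_KILLED = "killed"


TERMINAL: Set[TaskState] = {TaskState.TASK_FINISHED, TaskState.TASK_FAILED,
                            TaskState.TASK_KILLED}
# Allowed transitions in this sketch; terminal states allow none.
ALLOWED: Dict[TaskState, Set[TaskState]] = {
    TaskState.TASK_STAGING: {TaskState.TASK_RUNNING, TaskState.TASK_FAILED,
                             TaskState.TASK_KILLED},
    TaskState.TASK_RUNNING: TERMINAL,
}


class ToyExecutor:
    """Hypothetical executor that tracks several tasks in one container and
    reports every state change through a callback (standing in for the agent)."""

    def __init__(self, send_update: Callable[[str, TaskState], None]) -> None:
        self.tasks: Dict[str, TaskState] = {}
        self.send_update = send_update

    def launch(self, task_id: str) -> None:
        if task_id in self.tasks:
            raise ValueError(f"duplicate task id: {task_id}")
        # Staging is only recorded locally here; this sketch does not report it.
        self.tasks[task_id] = TaskState.TASK_STAGING

    def transition(self, task_id: str, new_state: TaskState) -> None:
        current = self.tasks[task_id]
        if new_state not in ALLOWED.get(current, set()):
            raise ValueError(f"{task_id}: {current.name} -> {new_state.name} not allowed")
        self.tasks[task_id] = new_state
        self.send_update(task_id, new_state)  # one status update per change


if __name__ == "__main__":
    ex = ToyExecutor(lambda task, state: print(task, state.name))
    ex.launch("task-1")
    ex.launch("task-2")
    ex.transition("task-1", TaskState.TASK_RUNNING)
    ex.transition("task-2", TaskState.TASK_RUNNING)
    ex.transition("task-1", TaskState.TASK_FINISHED)
```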
  17. FEATURES

    • Resource isolation using cgroups and namespaces
    • Battle-tested availability and scalability
    • Oversubscription of resources
    • Multiple container image formats (Docker / appc)
    • Optimistic offers (coming)
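Oversubscription lets an agent re-offer resources that tasks have been allocated but are not actually using. Mesos marks such resources as revocable, so they can be reclaimed when the original task needs them. In Mesos the estimate comes from a pluggable resource estimator module, and a QoS controller decides when to take revocable resources back. The sketch below assumes a hypothetical policy of its own: slack equals allocated minus observed usage minus a safety margin, floored at zero.

```python
from typing import Dict


def revocable_estimate(allocated: Dict[str, float],
                       used: Dict[str, float],
                       safety_margin: float = 0.1) -> Dict[str, float]:
    """Hypothetical estimate of slack that could be offered as revocable.

    slack = allocated - used - safety_margin * allocated, floored at zero."""
    if not 0.0 <= safety_margin < 1.0:
        raise ValueError("safety_margin must be in [0, 1)")
    estimate: Dict[str, float] = {}
    for name, alloc in allocated.items():
        slack = alloc - used.get(name, 0.0) - safety_margin * alloc
        estimate[name] = max(0.0, slack)  # never advertise negative slack
    return estimate


if __name__ == "__main__":
    # Hypothetical agent: 8 CPUs allocated but only 2 busy; memory nearly full.
    print(revocable_estimate({"cpus": 8.0, "mem": 16384.0},
                             {"cpus": 2.0, "mem": 15000.0}))
```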
  18. MESOS @ TWITTER

    • Running in production for ~5 years
    • Largest known Mesos production clusters: O(10^5) containers, O(10^4) hosts
    • Most stateless services run on Mesos / Aurora
    • CAPEX and OPEX savings in the millions
  19. PRODUCTION CUSTOMERS AND MESOS USERS

    Proven reliable for large-scale, mission-critical deployments.
    [Logo wall grouped by sector, including Internet companies and government agencies]
  20. BEYOND CONTAINER ORCHESTRATION

    Microservices interactions example: Hailo Taxi Platform [diagram]

    Design & Deploy:
    • Developer access to production-like environments
    • Service discovery across a large number of services
    • Complex deployment and rollback of services
    • Ensuring API contracts are not broken between versions of various services
    Monitoring & Operations:
    • Monitoring, tracing and root cause analysis to ensure end-to-end performance across a large number of services
    • Utilization of multiple, independent distributed systems
    Service Quality & Continuity:
    • Fault tolerance and healing (in an always-on environment)
    Security:
    • Secrets (key) management across a large number of services
    • Incident detection and remediation
  21. BEYOND MESOS

    [Timeline, 2009-2015: Apache Mesos built at UC Berkeley; Mesos becomes a top-level project at the ASF; Mesosphere founded; Mesosphere DCOS released. Year markers on the slide: 2009, 2010, 2013, 2015]
    Key engineering leaders from Twitter and Airbnb: companies behind open-source tech
  22. OVERVIEW

    The Mesosphere Datacenter Operating System (DCOS) is a new kind of operating system that spans all of the machines in your datacenter or cloud. It provides a highly elastic and highly scalable way of deploying applications, services and big data infrastructure on shared resources.

    [Diagram: Mesosphere DCOS layered on existing infrastructure, running microservices & containers alongside database, analytics & other services]
  23. MODERN OPERATING SYSTEMS, ARCHITECTURAL COMPONENTS

    • Mac OS
      ◦ Applications: desktop apps (e.g., Safari, Adobe Photoshop, iTunes)
      ◦ User interface: GUI (Aqua)/CLI
      ◦ Form factor: PC / laptop
      ◦ OS services (highlights only): App Store, OpenGL, advanced UI gestures
      ◦ Kernel: BSD Unix
    • Android
      ◦ Applications: mobile apps (e.g., Spotify, Evernote, WhatsApp)
      ◦ User interface: Android GUI
      ◦ Form factor: mobile / tablet
      ◦ OS services (highlights only): telephony manager, battery management, external storage support
      ◦ Kernel: Linux
    • Datacenter OS (DCOS)
      ◦ Applications: services (e.g., Docker, Spark, Hadoop, Cassandra)
      ◦ User interface: GUI/CLI (DCOS CLI)
      ◦ Form factor: full datacenter / cloud
      ◦ OS services (highlights only): container orchestration, distributed batch jobs, persistent storage management
      ◦ Kernel: Apache Mesos
  24. DEVELOPER AGILITY

    Traditional approach to building modern apps:
    Concept → dev capacity request → (wait time) → HW available → framework setup → config → delivery setup → start work → Dev → QA → Staging → Prod → new code successfully running
    • 40-50% of active developer time goes to activity not related to code improvement

    Approach with Mesosphere DCOS:
    Concept → dev capacity request → capacity available → framework setup → start work → Dev → Sys Test → Phased Rollout → Prod → new code successfully running
    • DCOS enables CI/CD without being prescriptive about code management or lifecycle automation tools
  25. DATA AGILITY

    Sensors, devices, clients → EVENTS → FEEDS (Kafka) → ANALYTICS (Spark) → STORAGE (Cassandra) → REACTIVE APP (Akka)
    • Events: ubiquitous data streams from connected devices
    • Kafka: ingest millions of events per second
    • Spark: real-time and batch processing of data
    • Cassandra: distributed and highly scalable database
    • Akka: scalable, resilient, data-driven applications
  26. HYBRID INFRASTRUCTURE

    Hybrid cloud scenarios spanning a private datacenter and the cloud:
    • Move to cloud (identical user experience): the same user experience as customers continue to move workloads from private datacenters to the cloud
    • Burst to cloud: autoscaling for burst scenarios, dynamically scaling cloud server capacity
    • Data-aware scheduling: schedule workloads to a private datacenter or the cloud based on data gravity and the type of application (e.g., financial records vs. sensor data)
    • Failover and fault tolerance: automatically move workloads to the cloud if the private datacenter fails
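The data-aware scheduling, burst and failover scenarios on this slide reduce to a placement rule. Below is a minimal sketch of one such rule, for illustration only; it is not how DCOS implements any of these. It assumes hypothetical workload fields (a sensitive-data flag and a CPU demand) and a fixed private capacity. Sensitive data stays in the private datacenter. Other work prefers private capacity and bursts to the cloud when that runs out. If the private datacenter is down, non-sensitive work fails over to the cloud.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Workload:  # hypothetical workload description
    name: str
    cpus: float
    sensitive_data: bool  # e.g. financial records that must stay on-premise


def place(workloads: List[Workload], private_cpus: float,
          private_up: bool = True) -> Dict[str, str]:
    """Return workload name -> 'private', 'cloud' or 'pending'."""
    placement: Dict[str, str] = {}
    free = private_cpus if private_up else 0.0
    # Sensitive workloads go first so they get first claim on private capacity.
    for w in sorted(workloads, key=lambda w: not w.sensitive_data):
        if w.cpus <= free:
            placement[w.name] = "private"
            free -= w.cpus
        elif w.sensitive_data:
            placement[w.name] = "pending"  # data gravity: never moved to the cloud
        else:
            placement[w.name] = "cloud"  # burst, or failover when private is down
    return placement


if __name__ == "__main__":
    ws = [Workload("ledger", 4, True), Workload("sensor-ingest", 6, False),
          Workload("web", 4, False)]
    print(place(ws, private_cpus=10))
    print(place(ws, private_cpus=10, private_up=False))
```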
  27. AUTOMATED OPERATIONS OF DISTRIBUTED SYSTEMS

    • Future: software will manage itself, using Mesos and the DCOS API
    • Most distributed systems are difficult to manage, but they don't need to be

    [Diagram: Kafka (messaging backbone), Spark (data processing engine), Cassandra (distributed database), HDFS (distributed file system)]