Cloud Operations - 1st Principles

vamsi sistla
February 25, 2014

  1. Overview
    This session covers the fundamental and tactical skills and knowledge needed to perform the duties of Cloud Administrator and DevOps roles.
    - By the end of this session, you will have learned the various steps and processes that are part of your role as a Cloud Administrator.
    - This session covers general concepts; it does not address vendor-specific concepts and methodologies.
    - Prerequisites:
      - Your undivided attention
      - An understanding of various cloud technologies
      - A high-level understanding of cloud architecture
  2. Preplanning
    - Come up with a deployment plan.
    - Cloud Operations is a marathon, not a sprint. A Cloud Administrator is like the conductor of an orchestra.
    - Learn the hardware requirements: file space, storage, partitioning, memory.
    - Decide the number of instances to run and how many of them run simultaneously.
    - Decide whether you need load balancers and, if so, how many.
    - Identify the software requirements: operating systems, coding and implementation frameworks, libraries, web servers, and so forth.
    - Map the needs of the legacy applications that will be migrated to the cloud.
    - Based on the product and technology roadmap, make sure your infrastructure can support future applications.
    - Configuration: it is very important to understand your configuration requirements, at the network level as well as at the software and application level.
    - Understanding your network topology and the geographical footprint of your users, and mapping them to your data centers, helps you reduce costs while increasing performance.
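Deciding how many instances to run can be reduced to simple arithmetic once you know your peak load and what one instance can handle. The sketch below is a minimal illustration, assuming load is measured in requests per second and that you plan each instance to run with some spare headroom; `instances_needed` and the 30% default headroom are hypothetical planning choices, not vendor rules.

```python
import math

def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.3) -> int:
    """Instances required to serve peak load while keeping spare headroom.

    headroom=0.3 means each instance is planned to run at no more than 70%
    of its measured capacity (an assumed planning target).
    """
    if rps_per_instance <= 0 or not 0 <= headroom < 1:
        raise ValueError("capacity must be positive and headroom in [0, 1)")
    usable_rps = rps_per_instance * (1 - headroom)
    # Always run at least one instance, even for very light load.
    return max(1, math.ceil(peak_rps / usable_rps))

if __name__ == "__main__":
    print(instances_needed(1200, 100))
```

For example, a hypothetical peak of 1,200 requests per second against instances measured at 100 requests per second each, with 30% headroom, gives ceil(1200 / 70) = 18 instances; any count above one is also where a load balancer starts to be needed.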
  3. Preplanning
    - Understand the SLAs provided by the vendor to learn the guaranteed service uptime and what it means for your business. Assessing risk and business continuity is also a critical part of Cloud Operations.
      - Six 9s (99.9999% availability) allows only about 31 seconds of downtime in an entire year; five 9s (99.999%) allows about 5 minutes.
    - Cloud availability: learn about Mean Time To Failure (MTTF), Mean Time To Diagnose (MTTD), Mean Time To Repair (MTTR) and Mean Time Between Failures (MTBF).
    - Based on the above, assess the risk to the business.
    - Cloud vendors provide their own unique tools, as well as standard tools, for Cloud Administrators to manage their clouds. Learning the similarities and differences is important.
    - Understand and document users, user interactions, their data, user roles, user permissions and so forth.
      - The number of users, and what they need to access in the cloud, how, and when.
      - How will each type of user interact with your cloud infrastructure? (Do you need APIs and command-line clients?)
      - User authentication? LDAP, anyone? An in-memory key-value store? A SQL database? PAM (Pluggable Authentication Modules) in your APIs?
    - Coordinate and collaborate with your business users.
      - Understand their application needs and QoS expectations.
      - Work with them on modifying and configuring their testing tools. Based on their testing needs, make sure your infrastructure is appropriately provisioned.
    - Each vendor provides its own deployment tools and also supports industry-standard ones. Mapping your deployment plan to the available tools is important.
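The downtime figures above come from multiplying the number of seconds in a year by the allowed unavailability, and the MTTF/MTTR metrics combine into a single availability number. The sketch below shows both; it uses the standard steady-state formula A = MTTF / (MTTF + MTTR) (some texts write MTBF in its place when MTBF denotes mean uptime), and treating diagnosis time (MTTD) as part of MTTR is an assumption made here for simplicity.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s in a non-leap year

def downtime_per_year(nines: int) -> float:
    """Maximum downtime per year, in seconds, allowed by an availability of N nines."""
    unavailability = 10.0 ** -nines  # five 9s = 99.999% up = 0.001% (1e-5) down
    return SECONDS_PER_YEAR * unavailability

def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of each failure/repair cycle spent up.

    Assumes MTTR covers the whole outage, including diagnosis time (MTTD).
    """
    return mttf_hours / (mttf_hours + mttr_hours)

if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        secs = downtime_per_year(n)
        print(f"{n} nines: {secs:,.1f} s ({secs / 60:,.1f} min) of downtime per year")
```

Six 9s works out to 31.536 seconds and five 9s to 315.36 seconds (about 5.3 minutes), matching the figures on the slide. Working backwards, a system that fails on average every 999 hours and takes 1 hour to restore is only 99.9% available.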
  4. Preplanning
    - Plan and prepare for incident management and issue tracking.
      - Document how you will manage issues: identify, diagnose, fix and deploy.
      - Set expectations with your end users and business users.
      - Work with your legal team to articulate the relevant details in your EULA (End User License Agreement) and Terms & Conditions.
    - Ongoing management and upgrade cycles
      - Plan infrastructure and software upgrades.
      - Verify that newer versions of your infrastructure and software are compatible with your existing software and application stack.
      - Back up all relevant files and scripts, such as configuration and security settings.
      - Back up data and databases for easy roll back.
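Backing up configuration files before an upgrade is what makes roll back cheap. The sketch below is one minimal way to do it, assuming the files to protect are known up front and have distinct file names; `backup_files` and `restore_files` are hypothetical helpers, and real deployments would usually also back up databases with their own tooling.

```python
import shutil
import time
from pathlib import Path

def backup_files(paths, backup_root):
    """Copy each file into a new timestamped directory under backup_root.

    Returns the directory created, so an upgrade script can record it and
    restore from it on roll back. Assumes the files have distinct names.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / stamp
    dest.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier backup
    for p in map(Path, paths):
        shutil.copy2(p, dest / p.name)  # copy2 also preserves timestamps and permission bits
    return dest

def restore_files(backup_dir, target_dir):
    """Roll back: copy every file in backup_dir back into target_dir."""
    for f in Path(backup_dir).iterdir():
        shutil.copy2(f, Path(target_dir) / f.name)
```

A typical upgrade script would call `backup_files` first, record the returned directory, apply the upgrade, and call `restore_files` if post-upgrade checks fail.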
  5. Automation
    - Automation is a critical part of cloud computing: data recovery, resource pooling (dynamic provisioning), provisioning policies, resource limitation.
    - Benefits of automation
      - Availability: resources are allocated and provisioned even outside business hours, reducing dependence on people.
      - Limiting human errors: a major benefit of automation is fewer human errors.
      - Hidden complexity: automation takes care of resource availability without requiring operators to know the location and type of individual host servers.
      - Standardization: improves consistency and repeatability in your operations.
      - Resource utilization and optimization: the ability to scale up or down automatically makes resource utilization efficient while reducing operating costs.
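Automatic scaling and resource limitation are usually expressed as a policy: thresholds that trigger scaling, plus hard bounds the automation may never cross. The sketch below is a minimal threshold-based policy, assuming average CPU utilization is the scaling signal; the `ScalingPolicy` fields and their default values are hypothetical, not any vendor's auto-scaling API.

```python
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    # Hypothetical thresholds; real values depend on your workload and vendor.
    scale_up_cpu: float = 0.75    # add an instance above 75% average CPU
    scale_down_cpu: float = 0.25  # remove an instance below 25% average CPU
    min_instances: int = 2        # resource limitation: lower bound
    max_instances: int = 20       # resource limitation: upper bound

def desired_instances(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the instance count the automation should converge to next."""
    if avg_cpu > policy.scale_up_cpu:
        target = current + 1
    elif avg_cpu < policy.scale_down_cpu:
        target = current - 1
    else:
        target = current
    # Clamp to the configured bounds so automation can never over- or under-provision.
    return max(policy.min_instances, min(policy.max_instances, target))
```

Stepping by one instance at a time keeps the sketch simple; the gap between the two thresholds keeps the policy from flapping between scaling up and down on small fluctuations.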
  6. Auto Deployment, Configuration and Management
    - One of the primary goals of a Cloud Administrator is to minimize the operational cost of running your cloud.
    - Ubuntu and Red Hat Linux include mechanisms for configuring and deploying the OS; this process is typically called bootstrapping. On AWS and Azure, even Windows servers have such capabilities.
    - If you want to deploy from an image, you can also use SystemImager.
    - Automatic configuration
      - The purpose is to keep human intervention low and so avoid human errors.
      - Configuration management tools such as Puppet and Chef help you test your instances, scale up and down, and roll back if needed.
    - Remote management
      - The ability to access the server infrastructure, OS and application stack remotely is critical. Most vendors provide remote management tools and services (such as RDP).
      - This also means being able to reach your servers and data center during lights-out operation.
      - Example: IPMI (Intelligent Platform Management Interface) is a standard, supported by most hardware vendors, that lets you interact with a server over the network rather than through its OS. This way you can still reach your servers during power failures and other disasters. Remote access to the Power Distribution Unit (PDU) that the server is plugged into is also critical.
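Configuration management tools such as Puppet and Chef describe the desired state of a machine and converge it idempotently: applying the same configuration twice changes nothing the second time, which is what makes automated, repeatable runs safe. The sketch below illustrates that idea in plain Python; it is not Puppet or Chef syntax, and `ensure_line` is a hypothetical helper.

```python
from pathlib import Path

def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in the file at `path` (desired state).

    Returns True if the file had to be changed, False if it was already in
    the desired state. Running it repeatedly is safe (idempotent).
    """
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    if line in lines:
        return False  # already converged: nothing to do
    lines.append(line)
    p.write_text("\n".join(lines) + "\n")
    return True
```

Reporting whether anything changed mirrors how these tools log which resources were updated on each run, so operators can spot configuration drift.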
  7. Cloud Brokers
    - A Cloud Broker (CB), as the name suggests, provides multiple services across multiple clouds and service providers.
    - A Cloud Broker reduces overhead for enterprise datacenters, simplifying them by abstracting away the complexity that comes with datacenters.
    - CBs help you handle:
      - Intermediation
      - Aggregation
      - Arbitrage
    - Some CB examples:
      - AWS Marketplace
      - Dell Boomi
      - Rackspace Cloud Tools Marketplace
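Arbitrage, in its simplest form, means choosing among equivalent offers from several providers based on price. The sketch below is a minimal illustration of that selection step, assuming offers have already been normalized to comparable vCPU, memory and hourly price figures; the `Offer` type, provider names and prices are hypothetical and do not reflect any real broker or price list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Offer:
    provider: str
    vcpus: int
    memory_gb: int
    price_per_hour: float

def cheapest_offer(offers: Iterable[Offer], min_vcpus: int, min_memory_gb: int) -> Optional[Offer]:
    """Arbitrage: pick the lowest-priced offer that meets the requirements.

    Returns None when no offer qualifies.
    """
    eligible = [o for o in offers if o.vcpus >= min_vcpus and o.memory_gb >= min_memory_gb]
    return min(eligible, key=lambda o: o.price_per_hour, default=None)

if __name__ == "__main__":
    offers = [  # hypothetical offers, for illustration only
        Offer("provider-a", 4, 16, 0.20),
        Offer("provider-b", 4, 16, 0.17),
        Offer("provider-c", 2, 8, 0.09),
    ]
    print(cheapest_offer(offers, min_vcpus=4, min_memory_gb=16))
```

A real broker would weigh more than price (SLAs, data location, egress costs), but the filter-then-rank structure stays the same.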
  8. Resources: Distributed Management Task Force (DMTF)
    - DMTF enables effective management of IT systems by creating standards for interoperable IT management.
    - The DMTF Cloud Management Initiative focuses on developing and promoting interoperable cloud infrastructure management standards.
    - CIMI (Cloud Infrastructure Management Interface) is an open standard created by DMTF to enable interoperability across multiple cloud platforms. This is akin to LDAP.
  9. Resources: Open Data Center Alliance
    - Fosters cloud agility by focusing on SLAs and MSAs.
    - A unique consortium of leading global IT organizations that creates standards and protocols for the emerging data center and cloud computing industries.
    - Also acts as an evangelist, helping the entire industry migrate to the cloud.
  10. Resources: Cloud Standards Customer Council (CSCC)
    - Influences standards development based on cloud user requirements.
    - An end-user advocacy group for cloud adoption.
    - Champions the use of standards and interoperability decisions in the interest of end users.
  11. Resources: Cloud Security Alliance (CSA)
    - CSA is a not-for-profit organization led by a broad coalition of industry practitioners, corporations and associations.
    - CSA focuses on audit and security standards for cloud computing.