What I Did On My Cloud Vacation

Summary after the first six months of work on the HUIT Cloud Engineering team

Steve Huff

March 03, 2014

Transcript

  4. • Tools/Outreach
       – Nepho (current release is 1.2.2)
       – Consulting (IAM, SIS, others)
       – Blog, lunch talks, office hours
     • Research IaaS Cloud
       – OpenStack on commodity hardware
       – Implementation by HUIT Cloud Engineering
       – Demo of Hadoop-aaS for James Cuff and Orran Krieger (BU)
     • Web Application PaaS
       – Selected Red Hat OpenShift
       – Straightforward implementation path on OpenStack
       – Better solution for standalone web hosting
     • Enterprise IaaS Cloud
       – RHEL OpenStack on Cisco UCS
       – Implementation by Red Hat
       – Pilot for Dev/Test use by enterprise customers
  8. • Interdisciplinary teams of subject matter experts
       – Real-time collaboration
       – Rapid, incremental peer review
       – Toolchain sharing, “sharpening the saw”
     • Automation is a tremendous force multiplier
       – 10% is better than nothing
       – 90% is not that much better than 10%
       – 100% is MUCH better
     • Build the system you want to work with
       – Deprecate legacy solutions
       – Don't let the edge case dictate policy
       – Fail fast
     • Interruptions are terribly expensive
       – Engineering vs. operations work
       – Maker time vs. manager time
       – Who will you allocate?
  10. • Managing an IaaS platform is like managing a large, homogeneous, integrated application
        – Enterprise support from vendor
        – Much higher degree of consistency/standardization/simplification
        – Automated management is not optional
        – Redundancy and fault tolerance are not optional
      • Managing instances on an IaaS platform is just like managing instances in the traditional datacenter
        – EXCEPT that we do everything programmatically
        – Think of instances as ephemeral
        – Marginal costs of additional instances/storage/network are minimal
        – Hardware specifications of instances are immutable
        – Load balancing for fault tolerance and flexibility is pervasive
  13. • Allocate resources
        – We cannot do this work in 5 minutes here, 5 minutes there
        – 100% utilization means there is no time to learn new skills
        – Sending everyone to training doesn't accomplish anything if they all go back to the daily treadmill
      • Stop doing things by hand
        – Perform a searching and fearless inventory of your operational procedures
        – Become familiar with automation technologies
        – We are all developers now
      • Designate two members of each team to become subject matter experts
        – Identify the AWS offerings that correspond to the work your team currently does
        – Send your SMEs to training; make time and space for them to learn
        – Hold them accountable for mastering the material and training the rest of the team
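Slide 10 argues that on an IaaS platform instances are ephemeral, their hardware specifications are immutable, everything is done programmatically, and load balancing is pervasive. Together these points imply that you "resize" by replacing instances rather than modifying them. The sketch below illustrates that idea with a hypothetical in-memory model. The `Instance`, `LoadBalancer`, `launch`, and `resize_by_replacement` names, and the `m1.small`/`m1.large` flavor labels, are invented for illustration and do not correspond to any real cloud API.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count

# Hypothetical in-memory model for illustration only; none of these names
# correspond to a real OpenStack, AWS, or other cloud provider API.


@dataclass(frozen=True)
class Instance:
    instance_id: str
    flavor: str  # hardware spec label; frozen=True makes it read-only after creation


class LoadBalancer:
    """Toy load balancer that only tracks which instances are in its pool."""

    def __init__(self) -> None:
        self.members: list[Instance] = []

    def register(self, inst: Instance) -> None:
        self.members.append(inst)

    def deregister(self, inst: Instance) -> None:
        self.members.remove(inst)


_ids = count(1)  # source of unique, hypothetical instance IDs


def launch(flavor: str) -> Instance:
    """Stand-in for a programmatic 'create instance' call."""
    return Instance(instance_id=f"i-{next(_ids):04d}", flavor=flavor)


def resize_by_replacement(lb: LoadBalancer, flavor: str) -> None:
    """Move every pool member to a new hardware spec by replacing it.

    Each replacement joins the pool before the old instance leaves, so the
    pool never shrinks below its starting size during the rollout.
    """
    for old in list(lb.members):  # iterate over a snapshot; the pool changes below
        new = launch(flavor)
        lb.register(new)
        lb.deregister(old)
        # A real deployment would terminate `old` at this point.


# Usage: a pool of three small instances becomes three large ones.
lb = LoadBalancer()
for _ in range(3):
    lb.register(launch("m1.small"))
resize_by_replacement(lb, "m1.large")
print([inst.flavor for inst in lb.members])  # ['m1.large', 'm1.large', 'm1.large']
```

Because `Instance` is a frozen dataclass, assigning to `flavor` raises an error. The only way to change an instance's spec is to launch a new one, and the load balancer keeps the service available while old and new instances trade places.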