The Future of Ops

Traditional Operations isn’t going away; it’s just retooling. The move from on-premises to cloud means Ops, in the classical sense, is largely being outsourced to cloud providers. What’s left is a thin but crucial slice between cloud providers and the products built by development teams, encompassing infrastructure and deployment automation, configuration management, log management, and monitoring and instrumentation, all through the lens of self-service.

Join me as I share my vision for the future of Operations as an organizational competency and how it relates to DevOps. We will discuss where industry practices are headed while sharing some real-world stories—the good and the bad—of applying these practices at Workiva. The intended outcome of this talk is to leave listeners with a better understanding of what an effective modern engineering organization looks like, including patterns and best practices, and the path to reaching it. The end goal is an organization which delivers value to customers reliably, efficiently, and continuously.

Ops is dead, long live Ops!

Tyler Treat

April 13, 2018

Transcript

  1. @tyler_treat
    Tyler Treat • DevOpsDays Des Moines • 4/13/18
    The Future of Ops

  2. @tyler_treat
    Welcome to the world of tomorrow!

  3. @tyler_treat

  4. @tyler_treat
    Tyler Treat
    Managing Partner @ Real Kinetic
    Former infrastructure engineering
    manager @ Workiva
    bravenewgeek.com

  5. @tyler_treat
    Data Center

  6. @tyler_treat
    Data Center
    Compute Network Storage

  7. @tyler_treat
    Data Center
    Compute Network Storage
    App Servers Security Backups/DR Monitoring

  8. @tyler_treat
    Data Center
    Compute Network Storage
    Help Desk Procurement Compliance
    App Servers Security Backups/DR Monitoring

  9. @tyler_treat
    Data Center
    Compute Network Storage
    Help Desk Procurement Compliance
    App Servers Security Backups/DR Monitoring
    App App App App App App App

  10. @tyler_treat
    Data Center
    Compute Network Storage
    Help Desk Procurement Compliance
    App Servers Security Backups/DR Monitoring

  11. @tyler_treat
    Data Center
    Compute Network Storage
    Help Desk Procurement Compliance
    App Servers Security Backups/DR Monitoring
    App App App App App App App
    Ops

  12. @tyler_treat
    Data Center
    Compute Network Storage
    Help Desk Procurement Compliance
    App Servers Security Backups/DR Monitoring
    App App App App App App App
    DevOps

  13. @tyler_treat
    App App App App App App App
    NoOps

  14. @tyler_treat
    App App App App App App App
    Infrastructure Automation, Deployment Automation, Configuration Management, Log Management, Monitoring
    NewOps

  15. @tyler_treat

  16. @tyler_treat
    DevOps is a journey, not a destination.

  17. @tyler_treat
    Manual Provisioning Self-Service
    The DevOps Scale of Automation

  18. @tyler_treat
    Manual Provisioning Self-Service
    Large Enterprise
    Small Startup

  19. @tyler_treat
    Scaling DevOps

  20. @tyler_treat
    Why do silos form?

  21. @tyler_treat
    Many companies start with a
    “DevOps” approach.

  22. @tyler_treat
    Manual Provisioning Self-Service
    Large Enterprise
    Small Startup
    DevOps by Necessity
    Devs push to production, unstable, high-risk, minimal cost control

  23. @tyler_treat
    As the product scales,
    we specialize.

  24. @tyler_treat
    As the business scales,
    we add safety checks.

  25. @tyler_treat
    Developers write
    code.

  26. @tyler_treat
    Ops people run it.

  27. @tyler_treat
    QA gets blamed for
    defects.

  28. @tyler_treat
    Security blocks
    everything.

  29. @tyler_treat
    And management
    wonders why nothing
    gets shipped.

  30. @tyler_treat
    Manual Provisioning Self-Service
    Large Enterprise
    Small Startup
    Ops as Gatekeepers
    Stable, cost-controlled, risk-averse, delivery and innovation bottleneck

  31. @tyler_treat
    Specialization is good!

  32. @tyler_treat
    Misalignment is not good.

  33. @tyler_treat
    How do we scale specialization?

  34. @tyler_treat
    Cross-functional
    teams?

  35. @tyler_treat
    DevOps encourages cooperation!

  36. @tyler_treat
    Just add an ops
    engineer to each team.

  37. @tyler_treat
    And maybe a reliability
    engineer.

  38. @tyler_treat
    Maybe a few extra for
    on-call backup.

  39. @tyler_treat
    And of course we need
    a QA engineer too.

  40. @tyler_treat
    Done!

  41. @tyler_treat
    Also, $$$

  42. @tyler_treat

  43. @tyler_treat
    How do we scale specialization?

  44. @tyler_treat
    Vision and Product

  45. @tyler_treat
    Vision: a mental
    image of what the
    future could be like.

  46. @tyler_treat
    Vision enables independent
    decision making and alignment.

  47. @tyler_treat
    But vision without execution is
    just hallucination…

  48. @tyler_treat
    Products are how we scale execution.

  49. @tyler_treat

  50. @tyler_treat
    The evolution of QA
    Test-focused Tools-focused

  51. @tyler_treat
    The evolution of QA
    QA, SDET, “Combined” Engineering

  52. @tyler_treat
    Production
    CD Pipeline
    CI

  53. @tyler_treat
    QA teams are shrinking,
    but what’s growing are the
    teams building the tools.

  54. @tyler_treat
    The same is becoming true of Ops.

  55. @tyler_treat
    build/release/deploy
    configuration management
    infrastructure automation
    logging & instrumentation
    monitoring

  56. @tyler_treat
    By productizing our infrastructure,
    we scaled.

  57. @tyler_treat
    We controlled costs.

  58. @tyler_treat
    We reduced risk.

  59. @tyler_treat
    We accelerated development.

  60. @tyler_treat
    We delivered value to customers
    faster…

  61. @tyler_treat
    from 3–4 releases per year to
    multiple releases per day.

  62. @tyler_treat
    Rethinking Ops

  63. @tyler_treat

  64. @tyler_treat

  65. @tyler_treat
    Data Center
    Compute Network Storage
    Help Desk Procurement Compliance
    App Servers Security Backups/DR Monitoring
    App App App App App App App
    Wake me up if
    anything goes
    wrong here.
    Ops as Masters of Production

  66. @tyler_treat
    Data Center
    Compute Network Storage
    Help Desk Procurement Compliance
    App Servers Security Backups/DR Monitoring
    App App App App App App App
    Jim Bob’s
    Frobulator
    service is out
    of memory.
    Ops as Masters of Production

  67. @tyler_treat
    Manual Provisioning Self-Service
    Large Enterprise
    Small Startup
    PaaS
    Stable, cost-controlled, risk-averse, delivery enabler, innovation bottleneck

  68. @tyler_treat

  69. @tyler_treat
    Enable developers to self-service through tooling and
    automation and empower them to deploy and operate
    their services…
    The Vision

  70. @tyler_treat
    “Here’s a CloudFormation
    template and access to
    production…”

  71. @tyler_treat
    Manual Provisioning Self-Service
    Large Enterprise
    Small Startup
    IaaS
    Devs provision infrastructure as code, free-for-all, cost explosion, high-risk, delivery and innovation enabler

  72. @tyler_treat
    Enable developers to self-service through tooling and
    automation and empower them to deploy and operate
    their services…
    The Vision

  73. @tyler_treat
    Enable developers to self-service through tooling and
    automation and empower them to deploy and operate
    their services…with minimal Ops intervention.
    The Vision

  74. @tyler_treat

  75. @tyler_treat

  76. @tyler_treat
    Enable developers to self-service through tooling and
    automation and empower them to deploy and operate
    their services…with minimal Ops intervention.
    The Vision

  77. @tyler_treat
    App App App App App App App
    Infrastructure Automation, Deployment Automation, Configuration Management, Log Management, Monitoring
    Ops as Product Team

  78. @tyler_treat
    App App App App App App App
    Infrastructure Automation, Deployment Automation, Configuration Management, Log Management, Monitoring
    Products
    Ops as Product Team

  79. @tyler_treat
    Enable developers to self-service through tooling and
    automation and empower them to deploy and operate
    their services…with minimal Ops intervention.
    The Vision

  80. @tyler_treat
    Pain-Driven Development:
    making locally optimal
    decisions to minimize pain.

  81. @tyler_treat
    Silos promote pain
    displacement.
    Product
    Development
    QA Ops

  82. @tyler_treat
    Silos promote pain
    displacement.
    Product
    Development
    QA Ops
    pain of running software
    pain of testing software
    pain of building software

  83. @tyler_treat
    Misaligned incentives!

  84. @tyler_treat
    How do you expect devs to care about
    quality if they’re not on the hook?

  85. @tyler_treat
    How do you expect devs to care about
    operability if they’re not on the hook?

  86. @tyler_treat
    Devs won’t build truly reliable systems
    until they are on-call for them.

  87. @tyler_treat
    BUT!

  88. @tyler_treat
    Responsibility requires empowerment.

  89. @tyler_treat
    You can’t ask someone to care about
    something and fix it without also
    giving them the power to do so.

  90. @tyler_treat
    Most Ops teams simply
    haven’t done enough to
    empower and offload
    responsibility onto dev teams.

  91. @tyler_treat
    Products enable ownership.

  92. @tyler_treat
    App App App App App App App
    Infrastructure Automation, Deployment Automation, Configuration Management, Log Management, Monitoring
    Products
    Ops as Product Team

  93. @tyler_treat
    App App App App App App App
    Infrastructure Automation, Deployment Automation, Configuration Management, Log Management, Monitoring
    Products
    The Frobulator service is out of memory…
    Since you are the Frobulator expert, here are these tools to help you diagnose and resolve the problem autonomously.
    Ops as Product Team

  94. @tyler_treat
    Enable developers to self-service through tooling and
    automation and empower them to deploy and operate
    their services…with minimal Ops intervention.
    The Vision

  95. @tyler_treat
    Products maintain control
    through enablement.

  96. @tyler_treat
    Enable teams to follow best
    practices.

  97. @tyler_treat
    Best practices for builds.

  98. @tyler_treat
    Best practices for testing.

  99. @tyler_treat
    Best practices for deploys.

  100. @tyler_treat
    Best practices for support.

  101. @tyler_treat
    Best practices for compliance.

  102. @tyler_treat
    Encode compliance and SDLC
    requirements into tooling and process.

  103. @tyler_treat
    Snowflakes kill…

    Use pain-driven development
    to your advantage by creating
    paths of least resistance.

  104. @tyler_treat
    Teams must make a case for
    going off-menu.

  105. @tyler_treat
    Products in Practice

  106. @tyler_treat
    Build Release Deploy Operate

  107. @tyler_treat
    Build Release Deploy Operate

  108. @tyler_treat
    Continuous Integration (diagram labels): Dev, Code Repository, Push change to branch, Review by Peers, Build, QA & Compliance

  109. @tyler_treat

  110. @tyler_treat

  111. @tyler_treat

  112. @tyler_treat

  113. @tyler_treat

  114. @tyler_treat

  115. @tyler_treat

  116. @tyler_treat
    • Build plan part of the code, not baked into build tool
    • Dev teams fully control their builds
    • Deep integration with GitHub
    • Build controls into the process
    Continuous Integration
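
As an illustration of a build plan that lives in the repository rather than in the build tool, here is a minimal Python sketch. The JSON layout, the `Step` type, and the `load_build_plan` and `run_build` names are assumptions made for this example; the slides do not name a file format or tool.

```python
import json
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    command: str


def load_build_plan(text: str) -> list[Step]:
    """Parse a build plan stored in the repository next to the code.

    The {"steps": [{"name": ..., "command": ...}]} layout is hypothetical.
    """
    raw = json.loads(text)
    steps = [Step(name=s["name"], command=s["command"]) for s in raw["steps"]]
    if not steps:
        raise ValueError("build plan must define at least one step")
    return steps


def run_build(steps, runner=None):
    """Run steps in order and stop at the first non-zero exit code.

    `runner` takes a command string and returns an exit code. It is
    injectable so the flow can be exercised without spawning processes.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(shlex.split(cmd)).returncode
    results = []
    for step in steps:
        code = runner(step.command)
        results.append((step.name, code))
        if code != 0:
            break
    return results
```

Because the plan is versioned with the code, a team changes its build by opening a pull request, and the CI service only needs to know how to read and run the plan.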

  117. @tyler_treat
    Build Release Deploy Operate

  118. @tyler_treat
    Continuous Delivery (diagram labels): Dev, Code Repository, Tag branch for release, Build/QA, Dev Artifact Repository, Sign-Off, Prod Artifact Repository, Deploy

  119. @tyler_treat
    • Artifact build/tagging/promotion automation
    • Container/machine image auditing
    • Machine image and security patch automation
    • Streamlining sign-off from different parties
    Continuous Delivery
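
One way to picture automated promotion with sign-off is the sketch below. It assumes the two artifact repositories can be modeled as dictionaries and that the required sign-off roles are `qa`, `security`, and `product`; those roles and all names here are hypothetical, not details from the talk.

```python
from dataclasses import dataclass, field

# Hypothetical set of parties that must approve a release.
REQUIRED_SIGN_OFFS = frozenset({"qa", "security", "product"})


@dataclass
class Artifact:
    name: str
    version: str
    sign_offs: set = field(default_factory=set)


class PromotionError(Exception):
    """Raised when an artifact may not be promoted to production."""


def promote(artifact, dev_repo, prod_repo):
    """Copy an artifact from the dev repository to the prod repository.

    Promotion is refused unless the artifact exists in dev and every
    required party has signed off.
    """
    key = (artifact.name, artifact.version)
    if key not in dev_repo:
        raise PromotionError(f"{artifact.name}:{artifact.version} is not in the dev repository")
    missing = REQUIRED_SIGN_OFFS - artifact.sign_offs
    if missing:
        raise PromotionError("missing sign-off from: " + ", ".join(sorted(missing)))
    prod_repo[key] = dev_repo[key]
```

The check lives in the promotion tool itself, so nobody has to chase approvals by hand and nothing reaches the production repository without them.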

  120. @tyler_treat
    Build Release Deploy Operate

  121. @tyler_treat

  122. @tyler_treat

  123. @tyler_treat
    • Self-service deploys
    • Self-service configuration (with guard rails)
    • Infrastructure provisioning is automated
    • No ticket-driven development
    Continuous Deployment
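
Self-service configuration with guard rails could look roughly like the following sketch: developers submit their own settings, and the tooling rejects anything outside published limits. The setting names, limits, and regions are illustrative assumptions, not values from the talk.

```python
# Hypothetical limits an Ops team might publish for self-service use.
NUMERIC_LIMITS = {
    "replicas": (1, 20),
    "memory_mb": (128, 4096),
}
ALLOWED_REGIONS = {"us-east", "us-west"}


def validate_config(config: dict) -> list[str]:
    """Return a list of guard-rail violations; empty means the config is accepted."""
    errors = []
    for key, (low, high) in NUMERIC_LIMITS.items():
        value = config.get(key)
        # Strict type check so booleans and missing values are rejected too.
        if type(value) is not int or not (low <= value <= high):
            errors.append(f"{key} must be an integer in [{low}, {high}], got {value!r}")
    region = config.get("region")
    if region not in ALLOWED_REGIONS:
        errors.append(f"region must be one of {sorted(ALLOWED_REGIONS)}, got {region!r}")
    # Anything not on the menu is flagged rather than silently applied.
    for key in sorted(set(config) - set(NUMERIC_LIMITS) - {"region"}):
        errors.append(f"unknown setting: {key}")
    return errors
```

A deploy tool would call `validate_config` before applying a change and show the returned messages to the developer, which keeps the feedback immediate and avoids a ticket to Ops.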

  124. @tyler_treat
    Build Release Deploy Operate

  125. @tyler_treat
    • Logging
      - Structured logging spec
      - Language libs implementing spec
      - Log pipeline (i.e., agent, collector, storage, search)
    • Telemetry, tracing, health checks, alerting
    • Canary deploys, A/B testing, traffic shadowing, etc.
    Continuous Operations
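
A language library implementing a structured logging spec might be sketched as below, using Python's standard `logging` module. The field names (`timestamp`, `level`, `service`, `message`) and the `fields` key passed through `extra` are assumptions for this example; the slides do not show the actual spec.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Caller-supplied fields arrive via `extra={"fields": {...}}`.
        # Note that they can override the core keys above in this sketch.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def get_logger(service: str, stream) -> logging.Logger:
    """Return a logger for `service` that writes JSON lines to `stream`."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A service would call, for example, `get_logger("frobulator", sys.stderr).info("out of memory", extra={"fields": {"heap_mb": 512}})`. Because every service emits the same shape, the agent, collector, storage, and search stages of the pipeline can treat all logs uniformly.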

  126. @tyler_treat
    Many off-the-shelf solutions just
    need to be “glued” together.

  127. @tyler_treat
    Most problems are cultural, not
    technical.

  128. @tyler_treat
    Technology will not fix your
    broken culture!

  129. @tyler_treat
    Solutions need to fit the company,
    its culture, and its architecture.

  130. @tyler_treat
    Get the workflow correct, start
    manual, then automate.

  131. @tyler_treat
    Wrapping Up

  132. @tyler_treat
    Specialization leads to misalignment
    and broken feedback loops.

  133. @tyler_treat
    But specialization is an important
    part of scaling a business.

  134. @tyler_treat
    The question is:
    how do we specialize?

  135. @tyler_treat
    The traditional Ops model does
    not scale.

  136. @tyler_treat
    DevOps is about tightening feedback
    loops and building empathy.

  137. @tyler_treat
    NewOps is about empowering teams
    and providing autonomy.

  138. @tyler_treat
    It’s not a replacement for DevOps,
    it’s an evolution of it.

  139. @tyler_treat
    It’s applying a product mindset to
    the traditional Ops model.

  140. @tyler_treat
    Ops teams should be redefining
    their vision:

  141. @tyler_treat
    from masters of production to
    enablers of production.

  142. @tyler_treat
    Ops capabilities should be
    embedded within dev teams…

  143. @tyler_treat
    but they need to be enabled!

  144. @tyler_treat

  145. @tyler_treat
    NewOps treats Ops like a product
    team whose product is infrastructure.

  146. @tyler_treat
    Creating guard rails, not walls.

  147. @tyler_treat
    Offloading responsibility helps
    correct and scale feedback loops.

  148. @tyler_treat
    Traditional Ops isn’t going away,
    it’s just getting a product manager.

  149. @tyler_treat
    Thanks!

    bravenewgeek.com
    realkinetic.com
