ChefConf 2015 - Chef Retrospective

gwaldo
April 02, 2015

With almost ten years of combined Chef experience, join H. "Waldo" Grunenwald from CommerceHub and Joe Nuspl from Workday for a short retrospective of our Chef experiences at smaller companies.

CommerceHub is a monolithic Java-on-Windows shop moving towards Linux-hosted SOA.
Workday has more than 10,000 nodes across 11 physical data centers worldwide plus external cloud providers.

Learn what worked for us, what didn't work, our triumphs, our defeats, and where we had pain and found dragons.

Transcript

  1. Joe and I have come to Chef from drastically different places, and our working conditions are almost guaranteed to be different from yours, but here are some lessons that we’ve learned the long way.
  2. Workday: A NYSE-listed company (WDAY) that provides enterprise cloud applications for human capital management (HCM), payroll, financial management, recruiting, and analytics. J
  3. Workday Environment • 9 physical data centers worldwide plus Amazon and HP cloud • 124 roles • 153 cookbooks • More than 10K servers under Chef control • PCI and regulatory compliance J
  4. Connect e-retailers with suppliers, providing drop-shipping services. Processed > 44 million orders for the top online retailers in the US & Canada (> $7B retail sales)
  5. CommerceHub Environment • Low thousands of VMs (VMware) • Mostly monolithic codebase • Java on Windows originally, now split w/ Ubuntu • Many roles and (small) environments
  6. Introduction of Chef • Workday: Chef 0.8.2 in 2010 • Knew no Ruby • Hired to apply engineering discipline to operations • CommerceHub: Chef 11 in 2013 • Knew no Ruby • Hired as DevOps / Automation Cheerleader
  7. The Good @ Workday: SSH • 2FA SSH into the data center, then multi-hop SSH to reach the final machine • Wrote an SSH wrapper that grabs the PIN from SecurID.app and sets up SSH control masters and SOCKS proxies along the way. • A VP regularly uses it to get access to some real-time performance dashboards. J
  8. The Good @ Workday: Jira Automation • Don’t just automate servers; automate workflow • Automate routine Jira / Confluence updates J
  9. Good at CommerceHub • Solid infrastructure, ramping up spending • There was a lot of desire for improvements • People care • Some automation was already in place • Experience around testing
  10. The Bad @ Workday: Chef Workarounds • cookbook_file resources would update the file on every Chef run, so we used templates for everything. • Search was slow and unreliable, so we ran knife exec scripts to collect the search data and stuff it into a data bag. • Too much “convert this shell script into Chef code” J W: Chef search result order (“Sensunamis”)
  11. The Bad @ Workday: Community Cookbook Quality Variance • A majority assumed: • running Ubuntu • Internet access • can compile code J W: I understand the Internet access assumption, but not the code-compiling one. Is it that you wouldn’t want to compile everything, but options for specifying a built package aren’t available? A bigger problem that I have with community cookbooks is that many simply don’t work. Ask about this on stage. ‘In fact, mcollective removes things like…’
  12. The Bad @ Workday: Not having a “gold standard cookbook” • Programmers tend to plagiarize. • It is encouraged as “code reuse” • People inevitably choose the worst example • Causing the crap to spread J
  13. Bad at CommerceHub • Key people wanted different things • Lots of “Key People” • “Can you automate this environment first?” • Gatekeepers • Little insight tooling (logging, metrics, alerting) • Surprise! Chef requires engineering effort
  14. It’s not a DevOpsey conference without a @littleidea quote. But seriously, it seems that some people thought “Hire a DevOp, and it’ll magically get better!”
  15. The Ugly @ Workday: Data Bag Misuse • Created the silo data bag to hold data-center-specific overrides • Predated Chef::Environments • Grew out of control: 280K of JSON. J
  16. The Ugly @ Workday • Not being tightly integrated with the rest of the infrastructure team • Not creating a build pipeline sooner • Not creating easy-to-use test environments sooner • Occasional excessive logic in templates • We lacked a clear “Gold Standard” cookbook design example. J Not tooting our own horn
  17. Ugly at CommerceHub • Developers sometimes uninterested in Chef/Ruby “Ops Work” • Not establishing opinions early (TIMTOWTDI) • Many small environments • Many teams solving the same problems*
  18. Ugly at CommerceHub • Resistance to including Ops engineering work in timeframes • Aligning People + Interest + Time/Opportunity/Dollars • Berkshelf and testing were late additions to our Chef workflow
  19. What do you call… A group of wolves? A group of crows? A group of developers? A pack. A murder. A merge conflict.
  20. Why Resistant to Change? • You’re going to automate me out of a job • I inherited the pile of crap, I don’t understand how it works, so if you break it I won’t be able to fix it. • If it ain’t broke, don’t fix it. (or “I made this pile of crap. Don’t change it.”) • Damn it Jim, I’m a sysadmin, not a programmer. • Used to Ops being invisible.
  21. Resistant to Change • “I’d just have to verify that it worked anyway.” • Overemphasis on standardization and consensus. • The people know the processes. They made them. • “I don’t trust code.” • “It’ll take longer to do the automation than the work.”
  22. Friction • Status quo • Language • Common idioms • “I have to learn Ruby?!” • Analysis paralysis • Training, because of the learning curve • “Windows Support*”
  23. What could we have done better? • Lots of things • Identify the goals of your org & make them: • See the light • Enter the light • And shine • Fight the Silver Bullet mentality J
  24. What could we have done better? • Be more explicit about the engineering effort involved. (It’s software engineering.) • Chef is powerful, but not always the best tool for the job. • Identify as part of a skill and job promotion. W:
  25. What could we have done better? • More explicit about code reviews. • Be more opinionated early on. • Testing up front. W:
  26. Wins • Consistency • No more snowflake hunts • Mitigating environment differences • Capacity additions made easy • Facilitating services split-outs. We don’t want everything to sound too dour, because Chef has been a huge win for us. None of these are news, but we’re so close to Chef that they can become so familiar as to become invisible.
  27. Wins • Gateway drug to automation addiction • People upgrades • Bringing visibility to Operations work • Reduction of “Works on my machine” rage
  28. Request #1: “Best Practices” We’re often asked for “Best Practices”, but people see things like this. Their reaction is…
  29. …and sometimes they wonder if we know what we’re doing. Having strong feelings leads to the original problem when they see an opposing view. (ROLES STAHP)
  30. Solution #1: “Recommended Practices” • Present options/views of a subject (e.g. roles) • Explain the pros & cons of each approach. • “If your environment looks like ABC, this may make sense for you.” • Review periodically, and describe changes visibly. So, let’s give it to them.
  31. “Can you take a look at something? I can’t figure out why the value isn’t $val.” This is where I take them through the process of figuring out what values are being set, and where in the order they fit. This is time-consuming, and I often end up showing them this:
  32. https://docs.chef.io/attributes.html#attribute-precedence I love this page. It gives new Chefs hives: 15 attribute levels. But you want to help, so you sit down.
  33. “WHY WOULD YOU DO THAT?!” (I’d want to scream.) What I’d like to see is something like this:
  34. Solution #2: `knife (…) inspect (…)` The process to determine what value is set. Let’s make it a little more verbose.
  35. Request #3: Windows. Look, I love this community. And I honestly don’t hate Windows. But Chef-on-Windows has not been great these last 2 years.
  36. W: Finally, a plea to Chef: Chef is not our job. Our priorities are not the same. We’re asking for empathy and patience, and we’ll give you the same.
  37. Introducing Sous Chef https://github.com/commercehub-oss/sous_chef/ (not an official logo). The work of Larry Zarou, this is a cookbook to help you set up a cookbook-testing pipeline.
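
Slide 7 describes an SSH wrapper that chains hops into the data center and sets up control masters and SOCKS proxies along the way. The wrapper itself isn’t shown in the deck, so here is a minimal sketch of the idea in Python, assuming a hypothetical `Hop` list and placeholder hostnames: it renders OpenSSH `ssh_config` stanzas in which each hop is reached through the previous one (`ProxyJump`), reuses an already authenticated connection (`ControlMaster`/`ControlPersist`), and can open a local SOCKS proxy (`DynamicForward`). Fetching the PIN from SecurID.app is out of scope here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    alias: str                        # name you type, e.g. `ssh app01`
    hostname: str                     # real DNS name or IP of the host
    socks_port: Optional[int] = None  # local port for a SOCKS proxy, if wanted


def render_ssh_config(hops: List[Hop], persist: str = "10m") -> str:
    """Render ssh_config stanzas that chain `hops` in order.

    Every hop after the first is reached through the previous one via
    ProxyJump, and every hop gets a ControlMaster socket so an already
    authenticated (e.g. 2FA) connection is reused by later sessions.
    """
    lines: List[str] = []
    previous: Optional[Hop] = None
    for hop in hops:
        lines.append(f"Host {hop.alias}")
        lines.append(f"    HostName {hop.hostname}")
        if previous is not None:
            lines.append(f"    ProxyJump {previous.alias}")
        lines.append("    ControlMaster auto")
        lines.append("    ControlPath ~/.ssh/cm-%r@%h:%p")
        lines.append(f"    ControlPersist {persist}")
        if hop.socks_port is not None:
            lines.append(f"    DynamicForward {hop.socks_port}")
        previous = hop
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own bastion and targets.
    print(render_ssh_config([
        Hop("dc-bastion", "bastion.dc1.example.com", socks_port=1080),
        Hop("app01", "app01.dc1.internal"),
    ]), end="")
```

`ProxyJump` needs OpenSSH 7.3 or later; older clients can get the same chaining with `ProxyCommand ssh -W %h:%p <previous hop>`.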
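
Slide 8’s “automate workflow” point can start small. As a hedged illustration (not Workday’s actual tooling), this sketch builds, without sending, a request that adds a comment to an issue through Jira’s REST API (`POST /rest/api/2/issue/{issueKey}/comment` with a JSON `body`). The base URL, issue key, and credentials are placeholders, and basic auth is an assumption about how your Jira instance is configured.

```python
import base64
import json
import urllib.request


def build_jira_comment_request(
    base_url: str, issue_key: str, text: str, user: str, secret: str
) -> urllib.request.Request:
    """Build (but do not send) a POST that adds a comment to a Jira issue."""
    url = f"{base_url.rstrip('/')}/rest/api/2/issue/{issue_key}/comment"
    payload = json.dumps({"body": text}).encode("utf-8")
    # Basic auth is an assumption; use whatever your Jira instance expects.
    token = base64.b64encode(f"{user}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; pass the request to urllib.request.urlopen() to send it.
    req = build_jira_comment_request(
        "https://jira.example.com", "OPS-123",
        "Chef run converged on app01.", "automation", "not-a-real-secret",
    )
    print(req.get_method(), req.full_url)
```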
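
Slide 10’s workaround, snapshotting search results into a data bag, also sidesteps the search result ordering problem it mentions. A minimal sketch, assuming the node data has already been collected (for example by a `knife exec` script) as a list of dicts; `snapshot_search`, the item id, and the chosen keys are hypothetical. Keeping only the needed keys and sorting by node name means the data bag item, and any template rendered from it, changes only when the data does. The resulting JSON file can be uploaded with `knife data bag from file BAG ITEM.json`.

```python
import json
from typing import Any, Dict, List


def snapshot_search(item_id: str, nodes: List[Dict[str, Any]], keys: List[str]) -> str:
    """Freeze search results into a data bag item, returned as JSON text.

    Each node keeps its "name" plus the requested `keys`; nodes are
    sorted by name and keys are sorted in the output, so the same data
    always serializes to the same text regardless of search order.
    """
    entries = [
        {"name": node["name"], **{key: node.get(key) for key in keys}}
        for node in nodes
    ]
    entries.sort(key=lambda entry: entry["name"])
    item = {"id": item_id, "nodes": entries}
    return json.dumps(item, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Example data in an arbitrary order; volatile fields like uptime are dropped.
    found = [
        {"name": "web02", "ipaddress": "10.0.0.2", "uptime": "3 days"},
        {"name": "web01", "ipaddress": "10.0.0.1", "uptime": "9 days"},
    ]
    print(snapshot_search("web_nodes", found, ["ipaddress"]), end="")
```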
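
The silo data bag on slide 15 acted as a per-data-center override layer merged over shared settings, roughly what Chef environments later provided natively. A minimal sketch of that lookup-and-merge, with a hypothetical data layout and function names: nested dicts are merged key by key, while scalars and lists from the override simply replace the base value. This is a simplification; Chef’s own attribute merging has more rules.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` layered over `base`.

    Nested dicts are merged key by key; any other value (scalar or list)
    in `override` replaces the base value outright. Inputs are not mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def settings_for(
    datacenter: str,
    shared: Dict[str, Any],
    silo: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Shared settings plus the (hypothetical) silo entry for one data center."""
    return deep_merge(shared, silo.get(datacenter, {}))


if __name__ == "__main__":
    shared = {"ntp": {"servers": ["ntp1.example.com"], "iburst": True}, "log_level": "info"}
    silo = {"dc2": {"ntp": {"servers": ["ntp.dc2.example.com"]}}}
    print(settings_for("dc2", shared, silo))
```

Because every data center’s entry can override anything, nothing in this shape stops an entry from accumulating keys that are not really site-specific.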
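
Slides 31 to 34 ask for a tool that explains where an attribute’s value came from. As a hedged sketch of what such output could look like (not a real knife subcommand; `explain` and the level labels are hypothetical), this walks the 15 precedence levels from the Chef attribute-precedence page, lowest to highest, and reports every level that set the key and which one wins. It treats each level as holding a single value and ignores how Chef deep-merges hashes across levels.

```python
from typing import Any, Dict, List, Optional, Tuple

# The 15 levels from the Chef docs' attribute-precedence table, lowest first.
PRECEDENCE: List[str] = [
    "default (attribute file)",
    "default (recipe)",
    "default (environment)",
    "default (role)",
    "force_default (attribute file)",
    "force_default (recipe)",
    "normal (attribute file)",
    "normal (recipe)",
    "override (attribute file)",
    "override (recipe)",
    "override (role)",
    "override (environment)",
    "force_override (attribute file)",
    "force_override (recipe)",
    "automatic (Ohai)",
]


def explain(key: str, sources: Dict[str, Any]) -> Tuple[Optional[Any], List[str]]:
    """Return (winning value, trace) for one attribute key.

    `sources` maps a precedence level to the value set there. The trace
    lists every level that set the key, lowest first, and marks the winner.
    Returns (None, []) if no level sets the key.
    """
    unknown = sorted(set(sources) - set(PRECEDENCE))
    if unknown:
        raise ValueError(f"unknown precedence level(s): {unknown}")
    winner: Optional[Any] = None
    trace: List[str] = []
    for level in PRECEDENCE:  # higher levels overwrite lower ones
        if level in sources:
            winner = sources[level]
            trace.append(f"{key} = {winner!r} from {level}")
    if trace:
        trace[-1] += "  <- wins"
    return winner, trace


if __name__ == "__main__":
    value, trace = explain("nginx.worker_processes", {
        "default (attribute file)": 4,
        "default (environment)": 2,
        "override (role)": 8,
    })
    print("\n".join(trace))
```

The list encodes an asymmetry worth calling out when teaching: for default attributes a role beats an environment, while for override attributes an environment beats a role.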
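
Sous Chef (slide 37) sets up a cookbook-testing pipeline; its actual implementation is not reproduced here. As a generic, hedged sketch of the underlying idea, run cheap checks first and stop at the first failure. The stage list is an assumption: Foodcritic, ChefSpec (run via `rspec`), and Test Kitchen were common choices at the time, but substitute whatever your pipeline actually runs.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stage:
    name: str
    command: Sequence[str]


# Hypothetical default stages, cheapest first; adjust to your own tooling.
DEFAULT_STAGES = [
    Stage("lint", ["foodcritic", "."]),
    Stage("unit", ["rspec"]),
    Stage("integration", ["kitchen", "test"]),
]


def run_pipeline(stages: List[Stage], cwd: str = ".") -> List[Tuple[str, int]]:
    """Run stages in order, stopping at the first non-zero exit code.

    Returns (stage name, exit code) for every stage that actually ran.
    A missing tool raises FileNotFoundError rather than returning a code.
    """
    results: List[Tuple[str, int]] = []
    for stage in stages:
        completed = subprocess.run(list(stage.command), cwd=cwd)
        results.append((stage.name, completed.returncode))
        if completed.returncode != 0:
            break  # later, slower stages are skipped
    return results
```

Usage would look like `run_pipeline(DEFAULT_STAGES, cwd="path/to/cookbook")`; ordering stages from fastest to slowest keeps feedback quick when a lint or unit check fails.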