Growing internal tooling from the console up

Your site was built for your external customers first. Data or workflow problems are solved on the Rails console.

But, two years in, your app has grown. Identifying, researching, and fixing those data and workflow problems takes more of your time and attention. It frustrates your business stakeholders, your customers and, of course, you.

This talk will look at a Rails-based web store–including inventory, payment processing, fraud mitigation and customer notifications–and explore how we can build tools into our apps to discover when things go sideways and then help get things back on track.

Nathan L. Walls

April 30, 2019

Transcript

  1. Growing internal tooling from the console up Nathan L. Walls

    // RailsConf 2019 https://wallscorp.us/ https://twitter.com/base10
  4. Q: How old is your codebase? • rails init •

    Less than six months • Six months to 18 months • 18 months to 3 years • More than 3 years
  7. Central Questions • When is a good moment for a

    team to start making their internal life better? • What might that look like? • Who should advocate for it and who should do it?
  10. About me • Senior developer/team lead-ish for an education-focused ebook

    store • My team builds the store on a Rails codebase • One of the three production key holders • I review a lot of code
  12. The Stargate application • Rails 5.1/Ruby 2.5 • React •

    MySQL + Redis • Sidekiq • Kubernetes on Google Cloud • Right where Rails’ sweet spot is
  14. About me • I spend a lot of time helping

    my team* • Alternately stated: I spend a lot of time finding the answers to questions they can’t find on their own • I’m a bottleneck of knowledge and access
  17. About Brent from The Phoenix Project • Brent is a

    focal point of multi-team dependencies • Brent can’t focus, because Brent either seeks out or is pulled into emergent situations • Brent is a near full-time firefighter vs. being a mentor
  18. What we are going to cover • Some overarching goals

    • World-building • Identifying pain points • Approaching problem solving • The command-line and the Rails console
  21. What we are going to cover • Initial automation /

    notifications • Reevaluating pain points • Administrative frameworks • Building your own tooling
  23. Goals • Make problems easier to see, evaluate and act

    upon for the entire team • Limit “keyholder”-specific tools to rarely needed and/or higher-risk solutions • Develop, observe, evaluate, iterate
  25. Goals • Less involvement of necessity in emergent situations •

    Facilitate having fewer emergent situations overall • Redirect “helping my team” into mentorship
  26. Goals • Don’t change Brent’s name • No one else

    on my team should fill the role of Brent
  27. Caveats • This is all “Work in Progress” • I’m

    OK with exploring ideas that make our life and experiences better • Everything about this is and will be iterative
  28. About the problem space • The codebase is about four

    years old • Started with a consulting team of implementing engineers • Team has substantially cycled over and grown • Development efforts focused on implementing sales-focused features and solving external problems
  29. About the problem space • Production access is limited •

    We have notifications for some automated jobs, but not all of them • Production access is required to determine the state of: • Automated jobs • Generated artifacts
  30. About the problem space • Problem-solving involves a lot of

    ad-hoc Rails console or database digging • There are only three production keyholders
  31. Operational Needs • Resetting stale data • Investigating error states

    of transactions • Troubleshooting and restarting failed jobs • Finding and verifying artifacts
  32. How this manifests • Because of the knowledge bottlenecks, I

    spend time context switching or identifying answers to questions instead of: • Feature work • Proactive technical debt pay-down
  35. Sample issues • Inventory ingestion failing • Inventory notification to

    third-party services failing • Figuring out why particular items aren’t in store inventory when the business thinks they ought to be • Determining how pervasive a possible payment transaction problem is • Email notification troubleshooting
  37. Sample issues • DDOS • Overly-aggressive site crawling from search

    indexes • Fraudulent purchases • General issues with an online store • What’s one-off vs. ongoing and cyclical?
  38. Approach the issues iteratively • What defines the issue? •

    Who has to address the issue now? • Who could potentially address the issue instead?
  39. Approach the issues iteratively • How might solving this issue

    be easier? • How might this issue be easier to spot?
  40. Improving resilience • Make expensive things more fault tolerant •

    Make it as easy as practical to recover from failed jobs
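One way to make an expensive step more fault tolerant is to retry it a few times with a growing delay before declaring the job failed. The following is a minimal sketch in plain Ruby, not code from the talk; `with_retries` and its parameters are hypothetical names. A job framework's own retry support would usually be the first thing to reach for; this only shows the idea.

```ruby
# Hypothetical helper (not from the talk): run a block, retrying on any
# StandardError up to `attempts` times with exponential backoff.
# `sleeper` is injectable so the delay can be skipped in tests.
def with_retries(attempts: 3, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError
    # Out of attempts: re-raise so the failure stays visible.
    raise if tries >= attempts

    # Wait base_delay, then 2x, then 4x, ... before the next attempt.
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end
```

A flaky call, such as a notification to a third-party service, could then be wrapped as `with_retries(attempts: 5) { send_it }`, where `send_it` stands for whatever the job already does.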
  41. Involve your team • Socialize the issues • Review and

    iterate solutions • Don’t be the only person
  42. Example of team involvement • Point Developer • Technical triage

    as needed for bugs • General question answerer • Focused point for interruption • Can iteratively improve internal tooling
  43. What’s the console good for? • Investigating data and state

    changes • Trialing and applying one-off fixes • Calling-up data and running new or existing methods • Running SQL queries (no, seriously)
  44. Why SQL from the Rails console? • You may not

    have access to a database console • You can intermix Rails objects and SQL queries
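Mixing SQL and Rails objects in the console might look something like the sketch below. The table and column names (`transactions`, `status`, `created_at`) and the `Transaction` model are assumptions for illustration, not details from the talk. The helper takes the connection as an argument; in a Rails console that would be `ActiveRecord::Base.connection`, whose `quote` and `select_values` methods are used here.

```ruby
# Sketch only: `transactions`, `status`, and `created_at` are assumed names.
# `connection` is expected to respond to `quote` and `select_values`,
# as ActiveRecord::Base.connection does.
def stale_pending_ids(connection, cutoff)
  sql = <<~SQL
    SELECT id FROM transactions
    WHERE status = 'pending' AND created_at < #{connection.quote(cutoff)}
  SQL
  connection.select_values(sql)
end

# In a Rails console the ids can feed straight back into Rails objects:
#   ids = stale_pending_ids(ActiveRecord::Base.connection, 2.days.ago)
#   Transaction.where(id: ids).find_each { |t| puts t.inspect }
```

Passing the cutoff through `quote` rather than pasting it into the string keeps the habit of escaping values, even for one-off queries.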
  48. How we use the console • Looking at error states

    on pending transactions or failed purchases • Examining the state of database-backed inventory processing jobs • Verifying inventory availability per database scopes • Flipping feature flags
  50. Rails console benefits • It’s ad-hoc! • Access to your

    scopes, prod data • Reopen classes, write new methods on a trial basis • You probably already have access to it on a prod server (k8s, Heroku, a VM, etc.)
  52. Rails console caveats • Limited to production keyholders • You’re

    live in production • Ad-hoc solutions aren’t saved • Not great for review, visibility, auditability
  55. Safe use of the production console • Plan your actions

    in a non-prod environment • Review your action plan with a peer • Inform your team of your intentions • What? How? Why? When?
  56. Safe use of the production console • Work from a

    script • Log out of production when your task is complete
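Working from a script can be as simple as a method that defaults to a dry run, so the plan can be rehearsed in a non-prod environment and reviewed with a peer before anything changes. This is a hypothetical sketch: `reset_stale`, the hash-shaped records, and the status values are invented for illustration and stand in for whatever models the real fix touches.

```ruby
# Hypothetical console script: reports what it would change, and only
# mutates records when dry_run is explicitly set to false.
def reset_stale(records, dry_run: true, out: $stdout)
  stale = records.select { |record| record[:status] == "stale" }
  stale.each do |record|
    verb = dry_run ? "WOULD RESET" : "RESET"
    out.puts "#{verb} id=#{record[:id]}"
    record[:status] = "pending" unless dry_run
  end
  stale.size # number of records matched, for a quick sanity check
end
```

Running it once with the default and once with `dry_run: false` gives a before-and-after log that can be pasted into the team channel.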
  60. Automate what you can, make it visible • We started

    off and we had some big automated tasks • Sometimes, these tasks would not complete or require changes • Some tasks are time-sensitive • We’ve made some of it visible
  61. Notification Benefits • Increase visibility of your system • Socialize

    the location and purpose of the notifications • Chat notifications (e.g. Slack) are a pretty low barrier • Can cover both automated jobs and live events
  64. Notification Caveats • Dumping everything into one or more Slack

    channels can get very noisy • The map is not the territory • Do you have the right things notifying? • Do you have the right things not notifying?
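A chat notification that tries to keep the noise down could be sketched as follows. It assumes a Slack-style incoming webhook that accepts a JSON body with a `text` field; the job names, status strings, and the `NOTIFY_STATUSES` list are hypothetical. Deciding which statuses belong in that list is the "right things notifying" question from the slides.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical policy: only these job outcomes are worth a chat message.
NOTIFY_STATUSES = %w[failed timed_out].freeze

# Build the webhook body, or return nil for outcomes we chose to keep quiet.
def notification_payload(job_name, status, detail = nil)
  return nil unless NOTIFY_STATUSES.include?(status)

  text = "#{job_name} #{status}"
  text += ": #{detail}" if detail
  JSON.generate(text: text)
end

# POST the payload to the webhook URL; returns true on a 2xx response.
# Skips the network entirely when there is nothing to say.
def notify(webhook_url, payload)
  return false if payload.nil?

  response = Net::HTTP.post(URI(webhook_url), payload, "Content-Type" => "application/json")
  response.is_a?(Net::HTTPSuccess)
end
```

Keeping the policy (`notification_payload`) separate from the delivery (`notify`) makes it easy to review and adjust what gets announced without touching the HTTP code.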
  65. Automation • Take typical console tasks and make them rake

    tasks • When? • As soon as you detect the pattern
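Turning a recurring console investigation into a rake task might look like this sketch. The task name `inventory:explain`, the `explain_sku` method, and the `CATALOG` stand-in data are all invented for illustration; in a Rails app the task would typically live under `lib/tasks/`, depend on `:environment`, and query real models instead of a hash.

```ruby
require "rake"

# Stand-in for a real lookup; CATALOG and its fields are invented here.
CATALOG = {
  "BK-100" => { ingested: true, published: false },
  "BK-200" => { ingested: true, published: true }
}.freeze

# The question we kept answering by hand in the console, as a method.
def explain_sku(sku, catalog = CATALOG)
  item = catalog[sku]
  return "#{sku}: never ingested" if item.nil?
  return "#{sku}: ingested but not published" unless item[:published]

  "#{sku}: available"
end

module InventoryTasks
  extend Rake::DSL # makes namespace/desc/task available inside this module

  namespace :inventory do
    desc "Explain why a SKU is or is not in store inventory"
    task :explain, [:sku] do |_task, args|
      puts explain_sku(args[:sku])
    end
  end
end
```

From a shell this would be invoked as `rake "inventory:explain[BK-100]"`. Because the logic sits in a plain method, it can be unit tested and code reviewed like any other part of the app, which the ad-hoc console version could not.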
  68. Benefits [of administrative frameworks] • Increase the visibility of your data inside of

    your app • Uncouple answering questions with production data from production systems-level access • Solves a good amount with not a lot of work • Allows for some customization
  70. Caveats [of administrative frameworks] • Needs to be worked into your authorization and

    authentication schemes • It’s a sizable addition to your existing app • It likely won’t cover everything you need, particularly in specialized cases
  71. Why build your own? • Not limited by the structure

    of an admin framework • A better approach for building more complex workflows
  74. Central Questions, revisited • When is a good moment for

    a team to start making their internal life better? • What might that look like? • Who should advocate for it and who should do it?
  75. Additional questions • What is the time commitment? • How

    often should we review? • Who should be involved?
  78. Resources • The Phoenix Project, Gene Kim, Kevin Behr, and

    George Spafford • Work Clean, Dan Charnas • The Nature Fix, Florence Williams • Greater Than Code Podcast