Growing internal tooling from the console up

Your site was built for your external customers first. Data or workflow problems are solved on the Rails console.

But, two years in, your app has grown. Identifying, researching, and fixing those data and workflow problems takes more of your time and attention. It frustrates your business stakeholders, your customers and, of course, you.

This talk will look at a Rails-based web store–including inventory, payment processing, fraud mitigation and customer notifications–and explore how we can build tools into our apps to discover when things go sideways and then help get things back on track.

Nathan L. Walls

April 30, 2019

Transcript

  1. 1.

    Growing internal tooling from the console up
    Nathan L. Walls // RailsConf 2019
    https://wallscorp.us/ https://twitter.com/base10
  4. 7.

    Q: How old is your codebase?
    • rails init
    • Less than six months
    • Six months to 18 months
    • 18 months to 3 years
    • More than 3 years
  7. 11.

    Central Questions
    • When is a good moment for a team to start making their internal life better?
    • What might that look like?
    • Who should advocate for it and who should do it?
  8. 12.
  11. 16.

    About me
    • Senior developer/team lead-ish for an education-focused ebook store
    • My team builds the store on a Rails codebase
    • One of the three production key holders
    • I review a lot of code
  12. 17.
  13. 19.
  14. 20.
  15. 21.
  17. 28.

    The Stargate application
    • Rails 5.1/Ruby 2.5
    • React
    • MySQL + Redis
    • Sidekiq
    • Kubernetes on Google Cloud
    • Right where Rails’ sweet spot is
  18. 29.
  20. 32.

    About me
    • I spend a lot of time helping my team*
    • Alternately stated: I spend a lot of time finding the answers to questions they can’t find on their own
    • I’m a bottleneck of knowledge and access
  23. 37.

    About Brent from The Phoenix Project
    • Brent is a focal point of multi-team dependencies
    • Brent can’t focus, because Brent either seeks out or is pulled into emergent situations
    • Brent is a near full-time firefighter vs. being a mentor
  24. 40.

    What we are going to cover
    • Some overarching goals
    • World-building
    • Identifying pain points
    • Approaching problem solving
    • The command line and the Rails console
  27. 45.

    What we are going to cover
    • Initial automation / notifications
    • Reevaluating pain points
    • Administrative frameworks
    • Building your own tooling
  28. 46.
  29. 47.
  31. 50.

    Goals
    • Make problems easier to see, evaluate and act upon for the entire team
    • Limit “keyholder”-specific tools to rarely needed and/or higher-risk solutions
    • Develop, observe, evaluate, iterate
  32. 51.
  34. 54.

    Goals
    • Less involvement of necessity in emergent situations
    • Facilitate having fewer emergent situations overall
    • Redirect “helping my team” into mentorship
  35. 55.

    Goals
    • Don’t change Brent’s name
    • No one else on my team should fill the role of Brent
  36. 56.

    Caveats
    • This is all “Work in Progress”
    • I’m OK with exploring ideas that make our life and experiences better
    • Everything about this is and will be iterative
  37. 58.

    About the problem space
    • The codebase is about four years old
    • Started with a consulting team of implementing engineers
    • Team has substantially cycled over and grown
    • Development efforts focused on implementing sales-focused features and solving external problems
  38. 59.

    About the problem space
    • Production access is limited
    • We have notifications for some automated jobs, but not all of them
    • Production access is required to determine the state of:
      • Automated jobs
      • Generated artifacts
  39. 60.

    About the problem space
    • Problem-solving involves a lot of ad-hoc Rails console or database digging
    • There are only three production keyholders
  40. 63.

    Operational Needs
    • Resetting stale data
    • Investigating error states of transactions
    • Troubleshooting and restarting failed jobs
    • Finding and verifying artifacts
  41. 65.

    How this manifests
    • Because of the knowledge bottlenecks, I spend time context switching or identifying answers to questions instead of:
      • Feature work
      • Proactive technical debt pay-down
  44. 71.

    Sample issues
    • Inventory ingestion failing
    • Inventory notification to third-party services failing
    • Figuring out why particular items aren’t in store inventory when the business thinks they ought to be
    • Determining how pervasive a possible payment transaction problem is
    • Email notification troubleshooting
  46. 77.

    Sample issues
    • DDoS
    • Overly aggressive site crawling from search indexes
    • Fraudulent purchases
    • General issues with an online store
    • What’s one-off vs. ongoing and cyclical?
  47. 79.

    Approach the issues iteratively
    • What defines the issue?
    • Who has to address the issue now?
    • Who could potentially address the issue instead?
  48. 80.

    Approach the issues iteratively
    • How might solving this issue be easier?
    • How might this issue be easier to spot?
  49. 81.
  50. 82.

    Improving resilience
    • Make expensive things more fault tolerant
    • Make it as easy as practical to recover from failed jobs
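
One way to read the resilience bullets: retry a flaky step a bounded number of times, and record progress so a failed run can be resumed where it stopped. This is a minimal sketch in plain Ruby. `ResumableBatch` and its in-memory checkpoint are hypothetical stand-ins, not code from the Stargate application.

```ruby
# Hypothetical sketch: process items one at a time, retrying each a few
# times, and remember which items finished so a rerun skips them.
class ResumableBatch
  attr_reader :done, :failed

  def initialize(max_attempts: 3)
    @max_attempts = max_attempts
    @done = {}    # item => result; acts as the checkpoint
    @failed = {}  # item => last error message
  end

  # Yields each unfinished item to the block. Safe to call again after a
  # failure: items recorded in @done are not processed twice.
  def run(items)
    items.each do |item|
      next if @done.key?(item)

      attempts = 0
      begin
        attempts += 1
        @done[item] = yield(item)
        @failed.delete(item)
      rescue StandardError => e
        retry if attempts < @max_attempts
        @failed[item] = e.message
      end
    end
    self
  end
end
```

In a real job the checkpoint would likely live in the database or Redis rather than in memory, so that a different process can pick the work back up.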
  51. 86.

    Involve your team
    • Socialize the issues
    • Review and iterate solutions
    • Don’t be the only person
  52. 87.

    Example of team involvement
    • Point Developer
    • Technical triage as needed for bugs
    • General question answerer
    • Focused point for interruption
    • Can iteratively improve internal tooling
  53. 88.
  54. 89.
  55. 92.

    What’s the console good for?
    • Investigating data and state changes
    • Trialing and applying one-off fixes
    • Calling up data and running new or existing methods
    • Running SQL queries (no, seriously)
  56. 93.

    Why SQL from the Rails console?
    • You may not have access to a database console
    • You can intermix Rails objects and SQL queries
  60. 98.

    How we use the console
    • Looking at error states on pending transactions or failed purchases
    • Examining the state of database-backed inventory processing jobs
    • Verifying inventory availability per database scopes
    • Flipping feature flags
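
As an illustration of the first bullet, a console session often boils down to "count the unhappy rows by state and error". The sketch below uses an in-memory `Transaction` struct as a hypothetical stand-in; the `status` and `error_code` fields are assumptions, not the store's schema. In a Rails console the rough equivalent would be something like `Transaction.where.not(status: "completed").group(:status, :error_code).count`, assuming a model with those columns.

```ruby
# Hypothetical stand-in for a transactions table; in a Rails console the
# rows would come from a model rather than an in-memory array.
Transaction = Struct.new(:id, :status, :error_code)

# Counts non-completed transactions by [status, error_code], with the most
# common combination first.
def error_state_summary(transactions)
  transactions
    .reject { |t| t.status == "completed" }
    .group_by { |t| [t.status, t.error_code] }
    .map { |key, rows| [key, rows.size] }
    .sort_by { |key, count| [-count, key.map(&:to_s)] }
    .to_h
end
```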
  62. 104.

    Rails console benefits
    • It’s ad-hoc!
    • Access to your scopes, prod data
    • Reopen classes, write new methods on a trial basis
    • You probably already have access to it on a prod server (k8s, Heroku, a VM, etc.)
  64. 109.

    Rails console caveats
    • Limited to production keyholders
    • You’re live in production
    • Ad-hoc solutions aren’t saved
    • Not great for review, visibility, auditability
  67. 114.

    Safe use of the production console
    • Plan your actions in a non-prod environment
    • Review your action plan with a peer
    • Inform your team of your intentions
    • What? How? Why? When?
  68. 117.

    Safe use of the production console
    • Work from a script
    • Log out of production when your task is complete
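
"Work from a script" can be as small as an ordered list of described steps that defaults to a dry run, so the same script can be rehearsed in non-prod, reviewed, and then applied. A minimal sketch, assuming nothing about the talk's actual scripts; `ConsoleScript` and its options are hypothetical.

```ruby
# Hypothetical sketch: a reviewed, ordered list of steps that defaults to a
# dry run. `out` is anything that responds to <<, such as $stdout or a String.
class ConsoleScript
  Step = Struct.new(:description, :action)

  def initialize(out: $stdout, dry_run: true)
    @out = out
    @dry_run = dry_run
    @steps = []
  end

  # Registers a step; the block is the change to apply.
  def step(description, &action)
    @steps << Step.new(description, action)
    self
  end

  # Logs every step in order; only calls a step's block when dry_run is false.
  # Returns the number of steps.
  def run
    label = @dry_run ? "DRY RUN" : "APPLY"
    @steps.each_with_index do |s, i|
      @out << "#{label} #{i + 1}/#{@steps.size}: #{s.description}\n"
      s.action.call unless @dry_run
    end
    @steps.size
  end
end
```

The printed log doubles as a record of what was done, which helps with the review and visibility gaps of ad-hoc console work.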
  69. 118.
  73. 125.

    Automate what you can, make it visible
    • We started off with some big automated tasks
    • Sometimes, these tasks would not complete or require changes
    • Some tasks are time-sensitive
    • We’ve made some of it visible
  74. 128.

    Notification Benefits
    • Increase visibility of your system
    • Socialize the location and purpose of the notifications
    • Chat notifications (e.g. Slack) are a pretty low barrier
    • Can cover both automated jobs and live events
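
To show how low that barrier can be, here is a sketch of wrapping a job so its outcome is always reported. `JobNotifier` is hypothetical, and `deliver` is any callable that accepts a JSON string; in production it might POST that body to a chat webhook (Slack incoming webhooks accept a JSON body with a `text` field), but the transport is left out here on purpose.

```ruby
require "json"

# Hypothetical sketch: wrap a job so success and failure are both reported.
class JobNotifier
  def initialize(deliver:, clock: -> { Time.now })
    @deliver = deliver
    @clock = clock
  end

  # Runs the block, reports the outcome, and re-raises failures so the job
  # system (e.g. its retry handling) still sees them.
  def watch(job_name)
    started = @clock.call
    result = yield
    notify(job_name, "succeeded", started)
    result
  rescue StandardError => e
    notify(job_name, "failed: #{e.class}: #{e.message}", started)
    raise
  end

  private

  def notify(job_name, outcome, started)
    seconds = (@clock.call - started).round(1)
    @deliver.call(JSON.generate({ text: "[#{job_name}] #{outcome} after #{seconds}s" }))
  end
end
```

Injecting `deliver` keeps the wrapper testable without a network call and makes it easy to swap the destination later.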
  75. 130.
  78. 135.

    Notification Caveats
    • Dumping everything into one or more Slack channels can get very noisy
    • The map is not the territory
    • Do you have the right things notifying?
    • Do you have the right things not notifying?
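
One possible response to the noise problem is to suppress repeats of the same notification within a time window, so a flapping job posts once rather than every minute. This is a sketch of that idea only; `NotificationThrottle` is hypothetical and is not described in the talk.

```ruby
# Hypothetical sketch: let a given notification key through at most once per
# time window. The clock is injectable so the behavior can be tested.
class NotificationThrottle
  def initialize(window_seconds:, clock: -> { Time.now })
    @window = window_seconds
    @clock = clock
    @last_sent = {}
  end

  # Yields and returns true only if `key` has not fired within the window;
  # otherwise returns false without yielding.
  def once_per_window(key)
    now = @clock.call
    last = @last_sent[key]
    return false if last && (now - last) < @window

    @last_sent[key] = now
    yield
    true
  end
end
```

Throttling hides repeats, so it is worth logging suppressed notifications somewhere less noisy rather than dropping them entirely.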
  79. 141.
  80. 144.

    Automation
    • Take typical console tasks and make them rake tasks
    • When?
    • As soon as you detect the pattern
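
A sketch of what promoting a console fix to a rake task might look like: the logic moves into a plain object that can be tested, and the task stays thin. Everything named here (`StaleHoldReset`, the `HOLDS` data source, the task name and its argument) is hypothetical. In a Rakefile the task would normally be declared with `namespace` and `task`; `Rake::Task.define_task` is used so the sketch runs as a standalone script.

```ruby
require "rake"

# Hypothetical console fix promoted to a plain object. In a Rails app the
# records would be models; here they are hashes.
class StaleHoldReset
  def initialize(holds, max_age_seconds:, now: Time.now)
    @holds = holds
    @cutoff = now - max_age_seconds
  end

  # Releases holds last updated before the cutoff; returns how many changed.
  def call
    stale = @holds.select { |h| h[:state] == "held" && h[:updated_at] < @cutoff }
    stale.each { |h| h[:state] = "released" }
    stale.size
  end
end

# Stand-in data source for this sketch.
HOLDS = []

# The task only parses its argument, calls the object, and reports.
Rake::Task.define_task("inventory:reset_stale_holds", [:max_age_seconds]) do |_task, args|
  max_age = Integer(args[:max_age_seconds] || 3600)
  released = StaleHoldReset.new(HOLDS, max_age_seconds: max_age).call
  puts "Released #{released} stale holds"
end
```

From a shell this would be invoked along the lines of `rake "inventory:reset_stale_holds[3600]"`, which is easier to review and repeat than retyping the fix in a console.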
  81. 148.
  82. 149.
  83. 150.
  84. 151.
  85. 152.
  86. 153.
  87. 154.
  88. 155.
  89. 156.
  92. 160.

    Benefits
    • Increase the visibility of your data inside of your app
    • Uncouple answering questions with production data from production systems-level access
    • Solves a good amount with not a lot of work
    • Allows for some customization
  93. 161.
  95. 164.

    Caveats
    • Needs to be worked into your authorization and authentication schemes
    • It’s a sizable addition to your existing app
    • It likely won’t cover everything you need, particularly in specialized cases
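
To make the authorization caveat concrete, here is a sketch of a role-to-action mapping that keeps higher-risk actions with production keyholders while opening read-only views to more of the team. The roles, actions, and `AdminPolicy` class are all hypothetical; a real app would hook a check like this into whatever authorization scheme it already has.

```ruby
# Hypothetical sketch: which internal roles may perform which admin actions.
class AdminPolicy
  class PermissionError < StandardError; end

  PERMISSIONS = {
    support:   %i[view_orders view_inventory],
    developer: %i[view_orders view_inventory retry_job],
    keyholder: %i[view_orders view_inventory retry_job reset_data flip_flag],
  }.freeze

  # Unknown roles get no permissions.
  def initialize(role)
    @allowed = PERMISSIONS.fetch(role) { [] }
  end

  def allow?(action)
    @allowed.include?(action)
  end

  # Raises PermissionError unless the action is permitted; otherwise runs the
  # block, if one is given, and returns its value.
  def authorize!(action)
    raise PermissionError, "#{action} is not permitted" unless allow?(action)

    yield if block_given?
  end
end
```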
  96. 166.

    Why build your own?
    • Not limited by the structure of an admin framework
    • A better approach for building more complex workflows
  97. 167.
  98. 170.
  99. 171.
  100. 172.
  101. 173.
  102. 174.
  105. 178.

    Central Questions, revisited
    • When is a good moment for a team to start making their internal life better?
    • What might that look like?
    • Who should advocate for it and who should do it?
  106. 182.

    Additional questions
    • What is the time commitment?
    • How often should we review?
    • Who should be involved?
  107. 183.
  108. 184.
  111. 188.

    Resources
    • The Phoenix Project, Gene Kim, Kevin Behr, and George Spafford
    • Work Clean, Dan Charnas
    • The Nature Fix, Florence Williams
    • Greater Than Code Podcast