Growing internal tooling from the console up

Your site was built for your external customers first. Data or workflow problems are solved on the Rails console.

But, two years in, your app has grown. Identifying, researching, and fixing those data and workflow problems takes more of your time and attention. It frustrates your business stakeholders, your customers and, of course, you.

This talk will look at a Rails-based web store (including inventory, payment processing, fraud mitigation and customer notifications) and explore how we can build tools into our apps to discover when things go sideways and then help get things back on track.

Nathan L. Walls

April 30, 2019

Transcript

  1. Growing internal tooling from the console up
     Nathan L. Walls // RailsConf 2019
     https://wallscorp.us/ • https://twitter.com/base10
  7. Q: How old is your codebase?
     • rails init
     • Less than six months
     • Six months to 18 months
     • 18 months to 3 years
     • More than 3 years
  11. Central Questions
     • When is a good moment for a team to start making their internal life better?
     • What might that look like?
     • Who should advocate for it and who should do it?
  16. About me
     • Senior developer/team lead-ish for an education-focused ebook store
     • My team builds the store on a Rails codebase
     • One of the three production keyholders
     • I review a lot of code
  18. Internally, we call it Stargate

  28. The Stargate application
     • Rails 5.1/Ruby 2.5
     • React
     • MySQL + Redis
     • Sidekiq
     • Kubernetes on Google Cloud
     • Right where Rails’ sweet spot is
  32. About me
     • I spend a lot of time helping my team*
     • Alternately stated: I spend a lot of time finding the answers to questions they can’t find on their own
     • I’m a bottleneck of knowledge and access
  33. I’m Brent
     More accurately, I’m at risk of becoming Brent

  37. About Brent from The Phoenix Project
     • Brent is a focal point of multi-team dependencies
     • Brent can’t focus, because Brent either seeks out or is pulled into emergent situations
     • Brent is a near full-time firefighter vs. being a mentor
  38. I’d like to make this dynamic better

  40. What we are going to cover
     • Some overarching goals
     • World-building
     • Identifying pain points
     • Approaching problem solving
     • The command-line and the Rails console
  45. What we are going to cover
     • Initial automation / notifications
     • Reevaluating pain points
     • Administrative frameworks
     • Building your own tooling
  50. Goals
     • Make problems easier to see, evaluate and act upon for the entire team
     • Limit “keyholder”-specific tools to rarely needed and/or higher-risk solutions
     • Develop, observe, evaluate, iterate
  54. Goals
     • Less need for my involvement in emergent situations
     • Facilitate having fewer emergent situations overall
     • Redirect “helping my team” into mentorship
  55. Goals
     • Don’t change Brent’s name
     • No one else on my team should fill the role of Brent
  56. Caveats
     • This is all “Work in Progress”
     • I’m OK with exploring ideas that make our life and experiences better
     • Everything about this is and will be iterative
  57. World-building

  58. About the problem space
     • The codebase is about four years old
     • Started with a consulting team of implementing engineers
     • Team has substantially cycled over and grown
     • Development efforts focused on implementing sales-focused features and solving external problems
  59. About the problem space
     • Production access is limited
     • We have notifications for some automated jobs, but not all of them
     • Production access is required to determine the state of:
       • Automated jobs
       • Generated artifacts
  60. About the problem space
     • Problem-solving involves a lot of ad-hoc Rails console or database digging
     • There are only three production keyholders
  61. About the problem space
     • The majority of operational questions require specialized access
  62. Identifying pain points

  63. Operational Needs
     • Resetting stale data
     • Investigating error states of transactions
     • Troubleshooting and restarting failed jobs
     • Finding and verifying artifacts
  64. Context-switching is painful
     • Disruption from emergent situations and ad hoc requests is a productivity pit
  65. How this manifests
     • Because of the knowledge bottlenecks, I spend time context switching or identifying answers to questions instead of:
       • Feature work
       • Proactive technical debt pay-down
  71. Sample issues
     • Inventory ingestion failing
     • Inventory notification to third-party services failing
     • Figuring out why particular items aren’t in store inventory when the business thinks they ought to be
     • Determining how pervasive a possible payment transaction problem is
     • Email notification troubleshooting
  77. Sample issues
     • DDoS
     • Overly aggressive site crawling from search indexes
     • Fraudulent purchases
     • General issues with an online store
     • What’s one-off vs. ongoing and cyclical?
  78. Working towards better

  79. Approach the issues iteratively
     • What defines the issue?
     • Who has to address the issue now?
     • Who could potentially address the issue instead?
  80. Approach the issues iteratively
     • How might solving this issue be easier?
     • How might this issue be easier to spot?
  81. Improving issue visibility
     • Chat alerts/notifications
     • Monitoring with New Relic, Skylight, etc.
     • Operational dashboards
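
An operational dashboard can start very small. A minimal Ruby sketch, assuming database-backed jobs that record a status and a start time (the `JobRecord` struct, the status strings, and the one-hour staleness threshold are all hypothetical), of what such a page might compute: counts per status, plus any job that has been "running" suspiciously long, which otherwise takes production console access to find.

```ruby
require "time"

# Hypothetical shape of a database-backed job record; a real app would
# load these from its own jobs table rather than build them by hand.
JobRecord = Struct.new(:name, :status, :started_at, keyword_init: true)

# Counts jobs per status and lists "running" jobs older than stale_after
# seconds, the two things a first operational dashboard might show.
def job_summary(jobs, now: Time.now, stale_after: 60 * 60)
  counts = jobs.group_by(&:status).transform_values(&:size)
  stale = jobs.select do |job|
    job.status == "running" && (now - job.started_at) > stale_after
  end
  { counts: counts, stale: stale.map(&:name) }
end

now = Time.parse("2019-04-30T12:00:00Z")
jobs = [
  JobRecord.new(name: "inventory_ingest", status: "running", started_at: now - 3 * 3600),
  JobRecord.new(name: "price_sync", status: "running", started_at: now - 10 * 60),
  JobRecord.new(name: "feed_export", status: "failed", started_at: now - 2 * 3600),
]
summary = job_summary(jobs, now: now)
puts summary[:counts].inspect
puts "Possibly stuck: #{summary[:stale].join(', ')}"
```

Rendering this behind the app's normal login is what lets someone without production keys answer "is the ingest stuck?"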
  82. Improving resilience
     • Make expensive things more fault tolerant
     • Make it as easy as practical to recover from failed jobs
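
One way to make an expensive job more fault tolerant and easier to recover is to isolate per-item failures and checkpoint progress, so a rerun picks up only what didn't finish. A minimal plain-Ruby sketch, assuming items carry an `:id` and using an in-memory `Set` as a stand-in for a persistent checkpoint (a real job would keep this in the database or Redis); `process_batch` is a hypothetical helper, not part of Sidekiq or Rails.

```ruby
require "set"

# Processes each item with the given block. One bad item doesn't abort
# the batch: its error is collected and the loop continues. Finished ids
# go into `completed`, so a rerun skips them and retries only the rest.
def process_batch(items, completed:)
  failures = {}
  items.each do |item|
    next if completed.include?(item[:id]) # done on an earlier run
    begin
      yield item
      completed << item[:id]
    rescue StandardError => e
      failures[item[:id]] = e.message # report at the end instead of stopping
    end
  end
  failures
end

items = [{ id: 1, sku: "A" }, { id: 2, sku: nil }, { id: 3, sku: "C" }]
completed = Set.new
processed = []

first = process_batch(items, completed: completed) do |item|
  raise ArgumentError, "missing SKU" if item[:sku].nil?
  processed << item[:id]
end
# first maps id 2 to "missing SKU"; ids 1 and 3 are checkpointed.

items[1][:sku] = "B" # fix the bad data, then simply rerun
second = process_batch(items, completed: completed) { |item| processed << item[:id] }
# second is empty; only item 2 was processed on the rerun.
```

The design choice is that "recover from a failed job" becomes "fix the data and run it again," which is something more of the team can safely do.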
  86. Involve your team
     • Socialize the issues
     • Review and iterate solutions
     • Don’t be the only person
  87. Example of team involvement
     • Point Developer
       • Technical triage as needed for bugs
       • General question answerer
       • Focused point for interruption
       • Can iteratively improve internal tooling
  90. Solution Spaces

  91. The command-line and Rails Console

  92. What’s the console good for?
     • Investigating data and state changes
     • Trialing and applying one-off fixes
     • Calling up data and running new or existing methods
     • Running SQL queries (no, seriously)
  93. Why SQL from the Rails console?
     • You may not have access to a database console
     • You can intermix Rails objects and SQL queries
  98. How we use the console
     • Looking at error states on pending transactions or failed purchases
     • Examining the state of database-backed inventory processing jobs
     • Verifying inventory availability per database scopes
     • Flipping feature flags
  99. Console use case

  104. Rails console benefits
     • It’s ad-hoc!
     • Access to your scopes, prod data
     • Reopen classes, write new methods on a trial basis
     • You probably already have access to it on a prod server (k8s, Heroku, a VM, etc.)
  109. Rails console caveats
     • Limited to production keyholders
     • You’re live in production
     • Ad-hoc solutions aren’t saved
     • Not great for review, visibility, auditability
  114. Safe use of the production console
     • Plan your actions in a non-prod environment
     • Review your action plan with a peer
     • Inform your team of your intentions
     • What? How? Why? When?
  117. Safe use of the production console
     • Work from a script
     • Log out of production when your task is complete
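
Working from a script can be as simple as turning the planned console commands into a small object that can be reviewed, rehearsed in a non-prod environment, and defaults to a dry run. A hypothetical sketch for the "resetting stale data" case, assuming reservations are plain hashes with `:state` and `:reserved_at` (in a Rails app these would be ActiveRecord rows and the write would be an `update!`); the class name and one-hour cutoff are illustrative.

```ruby
# Hypothetical one-off fix, written as a script that can be reviewed and
# rehearsed outside production. Defaults to a dry run; logs every change.
class ResetStaleReservations
  def initialize(reservations, now:, max_age:, dry_run: true, log: $stdout)
    @reservations = reservations
    @now = now
    @max_age = max_age
    @dry_run = dry_run
    @log = log
  end

  # Returns how many reservations were (or would be) reset.
  def call
    stale = @reservations.select do |r|
      r[:state] == "reserved" && @now - r[:reserved_at] > @max_age
    end
    stale.each do |r|
      prefix = @dry_run ? "[dry run] would reset" : "resetting"
      @log.puts("#{prefix} reservation #{r[:id]}")
      r[:state] = "available" unless @dry_run
    end
    stale.size
  end
end

now = Time.at(1_556_625_600)
rows = [
  { id: 1, state: "reserved", reserved_at: now - 2 * 3600 }, # two hours old
  { id: 2, state: "reserved", reserved_at: now - 60 },       # one minute old
]
# Rehearse: prints what would change, touches nothing.
ResetStaleReservations.new(rows, now: now, max_age: 3600).call
# Apply, once the plan has been reviewed and the team informed.
ResetStaleReservations.new(rows, now: now, max_age: 3600, dry_run: false).call
```

Because the script lives in the repository, it also covers the "ad-hoc solutions aren’t saved" and auditability caveats of raw console work.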
  119. Initial automation and notification

  120. Cron jobs, amirite?

  125. Automate what you can, make it visible
     • We started off with some big automated tasks
     • Sometimes, these tasks would fail to complete or would require changes
     • Some tasks are time-sensitive
     • We’ve made some of it visible
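
Making an automated task visible mostly means every run reports its outcome, not just the runs someone remembers to check. A minimal sketch of a wrapper for a scheduled job; `with_job_reporting` is a hypothetical helper, and `notify` can be any object that responds to `call` with a message string (here an array collector, in practice a chat poster).

```ruby
# Hypothetical wrapper for a scheduled job (cron, a recurring Sidekiq job,
# etc.) so that every run reports its outcome, success or failure.
def with_job_reporting(job_name, notify:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    result = yield
  rescue StandardError => e
    notify.call("#{job_name} FAILED: #{e.class}: #{e.message}")
    raise # re-raise so the scheduler and error tracker still see it
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  notify.call(format("%s finished in %.1fs", job_name, elapsed))
  result
end

# Usage, collecting messages in an array instead of posting them anywhere.
messages = []
notify = ->(msg) { messages << msg }

count = with_job_reporting("inventory_ingest", notify: notify) { 42 }

begin
  with_job_reporting("feed_export", notify: notify) { raise IOError, "upstream timeout" }
rescue IOError
  # In a real scheduler this failure would also surface in logs/alerts.
end

puts messages
```

Reporting success as well as failure matters for time-sensitive tasks: a missing "finished" message is itself a signal.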
  127. So many notifications

  128. Notification Benefits
     • Increase visibility of your system
     • Socialize the location and purpose of the notifications
     • Chat notifications (e.g. Slack) are a pretty low barrier
     • Can cover both automated jobs and live events
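
A Slack incoming webhook is one low-barrier way to get notifications into chat: the app POSTs a JSON body with a `text` field to the webhook URL Slack generates. A minimal sketch using only Ruby's standard library; the `SlackNotifier` class and the `SLACK_WEBHOOK_URL` environment variable name are assumptions for illustration.

```ruby
require "json"
require "net/http"
require "uri"

# Minimal Slack incoming-webhook notifier. Responds to #call(text), so it
# can be passed anywhere a "notify" callable is expected.
class SlackNotifier
  def initialize(webhook_url)
    @uri = URI(webhook_url)
  end

  # Slack incoming webhooks accept a JSON object with a "text" field.
  def payload(text)
    JSON.generate("text" => text)
  end

  def call(text)
    response = Net::HTTP.post(@uri, payload(text), "Content-Type" => "application/json")
    warn("Slack notification failed: HTTP #{response.code}") unless response.is_a?(Net::HTTPSuccess)
    response
  rescue StandardError => e
    # A chat outage shouldn't break the job being reported on; just warn.
    warn("Slack notification failed: #{e.class}: #{e.message}")
    nil
  end
end

# Assumed configuration: the webhook URL is kept out of source control.
if (url = ENV["SLACK_WEBHOOK_URL"])
  SlackNotifier.new(url).call(":rotating_light: Inventory ingest failed for feed 1234")
end
```

Posting to a dedicated, well-named channel is part of socializing where notifications live and what they mean.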
  129. Example: Fraud notifications

  135. Notification Caveats
     • Dumping everything into one or more Slack channels can get very noisy
     • The map is not the territory
     • Do you have the right things notifying?
     • Do you have the right things not notifying?
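
One common way to keep channels from getting noisy is to suppress repeats of the same alert within a time window. A minimal in-process sketch; `AlertThrottle` and the alert key format are hypothetical, and with several app processes or Sidekiq workers the "last sent" times would need to live in a shared store such as Redis.

```ruby
# Suppresses repeats of the same alert key within a window so a flapping
# check posts once instead of flooding the channel. In-memory only.
class AlertThrottle
  def initialize(window_seconds:, clock: -> { Time.now })
    @window = window_seconds
    @clock = clock
    @last_sent = {}
  end

  # Returns true (and records the time) if this alert should go out now.
  def allow?(key)
    now = @clock.call
    last = @last_sent[key]
    return false if last && now - last < @window
    @last_sent[key] = now
    true
  end
end

throttle = AlertThrottle.new(window_seconds: 15 * 60)
3.times do
  puts "posting inventory_ingest:failed alert" if throttle.allow?("inventory_ingest:failed")
end
```

Keying on "the same problem" rather than on the exact message text is the judgment call; different keys (say, per feed) still alert independently.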
  140. Notification Caveats
     • Remediation requires a more complex integration
       • Chat bots
       • Webhooks
       • ???
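
If a chat bot or webhook is going to trigger remediation, the endpoint should first confirm the request actually came from the chat service. For Slack, that means checking the `X-Slack-Signature` header, an HMAC-SHA256 over `v0:<timestamp>:<raw request body>` keyed with the app's signing secret, and rejecting stale `X-Slack-Request-Timestamp` values. A standard-library sketch; the method name, the five-minute skew, and the `SLACK_SIGNING_SECRET` variable are assumptions, and in a Rails app `ActiveSupport::SecurityUtils.secure_compare` could replace the hand-rolled comparison.

```ruby
require "openssl"

# Verifies a Slack request signature before acting on it. Slack computes
# HMAC-SHA256 over "v0:<timestamp>:<raw body>" with the app's signing
# secret and sends "v0=<hex digest>" in the X-Slack-Signature header;
# the timestamp comes from X-Slack-Request-Timestamp.
def valid_slack_request?(signing_secret:, timestamp:, body:, signature:,
                         now: Time.now.to_i, max_skew: 5 * 60)
  # Reject old (or far-future) timestamps to limit replayed requests.
  return false if (now - timestamp.to_i).abs > max_skew

  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), signing_secret, "v0:#{timestamp}:#{body}")
  expected = "v0=#{digest}"
  return false unless expected.bytesize == signature.bytesize

  # Constant-time comparison so timing doesn't leak how much matched.
  expected.bytes.zip(signature.bytes).reduce(0) { |acc, (a, b)| acc | (a ^ b) }.zero?
end

# In a Rails controller this might be called as:
#   valid_slack_request?(
#     signing_secret: ENV.fetch("SLACK_SIGNING_SECRET"),
#     timestamp: request.headers["X-Slack-Request-Timestamp"].to_s,
#     body: request.raw_post,
#     signature: request.headers["X-Slack-Signature"].to_s
#   )
```

Only after this check passes should a button click like "retry this job" reach the code that actually does it.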
  144. Automation
     • Take typical console tasks and make them rake tasks
     • When?
       • As soon as you detect the pattern
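
A minimal sketch of promoting a console snippet to a rake task, assuming a hypothetical "re-queue a failed inventory feed" fix. The task name, `feed_id` argument, `DRY_RUN` variable, and `REQUEUED` array are illustrative stand-ins; in a Rails app the file would live under `lib/tasks/` and the task would depend on `:environment` so models are loaded.

```ruby
require "rake"

# Stand-in for the side effect; a real task might instead call something
# like InventoryIngestJob.perform_later(feed_id) (hypothetical job name).
REQUEUED = []

namespace :inventory do
  desc "Re-queue a failed inventory feed by id (DRY_RUN=1 to preview)"
  # In Rails: task :requeue, [:feed_id] => :environment do |_task, args|
  task :requeue, [:feed_id] do |_task, args|
    feed_id = args[:feed_id] || abort("usage: rake inventory:requeue[FEED_ID]")
    if ENV["DRY_RUN"] == "1"
      puts "[dry run] would re-queue feed #{feed_id}"
    else
      REQUEUED << feed_id
      puts "re-queued feed #{feed_id}"
    end
  end
end

# From a shell:  DRY_RUN=1 bin/rails "inventory:requeue[1234]"
# From Ruby (e.g. in a test):
Rake::Task["inventory:requeue"].invoke("1234")
```

Once the fix is a named task, it can be code-reviewed, tested, documented with `desc`, and run by whoever is on point rather than only by a keyholder typing it from memory.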
  145. Example inventory rake task invocation

  146. Reevaluating pain points

  147. Administrative frameworks

  160. Benefits
     • Increase the visibility of your data inside of your app
     • Uncouple answering questions with production data from production systems-level access
     • Solves a good amount with not a lot of work
     • Allows for some customization
  164. Caveats
     • Needs to be worked into your authorization and authentication schemes
     • It’s a sizable addition to your existing app
     • It likely won’t cover everything you need, particularly in specialized cases
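
Working an admin area into your authorization scheme is also where the goal of limiting keyholder-specific tools to rarely needed or higher-risk actions gets enforced. A hypothetical role-to-action policy as a plain Ruby sketch; the roles and action names are made up, and in a Rails app this logic usually sits behind a gem such as Pundit or CanCanCan, or the admin framework's own authorization hook.

```ruby
# Hypothetical policy: most of the team can see data and perform routine
# recoveries; higher-risk actions stay with production keyholders.
class AdminPolicy
  PERMISSIONS = {
    viewer:    %i[view_orders view_jobs],
    developer: %i[view_orders view_jobs retry_job resend_receipt],
    keyholder: %i[view_orders view_jobs retry_job resend_receipt refund_order toggle_feature_flag],
  }.freeze

  def initialize(role)
    @role = role.to_sym
  end

  # Unknown roles get no permissions (deny by default).
  def allowed?(action)
    PERMISSIONS.fetch(@role, []).include?(action.to_sym)
  end
end

policy = AdminPolicy.new(:developer)
puts policy.allowed?(:retry_job)    # => true
puts policy.allowed?(:refund_order) # => false
```

Keeping the permission table in one reviewed place makes it easy to move an action down from "keyholder" once the team trusts the tooling around it.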
  165. Building your own admin interfaces

  166. Why build your own?
     • Not limited by the structure of an admin framework
     • A better approach for building more complex workflows
  167. Caveats
     • You’re building something on the order of a full externally-facing feature
  168. Custom ideas

  169. Currency conversion

  174. That’s a lot of tools
     • Tools aren’t the be-all, end-all
     • Work with your team
  178. Central Questions, revisited
     • When is a good moment for a team to start making their internal life better?
     • What might that look like?
     • Who should advocate for it and who should do it?
  182. Additional questions
     • What is the time commitment?
     • How often should we review?
     • Who should be involved?
  188. Resources
     • The Phoenix Project, Gene Kim, Kevin Behr, and George Spafford
     • Work Clean, Dan Charnas
     • The Nature Fix, Florence Williams
     • Greater Than Code (podcast)
  189. Watkins graphics by Stephanie Schafer, VitalSource Technologies

  190. Me
     • Website: https://wallscorp.us/
     • Twitter: @base10
     • Slides: https://wallscorp.us/presentations/ (soon)
  191. https://www.vitalsource.com THANK YOU!