Intercom's Majestic Monolith

Presented at Ruby Ireland - https://www.meetup.com/rubyireland/events/248735127/

Intercom is still powered by the same Rails application that the founding team started to build seven years ago. In that time it’s grown from a tiny prototype to half a million lines of code, maintained by 80 engineers and deployed 100 times a day. In this talk, we’ll present some of the technologies and practices we use to deal with the ever-increasing scale and complexity at which we operate.

Eugene Kenny

March 20, 2018

Transcript

  1. Intercom’s Majestic Monolith

  2. Eugene Kenny Senior Engineer @eugeneius

  3. What is Intercom?
    Intercom is one place for every team in an internet business to communicate with customers, personally, at scale: on your website, inside web and mobile apps, and by email.
  4. None
  5. Monolith vs. Microservices

  6. “If you’re Amazon or Google or any other software organization with thousands of developers, [microservices are] a wonderful way to parallelize opportunities for improvement.”
    DHH, The Majestic Monolith: https://m.signalvnoise.com/the-majestic-monolith-29166d022228
  7. Engineers
    • Basecamp: ~12
    • Intercom: ~160
    • Google: ~20,000
    Sources: https://m.signalvnoise.com/the-majestic-monolith-29166d022228 and https://www.quora.com/How-many-software-engineers-does-Google-have

  8. Lines of code
    • Basecamp: 70 thousand
    • Intercom: 517 thousand
    • Google: two billion
    Sources: https://twitter.com/dhh/status/962111734361178112 and https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code/fulltext
  9. Where’s the limit?
    Intercom is closer to Basecamp in size than Google, but growing fast. I’ll talk about four changes we’ve made to deal with increases in size and complexity.
  10. Scaling database reads

  11. The ideal solution
    • Buy a bigger database and forget about it
    • Biggest Amazon RDS database available
    • New instance type announced? Upgrade
  12. The problem
    • Eventually even the biggest database available isn’t powerful enough
    • Vertical partitioning is effective, but difficult
  13. Solution: IdentityCache
    • Opt-in read-through cache for Active Record
    • Cache entries are invalidated automatically
    • Use Model.fetch instead of Model.find
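To make the read-through idea on this slide concrete, here is a minimal sketch in plain Ruby. It is not IdentityCache's implementation: `WidgetStore`, its in-memory hashes and the `db_reads` counter are hypothetical stand-ins for a model, a database table and a shared cache. It only illustrates the contrast between a `find`-style lookup that always hits the database and a `fetch`-style lookup that is served from the cache and invalidated on write.

```ruby
# Hypothetical stand-in for a model backed by a database and a shared cache.
class WidgetStore
  Row = Struct.new(:id, :name)

  attr_reader :db_reads

  def initialize
    @rows = {}     # plays the role of the database table
    @cache = {}    # plays the role of the shared cache
    @db_reads = 0  # counts how often the "database" is read
  end

  # Plain lookup: always reads from the "database".
  def find(id)
    @db_reads += 1
    @rows.fetch(id).dup
  end

  # Read-through lookup: serve from the cache, fall back to find on a miss.
  def fetch(id)
    (@cache[id] ||= find(id)).dup
  end

  # Write path: persist the row, then invalidate its cache entry.
  def save(row)
    @rows[row.id] = row.dup
    @cache.delete(row.id)
    row
  end
end

store = WidgetStore.new
store.save(WidgetStore::Row.new(1, "sprocket"))
store.fetch(1)  # cache miss, reads the "database"
store.fetch(1)  # cache hit, no database read
store.db_reads  # => 1
```

The cache is only filled on reads and only cleared on writes, which is why every write path has to go through the invalidation step.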
  14. Issues with IdentityCache
    • Methods that skip callbacks (update_column, update_all, …) don’t invalidate the cache
    • Updating a shared cache from the client will never be 100% accurate (machine dies at wrong time)
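The first issue above can be reproduced with a small sketch. Everything here is hypothetical: `WidgetNames`, its `DATABASE` and `CACHE` hashes, and `update_without_callbacks` merely imitate a write that bypasses the invalidation hook, in the way the slide describes for `update_column` and `update_all`.

```ruby
module WidgetNames
  DATABASE = { 1 => "sprocket" } # hypothetical table: id => name
  CACHE = {}                     # hypothetical shared cache

  # Read-through lookup: fill the cache on a miss.
  def self.fetch(id)
    CACHE.fetch(id) { CACHE[id] = DATABASE.fetch(id) }
  end

  # Normal write: persists, then runs the invalidation "callback".
  def self.update(id, name)
    DATABASE[id] = name
    CACHE.delete(id)
  end

  # Write that skips callbacks: the cache entry is left untouched.
  def self.update_without_callbacks(id, name)
    DATABASE[id] = name
  end
end

WidgetNames.fetch(1)                             # caches "sprocket"
WidgetNames.update_without_callbacks(1, "gear")  # database now says "gear"
WidgetNames.fetch(1)                             # => "sprocket" (stale)
WidgetNames.update(1, "cog")                     # invalidates the entry
WidgetNames.fetch(1)                             # => "cog"
```

The stale value survives until something else invalidates the entry, which is what makes this class of bug hard to spot.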
  15. Solution: Makara
    • Read-write proxy for database connections
    • Runs queries on replicas when possible
    • Transparent to application code
  16. Issues with Makara
    • Can’t always read from a replica; in particular, if a client has just written
    • Cookies “stick” user to primary database
    • Somewhat difficult to reason about
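The routing and "sticking" behaviour described on the last two slides can be sketched as a small proxy. This is an illustration under stated assumptions, not Makara's code: `FakeConnection`, `ReadWriteProxy`, the `stick_for` window and the injectable `clock` are all hypothetical, and stickiness is tracked per proxy object here rather than per user via a cookie.

```ruby
# Hypothetical connection that records queries and reports which node ran them.
class FakeConnection
  attr_reader :name, :queries

  def initialize(name)
    @name = name
    @queries = []
  end

  def execute(sql)
    @queries << sql
    name
  end
end

# Sends reads to the replica unless a write happened within the last
# stick_for seconds, in which case everything goes to the primary.
class ReadWriteProxy
  def initialize(primary:, replica:, stick_for: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @primary = primary
    @replica = replica
    @stick_for = stick_for
    @clock = clock
    @stuck_until = nil
  end

  def execute(sql)
    now = @clock.call
    if read?(sql)
      stuck?(now) ? @primary.execute(sql) : @replica.execute(sql)
    else
      # A write starts (or extends) the sticky window.
      @stuck_until = now + @stick_for
      @primary.execute(sql)
    end
  end

  private

  # Naive classification: only statements starting with SELECT count as reads.
  def read?(sql)
    sql.lstrip.match?(/\Aselect\b/i)
  end

  def stuck?(now)
    !@stuck_until.nil? && now < @stuck_until
  end
end

proxy = ReadWriteProxy.new(primary: FakeConnection.new(:primary),
                           replica: FakeConnection.new(:replica))
proxy.execute("SELECT * FROM widgets")            # runs on the replica
proxy.execute("UPDATE widgets SET name = 'cog'")  # runs on the primary
proxy.execute("SELECT * FROM widgets")            # primary again, still sticky
```

The sticky window exists because a replica may lag behind the primary, so a read issued right after a write could otherwise miss that write.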
  17. Summary
    • Scale database up until it stops working
    • Vertically partition if possible
    • Prefer read replicas to a shared cache
  18. Managing complexity

  19. Commands
    • Encapsulate actions that need to be performed together
    • Intercom uses the Mutations gem (https://github.com/cypriss/mutations)
  20. Before
    class WidgetsController < ApplicationController
      def create
        @widget = Widget.create!(widget_params)
        WidgetResizeJob.perform_later(@widget)
      end
    end
  21. After
    class WidgetsController < ApplicationController
      def create
        Widgets::Create.run!(widget_params)
      end
    end

    class Widgets::Create < Mutations::Command
      def execute
        widget = Widget.create!(inputs)
        WidgetResizeJob.perform_later(widget)
      end
    end
  22. What does this give us?
    • Commands can be run in multiple contexts (requests/jobs/admin)
    • Input validation (both structure and types)
  23. Input validation
    class Widgets::Create < Mutations::Command
      required do
        string :name
        boolean :tuneable
      end

      def execute
        widget = Widget.create!(inputs)
        WidgetResizeJob.perform_later(widget)
      end
    end
  24. Summary
    • Whether you use a library or plain old Ruby objects, consider modelling actions explicitly
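As a rough illustration of the "plain old Ruby objects" option, the widget command from the earlier slides could be written without a library along these lines. This is a sketch, not Intercom's code: `CreateWidget`, `ValidationError`, and the injected `repository` and `job_queue` collaborators are hypothetical and stand in for `Widget.create!` and `WidgetResizeJob.perform_later` so that the example runs without Rails.

```ruby
# Plain Ruby command: validates its inputs, then performs the steps that
# belong together (persist the widget, enqueue the resize job).
class CreateWidget
  class ValidationError < StandardError; end

  # Collaborators are injected; anything that responds to << will do.
  def initialize(repository:, job_queue:)
    @repository = repository
    @job_queue = job_queue
  end

  def run!(inputs)
    name = inputs[:name]
    tuneable = inputs[:tuneable]

    # Check structure and types before touching any state.
    raise ValidationError, "name must be a string" unless name.is_a?(String)
    unless [true, false].include?(tuneable)
      raise ValidationError, "tuneable must be a boolean"
    end

    widget = { name: name, tuneable: tuneable }
    @repository << widget
    @job_queue << [:resize, widget]
    widget
  end
end

widgets = []
jobs = []
command = CreateWidget.new(repository: widgets, job_queue: jobs)
command.run!(name: "sprocket", tuneable: true)
widgets.size  # => 1
jobs.size     # => 1
```

Because the command does not depend on a request, the same object can be called from a controller, a background job or an admin console.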
  25. Code ownership

  26. Responsible teams
    Models, controllers, jobs and commands are tagged with the team that owns the code, by adding a RESPONSIBLE_TEAM constant
  27. Models
    class Widget < ApplicationRecord
      RESPONSIBLE_TEAM = "team-widgets"
      # …
    end
  28. Controllers
    class WidgetsController < ApplicationController
      RESPONSIBLE_TEAM = "team-widgets"
      # …
    end
  29. Jobs
    class WidgetResizeWorker < ApplicationWorker
      RESPONSIBLE_TEAM = "team-widgets"
      # …
    end
  30. Commands
    class Widgets::Create < Mutations::Command
      RESPONSIBLE_TEAM = "team-widgets"
      # …
    end
  31. Benefits
    • Per-team performance metrics/dashboards
    • Helps on-call triage incidents
    • Can route alerts for new errors to the team
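One way such a constant could drive alert routing is sketched below. The talk does not show how Intercom reads the tag, so `TeamRouting`, `DEFAULT_TEAM`, `alert_for` and the example classes are all hypothetical. The only assumption taken from the slides is that owned classes define a `RESPONSIBLE_TEAM` constant.

```ruby
module TeamRouting
  DEFAULT_TEAM = "team-unowned" # hypothetical fallback for untagged code

  # Look up the owning team of a class, falling back when it is untagged.
  # Subclasses inherit the tag of their parent class.
  def self.team_for(klass)
    if klass.const_defined?(:RESPONSIBLE_TEAM)
      klass.const_get(:RESPONSIBLE_TEAM)
    else
      DEFAULT_TEAM
    end
  end

  # Build an alert payload addressed to the owning team.
  def self.alert_for(klass, message)
    { team: team_for(klass), source: klass.name, message: message }
  end
end

# Hypothetical tagged and untagged classes.
class ExampleWidget
  RESPONSIBLE_TEAM = "team-widgets"
end

class ExampleSpecialWidget < ExampleWidget; end
class ExampleUntagged; end

TeamRouting.team_for(ExampleWidget)         # => "team-widgets"
TeamRouting.team_for(ExampleSpecialWidget)  # => "team-widgets"
TeamRouting.team_for(ExampleUntagged)       # => "team-unowned"
```

The same lookup could label metrics or dashboards per team, since it needs nothing beyond the class that was running when the event occurred.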
  32. None
  33. Service boundaries

  34. Service boundaries
    • Caches and replicas are not enough
    • As I mentioned, vertical partitioning is hard
    • Potentially easier with well-defined APIs
  35. Conclusion

  36. Does Rails scale??
    • Yes, it’s possible to scale a monolithic Rails application to ~10x Basecamp’s size
    • It requires investment in infrastructure, but engineers are still productive
  37. But…
    • We may be reaching a tipping point
    • Service boundary work is potentially a first step towards service-oriented architecture
  38. Thanks Eugene Kenny @eugeneius from @intercom eugene@intercom.io