Intercom's Majestic Monolith

Presented at Ruby Ireland - https://www.meetup.com/rubyireland/events/248735127/

Intercom is still powered by the same Rails application that the founding team started to build seven years ago. In that time it’s grown from a tiny prototype to half a million lines of code, maintained by 80 engineers and deployed 100 times a day. In this talk, we’ll present some of the technologies and practices we use to deal with the ever-increasing scale and complexity at which we operate.

Eugene Kenny

March 20, 2018

Transcript

  1. What is Intercom? Intercom is one place for every team in an internet business to communicate with customers, personally, at scale: on your website, inside web and mobile apps, and by email.
  2. “If you’re Amazon or Google or any other software organization with thousands of developers, [microservices are] a wonderful way to parallelize opportunities for improvement.” (DHH, The Majestic Monolith, https://m.signalvnoise.com/the-majestic-monolith-29166d022228)
  3. Lines of code: 70 thousand, 517 thousand, two billion
     https://twitter.com/dhh/status/962111734361178112
     https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code/fulltext
  4. Where’s the limit? Intercom is closer to Basecamp in size than Google, but growing fast. I’ll talk about four changes we’ve made to deal with increases in size and complexity.
  5. The ideal solution
     • Buy a bigger database and forget about it
     • Biggest Amazon RDS database available
     • New instance type announced? Upgrade
  6. The problem
     • Eventually even the biggest database available isn’t powerful enough
     • Vertical partitioning is effective, but difficult
  7. Solution: IdentityCache
     • Opt-in read-through cache for Active Record
     • Cache entries are invalidated automatically
     • Use Model.fetch instead of Model.find
  8. Issues with IdentityCache
     • Methods that skip callbacks (update_column, update_all, …) don’t invalidate the cache
     • Updating a shared cache from the client will never be 100% accurate (machine dies at wrong time)
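The read-through behaviour and the callback-skipping pitfall described in the two IdentityCache slides can be illustrated with a minimal plain-Ruby sketch. This is not IdentityCache's implementation: `WidgetRecord`, its `STORE` and `CACHE` hashes, and the `update` / `update_without_callbacks` methods are hypothetical stand-ins for a database table, a shared cache, and Active Record writes with and without callbacks.

```ruby
# Hypothetical sketch of a read-through cache with write-time invalidation.
class WidgetRecord
  STORE = {} # stands in for the database table
  CACHE = {} # stands in for the shared cache

  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Always reads from the "database".
  def self.find(id)
    new(id, STORE.fetch(id))
  end

  # Read-through: serve from the cache, fall back to the database on a miss.
  def self.fetch(id)
    CACHE[id] ||= find(id)
  end

  # Normal write path: the "callback" drops the cache entry.
  def self.update(id, name)
    STORE[id] = name
    CACHE.delete(id)
  end

  # Write path that skips the callback, so the cache entry is left in place.
  def self.update_without_callbacks(id, name)
    STORE[id] = name
  end
end
```

In this sketch, a `fetch` after `update` sees the new value, while a `fetch` after `update_without_callbacks` keeps returning the cached value until something else evicts it.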
  9. Solution: Makara
     • Read-write proxy for database connections
     • Runs queries on replicas when possible
     • Transparent to application code
  10. Issues with Makara
     • Can’t always read from a replica; in particular, if a client has just written
     • Cookies “stick” user to primary database
     • Somewhat difficult to reason about
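The idea of routing reads to a replica, except shortly after a write, can be sketched in plain Ruby. This is not how Makara is implemented: `SplittingProxy`, the `stick_for` parameter, and the regular expression used to spot writes are all assumptions made for illustration, and the stickiness here lives in the proxy object rather than in a cookie.

```ruby
# Hypothetical read/write splitting proxy with "stick to primary after write".
class SplittingProxy
  WRITE_PATTERN = /\A\s*(insert|update|delete)\b/i

  # primary and replica are any objects responding to #call(sql).
  def initialize(primary:, replica:, stick_for: 5)
    @primary = primary
    @replica = replica
    @stick_for = stick_for # seconds to keep reading from the primary
    @stuck_until = nil
  end

  def execute(sql, now: Time.now)
    if WRITE_PATTERN.match?(sql)
      # Writes always go to the primary and start the sticky window.
      @stuck_until = now + @stick_for
      @primary.call(sql)
    elsif @stuck_until && now < @stuck_until
      # A recent write may not have replicated yet, so read from the primary.
      @primary.call(sql)
    else
      @replica.call(sql)
    end
  end
end
```

The sticky window is what makes this kind of proxy harder to reason about: the same `SELECT` can hit different servers depending on what the client did a moment earlier.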
  11. Summary
     • Scale database up until it stops working
     • Vertically partition if possible
     • Prefer read replicas to a shared cache
  12. Commands
     • Encapsulate actions that need to be performed together
     • Intercom uses the Mutations gem (https://github.com/cypriss/mutations)
  13. After

      class WidgetsController < ApplicationController
        def create
          Widgets::Create.run!(widget_params)
        end
      end

      class Widgets::Create < Mutations::Command
        def execute
          widget = Widget.create!(inputs)
          WidgetResizeJob.perform_later(widget)
        end
      end
  14. What does this give us?
     • Commands can be run in multiple contexts (requests/jobs/admin)
     • Input validation (both structure and types)
  15. Input validation

      class Widgets::Create < Mutations::Command
        required do
          string :name
          boolean :tuneable
        end

        def execute
          widget = Widget.create!(inputs)
          WidgetResizeJob.perform_later(widget)
        end
      end
  16. Summary
     • Whether you use a library or plain old Ruby objects, consider modelling actions explicitly
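For the "plain old Ruby objects" option, a command with explicit input validation might look roughly like the following sketch. `CreateWidget`, `ValidationError`, and the returned hash are hypothetical; a real command would persist a record and enqueue a job, as in the Mutations example, rather than return a hash.

```ruby
# Hypothetical plain-Ruby command object with explicit input validation.
class CreateWidget
  class ValidationError < StandardError; end

  def self.run!(inputs)
    new(inputs).run!
  end

  def initialize(inputs)
    @inputs = inputs
  end

  # Validate first, then perform the action as a single unit.
  def run!
    validate!
    execute
  end

  private

  def validate!
    unless @inputs[:name].is_a?(String)
      raise ValidationError, "name must be a String"
    end
    unless [true, false].include?(@inputs[:tuneable])
      raise ValidationError, "tuneable must be a boolean"
    end
  end

  # Placeholder for the real work (creating a record, enqueueing a job).
  def execute
    { name: @inputs[:name], tuneable: @inputs[:tuneable] }
  end
end
```

Because the command has no dependency on a controller, the same `CreateWidget.run!(name: "gear", tuneable: true)` call can be made from a request, a background job, or an admin console.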
  17. Responsible teams: models, controllers, jobs and commands are tagged with the team that owns the code, by adding a RESPONSIBLE_TEAM constant
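As a rough illustration of the tagging, a class can carry the constant and tooling can look it up. Only the `RESPONSIBLE_TEAM` constant name comes from the slide; the team name, the `responsible_team_for` helper, and the "unowned" fallback are assumptions, and the talk does not say how Intercom consumes the tag.

```ruby
# Hypothetical job class tagged with its owning team.
class WidgetResizeJob
  RESPONSIBLE_TEAM = "team-widgets" # hypothetical team name
end

# Hypothetical helper: look up the owner of a class, e.g. to route an alert.
def responsible_team_for(klass)
  if klass.const_defined?(:RESPONSIBLE_TEAM)
    klass.const_get(:RESPONSIBLE_TEAM)
  else
    "unowned"
  end
end
```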
  18. Service boundaries
     • Caches and replicas are not enough
     • As I mentioned, vertical partitioning is hard
     • Potentially easier with well-defined APIs
  19. Does Rails scale??
     • Yes, it’s possible to scale a monolithic Rails application to ~10x Basecamp’s size
     • It requires investment in infrastructure, but engineers are still productive
  20. But…
     • We may be reaching a tipping point
     • Service boundary work is potentially a first step towards service-oriented architecture