Intercom's Majestic Monolith

Eugene Kenny, Senior Engineer (@eugeneius)

What is Intercom?
Intercom is one place for every team in an internet business to communicate with customers, personally, at scale: on your website, inside web and mobile apps, and by email.

Monolith vs. Microservices

“If you’re Amazon or Google or any other software organization with thousands of developers, [microservices are] a wonderful way to parallelize opportunities for improvement.”
DHH, The Majestic Monolith: https://m.signalvnoise.com/the-majestic-monolith-29166d022228

Engineers
• Basecamp: ~12 (https://m.signalvnoise.com/the-majestic-monolith-29166d022228)
• Intercom: ~160
• Google: ~20,000 (https://www.quora.com/How-many-software-engineers-does-Google-have)

Lines of code
• Basecamp: 70 thousand (https://twitter.com/dhh/status/962111734361178112)
• Intercom: 517 thousand
• Google: two billion (https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code/fulltext)

Where’s the limit?
Intercom is closer to Basecamp in size than Google, but growing fast. I’ll talk about four changes we’ve made to deal with increases in size and complexity.

Scaling database reads

The ideal solution
• Buy a bigger database and forget about it
• Biggest Amazon RDS database available
• New instance type announced? Upgrade

The problem
• Eventually even the biggest database available isn’t powerful enough
• Vertical partitioning is effective, but difficult

Solution: IdentityCache
• Opt-in read-through cache for Active Record
• Cache entries are invalidated automatically
• Use Model.fetch instead of Model.find

Issues with IdentityCache
• Methods that skip callbacks (update_column, update_all, …) don’t invalidate the cache
• Updating a shared cache from the client will never be 100% accurate (machine dies at wrong time)
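
The read-through behaviour and the callback problem above can be illustrated with a small plain-Ruby sketch. This is not IdentityCache's implementation: FakeDatabase, CachedWidgets and the in-memory hash are hypothetical stand-ins, chosen only to show why a write path that skips the invalidation step leaves a stale entry behind.

```ruby
# Plain-Ruby sketch of a read-through cache. All class names are hypothetical;
# this is not how IdentityCache itself is implemented.
class FakeDatabase
  attr_reader :reads

  def initialize
    @rows = {}
    @reads = 0
  end

  # Count every read so we can see when the cache absorbs one.
  def find(id)
    @reads += 1
    @rows.fetch(id).dup
  end

  def write(id, attrs)
    @rows[id] = (@rows[id] || {}).merge(attrs)
  end
end

class CachedWidgets
  def initialize(db)
    @db = db
    @cache = {}
  end

  # Read-through: serve from the cache, fall back to the database on a miss.
  def fetch(id)
    @cache[id] ||= @db.find(id)
  end

  # Normal write path: stands in for a callback that expires the cache entry.
  def update(id, attrs)
    @db.write(id, attrs)
    @cache.delete(id)
  end

  # Callback-skipping write path (in the spirit of update_column):
  # the database changes but the cache entry is left untouched.
  def update_column(id, column, value)
    @db.write(id, { column => value })
  end
end

db = FakeDatabase.new
db.write(1, { name: "sprocket" })
widgets = CachedWidgets.new(db)

widgets.fetch(1)                          # miss: reads the database
widgets.fetch(1)                          # hit: served from the cache
widgets.update(1, { name: "cog" })
fresh = widgets.fetch(1)[:name]           # entry was expired, so this re-reads
widgets.update_column(1, :name, "gear")
stale = widgets.fetch(1)[:name]           # cache was never invalidated
```

In this sketch `fresh` picks up the new name because the normal write path expired the entry, while `stale` still holds the old value even though the database row has changed.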

Solution: Makara
• Read-write proxy for database connections
• Runs queries on replicas when possible
• Transparent to application code

Issues with Makara
• Can’t always read from a replica; in particular, if a client has just written
• Cookies “stick” user to primary database
• Somewhat difficult to reason about
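
A rough sketch of the routing idea, assuming a single primary, a single replica and a fixed stickiness window. ReadWriteProxy, its parameters and the in-memory "stuck until" table are all hypothetical; Makara keeps this state in a cookie and its real configuration and internals differ.

```ruby
# Plain-Ruby sketch of read/write splitting with a "sticky" primary.
# Hypothetical names throughout; this is not Makara's implementation.
class ReadWriteProxy
  def initialize(primary:, replica:, stick_for: 5)
    @primary = primary
    @replica = replica
    @stick_for = stick_for   # seconds to keep reading from the primary after a write
    @stuck_until = {}        # client id => time at which stickiness expires
  end

  def execute(client_id, sql, now: Time.now)
    if write?(sql)
      # A write always goes to the primary and starts the sticky window.
      @stuck_until[client_id] = now + @stick_for
      @primary.call(sql)
    elsif stuck?(client_id, now)
      # The client wrote recently, so a replica might not have the change yet.
      @primary.call(sql)
    else
      @replica.call(sql)
    end
  end

  private

  # Treat anything that is not a SELECT or SHOW as a write.
  def write?(sql)
    !sql.match?(/\A\s*(select|show)\b/i)
  end

  def stuck?(client_id, now)
    expiry = @stuck_until[client_id]
    !expiry.nil? && now < expiry
  end
end

primary = ->(sql) { [:primary, sql] }
replica = ->(sql) { [:replica, sql] }
proxy = ReadWriteProxy.new(primary: primary, replica: replica, stick_for: 5)
t0 = Time.at(0)

before_write = proxy.execute("alice", "SELECT * FROM widgets", now: t0).first
proxy.execute("alice", "UPDATE widgets SET name = 'cog'", now: t0)
just_after   = proxy.execute("alice", "SELECT * FROM widgets", now: t0 + 1).first
other_client = proxy.execute("bob", "SELECT * FROM widgets", now: t0 + 1).first
much_later   = proxy.execute("alice", "SELECT * FROM widgets", now: t0 + 10).first
```

The same SELECT lands on a different server depending on who issued it and when, which is one way to see why this kind of proxy can be hard to reason about.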

Summary
• Scale database up until it stops working
• Vertically partition if possible
• Prefer read replicas to a shared cache

Managing complexity

Commands
• Encapsulate actions that need to be performed together
• Intercom uses the Mutations gem (https://github.com/cypriss/mutations)

Before

class WidgetsController < ApplicationController
  def create
    @widget = Widget.create!(widget_params)
    WidgetResizeJob.perform_later(@widget)
  end
end

After

class WidgetsController < ApplicationController
  def create
    Widgets::Create.run!(widget_params)
  end
end

class Widgets::Create < Mutations::Command
  def execute
    widget = Widget.create!(inputs)
    WidgetResizeJob.perform_later(widget)
  end
end

What does this give us?
• Commands can be run in multiple contexts (requests/jobs/admin)
• Input validation (both structure and types)

Input validation

class Widgets::Create < Mutations::Command
  required do
    string :name
    boolean :tuneable
  end

  def execute
    widget = Widget.create!(inputs)
    WidgetResizeJob.perform_later(widget)
  end
end

Summary
• Whether you use a library or plain old Ruby objects, consider modelling actions explicitly
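
For the "plain old Ruby objects" option, a minimal sketch might look like the following. It does not use the Mutations API: CreateWidget, its REQUIRED table, and the array-backed store and job queue are hypothetical stand-ins for the model and the background job from the earlier slides.

```ruby
# Plain-Ruby command object. All names are illustrative; this is not the
# Mutations gem's API.
class CreateWidget
  class ValidationError < StandardError; end

  # Required inputs and the types each one may have.
  REQUIRED = { name: [String], tuneable: [TrueClass, FalseClass] }.freeze

  def self.run!(inputs, store:, jobs:)
    new(inputs, store: store, jobs: jobs).run!
  end

  def initialize(inputs, store:, jobs:)
    @inputs = inputs
    @store = store   # stands in for the database
    @jobs = jobs     # stands in for the background job queue
  end

  # Validate first, then perform both steps of the action together.
  def run!
    validate!
    widget = @inputs.slice(*REQUIRED.keys)   # drop anything not declared
    @store << widget
    @jobs << [:resize, widget]
    widget
  end

  private

  def validate!
    REQUIRED.each do |key, types|
      value = @inputs[key]
      next if types.any? { |type| value.is_a?(type) }

      raise ValidationError, "#{key} is missing or has the wrong type"
    end
  end
end

store = []
jobs = []
widget = CreateWidget.run!({ name: "sprocket", tuneable: true, extra: 1 },
                           store: store, jobs: jobs)

rejected = begin
  CreateWidget.run!({ name: "sprocket" }, store: store, jobs: jobs)
  false
rescue CreateWidget::ValidationError
  true
end
```

Because the command takes its collaborators as arguments, the same object can be called from a controller, a job or a console session without changes.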

Code ownership

Responsible teams
Models, controllers, jobs and commands are tagged with the team that owns the code, by adding a RESPONSIBLE_TEAM constant

Models

class Widget < ApplicationRecord
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Controllers

class WidgetsController < ApplicationController
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Jobs

class WidgetResizeWorker < ApplicationWorker
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Commands

class Widgets::Create < Mutations::Command
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Benefits
• Per-team performance metrics/dashboards
• Helps on-call triage incidents
• Can route alerts for new errors to the team
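
One way the constant could drive alert routing is sketched below. How Intercom actually reads the tag is not described in the slides, so responsible_team, route_alert, the "team-unowned" fallback and the in-memory inboxes are all assumptions made for illustration.

```ruby
# Sketch: look up the owning team for a class via RESPONSIBLE_TEAM and use it
# to route an alert. Helper names and the fallback team are hypothetical.
DEFAULT_TEAM = "team-unowned"

class Widget
  RESPONSIBLE_TEAM = "team-widgets"
end

class SpecialWidget < Widget
end

class LegacyReport
end

# Constant lookup follows ancestors, so subclasses inherit their parent's tag.
def responsible_team(klass)
  if klass.const_defined?(:RESPONSIBLE_TEAM)
    klass.const_get(:RESPONSIBLE_TEAM)
  else
    DEFAULT_TEAM
  end
end

# Deliver the message to the owning team's inbox and report which team got it.
def route_alert(message, source_class, inboxes)
  team = responsible_team(source_class)
  inboxes[team] << message
  team
end

inboxes = Hash.new { |hash, team| hash[team] = [] }
route_alert("resize failed", Widget, inboxes)
route_alert("report timed out", LegacyReport, inboxes)
```

Untagged classes fall through to a catch-all inbox here; a real setup might instead fail a lint check when the constant is missing.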

Service boundaries

Service boundaries
• Caches and replicas are not enough
• As I mentioned, vertical partitioning is hard
• Potentially easier with well-defined APIs

Conclusion

Does Rails scale??
• Yes, it’s possible to scale a monolithic Rails application to ~10x Basecamp’s size
• It requires investment in infrastructure, but engineers are still productive

But…
• We may be reaching a tipping point
• Service boundary work is potentially a first step towards service-oriented architecture

Thanks
Eugene Kenny, @eugeneius, from @intercom
