Slide 1

Slide 1 text

Intercom’s Majestic Monolith

Slide 2

Slide 2 text

Eugene Kenny Senior Engineer @eugeneius

Slide 3

Slide 3 text

What is Intercom?
Intercom is one place for every team in an internet business to communicate with customers, personally, at scale: on your website, inside web and mobile apps, and by email.

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Monolith vs. Microservices

Slide 6

Slide 6 text

“If you’re Amazon or Google or any other software organization with thousands of developers, [microservices are] a wonderful way to parallelize opportunities for improvement.” — DHH, The Majestic Monolith https://m.signalvnoise.com/the-majestic-monolith-29166d022228

Slide 7

Slide 7 text

Engineers
• Basecamp: ~12
• Intercom: ~160
• Google: ~20,000
https://m.signalvnoise.com/the-majestic-monolith-29166d022228
https://www.quora.com/How-many-software-engineers-does-Google-have

Slide 8

Slide 8 text

Lines of code
• Basecamp: 70 thousand
• Intercom: 517 thousand
• Google: two billion
https://twitter.com/dhh/status/962111734361178112
https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code/fulltext

Slide 9

Slide 9 text

Where’s the limit?
Intercom is closer to Basecamp in size than Google, but growing fast.
I’ll talk about four changes we’ve made to deal with increases in size and complexity.

Slide 10

Slide 10 text

Scaling database reads

Slide 11

Slide 11 text

The ideal solution
• Buy a bigger database and forget about it
• Biggest Amazon RDS database available
• New instance type announced? Upgrade

Slide 12

Slide 12 text

The problem
• Eventually even the biggest database available isn’t powerful enough
• Vertical partitioning is effective, but difficult

Slide 13

Slide 13 text

Solution: IdentityCache
• Opt-in read-through cache for Active Record
• Cache entries are invalidated automatically
• Use Model.fetch instead of Model.find

Slide 14

Slide 14 text

Issues with IdentityCache
• Methods that skip callbacks (update_column, update_all, …) don’t invalidate the cache
• Updating a shared cache from the client will never be 100% accurate (machine dies at wrong time)
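To make the callback issue concrete, here is a minimal plain-Ruby sketch of a read-through cache. It is not IdentityCache's implementation: `WidgetStore`, its two hashes, and `update_skipping_callbacks` are hypothetical stand-ins for a model, its database table, the shared cache, and a callback-skipping write such as update_column.

```ruby
# Hypothetical in-memory stand-ins; not IdentityCache's implementation.
class WidgetStore
  attr_reader :db, :cache, :db_reads

  def initialize
    @db = {}        # stands in for the database table
    @cache = {}     # stands in for the shared cache
    @db_reads = 0   # counts how often the "database" is queried
  end

  # Like Model.find: always reads from the database.
  def find(id)
    @db_reads += 1
    @db.fetch(id)
  end

  # Like Model.fetch: read-through. Serve from the cache, fill it on a miss.
  def fetch(id)
    @cache[id] ||= find(id)
  end

  # Normal write path: the callback step expires the cache entry.
  def update(id, name)
    @db[id] = name
    @cache.delete(id)
  end

  # Callback-skipping write path: the database changes,
  # but nothing expires the cache entry.
  def update_skipping_callbacks(id, name)
    @db[id] = name
  end
end

store = WidgetStore.new
store.db[1] = "small"
store.fetch(1)                              # miss: reads the database, fills the cache
store.fetch(1)                              # hit: no database read
store.update(1, "medium")
store.fetch(1)                              # => "medium" (entry was expired, then refilled)
store.update_skipping_callbacks(1, "large")
store.fetch(1)                              # => "medium" (stale)
store.find(1)                               # => "large"
```

The last two calls show the failure mode from the slide: once a write bypasses the invalidation step, cached readers keep seeing the old value until something else expires the entry.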

Slide 15

Slide 15 text

Solution: Makara
• Read-write proxy for database connections
• Runs queries on replicas when possible
• Transparent to application code

Slide 16

Slide 16 text

Issues with Makara
• Can’t always read from a replica; in particular, if a client has just written
• Cookies “stick” user to primary database
• Somewhat difficult to reason about
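The routing idea, including the "stick to primary after a write" rule, can be sketched in plain Ruby. This is an illustration under stated assumptions rather than Makara's code: `FakeConnection`, `RoutingProxy`, and the `sticky_for` parameter are hypothetical names, the read detection is a naive check for a leading SELECT, and the sticky state lives in the proxy object instead of a cookie.

```ruby
# Hypothetical sketch; FakeConnection and RoutingProxy are illustrative
# names and are not part of Makara.
class FakeConnection
  attr_reader :name, :queries

  def initialize(name)
    @name = name
    @queries = []
  end

  # Records the query and returns the connection name so callers
  # can see where the query was routed.
  def execute(sql)
    @queries << sql
    name
  end
end

class RoutingProxy
  def initialize(primary:, replica:, sticky_for: 5)
    @primary = primary
    @replica = replica
    @sticky_for = sticky_for   # seconds to stay on the primary after a write
    @stuck_until = nil
  end

  # `now` is injectable so the behaviour can be checked without sleeping.
  def execute(sql, now: Time.now)
    if read_only?(sql) && !stuck?(now)
      @replica.execute(sql)
    else
      @stuck_until = now + @sticky_for unless read_only?(sql)
      @primary.execute(sql)
    end
  end

  private

  # Naive assumption: anything starting with SELECT is a read.
  def read_only?(sql)
    sql.lstrip.match?(/\Aselect\b/i)
  end

  def stuck?(now)
    !@stuck_until.nil? && now < @stuck_until
  end
end

proxy = RoutingProxy.new(
  primary: FakeConnection.new(:primary),
  replica: FakeConnection.new(:replica)
)
t = Time.now
proxy.execute("SELECT * FROM widgets", now: t)           # => :replica
proxy.execute("UPDATE widgets SET name = 'x'", now: t)   # => :primary
proxy.execute("SELECT * FROM widgets", now: t + 1)       # => :primary (still stuck)
proxy.execute("SELECT * FROM widgets", now: t + 10)      # => :replica
```

The third call is the case that makes this hard to reason about: the same SELECT goes to a different server depending on what the client did a moment earlier.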

Slide 17

Slide 17 text

Summary
• Scale database up until it stops working
• Vertically partition if possible
• Prefer read replicas to a shared cache

Slide 18

Slide 18 text

Managing complexity

Slide 19

Slide 19 text

Commands
• Encapsulate actions that need to be performed together
• Intercom uses the Mutations gem (https://github.com/cypriss/mutations)

Slide 20

Slide 20 text

Before

class WidgetsController < ApplicationController
  def create
    @widget = Widget.create!(widget_params)
    WidgetResizeJob.perform_later(@widget)
  end
end

Slide 21

Slide 21 text

After

class WidgetsController < ApplicationController
  def create
    Widgets::Create.run!(widget_params)
  end
end

class Widgets::Create < Mutations::Command
  def execute
    widget = Widget.create!(inputs)
    WidgetResizeJob.perform_later(widget)
  end
end

Slide 22

Slide 22 text

What does this give us?
• Commands can be run in multiple contexts (requests/jobs/admin)
• Input validation (both structure and types)

Slide 23

Slide 23 text

Input validation

class Widgets::Create < Mutations::Command
  required do
    string :name
    boolean :tuneable
  end

  def execute
    widget = Widget.create!(inputs)
    WidgetResizeJob.perform_later(widget)
  end
end

Slide 24

Slide 24 text

Summary
• Whether you use a library or plain old Ruby objects, consider modelling actions explicitly
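For the "plain old Ruby objects" option, a command might look roughly like the sketch below. Everything in it is an assumption made for illustration: `CreateWidget`, `SCHEMA`, `InvalidInput`, and the injected `store` and `enqueue` collaborators (standing in for the database and the job queue) do not come from the talk or from the Mutations gem.

```ruby
# Hypothetical plain-Ruby command; all names are illustrative only.
class CreateWidget
  class InvalidInput < StandardError; end

  # Expected input keys and the classes their values may have.
  SCHEMA = {
    name: [String],
    tuneable: [TrueClass, FalseClass]
  }.freeze

  def self.run!(inputs, **deps)
    new(inputs, **deps).run!
  end

  # store:   anything that responds to << (stands in for the database)
  # enqueue: a callable that schedules follow-up work (stands in for a job)
  def initialize(inputs, store:, enqueue:)
    @inputs = inputs
    @store = store
    @enqueue = enqueue
  end

  # Validates first, then performs the steps that belong together.
  def run!
    validate!
    widget = @inputs.slice(*SCHEMA.keys)   # drop keys outside the schema
    @store << widget
    @enqueue.call(widget)
    widget
  end

  private

  def validate!
    SCHEMA.each do |key, types|
      value = @inputs[key]
      next if types.any? { |type| value.is_a?(type) }

      raise InvalidInput, "#{key} must be one of #{types.join(', ')}"
    end
  end
end

widgets = []
jobs = []
CreateWidget.run!(
  { name: "sprocket", tuneable: true },
  store: widgets,
  enqueue: ->(widget) { jobs << widget }
)
```

Because the command takes plain inputs and injected collaborators, the same call works from a controller, a background job, or an admin console.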

Slide 25

Slide 25 text

Code ownership

Slide 26

Slide 26 text

Responsible teams
Models, controllers, jobs and commands are tagged with the team that owns the code, by adding a RESPONSIBLE_TEAM constant

Slide 27

Slide 27 text

Models

class Widget < ApplicationRecord
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Slide 28

Slide 28 text

Controllers

class WidgetsController < ApplicationController
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Slide 29

Slide 29 text

Jobs

class WidgetResizeWorker < ApplicationWorker
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Slide 30

Slide 30 text

Commands

class Widgets::Create < Mutations::Command
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end

Slide 31

Slide 31 text

Benefits
• Per-team performance metrics/dashboards
• Helps on-call triage incidents
• Can route alerts for new errors to the team
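One way tooling could consume the RESPONSIBLE_TEAM tag is sketched below. The talk does not show how Intercom reads the constant, so the `Ownership` module, the "team-unowned" fallback, the error hash shape, and the bare stand-in classes are all assumptions.

```ruby
# Hypothetical sketch: the Ownership module, the fallback team and the
# error hash shape are illustrative, not taken from the talk.
module Ownership
  DEFAULT_TEAM = "team-unowned"

  # Returns the team tagged on the class itself, or the fallback.
  # Passing false skips ancestors, so only explicit tags count.
  def self.team_for(klass)
    if klass.const_defined?(:RESPONSIBLE_TEAM, false)
      klass.const_get(:RESPONSIBLE_TEAM, false)
    else
      DEFAULT_TEAM
    end
  end

  # Groups error reports by owning team, e.g. to route alerts.
  def self.route(errors)
    errors.group_by { |error| team_for(error.fetch(:source)) }
  end
end

# Bare stand-ins for the tagged classes on the previous slides.
class Widget
  RESPONSIBLE_TEAM = "team-widgets"
end

class WidgetResizeWorker
  RESPONSIBLE_TEAM = "team-widgets"
end

class LegacyReport; end   # untagged on purpose

errors = [
  { source: Widget, message: "validation failed" },
  { source: LegacyReport, message: "timeout" },
  { source: WidgetResizeWorker, message: "job crashed" }
]
Ownership.route(errors).keys   # => ["team-widgets", "team-unowned"]
```

Looking only at the class itself is a deliberate choice in this sketch: an untagged subclass shows up as unowned instead of silently inheriting a parent's team.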

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

Service boundaries

Slide 34

Slide 34 text

Service boundaries
• Caches and replicas are not enough
• As I mentioned, vertical partitioning is hard
• Potentially easier with well-defined APIs

Slide 35

Slide 35 text

Conclusion

Slide 36

Slide 36 text

Does Rails scale??
• Yes, it’s possible to scale a monolithic Rails application to ~10x Basecamp’s size
• It requires investment in infrastructure, but engineers are still productive

Slide 37

Slide 37 text

But…
• We may be reaching a tipping point
• Service boundary work is potentially a first step towards service-oriented architecture

Slide 38

Slide 38 text

Thanks
Eugene Kenny @eugeneius from @intercom