What is Intercom?
Intercom is one place for every team in an internet business
to communicate with customers, personally, at scale—on
your website, inside web and mobile apps, and by email.
Slide 5
Monolith vs. Microservices
Slide 6
“If you’re Amazon or Google or any other
software organization with thousands of developers,
[microservices are] a wonderful way to parallelize
opportunities for improvement.”
— DHH, The Majestic Monolith
https://m.signalvnoise.com/the-majestic-monolith-29166d022228
Lines of code: 70 thousand, 517 thousand, two billion
https://twitter.com/dhh/status/962111734361178112
https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code/fulltext
Slide 9
Where’s the limit?
Intercom is closer to Basecamp in size than
Google—but growing fast.
I’ll talk about four changes we’ve made to
deal with increases in size and complexity.
Slide 10
Scaling database reads
Slide 11
The ideal solution
• Buy a bigger database and forget about it
• Biggest Amazon RDS database available
• New instance type announced? Upgrade
Slide 12
The problem
• Eventually even the biggest database
available isn’t powerful enough
• Vertical partitioning is effective, but difficult
Slide 13
Solution: IdentityCache
• Opt-in read-through cache for Active Record
• Cache entries are invalidated automatically
• Use Model.fetch instead of Model.find
Slide 14
Issues with IdentityCache
• Methods that skip callbacks (update_column,
update_all, …) don’t invalidate the cache
• Updating a shared cache from the client will never
be 100% accurate (for example, a machine can die after
the database write but before the cache is invalidated)
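A minimal sketch of the read-through pattern and the callback problem described above, in plain Ruby. It is not IdentityCache's implementation: CACHE and DB are hypothetical in-memory stand-ins for the shared cache and the database, and the toy Widget class only mimics the fetch/find and update/update_column split.

```ruby
# Hypothetical stand-ins: CACHE plays the shared cache, DB the database.
CACHE = {}
DB = { 1 => { name: "sprocket" } }

class Widget
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Plain lookup: always reads the database.
  def self.find(id)
    new(id, DB.fetch(id)[:name])
  end

  # Read-through lookup: serve from the cache, fall back to the
  # database on a miss and remember the result.
  def self.fetch(id)
    CACHE[id] ||= find(id)
  end

  # Runs the "callback", so the cache entry is expired.
  def update(name:)
    persist(name)
    CACHE.delete(id) # stands in for an after-commit callback
  end

  # Skips callbacks: the database changes but the cache entry survives.
  def update_column(name:)
    persist(name)
  end

  private

  def persist(name)
    DB[id][:name] = name
    @name = name
  end
end

Widget.fetch(1)                            # miss: loads from DB and caches
Widget.find(1).update(name: "gear")        # expires the cache entry
fresh = Widget.fetch(1).name               # re-read from DB
Widget.find(1).update_column(name: "cog")  # cache entry is NOT expired
stale = Widget.fetch(1).name               # served from the old cache entry
```

Here fresh reflects the first update, while stale still holds the value from before update_column, even though the database row has moved on.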
Slide 15
Solution: Makara
• Read-write proxy for database connections
• Runs queries on replicas when possible
• Transparent to application code
Slide 16
Issues with Makara
• Can’t always read from a replica; in
particular, not right after a client has written,
since the replica may not have caught up yet
• Cookies “stick” user to primary database
• Somewhat difficult to reason about
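To make the routing rule concrete, here is a simplified sketch of a read/write proxy that "sticks" to the primary after a write. It is an illustration only, not Makara's API or implementation: ReadWriteProxy, FakeConnection, the stick_for window and the injectable clock are all hypothetical, and the real gem tracks stickiness differently (the slide mentions cookies).

```ruby
# Hypothetical connection; a real one would talk to a database.
class FakeConnection
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Returns the connection name so callers can see where a query ran.
  def execute(_sql)
    name
  end
end

# Routes reads to the replica unless this client wrote recently.
class ReadWriteProxy
  def initialize(primary:, replica:, stick_for: 5.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @primary = primary
    @replica = replica
    @stick_for = stick_for # seconds to stay on the primary after a write
    @clock = clock
    @stuck_until = nil
  end

  def execute(sql)
    if read?(sql) && !stuck?
      @replica.execute(sql)
    else
      # Writes always go to the primary and start (or extend) the window.
      @stuck_until = @clock.call + @stick_for unless read?(sql)
      @primary.execute(sql)
    end
  end

  private

  def read?(sql)
    sql.lstrip.match?(/\Aselect\b/i)
  end

  def stuck?
    !@stuck_until.nil? && @clock.call < @stuck_until
  end
end

now = 0.0 # fake clock so the example is deterministic
proxy = ReadWriteProxy.new(primary: FakeConnection.new(:primary),
                           replica: FakeConnection.new(:replica),
                           clock: -> { now })

before_write = proxy.execute("SELECT * FROM widgets")
proxy.execute("UPDATE widgets SET name = 'gear'")
after_write = proxy.execute("SELECT * FROM widgets")
now += 10
later = proxy.execute("SELECT * FROM widgets")
```

The read issued right after the write runs on the primary; once the window has passed, reads return to the replica. That hidden state is part of what makes this approach harder to reason about.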
Slide 17
Summary
• Scale database up until it stops working
• Vertically partition if possible
• Prefer read replicas to a shared cache
Slide 18
Managing complexity
Slide 19
Commands
• Encapsulate actions that need to be
performed together
• Intercom uses the Mutations gem
(https://github.com/cypriss/mutations)
Slide 20
class WidgetsController < ApplicationController
  def create
    @widget = Widget.create!(widget_params)
    WidgetResizeJob.perform_later(@widget)
  end
end
Before
Slide 21
class WidgetsController < ApplicationController
  def create
    Widgets::Create.run!(widget_params)
  end
end

class Widgets::Create < Mutations::Command
  def execute
    widget = Widget.create!(inputs)
    WidgetResizeJob.perform_later(widget)
  end
end
After
Slide 22
What does this give us?
• Commands can be run in multiple contexts
(requests/jobs/admin)
• Input validation (both structure and types)
Slide 23
class Widgets::Create < Mutations::Command
  required do
    string :name
    boolean :tuneable
  end

  def execute
    widget = Widget.create!(inputs)
    WidgetResizeJob.perform_later(widget)
  end
end
Input validation
Slide 24
Summary
• Whether you use a library or plain old Ruby
objects, consider modelling actions explicitly
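As an illustration of the "plain old Ruby objects" option, the sketch below models the same create-and-resize action without a library. All names are hypothetical: CreateWidget is an invented command, and WIDGETS and RESIZE_QUEUE are in-memory stand-ins for Widget.create! and WidgetResizeJob.perform_later so the example runs on its own.

```ruby
# Hypothetical in-memory stand-ins for the model and the background job.
WIDGETS = []
RESIZE_QUEUE = []

class CreateWidget
  class InvalidInput < StandardError; end

  # Single entry point, callable from a controller, a job or a console.
  def self.run!(inputs)
    new(inputs).run!
  end

  def initialize(inputs)
    @name = inputs[:name]
    @tuneable = inputs[:tuneable]
  end

  def run!
    validate!
    widget = { name: @name, tuneable: @tuneable }
    WIDGETS << widget      # stands in for Widget.create!
    RESIZE_QUEUE << widget # stands in for WidgetResizeJob.perform_later
    widget
  end

  private

  # Checks both structure (keys present) and types, before any side effect.
  def validate!
    raise InvalidInput, "name must be a string" unless @name.is_a?(String)
    unless [true, false].include?(@tuneable)
      raise InvalidInput, "tuneable must be a boolean"
    end
  end
end

created = CreateWidget.run!(name: "sprocket", tuneable: true)

rejected = begin
  CreateWidget.run!(name: 42, tuneable: true)
  false
rescue CreateWidget::InvalidInput
  true
end
```

Because validation runs before anything is written, the rejected call leaves both WIDGETS and RESIZE_QUEUE untouched.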
Slide 25
Code ownership
Slide 26
Responsible teams
Models, controllers, jobs and commands are
tagged with the team that owns the code,
by adding a RESPONSIBLE_TEAM constant
Slide 27
class Widget < ApplicationRecord
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end
Models
Slide 28
class WidgetsController < ApplicationController
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end
Controllers
Slide 29
class WidgetResizeWorker < ApplicationWorker
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end
Jobs
Slide 30
class Widgets::Create < Mutations::Command
  RESPONSIBLE_TEAM = "team-widgets"
  # …
end
Commands
Slide 31
Benefits
• Per-team performance metrics/dashboards
• Helps on-call triage incidents
• Can route alerts for new errors to the team
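One way tooling could consume the constant is sketched below. This is an assumption about how such a lookup might work, not Intercom's code: the Ownership module, its method names and the "team-unowned" fallback are hypothetical, and the toy classes omit their Rails superclasses so the example runs standalone.

```ruby
module Ownership
  DEFAULT_TEAM = "team-unowned" # hypothetical fallback for untagged code

  # Returns the team tagged on the class (or inherited from a parent),
  # or the fallback when no RESPONSIBLE_TEAM constant is defined.
  def self.team_for(klass)
    if klass.const_defined?(:RESPONSIBLE_TEAM)
      klass.const_get(:RESPONSIBLE_TEAM)
    else
      DEFAULT_TEAM
    end
  end

  # Builds tags that a metrics client or error reporter could attach.
  def self.tags_for(klass)
    { "class" => klass.name, "team" => team_for(klass) }
  end
end

class Widget
  RESPONSIBLE_TEAM = "team-widgets"
end

class SpecialWidget < Widget; end # inherits the tag from Widget
class Gadget; end                 # untagged

widget_team = Ownership.team_for(Widget)
inherited_team = Ownership.team_for(SpecialWidget)
untagged_team = Ownership.team_for(Gadget)
widget_tags = Ownership.tags_for(Widget)
```

With a lookup like this, a dashboard can group metrics by the "team" tag, and an alerting rule can route a new error to whichever team owns the class that raised it.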
Slide 33
Service boundaries
Slide 34
Service boundaries
• Caches and replicas are not enough
• As I mentioned, vertical partitioning is hard
• Potentially easier with well-defined APIs
Slide 35
Conclusion
Slide 36
Does Rails scale?
• Yes, it’s possible to scale a monolithic Rails
application to ~10x Basecamp’s size
• It requires investment in infrastructure, but
engineers are still productive
Slide 37
But…
• We may be reaching a tipping point
• Service boundary work is potentially a first
step towards service-oriented architecture