
Scaling The Monolith

Mateus Guimarães

January 26, 2023

Transcript

  1. Code Scalability Come on, Mateus. This term does not exist…

    We usually worry about the infrastructure side of things, but often neglect the quality and our code’s ability to grow, whether in functionality, modifications, or processing power. Well-tested, properly designed code = easier to grow and to modify
  2. No.

  3. Giving some context In 2019, I worked on an SMS

    app. In short: • Clients uploaded huge lists of customers • Campaigns were created targeting segmented people based on those lists • The platform would then automatically send SMS messages, respecting the limits of each number provider, etc.
  4. Application Modules List uploading Campaign Creation Message Sending • Lots

    of data (100M+) • Expensive geographic lookup • Slow • Expensive queries • Fairly database • Complex processes needed to run during the actual creation of each pending message record • Need to be FAST • High number of operations per second • The maximum amount of messages per minute per account needed to be calculated during runtime • We need to support dozens of SMS providers!
  5. Application Modules Response Handling Automatic Replies (Drips) • Simple logic,

    but expensive operations at high scale • “Stop word” verifications against the database • Would possibly trigger another command to block a number from receiving messages • Would happen after receiving a “reply keyword” (e.g. “yes”) • Adds a pending message to the database
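
A minimal sketch of that reply-handling flow as a queued Laravel job. The class, table, and column names (`stop_words`, `drips`, `pending_messages`) are illustrative assumptions, not the production schema:

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

class HandleInboundReply implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(
        public string $from,  // contact's phone number
        public string $body,  // raw reply text
    ) {}

    public function handle(): void
    {
        $keyword = strtolower(trim($this->body));

        // "Stop word" verification: block the number from receiving further messages.
        if (DB::table('stop_words')->where('word', $keyword)->exists()) {
            DB::table('contacts')->where('phone', $this->from)->update(['blocked' => true]);
            return;
        }

        // "Reply keyword" (e.g. "yes"): add the next drip message as a pending record.
        if (DB::table('drips')->where('keyword', $keyword)->exists()) {
            DB::table('pending_messages')->insert([
                'phone'      => $this->from,
                'keyword'    => $keyword,
                'created_at' => now(),
            ]);
        }
    }
}
```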
  6. The Stack • Laravel • Mongo • Vue.js • Go

    Most of the app was written in Laravel. To send messages, there was a simple Go script that fetched pending messages from the database and sent them.
  7. Problem: list uploading • Extremely slow • Queued, but jobs

    would frequently time out several times and then be discarded
  8. Problem: contact creation • Complex and intensive • Expensive geolocation

    data generation through the contact’s phone number • High number of queries to create each contact • High number of contacts being created at any given minute
  9. Problem: message sending • High number of operations per second

    • Needed to verify whether the contact had already received a message from that specific sender over the last 24 hours
  10. Problem: requests We had two types of webhooks: delivery reports

    and text replies. The number of delivery reports was 1:1 with sent messages: for each message we sent, we would receive one delivery report. After campaigns had been sending for a while, we would get flooded with requests and start responding very slowly and, after some time, with 500s.
  11. Problem: Mongo Mongo worked very well… until the collections got

    fairly large and it didn’t. The fact that pausing/resuming campaigns and sending messages involved *moving* data between collections obviously did not help.
  12. Problem: Message processing The Go script that processed pending

    messages did not have a UI, logs, or anything. Monitoring was very complicated. Adding new drivers was very complicated too, since code needed to be added to Laravel *and* to the Go script.
  13. The state of things • ~100M contacts per day •

    1-2M messages sent per day • Servers cost five figures a month • Frequent failures • Daily data cleanup
  14. A New Hope • Laravel • MySQL • Laravel Horizon

    • Redis We decided that I would build a new MVP, alone, to try to correct the problems we had. I decided to use a very simple stack:
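
As a minimal sketch of how Horizon ties the Redis-backed queues together, here is a hypothetical `config/horizon.php` supervisor entry. Queue names and worker counts are assumptions, and exact option names vary by Horizon version:

```php
<?php

// config/horizon.php (excerpt)
return [
    'environments' => [
        'production' => [
            'supervisor-1' => [
                'connection'   => 'redis',
                'queue'        => ['imports', 'messages', 'webhooks', 'default'],
                'balance'      => 'auto', // shift workers toward the busiest queues
                'minProcesses' => 1,
                'maxProcesses' => 20,
                'tries'        => 3,
            ],
        ],
    ],
];
```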
  15. Figure out the basics It’s easy to think about all

    the fancy tooling and hyped tech and forget about the basics, sometimes optimizing things that were never a problem. • Does your database have the necessary indexes? • Does your database have the correct data types? (looking at you, VARCHAR(255); a migration sketch follows below) • Are your server and process manager (NGINX and PHP-FPM, for example) correctly configured? • Are your queries optimized? • Do you have tools to show you where the bottleneck is? (observability)
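
A hypothetical migration illustrating the first two checks: columns sized for the data they hold instead of a blanket VARCHAR(255), plus indexes for the lookups that actually run hot. Table and column names are assumptions, not the real schema:

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('contacts', function (Blueprint $table) {
            $table->id();
            $table->string('phone', 20);            // not the default VARCHAR(255)
            $table->unsignedBigInteger('list_id');
            $table->timestamp('created_at')->nullable();

            $table->index(['list_id', 'phone']);    // supports the per-list lookups
            $table->index('created_at');            // supports the daily cleanup
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('contacts');
    }
};
```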
  16. Fixing the processing problems Since we had lots of intensive

    and/or frequent processes, the solution was, in short, to queue everything we could and read against a cache whenever possible.
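
A minimal sketch of that approach applied to the sending module, assuming a hypothetical `SendSmsMessage` job: the "has this sender already messaged this contact in the last 24 hours?" check from the sending problem above is answered from the cache rather than the database.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Cache;

class SendSmsMessage implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(
        public string $sender,
        public string $phone,
        public string $body,
    ) {}

    public function handle(): void
    {
        // Cache::add() only writes if the key does not exist yet, so it doubles
        // as the "one message per sender per contact per 24 hours" guard.
        $key = "sent:{$this->sender}:{$this->phone}";

        if (! Cache::add($key, true, now()->addDay())) {
            return; // already messaged by this sender in the last 24 hours
        }

        // ... hand the message off to the provider driver here ...
    }
}
```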
  17. Things are going to break. Accept it. Things break often

    in programming. They break even more often when they’re external. And even more often in high-scale environments. Accept that they *are* going to break, and instead focus on writing good countermeasures to those problems and ensuring you have good observability — that is, you can spot them as soon as they happen.
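
In Laravel terms, one way that can look for an external provider call, sketched with a hypothetical `DeliverToProvider` job: bounded retries with backoff so transient failures recover on their own, and a `failed()` hook so permanent ones become visible the moment they happen.

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Log;
use Throwable;

class DeliverToProvider implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public int $tries = 5;                  // retry instead of silently losing the message
    public array $backoff = [10, 60, 300];  // wait longer between each attempt

    public function __construct(public int $messageId) {}

    public function handle(): void
    {
        // ... call the external SMS provider; an exception here triggers a retry ...
    }

    public function failed(Throwable $e): void
    {
        // Observability: make the failure visible as soon as it happens.
        Log::error('Provider delivery failed', [
            'message_id' => $this->messageId,
            'error'      => $e->getMessage(),
        ]);
    }
}
```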
  18. Know your tools We all leverage lots of abstraction, but

    it is important to have an understanding of how some things work behind the scenes. For example…
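
One illustration of that kind of detail (my own example, not necessarily the one from the talk): a queued job that type-hints an Eloquent model does not serialize the whole row. With `SerializesModels`, only the model's key goes into the queue payload, and the worker re-fetches the model from the database when the job runs, which matters at this scale.

```php
<?php

namespace App\Jobs;

use App\Models\Campaign; // hypothetical model name
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessCampaign implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Only $campaign->getKey() ends up in the queue payload; the full model
    // is loaded again (one extra query) when a worker picks this job up.
    public function __construct(public Campaign $campaign) {}

    public function handle(): void
    {
        // ...
    }
}
```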
  19. Results! • 15M+ sent messages a day without breaking a

    sweat • 300M+ imported contacts a day • 500M+ jobs processed in a 12h timeframe • 30,000+ requests/minute • Infrastructure cost under $1000/month • Easily horizontally scalable • Easy queue management and monitoring through Horizon • Easy to add new providers • Well-tested, easy to maintain, extend, and modify codebase