DON'T PANIC: Large scale web development

Jorge Lería & William Viana, Tuenti

in general

Jorge Lería @jorgeleria

William Viana @vianasw

Category: scalability
Level: advanced
Language: Spanish
Speakers: 2

THIS IS NOT ROCKET SCIENCE

BUT COULD BE DANGEROUS

“The effect of a small mistake with that amount of
users can be a disaster…”

Disaster example
- Back in 2010 we had games in Tuenti
- We had a system to share sessions with third parties
- At first it wasn’t using our Backend Framework
- And we launched GreenFarm


Disaster example
[ASCII art: cow]
- We had a problem with concurrency
- We detected duplicated gamer IDs
- The database was corrupted
- No backup (it happened during the launch)

Disaster example
[ASCII art: dead cow]
= RESET THE GAME
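The slides do not say how gamer IDs were allocated, so the following is only a minimal sketch of one common way to avoid duplicates under concurrency: let the database enforce uniqueness and make creation idempotent. The table name, column names and the `get_or_create_gamer_id` helper are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

def get_or_create_gamer_id(conn, user_id):
    """Return the gamer ID for user_id, creating it at most once."""
    with conn:  # one transaction: commit on success, roll back on error
        # The UNIQUE constraint on user_id turns a repeated insert into a no-op,
        # so two concurrent requests cannot end up with two gamer rows.
        conn.execute(
            "INSERT OR IGNORE INTO gamers (user_id) VALUES (?)", (user_id,)
        )
        row = conn.execute(
            "SELECT gamer_id FROM gamers WHERE user_id = ?", (user_id,)
        ).fetchone()
    return row[0]

# Hypothetical schema: one gamer row per Tuenti user.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gamers ("
    " gamer_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id INTEGER NOT NULL UNIQUE)"
)
```

The point of the design is that the "check, then insert" race is removed from application code and delegated to a constraint the storage layer checks atomically.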

a.- A hat
b.- A boa constrictor digesting an elephant
c.- Something much more frightening

Tuenti... do you still use that?

Three days ago we were #2

~16,000,000 users

~3,200,000,000 images

How do we manage all of that?

With a ton of servers... 1,000+ servers
- 550 frontend (PHP) servers
- 300 database (MySQL) servers
- 100 memcached servers
- 100 image servers
- etc. (thumbnails, photo processors, chat, batch, dev, stats, backups, HBase)

...and a great team

Technologies: PHP, MySQL, memcache, HBase, Hadoop, Hive, HipHop, nginx, Erlang, ...

All of these technologies are free software (free as in freedom)

“In software engineering, scalability is the ability of a system, network, or process, to handle growing amounts of work in a graceful manner or its ability to be enlarged to accommodate that growth.”

It's all about handling growth

What is NOT scalability
- Scalability ≠ Speed
- Scalability ≠ Throughput
(Sometimes slower code is better for scalability)

social network != other sites

- Master-slave architecture
- Schema divided into logical blocks
- One cluster per data type
[Diagram labels: Photos, Notifications]

- Partitioning is not limited to databases
- Partitioned memcache and frontend farms
[Diagram label: Databases]
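As a rough illustration of partitioning, the sketch below routes each key to one of several partitions with a stable hash. The slides do not describe Tuenti's actual partitioning scheme; the hash choice, `partition_for` and `PartitionedStore` are assumptions made for this example, and plain dictionaries stand in for database or memcache servers.

```python
import zlib

def partition_for(key, num_partitions):
    """Map a key to a partition index using a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings,
    # so a checksum such as CRC32 is used instead.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

class PartitionedStore:
    """Hypothetical store that spreads keys over several partitions."""

    def __init__(self, num_partitions):
        # Each dict stands in for one server (database, memcache, ...).
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition(self, key):
        return self.partitions[partition_for(key, len(self.partitions))]

    def put(self, key, value):
        self._partition(key)[key] = value

    def get(self, key, default=None):
        return self._partition(key).get(key, default)
```

Callers only see `put` and `get`; which partition holds a key is an internal detail, which is the property the talk later calls "transparent" partitioning.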

“All magic comes with a price… in complexity”
- No longer relational (no joins or counts)
- Denormalization
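One way to read "no joins or counts" is that aggregates have to be maintained at write time instead of being computed by a query. The sketch below keeps a denormalized photo counter next to the photo list; the `PhotoStore` class and its fields are hypothetical and not taken from the talk.

```python
class PhotoStore:
    """Hypothetical store that keeps a denormalized per-user photo counter."""

    def __init__(self):
        self.photos_by_user = {}  # user_id -> list of photo IDs
        self.photo_count = {}     # user_id -> number of photos (denormalized)

    def add_photo(self, user_id, photo_id):
        self.photos_by_user.setdefault(user_id, []).append(photo_id)
        # Update the counter on every write so reads never need a COUNT(*).
        self.photo_count[user_id] = self.photo_count.get(user_id, 0) + 1

    def count(self, user_id):
        # Constant-time read, independent of how many photos the user has.
        return self.photo_count.get(user_id, 0)
```

The price is the complexity the slide mentions: every write path now has to keep the counter and the list consistent.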

Thankfully, we have a very good Backend Framework

“Backend Framework team is the team responsible for the data
access layer in Tuenti, providing services for persistent storage interaction and their support.”

Backend framework
- All data partitioning is transparent
- Fully integrated memcache as a storage layer
- Data structures like “Cached lists”
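The slides do not show what a "Cached list" looks like in Tuenti's framework. As an assumption, the sketch below treats it as a read-through list: the first read loads the list from persistent storage and later reads are served from the cache. `CachedList`, its `loader` callback and the dictionary used as a cache are all hypothetical.

```python
class CachedList:
    """Hypothetical read-through cached list (a dict stands in for memcache)."""

    def __init__(self, cache, loader):
        self.cache = cache    # key -> list, stand-in for memcache
        self.loader = loader  # function(key) -> iterable, stand-in for the database

    def get(self, key):
        items = self.cache.get(key)
        if items is None:
            # Cache miss: load once from storage and remember the result.
            items = list(self.loader(key))
            self.cache[key] = items
        return list(items)  # return a copy so callers cannot mutate the cache

    def append(self, key, item):
        # The caller is assumed to have written the item to storage already.
        # Only an already-cached list is updated; a miss is left to the next read.
        items = self.cache.get(key)
        if items is not None:
            items.append(item)
```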

Feature disabling/enabling “Allows you to gradually open a feature or
disable one that is broken”
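A gradual rollout with a kill switch can be sketched as follows. This is not Tuenti's implementation: the `FeatureFlag` class, the percentage bucketing and the CRC32 hash are assumptions chosen to keep the example small.

```python
import zlib

class FeatureFlag:
    """Hypothetical feature flag with a percentage rollout and a kill switch."""

    def __init__(self, name, rollout_percent=0, enabled=True):
        self.name = name
        self.rollout_percent = rollout_percent  # 0..100
        self.enabled = enabled                  # False disables the feature for everyone

    def is_enabled_for(self, user_id):
        if not self.enabled:
            return False
        # Hash the flag name together with the user ID so each user lands in a
        # stable bucket (0..99) that is independent between different features.
        bucket = zlib.crc32(f"{self.name}:{user_id}".encode("utf-8")) % 100
        return bucket < self.rollout_percent
```

Because the bucket of a user never changes, raising `rollout_percent` only adds users; nobody who already had the feature loses it while it is being opened up.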

Dark launches Usually done when we want to fill a
cache, before a feature is actually released
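A dark launch of this kind can be sketched as running the new code path for its side effects while users still get the old result. The `serve` function and its parameters are hypothetical names for this example.

```python
def serve(user_id, old_impl, new_impl, dark_launch, on_error=print):
    """Serve the old implementation; optionally exercise the new one in the dark."""
    result = old_impl(user_id)
    if dark_launch:
        try:
            # The return value is discarded: the call exists only to warm
            # caches and to put production load on the new code.
            new_impl(user_id)
        except Exception as exc:
            # A failure in the unreleased path must never reach the user.
            on_error(exc)
    return result
```

In a real system the dark call would more likely run asynchronously so it cannot add latency to the request; it is kept inline here for brevity.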

Asynchronous jobs
Running on a queue system with throttling enabled, to be used for heavy operations.
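The slides do not describe the queue system itself. The sketch below is a single-process stand-in that shows the throttling idea only: jobs are queued and then executed with a pause between them, so heavy work cannot exceed a chosen rate. `ThrottledQueue` and its parameters are hypothetical.

```python
import time
from collections import deque

class ThrottledQueue:
    """Hypothetical in-process job queue that limits how fast jobs are run."""

    def __init__(self, max_jobs_per_second, sleep=time.sleep):
        self.min_interval = 1.0 / max_jobs_per_second
        self.jobs = deque()
        self.sleep = sleep  # injectable so the pause can be replaced in tests

    def enqueue(self, func, *args):
        # Enqueueing is cheap; the caller returns without doing the heavy work.
        self.jobs.append((func, args))

    def run_pending(self):
        """Run all queued jobs in FIFO order, pausing between consecutive jobs."""
        done = 0
        while self.jobs:
            func, args = self.jobs.popleft()
            func(*args)
            done += 1
            if self.jobs:
                self.sleep(self.min_interval)
        return done
```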

Staging “Test and review a newer version before it is
moved into production by making users go to a concrete server where the new code is deployed”

Before we start:
- what makes a photo popular?
- how much is it going to be used?
- is it going to be slow?
- what are our constraints?

So, we run some numbers, and decide how to implement an MVP

1st approach: Let’s build something quick with the existing backend
(but that still scales)

For every friend of the user...
- Retrieve all of their photos
- And for every photo, retrieve all the likes and comments
[Diagram labels: Photos, Likes, Comments]
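To see why this first approach is expensive, the sketch below implements it literally and counts storage lookups. The data, the `CountingStore` wrapper and `build_feed_naive` are made up for illustration; only the loop structure comes from the slides.

```python
class CountingStore:
    """Hypothetical key -> list store that counts how many lookups it serves."""

    def __init__(self, data):
        self.data = data
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        return self.data.get(key, [])

def build_feed_naive(user_id, friends, photos, likes, comments):
    """For every friend, fetch every photo, then its likes and comments."""
    feed = []
    for friend in friends.get(user_id):
        for photo in photos.get(friend):
            feed.append({
                "owner": friend,
                "photo": photo,
                "likes": likes.get(photo),        # one lookup per photo
                "comments": comments.get(photo),  # and another one per photo
            })
    return feed

# Tiny made-up data set: 2 friends with 3 photos in total.
friends = CountingStore({"ana": ["bea", "fer"]})
photos = CountingStore({"bea": ["p1", "p2"], "fer": ["p3"]})
likes = CountingStore({"p1": ["fer"], "p3": ["ana", "bea"]})
comments = CountingStore({"p2": ["nice!"]})
```

The number of lookups is 1 + F + 2·P for F friends and P photos in total, so the cost of a single page view grows with the size of the user's whole social graph.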

Of course, this doesn’t scale very well

We can do better

[Diagram: users with activities (B, F, H) → user events (Status, Photo, Comment, Status, Photo, Photo) → likes and comments]

Store the result in a cache

So we released it to the world, crossed our fingers,
and opened it up little by little, with an eye on the graphs

Well, at least that was the plan...

After the hotfix, it kind of worked and it was
enough to validate the product.

“Don't listen to the direct reaction, just look at the
numbers” Facebook guy (yesterday in the speech)

2nd approach: The generative approach

For every user, store every like and comment on a
photo in memcache storage

[Diagram: users with activities (B, F, H) → photo events (Like on Photo, Comment, Comment, Like on Photo, Comment, Comment)]
- Much easier
- Smaller memory footprint
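One possible reading of the generative approach is that events are written into a per-user list when the like or comment happens, so reading needs only one lookup per friend. The sketch below follows that reading; the key format, the size cap and the `PhotoEventCache` class are assumptions, and a dictionary stands in for memcache.

```python
class PhotoEventCache:
    """Hypothetical per-user list of photo events (a dict stands in for memcache)."""

    def __init__(self, max_events_per_user=100):
        self.cache = {}
        self.max_events_per_user = max_events_per_user  # must be at least 1

    def record(self, actor_id, kind, photo_id, timestamp):
        """Called at write time, when a user likes or comments on a photo."""
        key = f"photo_events:{actor_id}"
        events = self.cache.get(key, [])
        events.append((timestamp, actor_id, kind, photo_id))
        # Keep only the newest events so each entry has a bounded size.
        self.cache[key] = events[-self.max_events_per_user:]

    def feed_for(self, friend_ids):
        """Called at read time: one lookup per friend, newest events first."""
        events = []
        for friend in friend_ids:
            events.extend(self.cache.get(f"photo_events:{friend}", []))
        return sorted(events, reverse=True)
```

Compared with the first approach, the work moves from read time to write time: a page view no longer touches every photo of every friend.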

Do the dark launch, wait for the caches to fill,
and switch to the new implementation

Profit

Until the beliebers came...

“It’s not enough to make it work for the average
user, you’ll have to make it work for the power users with power user friends as well.” - Socrates

Any questions? @jorgeleria @vianasw @TuentiEng
