DON'T PANIC: Large scale web development

Tuenti
October 19, 2013

At Tuenti, every feature we develop has to be ready to scale to millions of users when the code is released to production. By using caching patterns, cache priming, dark launches and other strategies, we try to ensure that the infrastructure will cope with the load. In this talk we want to give real, non-trivial examples of code written to scale... and other cool things about scalability in general.

Transcript

  1. Disaster example

    Back in 2010 we had games on Tuenti. We had a system to share sessions with third parties. At first it wasn't using our Backend Framework. And we launched GreenFarm.
  2. Disaster example

    [ASCII art: a cow labeled "COW"]
    Back in 2010 we had games on Tuenti. We had a system to share sessions with third parties. At first it wasn't using our Backend Framework. And we launched GreenFarm.
  3. Disaster example

    [ASCII art: the same cow labeled "COW"]
    We had a problem with concurrency. We detected duplicate gamer IDs. The database was corrupted. No backup (it happened during the launch).
  4. Disaster example

    [ASCII art: the same cow with X eyes, now labeled "DEAD COW"]
    We had a problem with concurrency. We detected duplicate gamer IDs. The database was corrupted. No backup (it happened during the launch). = RESET THE GAME
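The slides don't say what caused the duplicate gamer IDs. One common way such duplicates appear is read-then-write ID allocation, where two concurrent requests read the same "last ID" before either writes it back. The minimal Python sketch below (all class and function names are hypothetical) replays that bad interleaving deterministically and contrasts it with an allocator that reads and increments under a single lock.

```python
import threading


class RacyIdAllocator:
    """Read-then-write allocation: two concurrent requests can read the same value."""

    def __init__(self) -> None:
        self.last_id = 0

    def peek_next(self) -> int:
        return self.last_id + 1

    def commit(self, new_id: int) -> None:
        self.last_id = new_id


class AtomicIdAllocator:
    """Read and increment happen under one lock, so every caller gets a distinct id."""

    def __init__(self) -> None:
        self.last_id = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            self.last_id += 1
            return self.last_id


def interleaved_racy_allocation() -> tuple:
    """Replay a bad interleaving: both requests read before either one writes."""
    alloc = RacyIdAllocator()
    first = alloc.peek_next()   # request A reads
    second = alloc.peek_next()  # request B reads the same value
    alloc.commit(first)
    alloc.commit(second)
    return first, second        # both requests were handed the same id
```

A process-local lock only protects a single server; across many frontends the same idea needs an atomic operation in shared storage, such as a database auto-increment column or memcached's `incr` command.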
  5. a.- A hat b.- A boa constrictor digesting an elephant

    c.- Something much more frightening
  6. a.- A hat b.- A boa constrictor digesting an elephant

    c.- Something much more frightening
  7. With a ton of servers... 1000+ servers:

    550 frontend (PHP) servers, 300 database (MySQL) servers, 100 memcached servers, 100 image servers, etc. (thumbnails, photo processors, chat, batch, dev, stats, backups, HBase).
  8. In software engineering, scalability is the ability of a system, network, or process to handle growing amounts of work in a graceful manner, or its ability to be enlarged to accommodate that growth.

    It's all about handling growth.
  9. What is NOT scalability: Scalability ≠ Speed. Scalability ≠ Throughput.

    (Sometimes slower code is better for scalability.)
  10. “All magic comes with a price… ...in complexity”

    - No longer relational (no joins or counts)
    - Denormalization
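As a minimal sketch of the denormalization trade-off, assuming a key-value store where a `COUNT(*)` query is not available, the example keeps a like counter next to the data and updates it on every write. Plain Python dicts stand in for the storage, and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for key-value storage.
likers_by_photo: defaultdict = defaultdict(set)  # photo_id -> ids of users who liked it
like_count: dict = {}                            # denormalized counter: photo_id -> number of likes


def add_like(photo_id: int, user_id: int) -> None:
    """Record a like and update the precomputed counter in the same write."""
    likers = likers_by_photo[photo_id]
    if user_id in likers:
        return  # liking twice must not inflate the counter
    likers.add(user_id)
    like_count[photo_id] = like_count.get(photo_id, 0) + 1


def remove_like(photo_id: int, user_id: int) -> None:
    """Undo a like and keep the counter consistent with the set of likers."""
    likers = likers_by_photo.get(photo_id)
    if not likers or user_id not in likers:
        return
    likers.discard(user_id)
    like_count[photo_id] -= 1


def get_like_count(photo_id: int) -> int:
    """Constant-time read: no counting over all likes at request time."""
    return like_count.get(photo_id, 0)
```

Reads become a single lookup, but every write path must now keep the counter in sync, which is the complexity price the slide refers to.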
  11. “Backend Framework team is the team responsible for the data access layer at Tuenti, providing services for persistent storage interaction and their support.”
  12. Backend Framework

    - All data partitioning is transparent
    - Fully integrated memcache as a storage layer
    - Data structures like “Cached lists”
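The following sketch illustrates what transparent partitioning with memcache in front can look like from the caller's side. It assumes modulo partitioning on the user ID and uses dicts to stand in for MySQL partitions and memcached; the class and method names are hypothetical, not the actual Backend Framework API.

```python
from typing import Optional


class ShardedUserStore:
    """Hypothetical data-access layer: callers pass a user id, never a shard number."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one MySQL partition.
        self.shards = [{} for _ in range(num_shards)]
        # A plain dict stands in for memcached.
        self.cache = {}
        self.db_reads = 0  # how often a read fell through to "the database"

    def _shard_for(self, user_id: int) -> dict:
        # Assumption: modulo partitioning on the user id; the slides don't name the scheme.
        return self.shards[user_id % len(self.shards)]

    @staticmethod
    def _key(user_id: int) -> str:
        return f"user:{user_id}"

    def save(self, user_id: int, data: dict) -> None:
        self._shard_for(user_id)[user_id] = data
        # Keep the cached copy in sync with the partition on every write.
        self.cache[self._key(user_id)] = data

    def load(self, user_id: int) -> Optional[dict]:
        key = self._key(user_id)
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        data = self._shard_for(user_id).get(user_id)
        if data is not None:
            self.cache[key] = data  # fill the cache on a miss
        return data
```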
  13. Dark launches: usually done when we want to fill a cache before a feature is actually released.
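A dark launch for cache filling can be sketched as follows, assuming the new implementation caches its own results. The function names are hypothetical and a dict stands in for memcached: while the feature is hidden, each request still runs the new code path so its cache warms up, but the user only sees the old output.

```python
# A plain dict stands in for memcached.
cache: dict = {}


def new_feed(user_id: int) -> list:
    """Hypothetical new implementation whose results we want cached before launch."""
    key = f"feed:v2:{user_id}"
    if key not in cache:
        # Pretend this is the expensive part (many backend queries).
        cache[key] = [f"event-{user_id}-{i}" for i in range(3)]
    return cache[key]


def old_feed(user_id: int) -> list:
    """Hypothetical current implementation that users still see."""
    return [f"legacy-{user_id}"]


def handle_request(user_id: int, feature_released: bool) -> list:
    if feature_released:
        return new_feed(user_id)
    # Dark launch: exercise the new code path with real traffic so its cache
    # fills up, but keep showing the old implementation's output.
    try:
        new_feed(user_id)
    except Exception:
        pass  # a failure in the hidden path must never break the visible page
    return old_feed(user_id)
```

A real deployment would probably run the hidden call asynchronously or for a sample of requests, so it does not add latency to the visible page.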
  14. Staging: “Test and review a newer version before it is moved into production by making users go to a specific server where the new code is deployed.”
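A minimal sketch of the routing decision behind staging, assuming users are sent to the staging server either because they are on an allow-list or because they carry a `staging` cookie. The host names, the cookie name and `pick_frontend` are all hypothetical.

```python
# Hypothetical host names: one staging server running the newer code, plus a production pool.
PRODUCTION_POOL = ["fe01.example", "fe02.example", "fe03.example"]
STAGING_HOST = "staging01.example"


def pick_frontend(user_id: int, staging_users: set, cookies: dict) -> str:
    """Route selected users to the staging server; everyone else goes to production."""
    if cookies.get("staging") == "1" or user_id in staging_users:
        return STAGING_HOST
    # Spread regular traffic over the production pool deterministically by user id.
    return PRODUCTION_POOL[user_id % len(PRODUCTION_POOL)]
```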
  15. Before we start: What makes a photo popular? How much is it going to be used? Is it going to be slow? What are our constraints?
  16. Photos, likes, comments. For every friend of the user... retrieve all of their photos, and for every photo, retrieve all the likes and comments.
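To see why this first approach is risky, the sketch below counts the queries it issues. With F friends having P photos each, it performs F + 2·F·P lookups (one per friend, plus likes and comments for every photo). `FakeDB` and its methods are hypothetical stand-ins that only count calls.

```python
from dataclasses import dataclass


@dataclass
class FakeDB:
    """Hypothetical storage that only counts how many queries are issued."""
    photos_per_user: int
    queries: int = 0

    def photos_of(self, user_id: int) -> list:
        self.queries += 1
        return [user_id * 1000 + i for i in range(self.photos_per_user)]

    def likes_of(self, photo_id: int) -> list:
        self.queries += 1
        return []

    def comments_of(self, photo_id: int) -> list:
        self.queries += 1
        return []


def naive_photo_scores(db: FakeDB, friend_ids: list) -> dict:
    """First approach from the slide: score every photo of every friend."""
    scores = {}
    for friend in friend_ids:                # for every friend...
        for photo in db.photos_of(friend):   # ...retrieve all their photos
            # ...and for every photo, all likes and comments
            scores[photo] = len(db.likes_of(photo)) + len(db.comments_of(photo))
    return scores
```

For example, 300 friends with 200 photos each (illustrative numbers, not from the talk) already means 120,300 queries for a single page view.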
  17. [Diagram of the first design; labels: Status, Photo, Comment, Users with activities, User Events, Likes, Comments]
  18. So we released it to the world, crossed our fingers, and opened it up little by little, with an eye on the graphs.
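Opening a feature little by little is often implemented as a percentage rollout. One common approach, sketched here under that assumption, hashes the feature name and user ID into a bucket from 0 to 99 and enables the feature for buckets below the current percentage; `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(user_id: int, feature: str, percent: int) -> bool:
    """Deterministically map a user to a bucket in [0, 100) and enable buckets below `percent`."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket is deterministic, raising the percentage from 10 to 50 keeps every user who already had the feature, which makes the graphs easier to read while ramping up.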
  19. After the hotfix, it kind of worked and it was enough to validate the product.
  20. “Don't listen to the direct reaction, just look at the numbers.” - a Facebook speaker (in yesterday's talk)
  21. [Diagram of the second design; labels: Photo, Like on Photo, Comment, Users with activities, Photo events]

    Much easier. Smaller memory footprint.
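The second design keeps one list of events per photo. A bounded, newest-first list like the hypothetical `CachedList` below is one way to model such a structure (and the "cached lists" mentioned for the Backend Framework); a `deque` with `maxlen` stands in for the cached data, and the size limit is what keeps the memory footprint small.

```python
from collections import deque


class CachedList:
    """Hypothetical bounded, newest-first list kept per key (e.g. events per photo)."""

    def __init__(self, max_len: int) -> None:
        self.max_len = max_len
        self._lists = {}

    def push(self, key: str, item: str) -> None:
        lst = self._lists.setdefault(key, deque(maxlen=self.max_len))
        # When the deque is full, appendleft drops the oldest item from the right end.
        lst.appendleft(item)

    def get(self, key: str, limit: int) -> list:
        lst = self._lists.get(key)
        if lst is None:
            return []
        return list(lst)[:limit]
```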
  22. Do the dark launch, wait for the caches to fill, and switch to the new implementation.
  23. “It’s not enough to make it work for the average user; you’ll have to make it work for power users with power-user friends as well.” - Socrates
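One way to protect a request from power users with power-user friends is to bound the work explicitly. The sketch below assumes each friend's photos are already available as a list of `(score, photo_id)` pairs sorted by score (for example, read from a precomputed cached list) and caps both the number of friends scanned and the photos taken per friend; the function name, its parameters and the choice of which friends to keep are assumptions.

```python
import heapq


def top_photos(friend_photos: dict, max_friends: int, per_friend: int, k: int) -> list:
    """Return the k best photo ids while touching at most max_friends * per_friend candidates.

    friend_photos maps friend_id -> list of (score, photo_id), sorted by score descending.
    Assumption: the dict is already ordered by priority (e.g. most active friends first).
    """
    candidates = []
    for friend_id in list(friend_photos)[:max_friends]:
        candidates.extend(friend_photos[friend_id][:per_friend])
    return [photo_id for _, photo_id in heapq.nlargest(k, candidates)]
```

The cost stays bounded no matter how many friends or photos a user has; the trade-off is that some results may be missed, so deciding which friends to keep is a product decision as much as a technical one.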