CakePHP at a massive scale, on a budget

Andy Gale
September 05, 2010

Talk given at CakeFest 2010 about the platform used to build Cyclingnews.com with CakePHP.

Transcript

  1. CakePHP at a massive
    scale, on a budget
    Andy Gale
    Wednesday, 29 May 13


  2. Introduction
    • I’m a web developer
    • I’ve been making websites since ’96
    • And professionally since ’98
    • I’m from Bristol, UK

  3. Clifton Suspension Bridge
    © Ian Wade http://ianwadephotography.co.uk/

  4. Introduction
    • I work for Future Publishing Plc
    • We are based in Bath, UK and have offices
    in London, San Francisco, New York and
    Sydney
    • We publish over 180 special-interest
    publications

  5. Publications
    To name a few:
    • In the UK we publish SFX, Cycling Plus,
    TotalFilm, MBUK, Simply Knitting, Official
    PlayStation Magazine, .Net
    • In the US we publish Mac|Life, World of
    Warcraft Official Magazine
    • In Australia we publish Guitarist, T3, Official
    Nintendo, Official Xbox 360

  6. Websites
    • TechRadar, BikeRadar, MusicRadar,
    GamesRadar, PhotoRadar
    • CyclingNews, TotalFilm
    • Many more...

  7. Websites
    • Most websites contain: news, features,
    reviews, products, forums
    • Ad funded
    • Popular sites, lots of traffic, often market
    leaders

  8. Website Platforms
    We try to use the most appropriate
    website platform for the job
    • WordPress for small builds
    http://www.futureplc.com
    • Drupal for medium builds
    http://www.photoradar.com
    • CakePHP for large custom builds
    http://www.totalfilm.com
    http://www.cyclingnews.com

  9. (No text on this slide)

  10. Our first big CakePHP build
    • The first CakePHP build had 6 developers,
    2 front-end developers and most hadn’t
    used CakePHP before
    • Design change halfway through site build
    • But the site was completed with time to
    spare

  11. We learnt some lessons
    • Most developers thought they knew better
    • They didn’t really embrace CakePHP
    conventions - myself included
    • They didn’t know better

  12. We learnt some lessons
    • We ended up with at least four different ways of
    doing everything
    • Our future site builds with CakePHP really
    should use developers that “get” the
    framework

  13. Launch day
    • Expecting 30k page views
    • Got 100k page views
    • Site held up nicely
    • We all went to the pub

  14. The day after launch day
    • Marketing and PR departments announced
    the new site and started pimping links
    • IMDb liked one of the launch stories and
    put a link on their homepage
    • We got 500k page views
    • CakePHP view caching did not hold up
    because the server just couldn’t run
    enough copies of PHP

  15. Oh crap! It broke
    • How did we fix it?
    • JavaScript used for dynamic aspects such as
    user login/out
    • Cached full pages in Memcache via a helper
    • Served directly from Memcache with Nginx
    • More info:
    http://andy-gale.com/cakephp-view-memcache.html

  16. Quick fix cache solution wasn’t perfect!
    • Hub pages - i.e. the homepage, features -
    weren’t immediately updated as content
    changed
    • Simple changes to content meant entire
    sections of the site needed to be rebuilt
    • Susceptible to the “thundering herd”
    problem

  17. Thundering Herd

  18. Thundering Herd
    A cache issue
    • When cached item expires an unlucky user
    has to rebuild item in cache
    • But we have many requests every second
    • That’s a lot of concurrent requests trying
    to rebuild the item in cache
    • And then we have an unlucky server

  19. Thundering Herd
    Solutions - Soft expire
    • A single unlucky user rebuilds that cache
    item
    • And creates a lock in the cache so no
    other users try to recreate
    • Lock must expire quickly in case rebuild
    fails
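
The soft-expire idea on slide 19 can be sketched roughly as follows. This is a minimal Python illustration, not the code used on TotalFilm: the class name `SoftExpireCache`, its parameters and the in-memory dictionaries standing in for Memcache are all assumptions made for the example.

```python
import time


class SoftExpireCache:
    """In-memory stand-in for Memcache with soft expiry and a rebuild lock.

    All names here are hypothetical; a real deployment would use an atomic
    Memcache add() for the lock instead of a dictionary check.
    """

    def __init__(self, soft_ttl=60.0, lock_ttl=5.0, clock=time.monotonic):
        self.soft_ttl = soft_ttl  # after this the item is stale but still served
        self.lock_ttl = lock_ttl  # short, so a failed rebuild cannot block for long
        self.clock = clock
        self.items = {}           # key -> (value, soft expiry time)
        self.locks = {}           # key -> lock expiry time

    def _try_lock(self, key):
        now = self.clock()
        if self.locks.get(key, 0.0) > now:
            return False          # somebody else is already rebuilding
        self.locks[key] = now + self.lock_ttl
        return True

    def get(self, key, rebuild):
        now = self.clock()
        entry = self.items.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]       # fresh hit
        if self._try_lock(key):
            # This caller is the single "unlucky user" who rebuilds the item.
            # If rebuild() raises, the lock simply times out after lock_ttl.
            value = rebuild()
            self.items[key] = (value, now + self.soft_ttl)
            self.locks.pop(key, None)
            return value
        if entry is not None:
            return entry[0]       # rebuild in progress elsewhere: serve the stale copy
        return rebuild()          # nothing cached at all: build without storing
```

Every request that arrives while the lock is held gets the stale copy, so only one request pays the rebuild cost.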

  20. Thundering Herd
    Solutions - Update the cache!!!
    • The best way to prevent the thundering
    herd problem is positive cache expiry
    • Update the cache when things change!
    • Sometimes easier said than done

  21. © Roberto Bettini

  22. Our next project: Cyclingnews.com
    • Recently acquired by Future Publishing
    • World’s number one cycling site
    • Flat HTML website
    • Editors used text editor to hand code
    HTML and FTP to update the website
    • Laborious to edit but fast to serve

  23. Our next project
    • CMS-driven website needed
    • De-skill editorial requirements
    • More modern design
    • Still needs to handle 4 million page views in
    a day during the Tour de France
    • Massive peak towards the end of a stage

  24. Page views per hour

  25. Our next project
    • It’s a news site and needs to update
    instantly
    • High traffic peaks during the Tour de
    France or a Lance doping story
    • Couldn’t tolerate the caching issues that
    TotalFilm had
    • Don’t have Facebook’s hardware budget

  26. How would we cache it?
    • Instead of caching whole pages, cache
    individual elements
    • We called them “panels”
    • The data in each panel relates to a model
    so when that model is saved we can update
    the cache of panels related to that model
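
The link between models and panels described on slide 26 might look something like the sketch below. It is a Python illustration under stated assumptions rather than the CakePHP code from the talk: `PanelCache`, `register` and `on_model_saved` are hypothetical names, and a plain dictionary stands in for Memcache.

```python
class PanelCache:
    """Tracks which panels are built from which model (hypothetical sketch)."""

    def __init__(self):
        self.store = {}            # panel key -> rendered HTML (stand-in for Memcache)
        self.panels_by_model = {}  # model name -> {panel key: render function}

    def register(self, model, panel_key, render):
        """Declare that `panel_key` is rendered from records of `model`."""
        self.panels_by_model.setdefault(model, {})[panel_key] = render

    def on_model_saved(self, model, record):
        """Call after a save: re-render every panel tied to this model."""
        updated = []
        for panel_key, render in self.panels_by_model.get(model, {}).items():
            self.store[panel_key] = render(record)
            updated.append(panel_key)
        return updated
```

Because the cache is rewritten at save time, readers never trigger a rebuild, which is the "update the cache when things change" approach from slide 20.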

  27. (No text on this slide)

  28. 1000 requests per second
    • At peak times (i.e. at the end of a stage)
    • Not counting the CSS, JS and other static elements;
    they are served via a CDN
    • That’s serving just the HTML

  29. Can we handle that with CakePHP?
    • After the success of TotalFilm we wanted
    to use CakePHP again
    • But we were worried it wouldn’t scale to
    1000 requests a second
    • So we benchmarked it

  30. Can we handle that with CakePHP?
    • CakePHP with requestAction was very
    slow
    • CakePHP with elements cached was
    actually pretty quick but still too slow

  31. So what did we do?
    • CakePHP for the CMS
    • CakePHP generating the HTML separated
    into panels
    • CakePHP for publishing HTML panels into
    Memcache

  32. So what did we do?
    • We used a very simple PHP script to
    compile the HTML into full pages
    • Works out which panels are required from
    URL, fetches parts of the page from
    Memcache and compiles page
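
The page compiler on slide 32 was a PHP script; the Python sketch below only illustrates the shape of the idea. The route table, the panel key format and the function names are all assumptions made for the example, and a dictionary stands in for Memcache.

```python
# Hypothetical route table: URL prefix -> ordered panel keys making up the page.
ROUTES = {
    "/news": ["header", "news_list", "footer"],
    "/news/view/": ["header", "article:{slug}", "footer"],
}


def panels_for(url):
    """Work out which panels a URL needs (the longest matching prefix wins)."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if url.startswith(prefix):
            slug = url[len(prefix):].strip("/")
            return [key.format(slug=slug) for key in ROUTES[prefix]]
    raise KeyError(url)


def compile_page(url, cache):
    """Fetch each panel from the cache and join them into a full page."""
    keys = panels_for(url)
    missing = [key for key in keys if key not in cache]
    if missing:
        raise LookupError("panels not published yet: %s" % ", ".join(missing))
    return "".join(cache[key] for key in keys)
```

The compiler does no rendering of its own. It only looks up and concatenates strings, which is presumably why it can be so much cheaper than a full framework request.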

  33. Benchmarks (average with all panels cached)
    • Using CakePHP with cached elements:
    0.1521 seconds
    • Pagecompiler:
    0.0488 seconds
    • Pagecompiler with optimisation:
    0.0031 seconds
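
To put the slide 33 figures in proportion, the short Python snippet below works out the relative speedups from the three quoted averages. The Pagecompiler comes out roughly 3 times faster than CakePHP with cached elements, and the optimised Pagecompiler roughly 49 times faster. Only the three timings come from the slide; the variable names are illustrative.

```python
# Average time per page in seconds, as quoted on the benchmark slide.
timings = {
    "CakePHP with cached elements": 0.1521,
    "Pagecompiler": 0.0488,
    "Pagecompiler with optimisation": 0.0031,
}

baseline = timings["CakePHP with cached elements"]

# Speedup of each variant relative to CakePHP with cached elements.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print("%-32s %.4f s  %5.1fx the baseline speed" % (name, timings[name], factor))
```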

  34. How do we update the panels?
    • View panel - an article
    • When article changes, panel needs to be
    updated

  35. Two types of panel
    Hub panel
    Often a listing
    Difficult to update

  36. Hub panels - Need to update?
    • We stored the find params for each panel
    • Run the find in beforeSave and again in afterSave,
    compare the resulting arrays and, if they’ve changed,
    regenerate the HTML panel

  37. Model beforeSave
    Model->find() for each panel
    associated with model
    Model afterSave
    Model->find() for each panel
    associated with model
    Update HTML panel in cache
    Ensure find query isn’t
    cached!!!
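
The before/after comparison from slides 36 and 37 can be sketched as below. This is a Python illustration, not the CakePHP callbacks themselves: `run_find` is a toy stand-in for `Model->find()`, the find params format (`conditions`, `limit`) is simplified, and the record fields are invented for the example.

```python
def run_find(rows, params):
    """Toy stand-in for Model->find(): filter, sort newest first, apply the limit.

    It always reads the live rows, mirroring the slide's warning that the
    find query itself must not be cached.
    """
    matches = [
        row for row in rows
        if all(row.get(field) == value for field, value in params["conditions"].items())
    ]
    matches.sort(key=lambda row: row["published"], reverse=True)
    return [(row["id"], row["title"]) for row in matches[: params["limit"]]]


def save_and_find_stale_panels(rows, record, panel_params):
    """Save `record` into `rows` and return the panels whose find results changed."""
    # "beforeSave": run every stored find against the current data.
    before = {key: run_find(rows, params) for key, params in panel_params.items()}
    # The save itself: replace any row with the same id, or add a new one.
    rows[:] = [row for row in rows if row["id"] != record["id"]] + [record]
    # "afterSave": run the same finds again and compare the two result sets.
    after = {key: run_find(rows, params) for key, params in panel_params.items()}
    return [key for key in panel_params if before[key] != after[key]]
```

A save that does not change what a hub panel would list leaves that panel alone, so only genuinely affected panels need regenerating.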

  38. Lots of panels
    • To avoid making the CMS really slow
    generating loads of HTML panels for every
    save we decided to use a queue
    • And made a CakePHP shell to work
    through the panels and publish them

  39. Panelworker
    • CakePHP shell
    • Works through the panels and publishes
    them
    • Queue system prevents panels being
    regenerated over and over again when
    altered by multiple users
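
One way a queue can stop a panel being regenerated over and over, as slide 39 describes, is to ignore a panel that is already waiting. The Python sketch below shows that idea only. The real Panelworker was a CakePHP shell, and `PanelQueue`, `work`, `fetch_html` and `publish` are hypothetical names.

```python
from collections import deque


class PanelQueue:
    """FIFO queue that ignores a panel URL already waiting to be published."""

    def __init__(self):
        self._order = deque()
        self._pending = set()

    def enqueue(self, panel_url):
        if panel_url in self._pending:
            return False              # already queued: one regeneration is enough
        self._pending.add(panel_url)
        self._order.append(panel_url)
        return True

    def next(self):
        if not self._order:
            return None
        panel_url = self._order.popleft()
        self._pending.discard(panel_url)
        return panel_url


def work(queue, fetch_html, publish):
    """Drain the queue: fetch each panel's HTML and publish it to the cache."""
    done = []
    while True:
        panel_url = queue.next()
        if panel_url is None:
            break
        publish(panel_url, fetch_html(panel_url))
        done.append(panel_url)
    return done
```

If three editors save the same article in quick succession, its panels are queued once and published once with the latest content.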

  40. Panelworker
    • Parent/child architecture with IPC
    • Able to process 20 panels at once
    • State engine with socket_select()

  41. Panelworker
    Panel queue:
    /news
    /news/view/lance-is-amazing
    /races/view/tour-de-france
    /features/view/some-new-bike
    Flow:
    CMS: place changed panels in queue
    Panelworker: next item from queue
    Panelworker: fetch panel HTML from CMS
    Panelworker: publish panel HTML to Memcache
    Front-end servers: www1, www2

  42. What else?
    • We cached generated pages for 60 seconds
    in Memcache and served directly from
    Nginx to give us even more headroom
    • www. servers also run Panelworkers to
    enable them to get panels they don’t have
    • Panels are also cached on disk for when
    they fall out of Memcache
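
The disk fallback mentioned on slide 42 could be sketched as a two-tier lookup like the one below. This Python example is an assumption about the general shape, not the site's code: `TwoTierPanelStore` is a hypothetical name, a dictionary stands in for Memcache, and the file naming scheme is invented.

```python
import hashlib
import os


class TwoTierPanelStore:
    """Panels live in memory (stand-in for Memcache) with a copy on disk."""

    def __init__(self, disk_dir):
        self.memory = {}
        self.disk_dir = disk_dir

    def _path(self, key):
        # Hash the key so that URLs with slashes map to safe file names.
        name = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.disk_dir, name + ".html")

    def publish(self, key, html):
        """Write the panel to both tiers."""
        self.memory[key] = html
        with open(self._path(key), "w", encoding="utf-8") as handle:
            handle.write(html)

    def get(self, key):
        """Try memory first; on a miss, fall back to disk and refill memory."""
        if key in self.memory:
            return self.memory[key]
        try:
            with open(self._path(key), encoding="utf-8") as handle:
                html = handle.read()
        except FileNotFoundError:
            return None
        self.memory[key] = html
        return html
```

A panel evicted from memory is then restored by a cheap file read rather than a fresh render.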

  43. So the site launched
    • 1181 forum posts complaining
    • Lance Armstrong tweets saying he hates
    the site
    • But it easily handled the traffic, with both
    front-end servers running at a load average of
    about 0.5

  44. Alternatively
    • If your site needs more interactivity with
    users, replace HTML panels with data
    • Use a similar system to update data in the
    cache when things change
    • A queue system similar to the Panelworker
    could still be useful to keep your site
    responsive

  45. What we’d do differently
    • Instead of storing find params use a model
    method for fetching content for each panel
    • Despite being fast the web site lacks
    interactivity
    • Use Membase instead of both Memcache
    and files to store panels

  46. But...
    • Since we did our benchmarks over a year
    ago CakePHP 1.3 seems a lot quicker
    • LazyModel by Frank de Graaf (and others)
    http://bakery.cakephp.org/articles/view/optimizing-model-loading-with-lazymodel
    • We’ve got a new website build coming up
    which requires a lot of interactivity

  47. But...
    • For the new build we’re implementing the
    following caching system...

  48. Questions or comments?
    http://www.cyclingnews.com
    http://www.totalfilm.com
    http://www.andy-gale.com
    Twitter: @andygale