CakePHP at a massive scale, on a budget

Andy Gale
September 05, 2010

Talk given at CakeFest 2010 about the platform used to build Cyclingnews.com with CakePHP.

  1. Introduction • I’m a web developer • I’ve been making websites since '96 • And professionally since '98 • I'm from Bristol, UK
  2. Introduction • I work for Future Publishing Plc • We are based in Bath, UK and have offices in London, San Francisco, New York and Sydney • We publish over 180 special-interest publications
  3. Publications, to name a few: • In the UK we publish SFX, Cycling Plus, TotalFilm, MBUK, Simply Knitting, Official Playstation Magazine, .Net • In the US we publish Mac|Life, World of Warcraft Official Magazine • In Australia we publish Guitarist, T3, Official Nintendo, Official Xbox 360
  4. Websites • Most websites contain: news, features, reviews, products, forums • Ad funded • Popular sites, lots of traffic, often market leaders
  5. Website Platforms • WordPress for small builds http://www.futureplc.com • Drupal for medium builds http://www.photoradar.com • CakePHP for large custom builds http://www.totalfilm.com http://www.cyclingnews.com • We try to use the most appropriate website platform for the job
  6. Our first big CakePHP build • The first CakePHP build had 6 developers and 2 front-end developers, and most hadn't used CakePHP before • Design change halfway through the site build • But the site was completed with time to spare
  7. We learnt some lessons • Most developers thought they knew better • They didn’t really embrace CakePHP conventions - myself included • They didn't know better
  8. We learnt some lessons • We ended up with at least four different ways of doing everything • Our future site builds with CakePHP really should use developers that "get" the framework
  9. Launch day • Expecting 30k page views • Got 100k page views • Site held up nicely • We all went to the pub
  10. The day after launch day • Marketing and PR departments announced the new site and started pimping links • IMDB liked one of the launch stories and put a link on their homepage • We got 500k page views • CakePHP view caching did not hold up because the server just couldn't run enough copies of PHP
  11. Oh crap! It broke • How did we fix it? • JavaScript used for dynamic aspects such as user login/out • Cached full pages in Memcache via a helper • Served directly from Memcache with Nginx • More info: http://andy-gale.com/cakephp-view-memcache.html
  12. Quick fix cache solution wasn't perfect! • Hub pages - i.e. the homepage, features - weren’t immediately updated as content changed • Simple changes to content meant entire sections of the site had to be rebuilt • Susceptible to the "thundering herd" problem
  13. Thundering Herd: A cache issue • When a cached item expires, an unlucky user has to rebuild the item in the cache • But we have many requests every second • That's a lot of concurrent requests trying to rebuild the item in the cache • And then we have an unlucky server
  14. Thundering Herd: Solutions - Soft expire • A single unlucky user rebuilds that cache item • And creates a lock in the cache so no other users try to recreate it • Lock must expire quickly in case the rebuild fails
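
The soft-expire idea on the slide above can be sketched roughly as follows. This is only an illustration in Python, not the code used on the site: `FakeCache` is an in-memory stand-in for Memcache, and `get_with_soft_expiry`, `soft_ttl` and `lock_ttl` are hypothetical names. The item is stored without a hard expiry; a timestamp inside it marks when it goes stale, and only the request that wins the lock rebuilds it.

```python
import time


class FakeCache:
    """In-memory stand-in for Memcache (assumption: real code would use a Memcache client)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def add(self, key, value, ttl=None):
        """Store only if the key is absent (like Memcache's add); True on success."""
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def get_with_soft_expiry(cache, key, rebuild, soft_ttl=60, lock_ttl=5):
    """Return a cached value, letting only one caller rebuild it once it is stale."""
    entry = cache.get(key)
    if entry is not None and time.time() < entry["soft_expires_at"]:
        return entry["value"]  # still fresh
    # Stale or missing: whoever wins the short-lived lock does the rebuild.
    if cache.add(key + ":lock", 1, ttl=lock_ttl):
        try:
            value = rebuild()
            cache.set(key, {"value": value, "soft_expires_at": time.time() + soft_ttl})
            return value
        finally:
            cache.delete(key + ":lock")
    if entry is not None:
        return entry["value"]  # someone else is rebuilding: serve the stale copy
    return rebuild()  # nothing cached at all (hypothetical fallback choice)
```

The short `lock_ttl` covers the case where the rebuilding process dies before it can release the lock, which is the "lock must expire quickly" point above.
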
  15. Thundering Herd: Solutions - Update the cache!!! • The best way to prevent the thundering herd problem is positive cache expiry • Update the cache when things change! • Sometimes easier said than done
  16. Our next project (Cyclingnews.com) • Recently acquired by Future Publishing • World’s number one cycling site • Flat HTML website • Editors used a text editor to hand-code HTML and FTP to update the website • Laborious to edit but fast to serve
  17. Our next project • CMS-driven website needed • De-skill editorial requirements • More modern design • Still needs to handle 4 million page views in a day during the Tour de France • Massive peak towards the end of a stage
  18. Our next project • It’s a news site and needs to update instantly • High traffic peaks during the Tour de France or a Lance doping story • Couldn't tolerate the caching issues that TotalFilm had • Don't have Facebook's hardware budget
  19. How would we cache it? • Instead of caching whole pages, cache individual elements • We called them "panels" • The data in each panel relates to a model, so when that model is saved we can update the cached panels related to that model
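
A rough sketch of the panel idea above, assuming a plain dict in place of Memcache and Python in place of CakePHP. `PanelRegistry`, `register` and `on_model_saved` are hypothetical names for illustration; the point is only that each panel is tied to a model, so a save can refresh exactly the panels that depend on it.

```python
from collections import defaultdict


class PanelRegistry:
    """Maps a model name to the panels whose HTML depends on it."""

    def __init__(self, cache):
        self.cache = cache  # dict standing in for Memcache (assumption)
        self._panels_by_model = defaultdict(list)

    def register(self, model_name, panel_key, render):
        """render is a callable that turns the saved record into panel HTML."""
        self._panels_by_model[model_name].append((panel_key, render))

    def on_model_saved(self, model_name, record):
        """Re-render and re-cache every panel tied to the saved model."""
        updated = []
        for panel_key, render in self._panels_by_model[model_name]:
            self.cache[panel_key] = render(record)
            updated.append(panel_key)
        return updated


cache = {}
registry = PanelRegistry(cache)
registry.register("Article", "article-headline", lambda a: "<h1>%s</h1>" % a["title"])
registry.register("Race", "race-summary", lambda r: "<p>%s</p>" % r["name"])

updated = registry.on_model_saved("Article", {"title": "Stage 12 report"})
```

Saving an `Article` here touches only `article-headline`; the `race-summary` panel is left alone until a `Race` is saved.
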
  20. 1000 requests per second • At peak times (i.e. at the end of a stage) • Not the CSS, JS and other static elements; they are served via a CDN • That's serving just the HTML
  21. Can we handle that with CakePHP? • After the success of TotalFilm we wanted to use CakePHP again • But we were worried it wouldn't scale to 1000 requests a second • So we benchmarked it
  22. Can we handle that with CakePHP? • CakePHP with requestAction was very slow • CakePHP with elements cached was actually pretty quick, but still too slow
  23. So what did we do? • CakePHP for the CMS • CakePHP generating the HTML separated into panels • CakePHP for publishing HTML panels into Memcache
  24. So what did we do? • We used a very simple PHP script to compile the HTML into full pages • It works out which panels are required from the URL, fetches those parts of the page from Memcache and compiles the page
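
The page compiler described above might look something like this in outline. It is a sketch in Python rather than the PHP script from the talk, and the route table, panel keys and function names are all made up for illustration; a dict stands in for Memcache.

```python
import re

# Hypothetical route table: URL pattern -> ordered list of panel keys.
ROUTES = [
    (re.compile(r"^/news/view/(?P<slug>[\w-]+)$"),
     ["header", "article:{slug}", "latest-news", "footer"]),
    (re.compile(r"^/news$"),
     ["header", "news-index", "footer"]),
]


def panels_for_url(path):
    """Work out which panels a URL needs, or None if no route matches."""
    for pattern, keys in ROUTES:
        match = pattern.match(path)
        if match:
            return [key.format(**match.groupdict()) for key in keys]
    return None


def compile_page(path, cache):
    """Join the cached panels for a URL into one page; None if any is missing."""
    keys = panels_for_url(path)
    if keys is None:
        return None
    parts = []
    for key in keys:
        html = cache.get(key)
        if html is None:
            return None  # caller decides how to handle a missing panel
        parts.append(html)
    return "\n".join(parts)


cache = {
    "header": "<header>Cyclingnews</header>",
    "article:lance-is-amazing": "<article>Lance is amazing</article>",
    "latest-news": "<aside>Latest news</aside>",
    "footer": "<footer></footer>",
}
page = compile_page("/news/view/lance-is-amazing", cache)
```

Because the compiler only matches a URL and concatenates strings, it avoids loading a framework on every request, which is presumably where most of the speed-up comes from.
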
  25. Benchmarks (average with all panels cached) • Using CakePHP with cached elements: 0.1521 seconds • Pagecompiler: 0.0488 seconds • Pagecompiler with optimisation: 0.0031 seconds
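
To put those timings in perspective, the relative speed-ups can be worked out directly from the three figures on the slide. The pages-per-second number is a back-of-the-envelope figure that assumes a single process serving requests one after another, which is a simplification and not something measured in the talk.

```python
# Average page generation times from the benchmark slide, in seconds.
timings = {
    "CakePHP with cached elements": 0.1521,
    "Pagecompiler": 0.0488,
    "Pagecompiler with optimisation": 0.0031,
}

baseline = timings["CakePHP with cached elements"]


def speedup(name):
    """How many times faster than CakePHP with cached elements."""
    return baseline / timings[name]


def pages_per_second(name):
    """Sequential throughput of one process (simplifying assumption)."""
    return 1.0 / timings[name]


for name in timings:
    print("%-32s %5.1fx  %6.1f pages/s" % (name, speedup(name), pages_per_second(name)))
```

So the plain Pagecompiler is roughly 3 times faster than CakePHP with cached elements, and the optimised version roughly 49 times faster.
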
  26. How do we update the panels? • View panel - an article • When the article changes, the panel needs to be updated
  27. Hub panels - Need to update? • We stored the find params for each panel • Run the find in beforeSave and again in afterSave and compare the resulting arrays; if they've changed, the HTML panel needs to be regenerated
  28. • Model beforeSave: Model->find() for each panel associated with the model • Model afterSave: Model->find() for each panel associated with the model • Update HTML panel in cache • Ensure the find query isn't cached!!!
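
The before/after comparison above can be sketched like this. It is a Python illustration with a list of dicts standing in for the database; `Panel`, `run_find` and `save_and_find_stale_panels` are hypothetical names, and `run_find` only mimics a small part of what CakePHP's `find()` does with conditions, order and limit.

```python
class Panel:
    """A hub panel plus the stored find params that produce its content."""

    def __init__(self, key, find_params):
        self.key = key
        self.find_params = find_params


def run_find(records, params):
    """Tiny stand-in for Model->find(): filter, sort newest first, limit."""
    conditions = params.get("conditions", {})
    rows = [r for r in records if all(r.get(k) == v for k, v in conditions.items())]
    rows.sort(key=lambda r: r[params["order"]], reverse=True)
    return [dict(r) for r in rows[: params["limit"]]]


def save_and_find_stale_panels(records, panels, new_record):
    """Save a record and return the keys of panels whose find results changed."""
    # beforeSave: snapshot each panel's result (must not come from a query cache)
    before = {p.key: run_find(records, p.find_params) for p in panels}

    # The save itself: replace the record with the same id, or append it.
    for i, existing in enumerate(records):
        if existing["id"] == new_record["id"]:
            records[i] = new_record
            break
    else:
        records.append(new_record)

    # afterSave: run the same finds again and compare
    after = {p.key: run_find(records, p.find_params) for p in panels}
    return [p.key for p in panels if before[p.key] != after[p.key]]


records = [
    {"id": 1, "section": "news", "title": "A", "published": 10},
    {"id": 2, "section": "news", "title": "B", "published": 20},
    {"id": 3, "section": "features", "title": "C", "published": 15},
]
panels = [
    Panel("latest-news", {"conditions": {"section": "news"}, "order": "published", "limit": 2}),
    Panel("latest-features", {"conditions": {"section": "features"}, "order": "published", "limit": 2}),
]
```

A new top news story changes the `latest-news` result and marks that panel stale, while an old story that never reaches the top two changes nothing, so no panel is regenerated.
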
  29. Lots of panels • To avoid making the CMS really slow by generating loads of HTML panels on every save, we decided to use a queue • And made a CakePHP shell to work through the panels and publish them
  30. Panelworker • CakePHP shell • Works through the panels and publishes them • Queue system prevents panels being regenerated over and over again when altered by multiple users
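
One simple way to get the "not regenerated over and over" behaviour is a queue that refuses duplicates while an item is still waiting. The sketch below is an assumption about how such a queue could work, not the talk's implementation; `PanelQueue` and its methods are hypothetical names.

```python
from collections import deque


class PanelQueue:
    """FIFO queue of panel URLs that ignores a URL already waiting to be published."""

    def __init__(self):
        self._order = deque()
        self._pending = set()

    def enqueue(self, panel_url):
        """Add a panel; returns False if it is already queued."""
        if panel_url in self._pending:
            return False
        self._pending.add(panel_url)
        self._order.append(panel_url)
        return True

    def next_item(self):
        """Pop the oldest queued panel, or None when the queue is empty."""
        if not self._order:
            return None
        panel_url = self._order.popleft()
        self._pending.discard(panel_url)
        return panel_url

    def __len__(self):
        return len(self._order)


queue = PanelQueue()
queue.enqueue("/news")
queue.enqueue("/news/view/lance-is-amazing")
queue.enqueue("/news")  # second editor touches the same hub page: ignored
```

Once a panel has been taken off the queue it can be queued again, so a later edit still triggers a fresh publish.
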
  31. Panelworker • Parent/child architecture with IPC • Able to process 20 panels at once • State engine with socket_select()
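
The parent/child pattern above can be illustrated with Python's `select.select()`, which plays the same role as PHP's `socket_select()`. This is a loose sketch under several assumptions: threads stand in for the child processes, `socket.socketpair()` stands in for whatever IPC the real Panelworker used, and the newline-terminated message format and the names `child` and `run_panelworker` are invented for the example.

```python
import select
import socket
import threading
from collections import deque


def child(sock, publish):
    """Child loop: read newline-terminated panel URLs, publish each, acknowledge."""
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # parent closed its end: time to exit
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            publish(line.decode())
            sock.sendall(b"done " + line + b"\n")
    sock.close()


def run_panelworker(urls, publish, max_children=20):
    """Parent loop: hand URLs to idle children, wait on busy ones with select()."""
    pending = deque(urls)
    idle, busy, threads = [], {}, []
    for _ in range(min(max_children, len(pending))):
        parent_end, child_end = socket.socketpair()
        thread = threading.Thread(target=child, args=(child_end, publish))
        thread.start()
        threads.append(thread)
        idle.append(parent_end)
    all_socks = list(idle)
    done = []
    try:
        while pending or busy:
            while pending and idle:  # give every idle child a panel
                sock = idle.pop()
                sock.sendall(pending.popleft().encode() + b"\n")
                busy[sock] = b""
            readable, _, _ = select.select(list(busy), [], [])
            for sock in readable:
                data = sock.recv(4096)
                if not data:
                    raise RuntimeError("child exited unexpectedly")
                busy[sock] += data
                if busy[sock].endswith(b"\n"):  # full acknowledgement received
                    reply = busy.pop(sock).decode().strip()
                    done.append(reply[len("done "):])
                    idle.append(sock)
    finally:
        for sock in all_socks:
            sock.close()
        for thread in threads:
            thread.join()
    return done
```

The parent never blocks on any single child: it sleeps in `select()` until some child reports back, then immediately reuses that child for the next panel in the queue.
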
  32. Panelworker (diagram) • Panel queue: /news, /news/view/lance-is-amazing, /races/view/tour-de-france, /features/view/some-new-bike • CMS: place changed panels in queue • Panelworker: next item from queue • Fetch panel HTML from CMS • Publish panel HTML to Memcache (www1, www2)
  33. What else? • We cached generated pages for 60 seconds in Memcache and served them directly from Nginx to give us even more headroom • The www servers also run Panelworkers so they can fetch panels they don't have • Panels are also cached on disk for when they fall out of Memcache
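
The disk fallback above amounts to a tiered lookup: try Memcache first, then the file copy, and only then ask for the panel to be rebuilt. A minimal sketch, assuming a dict in place of Memcache, a temporary directory in place of the real cache directory, and hypothetical function names (`publish_panel`, `fetch_panel`, `request_rebuild`):

```python
import hashlib
import os
import tempfile


def disk_path(cache_dir, key):
    """File name for a panel key (hashed so any key is a safe file name)."""
    return os.path.join(cache_dir, hashlib.sha256(key.encode()).hexdigest() + ".html")


def publish_panel(key, html, memcache, cache_dir):
    """Write the panel to both the memory cache and the disk cache."""
    memcache[key] = html
    with open(disk_path(cache_dir, key), "w", encoding="utf-8") as f:
        f.write(html)


def fetch_panel(key, memcache, cache_dir, request_rebuild):
    """Memory first, then disk; if both miss, ask for a rebuild and return None."""
    html = memcache.get(key)
    if html is not None:
        return html
    path = disk_path(cache_dir, key)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            html = f.read()
        memcache[key] = html  # put it back in memory for the next request
        return html
    request_rebuild(key)
    return None


with tempfile.TemporaryDirectory() as cache_dir:
    memcache, rebuild_queue = {}, []
    publish_panel("latest-news", "<aside>Latest news</aside>", memcache, cache_dir)
    memcache.clear()  # simulate the panel falling out of Memcache
    from_disk = fetch_panel("latest-news", memcache, cache_dir, rebuild_queue.append)
    restored = dict(memcache)
    missing = fetch_panel("never-published", memcache, cache_dir, rebuild_queue.append)
```

In this sketch a total miss just records the key in `rebuild_queue`; on the real site that is presumably where the Panelworker on the www server steps in.
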
  34. So the site launched • 1181 forum posts complaining • Lance Armstrong tweets saying he hates the site • But it easily handled the traffic, with both front-end servers running at a load average of about 0.5
  35. Alternatively • If your site needs more interactivity with users, replace HTML panels with data • Use a similar system to update data in the cache when things change • A queue system similar to the Panelworker could still be useful to keep your site responsive
  36. What we'd do differently • Instead of storing find params, use a model method for fetching content for each panel • Despite being fast, the website lacks interactivity • Use Membase instead of both Memcache and files to store panels
  37. But... • Since we did our benchmarks over a year ago, CakePHP 1.3 seems a lot quicker • LazyModel by Frank de Graaf (and others) http://bakery.cakephp.org/articles/view/optimizing-model-loading-with-lazymodel • We've got a new website build coming up which requires a lot of interactivity