CakePHP at a massive scale, on a budget

Andy Gale
September 05, 2010

Talk given at Cakefest 2010 about the platform used to build Cyclingnews.com with CakePHP.

Transcript

  1. Introduction
     • I’m a web developer
     • I’ve been making websites since ’96
     • And professionally since ’98
     • I’m from Bristol, UK
     Wednesday, 29 May 13
  2. Introduction
     • I work for Future Publishing Plc
     • We are based in Bath, UK and have offices in London, San Francisco, New York and Sydney
     • We publish over 180 special-interest publications
  3. Publications (to name a few)
     • In the UK we publish SFX, Cycling Plus, TotalFilm, MBUK, Simply Knitting, Official Playstation Magazine, .Net
     • In the US we publish Mac|Life, World of Warcraft Official Magazine
     • In Australia we publish Guitarist, T3, Official Nintendo, Official Xbox 360
  4. Websites
     • Most websites contain: news, features, reviews, products, forums
     • Ad funded
     • Popular sites, lots of traffic, often market leaders
  5. Website Platforms
     We try to use the most appropriate website platform for the job
     • Wordpress for small builds: http://www.futureplc.com
     • Drupal for medium builds: http://www.photoradar.com
     • CakePHP for large custom builds: http://www.totalfilm.com http://www.cyclingnews.com
  6. Our first big CakePHP build
     • The first CakePHP build had 6 developers and 2 front-end developers, and most hadn’t used CakePHP before
     • Design change halfway through site build
     • But the site was completed with time to spare
  7. We learnt some lessons
     • Most developers thought they knew better
     • They didn’t really embrace CakePHP conventions - myself included
     • They didn’t know better
  8. We learnt some lessons
     • We ended up with at least four different ways of doing everything
     • Our future site builds with CakePHP really should use developers that “get” the framework
  9. Launch day
     • Expecting 30k page views
     • Got 100k page views
     • Site held up nicely
     • We all went to the pub
  10. The day after launch day
      • Marketing and PR departments announced the new site and started pimping links
      • IMDB liked one of the launch stories and put a link on their homepage
      • We got 500k page views
      • CakePHP view caching did not hold up because the server just couldn’t run enough copies of PHP
  11. Oh crap! It broke
      • How did we fix it?
      • JavaScript used for dynamic aspects such as user login/out
      • Cached full pages in Memcache via a helper
      • Served directly from Memcache with Nginx
      • More info: http://andy-gale.com/cakephp-view-memcache.html
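A minimal Python sketch of the full-page caching idea on slide 11. It is not the CakePHP helper from the talk: the dict `memcache` stands in for a Memcache client, and the `page:` key prefix and function names are assumptions made for illustration. The point is that the page is stored under a key derived only from the request URI, so a front-end server can look it up without running the application.

```python
# Stand-in for a Memcache client (assumption); a real setup would use a network client.
memcache = {}

def page_key(uri):
    """Build the cache key that a front-end server could derive from the URI alone."""
    return "page:" + uri

def render_and_cache(uri, render):
    """Render the page with the application and store the full HTML in the cache."""
    html = render(uri)
    memcache[page_key(uri)] = html
    return html

def serve(uri, render):
    """Serve from the cache when possible; run the application only on a miss."""
    cached = memcache.get(page_key(uri))
    if cached is not None:
        return cached
    return render_and_cache(uri, render)
```

Because the cached page is identical for every visitor, anything user-specific (such as login state) has to be filled in on the client, which is why the slide pairs this with JavaScript.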
  12. Quick fix cache solution wasn’t perfect!
      • Hub pages - i.e. the homepage, features - weren’t immediately updated as content changed
      • Simple changes to content meant entire sections of the site had to be rebuilt
      • Susceptible to the “thundering herd” problem
  13. Thundering Herd (a cache issue)
      • When a cached item expires, an unlucky user has to rebuild the item in the cache
      • But we have many requests every second
      • That’s a lot of concurrent requests trying to rebuild the item in the cache
      • And then we have an unlucky server
  14. Thundering Herd (Solutions - Soft expire)
      • A single unlucky user rebuilds that cache item
      • And creates a lock in the cache so no other users try to recreate it
      • Lock must expire quickly in case rebuild fails
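The soft-expire approach on slide 14 can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the code used on the site: the dicts `cache` and `locks` stand in for Memcache, the function name and the `ttl` / `lock_ttl` parameters are hypothetical, and the check-then-set on the lock is not atomic here (with a real Memcache server the lock would typically be taken with its atomic `add` command).

```python
import time

cache = {}  # key -> (value, soft expiry timestamp); stand-in for Memcache
locks = {}  # key -> lock expiry timestamp; stand-in for a lock key in Memcache

def get_with_soft_expiry(key, rebuild, ttl=60, lock_ttl=5, now=time.time):
    """Return a cached value, letting only one caller rebuild it after soft expiry."""
    entry = cache.get(key)
    t = now()
    if entry is not None and entry[1] > t:
        return entry[0]  # still fresh
    if locks.get(key, 0) > t and entry is not None:
        return entry[0]  # another request holds the lock: serve the stale copy
    # Take a short-lived lock. If rebuild() raises, the lock is left in place
    # and simply times out after lock_ttl seconds.
    locks[key] = t + lock_ttl
    value = rebuild()
    cache[key] = (value, t + ttl)
    locks.pop(key, None)
    return value
```

The stale value is kept past its soft expiry on purpose, so that requests arriving during a rebuild still get an answer instead of piling onto the backend.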
  15. Thundering Herd (Solutions - Update the cache!!!)
      • The best way to prevent the thundering herd problem is positive cache expiry
      • Update the cache when things change!
      • Sometimes easier said than done
  16. Our next project
      • Recently acquired by Future Publishing
      • World’s number one cycling site
      • Flat HTML website
      • Editors used text editor to hand code HTML and FTP to update the website
      • Laborious to edit but fast to serve
  17. Our next project
      • CMS-driven website needed
      • De-skill editorial requirements
      • More modern design
      • Still needs to handle 4 million page views in a day during the Tour de France
      • Massive peak towards the end of a stage
  18. Our next project
      • It’s a news site and needs to update instantly
      • High traffic peaks during the Tour de France or a Lance doping story
      • Couldn’t tolerate the caching issues that TotalFilm had
      • Don’t have Facebook’s hardware budget
  19. How would we cache it?
      • Instead of caching whole pages, cache individual elements
      • We called them “panels”
      • The data in each panel relates to a model, so when that model is saved we can update the cache of panels related to that model
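A rough Python sketch of the panel idea on slide 19: each panel is registered against the model it depends on, and saving that model re-renders the panel into the cache. All names here (`panel_cache`, `panels_by_model`, `register_panel`, `on_model_saved`) are hypothetical, and the dict stands in for Memcache; the real site did this inside CakePHP.

```python
panel_cache = {}      # panel name -> HTML; stand-in for Memcache
panels_by_model = {}  # model name -> list of (panel name, render function)

def register_panel(model, name, render):
    """Declare that the named panel is built from records of the given model."""
    panels_by_model.setdefault(model, []).append((name, render))

def on_model_saved(model, record):
    """Re-render every panel tied to the saved model and update the cache."""
    for name, render in panels_by_model.get(model, []):
        panel_cache[name] = render(record)
```

For example, after `register_panel("Article", "article_headline", lambda a: "<h1>" + a["title"] + "</h1>")`, each call to `on_model_saved("Article", article)` replaces the cached headline panel, so readers never wait for an expiry.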
  20. 1000 requests per second
      • At peak times (i.e. at the end of a stage)
      • Not the CSS, JS and other static elements; they are served via a CDN
      • That’s serving just the HTML
  21. Can we handle that with CakePHP?
      • After the success of TotalFilm we wanted to use CakePHP again
      • But we were worried it wouldn’t scale to 1000 requests a second
      • So we benchmarked it
  22. Can we handle that with CakePHP?
      • CakePHP with requestAction was very slow
      • CakePHP with elements cached was actually pretty quick, but still too slow
  23. So what did we do?
      • CakePHP for the CMS
      • CakePHP generating the HTML separated into panels
      • CakePHP for publishing HTML panels into Memcache
  24. So what did we do?
      • We used a very simple PHP script to compile the HTML into full pages
      • Works out which panels are required from URL, fetches parts of the page from Memcache and compiles page
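The page compiler described on slide 24 can be pictured with a short Python sketch. This is an assumption-laden illustration, not the site's PHP script: the `PAGE_LAYOUTS` mapping, the panel names and the `compile_page` function are hypothetical, and a plain dict stands in for Memcache.

```python
# Hypothetical mapping from URL to the ordered panels that make up the page.
PAGE_LAYOUTS = {
    "/news": ["header", "news_list", "footer"],
}

def compile_page(url, panel_cache, layouts=PAGE_LAYOUTS):
    """Return the full page HTML, or None if the URL is unknown or a panel is missing."""
    panel_names = layouts.get(url)
    if panel_names is None:
        return None
    parts = []
    for name in panel_names:
        html = panel_cache.get(name)
        if html is None:
            return None  # the caller would need to have the panel regenerated
        parts.append(html)
    return "".join(parts)
```

The compiler does no database work and no templating; it only looks up pre-rendered strings and concatenates them, which is the property that makes this kind of script cheap per request.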
  25. Benchmarks (average with all panels cached)
      • Using CakePHP with cached elements: 0.1521 seconds
      • Pagecompiler: 0.0488 seconds
      • Pagecompiler with optimisation: 0.0031 seconds
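To relate these timings to the 1000 requests per second target, here is a back-of-the-envelope calculation in Python. It assumes the benchmark figures are wall-clock seconds per request and that each request occupies one worker for its full duration; both are simplifications not stated in the talk, and `workers_needed` is a hypothetical helper.

```python
import math

TARGET_RPS = 1000  # peak load quoted earlier in the talk

timings = {
    "CakePHP with cached elements": 0.1521,
    "Pagecompiler": 0.0488,
    "Pagecompiler with optimisation": 0.0031,
}

def workers_needed(seconds_per_request, target_rps=TARGET_RPS):
    """Concurrent workers needed if each request ties up one worker while it runs."""
    return math.ceil(target_rps * seconds_per_request)

for name, seconds in timings.items():
    print(f"{name}: {1 / seconds:.1f} req/s per worker, "
          f"{workers_needed(seconds)} workers for {TARGET_RPS} req/s")
```

Under those assumptions the optimised Pagecompiler needs only a handful of concurrent workers at peak, while the cached-elements CakePHP setup would need well over a hundred.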
  26. How do we update the panels?
      • View panel - an article
      • When article changes, panel needs to be updated
  27. Hub panels - Need to update?
      • We stored the find params for each panel
      • Run the find in beforeSave and again in afterSave, compare the resulting arrays, and if they’ve changed the HTML panel needs to be regenerated
  28. • Model beforeSave: Model->find() for each panel associated with the model
      • Model afterSave: Model->find() for each panel associated with the model
      • Update HTML panel in cache
      • Ensure find query isn’t cached!!!
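The before/after comparison from slides 27 and 28 can be sketched in Python as follows. This is a simplified stand-in rather than CakePHP code: `find` imitates a model find with only `section` and `limit` params, and `save_and_find_stale_panels` and the record fields are hypothetical names chosen for the example.

```python
def find(records, params):
    """Tiny stand-in for Model->find(): filter by section, newest first, limited."""
    rows = [r for r in records if r["section"] == params["section"]]
    rows.sort(key=lambda r: r["published"], reverse=True)
    # Copy each row so the 'before' snapshot is not altered by the save itself.
    return [dict(r) for r in rows[: params["limit"]]]

def save_and_find_stale_panels(records, panels, apply_change):
    """Run each panel's find before and after a save; return panels whose results changed."""
    before = {name: find(records, params) for name, params in panels.items()}
    apply_change(records)  # the save itself
    after = {name: find(records, params) for name, params in panels.items()}
    return [name for name in panels if before[name] != after[name]]
```

Both finds must hit the real data. If the second find were answered from a query cache, `before` and `after` would always match and no panel would ever be marked stale, which is presumably why the slide insists the find query isn’t cached.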
  29. Lots of panels
      • To avoid making the CMS really slow generating loads of HTML panels for every save we decided to use a queue
      • And made a CakePHP shell to work through the panels and publish them
  30. Panelworker
      • CakePHP shell
      • Works through the panels and publishes them
      • Queue system prevents panels being regenerated over and over again when altered by multiple users
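One way a queue can avoid regenerating the same panel repeatedly is to refuse duplicates while a panel is still waiting. The Python class below is a sketch of that idea only; the talk does not say how the real queue was implemented, and `PanelQueue` and its method names are hypothetical.

```python
from collections import deque

class PanelQueue:
    """FIFO queue that holds each panel at most once while it is waiting."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def enqueue(self, panel):
        """Queue a panel unless it is already waiting; return True if it was added."""
        if panel in self._pending:
            return False
        self._pending.add(panel)
        self._queue.append(panel)
        return True

    def next(self):
        """Pop the next panel to publish, or None if the queue is empty."""
        if not self._queue:
            return None
        panel = self._queue.popleft()
        self._pending.discard(panel)
        return panel
```

With this shape, several editors saving the same article in quick succession produce a single queued entry for `/news`, and the worker renders it once with the latest data.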
  31. Panelworker
      • Parent/child architecture with IPC
      • Able to process 20 panels at once
      • State engine with socket_select()
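The parent/child pattern on slide 31 can be illustrated with a select loop in Python. This is a sketch under several assumptions: threads stand in for the child processes, each child only reports its panel name back instead of actually publishing anything, and `run_workers` and `child` are hypothetical names. The talk's PHP implementation used socket_select(), for which Python's `select.select` is the closest equivalent.

```python
import select
import socket
import threading

def child(sock, panel):
    """Stand-in child: 'publishes' a panel, then reports back over its socket."""
    sock.sendall(panel.encode())
    sock.close()

def run_workers(panels, max_children=20):
    """Process panels with at most max_children running at once; return completed panels."""
    pending = list(panels)
    active = {}  # parent-side socket -> panel being processed
    finished = []
    while pending or active:
        # Start children until the concurrency limit is reached.
        while pending and len(active) < max_children:
            panel = pending.pop(0)
            parent_end, child_end = socket.socketpair()
            threading.Thread(target=child, args=(child_end, panel)).start()
            active[parent_end] = panel
        # Block until at least one child has written to, or closed, its socket.
        readable, _, _ = select.select(list(active), [], [])
        for sock in readable:
            sock.recv(1024)  # a message or EOF both mean this child is done
            finished.append(active.pop(sock))
            sock.close()
    return finished
```

The parent never blocks on any single child: it sleeps in `select` and wakes only when some child has something to report, then tops the pool back up from the pending list.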
  32. Panelworker (diagram)
      • Panel queue: /news, /news/view/lance-is-amazing, /races/view/tour-de-france, /features/view/some-new-bike
      • CMS: place changed panels in queue
      • Panelworker: next item from queue, fetch panel HTML from CMS, publish panel HTML to Memcache (www1, www2)
  33. What else?
      • We cached generated pages for 60 seconds in Memcache and served directly from Nginx to give us even more headroom
      • www. servers also run Panelworkers to enable them to get panels they don’t have
      • Panels are also cached on disk for when they fall out of Memcache
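The disk fallback mentioned on slide 33 might look something like this Python sketch. The file layout, the `panel_path` naming scheme and the function names are assumptions for illustration, and a dict again stands in for Memcache.

```python
import os
import tempfile

def panel_path(directory, name):
    """Map a panel name to a file path (hypothetical naming scheme)."""
    return os.path.join(directory, name.replace("/", "_") + ".html")

def publish_panel(name, html, memcache, directory):
    """Write the panel to the cache and keep a copy on disk."""
    memcache[name] = html
    with open(panel_path(directory, name), "w", encoding="utf-8") as fh:
        fh.write(html)

def fetch_panel(name, memcache, directory):
    """Read from the cache first; on a miss, fall back to disk and repopulate the cache."""
    html = memcache.get(name)
    if html is not None:
        return html
    try:
        with open(panel_path(directory, name), encoding="utf-8") as fh:
            html = fh.read()
    except FileNotFoundError:
        return None
    memcache[name] = html
    return html

# Usage: publish a panel, simulate eviction from the cache, then recover it from disk.
demo_dir = tempfile.mkdtemp()
demo_cache = {}
publish_panel("news/latest", "<ul>...</ul>", demo_cache, demo_dir)
demo_cache.clear()
restored = fetch_panel("news/latest", demo_cache, demo_dir)
```

The disk copy matters because Memcache may evict items under memory pressure; without a second copy, an evicted panel would have to be regenerated by the CMS in the middle of a traffic peak.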
  34. So the site launched
      • 1181 forum posts complaining
      • Lance Armstrong tweets saying he hates the site
      • But it easily handled the traffic, with both front-end servers running at a load average of about 0.5
  35. Alternatively
      • If your site needs more interactivity with users, replace HTML panels with data
      • Use a similar system to update data in the cache when things change
      • A queue system similar to the Panelworker could still be useful to keep your site responsive
  36. What we’d do differently
      • Instead of storing find params, use a model method for fetching content for each panel
      • Despite being fast, the website lacks interactivity
      • Use Membase instead of both Memcache and files to store panels
  37. But...
      • Since we did our benchmarks over a year ago, CakePHP 1.3 seems a lot quicker
      • LazyModel by Frank de Graaf (and others): http://bakery.cakephp.org/articles/view/optimizing-model-loading-with-lazymodel
      • We’ve got a new website build coming up which requires a lot of interactivity