Growing up with PHP

I've now been using PHP for over 12 years, and over that time I've learnt a lot of lessons, many of them the hard way, about building good web applications that scale, and some of the relatively easy and cheap things you can do to avoid pain later on.


Andrew Betts

April 20, 2012


  1. Growing up with PHP: Deploying enterprise scale applications with PHP

    Andrew Betts, Assanka
  2. Stuff to think about

    §  Output caching
    §  Object caching
    §  Opcode caching
    §  Sessions and state
    §  Reads vs writes
    §  Graceful failure
    §  Search
    §  Localisation and validation
    §  Design patterns
    §  Standards
    §  Testing & QA
    §  Deployment process
    §  Authentication
    §  Security
    §  Documentation
    §  Bug tracking
    §  Version control
  3. Your application stack

    From the user down to the data:
    §  User agent (browser): Firefox, Chrome, IE, Safari, Opera, Blackberry
    §  Network proxy: ISP or corporate level, eg Squid
    §  Edge cache: Edgecast, Amazon cloudfront, Akamai
    §  Load balancer: Squid, Varnish, Nginx
    §  Web server: Apache, IIS, Lighttpd
    §  Front controller / views: PHP script executed by web server, template engine
    §  Request controllers: PHP to handle this URI / request type
    §  Object model: PHP for data access abstraction
    §  Data storage: MySQL, MongoDB, Postgresql, flatfile
    Groupings shown on the slide: Outside your control, Your network architecture, PHP app
  4. Fear sessions, and you will scale well. The master Jedi plans ahead.
  5. Sessions bork caching

    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Set-Cookie: phpsessid=b7977f7c69eb898bf42526652dda4c6c; path=/
  6. Sessions are local to each server

    Client: Hi! I'm "b7977f7c69e"
    Server 1: name: Andrew, logindate: 2009-02-28, userid: 453245
    Client: Hi! I'm still "b7977f7c69e"
    Server 2: Huh?
  7. Session access is sequential vs.

  8. 1 Inject session state information

  9. Session only needed for tiny bit?

  10. Solution

    §  Generate non-personalised content and personal content separately
    §  Merge the two later
  11. Options

    §  JavaScript
      •  Implemented by user agent
      •  Closest to the user, generally best server performance
      •  But can cause icky user experience
    §  Edge side includes (ESI)
      •  At load balancer or edge level
      •  Supported by Varnish, Akamai
    §  Server side includes (SSI) or PHP
      •  Implemented by web server/PHP
      •  Caching needed at application level
    (Diagram: the application stack layers, as on slide 3.)
  12. Using JavaScript

    <head>
    <script type='text/javascript' src='/js/session.js'></script>
    <script type='text/javascript' src=' sessiondata'></script>

    The second script returns:

    loadSession({
      un:'Andrew',
      em:'',
      cartid:824457,
      cart:[
        {id:12, desc:'Coal', qty:1, units:'lumps'},
        {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
        {id:82, desc:'Socks', qty:8, units:'pairs'}
      ]
    });
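The server side of this technique can be sketched in PHP. This is a minimal illustration, not the talk's own code: the function name `renderSessionScript` and the shape of the `$session` array are assumptions, and only the `loadSession(...)` call comes from the slide.

```php
<?php
// Hypothetical helper: turns session data into the loadSession(...) call
// that the cached page's JavaScript expects. The field names are illustrative.
function renderSessionScript(array $session): string
{
    // The JSON_HEX_* flags escape <, >, & and quotes so the payload is also
    // safe if it ends up inlined inside a <script> element (the ESI case).
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
    return 'loadSession(' . json_encode($session, $flags) . ');';
}

// Usage inside the only script that needs the session:
// session_start();
// header('Content-Type: text/javascript');
// header('Cache-Control: private, no-cache');
// echo renderSessionScript(['un' => $_SESSION['username'], 'cartid' => $_SESSION['cartid']]);
```

Only this small endpoint calls `session_start()`, so the rest of the page can be generated once and cached for everyone.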
  13. Using ESI

    <head>
    <script type='text/javascript' src='/js/session.js'></script>
    <script>
    <esi:include src="" />
    </script>
    </head>
    <body> ...

    The ESI include is replaced with:

    loadSession({
      un:'Andrew',
      em:'',
      cartid:824457,
      cart:[
        {id:12, desc:'Coal', qty:1, units:'lumps'},
        {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
        {id:82, desc:'Socks', qty:8, units:'pairs'}
      ]
    });
  14. Result.

    §  Session locking is confined to just session-related activity
      •  Most scripts don't need to track sessions, so can load in parallel
      •  Scaling sessions is less critical because far less load is being placed on the session store.
    §  You can cache stuff
      •  Depending on where you merge in the personal data
    §  However...
      •  We're still relying on a single data store. We're just making fewer demands of it.
  15. 2 Use cookies for client-side session storage

  16. Solution

    §  Simple, cheap horizontal session scaling
    §  Store all session state data in a cookie
    §  Sign it with a secret and a hash (eg. sha1)
    §  Timestamp allows you to expire it
    §  You can get a lot in there
    §  Remember it's not encrypted on the wire, and adds to your bandwidth (minimally)

    27478932510|triblondon|1231936510|2,4,6,52,183|a152c24d9874ba15235f
    userid | username | sessionstart | groupmemberships | signature
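A signed cookie along these lines might be built as follows. This is a sketch under stated assumptions: the function names, the `SESSION_SECRET` constant and the expiry policy are hypothetical, and it swaps the slide's plain hash (eg. sha1) for HMAC-SHA256 via `hash_hmac()`, which is a different but closely related signing technique. The field order follows the slide's example.

```php
<?php
// Assumption: a long random secret shared by every web server.
const SESSION_SECRET = 'replace-with-a-long-random-secret';

// Builds "userid|username|sessionstart|groupmemberships|signature".
// Note: values containing "|" would need escaping; not handled here.
function encodeSessionCookie(string $userId, string $username, int $sessionStart, array $groupIds): string
{
    $payload = implode('|', [$userId, $username, $sessionStart, implode(',', $groupIds)]);
    return $payload . '|' . hash_hmac('sha256', $payload, SESSION_SECRET);
}

// Returns the session fields, or null if the cookie is malformed,
// tampered with, or older than $maxAge seconds.
function decodeSessionCookie(string $cookie, int $now, int $maxAge): ?array
{
    $parts = explode('|', $cookie);
    if (count($parts) !== 5) {
        return null;
    }
    [$userId, $username, $sessionStart, $groups, $signature] = $parts;
    $payload = implode('|', [$userId, $username, $sessionStart, $groups]);
    // hash_equals() compares in constant time to avoid timing leaks.
    if (!hash_equals(hash_hmac('sha256', $payload, SESSION_SECRET), $signature)) {
        return null;
    }
    if ($now - (int) $sessionStart > $maxAge) {
        return null;
    }
    return [
        'userid' => $userId,
        'username' => $username,
        'sessionstart' => (int) $sessionStart,
        'groups' => $groups === '' ? [] : array_map('intval', explode(',', $groups)),
    ];
}
```

Because any server holding the secret can verify the cookie, no shared session store is involved. The data is readable by the user, so nothing confidential should go in it.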
  17. Think before you session_start();

  18. Caching. Not using intelligence when stupidity will do just fine.

  19. How to cache

    TTL based
    §  Simple
    §  Easy to implement
    §  Users may be served stale content
    §  Some users will occasionally have to wait for content to be regenerated

    Update on change
    §  Must have control over storage system
    §  Need to know what to purge and when
    §  Content is never stale
    §  Updates can be pushed into the cache, so end users never wait
  20. Where to cache

    (Diagram of the stack: Data storage, Object model, Request controllers, Front controller / views, Web server, Edge cache, Load balancer, Network proxy, User agent (browser).)
  21. 3 Add query strings to enable far-future caching

  22. Update-on-change without the purge

    §  /lib/img/my_website_header.png
      Expires: Sun, 17 Jan 2038 19:26:00 GMT

    Cripes

    But it’s OK - these are not the same object:
    §  /lib/img/my_website_header.png?v=2
    §  /lib/img/my_website_header.png?v=3
    §  /lib/img/my_website_header.png?v=4
  23. Result.

    §  Changing the filename or adding a query string will cause all browsers to re-request the file.
    §  All the benefits of long term caching
    §  No update latency
    §  Essentially an update-on-change strategy which is possible even though you can’t force the user’s cache to purge.

    Genius
  24. How to remember to do it

    §  Don’t.
    §  Write this (or similar) instead:
      <script src='/blah/blah/script.js?v=@@deploy_version@@'></script>
    §  Then find/replace in a deploy script
      •  Use revision ID or unix timestamp
    §  What gets deployed:
      <script src='/blah/blah/script.js?v=567'></script>
    §  And it changes automatically every time you deploy new code!

    Tip
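The find/replace step could look something like the sketch below. The function name `stampDeployVersion` and the idea of running it over template files are assumptions for illustration; only the `@@deploy_version@@` placeholder comes from the slide.

```php
<?php
// Hypothetical deploy-time helper: swaps the placeholder for the real
// version (a revision ID or unix timestamp, as the slide suggests).
function stampDeployVersion(string $template, string $version): string
{
    // rawurlencode() keeps the version safe to use inside a query string.
    return str_replace('@@deploy_version@@', rawurlencode($version), $template);
}

// Usage in a deploy script (paths are illustrative):
// foreach (glob('/path/to/release/templates/*.html') as $file) {
//     file_put_contents($file, stampDeployVersion(file_get_contents($file), (string) time()));
// }
```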
  25. 4 Use private directive to allow personal content to be

  26. Caching customised pages

    §  Can't allow customised pages to be cached publicly (disaster!)
    §  But, just set 'private' in cache-control header:
      Cache-control: max-age=86400, private
    §  Page will be cached by end user's browser, but not by shared proxies
    §  Always make use of browser cache when you can
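In PHP the header above is sent with `header()`. A small sketch, where the helper name `cacheControlHeader` and its parameters are hypothetical:

```php
<?php
// Hypothetical helper: builds a Cache-Control header line. Personalised
// pages get "private" so only the end user's browser may store them.
function cacheControlHeader(int $maxAgeSeconds, bool $personalised): string
{
    $scope = $personalised ? 'private' : 'public';
    return 'Cache-Control: max-age=' . $maxAgeSeconds . ', ' . $scope;
}

// Usage, before any output is sent:
// header(cacheControlHeader(86400, true));
```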
  27. 5 Use a load balancer to cache output

  28. Load balancer vs edge caching

    §  CDN might be overkill
      •  But moving caching from load balancer level to edge level is easy later.
    §  Easier to invalidate Varnish
      •  Purge cache on write
      •  Ability to use advanced purging logic
    §  Varnish better for whole pages
    §  Edge caching best for resources - where cache rules are simple
      •  Edge caching providers: Edgecast, Amazon, Limelight, Akamai
    (Diagram: the application stack layers, as on slide 3.)
  29. Advanced purging using Varnish

    §  A record (eg a post) will appear on lots of index pages as well as its own article page.
    §  Add tokens to output headers in PHP
      header('X-PostIDs: '.join(',', $postids_array));
    §  Cache server now knows which pages feature that post
    §  When the post changes, purge all at once:
      > telnet varnish1 80
      Connected to varnish1.
      PURGE /purge HTTP/1.1
      X-PostID: 345

      HTTP/1.1 204 Purged URL.
      Server: Varnish
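The telnet exchange above can also be issued from PHP when a post is saved. This is a sketch only: the function names are hypothetical, and the `PURGE /purge` endpoint with an `X-PostID` header works only if the Varnish configuration has been written to handle it, which the slide does not show.

```php
<?php
// Builds the response header that tags a page with the posts it contains.
function buildPostIdsHeader(array $postIds): string
{
    return 'X-PostIDs: ' . implode(',', array_map('intval', $postIds));
}

// Builds the raw purge request shown on the slide.
function buildPurgeRequest(string $host, int $postId): string
{
    return "PURGE /purge HTTP/1.1\r\n"
        . "Host: {$host}\r\n"
        . "X-PostID: {$postId}\r\n"
        . "Connection: close\r\n\r\n";
}

// Sends the purge and reports whether the cache answered 204.
function sendPurge(string $host, int $postId, int $port = 80): bool
{
    $socket = @fsockopen($host, $port, $errno, $errstr, 2.0);
    if ($socket === false) {
        return false; // cache unreachable; caller decides how to react
    }
    fwrite($socket, buildPurgeRequest($host, $postId));
    $statusLine = (string) fgets($socket);
    fclose($socket);
    return strpos($statusLine, ' 204 ') !== false;
}

// Usage: header(buildPostIdsHeader($postids_array)); when rendering,
// and sendPurge('varnish1', 345); after post 345 is edited.
```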
  30. Are you cachable?

  31. 6 Use memcached getMulti for cached list items

  32. Caching post teasers

    // Build one cache key per post, varying by role and user agent type
    $keys = array();
    foreach ($posts as $post) {
      $keys[$post->id] = "postteaser-" . $post->id . "-" . ($user->hasMinRole('member')?'member':'nonmember') . "-" . $uatype;
    }
    // Fetch all the teasers in a single request to memcached
    $cacheresults = $mc->getMulti($keys);
    $op = '';
    foreach ($posts as $post) {
      if (!empty($cacheresults[$keys[$post->id]])) {
        $op .= $cacheresults[$keys[$post->id]];
      } else {
        // Generate post teaser
        $teaserhtml = '...';
        $twoweeks = (60 * 60 * 24 * 14);
        $mc->set($keys[$post->id], $teaserhtml, $twoweeks);
        $op .= $teaserhtml;
      }
    }
  33. Invalidating on edit

    $mc->delete('postteaser-'.$id.'-member-desktop');
    $mc->delete('postteaser-'.$id.'-nonmember-desktop');
    $mc->delete('postteaser-'.$id.'-member-mobile');
    $mc->delete('postteaser-'.$id.'-nonmember-mobile');
    $mc->delete('postteaser-'.$id.'-member-rss');
    $mc->delete('postteaser-'.$id.'-nonmember-rss');
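The six deletes follow the same key pattern, so the list can be generated rather than typed out. A minimal sketch, assuming the three user agent types and two roles seen on the slide are the complete set; the function name `teaserCacheKeys` is hypothetical.

```php
<?php
// Returns every cache key variant for one post teaser:
// postteaser-<id>-<role>-<uatype>
function teaserCacheKeys(int $postId): array
{
    $keys = [];
    foreach (['desktop', 'mobile', 'rss'] as $uaType) {
        foreach (['member', 'nonmember'] as $role) {
            $keys[] = "postteaser-{$postId}-{$role}-{$uaType}";
        }
    }
    return $keys;
}

// Usage (assumes $mc is a connected Memcached instance):
// $mc->deleteMulti(teaserCacheKeys($id));
```

Keeping the variants in one place means a new role or user agent type cannot be forgotten at invalidation time.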

  34. Overburden your site with writes, and you're going nowhere fast.

  35. 7 Using hosted analytics to avoid logging

  36. Don't aggregate data yourself

    §  'Most viewed/emailed' widgets
    §  Thinking about doing this?
    §  Well, don't. You're writing on every page load!
      UPDATE content SET viewcount=viewcount+1 WHERE contentid=5309342;
      BAD IDEA
    (Screenshot: BBC News Online)
  37. Use Google Analytics instead

    §  GA:PI - Excellent GA API library for PHP
    §  Retrieve list of URL paths in order of descending page views
    §  But what about AJAX / downloads / JavaScript actions?
      <a href="" onClick="javascript: pageTracker._trackPageview('/outgoing/');">
  38. 8 Prep content in advance to avoid thundering herds

  39. Don't generate on demand

    §  Your Google Analytics based 'most viewed' content takes 2 seconds to generate
    §  You get 200 requests per second
    §  When the cached version expires, the next request will regenerate it
    §  But so will the next 399 requests! It's a thundering herd!
    §  Use a separate process to write to the cache, periodically or event driven, but not triggered by web requests.
    §  Scripts handling HTTP requests never write
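The arithmetic and the read/write split can be sketched as follows. All names here are hypothetical, and an `ArrayAccess` object stands in for the real cache client so the example is self-contained.

```php
<?php
// Requests that arrive while one regeneration is still running:
// 2 seconds x 200 requests/second = 400, i.e. the first plus 399 more.
function requestsDuringRegeneration(float $regenSeconds, int $requestsPerSecond): int
{
    return (int) round($regenSeconds * $requestsPerSecond);
}

// Worker side: the only code path that writes the entry. Run it from
// cron or an event queue, never from a web request.
function refreshMostViewed(ArrayAccess $cache, callable $fetchMostViewed): void
{
    $cache['most-viewed'] = $fetchMostViewed();
}

// Request side: read-only. If the entry is missing it returns an empty
// list instead of regenerating, so no herd can form.
function mostViewedForRequest(ArrayAccess $cache): array
{
    return $cache['most-viewed'] ?? [];
}
```

If the worker overwrites the entry on a schedule shorter than any expiry, web requests always find something to serve.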
  40. Learn to fail well, manage perception, and fix the problem.

  41. What happens when errors occur

    §  Usually embarrassing, but managing perception is important
    §  Twitter's fail whale is surprisingly popular:
  42. 9 Show something sensible when your app falls over

  43. Something more 'enterprisey'

  44. 10 Use custom error handlers to capture data
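One way to capture data with a custom handler is PHP's `set_error_handler()`. The sketch below is an assumption about what such a handler might record, since the slide itself is only a title; the `ErrorCollector` class and its fields are hypothetical.

```php
<?php
// Hypothetical collector: stores a record per error so it can be logged
// or sent to a bug tracker at the end of the request.
final class ErrorCollector
{
    /** @var array[] */
    public static $records = [];

    public static function handle(int $severity, string $message, string $file, int $line): bool
    {
        self::$records[] = [
            'severity' => $severity,
            'message' => $message,
            'file' => $file,
            'line' => $line,
            // Request context is what makes the report useful later.
            'uri' => $_SERVER['REQUEST_URI'] ?? null,
            'time' => time(),
        ];
        return true; // true = skip PHP's built-in error output
    }
}

set_error_handler([ErrorCollector::class, 'handle']);
```

Fatal errors do not reach this handler; catching those would need `register_shutdown_function()` with `error_get_last()` as well.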

  45. Investigate using data

  46. Security is only as good as its weakest point.

  47. A cautionary tale HBGary vs Anonymous

  48. "They think I have nothing but a heirarchy based on

    IRC aliases! As 1337 as these guys are supposed to be they don't get it. I have pwned them! :)" Aaron Barr, CEO, HBGary
  49. SQL injection: Download user table § was susceptible to

    SQL injection §  Anonymous gains access to user database becomes %20SELECT%20%2A%20FROM%20users%3B&page=27
  50. Rainbow tables: decrypt passwords

    §  Users table contained hashed passwords
    §  BUT:
      •  Passwords were only hashed once
      •  No salt was used
      •  The CEO's password was an eight-character string comprising six letters and two numbers
    §  Rainbow table decoded almost all the passwords instantly
    §  CEO's account gave admin access to the CMS
  51. Password reuse: gain SSH access

    §  COO's password for SSH was the same as the cracked CMS password
    §  Server was vulnerable to a privilege escalation vulnerability
      •  A patch had been available for 6 months
    §  Anonymous gain root access
  52. More password reuse: Email

    §  The HBGary CEO used Google Apps for email
    §  Same password as the cracked CMS
    §  His account was the administrator of the domain
    §  Anonymous gain access to HBGary and owner Greg Hoglund's mail account
    §  Account contains root password for server
    §  But you can't log in directly as root
  53. Social engineering: get an account

    From: Greg
    To: Jussi
    Subject: need to ssh into rootkit

    im in europe and need to ssh into the server. can you drop open up firewall and allow ssh through port 59022 or something vague? and is our root password still 88j4bb3rw0cky88 or did we change to 88Scr3am3r88 ? thanks

    From: Jussi
    To: Greg
    Subject: Re: need to ssh into rootkit

    ok, it should now accept from anywhere to 47152 as ssh. i am doing testing so that it works for sure. your password is changeme123 i am online so just shoot me if you need something. in europe, but not in finland? :-) _jussi
  54. Result

    §  All corporate email published to the world
    §  Complete user databases with cleartext passwords for both HBGary and published
    §  All backups and main website erased
    §  Anonymous publishes a statement... on hbgary's own site.
  55. None
  56. 18 days later

  57. 11 Filter inputs and outputs to avoid XSS and SQL

  58. Filtering input, encoding output

    §  Filter and validate inputs
      •  HTMLPurifier (
      •  PHP's filter_var() and mysql_real_escape_string()
      •  Validate HTML against DTDs, particularly in publishing systems
      •  Try making your own DTD to exclude the stuff you don't like
    §  Encode outputs
      •  htmlentities, htmlspecialchars
    §  Store raw - storing data pre-encoded for display presupposes where you are going to display it
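A short sketch of the two halves, validating on the way in and encoding on the way out, using `filter_var()` and `htmlspecialchars()` from the slide. The wrapper function names are hypothetical. Note that `mysql_real_escape_string()` belongs to the old mysql extension, which was removed in PHP 7, so it is left out here.

```php
<?php
// Input: accept a page number only if it is a positive integer.
function validatePage(string $raw): ?int
{
    $page = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $page === false ? null : $page;
}

// Input: accept a syntactically valid email address or nothing.
function validateEmail(string $raw): ?string
{
    $email = filter_var(trim($raw), FILTER_VALIDATE_EMAIL);
    return $email === false ? null : $email;
}

// Output: encode at display time, so the stored value stays raw and can
// be encoded differently for other destinations.
function encodeForHtml(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}
```

With a check like `validatePage()`, a `page` parameter carrying SQL is rejected before it gets anywhere near a query.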
  59. 12 Use SSL throughout to avoid session hijacking

  60. Firesheep

  61. How to actually be secure

    §  Disable browser autocomplete on login forms
    §  Set session cookies with secure flag
    §  Serve whole site over HTTPS
  62. 13 Don't be too helpful if it exposes private data

  63. Exposing existence of accounts

  64. Being friendly means being local, because not everyone is like

  65. 14 Use DateTime to simplify time zone problems

  66. Avoiding dating disaster

    §  Every time is meaningless without a zone
    §  Your application may rely on more than one setting
      •  OS, Database, PHP config
    §  Daylight saving time changes many time zone offsets twice a year
      •  Date comparisons also require historical DST records
    §  May want to show end user the time in their local zone
    §  Work out what to store (UTC is good)
    §  Use PHP's DateTime object whenever you are handling dates, and read Derick's 'dating manual'
  67. Never stop testing, because the cock-ups only get bigger

  68. 15 Use PHPCodeSniffer and PHPLint pre-commit to check coding standards

  69. PHPCodeSniffer

    §  Comes with lots of 'sniffs', but easy to write your own or customise

    Running pre_commit hooks
    [Code sniff precommit hook]
    FILE: /home/andrew/test
    -------------------------------------------------------------------
    FOUND 1 ERROR(S) AND 0 WARNING(S) AFFECTING 1 LINE(S)
    -------------------------------------------------------------------
     339 | ERROR | Use of count() or sizeof() prohibited in while
         |       | expression
    -------------------------------------------------------------------
  70. Checking syntax

    §  Sometimes editors will do it for you
    §  If not, easy to do it with PHP binary:

    php -l myfile.php
    Parse error: syntax error, unexpected T_ECHO in myfile.php on line 21
    Errors parsing myfile.php
  71. 16 Run continuous integration testing so you don't forget

  72. CI tools

    §  Hudson, CruiseControl+PHPUnderControl, PHPUnit, Selenium
    §  Run tests automatically after every commit
    §  Set up alerts by email / status screen
  73. Finally, if you must. Open the doors

  74. Thanks

    §  We're hiring PHP experts, JavaScript gurus and UX masters.