Growing up with PHP

I've now been using PHP for over 12 years, and over that time I've learnt a lot of lessons, many of them the hard way, about building good web applications that scale, and some of the relatively easy and cheap things you can do to avoid pain later on.

Andrew Betts

April 20, 2012

Transcript

  1. Stuff to think about
    §  Output caching
    §  Object caching
    §  Opcode caching
    §  Sessions and state
    §  Reads vs writes
    §  Graceful failure
    §  Search
    §  Localisation and validation
    §  Design patterns
    §  Standards
    §  Testing & QA
    §  Deployment process
    §  Authentication
    §  Security
    §  Documentation
    §  Bug tracking
    §  Version control
  2. Your application stack
    §  Data storage: MySQL, MongoDB, PostgreSQL, flatfile
    §  Object model: PHP for data access abstraction
    §  Request controllers: PHP to handle this URI / request type
    §  Front controller / views: PHP script executed by web server, template engine
    §  Web server: Apache, IIS, Lighttpd
    §  Edge cache: Edgecast, Amazon CloudFront, Akamai
    §  Load balancer: Squid, Varnish, Nginx
    §  Network proxy: ISP or corporate level, eg Squid
    §  User agent (browser): Firefox, Chrome, IE, Safari, Opera, Blackberry
    Groupings on the slide: Outside your control; Your network architecture; PHP app
  3. Sessions are local to each server
    §  Client to Server 1: Hi! I'm "b7977f7c69e"
    §  Server 1 holds: name: Andrew, logindate: 2009-02-28, userid: 453245
    §  Client to Server 2: Hi! I'm still "b7977f7c69e"
    §  Server 2: Huh?
  4. Options
    §  JavaScript
      •  Implemented by user agent
      •  Closest to the user, generally best server performance
      •  But can cause icky user experience
    §  Edge side includes (ESI)
      •  At load balancer or edge level
      •  Supported by Varnish, Akamai
    §  Server side includes (SSI) or PHP
      •  Implemented by web server/PHP
      •  Caching needed at application level
    [Diagram: application stack, from data storage up to user agent (browser)]
  5. Using JavaScript
    <head>
    <script type='text/javascript' src='/js/session.js'></script>
    <script type='text/javascript' src='http://sessions.example.com/sessiondata'></script>
    The sessiondata script returns:
    loadSession({
      un:'Andrew',
      em:'[email protected]',
      cartid:824457,
      cart:[
        {id:12, desc:'Coal', qty:1, units:'lumps'},
        {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
        {id:82, desc:'Socks', qty:8, units:'pairs'}
      ]
    });
  6. Using ESI
    <head>
    <script type='text/javascript' src='/js/session.js'></script>
    <script>
    <esi:include src="http://sessions.example.com/sessiondata" />
    </script>
    </head>
    <body>
    ...
    The include is replaced with:
    loadSession({
      un:'Andrew',
      em:'[email protected]',
      cartid:824457,
      cart:[
        {id:12, desc:'Coal', qty:1, units:'lumps'},
        {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
        {id:82, desc:'Socks', qty:8, units:'pairs'}
      ]
    });
  7. Result.
    §  Session locking is confined to just session-related activity
      •  Most scripts don't need to track sessions, so can load in parallel
      •  Scaling sessions is less critical because far less load is being placed on the session store.
    §  You can cache stuff
      •  Depending on where you merge in the personal data
    §  However...
      •  We're still relying on a single data store. We're just making fewer demands of it.
  8. Solution
    §  Simple, cheap horizontal session scaling
    §  Store all session state data in a cookie
    §  Sign it with a secret and a hash (eg. sha1)
    §  Timestamp allows you to expire it
    §  You can get a lot in there
    §  Remember it's not encrypted on the wire, and adds to your bandwidth (minimally)
    27478932510|triblondon|1231936510|2,4,6,52,183|a152c24d9874ba15235f
    userid | username | sessionstart | groupmemberships | signature
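A minimal sketch of the signed-cookie idea above, assuming a pipe-delimited payload like the one on the slide. The function names `signSession` and `verifySession`, the secret, and the one-hour lifetime are hypothetical, chosen for illustration. The slide suggests sha1; this sketch swaps in HMAC-SHA256 via `hash_hmac()` and compares signatures with `hash_equals()` (PHP 5.6+).

```php
<?php
// Hypothetical helpers: names, secret and lifetime are assumptions for illustration.
const SESSION_SECRET = 'replace-with-a-long-random-secret';
const SESSION_LIFETIME = 3600; // seconds

// Build "userid|username|sessionstart|groups|signature".
// Assumes no field contains the "|" delimiter.
function signSession($userId, $username, $startTime, array $groupIds)
{
    $payload = implode('|', array($userId, $username, $startTime, implode(',', $groupIds)));
    $signature = hash_hmac('sha256', $payload, SESSION_SECRET);
    return $payload . '|' . $signature;
}

// Return the session fields, or null if the cookie was altered or has expired.
function verifySession($cookie, $now)
{
    $parts = explode('|', $cookie);
    if (count($parts) !== 5) {
        return null;
    }
    $signature = array_pop($parts);
    $expected = hash_hmac('sha256', implode('|', $parts), SESSION_SECRET);
    if (!hash_equals($expected, $signature)) {
        return null; // signature mismatch: payload was altered
    }
    list($userId, $username, $startTime, $groups) = $parts;
    if ($now - (int) $startTime > SESSION_LIFETIME) {
        return null; // timestamp too old: treat as expired
    }
    return array(
        'userid' => $userId,
        'username' => $username,
        'sessionstart' => (int) $startTime,
        'groups' => $groups === '' ? array() : explode(',', $groups),
    );
}
```

Any server holding the secret can verify the cookie without a shared session store, which is what makes the approach scale horizontally. The payload is signed, not encrypted, so it should hold nothing the user must not read.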
  9. How to cache
    TTL based
    §  Simple
    §  Easy to implement
    §  Users may be served stale content
    §  Some users will occasionally have to wait for content to be regenerated
    Update on change
    §  Must have control over storage system
    §  Need to know what to purge and when
    §  Content is never stale
    §  Updates can be pushed into the cache, so end users never wait
  10. Where to cache
    [Diagram: the application stack: data storage, object model, request controllers, front controller / views, web server, edge cache, load balancer, network proxy, user agent (browser)]
  11. Update-on-change without the purge
    §  /lib/img/my_website_header.png
      Expires: Sun, 17 Jan 2038 19:26:00 GMT
    But it’s OK - these are not the same object:
    §  /lib/img/my_website_header.png?v=2
    §  /lib/img/my_website_header.png?v=3
    §  /lib/img/my_website_header.png?v=4
    Cripes
  12. Result.
    §  Changing the filename or adding a query string will cause all browsers to re-request the file.
    §  All the benefits of long term caching
    §  No update latency
    §  Essentially an update-on-change strategy which is possible even though you can’t force the user’s cache to purge.
    Genius
  13. How to remember to do it
    §  Don’t.
    §  Write this (or similar) instead:
      <script src='/blah/blah/script.js?v=@@deploy_version@@'></script>
    §  Then find/replace in a deploy script
      •  Use revision ID or unix timestamp
    §  What gets deployed:
      <script src='/blah/blah/script.js?v=567'></script>
    §  And it changes automatically every time you deploy new code!
    Tip
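The find/replace step could be sketched in PHP as below. The `@@deploy_version@@` token comes from the slide; the helper name `stampDeployVersion` is hypothetical, and a real deploy script would read and rewrite files rather than a string.

```php
<?php
// Hypothetical deploy-step helper: replaces the @@deploy_version@@ token
// in a template string with the given revision ID or Unix timestamp.
function stampDeployVersion($source, $version)
{
    return str_replace('@@deploy_version@@', (string) $version, $source);
}

$template = "<script src='/blah/blah/script.js?v=@@deploy_version@@'></script>";
echo stampDeployVersion($template, 567), "\n";
// prints: <script src='/blah/blah/script.js?v=567'></script>
```

Passing `time()` or the version control revision as `$version` gives a value that changes on every deploy, so browsers fetch the new file while older URLs stay cached.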
  14. Caching customised pages
    §  Can't allow customised pages to be cached publicly (disaster!)
    §  But, just set 'private' in cache-control header:
      Cache-control: max-age=86400, private
    §  Page will be cached by end user's browser, but not by shared proxies
    §  Always make use of browser cache when you can
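In PHP the header above can be sent with `header()`. This is a small sketch; the helper name `cacheControlHeader` and its parameters are hypothetical, and it assumes the page already knows whether it contains personalised content.

```php
<?php
// Hypothetical helper: builds the Cache-Control header line for a response.
function cacheControlHeader($maxAgeSeconds, $isPersonalised)
{
    $visibility = $isPersonalised ? 'private' : 'public';
    return 'Cache-Control: max-age=' . (int) $maxAgeSeconds . ', ' . $visibility;
}

// Headers must be sent before any output; customised pages get "private".
if (!headers_sent()) {
    header(cacheControlHeader(86400, true));
}
```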
  15. Load balancer vs edge caching
    §  CDN might be overkill
      •  But moving caching from load balancer level to edge level is easy later.
    §  Easier to invalidate Varnish
      •  Purge cache on write
      •  Ability to use advanced purging logic
    §  Varnish better for whole pages
    §  Edge caching best for resources - where cache rules are simple
      •  Edge caching providers: Edgecast, Amazon, Limelight, Akamai
    [Diagram: application stack, from data storage up to user agent (browser)]
  16. Advanced purging using Varnish
    §  A record (eg a post) will appear on lots of index pages as well as its own article page.
    §  Add tokens to output headers in PHP
      header('X-PostIDs: '.join(',', $postids_array));
    §  Cache server now knows which pages feature that post
    §  When the post changes, purge all at once:
      > telnet varnish1 80
      Connected to varnish1.
      PURGE /purge HTTP/1.1
      X-PostID: 345
      HTTP/1.1 204 Purged URL.
      Server: Varnish
  17. Caching post teasers
    // Build one cache key per post, varying by membership and user agent type
    $keys = array();
    foreach ($posts as $post) {
        $keys[$post->id] = "postteaser-" . $post->id . "-"
            . ($user->hasMinRole('member') ? 'member' : 'nonmember')
            . "-" . $uatype;
    }
    // Fetch all cached teasers in a single round trip
    $cacheresults = $mc->getMulti($keys);
    $op = '';
    foreach ($posts as $post) {
        if (!empty($cacheresults[$keys[$post->id]])) {
            $op .= $cacheresults[$keys[$post->id]];
        } else {
            // Generate post teaser
            $teaserhtml = '...';
            $twoweeks = (60 * 60 * 24 * 14);
            $mc->set($keys[$post->id], $teaserhtml, $twoweeks);
            $op .= $teaserhtml;
        }
    }
  18. Don't aggregate data yourself
    §  'Most viewed/emailed' widgets
    §  Thinking about doing this?
      UPDATE content SET viewcount=viewcount+1 WHERE contentid=5309342;
    §  Well, don't. You're writing on every page load!
    [Screenshot: BBC News Online]
    BAD IDEA
  19. Use Google Analytics instead
    §  GA:PI - Excellent GA API library for PHP
    §  Retrieve list of URL paths in order of descending page views
    §  But what about AJAX / downloads / JavaScript actions?
      <a href="http://www.example.com" onClick="javascript: pageTracker._trackPageview('/outgoing/example.com');">
  20. Don't generate on demand
    §  Your Google Analytics based 'most viewed' content takes 2 seconds to generate
    §  You get 200 requests per second
    §  When the cached version expires, the next request will regenerate it
    §  But so will the next 399 requests! It's a thundering herd!
    §  Use a separate process to write to the cache, periodically or event driven, but not triggered by web requests.
    §  Scripts handling HTTP requests never write
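One way to picture the split between the writing process and the reading requests is the sketch below. `ArrayCache` is a stand-in so the example runs on its own; in practice it would be a shared cache such as Memcached. The class and function names and the `most-viewed` key are all hypothetical.

```php
<?php
// Stand-in for a shared cache such as Memcached, so the sketch runs on its own.
// An assumption for illustration: swap in your real cache client.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        return isset($this->items[$key]) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

// Run from cron or a queue worker, never from a web request.
// $generate is the slow step (for example, the two-second analytics query).
function refreshMostViewed(ArrayCache $cache, $generate)
{
    $cache->set('most-viewed', call_user_func($generate));
}

// Called while handling an HTTP request: read-only, never regenerates.
function renderMostViewed(ArrayCache $cache)
{
    $html = $cache->get('most-viewed');
    return $html !== false ? $html : ''; // degrade gracefully on a miss
}
```

Because the entry is overwritten in place rather than left to expire, no web request ever finds it missing under normal operation, so the slow generation runs once per refresh instead of once per waiting request.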
  21. What happens when errors occur
    §  Usually embarrassing, but managing perception is important
    §  Twitter's fail whale is surprisingly popular:
  22. "They think I have nothing but a heirarchy based on IRC aliases! As 1337 as these guys are supposed to be they don't get it. I have pwned them! :)"
    Aaron Barr, CEO, HBGary
  23. SQL injection: Download user table
    §  Hbgaryfederal.com was susceptible to SQL injection
    §  Anonymous gains access to user database
    http://www.hbgaryfederal.com/pages.php?pageNav=2&page=27
    becomes
    http://www.hbgaryfederal.com/pages.php?pageNav=2%3B%20SELECT%20%2A%20FROM%20users%3B&page=27
  24. Rainbow tables: decrypt passwords
    §  Users table contained hashed passwords
    §  BUT:
      •  Passwords were only hashed once
      •  No salt was used
      •  The CEO's password was an eight-character string comprising six letters and two numbers
    §  Rainbow table decoded almost all the passwords instantly
    §  CEO's account gave admin access to the CMS
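The two weaknesses listed above, a single fast hash and no salt, are what PHP's built-in password API addresses. The sketch below assumes PHP 5.5 or later, which postdates this talk; the example password is made up.

```php
<?php
// password_hash() generates a random salt per password and applies a
// deliberately slow algorithm (bcrypt by default).
$hash = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

// The salt and cost are stored inside the hash string, so verification
// needs only the candidate password and the stored hash.
var_dump(password_verify('correct horse battery staple', $hash)); // bool(true)
var_dump(password_verify('wrong guess', $hash));                  // bool(false)

// Two hashes of the same password differ because the salts differ,
// which is what defeats a precomputed rainbow table.
var_dump($hash === password_hash('correct horse battery staple', PASSWORD_DEFAULT)); // bool(false)
```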
  25. Password reuse: gain SSH access
    §  COO's password for SSH was the same as the cracked CMS password
    §  Server had an unpatched privilege escalation vulnerability
      •  A patch had been available for 6 months
    §  Anonymous gain root access
  26. More password reuse: Email
    §  The HBGary CEO used Google Apps for email
    §  Same password as the cracked CMS
    §  His account was the administrator of the domain
    §  Anonymous gain access to HBGary and rootkit.com owner Greg Hoglund's mail account
    §  Account contains root password for rootkit.com server
    §  But you can't log in directly as root
  27. Social engineering: get an account
    From: Greg
    To: Jussi
    Subject: need to ssh into rootkit
    im in europe and need to ssh into the server. can you drop open up firewall and allow ssh through port 59022 or something vague? and is our root password still 88j4bb3rw0cky88 or did we change to 88Scr3am3r88 ? thanks

    From: Jussi
    To: Greg
    Subject: Re: need to ssh into rootkit
    ok, it should now accept from anywhere to 47152 as ssh. i am doing testing so that it works for sure. your password is changeme123 i am online so just shoot me if you need something. in europe, but not in finland? :-) _jussi
  28. Result
    §  All corporate email published to the world
    §  Complete user databases with cleartext passwords for both HBGary and rootkit.com published
    §  All backups and main website erased
    §  Anonymous publishes a statement... on hbgary's own site.
  29. Filtering input, encoding output
    §  Filter and validate inputs
      •  HTMLPurifier (http://htmlpurifier.org/)
      •  PHP's filter_var() and mysql_real_escape_string()
      •  Validate HTML against DTDs, particularly in publishing systems
      •  Try making your own DTD to exclude the stuff you don't like
    §  Encode outputs
      •  htmlentities, htmlspecialchars
    §  Store raw: storing data pre-encoded for display presupposes where you are going to display it
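A short sketch of the two halves, validating on the way in and encoding on the way out, using `filter_var()` and `htmlspecialchars()`. The helper names `parsePageNav` and `e` are hypothetical; `pageNav` echoes the parameter from the HBGary example. Note that the `mysql_*` extension was removed in PHP 7, so current code needs a different mechanism for the database step.

```php
<?php
// Validate: accept the parameter only if it is a positive integer.
function parsePageNav($raw)
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, array('options' => array('min_range' => 1)));
    return $value === false ? null : $value;
}

// Encode: escape at the point of output, for the HTML context.
function e($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

var_dump(parsePageNav('2'));                       // int(2)
var_dump(parsePageNav('2; SELECT * FROM users;')); // NULL
echo e('<b>"Coal" & socks</b>'), "\n";
// prints: &lt;b&gt;&quot;Coal&quot; &amp; socks&lt;/b&gt;
```

Keeping the stored value raw and calling the encoder only when rendering means the same data can later be sent to a different context, such as JSON or plain-text email, with the encoding appropriate to that context.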
  30. How to actually be secure
    §  Disable browser autocomplete on login forms
    §  Set session cookies with secure flag
    §  Serve whole site over HTTPS
  31. Avoiding dating disaster
    §  Every time is meaningless without a zone
    §  Your application may rely on more than one setting
      •  OS, Database, PHP config
    §  Daylight savings time changes many time zone offsets twice a year
      •  Date comparisons also require historical DST records
    §  May want to show end user the time in their local zone
    §  Work out what to store (UTC is good)
    §  Use PHP's DateTime object whenever you are handling dates, and read Derick's 'dating manual'
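A minimal sketch of "store UTC, display local" with PHP's `DateTime` and `DateTimeZone`. The date and the `Europe/London` zone are arbitrary examples standing in for a stored value and a user's preference.

```php
<?php
// Store: parse the instant explicitly as UTC rather than relying on
// the OS, database or php.ini default zone.
$utc = new DateTimeZone('UTC');
$stored = new DateTime('2012-07-01 12:00:00', $utc);

// Display: convert a copy to the user's zone (an assumed preference).
$local = clone $stored;
$local->setTimezone(new DateTimeZone('Europe/London'));

echo $stored->format('Y-m-d H:i T'), "\n"; // 2012-07-01 12:00 UTC
echo $local->format('Y-m-d H:i T'), "\n";  // 2012-07-01 13:00 BST
```

Both objects represent the same instant, so comparisons and arithmetic are done on the UTC value and the zone conversion happens only at display time, where the time zone database supplies the DST rules.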
  32. PHPCodeSniffer
    §  Comes with lots of 'sniffs', but easy to write your own or customise
    Running pre_commit hooks
    [Code sniff precommit hook]
    FILE: /home/andrew/test
    -------------------------------------------------------------------
    FOUND 1 ERROR(S) AND 0 WARNING(S) AFFECTING 1 LINE(S)
    -------------------------------------------------------------------
     339 | ERROR | Use of count() or sizeof() prohibited in while
         |       | expression
    -------------------------------------------------------------------
  33. Checking syntax
    §  Sometimes editors will do it for you
    §  If not, easy to do it with PHP binary:
      php -l myfile.php
      Parse error: syntax error, unexpected T_ECHO in myfile.php on line 21
      Errors parsing myfile.php
  34. CI tools
    §  Hudson, CruiseControl+PHPUnderControl, PHPUnit, Selenium
    §  Run tests automatically after every commit
    §  Set up alerts by email / status screen
  35. Thanks
    §  [email protected]
    §  We're hiring PHP experts, JavaScript gurus and UX masters.
    §  assanka.net/jobs
    §  http://www.flickr.com/photos/57158820@N00/920872985
    §  http://www.flickr.com/photos/exotictransport/163976659/
    §  http://www.flickr.com/photos/erazmilic/178574918/