Slide 1

Slide 1 text

Growing up with PHP Deploying enterprise scale applications with PHP Andrew Betts, Assanka

Slide 2

Slide 2 text

Stuff to think about §  Output caching §  Object caching §  Opcode caching §  Sessions and state §  Reads vs writes §  Graceful failure §  Search §  Localisation and validation §  Design patterns §  Standards §  Testing & QA §  Deployment process §  Authentication §  Security §  Documentation §  Bug tracking §  Version control

Slide 3

Slide 3 text

Your application stack
Data storage: MySQL, MongoDB, PostgreSQL, flatfile
Object model: PHP for data access abstraction
Request controllers: PHP to handle this URI / request type
Front controller / views: PHP script executed by web server, template engine
Web server: Apache, IIS, Lighttpd
Load balancer: Squid, Varnish, Nginx
Edge cache: Edgecast, Amazon CloudFront, Akamai
Network proxy: ISP or corporate level, eg Squid
User agent (browser): Firefox, Chrome, IE, Safari, Opera, Blackberry
Groupings shown on the slide: Outside your control, Your network architecture, PHP app

Slide 4

Slide 4 text

Fear sessions, and you will scale well. The master Jedi plans ahead.

Slide 5

Slide 5 text

Expires: Thu, 19 Nov 1981 08:52:00 GMT
Set-Cookie: PHPSESSID=b7977f7c69eb898bf42526652dda4c6c; path=/
Sessions bork caching

Slide 6

Slide 6 text

Sessions are local to each server
Client to Server 1: Hi! I'm "b7977f7c69e"
Server 1 holds: name: Andrew, logindate: 2009-02-28, userid: 453245
Client to Server 2: Hi! I'm still "b7977f7c69e"
Server 2: Huh?

Slide 7

Slide 7 text

Session access is sequential vs.

Slide 8

Slide 8 text

1 Inject session state information

Slide 9

Slide 9 text

Session only needed for tiny bit?

Slide 10

Slide 10 text

Solution §  Generate non-personalised content and personal content separately §  Merge the two later

Slide 11

Slide 11 text

Options
§  JavaScript
•  Implemented by user agent
•  Closest to the user, generally best server performance
•  But can cause icky user experience
§  Edge side includes (ESI)
•  At load balancer or edge level
•  Supported by Varnish, Akamai
§  Server side includes (SSI) or PHP
•  Implemented by web server/PHP
•  Caching needed at application level
(Diagram: the application stack from slide 3, marking the level at which each option applies)

Slide 12

Slide 12 text

Using JavaScript
loadSession({
  un:'Andrew',
  em:'[email protected]',
  cartid:824457,
  cart:[
    {id:12, desc:'Coal', qty:1, units:'lumps'},
    {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
    {id:82, desc:'Socks', qty:8, units:'pairs'}
  ]
});
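A minimal sketch of the server side of this approach, assuming a small PHP endpoint that returns the personal data as a call to loadSession(). The function name render_session_script() and the sample data are hypothetical; only loadSession() comes from the slide.

```php
<?php
// Hypothetical helper: wraps per-user session data in a call to
// loadSession(), the callback name used on the slide.
function render_session_script(array $session): string
{
    // The JSON_HEX_* flags escape <, >, &, ' and " inside string values,
    // so the payload stays safe if it is ever inlined into an HTML page.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
    return 'loadSession(' . json_encode($session, $flags) . ');';
}

// Sample data shaped like the slide; a real app would read it from
// the session store.
$session = [
    'un'     => 'Andrew',
    'cartid' => 824457,
    'cart'   => [
        ['id' => 12, 'desc' => 'Coal', 'qty' => 1, 'units' => 'lumps'],
    ],
];
$script = render_session_script($session);
```

Only this small response needs the session; the page that includes it can stay fully cacheable.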

Slide 13

Slide 13 text

Using ESI
<esi:include src="http://sessions.example.com/sessiondata" />
...
loadSession({
  un:'Andrew',
  em:'[email protected]',
  cartid:824457,
  cart:[
    {id:12, desc:'Coal', qty:1, units:'lumps'},
    {id:28, desc:'Rudolf jumper', qty:1, units:'jumpers'},
    {id:82, desc:'Socks', qty:8, units:'pairs'}
  ]
});

Slide 14

Slide 14 text

Result. §  Session locking is confined to just session-related activity •  Most scripts don't need to track sessions, so can load in parallel •  Scaling sessions is less critical because far less load is being placed on the session store. §  You can cache stuff •  Depending on where you merge in the personal data §  However... •  We're still relying on a single data store. We're just making fewer demands of it.

Slide 15

Slide 15 text

2 Use cookies for client-side session storage

Slide 16

Slide 16 text

Solution
§  Simple, cheap horizontal session scaling
§  Store all session state data in a cookie
§  Sign it with a secret and a hash (e.g. sha1)
§  Timestamp allows you to expire it
§  You can get a lot in there
§  Remember it's not encrypted on the wire, and adds to your bandwidth (minimally)
27478932510|triblondon|1231936510|2,4,6,52,183|a152c24d9874ba15235f
userid | username | sessionstart | groupmemberships | signature
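One way the signed cookie could be built and checked, as a sketch rather than the author's implementation. It swaps the slide's "secret plus sha1" for HMAC-SHA256 via hash_hmac(), and compares signatures with hash_equals(); the function names and the assumption that no field contains a `|` are mine.

```php
<?php
// Builds "field|field|...|signature". Assumes no field contains '|'.
function sign_session(array $fields, string $secret): string
{
    $payload = implode('|', $fields);
    return $payload . '|' . hash_hmac('sha256', $payload, $secret);
}

// Returns the fields if the signature matches and the session has not
// expired, otherwise null. Field 2 is the session start timestamp, as in
// the layout on the slide.
function verify_session(string $cookie, string $secret, int $maxAge, int $now): ?array
{
    $pos = strrpos($cookie, '|');
    if ($pos === false) {
        return null;
    }
    $payload   = substr($cookie, 0, $pos);
    $signature = substr($cookie, $pos + 1);
    // hash_equals() compares in constant time to avoid timing leaks.
    if (!hash_equals(hash_hmac('sha256', $payload, $secret), $signature)) {
        return null;
    }
    $fields = explode('|', $payload);
    if (count($fields) < 4 || $now - (int) $fields[2] > $maxAge) {
        return null;
    }
    return $fields;
}
```

Any web server holding the secret can verify the cookie, so no shared session store is needed.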

Slide 17

Slide 17 text

Think before you session_start();
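A sketch of what "thinking first" might look like: only start a session when the visitor already has a session cookie, and release the lock as soon as the data has been read. The helper names are hypothetical; session_name(), session_start() and session_write_close() are standard PHP.

```php
<?php
// True only if the request carries a non-empty session cookie.
function has_session_cookie(array $cookies, string $name = 'PHPSESSID'): bool
{
    return isset($cookies[$name]) && $cookies[$name] !== '';
}

// Reads session data without holding the session lock for the whole request.
function read_session(array $cookies): array
{
    if (!has_session_cookie($cookies, session_name())) {
        // Anonymous visitor: no session, no lock, no Set-Cookie header,
        // so the response can still be cached.
        return [];
    }
    session_start();
    $data = $_SESSION;
    // Release the lock so parallel requests from this user are not serialised.
    session_write_close();
    return $data;
}
```

A request controller would call read_session($_COOKIE) instead of calling session_start() unconditionally.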

Slide 18

Slide 18 text

Caching. Not using intelligence when stupidity will do just fine.

Slide 19

Slide 19 text

How to cache
TTL based
§  Simple
§  Easy to implement
§  Users may be served stale content
§  Some users will occasionally have to wait for content to be regenerated
Update on change
§  Must have control over storage system
§  Need to know what to purge and when
§  Content is never stale
§  Updates can be pushed into the cache, so end users never wait

Slide 20

Slide 20 text

Where to cache Data storage Object model Request controllers Front controller / views Web server Edge cache Load balancer Network proxy User agent (browser)

Slide 21

Slide 21 text

3 Add query strings to enable far-future caching

Slide 22

Slide 22 text

Update-on-change without the purge
§  /lib/img/my_website_header.png
Expires: Sun, 17 Jan 2038 19:26:00 GMT
Cripes
But it’s OK - these are not the same object:
§  /lib/img/my_website_header.png?v=2
§  /lib/img/my_website_header.png?v=3
§  /lib/img/my_website_header.png?v=4

Slide 23

Slide 23 text

Result. §  Changing the filename or adding a query string will cause all browsers to re-request the file. §  All the benefits of long term caching §  No update latency §  Essentially an update-on-change strategy which is possible even though you can’t force the user’s cache to purge. Genius

Slide 24

Slide 24 text

How to remember to do it §  Don’t. §  Write this (or similar) instead: §  Then find/replace in a deploy script •  Use revision ID or unix timestamp §  What gets deployed: §  And it changes automatically every time you deploy new code! Tip
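The code snippets on this slide did not survive in the transcript, so the following is only a guess at the idea: templates carry a placeholder, and the deploy script replaces it with a revision ID or timestamp. The token `@@VERSION@@` and the function name are hypothetical.

```php
<?php
// Hypothetical placeholder; the token used on the original slide is not shown.
const VERSION_TOKEN = '@@VERSION@@';

// Run by the deploy script over each template or stylesheet.
function stamp_version(string $source, string $version): string
{
    return str_replace(VERSION_TOKEN, rawurlencode($version), $source);
}

// What a developer writes:
$template = '<img src="/lib/img/my_website_header.png?v=@@VERSION@@">';
// What gets deployed for (made-up) revision 1296:
$deployed = stamp_version($template, '1296');
```

Because the version changes on every deploy, every asset URL changes too, and far-future Expires headers stay safe.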

Slide 25

Slide 25 text

4 Use private directive to allow personal content to be cached

Slide 26

Slide 26 text

Caching customised pages
§  Can't allow customised pages to be cached publicly (disaster!)
§  But, just set 'private' in cache-control header:
Cache-control: max-age=86400, private
§  Page will be cached by end user's browser, but not by shared proxies
§  Always make use of browser cache when you can
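In PHP this could be wrapped in a small helper so that personalised pages can never be marked public by accident. A sketch; the function name is hypothetical, and the header value is the one from the slide.

```php
<?php
// Builds the Cache-Control header line for a page.
// Personalised pages are marked private: browsers may cache them,
// shared proxies and edge caches may not.
function cache_control(int $maxAge, bool $personalised): string
{
    $scope = $personalised ? 'private' : 'public';
    return "Cache-Control: max-age={$maxAge}, {$scope}";
}

// In a request controller, before any output:
// header(cache_control(86400, true));
```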

Slide 27

Slide 27 text

5 Use a load balancer to cache output

Slide 28

Slide 28 text

Load balancer vs edge caching
§  CDN might be overkill
•  But moving caching from load balancer level to edge level is easy later.
§  Easier to invalidate Varnish
•  Purge cache on write
•  Ability to use advanced purging logic
§  Varnish better for whole pages
§  Edge caching best for resources - where cache rules are simple
•  Edge caching providers: Edgecast, Amazon, Limelight, Akamai
(Diagram: the application stack from slide 3, highlighting the load balancer and edge cache levels)

Slide 29

Slide 29 text

Advanced purging using Varnish
§  A record (eg a post) will appear on lots of index pages as well as its own article page.
§  Add tokens to output headers in PHP:
header('X-PostIDs: '.join(',', $postids_array));
§  Cache server now knows which pages feature that post
§  When the post changes, purge all at once:
> telnet varnish1 80
Connected to varnish1.
PURGE /purge HTTP/1.1
X-PostID: 345
HTTP/1.1 204 Purged URL.
Server: Varnish

Slide 30

Slide 30 text

Are you cachable? §  http://www.ircache.net/cgi-bin/cacheability.py

Slide 31

Slide 31 text

6 Use memcached getMulti for cached list items

Slide 32

Slide 32 text

Caching post teasers
// Build one cache key per post, varying by role and user agent type
$keys = array();
foreach ($posts as $post) {
    $keys[$post->id] = "postteaser-" . $post->id . "-" . ($user->hasMinRole('member') ? 'member' : 'nonmember') . "-" . $uatype;
}
// Fetch every teaser in a single round trip
$cacheresults = $mc->getMulti($keys);
$op = '';
foreach ($posts as $post) {
    if (!empty($cacheresults[$keys[$post->id]])) {
        $op .= $cacheresults[$keys[$post->id]];
    } else {
        // Generate post teaser
        $teaserhtml = '...';
        $twoweeks = (60 * 60 * 24 * 14);
        $mc->set($keys[$post->id], $teaserhtml, $twoweeks);
        $op .= $teaserhtml;
    }
}

Slide 33

Slide 33 text

Invalidating on edit
$mc->delete('postteaser-'.$id.'-member-desktop');
$mc->delete('postteaser-'.$id.'-nonmember-desktop');
$mc->delete('postteaser-'.$id.'-member-mobile');
$mc->delete('postteaser-'.$id.'-nonmember-mobile');
$mc->delete('postteaser-'.$id.'-member-rss');
$mc->delete('postteaser-'.$id.'-nonmember-rss');
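The six deletes follow a pattern (two roles times three user agent types), so the key list could be generated rather than written out by hand. A sketch, assuming the roles and user agent types are exactly those shown on the slide; teaser_cache_keys() is a hypothetical name.

```php
<?php
// Returns every cache key under which a post teaser may be stored.
function teaser_cache_keys(int $id): array
{
    $keys = [];
    foreach (['desktop', 'mobile', 'rss'] as $uatype) {
        foreach (['member', 'nonmember'] as $role) {
            $keys[] = "postteaser-{$id}-{$role}-{$uatype}";
        }
    }
    return $keys;
}

// On edit, with $mc being the Memcached instance from the previous slide:
// foreach (teaser_cache_keys($id) as $key) {
//     $mc->delete($key);
// }
```

Adding a new role or user agent type then means changing one list, and the invalidation cannot drift out of step with the key format.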

Slide 34

Slide 34 text

Overburden your site with writes, and you're going nowhere fast.

Slide 35

Slide 35 text

7 Using hosted analytics to avoid logging

Slide 36

Slide 36 text

Don't aggregate data yourself
§  'Most viewed/emailed' widgets (screenshot: BBC News Online)
§  Thinking about doing this?
UPDATE content SET viewcount=viewcount+1 WHERE contentid=5309342;
BAD IDEA
§  Well, don't. You're writing on every page load!

Slide 37

Slide 37 text

Use Google Analytics instead §  GA:PI - Excellent GA API library for PHP §  Retrieve list of URL paths in order of descending page views §  But what about AJAX / downloads / JavaScript actions?

Slide 38

Slide 38 text

8 Prep content in advance to avoid thundering herds

Slide 39

Slide 39 text

Don't generate on demand §  Your Google Analytics based 'most viewed' content takes 2 seconds to generate §  You get 200 requests per second §  When the cached version expires, the next request will regenerate it §  But so will the next 399 requests! It's a thundering herd! §  Use a separate process to write to the cache, periodically or event driven, but not triggered by web requests. §  Scripts handling HTTP requests never write
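The split between "a separate process writes" and "web requests only read" might look like the sketch below. ArrayCache is a hypothetical in-memory stand-in for memcached so the example runs anywhere; the function names are also assumptions.

```php
<?php
// Stand-in for a shared cache such as memcached.
final class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Called from cron or an event-driven worker, never from a web request.
// The slow work (the 2 second generation) happens here, once.
function refresh_most_viewed(ArrayCache $cache, callable $generate): void
{
    $cache->set('most-viewed', $generate());
}

// Called from web requests: read-only. On a miss it returns a fallback
// instead of regenerating, so a herd of requests cannot pile up.
function most_viewed_html(ArrayCache $cache, string $fallback = ''): string
{
    $html = $cache->get('most-viewed');
    return $html === false ? $fallback : $html;
}
```

With this shape the entry is overwritten in place by the worker and never expires under load, so 400 concurrent requests cost 400 cache reads and zero regenerations.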

Slide 40

Slide 40 text

Learn to fail well, manage perception, and fix the problem.

Slide 41

Slide 41 text

What happens when errors occur §  Usually embarrassing, but managing perception is important §  Twitter's fail whale is surprisingly popular:

Slide 42

Slide 42 text

9 Show something sensible when your app falls over

Slide 43

Slide 43 text

Something more 'enterprisey'

Slide 44

Slide 44 text

10 Use custom error handlers to capture data
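A minimal sketch of such a handler using PHP's set_error_handler(). It collects context into an array purely for illustration; a real handler would write to a log, database or bug tracker. The fields captured are my choice, not the author's.

```php
<?php
// Collected error reports; illustration only.
$captured = [];

set_error_handler(
    function (int $errno, string $errstr, string $errfile, int $errline) use (&$captured): bool {
        $captured[] = [
            'time'    => time(),
            'level'   => $errno,
            'message' => $errstr,
            'file'    => $errfile,
            'line'    => $errline,
            // Request context makes the report reproducible later.
            'uri'     => $_SERVER['REQUEST_URI'] ?? '(cli)',
        ];
        // Returning true tells PHP the error has been handled.
        return true;
    }
);

// Any warning or notice now lands in $captured instead of the page output.
trigger_error('Cart lookup failed', E_USER_WARNING);
```

Fatal errors bypass set_error_handler(), so a complete setup would also need register_shutdown_function() and set_exception_handler().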

Slide 45

Slide 45 text

Investigate using data

Slide 46

Slide 46 text

Security is only as good as its weakest point.

Slide 47

Slide 47 text

A cautionary tale HBGary vs Anonymous

Slide 48

Slide 48 text

"They think I have nothing but a heirarchy based on IRC aliases! As 1337 as these guys are supposed to be they don't get it. I have pwned them! :)" Aaron Barr, CEO, HBGary

Slide 49

Slide 49 text

SQL injection: Download user table
§  Hbgaryfederal.com was susceptible to SQL injection
§  Anonymous gains access to user database
http://www.hbgaryfederal.com/pages.php?pageNav=2&page=27
becomes
http://www.hbgaryfederal.com/pages.php?pageNav=2%3B%20SELECT%20%2A%20FROM%20users%3B&page=27

Slide 50

Slide 50 text

Rainbow tables: decrypt passwords
§  Users table contained hashed passwords
§  BUT:
•  Passwords were only hashed once
•  No salt was used
•  The CEO's password was an eight-character string comprising six letters and two numbers
§  Rainbow table decoded almost all the passwords instantly
§  CEO's account gave admin access to the CMS
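For contrast, a sketch of salted, deliberately slow password hashing using PHP's built-in password_hash() and password_verify(). These require PHP 5.5 or later, which postdates this talk, so this is a present-day illustration rather than what the slides recommended; the wrapper names are hypothetical.

```php
<?php
// Hashes a password with a random per-password salt and a tunable cost.
// PASSWORD_DEFAULT currently selects bcrypt.
function hash_password(string $plain): string
{
    return password_hash($plain, PASSWORD_DEFAULT);
}

// Verifies a login attempt against the stored hash. The salt and cost are
// read back from the stored string itself.
function check_password(string $plain, string $stored): bool
{
    return password_verify($plain, $stored);
}
```

Because every hash has its own salt, identical passwords produce different hashes and a precomputed rainbow table is useless.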

Slide 51

Slide 51 text

Password reuse: gain SSH access
§  COO's password for SSH was the same as the cracked CMS password
§  Server had an unpatched privilege escalation vulnerability
•  A patch had been available for 6 months
§  Anonymous gain root access

Slide 52

Slide 52 text

More password reuse: Email §  The HBGary CEO used Google Apps for email §  Same password as the cracked CMS §  His account was the administrator of the domain §  Anonymous gain access to HBGary and rootkit.com owner Greg Hoglund's mail account §  Account contains root password for rootkit.com server §  But you can't log in directly as root

Slide 53

Slide 53 text

Social engineering: get an account
From: Greg
To: Jussi
Subject: need to ssh into rootkit
im in europe and need to ssh into the server. can you drop open up firewall and allow ssh through port 59022 or something vague? and is our root password still 88j4bb3rw0cky88 or did we change to 88Scr3am3r88 ? thanks
From: Jussi
To: Greg
Subject: Re: need to ssh into rootkit
ok, it should now accept from anywhere to 47152 as ssh. i am doing testing so that it works for sure. your password is changeme123 i am online so just shoot me if you need something. in europe, but not in finland? :-) _jussi

Slide 54

Slide 54 text

Result
§  All corporate email published to the world
§  Complete user databases with cleartext passwords for both HBGary and rootkit.com published
§  All backups and main website erased
§  Anonymous publishes a statement... on HBGary's own site.

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

18 days later

Slide 57

Slide 57 text

11 Filter inputs and outputs to avoid XSS and SQL attacks

Slide 58

Slide 58 text

Filtering input, encoding output
§  Filter and validate inputs
•  HTMLPurifier (http://htmlpurifier.org/)
•  PHP's filter_var() and mysql_real_escape_string()
•  Validate HTML against DTDs, particularly in publishing systems
•  Try making your own DTD to exclude the stuff you don't like
§  Encode outputs
•  htmlentities, htmlspecialchars
§  Store raw: storing data pre-encoded for display presupposes where you are going to display it
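A short sketch of both halves, using filter_var() on the way in and htmlspecialchars() on the way out. The helper names clean_page_id() and e() are hypothetical. Note that mysql_real_escape_string() was removed in PHP 7; parameterised queries through PDO or mysqli are the usual replacement today.

```php
<?php
// Input: accept only a positive integer for parameters such as pageNav.
// Returns null for anything else, including injection attempts.
function clean_page_id($raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

// Output: encode at display time, for the context being displayed in (HTML here).
// The stored value stays raw.
function e(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}
```

Applied to the HBGary URL from the earlier slide, clean_page_id() rejects the injected pageNav value outright instead of passing it to the database.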

Slide 59

Slide 59 text

12 Use SSL throughout to avoid session hijacking

Slide 60

Slide 60 text

Firesheep

Slide 61

Slide 61 text

How to actually be secure §  Disable browser autocomplete on login forms §  Set session cookies with secure flag §  Serve whole site over HTTPS

Slide 62

Slide 62 text

13 Don't be too helpful if it exposes private data

Slide 63

Slide 63 text

Exposing existence of accounts

Slide 64

Slide 64 text

Being friendly means being local, because not everyone is like you.

Slide 65

Slide 65 text

14 Use DateTime to simplify time zone problems

Slide 66

Slide 66 text

Avoiding dating disaster
§  A time is meaningless without a zone
§  Your application may rely on more than one setting
•  OS, Database, PHP config
§  Daylight saving time changes many time zone offsets twice a year
•  Date comparisons also require historical DST records
§  May want to show end user the time in their local zone
§  Work out what to store (UTC is good)
§  Use PHP's DateTime object whenever you are handling dates, and read Derick's 'dating manual'
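A sketch of the "store UTC, display local" pattern with DateTime and DateTimeZone. The function name and the example zone are assumptions for illustration.

```php
<?php
// Converts a stored UTC timestamp string to the user's zone for display.
// The zone is always passed explicitly, so the result does not depend on
// the OS, database or php.ini defaults.
function to_user_time(string $utc, string $zone): string
{
    $dt = new DateTime($utc, new DateTimeZone('UTC'));
    $dt->setTimezone(new DateTimeZone($zone));
    return $dt->format('Y-m-d H:i T');
}

// The same wall-clock UTC time lands on different local offsets in
// summer and winter; DateTime applies the DST rules for that date.
$summer = to_user_time('2011-07-01 12:00:00', 'Europe/London');
$winter = to_user_time('2011-01-01 12:00:00', 'Europe/London');
```

Here $summer is one hour ahead of UTC and $winter is not, without any offset arithmetic in application code.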

Slide 67

Slide 67 text

Never stop testing, because the cock-ups only get bigger

Slide 68

Slide 68 text

15 Use PHPCodeSniffer and PHPLint pre-commit to check coding standards

Slide 69

Slide 69 text

PHPCodeSniffer
§  Comes with lots of 'sniffs', but easy to write your own or customise
Running pre_commit hooks
[Code sniff precommit hook]
FILE: /home/andrew/test
-------------------------------------------------------------------
FOUND 1 ERROR(S) AND 0 WARNING(S) AFFECTING 1 LINE(S)
-------------------------------------------------------------------
 339 | ERROR | Use of count() or sizeof() prohibited in while expression
-------------------------------------------------------------------

Slide 70

Slide 70 text

Checking syntax
§  Sometimes editors will do it for you
§  If not, easy to do it with PHP binary:
php -l myfile.php
Parse error: syntax error, unexpected T_ECHO in myfile.php on line 21
Errors parsing myfile.php
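The same check could be scripted for a pre-commit hook. A sketch, assuming the hook is itself a PHP CLI script and that exec() is enabled; lint_file() is a hypothetical name, while `php -l` and the PHP_BINARY constant are standard.

```php
<?php
// Runs "php -l" on one file and reports whether it parsed cleanly.
// PHP_BINARY is the path of the interpreter running this script.
function lint_file(string $path): bool
{
    $cmd = escapeshellarg(PHP_BINARY) . ' -l ' . escapeshellarg($path) . ' 2>&1';
    exec($cmd, $output, $status);
    // php -l exits with 0 when no syntax errors are detected.
    return $status === 0;
}

// In a hook, reject the commit if any changed file fails:
// foreach ($changedFiles as $file) {
//     if (!lint_file($file)) { exit(1); }
// }
```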

Slide 71

Slide 71 text

16 Run continuous integration testing so you don't forget

Slide 72

Slide 72 text

CI tools
§  Hudson, CruiseControl+PHPUnderControl, PHPUnit, Selenium
§  Run tests automatically after every commit
§  Set up alerts by email / status screen

Slide 73

Slide 73 text

Finally, if you must. Open the doors

Slide 74

Slide 74 text

Thanks §  [email protected] §  We're hiring PHP experts, JavaScript gurus and UX masters. §  assanka.net/jobs §  http://www.flickr.com/photos/57158820@N00/920872985 §  http://www.flickr.com/photos/exotictransport/163976659/ §  http://www.flickr.com/photos/erazmilic/178574918/