Filtering and Escaping
Filtering and escaping are sometimes conflated, with side effects such as stray backslashes or HTML entities turning up in stored data. This is all very well and good, but what happens if you want to put that data into an email message or save it out to a text document? “Strip out all tags!” can also result in mangled content, e.g. <3, <( ^o^)>, 1 < x < 5, “if foo <> bar then”.
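To make the difference concrete, here is a minimal sketch comparing strip_tags() with htmlentities() on the sample strings above. The surrounding words in 'I <3 you' and the $results array are made up for illustration.

```php
<?php
// Sample strings based on the examples above ("I ... you" is added for illustration).
$samples = ['I <3 you', '<( ^o^)>', '1 < x < 5', 'if foo <> bar then'];

$results = [];
foreach ($samples as $original) {
    $results[$original] = [
        // Filtering: strip_tags() guesses what counts as a tag and throws it away for good.
        'stripped' => strip_tags($original),
        // Escaping: htmlentities() only repackages the text for HTML output.
        'escaped'  => htmlentities($original, ENT_QUOTES, 'UTF-8'),
    ];
    // Escaping is reversible, so the stored copy can stay exactly as the user typed it.
    $results[$original]['restored'] = html_entity_decode(
        $results[$original]['escaped'],
        ENT_QUOTES,
        'UTF-8'
    );
}

foreach ($results as $original => $row) {
    printf("%s | stripped: %s | escaped: %s\n", $original, var_export($row['stripped'], true), $row['escaped']);
}
```

The stripped column shows the lost content (text following "<3" is typically treated as an unterminated tag and dropped), while the escaped version always decodes back to the original string.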
Validation & Filtering: checking for and getting rid of the nasties.
Checking that data is of the correct type, e.g. email addresses, postcodes, message text.
Stripping out control characters, fixing multibyte encoding shenanigans with iconv().

Escaping: packaging data up for transport.
mysql_real_escape_string() for MySQL strings.
htmlentities($x, ENT_QUOTES, 'UTF-8') for HTML.
urlencode() for query params.
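A rough sketch of how the two jobs can be kept apart in code: filter once when data comes in, escape separately for each transport on the way out. All function names here are hypothetical, and the control-character pattern is only one possible choice. The MySQL case is left out because mysql_real_escape_string() needs a live database connection.

```php
<?php
// --- Validation & filtering: done once, on the way in ---

// Hypothetical helper: is this a plausible email address?
function is_valid_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Hypothetical helper: clean up free-form message text.
function clean_message_text(string $raw): string
{
    // Ask iconv() to drop byte sequences that are not valid UTF-8.
    // Some iconv builds return false here instead; fall back to an empty string.
    $utf8 = iconv('UTF-8', 'UTF-8//IGNORE', $raw);
    if ($utf8 === false) {
        $utf8 = '';
    }
    // Strip control characters, keeping tab, newline and carriage return.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $utf8) ?? '';
}

// --- Escaping: done per transport, on the way out ---

function escape_for_html(string $value): string
{
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

function escape_for_query(string $value): string
{
    return urlencode($value);
}

$text = clean_message_text("1 < x < 5\x07");
$html = '<p>' . escape_for_html($text) . '</p>';
$link = 'search.php?q=' . escape_for_query($text);
```

The stored value in $text stays as plain text; only the copies headed for HTML or a URL are escaped, so the same data can still go into an email or a text file unchanged.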
Why don't we just kill any tags we find?
<IMG SRC=javascript:alert('a')>
<img src=javascript:alert("a")>
<img """><script>alert('a')">
ipt:aler t('XSS')> ipt:aler t('XSS')>
Yeah, no. The transport is HTML; package it appropriately. Using htmlentities($xsslol, ENT_QUOTES, 'UTF-8') will neuter most of this stuff. Use it even on the things you “trust”, like $_SERVER['PHP_SELF'] or REQUEST_URI. It gets hard when you need to put user data into src="" and style="" attributes; use a whitelist there instead, no matter how much of a pain it is to implement. (Or, in the case of images and other files, generate the filenames for them yourself.)
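A minimal sketch of the three habits above: escape on output, whitelist attribute values, and generate filenames yourself. The helper names h(), safe_text_align() and generated_image_name(), and the particular whitelists, are assumptions made for illustration. random_bytes() needs PHP 7 or later.

```php
<?php
// Hypothetical shorthand for the htmlentities() call used in the text.
function h(string $value): string
{
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

// Whitelist for a value destined for style="": anything not on the
// list is replaced by a safe default rather than "cleaned up".
function safe_text_align(string $userValue): string
{
    $allowed = ['left', 'right', 'center', 'justify'];
    return in_array($userValue, $allowed, true) ? $userValue : 'left';
}

// For uploaded files, ignore the user's filename and generate one.
// Returns null when the extension is not on the (assumed) whitelist.
function generated_image_name(string $extension): ?string
{
    $allowed = ['png', 'jpg', 'gif'];
    if (!in_array($extension, $allowed, true)) {
        return null;
    }
    return bin2hex(random_bytes(16)) . '.' . $extension;
}

// Even "trusted" server variables get escaped before they reach HTML.
$self = $_SERVER['PHP_SELF'] ?? '';
$form = '<form method="post" action="' . h($self) . '">';

// One of the payloads shown earlier, rendered harmless as plain text.
$comment = '<p style="text-align: ' . safe_text_align('center') . '">'
    . h("<IMG SRC=javascript:alert('a')>")
    . '</p>';
```

The whitelist never tries to repair a bad value; it only picks between known-good ones, which is what makes it safe for attributes where escaping alone is not enough.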