This presentation explores common mistakes made by programmers when dealing with Unicode support and character encodings on the Web. For each mistake, I explain how to fix/prevent it, but also how it could possibly be exploited.
Iterate over all symbols in a string
function getSymbols(string) {
  var length = string.length;
  var index = -1;
  var output = [];
  var character;
  var charCode;
  while (++index < length) {
    character = string.charAt(index);
    charCode = character.charCodeAt(0);
    if (charCode >= 0xD800 && charCode <= 0xDBFF) {
      // note: this doesn’t account for lone high surrogates
      output.push(character + string.charAt(++index));
    } else {
      output.push(character);
    }
  }
  return output;
}
// '\uD83D\uDCA9' is U+1F4A9, used here as an example astral symbol
var symbols = getSymbols('\uD83D\uDCA9');
symbols.forEach(function(symbol) {
  assert(symbol == '\uD83D\uDCA9');
});
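The function above pairs every high surrogate with whatever code unit follows it. A minimal sketch of a variant that checks for a matching low surrogate first is shown below; the name `getSymbolsSafe` is hypothetical and this is not the author's code, only one way to close the gap the comment mentions.

```javascript
// Hypothetical variant of getSymbols that leaves lone surrogates as
// single-code-unit symbols instead of gluing them to the next code unit.
function getSymbolsSafe(string) {
  var output = [];
  var index = 0;
  while (index < string.length) {
    var first = string.charCodeAt(index);
    var symbol = string.charAt(index);
    if (first >= 0xD800 && first <= 0xDBFF && index + 1 < string.length) {
      var second = string.charCodeAt(index + 1);
      if (second >= 0xDC00 && second <= 0xDFFF) {
        // Valid surrogate pair: consume both code units as one symbol.
        symbol += string.charAt(index + 1);
      }
    }
    output.push(symbol);
    index += symbol.length;
  }
  return output;
}

// In ES2015+ engines the built-in string iterator also walks by code point.
var viaIterator = Array.from('a\uD83D\uDCA9b');
```

Both approaches treat a lone surrogate as its own symbol, so callers still need to decide whether such input is acceptable.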
var data = '"Hello\u2028"';
// JSON-formatted data containing a string
// containing an (unescaped!) Line Separator
eval('(' + data + ')');
// → SyntaxError: Unexpected token ILLEGAL
// (in engines predating ES2019, which allowed U+2028 in string literals)
JSON.parse(data);
// → 'Hello\u2028'
var data = 'foo\u2028';
var serialized = JSON.stringify(data);
// → '"foo\u2028"' (contains the raw, unescaped
// Unicode symbol)
var escaped = jsesc(data, { 'json': true });
// → '"foo\\u2028"' (contains an escape sequence
// for the Unicode symbol → safer)
JSON.parse(serialized) == JSON.parse(escaped);
// → true (both strings unserialize to the same value)
https://mths.be/jsesc
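Where pulling in a library is not an option, the same escaping can be approximated by post-processing the output of `JSON.stringify`. The sketch below assumes the only goal is to escape U+2028 and U+2029; the helper name `stringifyForScript` is hypothetical, and it does not address other embedding hazards such as `</script>` sequences.

```javascript
// Hypothetical helper: serialize to JSON, then replace the two line
// terminators that JSON allows raw but older JavaScript parsers reject.
function stringifyForScript(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

var data = 'foo\u2028';
var escaped = stringifyForScript(data);
// `escaped` now holds an escape sequence instead of the raw symbol,
// and it still parses back to the original value.
var roundTrips = JSON.parse(escaped) === data;
```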
var string = String.fromCharCode(0xD800);
// a string containing an (unescaped!)
// lone surrogate
var data = JSON.stringify(string);
// the same string as JSON-formatted data
storeInDatabaseAsUtf8(data);
// → error/crash
sendOverWebSocketConnection(data);
// → error/crash/DoS
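One way to avoid handing such strings to a UTF-8 sink is to check for lone surrogates first. The following is a minimal sketch; `hasLoneSurrogate` is a hypothetical helper name, and newer engines (ES2024) offer `String.prototype.isWellFormed()` for the same purpose.

```javascript
// Hypothetical helper: returns true if the string contains a surrogate
// code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(string) {
  for (var index = 0; index < string.length; index++) {
    var code = string.charCodeAt(index);
    if (code >= 0xD800 && code <= 0xDBFF) {
      var next = string.charCodeAt(index + 1); // NaN at the end of the string
      if (next >= 0xDC00 && next <= 0xDFFF) {
        index++; // valid pair: skip the low surrogate
        continue;
      }
      return true; // high surrogate without a following low surrogate
    }
    if (code >= 0xDC00 && code <= 0xDFFF) {
      return true; // low surrogate without a preceding high surrogate
    }
  }
  return false;
}
```

A caller could reject or sanitize the input when this returns true, before the value reaches the database or the socket.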
var data = 'foo\uD800';
var serialized = JSON.stringify(data);
// → '"foo\uD800"' (contains the raw, unescaped
// Unicode symbol; engines implementing ES2019’s
// well-formed JSON.stringify escape it instead)
var escaped = jsesc(data, { 'json': true });
// → '"foo\\uD800"' (contains an escape sequence
// for the Unicode symbol → safer)
JSON.parse(serialized) == JSON.parse(escaped);
// → true (both strings unserialize to the same value)
https://mths.be/jsesc
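For engines whose `JSON.stringify` still emits raw lone surrogates, the escaping can be sketched without a library as below. The helper name `stringifyWellFormed` is hypothetical; it emits lowercase hex digits, matching what ES2019's well-formed `JSON.stringify` produces, so the result should be the same in old and new engines.

```javascript
// Hypothetical helper: escape any surrogate that is not part of a valid
// pair, leaving properly paired astral symbols untouched.
function stringifyWellFormed(value) {
  return JSON.stringify(value).replace(
    /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    function(match) {
      if (match.length === 2) {
        return match; // valid surrogate pair: keep as-is
      }
      // Lone surrogate: replace with a \uXXXX escape sequence.
      return '\\u' + match.charCodeAt(0).toString(16);
    }
  );
}

var safe = stringifyWellFormed('foo\uD800');
var sameValue = JSON.parse(safe) === 'foo\uD800';
```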
CVE-2013-4338
“wp-includes/functions.php in WordPress before 3.6.1 does not properly determine whether data has been serialize()d, which allows remote attackers to execute arbitrary code by triggering erroneous PHP unserialize() operations.”
https://mths.be/brq
WordPress
Before writing data to the database, WordPress serializes it only if it’s an array or an object, or if is_serialized($data) returns true (double serialization).
After retrieving data from the database, WordPress unserializes it only if is_serialized($data) returns true.
https://mths.be/brq
WordPress uses MySQL’s ✌utf8✌ character set, which stores at most three bytes per symbol and so cannot represent astral symbols (those need four bytes in UTF-8).
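To see why a three-byte character set matters here, the sketch below simulates the commonly described behavior of such a column in non-strict mode: the stored value is cut off at the first astral symbol. This is a simulation under that assumption, not MySQL's actual code, and both function names are hypothetical.

```javascript
// Matches one astral symbol, i.e. a valid surrogate pair in UTF-16.
var astralSymbol = /[\uD800-\uDBFF][\uDC00-\uDFFF]/;

// Simulation only: mimics truncation at the first symbol that would
// need four bytes in UTF-8.
function simulateUtf8mb3Truncation(string) {
  var match = astralSymbol.exec(string);
  return match ? string.slice(0, match.index) : string;
}

// Defensive check an application could run before writing to such a column.
function containsAstralSymbol(string) {
  return astralSymbol.test(string);
}

var input = 'payload\uD83D\uDCA9; trailing data';
var stored = simulateUtf8mb3Truncation(input);
// `stored` differs from `input`, so a check that passed on the input
// may no longer hold for the value read back from the database.
```

Rejecting input for which `containsAstralSymbol` returns true, or using a four-byte character set such as utf8mb4, avoids the mismatch between what was validated and what was stored.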
CVE-2015-8562
“Joomla! 1.5.x, 2.x, and 3.x before 3.4.6 allow remote attackers to conduct PHP object injection attacks and execute arbitrary PHP code via the HTTP User-Agent header, as exploited in the wild in December 2015.”
https://mths.be/bvg
Exploit
1. Serialize a specially crafted object containing the PHP code to be executed.
2. Use that as the HTTP User-Agent header value, with an astral symbol (for example U+1F4A9) as suffix.
https://mths.be/bvh