
Hacking with Unicode

This presentation explores common mistakes programmers make when dealing with Unicode support and character encodings on the Web. For each mistake, I explain how to fix or prevent it, and how it could be exploited.

Event: HackPra — the hacking lecture at the Ruhr University in Bochum
Video: https://www.youtube.com/watch?v=qFfjJ8pOrWY&hd=1 (use these slides though, not the ones in the video)
Links: http://lanyrd.com/2014/hackpra/sczxgz/

2016 update: https://speakerdeck.com/mathiasbynens/hacking-with-unicode-in-2016

Mathias Bynens

May 21, 2014

Transcript

  1. @mathias · #hackpra Hacking with Unicode

  2. @mathias · #hackpra 1337 unic0de h4xx

  3. @mathias

  4. What we’ll cover

    1. Unicode
    2. Encodings for Unicode
    3. Unicode in JavaScript
    4. Unicode in MySQL
    5. Hacking with Unicode
  5. 1. Unicode

  6. code point unique name symbol/glyph

  7. A LATIN CAPITAL LETTER A U+0041

  8. a LATIN SMALL LETTER A U+0061

  9. © COPYRIGHT SIGN U+00A9

  10. ☃ SNOWMAN U+2603

  11. 💩 PILE OF POO U+1F4A9

  12. U+000000 → U+10FFFF

  13. (0x10FFFF + 1) code points

    ↓

    17 planes, (0xFFFF + 1) code points each
  14. Unicode plane #1: U+0000 → U+FFFF

    Basic Multilingual Plane
  15. Unicode planes #2-17: U+010000 → U+10FFFF

    supplementary planes (astral planes)
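The plane arithmetic can be checked directly. The following is a minimal sketch; `planeOf` is a hypothetical helper name, and it uses the conventional zero-based plane index (plane 0 is the BMP), whereas the slides count planes from #1.

```javascript
// Sketch: locate a code point within Unicode's 17 planes.
// `planeOf` is a hypothetical helper name, not from the talk.
const PLANE_SIZE = 0xFFFF + 1;          // 65,536 code points per plane
const TOTAL_CODE_POINTS = 0x10FFFF + 1; // 1,114,112 code points overall

function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a Unicode code point: ' + codePoint);
  }
  // Zero-based plane index: 0 is the BMP, 1 through 16 are astral planes.
  return Math.floor(codePoint / PLANE_SIZE);
}

console.log(TOTAL_CODE_POINTS / PLANE_SIZE); // 17
console.log(planeOf(0x2603));                // 0
console.log(planeOf(0x1F4A9));               // 1
```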
  16. 2. Encodings

  17. Number of bytes per code point

      Code point range      UTF-8  UTF-16  UTF-32  UTF-EBCDIC  GB 18030
      U+000000 – U+00007F   1      2       4       1           1
      U+000080 – U+00009F   2      2       4       1           2 or 4 *
      U+0000A0 – U+0003FF   2      2       4       2           2 or 4 *
      U+000400 – U+0007FF   2      2       4       3           2 or 4 *
      U+000800 – U+003FFF   3      2       4       3           2 or 4 *
      U+004000 – U+00FFFF   3      2       4       4           2 or 4 *
      U+010000 – U+03FFFF   4      4       4       4           4
      U+040000 – U+10FFFF   4      4       4       5           4

      * 2 for characters inherited from GB 2312/GBK (e.g. most Chinese characters); 4 for everything else
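The UTF-8 and UTF-16 columns of the table can be reproduced with a few comparisons. This is a rough sketch rather than a full encoder: the function names are hypothetical, surrogate code points are not rejected, and `TextEncoder` (assumed to be available as a global) is used only as a cross-check.

```javascript
// Sketch: UTF-8 and UTF-16 byte counts per code point, following the table.
// Function names are hypothetical; TextEncoder is used only as a cross-check.
function utf8Length(codePoint) {
  if (codePoint <= 0x7F) return 1;
  if (codePoint <= 0x7FF) return 2;
  if (codePoint <= 0xFFFF) return 3;
  return 4; // U+010000 through U+10FFFF
}

function utf16Length(codePoint) {
  return codePoint <= 0xFFFF ? 2 : 4; // astral symbols need a surrogate pair
}

const encoder = new TextEncoder(); // always encodes to UTF-8
for (const cp of [0x41, 0xA9, 0x2603, 0x1F4A9]) {
  const actual = encoder.encode(String.fromCodePoint(cp)).length;
  console.log(cp.toString(16), utf8Length(cp), utf16Length(cp), actual);
}
```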
  20. IETF RFCs

  21. IETF RFCs ✘

  25. 3. Unicode in JavaScript http://mths.be/jsu

  28. JavaScript has a Unicode problem http://mths.be/jsu

  30. Hexadecimal escape sequences

      >> '\x41\x42\x43'
      'ABC'
      >> '\x61\x62\x63'
      'abc'

    can be used for U+0000 → U+00FF
  31. Unicode escape sequences

      >> '\u0041\u0042\u0043'
      'ABC'
      >> 'I \u2661 JavaScript!'
      'I ♡ JavaScript!'

    can be used for U+0000 → U+FFFF
  32. …what about astral code points?

  33. …what about 💩?*

    * …and other, equally important astral symbols
  34. 💩

  35. Unicode code point escapes (ES6)

      >> '\u{41}\u{42}\u{43}'
      'ABC'
      >> '\u{1F4A9}'
      '💩' // U+1F4A9

    can be used for U+000000 → U+10FFFF
  36. Surrogate pairs

      >> '\uD83D\uDCA9'
      '💩' // U+1F4A9

    can be used for U+010000 → U+10FFFF
  37. Surrogate pairs

      // for astral code points (> 0xFFFF)
      function getSurrogates(codePoint) {
        var high = Math.floor((codePoint - 0x10000) / 0x400) + 0xD800;
        var low = (codePoint - 0x10000) % 0x400 + 0xDC00;
        return [ high, low ];
      }

      function getCodePoint(high, low) {
        var codePoint = (high - 0xD800) * 0x400 + low - 0xDC00 + 0x10000;
        return codePoint;
      }

      >> getSurrogates(0x1F4A9); // U+1F4A9 is 💩
      [ 0xD83D, 0xDCA9 ]
      >> getCodePoint(0xD83D, 0xDCA9);
      0x1F4A9

    http://mths.be/bed
  38. JavaScript string length

      >> 'A'.length // U+0041
      1
      >> 'A' == '\u0041'
      true
      >> 'B'.length // U+0042
      1
      >> 'B' == '\u0042'
      true
  39. String length ≠ char count

      >> '𝐀'.length // U+1D400
      2
      >> '𝐀' == '\uD835\uDC00'
      true
      >> '𝐁'.length // U+1D401
      2
      >> '𝐁' == '\uD835\uDC01'
      true
  40. String length ≠ char count

      >> '💩'.length // U+1F4A9
      2
      >> '💩' == '\uD83D\uDCA9'
      true

    insert obligatory “number two” joke here
  41. Real-world example

  45. String character count

      function countSymbols(string) {
        return punycode.ucs2.decode(string).length;
      }

      >> countSymbols('A') // U+0041
      1
      >> countSymbols('𝐀') // U+1D400
      1
      >> countSymbols('💩') // U+1F4A9
      1

    http://mths.be/punycode
  46. String character count (ES6)

      function countSymbols(string) {
        return Array.from(string).length;
      }

      >> countSymbols('A') // U+0041
      1
      >> countSymbols('𝐀') // U+1D400
      1
      >> countSymbols('💩') // U+1F4A9
      1
  47. JavaScript escape sequences http://mths.be/bmf

  49. If we’re being pedantic…

      // it’s actually even more complicated:
      >> 'mañana' == 'mañana'
      false
      >> 'ma\xF1ana' == 'man\u0303ana'
      false
      >> 'ma\xF1ana'.length
      6
      >> 'man\u0303ana'.length
      7
  50. Unicode normalization (ES6)

      function countSymbolsPedantically(string) {
        // Unicode Normalization, NFC form:
        var normalized = string.normalize('NFC');
        // Account for astral symbols / surrogates:
        return Array.from(normalized).length;
      }

      >> countSymbolsPedantically('mañana') // U+00F1
      6
      >> countSymbolsPedantically('mañana') // U+006E + U+0303
      6

    http://git.io/unorm
  51. Perfect? >> var zalgo = 'H ̹̙̦̮͉̩̗̗ ͧ̇̏̊̾ Eͨ͆͒̆ͮ̃ ͏̷̮̣̫̤̣

    ̵̞̹̻ ̀̉̓ͬ͑͡ ͅ Cͯ̂͐ ͏̨̛͔̦̟͈̻ O ̜͎͍͙͚̬̝̣ ̽ͮ͐͗̀ͤ̍̀ ͢ M ̴̡̲̭͍͇̼̟̯̦ ̉̒͠ Ḛ̛̙̞̪̗ ͥ ͤͩ̾͑̔͐ ͅ Ṯ̴̷̷̗̼͍ ̿̿̓̽͐ H ̙̙ ̔̄ ͜ ';
  52. Perfect? Nope.

      >> countSymbolsPedantically(zalgo)
      116 // not 9

    → can be ‘fixed’ using epic regex-fu
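One possible reading of the “regex-fu” remark is to drop combining marks before counting. The sketch below is an approximation under stated assumptions: `countSymbolsIgnoringMarks` is a hypothetical name, and `\p{M}` relies on Unicode property escapes, which were added to the language after ES6.

```javascript
// Sketch only: approximate "visible symbol" counting by dropping combining
// marks. `countSymbolsIgnoringMarks` is a hypothetical name; \p{M} needs
// Unicode property escapes in the regex engine (added after ES6).
function countSymbolsIgnoringMarks(string) {
  const withoutMarks = string.normalize('NFC').replace(/\p{M}/gu, '');
  return Array.from(withoutMarks).length;
}

console.log(countSymbolsIgnoringMarks('man\u0303ana'));         // 6
console.log(countSymbolsIgnoringMarks('Z\u0351\u036BA\u0334')); // 2
console.log(countSymbolsIgnoringMarks('\uD83D\uDCA9'));         // 1
```

This is still not a real grapheme cluster count: it also drops spacing marks that some scripts depend on, and it does not handle joiner sequences.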
  57. Reversing a string in JavaScript

      // naive solution
      function reverse(string) {
        return string.split('').reverse().join('');
      }

      >> reverse('abc')
      'cba'
      >> reverse('mañana') // U+00F1
      'anañam'
      >> reverse('mañana') // U+006E + U+0303
      'anãnam'
      >> reverse('💩') // U+1F4A9
      '��' // '\uDCA9\uD83D', the surrogate pair for 💩, in the wrong order
  58. “I put my thang down, flip it, and reverse it”

    — Missy ‘Misdemeanor’ Elliott, 2002
  59. Reversing a string in JavaScript

      // Using the Esrever library
      var reverse = esrever.reverse;

      >> reverse('abc')
      'cba'
      >> reverse('mañana') // U+00F1
      'anañam'
      >> reverse('mañana') // U+006E + U+0303
      'anañam'
      >> reverse('💩') // U+1F4A9
      '💩'

    http://mths.be/esrever
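For comparison, reversing by code point instead of by UTF-16 code unit fixes the astral case but not the combining-mark case. This is a minimal sketch and not how Esrever is implemented; `reverseCodePoints` is a hypothetical helper.

```javascript
// Sketch: reverse by code point rather than by UTF-16 code unit.
// This is not how Esrever is implemented; `reverseCodePoints` is a
// hypothetical helper that fixes astral symbols but not combining marks.
function reverseCodePoints(string) {
  // Array.from iterates by code point, so surrogate pairs stay intact.
  return Array.from(string).reverse().join('');
}

console.log(reverseCodePoints('abc'));            // 'cba'
console.log(reverseCodePoints('a\uD83D\uDCA9b')); // 'b\uD83D\uDCA9a'
console.log(reverseCodePoints('man\u0303ana'));   // 'ana\u0303nam' (the tilde moves to the 'a')
```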
  60. This behavior affects other string methods, too.

  61. String.fromCharCode()

      >> String.fromCharCode(0x0041) // U+0041
      'A' // U+0041
      >> String.fromCharCode(0x1F4A9) // U+1F4A9
      '\uF4A9' // U+F4A9

    only works as you’d expect for U+0000 → U+FFFF
  62. String.fromCharCode()

    → use surrogate pairs for astral symbols:

      >> String.fromCharCode(0xD83D, 0xDCA9)
      '💩' // U+1F4A9

    → or just use Punycode.js:

      >> punycode.ucs2.encode([ 0x1F4A9 ])
      '💩' // U+1F4A9
  63. String.fromCodePoint() (ES6)

      >> String.fromCodePoint(0x1F4A9)
      '💩'

    can be used for U+000000 → U+10FFFF
    http://mths.be/fromcodepoint
  64. String#charAt()

      >> '💩'.charAt(0) // U+1F4A9
      '\uD83D' // U+D83D

  65. String#at() (proposed for ES7)

      >> '💩'.at(0) // U+1F4A9
      '💩' // U+1F4A9

    http://mths.be/at

  66. String#charCodeAt()

      >> '💩'.charCodeAt(0) // U+1F4A9
      0xD83D

  67. String#codePointAt() (ES6)

      >> '💩'.codePointAt(0) // U+1F4A9
      0x1F4A9

    http://mths.be/codepointat

  68. Iterate over all symbols in a string

      function getSymbols(string) {
        var length = string.length;
        var index = -1;
        var output = [];
        var character;
        var charCode;
        while (++index < length) {
          character = string.charAt(index);
          charCode = character.charCodeAt(0);
          if (charCode >= 0xD800 && charCode <= 0xDBFF) {
            // note: this doesn’t account for lone high surrogates
            output.push(character + string.charAt(++index));
          } else {
            output.push(character);
          }
        }
        return output;
      }

      var symbols = getSymbols('💩');
      symbols.forEach(function(symbol) {
        assert(symbol == '💩');
      });
  69. Iterate over all symbols in a string (ES6)

      for (let symbol of '💩') {
        assert(symbol == '💩');
      }
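The `getSymbols` function on slide 68 notes that it does not account for lone high surrogates. A possible variant that does is sketched here; `getSymbolsSafe` is a hypothetical name, and a high surrogate is only paired when a low surrogate actually follows it.

```javascript
// Sketch: a getSymbols variant in which a high surrogate is only paired
// when a low surrogate actually follows it. `getSymbolsSafe` is a
// hypothetical name.
function getSymbolsSafe(string) {
  const output = [];
  for (let index = 0; index < string.length; index++) {
    const unit = string.charCodeAt(index);
    const next = string.charCodeAt(index + 1); // NaN past the end
    const isPair = unit >= 0xD800 && unit <= 0xDBFF &&
                   next >= 0xDC00 && next <= 0xDFFF;
    if (isPair) {
      output.push(string.charAt(index) + string.charAt(index + 1));
      index++; // skip the low surrogate that was just consumed
    } else {
      output.push(string.charAt(index)); // BMP symbol or lone surrogate
    }
  }
  return output;
}

console.log(getSymbolsSafe('a\uD83D\uDCA9b')); // three symbols: 'a', the astral symbol, 'b'
console.log(getSymbolsSafe('a\uD83Db'));       // three symbols: 'a', the lone surrogate, 'b'
```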
  70. More string madness

    • String#substring
    • String#slice
    • …anything that involves strings

  71. Regular expressions

      >> /foo.bar/.test('foo💩bar')
      false

  74. Match any Unicode symbol

      >> /^.$/.test('💩')
      false // doesn’t match line breaks, either

      >> /^[\s\S]$/.test('💩')
      false // matches line breaks, but still doesn’t match whole astral symbols

      >> /^(?:[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])$/.test('💩')
      true // wtf
  82. Create Unicode-aware regular expressions

      >> regenerate().addRange(0x0, 0x10FFFF).toString()
      '[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'

      >> regenerate()
      …… .addRange(0x000000, 0x10FFFF) // add all Unicode code points
      …… .removeRange('A', 'z') // remove all symbols from `A` to `z`
      …… .remove('💩') // remove U+1F4A9 PILE OF POO
      …… .toString();
      '[\0-\x1F\x21-\x40\x7B-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'

    http://mths.be/regenerate
  84. Create Unicode-aware regular expressions

      >> var regenerate = require('regenerate');
      >> var symbols = require('unicode-7.0.0/scripts/Greek/symbols');
      >> var set = regenerate(symbols);
      >> set.toString();
      '[\u0370-\u0373\u0375-\u0377\u037A-\u037D\u037F\u0384\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF\u1F00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FC4\u1FC6-\u1FD3\u1FD6-\u1FDB\u1FDD-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFE\u2126\uAB65]|\uD800[\uDD40-\uDD8C\uDDA0]|\uD834[\uDE00-\uDE45]'

    http://mths.be/regenerate
    http://mths.be/node-unicode-data
  85. Regular expressions (ES6)

      >> /foo.bar/.test('foo💩bar')
      false

      >> /foo.bar/u.test('foo💩bar')
      true
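The two approaches to matching exactly one Unicode symbol can be compared side by side. In this sketch the ES5 pattern is the surrogate-aware one from the slides, wrapped in a non-capturing group so that `^` and `$` apply to every alternative; the constant names are hypothetical.

```javascript
// Sketch: two ways to test "exactly one Unicode symbol".
// The ES5 pattern is the one from the slides, wrapped in a non-capturing
// group so that ^ and $ apply to every alternative. Names are hypothetical.
const anySymbolES5 =
  /^(?:[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])$/;
const anySymbolES6 = /^[\s\S]$/u; // with the u flag, [\s\S] matches whole code points

for (const input of ['a', '\n', '\uD83D\uDCA9', 'ab', '\uD83D\uDCA9\uD83D\uDCA9']) {
  console.log(JSON.stringify(input), anySymbolES5.test(input), anySymbolES6.test(input));
}
```

Without the group, the anchors bind only to the first and last alternatives, so a string with extra symbols around an astral symbol would also pass.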

  86. Regex character classes

      >> /[a-c]/
      // matches:
      // U+0061 LATIN SMALL LETTER A
      // U+0062 LATIN SMALL LETTER B
      // U+0063 LATIN SMALL LETTER C
      >> /^[a-c]$/.test('a')
      true
      >> /^[a-c]$/.test('b')
      true
      >> /^[a-c]$/.test('c')
      true
  88. Regex character classes ✘

      >> /[💩-💫]/
      // matches:
      // U+1F4A9 PILE OF POO
      // U+1F4AA FLEXED BICEPS
      // U+1F4AB DIZZY SYMBOL
      >> /^[💩-💫]$/.test('💩')
      true
      >> /^[💩-💫]$/.test('💪')
      true
      >> /^[💩-💫]$/.test('💫')
      true
  91. Regex character classes

      >> /[💩-💫]/
      SyntaxError: Invalid regular expression: Range out of order in character class
      // how the engine reads it: the range \uDCA9-\uD83D is out of order
      >> /[\uD83D\uDCA9-\uD83D\uDCAB]/
  92. Regex character classes (ES6) ✔

      >> /[💩-💫]/u
      // matches:
      // U+1F4A9 PILE OF POO
      // U+1F4AA FLEXED BICEPS
      // U+1F4AB DIZZY SYMBOL
      >> /^[💩-💫]$/u.test('💩')
      true
      >> /^[💩-💫]$/u.test('💪')
      true
      >> /^[💩-💫]$/u.test('💫')
      true
  93. Regex character classes

      >> regenerate().addRange('💩', '💫').toString()
      '\uD83D[\uDCA9-\uDCAB]'
      >> /^\uD83D[\uDCA9-\uDCAB]$/.test('💩')
      true
      >> /^\uD83D[\uDCA9-\uDCAB]$/.test('💪')
      true
      >> /^\uD83D[\uDCA9-\uDCAB]$/.test('💫')
      true

    http://mths.be/regenerate
  94. JavaScript has a Unicode problem http://mths.be/jsu

  95. 💩 The Pile of Poo Test™ http://mths.be/jsu

  96. 💩 is the new %00

  97. 4. Unicode in MySQL

  98. CREATE TABLE `table_name` (
        `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
        `column_name` VARCHAR(255) NOT NULL DEFAULT '',
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  104. ✘

      mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
      Query OK, 1 row affected, 1 warning (0.00 sec)
      Rows matched: 1  Changed: 1  Warnings: 1

      mysql> SHOW WARNINGS;
      +---------+------+--------------------------------------------+
      | Level   | Code | Message                                    |
      +---------+------+--------------------------------------------+
      | Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' |
      |         |      | for column 'column_name' at row 1          |
      +---------+------+--------------------------------------------+
      1 row in set (0.00 sec)

      mysql> SELECT column_name FROM table_name WHERE id = 9001;
      +-------------+
      | column_name |
      +-------------+
      | foo         |
      +-------------+
      1 row in set (0.00 sec)
  105. MySQL’s ✌️utf8✌️

  107. ALTER TABLE `table_name` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

    http://mths.be/utf8mb4
  112. ✔

      mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
      Query OK, 1 row affected (0.00 sec)
      Rows matched: 1  Changed: 1  Warnings: 0

      mysql> SELECT column_name FROM table_name WHERE id = 9001;
      +-------------+
      | column_name |
      +-------------+
      | foo💩bar    |
      +-------------+
      1 row in set (0.00 sec)

    http://mths.be/utf8mb4
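On the application side, input that MySQL’s 3-byte `utf8` charset would truncate can be detected before it reaches the database. A minimal sketch, assuming a hypothetical helper `containsAstralSymbol` and a global `TextEncoder`; the bytes it prints correspond to the `\xF0\x9F\x92\xA9` in the warning above.

```javascript
// Sketch: flag input that MySQL's 3-byte `utf8` charset cannot store.
// `containsAstralSymbol` is a hypothetical helper; it looks for surrogate
// pairs, i.e. code points above U+FFFF, which need 4 bytes in UTF-8.
function containsAstralSymbol(string) {
  return /[\uD800-\uDBFF][\uDC00-\uDFFF]/.test(string);
}

console.log(containsAstralSymbol('foobar'));             // false
console.log(containsAstralSymbol('foo\uD83D\uDCA9bar')); // true

// The UTF-8 bytes of U+1F4A9, as hex.
const bytes = new TextEncoder().encode('\uD83D\uDCA9');
console.log(Array.from(bytes, (b) => b.toString(16)).join(' ')); // 'f0 9f 92 a9'
```

A check like this is only useful for validation or logging; switching the table and connection to `utf8mb4` remains the fix shown on the slides.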
  113. 5. Hacking with Unicode

  114. http://mths.be/brk

  120. $ curl -sL http://mths.be/brk | hexdump -C | tail -n 19

      000002c0  3c 00 70 00 3e 00 48 00  65 00 72 00 65 00 20 00  |<.p.>.H.e.r.e. .|
      000002d0  69 00 73 00 20 00 73 00  6f 00 6d 00 65 00 20 00  |i.s. .s.o.m.e. .|
      000002e0  6d 00 6f 00 6a 00 69 00  62 00 61 00 6b 00 65 00  |m.o.j.i.b.a.k.e.|
      000002f0  2e 00 20 00 54 00 6f 00  20 00 66 00 69 00 78 00  |.. .T.o. .f.i.x.|
      00000300  20 00 69 00 74 00 2c 00  20 00 75 00 73 00 65 00  | .i.t.,. .u.s.e.|
      00000310  20 00 74 00 68 00 65 00  20 00 63 00 68 00 61 00  | .t.h.e. .c.h.a.|
      00000320  72 00 61 00 63 00 74 00  65 00 72 00 20 00 65 00  |r.a.c.t.e.r. .e.|
      00000330  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 20 00  |n.c.o.d.i.n.g. .|
      00000340  6d 00 65 00 6e 00 75 00  20 00 74 00 6f 00 20 00  |m.e.n.u. .t.o. .|
      00000350  63 00 68 00 6f 00 6f 00  73 00 65 00 20 00 61 00  |c.h.o.o.s.e. .a.|
      00000360  6e 00 6f 00 74 00 68 00  65 00 72 00 20 00 65 00  |n.o.t.h.e.r. .e.|
      00000370  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 2e 00  |n.c.o.d.i.n.g...|
      00000380  3c 00 2f 00 70 00 3e 00  0a 00 0a 00 3c 00 70 00  |<./.p.>.....<.p.|
      00000390  3e 00 3c 73 63 72 69 70  74 3e 20 61 6c 65 72 74  |>.<script> alert|
      000003a0  28 22 58 53 53 22 29 3b  3c 2f 73 63 72 69 70 74  |("XSS");</script|
      000003b0  3e 20 3c 00 2f 00 70 00  3e 00 0a 00 0a 00 3c 00  |> <./.p.>.....<.|
      000003c0  2f 00 62 00 6f 00 64 00  79 00 3e 00 0a 00 3c 00  |/.b.o.d.y.>...<.|
      000003d0  2f 00 68 00 74 00 6d 00  6c 00 3e 00 0a 00 0a 00  |/.h.t.m.l.>.....|
      000003e0

    㱳 (U+3C73 CJK UNIFIED IDEOGRAPH-3C73)
    捲 (U+6372 CJK UNIFIED IDEOGRAPH-6372)
    楰 (U+6970 CJK UNIFIED IDEOGRAPH-6970)
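The effect in the hexdump can be reproduced by pairing the injected ASCII bytes into 16-bit code units. This is a minimal sketch; `decodeUtf16` is a hypothetical helper with no error handling. Read big-endian, the pairs give the code points annotated on the slides (U+3C73, U+6372, U+6970); read little-endian, the same bytes give different CJK code points.

```javascript
// Sketch: why ASCII bytes injected into a UTF-16 page look like CJK text.
// `decodeUtf16` is a hypothetical helper; it pairs bytes into 16-bit code
// units and ignores any odd trailing byte.
function decodeUtf16(bytes, littleEndian) {
  let result = '';
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    const unit = littleEndian
      ? bytes[i] | (bytes[i + 1] << 8)
      : (bytes[i] << 8) | bytes[i + 1];
    result += String.fromCharCode(unit);
  }
  return result;
}

// The ASCII bytes of "<script>", as they appear in the hexdump.
const injected = [0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x3e];

console.log(String.fromCharCode(...injected)); // '<script>' when read byte by byte
console.log(decodeUtf16(injected, false));     // '\u3C73\u6372\u6970\u743E'
console.log(decodeUtf16(injected, true));      // '\u733C\u7263\u7069\u3E74'
```

Either way, a filter that inspects the decoded text sees no `<script>`, while a consumer that reads the bytes as ASCII does.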
  123. http://mths.be/brm ∀㸀㰀script㸀alert(1)㰀/script㸀

  124. $ hexdump -C utf-32-xss.html 00000000 00 00 22 00 00

    00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s| 00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p| 00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l| 00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(| 00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../| 00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i| 00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.| 0000006c
  125. $ hexdump -C utf-32-xss.html 00000000 00 00 22 00 00

    00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s| 00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p| 00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l| 00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(| 00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../| 00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i| 00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.| 0000006c
  126. $ hexdump -C utf-32-xss.html

    The first four 4-byte groups, read as UTF-32BE:
    00 00 22 00 → ∀ (U+2200 FOR ALL)
    00 00 3e 00 → U+3E00 CJK UNIFIED IDEOGRAPH-3E00
    00 00 3c 00 → U+3C00 CJK UNIFIED IDEOGRAPH-3C00
    00 00 00 73 → s (U+0073 LATIN SMALL LETTER S)
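To make the two readings of these bytes concrete, here is a small Node.js sketch. It decodes the dump as UTF-32BE, then shows what is left for a consumer that does not decode UTF-32 and simply drops the NUL bytes. That second behaviour is an assumption made for illustration, and both helper names are hypothetical.

```javascript
// The 108 bytes from the hexdump, grouped four at a time.
const hex = `
  00002200 00003e00 00003c00 00000073 00000063 00000072 00000069 00000070
  00000074 00003e00 00000061 0000006c 00000065 00000072 00000074 00000028
  00000031 00000029 00003c00 0000002f 00000073 00000063 00000072 00000069
  00000070 00000074 00003e00
`;
const bytes = Buffer.from(hex.replace(/\s+/g, ''), 'hex');

// Reading 1: UTF-32BE, one code point per four bytes.
function decodeUtf32BE(buffer) {
  const codePoints = [];
  for (let i = 0; i + 4 <= buffer.length; i += 4) {
    codePoints.push(buffer.readUInt32BE(i));
  }
  return String.fromCodePoint(...codePoints);
}

// Reading 2 (assumed for illustration): drop every NUL byte and treat the
// remaining bytes as single-byte characters.
function decodeDroppingNulBytes(buffer) {
  const kept = Array.from(buffer).filter((byte) => byte !== 0x00);
  return Buffer.from(kept).toString('latin1');
}

const asUtf32 = decodeUtf32BE(bytes);
const asNulStripped = decodeDroppingNulBytes(bytes);

console.log(/[<>"]/.test(asUtf32)); // false: no markup characters to escape
console.log(asNulStripped);         // "><script>alert(1)</script>
```

A filter that works on the UTF-32 code points finds nothing to escape, while the NUL-dropping reading yields a complete script element.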
  133. http://mths.be/brl OBAMA vs. ᴼᴮᴬᴹᴬ

  134. JavaScript vs. JSON

  135. <script>
    // <?php echo strip($userInput); ?>
    </script>

    <?php /* Note: `strip()` strips ASCII newlines and `</script`. */ ?>
  136. http://mths.be/brn

  138. <script>
    // <?php echo strip($userInput); ?>
    </script>

    ✘ foo[U+2028]alert('XSS')
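The reason this input escapes the comment is that JavaScript treats U+2028 (and U+2029) as line terminators, so a `//` comment ends there. The sketch below reproduces that in Node.js. The `strip` function is a hypothetical stand-in for the PHP helper on the slide, and `sink` is only a marker object used to detect whether injected code ran.

```javascript
// Hypothetical stand-in for the strip() helper from the slide:
// removes ASCII newlines and any "</script".
function strip(input) {
  return input.replace(/[\r\n]/g, '').replace(/<\/script/gi, '');
}

// Variant that also removes the two non-ASCII line terminators.
function stripAllLineTerminators(input) {
  return strip(input).replace(/[\u2028\u2029]/g, '');
}

// Builds the script body the template would emit and runs it.
// Returns true if code smuggled in through userInput was executed.
function runGeneratedScript(sanitize, userInput) {
  const sink = { called: false };
  const scriptBody = '// ' + sanitize(userInput);
  new Function('sink', scriptBody)(sink);
  return sink.called;
}

const userInput = 'foo\u2028sink.called = true';
const withSlideStrip = runGeneratedScript(strip, userInput);
const withFullStrip = runGeneratedScript(stripAllLineTerminators, userInput);

console.log(withSlideStrip); // true: the text after U+2028 ran as code
console.log(withFullStrip);  // false: everything stayed inside the comment
```

Removing or escaping U+2028 and U+2029 along with `\r` and `\n` keeps the whole input on the comment line.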

  140. JSON ∉ JavaScript

  141. var data = '"Hello\u2028"';
    // JSON-formatted data containing a string
    // containing an (unescaped!) Line Separator

    eval('(' + data + ')');
    // → SyntaxError: Unexpected token ILLEGAL

    JSON.parse(data);
    // → 'Hello\u2028'
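A hedged way to observe the difference on whatever engine is at hand is sketched below. The outcome of the `eval` branch depends on the engine: older engines reject a raw U+2028 inside a string literal, as the slide shows, while engines that implement ES2019 accept it. The `tryEval` helper is hypothetical.

```javascript
// JSON text with a raw (unescaped) U+2028 inside the string.
const data = '"Hello\u2028"';

// JSON.parse follows the JSON grammar, which allows U+2028 in strings.
const viaJsonParse = JSON.parse(data);

// eval follows the JavaScript grammar of the running engine, so the
// result is reported rather than assumed.
function tryEval(source) {
  try {
    return { ok: true, value: (0, eval)('(' + source + ')') };
  } catch (error) {
    return { ok: false, errorName: error.name };
  }
}
const viaEval = tryEval(data);

console.log(viaJsonParse === 'Hello\u2028');
console.log(viaEval);
```

Code that has to run on unknown engines cannot rely on either outcome, which is why escaping the data first is the safer route.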
  142. Always escape JSON-formatted data before passing it to a JavaScript

    parser. http://mths.be/jsesc
  144. var data = 'foo\u2028';

    var serialized = JSON.stringify(data);
    // → '"foo\u2028"' (contains the raw, unescaped
    // Unicode symbol)

    var escaped = jsesc(data, { 'json': true });
    // → '"foo\\u2028"' (contains an escape sequence
    // for the Unicode symbol → safer)

    JSON.parse(serialized) == JSON.parse(escaped);
    // → true (both strings unserialize to the same value)

    http://mths.be/jsesc
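For readers who want to see the idea without a dependency, the following is a minimal sketch of the same escaping step. It is not the jsesc API and covers far less: `stringifyForScript` is a hypothetical helper that only post-processes `JSON.stringify` output to escape U+2028 and U+2029.

```javascript
// Hypothetical helper, not the jsesc API: serialize to JSON, then replace
// the two line terminators that JSON allows raw inside strings with
// escape sequences.
function stringifyForScript(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const data = 'foo\u2028bar\u2029';
const escaped = stringifyForScript(data);

console.log(escaped);                      // "foo\u2028bar\u2029" (escape sequences)
console.log(JSON.parse(escaped) === data); // true
```

The escaped form contains only ASCII, so it reads the same to a JSON parser and to a JavaScript parser.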
  147. var string = String.fromCharCode(0xD800);
    // a string containing an (unescaped!) lone surrogate

    var data = JSON.stringify(string);
    // the same string as JSON-formatted data

    storeInDatabaseAsUtf8(data);
    // → error/crash

    sendOverWebSocketConnection(data);
    // → error/crash/DoS
  148. Always escape JSON-formatted data before passing it to a UTF-8

    encoder. http://mths.be/jsesc
  150. var data = 'foo\uD800';

    var serialized = JSON.stringify(data);
    // → '"foo\uD800"' (contains the raw, unescaped
    // Unicode symbol)

    var escaped = jsesc(data, { 'json': true });
    // → '"foo\\uD800"' (contains an escape sequence
    // for the Unicode symbol → safer)

    JSON.parse(serialized) == JSON.parse(escaped);
    // → true (both strings unserialize to the same value)

    http://mths.be/jsesc
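The same idea can be sketched for lone surrogates. The helpers below are hypothetical and are not the jsesc API. `encodeURIComponent` is used only because it is a built-in that has to UTF-8-encode its input and throws `URIError` on a lone surrogate, which makes the failure easy to observe. Newer engines may already escape lone surrogates inside `JSON.stringify`, so the extra pass is written to be harmless in that case.

```javascript
// Replace every unpaired surrogate in `text` with a \uXXXX escape sequence.
function escapeLoneSurrogates(text) {
  let result = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    const isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (isHigh && next >= 0xDC00 && next <= 0xDFFF) {
      result += text[i] + text[i + 1]; // well-formed pair: keep as-is
      i++;
    } else if (isHigh || isLow) {
      result += '\\u' + unit.toString(16).toUpperCase().padStart(4, '0');
    } else {
      result += text[i];
    }
  }
  return result;
}

// Hypothetical helper: JSON text that is safe to hand to a UTF-8 encoder.
function stringifyForUtf8(value) {
  return escapeLoneSurrogates(JSON.stringify(value));
}

// True if a strict UTF-8 encoding step rejects the text.
function utf8EncodingFails(text) {
  try {
    encodeURIComponent(text);
    return false;
  } catch (error) {
    return error instanceof URIError;
  }
}

const data = 'foo\uD800';
const escaped = stringifyForUtf8(data);

console.log(utf8EncodingFails(data));      // true
console.log(utf8EncodingFails(escaped));   // false
console.log(JSON.parse(escaped) === data); // true
```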
  153. Phabricator

  159. uses MySQL’s ✌️utf8✌️

  162. http://mths.be/bro

  163. RCE in WordPress < 3.6.1 http://mths.be/brq

  164. CVE-2013-4338 “wp-includes/functions.php in WordPress before 3.6.1 does not properly determine

    whether data has been serialize()d, which allows remote attackers to execute arbitrary code by triggering erroneous PHP unserialize() operations.” http://mths.be/brq
  165. function is_serialized( $data ) {
        $data = trim( $data );
        $length = strlen( $data );
        $lastc = $data[$length - 1];
        if ( ';' !== $lastc && '}' !== $lastc )
            return false;
        $token = $data[0];
        switch ( $token ) {
            case 's' :
                if ( '"' !== $data[$length - 2] )
                    return false;
            case 'a' :
            case 'O' :
                return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
            case 'b' :
            case 'i' :
            case 'd' :
                return (bool) preg_match( "/^{$token}:[0-9.E-]+;\$/", $data );
        }
        return false;
    }

    http://mths.be/brq
  167. WordPress

    Before writing it to the database, data gets serialized only if it’s an array or an object, or if is_serialized($data) returns true (double serialization)

    After retrieving data from the database, it gets unserialized only if is_serialized($data) returns true

    uses MySQL’s ✌️utf8✌️

    http://mths.be/brq
  169. http://mths.be/brq

  173. class Foo {
        private $command;
        public function setCommand($command) {
            $this->command = $command;
        }
        public function __destruct() {
            if ($this->command) {
                shell_exec($this->command);
            }
        }
    }

    $object = new Foo();
    $object->setCommand('echo "pwned!" > /tmp/pwned.txt');
    $serialized = serialize($object);
    $payload = $serialized . '!';

    http://mths.be/brq
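How the pieces on the preceding slides fit together can be sketched in JavaScript. Everything here is a simulation under stated assumptions: `isSerialized` is a simplified port of the PHP check shown earlier, `storeInMysqlUtf8` models MySQL's 3-byte utf8 cutting a string at the first supplementary symbol, the serialized string is a shortened placeholder rather than real `serialize()` output, and U+1F4A9 stands in for the symbol appended to the payload.

```javascript
// Simplified port of the is_serialized() check from the slide.
function isSerialized(data) {
  data = data.trim();
  const lastChar = data[data.length - 1];
  if (lastChar !== ';' && lastChar !== '}') return false;
  const token = data[0];
  switch (token) {
    case 's':
      if (data[data.length - 2] !== '"') return false;
    // falls through, as in the PHP original
    case 'a':
    case 'O':
      return new RegExp('^' + token + ':[0-9]+:').test(data);
    case 'b':
    case 'i':
    case 'd':
      return new RegExp('^' + token + ':[0-9.E-]+;$').test(data);
  }
  return false;
}

// Assumption: models a utf8 (3-byte) column that silently truncates the
// value at the first symbol outside the Basic Multilingual Plane.
function storeInMysqlUtf8(value) {
  let stored = '';
  for (const symbol of value) { // iterates by code point
    if (symbol.codePointAt(0) > 0xFFFF) break;
    stored += symbol;
  }
  return stored;
}

const serialized = 'O:3:"Foo":0:{}';        // placeholder serialized object
const payload = serialized + '\u{1F4A9}';   // stand-in supplementary symbol

const checkedBeforeWrite = isSerialized(payload);
const stored = storeInMysqlUtf8(payload);
const checkedAfterRead = isSerialized(stored);

console.log(checkedBeforeWrite); // false: stored as a plain string
console.log(stored);             // O:3:"Foo":0:{}
console.log(checkedAfterRead);   // true: would now be unserialized
```

The value passes the write-side check as ordinary text, loses its tail in storage, and comes back looking like serialized data.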
  176. “[The following C# code] takes a provided HTML string and

    removes any potentially dangerous XSS HTML tags using a whitelist approach.” http://mths.be/brp
  177. private static Regex _whitelist = new Regex(@"
        ^</?(a|b(lockquote)?|code|em|h(1|2|3)|i|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$
        |^<(b|h)r\s?/?>$
        |^<a[^>]+>$
        |^<img[^>]+/?>$",
        RegexOptions.Singleline |
        RegexOptions.IgnorePatternWhitespace |
        RegexOptions.ExplicitCapture |
        RegexOptions.Compiled
    );

    http://mths.be/brp
  178. /// <summary>
    /// sanitize any potentially dangerous tags from the provided
    /// raw HTML input using a whitelist based approach, leaving
    /// the "safe" HTML tags
    /// </summary>
    public static string Sanitize(string html)
    {
        var tagname = "";
        Match tag;
        var tags = _tags.Matches(html);

        // iterate through all HTML tags in the input
        for (int i = tags.Count-1; i > -1; i--)
        {
            tag = tags[i];
            tagname = tag.Value.ToLower();

            if (!_whitelist.IsMatch(tagname))
            {
                // not on our whitelist? I SAY GOOD DAY TO YOU, SIR. GOOD DAY!
                html = html.Remove(tag.Index, tag.Length);
            }
            else if (tagname.StartsWith("<a"))
            {
                // detailed <a> tag checking
                if (!IsMatch(tagname,
                    @"<a\s
                      href=""(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)""
                      (\stitle=""[^""]+"")?\s?>"))
                {
                    html = html.Remove(tag.Index, tag.Length);
                }
            }
            else if (tagname.StartsWith("<img"))
            {
                // detailed <img> tag checking
                if (!IsMatch(tagname,
                    @"<img\s
                      src=""https?://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+""
                      (\swidth=""\d{1,3}"")?
                      (\sheight=""\d{1,3}"")?
                      (\salt=""[^""]*"")?
                      (\stitle=""[^""]*"")?
                      \s?/?>"))
                {
                    html = html.Remove(tag.Index, tag.Length);
                }
            }
        }

        return html;
    }

    http://mths.be/brp
  183. <img ̊ src="404" onerror="alert('XSS')">

    http://mths.be/brp
  189. TweetDeck

  193. function getTweetHtml(tweet) {
        var htmlEscapedTweet = htmlEscape(tweet);
        if (containsEmoji(htmlEscapedTweet)) {
            return replaceSymbolsWithImgTags(
                AccidentallyUndoHtmlEscaping(htmlEscapedTweet)
            );
        } else {
            return htmlEscapedTweet;
        }
    }

    http://mths.be/bsq
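For contrast, here is a sketch of the same pipeline without the accidental unescaping step. It reuses the function names from the pseudocode, but the bodies, the `/emoji/` image path, and the choice to match only supplementary symbols are assumptions made for illustration. This is not TweetDeck's code.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function htmlEscape(text) {
  const replacements = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  };
  return text.replace(/[&<>"']/g, (character) => replacements[character]);
}

// Operates on already-escaped HTML and never unescapes it. For simplicity
// only supplementary symbols (surrogate pairs) are treated as emoji.
function replaceSymbolsWithImgTags(escapedHtml) {
  return escapedHtml.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, (symbol) => {
    const hex = symbol.codePointAt(0).toString(16);
    // Hypothetical image path, used only for this example.
    return '<img class="emoji" alt="' + symbol + '" src="/emoji/' + hex + '.png">';
  });
}

function getTweetHtml(tweet) {
  return replaceSymbolsWithImgTags(htmlEscape(tweet));
}

const html = getTweetHtml('<script>alert(1)</script> \u{1F4A9}');
console.log(html);
```

Because the emoji replacement runs on the escaped string and only ever adds markup it generated itself, the user-supplied `<script>` stays inert.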
  194. Thanks! Questions? → @mathias

  196. http://mths.be/brr
