Slide 1

Slide 1 text

@mathias · #hackpra Hacking with Unicode

Slide 2

Slide 2 text

@mathias · #hackpra 1337 unic0de h4xx

Slide 3

Slide 3 text

@mathias

Slide 4

Slide 4 text

What we’ll cover
1. Unicode
2. Encodings for Unicode
3. Unicode in JavaScript
4. Unicode in MySQL
5. Hacking with Unicode

Slide 5

Slide 5 text

1. Unicode

Slide 6

Slide 6 text

code point unique name symbol/glyph

Slide 7

Slide 7 text

A LATIN CAPITAL LETTER A U+0041

Slide 8

Slide 8 text

a LATIN SMALL LETTER A U+0061

Slide 9

Slide 9 text

© COPYRIGHT SIGN U+00A9

Slide 10

Slide 10 text

☃ SNOWMAN U+2603

Slide 11

Slide 11 text

💩 PILE OF POO U+1F4A9

Slide 12

Slide 12 text

U+000000 → U+10FFFF

Slide 13

Slide 13 text

(0x10FFFF + 1) code points
↓
17 planes
(0xFFFF + 1) code points each
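
The arithmetic on this slide can be sketched in JavaScript. `getPlane` and `planeCount` are hypothetical names, not from the talk, and the helper uses the zero-based plane index of the Unicode Standard, so the Basic Multilingual Plane is plane 0 (the slides count it as #1).

```javascript
// Hypothetical helper: which plane does a code point belong to?
function getPlane(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a Unicode code point: ' + codePoint);
  }
  // Each plane holds 0x10000 (that is, 0xFFFF + 1) code points.
  return Math.floor(codePoint / 0x10000);
}

// Total code points divided by code points per plane.
const planeCount = (0x10FFFF + 1) / (0xFFFF + 1);
```

With this sketch, U+0041 falls in plane 0 and U+1F4A9 in plane 1, the first astral plane.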

Slide 14

Slide 14 text

Unicode plane #1 U+0000 → U+FFFF Basic Multilingual Plane

Slide 15

Slide 15 text

Unicode planes #2-17
U+010000 → U+10FFFF
supplementary planes
astral planes

Slide 16

Slide 16 text

2. Encodings

Slide 17

Slide 17 text

Number of bytes per code point
Code point range       UTF-8  UTF-16  UTF-32  UTF-EBCDIC  GB 18030
U+000000 – U+00007F    1      2       4       1           1
U+000080 – U+00009F    2      2       4       1           2 or 4 *
U+0000A0 – U+0003FF    2      2       4       2           2 or 4 *
U+000400 – U+0007FF    2      2       4       3           2 or 4 *
U+000800 – U+003FFF    3      2       4       3           2 or 4 *
U+004000 – U+00FFFF    3      2       4       4           2 or 4 *
U+010000 – U+03FFFF    4      4       4       4           4
U+040000 – U+10FFFF    4      4       4       5           4
* 2 for characters inherited from GB 2312/GBK (e.g. most Chinese characters); 4 for everything else
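
The UTF-8 and UTF-16 columns of this table can be sketched as two small JavaScript functions. The names `utf8Length` and `utf16Length` are hypothetical, and the sketch does not reject surrogate code points (U+D800 to U+DFFF), which have no well-formed UTF-8 encoding.

```javascript
// Bytes needed to encode one code point in UTF-8.
function utf8Length(codePoint) {
  if (codePoint <= 0x7F) return 1;   // U+0000 to U+007F
  if (codePoint <= 0x7FF) return 2;  // U+0080 to U+07FF
  if (codePoint <= 0xFFFF) return 3; // U+0800 to U+FFFF
  return 4;                          // U+10000 to U+10FFFF
}

// Bytes needed to encode one code point in UTF-16.
function utf16Length(codePoint) {
  // BMP code points take one 16-bit unit, astral code points take two.
  return codePoint <= 0xFFFF ? 2 : 4;
}
```

U+1F4A9 needs four bytes in both encodings, which is what makes it a convenient test symbol later in the deck.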

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

IETF RFCs

Slide 21

Slide 21 text

IETF RFCs ✘

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

3. Unicode in JavaScript
http://mths.be/jsu

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

JavaScript has a Unicode problem http://mths.be/jsu

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Hexadecimal escape sequences
>> '\x41\x42\x43'
'ABC'
>> '\x61\x62\x63'
'abc'
can be used for U+0000 → U+00FF

Slide 32

Slide 32 text

Unicode escape sequences
>> '\u0041\u0042\u0043'
'ABC'
>> 'I \u2661 JavaScript!'
'I ♡ JavaScript!'
can be used for U+0000 → U+FFFF

Slide 33

Slide 33 text

…what about astral code points?

Slide 34

Slide 34 text

…what about 💩? *
* …and other, equally important astral symbols

Slide 35

Slide 35 text

💩

Slide 36

Slide 36 text

Unicode code point escapes (ES6)
>> '\u{41}\u{42}\u{43}'
'ABC'
>> '\u{1F4A9}'
'💩' // U+1F4A9
can be used for U+000000 → U+10FFFF

Slide 37

Slide 37 text

Surrogate pairs
>> '\uD83D\uDCA9'
'💩' // U+1F4A9
can be used for U+010000 → U+10FFFF

Slide 38

Slide 38 text

Surrogate pairs
// for astral code points (> 0xFFFF)
function getSurrogates(codePoint) {
  var high = Math.floor((codePoint - 0x10000) / 0x400) + 0xD800;
  var low = (codePoint - 0x10000) % 0x400 + 0xDC00;
  return [ high, low ];
}
function getCodePoint(high, low) {
  var codePoint = (high - 0xD800) * 0x400 + low - 0xDC00 + 0x10000;
  return codePoint;
}
>> getSurrogates(0x1F4A9); // U+1F4A9 is 💩
[ 0xD83D, 0xDCA9 ]
>> getCodePoint(0xD83D, 0xDCA9);
0x1F4A9
http://mths.be/bed

Slide 39

Slide 39 text

JavaScript string length
>> 'A'.length // U+0041
1
>> 'A' == '\u0041'
true
>> 'B'.length // U+0042
1
>> 'B' == '\u0042'
true

Slide 40

Slide 40 text

String length ≠ char count
>> '𝐀'.length // U+1D400
2
>> '𝐀' == '\uD835\uDC00'
true
>> '𝐁'.length // U+1D401
2
>> '𝐁' == '\uD835\uDC01'
true

Slide 41

Slide 41 text

String length ≠ char count
>> '💩'.length // U+1F4A9
2
>> '💩' == '\uD83D\uDCA9'
true
insert obligatory “number two” joke here

Slide 42

Slide 42 text

Real-world example

Slide 43

Slide 43 text

Real-world example

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

String character count
function countSymbols(string) {
  return punycode.ucs2.decode(string).length;
}
>> countSymbols('A') // U+0041
1
>> countSymbols('𝐀') // U+1D400
1
>> countSymbols('💩') // U+1F4A9
1
http://mths.be/punycode

Slide 47

Slide 47 text

String character count (ES6)
function countSymbols(string) {
  return Array.from(string).length;
}
>> countSymbols('A') // U+0041
1
>> countSymbols('𝐀') // U+1D400
1
>> countSymbols('💩') // U+1F4A9
1

Slide 48

Slide 48 text

JavaScript escape sequences http://mths.be/bmf

Slide 49

Slide 49 text

If we’re being pedantic…
// it’s actually even more complicated:
>> 'mañana' == 'mañana'
false

Slide 50

Slide 50 text

If we’re being pedantic…
// it’s actually even more complicated:
>> 'mañana' == 'mañana'
false
>> 'ma\xF1ana' == 'man\u0303ana'
false
>> 'ma\xF1ana'.length
6
>> 'man\u0303ana'.length
7

Slide 51

Slide 51 text

Unicode normalization (ES6)
function countSymbolsPedantically(string) {
  // Unicode Normalization, NFC form:
  var normalized = string.normalize('NFC');
  // Account for astral symbols / surrogates:
  return Array.from(normalized).length;
}
>> countSymbolsPedantically('mañana') // U+00F1
6
>> countSymbolsPedantically('mañana') // U+006E + U+0303
6
http://git.io/unorm

Slide 52

Slide 52 text

Perfect? >> var zalgo = 'H ̹̙̦̮͉̩̗̗ ͧ̇̏̊̾ Eͨ͆͒̆ͮ̃ ͏̷̮̣̫̤̣ ̵̞̹̻ ̀̉̓ͬ͑͡ ͅ Cͯ̂͐ ͏̨̛͔̦̟͈̻ O ̜͎͍͙͚̬̝̣ ̽ͮ͐͗̀ͤ̍̀ ͢ M ̴̡̲̭͍͇̼̟̯̦ ̉̒͠ Ḛ̛̙̞̪̗ ͥ ͤͩ̾͑̔͐ ͅ Ṯ̴̷̷̗̼͍ ̿̿̓̽͐ H ̙̙ ̔̄ ͜ ';

Slide 53

Slide 53 text

Perfect? Nope. → can be ‘fixed’ using epic regex-fu >> var zalgo = 'H ̹̙̦̮͉̩̗̗ ͧ̇̏̊̾ Eͨ͆͒̆ͮ̃ ͏̷̮̣̫̤̣ ̵̞̹̻ ̀̉̓ͬ͑͡ ͅ Cͯ̂͐ ͏̨̛͔̦̟͈̻ O ̜͎͍͙͚̬̝̣ ̽ͮ͐͗̀ͤ̍̀ ͢ M ̴̡̲̭͍͇̼̟̯̦ ̉̒͠ Ḛ̛̙̞̪̗ ͥ ͤͩ̾͑̔͐ ͅ Ṯ̴̷̷̗̼͍ ̿̿̓̽͐ H ̙̙ ̔̄ ͜ '; ! >> countSymbolsPedantically(zalgo) 116 // not 9
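
One possible form of the regex-based fix this slide alludes to is sketched below. It is an assumption, not the talk's actual code: it relies on Unicode property escapes (`\p{M}`), which were added in ES2018 and need the `u` flag, and `countSymbolsIgnoringMarks` is a hypothetical name.

```javascript
// Sketch: count symbols while ignoring combining marks.
function countSymbolsIgnoringMarks(string) {
  // NFC first, so that e.g. n + U+0303 becomes the single code point U+00F1.
  const normalized = string.normalize('NFC');
  // Remove every remaining code point in the general category Mark (M).
  const stripped = normalized.replace(/\p{M}/gu, '');
  // Array.from iterates by code point, so astral symbols count once.
  return Array.from(stripped).length;
}
```

This treats every mark as invisible padding, which suits Zalgo text but is still only an approximation of user-perceived characters.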

Slide 54

Slide 54 text

Reversing a string in JavaScript
// naive solution
function reverse(string) {
  return string.split('').reverse().join('');
}

Slide 55

Slide 55 text

Reversing a string in JavaScript
// naive solution
function reverse(string) {
  return string.split('').reverse().join('');
}
>> reverse('abc')
'cba'

Slide 56

Slide 56 text

Reversing a string in JavaScript
// naive solution
function reverse(string) {
  return string.split('').reverse().join('');
}
>> reverse('abc')
'cba'
>> reverse('mañana') // U+00F1
'anañam'

Slide 57

Slide 57 text

Reversing a string in JavaScript
// naive solution
function reverse(string) {
  return string.split('').reverse().join('');
}
>> reverse('abc')
'cba'
>> reverse('mañana') // U+00F1
'anañam'
>> reverse('mañana') // U+006E + U+0303
'anãnam'

Slide 58

Slide 58 text

Reversing a string in JavaScript
// naive solution
function reverse(string) {
  return string.split('').reverse().join('');
}
>> reverse('abc')
'cba'
>> reverse('mañana') // U+00F1
'anañam'
>> reverse('mañana') // U+006E + U+0303
'anãnam'
>> reverse('💩') // U+1F4A9
'��' // '\uDCA9\uD83D', the surrogate pair for 💩, in the wrong order

Slide 59

Slide 59 text

“I put my thang down, flip it, and reverse it” — Missy ‘Misdemeanor’ Elliott, 2002

Slide 60

Slide 60 text

Reversing a string in JavaScript
// Using the Esrever library
var reverse = esrever.reverse;
>> reverse('abc')
'cba'
>> reverse('mañana') // U+00F1
'anañam'
>> reverse('mañana') // U+006E + U+0303
'anañam'
>> reverse('💩') // U+1F4A9
'💩'
http://mths.be/esrever
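
For illustration, a Unicode-aware reverse could be sketched as follows. This is not Esrever's actual implementation: `reverseSymbols` is a hypothetical name, and the sketch assumes an engine with ES2018 property escapes. It keeps each base symbol together with its trailing combining marks and reverses those clusters.

```javascript
// Sketch: reverse a string without splitting surrogate pairs
// or detaching combining marks from their base symbol.
function reverseSymbols(string) {
  // With the `u` flag, \P{M} matches one whole code point that is not a
  // mark, and \p{M}* picks up any marks that follow it. A leading run of
  // marks with no base symbol is kept together as its own cluster.
  const clusters = string.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
  return clusters.reverse().join('');
}
```

Reversing `'man\u0303ana'` with this sketch gives `'anan\u0303am'`, so the tilde stays on the `n`.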

Slide 61

Slide 61 text

This behavior affects other string methods, too.

Slide 62

Slide 62 text

String.fromCharCode()
>> String.fromCharCode(0x0041) // U+0041
'A' // U+0041
>> String.fromCharCode(0x1F4A9) // U+1F4A9
'\uF4A9' // U+F4A9
only works as you’d expect for U+0000 → U+FFFF
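
The U+F4A9 result follows from `String.fromCharCode` converting each argument to an unsigned 16-bit integer, so only the low 16 bits survive. A short sketch (the variable names are illustrative):

```javascript
// Keeping only the low 16 bits of 0x1F4A9 drops the leading 1.
const truncated = 0x1F4A9 & 0xFFFF;

// fromCharCode applies the same truncation internally.
const viaCharCode = String.fromCharCode(0x1F4A9);

// fromCodePoint (ES6) produces the full surrogate pair instead.
const viaCodePoint = String.fromCodePoint(0x1F4A9);
```

Here `truncated` is `0xF4A9`, `viaCharCode` is the single code unit `'\uF4A9'`, and `viaCodePoint` is `'\uD83D\uDCA9'`.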

Slide 63

Slide 63 text

String.fromCharCode()
→ use surrogate pairs for astral symbols:
>> String.fromCharCode(0xD83D, 0xDCA9)
'💩' // U+1F4A9
→ or just use Punycode.js:
>> punycode.ucs2.encode([ 0x1F4A9 ])
'💩' // U+1F4A9

Slide 64

Slide 64 text

String.fromCodePoint() (ES6)
>> String.fromCodePoint(0x1F4A9)
'💩'
can be used for U+000000 → U+10FFFF
http://mths.be/fromcodepoint

Slide 65

Slide 65 text

String#charAt()
>> '💩'.charAt(0) // U+1F4A9
'\uD83D' // U+D83D

Slide 66

Slide 66 text

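String#at()
>> '💩'.at(0) // U+1F4A9
'💩' // U+1F4A9
ES7 proposal, http://mths.be/at
Note: the String.prototype.at() that was later standardized in ES2022 indexes code units, so there '💩'.at(0) returns '\uD83D'.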

Slide 67

Slide 67 text

String#charCodeAt()
>> '💩'.charCodeAt(0) // U+1F4A9
0xD83D

Slide 68

Slide 68 text

String#codePointAt() (ES6)
>> '💩'.codePointAt(0) // U+1F4A9
0x1F4A9
http://mths.be/codepointat

Slide 69

Slide 69 text

Iterate over all symbols in a string
function getSymbols(string) {
  var length = string.length;
  var index = -1;
  var output = [];
  var character;
  var charCode;
  while (++index < length) {
    character = string.charAt(index);
    charCode = character.charCodeAt(0);
    if (charCode >= 0xD800 && charCode <= 0xDBFF) {
      // note: this doesn’t account for lone high surrogates
      output.push(character + string.charAt(++index));
    } else {
      output.push(character);
    }
  }
  return output;
}
var symbols = getSymbols('💩');
symbols.forEach(function(symbol) {
  assert(symbol == '💩');
});
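
The comment in `getSymbols` notes that lone high surrogates are not handled: a high surrogate followed by an ordinary character would be glued to it. One way to close that gap is sketched below. `getSymbolsSafely` is a hypothetical name, and pairing only a high surrogate with an immediately following low surrogate is the assumption made here.

```javascript
// Sketch: split a string into symbols, pairing surrogates only when a
// high surrogate is directly followed by a low surrogate.
function getSymbolsSafely(string) {
  const output = [];
  for (let index = 0; index < string.length; index++) {
    const charCode = string.charCodeAt(index);
    const isHigh = charCode >= 0xD800 && charCode <= 0xDBFF;
    if (isHigh && index + 1 < string.length) {
      const next = string.charCodeAt(index + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Well-formed pair: emit both code units as one symbol.
        output.push(string.slice(index, index + 2));
        index++; // skip the low surrogate
        continue;
      }
    }
    // BMP symbol or lone surrogate: emit the single code unit as-is.
    output.push(string.charAt(index));
  }
  return output;
}
```

For `'\uD83Dab'` this sketch returns three entries (the lone surrogate, `a`, `b`) rather than merging the surrogate with `a`.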

Slide 70

Slide 70 text

Iterate over all symbols in a string (ES6)
for (let symbol of '💩') {
  assert(symbol == '💩');
}

Slide 71

Slide 71 text

More string madness
• String#substring
• String#slice
• …anything that involves strings

Slide 72

Slide 72 text

Regular expressions
>> /foo.bar/.test('foo💩bar')
false

Slide 73

Slide 73 text

Match any Unicode symbol
>> /^.$/.test('💩')
false // doesn’t match line breaks, either

Slide 74

Slide 74 text

Match any Unicode symbol
>> /^.$/.test('💩')
false // doesn’t match line breaks, either
>> /^[\s\S]$/.test('💩')
false // matches line breaks, but still doesn’t match whole astral symbols

Slide 75

Slide 75 text

Match any Unicode symbol
>> /^.$/.test('💩')
false // doesn’t match line breaks, either
>> /^[\s\S]$/.test('💩')
false // matches line breaks, but still doesn’t match whole astral symbols
// the alternatives are grouped so that ^ and $ apply to all of them:
>> /^(?:[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])$/.test('💩')
true // wtf

Slide 76

Slide 76 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
http://mths.be/regenerate

Slide 77

Slide 77 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
'[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
http://mths.be/regenerate

Slide 78

Slide 78 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
'[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
>> regenerate()
http://mths.be/regenerate

Slide 79

Slide 79 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
'[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
>> regenerate()
…… .addRange(0x000000, 0x10FFFF) // add all Unicode code points
http://mths.be/regenerate

Slide 80

Slide 80 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
'[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
>> regenerate()
…… .addRange(0x000000, 0x10FFFF) // add all Unicode code points
…… .removeRange('A', 'z') // remove all symbols from `A` to `z`
http://mths.be/regenerate

Slide 81

Slide 81 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
'[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
>> regenerate()
…… .addRange(0x000000, 0x10FFFF) // add all Unicode code points
…… .removeRange('A', 'z') // remove all symbols from `A` to `z`
…… .remove('💩') // remove U+1F4A9 PILE OF POO
http://mths.be/regenerate

Slide 82

Slide 82 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
'[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
>> regenerate()
…… .addRange(0x000000, 0x10FFFF) // add all Unicode code points
…… .removeRange('A', 'z') // remove all symbols from `A` to `z`
…… .remove('💩') // remove U+1F4A9 PILE OF POO
…… .toString();
http://mths.be/regenerate

Slide 83

Slide 83 text

Create Unicode-aware regular expressions
>> regenerate().addRange(0x0, 0x10FFFF).toString()
'[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
>> regenerate()
…… .addRange(0x000000, 0x10FFFF) // add all Unicode code points
…… .removeRange('A', 'z') // remove all symbols from `A` to `z`
…… .remove('💩') // remove U+1F4A9 PILE OF POO
…… .toString();
'[\0-\x1F\x21-\x40\x7B-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
http://mths.be/regenerate

Slide 84

Slide 84 text

Create Unicode-aware regular expressions
>> var regenerate = require('regenerate');
>> var symbols = require('unicode-7.0.0/scripts/Greek/symbols');
>> var set = regenerate(symbols);
>> set.toString();
http://mths.be/regenerate
http://mths.be/node-unicode-data

Slide 85

Slide 85 text

Create Unicode-aware regular expressions
>> var regenerate = require('regenerate');
>> var symbols = require('unicode-7.0.0/scripts/Greek/symbols');
>> var set = regenerate(symbols);
>> set.toString();
'[\u0370-\u0373\u0375-\u0377\u037A-\u037D\u037F\u0384\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF\u1F00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FC4\u1FC6-\u1FD3\u1FD6-\u1FDB\u1FDD-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFE\u2126\uAB65]|\uD800[\uDD40-\uDD8C\uDDA0]|\uD834[\uDE00-\uDE45]'
http://mths.be/regenerate
http://mths.be/node-unicode-data

Slide 86

Slide 86 text

Regular expressions
>> /foo.bar/.test('foo💩bar')
false
>> /foo.bar/u.test('foo💩bar') // ES6
true

Slide 87

Slide 87 text

Regex character classes
>> /[a-c]/
// matches:
// U+0061 LATIN SMALL LETTER A
// U+0062 LATIN SMALL LETTER B
// U+0063 LATIN SMALL LETTER C
>> /^[a-c]$/.test('a')
true
>> /^[a-c]$/.test('b')
true
>> /^[a-c]$/.test('c')
true

Slide 88

Slide 88 text

Regex character classes
>> /[💩-💫]/
// matches:
// U+1F4A9 PILE OF POO
// U+1F4AA FLEXED BICEPS
// U+1F4AB DIZZY SYMBOL
>> /^[💩-💫]$/.test('💩')
true
>> /^[💩-💫]$/.test('💪')
true
>> /^[💩-💫]$/.test('💫')
true

Slide 89

Slide 89 text

Regex character classes ✘
>> /[💩-💫]/
// matches:
// U+1F4A9 PILE OF POO
// U+1F4AA FLEXED BICEPS
// U+1F4AB DIZZY SYMBOL
>> /^[💩-💫]$/.test('💩')
true
>> /^[💩-💫]$/.test('💪')
true
>> /^[💩-💫]$/.test('💫')
true

Slide 90

Slide 90 text

Regex character classes
>> /[💩-💫]/
SyntaxError: Invalid regular expression: Range out of order in character class

Slide 91

Slide 91 text

Regex character classes
>> /[💩-💫]/
SyntaxError: Invalid regular expression: Range out of order in character class
>> /[\uD83D\uDCA9-\uD83D\uDCAB]/
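
The error comes from how the class is read without the `u` flag: as the code units `\uD83D`, then the range `\uDCA9-\uD83D`, then `\uDCAB`. The range runs from 0xDCA9 down to 0xD83D, so it is out of order. A small sketch to observe this (`compiles` is a hypothetical helper, not part of any library):

```javascript
// Returns true if the pattern compiles with the given flags,
// false if the engine rejects it with a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// The character class from the slide, written with escapes.
const astralRange = '[\uD83D\uDCA9-\uD83D\uDCAB]';

const withoutFlag = compiles(astralRange, '');  // read as code units
const withFlag = compiles(astralRange, 'u');    // read as code points

// With `u`, the class is a valid range of three astral symbols.
const matchesMiddle = new RegExp('^' + astralRange + '$', 'u').test('\u{1F4AA}');
```

In this sketch `withoutFlag` is `false`, while `withFlag` and `matchesMiddle` are both `true`.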

Slide 93

Slide 93 text

Regex character classes (ES6) ✔
>> /[💩-💫]/u
// matches:
// U+1F4A9 PILE OF POO
// U+1F4AA FLEXED BICEPS
// U+1F4AB DIZZY SYMBOL
>> /^[💩-💫]$/u.test('💩')
true
>> /^[💩-💫]$/u.test('💪')
true
>> /^[💩-💫]$/u.test('💫')
true

Slide 94

Slide 94 text

Regex character classes
>> regenerate().addRange('💩', '💫').toString()
'\uD83D[\uDCA9-\uDCAB]'
>> /^\uD83D[\uDCA9-\uDCAB]$/.test('💩')
true
>> /^\uD83D[\uDCA9-\uDCAB]$/.test('💪')
true
>> /^\uD83D[\uDCA9-\uDCAB]$/.test('💫')
true
http://mths.be/regenerate

Slide 95

Slide 95 text

JavaScript has a Unicode problem http://mths.be/jsu

Slide 96

Slide 96 text

💩 The Pile of Poo Test™
http://mths.be/jsu

Slide 97

Slide 97 text

💩 is the new %00

Slide 98

Slide 98 text

4. Unicode in MySQL

Slide 99

Slide 99 text

CREATE TABLE `table_name` (
  `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  `column_name` VARCHAR(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Slide 101

Slide 101 text

mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 1
mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' |
|         |      | for column 'column_name' at row 1          |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT column_name FROM table_name WHERE id = 9001;
+-------------+
| column_name |
+-------------+
| foo         |
+-------------+
1 row in set (0.00 sec)

Slide 105

Slide 105 text

✘
mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 1
mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' |
|         |      | for column 'column_name' at row 1          |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT column_name FROM table_name WHERE id = 9001;
+-------------+
| column_name |
+-------------+
| foo         |
+-------------+
1 row in set (0.00 sec)
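
The bytes in the warning, `\xF0\x9F\x92\xA9`, are the four-byte UTF-8 encoding of U+1F4A9, and the stored value stops right before them. An application-side check for such input could be sketched as below. Both function names are hypothetical, and `truncateAtFirstAstral` only mimics the result shown in this session; it is not MySQL's actual implementation.

```javascript
// Sketch: does the string contain any astral symbol, i.e. any symbol
// that needs four bytes in UTF-8?
function hasAstralSymbols(string) {
  return /[\u{10000}-\u{10FFFF}]/u.test(string);
}

// Sketch: reproduce the truncation seen in the session above by cutting
// the string at the first astral symbol.
function truncateAtFirstAstral(string) {
  const match = /[\u{10000}-\u{10FFFF}]/u.exec(string);
  // match.index is measured in UTF-16 code units, as slice() expects.
  return match ? string.slice(0, match.index) : string;
}
```

A check like `hasAstralSymbols` can only flag the problem before the write. It does not replace fixing the column's character set.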

Slide 106

Slide 106 text

MySQL’s ✌️utf8✌️

Slide 108

Slide 108 text

ALTER TABLE `table_name`
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
http://mths.be/utf8mb4

Slide 110

Slide 110 text

mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> SELECT column_name FROM table_name WHERE id = 9001;
+-------------+
| column_name |
+-------------+
| foo💩bar    |
+-------------+
1 row in set (0.00 sec)
http://mths.be/utf8mb4

Slide 113

Slide 113 text

✔
mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> SELECT column_name FROM table_name WHERE id = 9001;
+-------------+
| column_name |
+-------------+
| foo💩bar    |
+-------------+
1 row in set (0.00 sec)
http://mths.be/utf8mb4

Slide 114

Slide 114 text

5. Hacking with Unicode

Slide 115

Slide 115 text

http://mths.be/brk

Slide 116

Slide 116 text

$ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
000002c0  3c 00 70 00 3e 00 48 00  65 00 72 00 65 00 20 00  |<.p.>.H.e.r.e. .|
000002d0  69 00 73 00 20 00 73 00  6f 00 6d 00 65 00 20 00  |i.s. .s.o.m.e. .|
000002e0  6d 00 6f 00 6a 00 69 00  62 00 61 00 6b 00 65 00  |m.o.j.i.b.a.k.e.|
000002f0  2e 00 20 00 54 00 6f 00  20 00 66 00 69 00 78 00  |.. .T.o. .f.i.x.|
00000300  20 00 69 00 74 00 2c 00  20 00 75 00 73 00 65 00  | .i.t.,. .u.s.e.|
00000310  20 00 74 00 68 00 65 00  20 00 63 00 68 00 61 00  | .t.h.e. .c.h.a.|
00000320  72 00 61 00 63 00 74 00  65 00 72 00 20 00 65 00  |r.a.c.t.e.r. .e.|
00000330  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 20 00  |n.c.o.d.i.n.g. .|
00000340  6d 00 65 00 6e 00 75 00  20 00 74 00 6f 00 20 00  |m.e.n.u. .t.o. .|
00000350  63 00 68 00 6f 00 6f 00  73 00 65 00 20 00 61 00  |c.h.o.o.s.e. .a.|
00000360  6e 00 6f 00 74 00 68 00  65 00 72 00 20 00 65 00  |n.o.t.h.e.r. .e.|
00000370  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 2e 00  |n.c.o.d.i.n.g...|
00000380  3c 00 2f 00 70 00 3e 00  0a 00 0a 00 3c 00 70 00  |<./.p.>.....<.p.|
00000390  3e 00 3c 73 63 72 69 70  74 3e 20 61 6c 65 72 74  |>.<script> alert|
000003a0  28 22 58 53 53 22 29 3b  3c 2f 73 63 72 69 70 74  |("XSS");</script|
000003b0  3e 20 3c 00 2f 00 70 00  3e 00 0a 00 0a 00 3c 00  |> <./.p.>.....<.|
000003c0  2f 00 62 00 6f 00 64 00  79 00 3e 00 0a 00 3c 00  |/.b.o.d.y.>...<.|
000003d0  2f 00 68 00 74 00 6d 00  6c 00 3e 00 0a 00 0a 00  |/.h.t.m.l.>.....|
000003e0

Slide 119

Slide 119 text

$ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
000002c0  3c 00 70 00 3e 00 48 00  65 00 72 00 65 00 20 00  |<.p.>.H.e.r.e. .|
000002d0  69 00 73 00 20 00 73 00  6f 00 6d 00 65 00 20 00  |i.s. .s.o.m.e. .|
000002e0  6d 00 6f 00 6a 00 69 00  62 00 61 00 6b 00 65 00  |m.o.j.i.b.a.k.e.|
000002f0  2e 00 20 00 54 00 6f 00  20 00 66 00 69 00 78 00  |.. .T.o. .f.i.x.|
00000300  20 00 69 00 74 00 2c 00  20 00 75 00 73 00 65 00  | .i.t.,. .u.s.e.|
00000310  20 00 74 00 68 00 65 00  20 00 63 00 68 00 61 00  | .t.h.e. .c.h.a.|
00000320  72 00 61 00 63 00 74 00  65 00 72 00 20 00 65 00  |r.a.c.t.e.r. .e.|
00000330  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 20 00  |n.c.o.d.i.n.g. .|
00000340  6d 00 65 00 6e 00 75 00  20 00 74 00 6f 00 20 00  |m.e.n.u. .t.o. .|
00000350  63 00 68 00 6f 00 6f 00  73 00 65 00 20 00 61 00  |c.h.o.o.s.e. .a.|
00000360  6e 00 6f 00 74 00 68 00  65 00 72 00 20 00 65 00  |n.o.t.h.e.r. .e.|
00000370  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 2e 00  |n.c.o.d.i.n.g...|
00000380  3c 00 2f 00 70 00 3e 00  0a 00 0a 00 3c 00 70 00  |<./.p.>.....<.p.|
00000390  3e 00 3c 73 63 72 69 70  74 3e 20 61 6c 65 72 74  |>.<script> alert|
000003a0  28 22 58 53 53 22 29 3b  3c 2f 73 63 72 69 70 74  |("XSS");</script|
000003b0  3e 20 3c 00 2f 00 70 00  3e 00 0a 00 0a 00 3c 00  |> <./.p.>.....<.|
000003c0  2f 00 62 00 6f 00 64 00  79 00 3e 00 0a 00 3c 00  |/.b.o.d.y.>...<.|
000003d0  2f 00 68 00 74 00 6d 00  6c 00 3e 00 0a 00 0a 00  |/.h.t.m.l.>.....|
000003e0
(U+3C73 CJK UNIFIED IDEOGRAPH-3C73)

Slide 120

Slide 120 text

$ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
000002c0  3c 00 70 00 3e 00 48 00  65 00 72 00 65 00 20 00  |<.p.>.H.e.r.e. .|
000002d0  69 00 73 00 20 00 73 00  6f 00 6d 00 65 00 20 00  |i.s. .s.o.m.e. .|
000002e0  6d 00 6f 00 6a 00 69 00  62 00 61 00 6b 00 65 00  |m.o.j.i.b.a.k.e.|
000002f0  2e 00 20 00 54 00 6f 00  20 00 66 00 69 00 78 00  |.. .T.o. .f.i.x.|
00000300  20 00 69 00 74 00 2c 00  20 00 75 00 73 00 65 00  | .i.t.,. .u.s.e.|
00000310  20 00 74 00 68 00 65 00  20 00 63 00 68 00 61 00  | .t.h.e. .c.h.a.|
00000320  72 00 61 00 63 00 74 00  65 00 72 00 20 00 65 00  |r.a.c.t.e.r. .e.|
00000330  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 20 00  |n.c.o.d.i.n.g. .|
00000340  6d 00 65 00 6e 00 75 00  20 00 74 00 6f 00 20 00  |m.e.n.u. .t.o. .|
00000350  63 00 68 00 6f 00 6f 00  73 00 65 00 20 00 61 00  |c.h.o.o.s.e. .a.|
00000360  6e 00 6f 00 74 00 68 00  65 00 72 00 20 00 65 00  |n.o.t.h.e.r. .e.|
00000370  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 2e 00  |n.c.o.d.i.n.g...|
00000380  3c 00 2f 00 70 00 3e 00  0a 00 0a 00 3c 00 70 00  |<./.p.>.....<.p.|
00000390  3e 00 3c 73 63 72 69 70  74 3e 20 61 6c 65 72 74  |>.<script> alert|
000003a0  28 22 58 53 53 22 29 3b  3c 2f 73 63 72 69 70 74  |("XSS");</script|
000003b0  3e 20 3c 00 2f 00 70 00  3e 00 0a 00 0a 00 3c 00  |> <./.p.>.....<.|
000003c0  2f 00 62 00 6f 00 64 00  79 00 3e 00 0a 00 3c 00  |/.b.o.d.y.>...<.|
000003d0  2f 00 68 00 74 00 6d 00  6c 00 3e 00 0a 00 0a 00  |/.h.t.m.l.>.....|
000003e0
(U+6372 CJK UNIFIED IDEOGRAPH-6372)

Slide 121

Slide 121 text

$ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
000002c0  3c 00 70 00 3e 00 48 00  65 00 72 00 65 00 20 00  |<.p.>.H.e.r.e. .|
000002d0  69 00 73 00 20 00 73 00  6f 00 6d 00 65 00 20 00  |i.s. .s.o.m.e. .|
000002e0  6d 00 6f 00 6a 00 69 00  62 00 61 00 6b 00 65 00  |m.o.j.i.b.a.k.e.|
000002f0  2e 00 20 00 54 00 6f 00  20 00 66 00 69 00 78 00  |.. .T.o. .f.i.x.|
00000300  20 00 69 00 74 00 2c 00  20 00 75 00 73 00 65 00  | .i.t.,. .u.s.e.|
00000310  20 00 74 00 68 00 65 00  20 00 63 00 68 00 61 00  | .t.h.e. .c.h.a.|
00000320  72 00 61 00 63 00 74 00  65 00 72 00 20 00 65 00  |r.a.c.t.e.r. .e.|
00000330  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 20 00  |n.c.o.d.i.n.g. .|
00000340  6d 00 65 00 6e 00 75 00  20 00 74 00 6f 00 20 00  |m.e.n.u. .t.o. .|
00000350  63 00 68 00 6f 00 6f 00  73 00 65 00 20 00 61 00  |c.h.o.o.s.e. .a.|
00000360  6e 00 6f 00 74 00 68 00  65 00 72 00 20 00 65 00  |n.o.t.h.e.r. .e.|
00000370  6e 00 63 00 6f 00 64 00  69 00 6e 00 67 00 2e 00  |n.c.o.d.i.n.g...|
00000380  3c 00 2f 00 70 00 3e 00  0a 00 0a 00 3c 00 70 00  |<./.p.>.....<.p.|
00000390  3e 00 3c 73 63 72 69 70  74 3e 20 61 6c 65 72 74  |>.<script> alert|
000003a0  28 22 58 53 53 22 29 3b  3c 2f 73 63 72 69 70 74  |("XSS");</script|
000003b0  3e 20 3c 00 2f 00 70 00  3e 00 0a 00 0a 00 3c 00  |> <./.p.>.....<.|
000003c0  2f 00 62 00 6f 00 64 00  79 00 3e 00 0a 00 3c 00  |/.b.o.d.y.>...<.|
000003d0  2f 00 68 00 74 00 6d 00  6c 00 3e 00 0a 00 0a 00  |/.h.t.m.l.>.....|
000003e0
(U+6970 CJK UNIFIED IDEOGRAPH-6970)
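
The annotated code points on these slides can be reproduced by pairing the ASCII bytes of `<script>` into 16-bit units. The sketch below assumes the pairing order the annotations imply, with the first byte of each pair as the high byte; a little-endian decoder would swap the two bytes in each pair. `pairBytes` is a hypothetical helper.

```javascript
// Combine consecutive bytes into 16-bit units, first byte as high byte.
function pairBytes(bytes) {
  const units = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    units.push((bytes[i] << 8) | bytes[i + 1]);
  }
  return units;
}

// The eight bytes at offset 0x392 in the hexdump above.
const scriptTagBytes = [0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x3e];

// Read one byte at a time, they are plain ASCII markup.
const asAscii = String.fromCharCode(...scriptTagBytes);

// Read two bytes at a time, they become unrelated code points.
const asCodePoints = pairBytes(scriptTagBytes).map(function (unit) {
  return 'U+' + unit.toString(16).toUpperCase();
});
```

Here `asAscii` is `'<script>'`, while `asCodePoints` starts with `U+3C73`, `U+6372` and `U+6970`, the three ideographs named on the slides.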

Slide 122

Slide 122 text

http://mths.be/brm

Slide 123

Slide 123 text

http://mths.be/brm
∀㸀㰀script㸀alert(1)㰀/script㸀

Slide 125

Slide 125 text

$ hexdump -C utf-32-xss.html
00000000  00 00 22 00 00 00 3e 00  00 00 3c 00 00 00 00 73  |.."...>...<....s|
00000010  00 00 00 63 00 00 00 72  00 00 00 69 00 00 00 70  |...c...r...i...p|
00000020  00 00 00 74 00 00 3e 00  00 00 00 61 00 00 00 6c  |...t..>....a...l|
00000030  00 00 00 65 00 00 00 72  00 00 00 74 00 00 00 28  |...e...r...t...(|
00000040  00 00 00 31 00 00 00 29  00 00 3c 00 00 00 00 2f  |...1...)..<..../|
00000050  00 00 00 73 00 00 00 63  00 00 00 72 00 00 00 69  |...s...c...r...i|
00000060  00 00 00 70 00 00 00 74  00 00 3e 00              |...p...t..>.|
0000006c

Slide 127

Slide 127 text

$ hexdump -C utf-32-xss.html
00000000  00 00 22 00 00 00 3e 00  00 00 3c 00 00 00 00 73  |.."...>...<....s|
00000010  00 00 00 63 00 00 00 72  00 00 00 69 00 00 00 70  |...c...r...i...p|
00000020  00 00 00 74 00 00 3e 00  00 00 00 61 00 00 00 6c  |...t..>....a...l|
00000030  00 00 00 65 00 00 00 72  00 00 00 74 00 00 00 28  |...e...r...t...(|
00000040  00 00 00 31 00 00 00 29  00 00 3c 00 00 00 00 2f  |...1...)..<..../|
00000050  00 00 00 73 00 00 00 63  00 00 00 72 00 00 00 69  |...s...c...r...i|
00000060  00 00 00 70 00 00 00 74  00 00 3e 00              |...p...t..>.|
0000006c
∀ (U+2200 FOR ALL)

Slide 128

Slide 128 text

$ hexdump -C utf-32-xss.html
00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
0000006c
㸀 (U+3E00 CJK UNIFIED IDEOGRAPH-3E00)

Slide 129

Slide 129 text

$ hexdump -C utf-32-xss.html
00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
0000006c
㰀 (U+3C00 CJK UNIFIED IDEOGRAPH-3C00)

Slide 130

Slide 130 text

$ hexdump -C utf-32-xss.html
00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
0000006c
s (U+0073 LATIN SMALL LETTER S)
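The hexdump can be reproduced with a few lines of JavaScript. This is a minimal sketch, not the author's tooling: it encodes the string from the previous slides as UTF-32BE and then shows what a consumer would see if it read the same bytes one byte per character and skipped NUL bytes. That consumer (`readAsSingleByteIgnoringNul`) is an assumption made for illustration, as are both function names.

```javascript
// Encode a string as UTF-32BE: four bytes per code point, big-endian.
function encodeUtf32BE(string) {
  const bytes = [];
  for (const symbol of string) {
    const codePoint = symbol.codePointAt(0);
    bytes.push(
      (codePoint >>> 24) & 0xFF,
      (codePoint >>> 16) & 0xFF,
      (codePoint >>> 8) & 0xFF,
      codePoint & 0xFF
    );
  }
  return bytes;
}

// Hypothetical consumer (assumption): treats every byte as one character
// and drops NUL bytes instead of decoding UTF-32.
function readAsSingleByteIgnoringNul(bytes) {
  return bytes
    .filter((byte) => byte !== 0x00)
    .map((byte) => String.fromCharCode(byte))
    .join('');
}

// U+2200, U+3E00 and U+3C00 contain no ASCII quote or angle bracket.
const payload = '\u2200\u3E00\u3C00script\u3E00alert(1)\u3C00/script\u3E00';
const bytes = encodeUtf32BE(payload);
const misread = readAsSingleByteIgnoringNul(bytes);

console.log(bytes.length.toString(16)); // byte count in hex, compare with the last hexdump offset
console.log(misread);
```

A filter that inspects `payload` as text finds no `"`, `<` or `>`, yet the low bytes of U+2200, U+3E00 and U+3C00 are exactly 0x22, 0x3E and 0x3C.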

Slide 131

Slide 131 text

No content

Slide 132

Slide 132 text

No content

Slide 133

Slide 133 text

http://mths.be/brl

Slide 134

Slide 134 text

http://mths.be/brl OBAMA vs. ᴼᴮᴬᴹᴬ

Slide 135

Slide 135 text

JavaScript vs. JSON

Slide 136

Slide 136 text

// <?php echo strip($userInput); ?>

Slide 137

Slide 137 text

http://mths.be/brn

Slide 138

Slide 138 text

http://mths.be/brn

Slide 139

Slide 139 text

// <?php echo strip($userInput); ?> foo[U+2028]alert('XSS')

Slide 140

Slide 140 text

// <?php echo strip($userInput); ?> ✘ foo[U+2028]alert('XSS')
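Why U+2028 breaks out of the comment can be checked directly: JavaScript treats U+2028 (and U+2029) as line terminators, so a single-line comment ends there. The sketch below assumes that the `strip()` call on the slide only removes CR and LF; `stripAsciiLineBreaks` and `renderScript` are hypothetical stand-ins, not code from the talk.

```javascript
// Hypothetical stand-in for strip() (assumption): removes only CR and LF.
function stripAsciiLineBreaks(input) {
  return input.replace(/[\r\n]/g, '');
}

// Hypothetical template: echoes user input into a single-line comment,
// followed by the page's own code on the next line.
function renderScript(userInput) {
  return '// ' + stripAsciiLineBreaks(userInput) + '\nreturn "page code";';
}

// new Function() parses the generated source as a function body.
// With LF removed, the input stays inside the comment.
const benignResult = new Function(renderScript('foo\nreturn "injected";'))();

// U+2028 survives the filter and terminates the comment,
// so the rest of the input is parsed as code.
const hostileResult = new Function(renderScript('foo\u2028return "injected";'))();

console.log(benignResult, hostileResult);
```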

Slide 141

Slide 141 text

JSON ∉ JavaScript

Slide 142

Slide 142 text

var data = '"Hello\u2028"';
// JSON-formatted data containing a string
// containing an (unescaped!) Line Separator
eval('(' + data + ')');
// → SyntaxError: Unexpected token ILLEGAL
JSON.parse(data);
// → 'Hello\u2028'

Slide 143

Slide 143 text

Always escape JSON-formatted data before passing it to a JavaScript parser. http://mths.be/jsesc
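A dependency-free sketch of that advice, assuming the only goal is to keep raw U+2028 and U+2029 out of the generated source. This is not the jsesc implementation; `escapeLineTerminators` is a hypothetical helper. It relies on the fact that in valid JSON these two characters can only occur inside strings, where a `\uXXXX` escape means the same thing.

```javascript
// Replace raw U+2028 / U+2029 in JSON text with their escape sequences.
function escapeLineTerminators(json) {
  return json.replace(/[\u2028\u2029]/g, (character) =>
    '\\u' + character.charCodeAt(0).toString(16)
  );
}

const serialized = JSON.stringify('Hello\u2028'); // contains the raw symbol
const escaped = escapeLineTerminators(serialized); // contains \u2028 as six ASCII characters

console.log(escaped);
console.log(JSON.parse(escaped) === JSON.parse(serialized));
```

Engines that implement ES2019 accept raw U+2028 and U+2029 inside string literals, but both characters still end a single-line comment, so escaping remains a reasonable default when JSON is embedded in script source.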

Slide 144

Slide 144 text

http://mths.be/jsesc

Slide 145

Slide 145 text

var data = 'foo\u2028';
var serialized = JSON.stringify(data);
// → '"foo\u2028"' (contains the raw, unescaped
// Unicode symbol)
var escaped = jsesc(data, { 'json': true });
// → '"foo\\u2028"' (contains an escape sequence
// for the Unicode symbol → safer)
JSON.parse(serialized) == JSON.parse(escaped);
// → true (both strings unserialize to the same value)
http://mths.be/jsesc

Slide 148

Slide 148 text

var string = String.fromCharCode(0xD800);
// a string containing an (unescaped!)
// lone surrogate
var data = JSON.stringify(string);
// the same string as JSON-formatted data
storeInDatabaseAsUtf8(data);
// → error/crash
sendOverWebSocketConnection(data);
// → error/crash/DoS

Slide 149

Slide 149 text

Always escape JSON-formatted data before passing it to a UTF-8 encoder. http://mths.be/jsesc
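One way to follow that advice without a library is to replace every lone surrogate with its escape sequence before the text reaches a UTF-8 encoder. The helper below is a hypothetical sketch, not the jsesc implementation; it leaves valid surrogate pairs untouched.

```javascript
// Escape surrogate code units that are not part of a valid pair.
function escapeLoneSurrogates(string) {
  // A high surrogate not followed by a low one,
  // or a low surrogate not preceded by a high one.
  const loneSurrogate =
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;
  return string.replace(loneSurrogate, (unit) =>
    '\\u' + unit.charCodeAt(0).toString(16).toUpperCase()
  );
}

const lone = escapeLoneSurrogates('foo\uD800');
const paired = escapeLoneSurrogates('\uD83D\uDCA9'); // U+1F4A9 as a valid pair

// Applied to JSON text, the result still parses to the original value.
const roundTripped = JSON.parse(escapeLoneSurrogates(JSON.stringify('foo\uD800')));

console.log(lone, paired === '\uD83D\uDCA9', roundTripped === 'foo\uD800');
```

Engines that implement ES2019's well-formed `JSON.stringify` already emit such escapes for lone surrogates; the helper matters for older engines and for strings that never pass through `JSON.stringify`.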

Slide 150

Slide 150 text

http://mths.be/jsesc

Slide 151

Slide 151 text

var data = 'foo\uD800';
var serialized = JSON.stringify(data);
// → '"foo\uD800"' (contains the raw, unescaped
// Unicode symbol)
var escaped = jsesc(data, { 'json': true });
// → '"foo\\uD800"' (contains an escape sequence
// for the Unicode symbol → safer)
JSON.parse(serialized) == JSON.parse(escaped);
// → true (both strings unserialize to the same value)
http://mths.be/jsesc

Slide 154

Slide 154 text

Phabricator

Slide 155

Slide 155 text

Phabricator

Slide 156

Slide 156 text

No content

Slide 157

Slide 157 text

No content

Slide 158

Slide 158 text

No content

Slide 159

Slide 159 text

uses MySQL

Slide 160

Slide 160 text

uses MySQL’s ✌️utf8✌️

Slide 161

Slide 161 text

No content

Slide 162

Slide 162 text

No content

Slide 163

Slide 163 text

http://mths.be/bro

Slide 164

Slide 164 text

RCE in WordPress < 3.6.1 http://mths.be/brq

Slide 165

Slide 165 text

CVE-2013-4338 “wp-includes/functions.php in WordPress before 3.6.1 does not properly determine whether data has been serialize()d, which allows remote attackers to execute arbitrary code by triggering erroneous PHP unserialize() operations.” http://mths.be/brq

Slide 166

Slide 166 text

function is_serialized( $data ) {
    $data = trim( $data );
    $length = strlen( $data );
    $lastc = $data[$length - 1];
    if ( ';' !== $lastc && '}' !== $lastc )
        return false;
    $token = $data[0];
    switch ( $token ) {
        case 's' :
            if ( '"' !== $data[$length - 2] )
                return false;
        case 'a' :
        case 'O' :
            return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
        case 'b' :
        case 'i' :
        case 'd' :
            return (bool) preg_match( "/^{$token}:[0-9.E-]+;\$/", $data );
    }
    return false;
}
http://mths.be/brq

Slide 168

Slide 168 text

WordPress
Before writing it to the database, data gets serialized only if it’s an array or an object, or if is_serialized($data) returns true (double serialization)
After retrieving data from the database, it gets unserialized only if is_serialized($data) returns true
http://mths.be/brq

Slide 169

Slide 169 text

WordPress
Before writing it to the database, data gets serialized only if it’s an array or an object, or if is_serialized($data) returns true (double serialization)
After retrieving data from the database, it gets unserialized only if is_serialized($data) returns true
http://mths.be/brq
uses MySQL’s ✌️utf8✌️
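How these two facts combine can be modelled in a few lines. The sketch below is a JavaScript port of the abbreviated is_serialized() from the earlier slide, plus an assumed model of a 3-byte ✌️utf8✌️ column that silently drops everything from the first astral symbol onwards (MySQL's behaviour outside strict mode). `storeInThreeByteUtf8Column`, the sample serialized string, and the choice of U+1F4A9 as the appended symbol are illustrative assumptions.

```javascript
// Port of the abbreviated is_serialized() check shown on the slide.
function isSerialized(data) {
  data = data.trim();
  const lastChar = data[data.length - 1];
  if (lastChar !== ';' && lastChar !== '}') return false;
  const token = data[0];
  switch (token) {
    case 's':
      if (data[data.length - 2] !== '"') return false;
    // falls through, as in the PHP original
    case 'a':
    case 'O':
      return new RegExp('^' + token + ':[0-9]+:').test(data);
    case 'b':
    case 'i':
    case 'd':
      return new RegExp('^' + token + ':[0-9.E-]+;$').test(data);
  }
  return false;
}

// Assumed model of the column: truncate at the first astral symbol
// (a surrogate pair in JavaScript, four bytes in UTF-8).
function storeInThreeByteUtf8Column(value) {
  const match = /[\uD800-\uDBFF][\uDC00-\uDFFF]/.exec(value);
  return match ? value.slice(0, match.index) : value;
}

const serialized = 'O:3:"Foo":0:{}'; // sample serialized PHP object
const payload = serialized + '\u{1F4A9}'; // astral symbol appended

const serializedOnWrite = isSerialized(payload); // check before writing
const stored = storeInThreeByteUtf8Column(payload); // what the column keeps
const unserializedOnRead = isSerialized(stored); // check after reading

console.log(serializedOnWrite, stored, unserializedOnRead);
```

On the way in, the payload does not look serialized, so it is stored as a plain string; on the way out, the truncated value does, so it would be handed to unserialize().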

Slide 170

Slide 170 text

http://mths.be/brq

Slide 171

Slide 171 text

http://mths.be/brq

Slide 172

Slide 172 text

http://mths.be/brq

Slide 173

Slide 173 text

http://mths.be/brq

Slide 174

Slide 174 text

class Foo {
    private $command;
    public function setCommand($command) {
        $this->command = $command;
    }
    public function __destruct() {
        if ($this->command) {
            shell_exec($this->command);
        }
    }
}
$object = new Foo();
$object->setCommand('echo "pwned!" > /tmp/pwned.txt');
$serialized = serialize($object);
$payload = $serialized . '!';
http://mths.be/brq

Slide 175

Slide 175 text

http://mths.be/brq

Slide 176

Slide 176 text

No content

Slide 177

Slide 177 text

“[The following C# code] takes a provided HTML string and removes any potentially dangerous XSS HTML tags using a whitelist approach.” http://mths.be/brp

Slide 178

Slide 178 text

private static Regex _whitelist = new Regex(@"
    ^</?(a|b(lockquote)?|code|em|h(1|2|3)|i|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$
    |^<(b|h)r\s?/?>$
    |^<a[^>]+>$
    |^<img[^>]+/?>$",
    RegexOptions.Singleline |
    RegexOptions.IgnorePatternWhitespace |
    RegexOptions.ExplicitCapture |
    RegexOptions.Compiled
);
http://mths.be/brp

Slide 179

Slide 179 text

/// <summary>
/// sanitize any potentially dangerous tags from the provided
/// raw HTML input using a whitelist based approach, leaving
/// the "safe" HTML tags
/// </summary>
public static string Sanitize(string html)
{
    var tagname = "";
    Match tag;
    var tags = _tags.Matches(html);
    // iterate through all HTML tags in the input
    for (int i = tags.Count-1; i > -1; i--)
    {
        tag = tags[i];
        tagname = tag.Value.ToLower();
        if (!_whitelist.IsMatch(tagname))
        {
            // not on our whitelist? I SAY GOOD DAY TO YOU, SIR. GOOD DAY!
            html = html.Remove(tag.Index, tag.Length);
        }
http://mths.be/brp

Slide 183

Slide 183 text

(\stitle=""[^""]*"")?
\s?/?>"))
            {
                html = html.Remove(tag.Index, tag.Length);
            }
        }
    }
    return html;
}
http://mths.be/brp

Slide 189

Slide 189 text

http://mths.be/brp

Slide 190

Slide 190 text

TweetDeck

Slide 191

Slide 191 text

No content

Slide 192

Slide 192 text

No content

Slide 193

Slide 193 text

No content

Slide 194

Slide 194 text

function getTweetHtml(tweet) {
  var htmlEscapedTweet = htmlEscape(tweet);
  if (containsEmoji(htmlEscapedTweet)) {
    return replaceSymbolsWithImgTags(
      AccidentallyUndoHtmlEscaping(htmlEscapedTweet)
    );
  } else {
    return htmlEscapedTweet;
  }
}
http://mths.be/bsq
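The pseudocode can be turned into a runnable sketch to show why the emoji branch matters. Everything here is hypothetical: the helper bodies, the use of U+2665 as the trigger symbol, and the `heart.png` file name are assumptions for illustration, not TweetDeck's actual code.

```javascript
function htmlEscape(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Stand-in for the accidental step: turns entities back into markup.
function htmlUnescape(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&');
}

// Assumption: U+2665 is the only symbol that gets an image.
function replaceSymbolsWithImgTags(html) {
  return html.replace(/\u2665/g, '<img alt="\u2665" src="heart.png">');
}

// Mirrors the slide: the symbol branch undoes the escaping.
function getTweetHtmlBuggy(tweet) {
  const escaped = htmlEscape(tweet);
  if (escaped.includes('\u2665')) {
    return replaceSymbolsWithImgTags(htmlUnescape(escaped));
  }
  return escaped;
}

// Safer variant: insert the images into the escaped text.
function getTweetHtmlSafer(tweet) {
  return replaceSymbolsWithImgTags(htmlEscape(tweet));
}

const tweet = '<script>alert(1)</script>\u2665';
const buggy = getTweetHtmlBuggy(tweet);
const safer = getTweetHtmlSafer(tweet);

console.log(buggy);
console.log(safer);
```

The same tweet without the symbol stays escaped in both variants, which is why the bug only shows up for input that takes the image-replacement path.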

Slide 195

Slide 195 text

Thanks! Questions? → @mathias

Slide 196

Slide 196 text

No content

Slide 197

Slide 197 text

http://mths.be/brr

Slide 198

Slide 198 text

http://mths.be/brr