Character Building - Fun with charsets and encodings

Christoph Lühr

July 02, 2013

Transcript

  1. Character Building ٩(͡๏̯͡๏)۶
    "Fun with charsets and encodings"
    Christoph Lühr, @chluehr / @bephpug 2013
  2. Set of Characters: [ A B C ... 1 2 3 ... @#$ ]
    e.g. UNICODE / CODE PAGES
  3. BOM BOM BOM... (byte order marks)
    UTF-8 BOM:    0xEF 0xBB 0xBF
    UTF-32BE BOM: 0x00 0x00 0xFE 0xFF
    UTF-32LE BOM: 0xFF 0xFE 0x00 0x00
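  4. hexdump -C foo.txt
      00000000  48 61 6c 6c 6f 20 62 65  70 68 70 75 67 21 0a 48  |Hallo bephpug!.H|
      00000010  69 65 72 20 65 69 6e 20  61 2d 55 6d 6c 61 75 74  |ier ein a-Umlaut|
      00000020  3a c3 a4 21 0a 48 69 65  72 20 65 69 6e 20 61 2d  |:..!.Hier ein a-|
      00000030  6d 69 74 2d 4b 72 69 6e  67 65 6c 3a c3 a5 21 0a  |mit-Kringel:..!.|
      00000040  0a
    (German: "Hallo bephpug!" = "Hello bephpug!", "Hier ein a-Umlaut:ä!" = "Here is an a-umlaut: ä!", "Hier ein a-mit-Kringel:å!" = "Here is an a-with-a-ring: å!". In UTF-8, ä is c3 a4 and å is c3 a5, two bytes each, which hexdump shows as ".." in the ASCII column.)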
  5. Line endings (CR = carriage return, 0x0D; LF = line feed, 0x0A):
      <?php CR LF LF // let's say hello! LF echo "hello" LF
      <?php // let's say hello! LF echo "hello"
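  6. Diacritics: ü => u + ¨
      U+00FC  ü   c3 bc  LATIN SMALL LETTER U WITH DIAERESIS
      U+0075  u   75     LATIN SMALL LETTER U
      U+0308  _̈   cc 88  COMBINING DIAERESIS
    The same visible ü can be stored precomposed (one code point, UTF-8 c3 bc) or decomposed (u followed by a combining diaeresis, UTF-8 75 cc 88).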
  7. Contact
    Christoph Lühr
    eMail: [email protected], [email protected]
    Twitter: @chluehr
    Slides license: Creative Commons Attribution-NonCommercial-ShareAlike 3.0, http://creativecommons.org/licenses/by-nc-sa/3.0/
    Thanks! Questions? 〠 (U+3020 POSTAL MARK FACE)
  8. Links
    • Kore Nordmann (FAQ!)
      http://kore-nordmann.de/blog/0082_charset_versus_encoding.html
      http://kore-nordmann.de/blog/php_charset_encoding_FAQ.html
    • Misc. Resources
      http://www.iana.org/assignments/character-sets/character-sets.xml
      http://www.joelonsoftware.com/articles/Unicode.html
      http://www.unicode.org/charts/
      http://t-a-w.blogspot.de/2008/12/funny-characters-in-unicode.html
      http://www.utf8-zeichentabelle.de/unicode-utf8-table.pl?number=1024
      http://stackoverflow.com/questions/3417180/exotic-names-for-methods-constants-variables-and-fields-bug-or-feature
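Slide 2 separates the set of characters from the way they are stored as bytes. A minimal sketch, using Python 3's built-in codecs purely for illustration (Python is not part of the talk, and `encodings_of` is a hypothetical helper): the character ä has a single Unicode code point, U+00E4, but a different byte sequence in each encoding.

```python
def encodings_of(ch: str, encodings=("utf-8", "latin-1", "utf-16-be", "utf-32-be")) -> dict:
    """Hypothetical helper: return the bytes of `ch` in each given encoding."""
    # str.encode turns the abstract character into a concrete byte sequence.
    return {enc: ch.encode(enc) for enc in encodings}


ch = "ä"
# ord() gives the position of the character in the Unicode character set.
print(f"{ch} is U+{ord(ch):04X}")
for enc, raw in encodings_of(ch).items():
    print(f"{enc:10} {raw.hex(' ')}")
```

The character set also limits what an encoding can represent at all: `"€".encode("latin-1")` raises `UnicodeEncodeError`, because € (U+20AC) is outside the Latin-1 character set, while UTF-8 encodes it as e2 82 ac.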
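The byte order marks on slide 3 can be used to guess the encoding of a file before decoding it. A minimal Python sketch, assuming only the three BOMs listed on the slide; `sniff_bom` and `decode_with_bom` are hypothetical helper names, while the `codecs.BOM_*` constants are part of Python's standard library.

```python
import codecs

# BOMs from the slide, mapped to Python codec names. Order matters if the
# table is extended: the UTF-16LE BOM (FF FE) is a prefix of the UTF-32LE
# BOM (FF FE 00 00), so UTF-32LE would have to be checked first.
BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
)


def sniff_bom(data: bytes):
    """Return (codec name, BOM length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode data, using and stripping a leading BOM if present, else `default`."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


print(sniff_bom(b"\xef\xbb\xbfHallo"))
print(decode_with_bom(b"\xef\xbb\xbfHallo"))
```

For the UTF-8 case alone, Python's `utf-8-sig` codec already strips a leading BOM, e.g. `b"\xef\xbb\xbfHallo".decode("utf-8-sig")`.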
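To see where the two-byte sequences in slide 4's dump come from, here is a small Python function that approximates the `hexdump -C` layout: offset, 16 bytes split into two groups of 8, and an ASCII column with "." for anything that is not printable ASCII. It is an illustrative sketch, not a reimplementation of hexdump (for instance, it does not collapse repeated lines into `*`); `hexdump_c` and `FOO_TXT` are hypothetical names, and the text is reconstructed from the bytes on the slide.

```python
def hexdump_c(data: bytes) -> str:
    """Roughly mimic `hexdump -C` output (no '*' squeezing of repeated lines)."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Bytes outside printable ASCII (including every byte of a UTF-8
        # multi-byte sequence such as c3 a4) are shown as '.'.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    # Final line: total length in hex, as hexdump prints it.
    lines.append(f"{len(data):08x}")
    return "\n".join(lines)


# Text reconstructed from the slide's bytes, saved as UTF-8.
FOO_TXT = "Hallo bephpug!\nHier ein a-Umlaut:ä!\nHier ein a-mit-Kringel:å!\n\n"
print(hexdump_c(FOO_TXT.encode("utf-8")))
```

The 63 characters of the text become 65 bytes, because ä and å each take two bytes in UTF-8.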
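Slide 5 marks the line endings of a PHP snippet with CR and LF. Assuming the point is mixed line endings (Windows-style CR LF next to Unix-style LF), a minimal Python sketch can count them and normalize everything to LF. Both helper names are hypothetical, and the sample bytes are only loosely modelled on the slide.

```python
def count_line_endings(data: bytes) -> dict:
    """Hypothetical helper: count CRLF, lone LF and lone CR line endings."""
    crlf = data.count(b"\r\n")
    # Every CRLF contains exactly one CR and one LF, so subtract them
    # to get the number of lone LFs and lone CRs.
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


def normalize_to_lf(data: bytes) -> bytes:
    """Hypothetical helper: convert CRLF and lone CR to LF.

    CRLF is replaced first so that it becomes one LF, not two.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Sample with mixed endings: the first line ends in CR LF, the rest in LF.
source = b'<?php\r\n\n// let\'s say hello!\necho "hello"\n'
print(count_line_endings(source))
print(normalize_to_lf(source))
```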
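The two spellings of ü on slide 6 look identical but compare as different strings. The usual remedy, which the slide does not name, is Unicode normalization (NFC composes, NFD decomposes). A minimal sketch using Python's standard `unicodedata` module; PHP's intl extension offers a `Normalizer` class for the same purpose.

```python
import unicodedata

precomposed = "\u00fc"  # ü as one code point: LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"  # u followed by COMBINING DIAERESIS

# Both render as "ü", but they are different code point sequences and bytes.
print(precomposed == decomposed)              # False
print(precomposed.encode("utf-8").hex(" "))   # c3 bc
print(decomposed.encode("utf-8").hex(" "))    # 75 cc 88

# Normalization maps both spellings onto a common form before comparing.
nfc = unicodedata.normalize("NFC", decomposed)   # compose into one code point
nfd = unicodedata.normalize("NFD", precomposed)  # decompose into base + mark
print(nfc == precomposed, nfd == decomposed)     # True True

# List the code points of the decomposed form with their Unicode names.
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```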