What a charset!

Gunnar Bittersmann

September 10, 2021

Transcript

  1. What a
    character set!

  2. What a
    charset!

  3. Bundesarchiv, Bild 183-58117-0010 / CC-BY-SA 3.0

  4. ISO-8859-1

  5. ISO-8859-2

  6. ISO-8859-6

  7. ISO-8859-8

  8. Photo by Nicholas Lazarine on Unsplash

  9. He always used to refer to this guitar,
    never “Fender guitar”
    or “Gibson guitar,”
    it was always the “goddamn guitar.”
    —Bruce Springsteen talking about his father

  10. When I was growing up
    there were two things
    that were unpopular in my house:
    one was me,
    and the other one was my guitar.
    —Bruce Springsteen

  11. םולש
    ISO-8859-8 (visual): FD E5 EC F9
    ISO-8859-8-I (logical): F9 EC E5 FD
    ש ל ו ם

  12. character set ≠ character encoding

  13.             a            ä            “
    Unicode       U+0061       U+00E4       U+201C
    HTML escapes               ä            “
                               ä            “
    Antonín Dvořák
    Antonín Dvořák

  14.             a            ä            “
    Unicode       U+0061       U+00E4       U+201C
    HTML escapes               ä            “
                               ä            “
    Antonín Dvořák – the world's most frequently performed Czech composer
    Antonín Dvořák – the world's most frequently performed Czech composer
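
An HTML escape names a character by its Unicode code point rather than by any byte sequence. A minimal sketch of building hexadecimal numeric character references; `escapeNonAscii` is a hypothetical helper, and the choice of hexadecimal form is an assumption (decimal and named references are equally valid HTML):

```javascript
// Hypothetical helper: replaces every non-ASCII character with a hexadecimal
// numeric character reference. It does not escape "<" or "&", so it is not
// a sanitizer.
function escapeNonAscii(text) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    return codePoint < 0x80
      ? ch
      : `&#x${codePoint.toString(16).toUpperCase()};`;
  }).join("");
}

console.log(escapeNonAscii("ä“"));             // &#xE4;&#x201C;
console.log(escapeNonAscii("Antonín Dvořák")); // Anton&#xED;n Dvo&#x159;&#xE1;k
```

The escaped form is pure ASCII, which is why it survives whatever encoding the document itself is served in.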

  15.             a            ä            “            BOM
    Unicode       U+0061       U+00E4       U+201C       U+FEFF
    HTML escapes               ä            “
                               ä            “
    UTF-16 BE     00 61        00 E4        20 1C        FE FF
    UTF-16 LE     61 00        E4 00        1C 20        FF FE
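
The only difference between the two UTF-16 rows is the byte order inside each 16-bit code unit. A minimal sketch using `DataView`, whose `setUint16` takes an explicit endianness flag; `utf16Bytes` and `hex` are hypothetical helpers:

```javascript
// Hypothetical helper: serialises a string's UTF-16 code units as bytes.
function utf16Bytes(text, littleEndian) {
  const view = new DataView(new ArrayBuffer(text.length * 2));
  for (let i = 0; i < text.length; i++) {
    view.setUint16(i * 2, text.charCodeAt(i), littleEndian);
  }
  return new Uint8Array(view.buffer);
}

// Hypothetical helper: formats bytes as space-separated uppercase hex.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

// Same order as the slide's columns; in a real file the BOM would come first.
const sample = "a\u00E4\u201C\uFEFF";

console.log(hex(utf16Bytes(sample, false))); // 00 61 00 E4 20 1C FE FF
console.log(hex(utf16Bytes(sample, true)));  // 61 00 E4 00 1C 20 FF FE
```

A reader that sees `FE FF` at the start of a stream can assume big-endian; `FF FE` signals little-endian.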

  16.             a            ä            “            BOM          😝
    Unicode       U+0061       U+00E4       U+201C       U+FEFF       U+1F61D
    HTML escapes               ä            “                         😝
                               ä            “
    UTF-16 BE     00 61        00 E4        20 1C        FE FF        D8 3D DE 1D
    UTF-16 LE     61 00        E4 00        1C 20        FF FE        3D D8 1D DE
    ≫ '😝'.length
    ← 2
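
The length of 2 follows from how UTF-16 represents code points above U+FFFF: as a pair of surrogate code units. A minimal sketch of the arithmetic; `toSurrogates` is a hypothetical helper:

```javascript
// Hypothetical helper: splits a supplementary code point (above U+FFFF)
// into its high and low surrogate code units.
function toSurrogates(codePoint) {
  const offset = codePoint - 0x10000;     // 20 significant bits remain
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const face = "\u{1F61D}"; // 😝

console.log(face.length);                      // 2 (UTF-16 code units)
console.log([...face].length);                 // 1 (code points)
console.log(face.codePointAt(0).toString(16)); // 1f61d
console.log(toSurrogates(0x1f61d).map((u) => u.toString(16)).join(" ")); // d83d de1d
```

String iteration (`[...face]`, `for...of`) walks code points, while `length` and `charCodeAt` work on code units.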

  17.             a            ä            “            BOM          😝
    Unicode       U+0061       U+00E4       U+201C       U+FEFF       U+1F61D
    HTML escapes               ä            “                         😝
                               ä            “
    UTF-16 BE     00 61        00 E4        20 1C        FE FF        D8 3D DE 1D
    UTF-16 LE     61 00        E4 00        1C 20        FF FE        3D D8 1D DE
    UTF-32 BE     00 00 00 61  00 00 00 E4  00 00 20 1C  00 00 FE FF  00 01 F6 1D
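
UTF-32 needs no surrogates: every code point is written as one 32-bit number. A minimal sketch for the big-endian form; `utf32beBytes` and `hex` are hypothetical helpers:

```javascript
// Hypothetical helper: writes each code point as one big-endian 32-bit unit.
function utf32beBytes(text) {
  const codePoints = Array.from(text, (ch) => ch.codePointAt(0));
  const view = new DataView(new ArrayBuffer(codePoints.length * 4));
  codePoints.forEach((cp, i) => view.setUint32(i * 4, cp, false));
  return new Uint8Array(view.buffer);
}

// Hypothetical helper: formats bytes as space-separated uppercase hex.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(utf32beBytes("a")));         // 00 00 00 61
console.log(hex(utf32beBytes("\u{1F61D}"))); // 00 01 F6 1D
```

The fixed width makes indexing trivial at the cost of four bytes per character, even for ASCII.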

  18.             a            ä            “            BOM          😝
    Unicode       U+0061       U+00E4       U+201C       U+FEFF       U+1F61D
    HTML escapes               ä            “                         😝
                               ä            “
    UTF-8         61           C3 A4        E2 80 9C     EF BB BF     F0 9F 98 9D
    UTF-16 BE     00 61        00 E4        20 1C        FE FF        D8 3D DE 1D
    UTF-16 LE     61 00        E4 00        1C 20        FF FE        3D D8 1D DE
    UTF-32 BE     00 00 00 61  00 00 00 E4  00 00 20 1C  00 00 FE FF  00 01 F6 1D
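
UTF-8 spends one to four bytes per code point, depending on its magnitude. A minimal hand-rolled sketch of the bit layout, for illustration only; `utf8Bytes` and `hex` are hypothetical helpers, and production code would normally use the built-in `TextEncoder` instead:

```javascript
// Hypothetical helper: encodes one code point as UTF-8.
// Simplified: it does not reject surrogates or values above U+10FFFF.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];                             // 0xxxxxxx
  if (cp < 0x800) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];        // 110xxxxx 10xxxxxx
  }
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Hypothetical helper: formats bytes as space-separated uppercase hex.
const hex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(utf8Bytes(0x0061)));  // 61
console.log(hex(utf8Bytes(0x00e4)));  // C3 A4
console.log(hex(utf8Bytes(0x201c)));  // E2 80 9C
console.log(hex(utf8Bytes(0xfeff)));  // EF BB BF
console.log(hex(utf8Bytes(0x1f61d))); // F0 9F 98 9D
```

ASCII stays one byte per character, which is why an ASCII-only file is already valid UTF-8.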

  19.             a            ä            “            BOM          😝
    Unicode       U+0061       U+00E4       U+201C       U+FEFF       U+1F61D
    UTF-8         61           C3 A4        E2 80 9C     EF BB BF     F0 9F 98 9D
    UTF-16 BE     00 61        00 E4        20 1C        FE FF        D8 3D DE 1D
    UTF-16 LE     61 00        E4 00        1C 20        FF FE        3D D8 1D DE
    UTF-32 BE     00 00 00 61  00 00 00 E4  00 00 20 1C  00 00 FE FF  00 01 F6 1D
    character set: Unicode
    character encoding: UTF-8, UTF-16 BE, UTF-16 LE, UTF-32 BE

  20. HTML
    character encoding
    XML

  21. C3 A4
    U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
    ä
    character encoding
    font
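
The character encoding is the step that turns the bytes C3 A4 back into the code point U+00E4; choosing a glyph for that code point is the font's job and is not modelled here. A minimal sketch of the decoding step for a two-byte sequence; `decodeTwoByteUtf8` is a hypothetical helper:

```javascript
// Hypothetical helper: decodes one two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
// Simplified: overlong forms (lead bytes C0 and C1) are not rejected.
function decodeTwoByteUtf8(lead, trail) {
  if ((lead & 0xe0) !== 0xc0 || (trail & 0xc0) !== 0x80) {
    throw new RangeError("not a two-byte UTF-8 sequence");
  }
  // Five payload bits from the lead byte, six from the trail byte.
  return ((lead & 0x1f) << 6) | (trail & 0x3f);
}

const codePoint = decodeTwoByteUtf8(0xc3, 0xa4);

console.log("U+" + codePoint.toString(16).toUpperCase().padStart(4, "0")); // U+00E4
console.log(String.fromCodePoint(codePoint)); // ä
```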

  22. a

  24. OPENTYPE
    FEATURES

  25. charset ≠ character set

  26. charset = character encoding
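
In practice the `charset` parameter is handed to a decoder as an encoding label, which fits the equation on this slide. A minimal sketch, assuming a `Content-Type` value like the one below; `charsetOf` is a hypothetical helper, and `TextDecoder` is the standard decoder available in browsers and Node.js:

```javascript
// Hypothetical helper: extracts the charset parameter from a Content-Type value.
function charsetOf(contentType) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

const label = charsetOf("text/html; charset=UTF-8");
const bytes = new Uint8Array([0xc3, 0xa4]);

console.log(label);                                // utf-8
console.log(new TextDecoder(label).decode(bytes)); // ä
```

The label picks the byte-to-code-point mapping; it says nothing about which characters the document is allowed to contain.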

  27. The end.
