What a charset!

Gunnar Bittersmann

September 10, 2021

Transcript

  1. He always used to refer to this guitar, never “Fender guitar”

    or “Gibson guitar,” it was always the “goddamn guitar.” —Bruce Springsteen talking about his father
  2. When I was growing up there were two things that

    were unpopular in my house: one was me, and the other one was my guitar. —Bruce Springsteen
  3. HTML escapes

    Unicode: a = U+0061, ä = U+00E4, “ = U+201C
    Named character references: ä = &auml;, “ = &ldquo;
    <p>Antonín Dvořák</p>
    <p>Anton&#xED;n Dvo&#x159;&#xE1;k</p>
    ✘ ✔
  4. HTML escapes

    Unicode: a = U+0061, ä = U+00E4, “ = U+201C
    Numeric character references: ä = &#xE4;, “ = &#x201C;
    Named character references: ä = &auml;, “ = &ldquo;
    <p>Antonín Dvořák &ndash; the most frequently performed Czech composer worldwide</p>
    <p>Antonín Dvořák – the most frequently performed Czech composer worldwide</p>
    ✔
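The equivalence between literal characters, named references, and numeric references can be checked with a minimal Python sketch. It uses the standard library's `html.unescape`; the variable names are illustrative assumptions, not part of the talk.

```python
import html

# Named and numeric character references resolve to the same code points.
named = html.unescape("&auml; &ldquo;")
numeric = html.unescape("&#xE4; &#x201C;")

# The escaped markup decodes to the same text as the literal markup.
# The literal is written with \u escapes so the comparison does not
# depend on how this file itself is encoded or normalized.
escaped = "<p>Anton&#xED;n Dvo&#x159;&#xE1;k</p>"
literal = "<p>Anton\u00EDn Dvo\u0159\u00E1k</p>"
decoded = html.unescape(escaped)

# &ndash; is the en dash, U+2013.
ndash = html.unescape("&ndash;")

print(named == numeric)        # both spell: ä “
print(decoded == literal)
print(f"U+{ord(ndash):04X}")
```

Both spellings produce identical text once parsed, so the choice between them is about source readability rather than about the resulting document.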
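  5. HTML escapes

    Unicode: a = U+0061, ä = U+00E4, “ = U+201C, BOM = U+FEFF
    Numeric character references: ä = &#xE4;, “ = &#x201C;
    Named character references: ä = &auml;, “ = &ldquo;
    UTF-16 BE: a = 00 61, ä = 00 E4, “ = 20 1C, BOM = FE FF
    UTF-16 LE: a = 61 00, ä = E4 00, “ = 1C 20, BOM = FF FE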
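  6. HTML escapes

    Unicode: a = U+0061, ä = U+00E4, “ = U+201C, BOM = U+FEFF, 😝 = U+1F61D
    Numeric character references: ä = &#xE4;, “ = &#x201C;, 😝 = &#x1F61D;
    Named character references: ä = &auml;, “ = &ldquo;
    UTF-16 BE: a = 00 61, ä = 00 E4, “ = 20 1C, BOM = FE FF, 😝 = D8 3D DE 1D
    UTF-16 LE: a = 61 00, ä = E4 00, “ = 1C 20, BOM = FF FE, 😝 = 3D D8 1D DE
    JavaScript console: ≫ '😝'.length ← 2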
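The UTF-16 byte sequences and the surrogate pair behind the length of 2 can be reproduced with a short Python sketch. It relies only on the built-in `utf-16-be` and `utf-16-le` codecs; the helper name `hex_bytes` is a hypothetical choice for this example, and `bytes.hex` with a separator assumes Python 3.8 or later.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes as space-separated uppercase hex pairs."""
    return text.encode(encoding).hex(" ").upper()

# a, ä, “ and the emoji U+1F61D, written as escapes to stay ASCII-safe.
text = "a\u00E4\u201C\U0001F61D"
be_hex = hex_bytes(text, "utf-16-be")
le_hex = hex_bytes(text, "utf-16-le")

# The byte order mark U+FEFF reveals the byte order of a UTF-16 stream.
bom_be = hex_bytes("\uFEFF", "utf-16-be")
bom_le = hex_bytes("\uFEFF", "utf-16-le")

# Python counts code points; JavaScript's .length counts UTF-16 code units.
emoji = "\U0001F61D"
code_points = len(emoji)
utf16_units = len(emoji.encode("utf-16-le")) // 2

print(be_hex)
print(le_hex)
print(bom_be, bom_le)
print(code_points, utf16_units)
```

The emoji lies outside the Basic Multilingual Plane, so UTF-16 needs two 16-bit code units (a surrogate pair) for it, which is why a language that counts code units reports a length of 2.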
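  7. HTML escapes

    Unicode: a = U+0061, ä = U+00E4, “ = U+201C, BOM = U+FEFF, 😝 = U+1F61D
    Numeric character references: ä = &#xE4;, “ = &#x201C;, 😝 = &#x1F61D;
    Named character references: ä = &auml;, “ = &ldquo;
    UTF-16 BE: a = 00 61, ä = 00 E4, “ = 20 1C, BOM = FE FF, 😝 = D8 3D DE 1D
    UTF-16 LE: a = 61 00, ä = E4 00, “ = 1C 20, BOM = FF FE, 😝 = 3D D8 1D DE
    UTF-32 BE: a = 00 00 00 61, ä = 00 00 00 E4, “ = 00 00 20 1C, BOM = 00 00 FE FF, 😝 = 00 01 F6 1D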
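  8. HTML escapes

    Unicode: a = U+0061, ä = U+00E4, “ = U+201C, BOM = U+FEFF, 😝 = U+1F61D
    Numeric character references: ä = &#xE4;, “ = &#x201C;, 😝 = &#x1F61D;
    Named character references: ä = &auml;, “ = &ldquo;
    UTF-8: a = 61, ä = C3 A4, “ = E2 80 9C, BOM = EF BB BF, 😝 = F0 9F 98 9D
    UTF-16 BE: a = 00 61, ä = 00 E4, “ = 20 1C, BOM = FE FF, 😝 = D8 3D DE 1D
    UTF-16 LE: a = 61 00, ä = E4 00, “ = 1C 20, BOM = FF FE, 😝 = 3D D8 1D DE
    UTF-32 BE: a = 00 00 00 61, ä = 00 00 00 E4, “ = 00 00 20 1C, BOM = 00 00 FE FF, 😝 = 00 01 F6 1D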
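  9. Character set and character encoding

    Character set:
    Unicode: a = U+0061, ä = U+00E4, “ = U+201C, BOM = U+FEFF, 😝 = U+1F61D
    Character encoding:
    UTF-8: a = 61, ä = C3 A4, “ = E2 80 9C, BOM = EF BB BF, 😝 = F0 9F 98 9D
    UTF-16 BE: a = 00 61, ä = 00 E4, “ = 20 1C, BOM = FE FF, 😝 = D8 3D DE 1D
    UTF-16 LE: a = 61 00, ä = E4 00, “ = 1C 20, BOM = FF FE, 😝 = 3D D8 1D DE
    UTF-32 BE: a = 00 00 00 61, ä = 00 00 00 E4, “ = 00 00 20 1C, BOM = 00 00 FE FF, 😝 = 00 01 F6 1D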
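The distinction between the character set (one code point per character) and the character encodings (different byte sequences for that same code point) can be illustrated with a small Python sketch. The dictionary layout and the names `chars`, `encodings`, and `table` are assumptions made for this example; the codec names are Python's built-in ones.

```python
# Character set: each character has exactly one Unicode code point.
chars = {
    "a": "a",
    "ä": "\u00E4",
    "“": "\u201C",
    "BOM": "\uFEFF",
    "😝": "\U0001F61D",
}

# Character encodings: each one maps that code point to different bytes.
encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be"]

table = {
    name: {enc: ch.encode(enc).hex(" ").upper() for enc in encodings}
    for name, ch in chars.items()
}

for name, ch in chars.items():
    print(f"{name}  U+{ord(ch):04X}")
    for enc in encodings:
        print(f"    {enc:<10} {table[name][enc]}")
```

UTF-8 uses one to four bytes per code point, UTF-16 uses two or four, and UTF-32 always uses four, which is visible in the varying lengths of the printed byte sequences.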
  10. a