Hello, my name is __________.

Nova Patch
October 28, 2015

Video and notes: https://novapatch.is/talks/hello-my-name-is/

Our personal identity is core to how we perceive ourselves and wish to be seen. All too often, however, applications, databases, and user interfaces are not designed to fully support the diversity of names expressed both locally and internationally. This talk demonstrates ways to build applications that respect users’ identities instead of limiting them.

Topics include:
◦ Input, validation, storage, and display of personal names
◦ Unicode usernames and solutions to security concerns
◦ Internationalization and localization considerations

The intended audience includes programmers, UX designers, and QA testers. Together we can build inclusive software that supports diverse identities.

Presented at:
◦ 2015-10-28: Internationalization & Unicode Conference 39 (IUC39), Santa Clara, CA
◦ 2015-07-23: OSCON 2015, Portland, OR
◦ 2015-06-23: Open Source Bridge 2015, Portland, OR
◦ 2015-06-10: YAPC::NA 2015, Salt Lake City, UT

Transcript

  1. Hello, my name is ___________.

  2. Hello, my name is Nova Patch

  3. Hello, my name is @novapatch

  4. Hello, my name is #MyNameIs

  5. Why don’t you support my name?

  6. Why don’t you support my name?
    1. Unintentional bugs

  7. Why don’t you support my name?
    1. Unintentional bugs
    2. Uninformed decisions

  8. Why don’t you support my name?
    1. Unintentional bugs
    2. Uninformed decisions
    3. Oppressive “real” name policies

  13. 文字化け

  14. Mojibake

  15. æ–‡å—化ã� ‘
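
Slides 13 to 15 show a string before and after mojibake. One common way this garbling arises is UTF-8 bytes being decoded as a single-byte legacy encoding. The sketch below reproduces that effect in Python. The choice of Windows-1252 as the wrong encoding and the helper names `garble` and `repair` are assumptions for illustration, not something the slides specify.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    # errors="replace" turns bytes that cp1252 leaves undefined
    # (such as 0x81) into U+FFFD instead of raising an exception.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(garbled: str) -> str:
    """Undo the mis-decoding. Only works if no byte was lost to U+FFFD."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    print(garble("文字化け"))          # garbled text containing U+FFFD
    print(repair(garble("Nóirín")))   # round-trips back to the original
```

Once a byte has been replaced by U+FFFD the original text cannot be recovered, which is why fixing the encoding at the source tends to be safer than repairing stored data afterwards.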

  16. In memory of
    Nóirín Plunkett

  28. Source: W3C “Character encodings: Essential concepts” by Richard Ishida; © W3C

  29. MySQL
    utf8 vs. utf8mb4
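
MySQL's legacy `utf8` character set (an alias for `utf8mb3`) stores at most three bytes per character, while `utf8mb4` allows four, which is what every code point beyond the BMP needs in UTF-8. A minimal Python sketch of that byte-length difference follows. The helper names are hypothetical, and the check only mirrors the byte limit, not what MySQL actually does on insert.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most three UTF-8 bytes."""
    return all(utf8_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    print(fits_in_utf8mb3("\u674e"))        # BMP ideograph: fits
    print(fits_in_utf8mb3("\U00020BB7"))    # Extension B ideograph: does not fit
    print(fits_in_utf8mb3("\U0001F600"))    # emoji in Plane 1: does not fit
```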

  30. JavaScript
    “characters”
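
JavaScript strings are sequences of UTF-16 code units, so a string's `length` counts each non-BMP character as two. The sketch below imitates that count in Python so it can be compared with the code point count. `utf16_code_units` is a hypothetical helper name used only for this illustration.

```python
def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units, which is what JavaScript's length reports."""
    return len(text.encode("utf-16-le")) // 2


def code_point_count(text: str) -> int:
    """Count Unicode code points, which is what Python's len() reports."""
    return len(text)


if __name__ == "__main__":
    emoji = "\U0001F600"
    print(code_point_count(emoji))   # 1
    print(utf16_code_units(emoji))   # 2
```

Neither count matches what a user perceives as one character in every case, so length limits on names are best treated with caution.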

  31. Why support non-BMP characters?
    “Almost all emoji
    —and all new ones—
    are encoded in Plane 1”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  32. Why support non-BMP characters?
    “Japan’s 2,136 Jōyō Kanji
    requires one Extension B ideograph”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  33. Why support non-BMP characters?
    “JIS X 0213:2004 requires
    303 Extension B ideographs”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  34. Why support non-BMP characters?

  35. Why support non-BMP characters?
    “GB 18030 certification without PUA
    requires six Extension B ideographs”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  36. Why support non-BMP characters?
    “China’s 8,105 hànzì set
    requires 196 Extension B
    through E ideographs”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  37. Why support non-BMP characters?
    “Hong Kong SCS-2008 requires
    1,702 Extension B & C ideographs”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  38. Why support non-BMP characters?
    “Modern OSes and applications support
    code points outside the BMP”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  39. Why support non-BMP characters?
    “As of Unicode Version 6.0,
    there are more characters
    outside the BMP”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  40. Why support non-BMP characters?
    “The BMP is effectively full”
    Source: “2015 Top Ten List: Why Support Beyond-BMP Code Points?” by Dr. Ken Lunde
    © Adobe Systems Incorporated

  41. 李炘煜

  42. 李▯煜

  43. 李炘煜
    U+674E U+7098 U+715C
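
The code point listing on slide 43 can be produced programmatically, which is a handy way to find out exactly which character a system failed to display. This is a small Python sketch; `code_points` is a hypothetical helper name, and the input uses escapes for the three code points given on the slide.

```python
def code_points(text: str) -> str:
    """Format each character of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    name = "\u674e\u7098\u715c"  # the three characters from slide 43
    print(code_points(name))     # U+674E U+7098 U+715C
```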

  48. O'Reilly

  49. O\'Reilly

  50. O'Reilly

  51. OReilly

  52. O
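
Slides 48 to 52 show a name with an apostrophe being escaped, stripped, and finally truncated. The slides do not prescribe a fix. One common approach, assumed here, is to pass the name as a bound parameter so that it is never spliced into SQL text and needs no manual escaping. The sketch uses Python's standard `sqlite3` module with a hypothetical `users` table.

```python
import sqlite3


def store_and_fetch(name: str) -> str:
    """Store a name through a bound parameter and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT)")
        # The ? placeholder keeps the value separate from the SQL statement,
        # so apostrophes need no escaping and cannot end the string early.
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        (stored,) = conn.execute("SELECT name FROM users").fetchone()
        return stored
    finally:
        conn.close()


if __name__ == "__main__":
    print(store_and_fetch("O'Reilly"))
```

The same idea applies at display time: escape for the output context (HTML, JSON, and so on) when rendering, and keep the stored value exactly as the user entered it.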

  56. Unicode Usernames
    1. Identifier characters
    2. Case folding
    3. Normalization
    4. Confusable characters
    5. Mixed scripts
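
Items 2 and 3 in the list above, case folding and normalization, can be sketched with Python's standard library. The function below builds a comparison key so that usernames differing only in case, width, or composed versus decomposed accents are treated as the same account. The name `comparison_key` and the choice of NFKC are assumptions for illustration. This is a simplified approximation, not an implementation of any particular specification.

```python
import unicodedata


def comparison_key(username: str) -> str:
    """Return a key for comparing usernames, not for displaying them."""
    # Normalize compatibility forms (for example fullwidth letters),
    # fold case, then normalize again because folding can denormalize.
    folded = unicodedata.normalize("NFKC", username).casefold()
    return unicodedata.normalize("NFKC", folded)


if __name__ == "__main__":
    print(comparison_key("NOVA") == comparison_key("nova"))
    print(comparison_key("e\u0301") == comparison_key("\u00e9"))
```

The key is meant only for uniqueness checks and lookups. The username as the person typed it can still be stored and shown unchanged.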

  57. Unicode Usernames
    1. UTS #31: Unicode Identifier and Pattern Syntax
    2. UTR #36: Unicode Security Considerations
    3. UTS #39: Unicode Security Mechanisms
    4. RFC 7613: Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords
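
Mixed-script detection, one of the username concerns covered in this part of the talk, is defined properly in UTS #39. Python's standard library does not expose the Unicode Script property, so the sketch below is only a rough heuristic that guesses a script from the first word of each letter's character name. The function names are hypothetical, and this is not the UTS #39 algorithm: for instance, it would flag legitimate combinations such as kanji with hiragana.

```python
import unicodedata


def script_guesses(text: str) -> set:
    """Guess scripts from character names, e.g. 'LATIN' or 'CYRILLIC'."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # Heuristic: the first word of the name usually names the script.
                scripts.add(name.split()[0])
    return scripts


def looks_mixed_script(username: str) -> bool:
    """True if the letters appear to come from more than one script."""
    return len(script_guesses(username)) > 1


if __name__ == "__main__":
    print(looks_mixed_script("nova"))         # all Latin letters
    print(looks_mixed_script("n\u043eva"))    # contains Cyrillic U+043E
```

A production system would more likely rely on a library that implements the UTS #39 restriction levels, and would use the result to warn or review rather than to reject names outright.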

  59. Preventing fake names
    is not worth discriminating
    against real users.

  60. Nova Patch
    @novapatch
    Shutterstock
