What Is a "Character Count"? The Basics of String, NSString, and Unicode

takasek
November 16, 2016

Slides presented at the 5th Startup iOS Study Group (第5回スタートアップiOS勉強会, http://connpass.com/event/43260/).

## References

What Is a Unicode Surrogate Pair? - ひだまりソケットは壊れない
http://vividcode.hatenablog.com/entry/unicode/surrogate-pair

Why Is Swift's String API So Hard? | Programming | POSTD
http://postd.cc/why-is-swifts-string-api-so-hard/

What Is Unicode? Its History and Evolution, and the Basics Developers Need to Know - Build Insider
http://www.buildinsider.net/language/csharpunicode/01

Unicode and String Handling in C# - Build Insider
http://www.buildinsider.net/language/csharpunicode/02

Why You Cannot Subscript a Swift 3 String View with Int – Swift・iOSコラム – Medium
https://medium.com/swift-column/swift-string-7147f3f496b1#.x1n3vrh1p

Transcript

  1. What Is a "Character Count"?
    The Basics of String, NSString, and Unicode
    by takasek
    2016/11/16
    5th Startup iOS Study Group (第5回スタートアップiOS勉強会)

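  4. Conclusion
    If you remember only one thing from today, make it this:
    - String and NSString are different things
    - Range and NSRange are different things too
    - Any string operation that touches the Cocoa world should always go through the NSString API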

  7. A correct range cannot be created!

  8. How do String and NSString differ?
    - struct vs. class?
    - Swift vs. Obj-C?

  9. How do String and NSString differ?
    - struct vs. class?
    - Swift vs. Obj-C?
    There is a more important difference than either of those.

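  10. How do String and NSString differ?
    - NSString
      - Internally holds its contents as UTF-16 (conceptually, an array of UTF-16 code units)
      - Provides an API for operating on that UTF-16 sequence
    - String
      - Hides its internal byte representation
      - For string operations, provides views of grapheme clusters and of the code units of each Unicode encoding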

  11. "???"

  12. View?
    Grapheme cluster??
    Code units of each Unicode encoding???

  13. "..."

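  14. What is Unicode? [1]
    An industry standard for character encoding that defines a coded character set, character encoding schemes, and more.
    One of its distinguishing features is that its character repertoire is a single large character set (the name "Uni" derives from this).
    [1] Source: https://ja.wikipedia.org/wiki/Unicode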

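  15. In other words?
    - Prepare a 21-bit integer space (the codespace)
    - Assign characters to integer values (encoding)
    - Each value in that space is called a code point; a character assigned to one is an encoded character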

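  17. Code points are written as U+ followed by a hexadecimal number
    - BMP (Basic Multilingual Plane) = U+0000 to U+FFFF
    - Supplementary planes = U+10000 to U+10FFFF (the SMP, Supplementary Multilingual Plane, is the first of them)
    However...
    How is the 21-bit codespace actually represented as a sequence of bytes?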
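
As a minimal sketch of the notation on this slide (not part of the original deck), the snippet below prints a few code points in U+ form and reports whether each lies in the BMP. It assumes Swift 4 or later for `Unicode.Scalar`; the helper names `uPlusNotation` and `isInBMP` are hypothetical.

```swift
// Illustrative helpers; the names are hypothetical, not from the deck.

/// Formats a scalar's code point in the conventional U+XXXX notation.
func uPlusNotation(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    // Pad to at least four hex digits, as in U+0041.
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}

/// True when the code point lies in the Basic Multilingual Plane (U+0000 to U+FFFF).
func isInBMP(_ scalar: Unicode.Scalar) -> Bool {
    return scalar.value <= 0xFFFF
}

let sampleScalars: [Unicode.Scalar] = ["\u{41}", "\u{3A9}", "\u{8A9E}", "\u{10384}"]
for scalar in sampleScalars {
    print(uPlusNotation(scalar), isInBMP(scalar) ? "BMP" : "supplementary plane")
}
```

Only the last sample, U+10384, falls outside the BMP, which is what makes it interesting for the encoding discussion that follows.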

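  18. Encoding
    In short, it comes down to how many bits to use for the code unit, the smallest unit used to represent a given code point.
    The same code point can have different byte-sequence representations. [2]
    [2] Figure source: http://www.unicode.org/versions/Unicode6.2.0/ch02.pdf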

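  19. UTF-32
    - 1 code unit = 32 bits (4 bytes)
    - 1 code point = 4 bytes, fixed length
    Note: the code unit (32 bits) is larger than the codespace (21 bits), so the encoding can be fixed length.

  20. UTF-16
    - 1 code unit = 16 bits (2 bytes)
    - 1 code point = 2 or 4 bytes, variable length
    Note: back in the days of Unicode 1.0.0, the assumption was that all the world's characters would fit within 16 bits, but...

  21. UTF-8
    - 1 code unit = 8 bits (1 byte)
    - 1 code point = 1 to 4 bytes, variable length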
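
The three encoding forms above can be compared directly in Swift. This is a rough sketch rather than anything from the deck: `codeUnitCounts(of:)` is a hypothetical helper, and it relies on the standard library's `utf8`, `utf16`, and `unicodeScalars` views (Swift 4 or later for `Unicode.Scalar`).

```swift
/// Hypothetical helper: the number of code units needed to encode one
/// code point in UTF-8, UTF-16, and UTF-32.
func codeUnitCounts(of scalar: Unicode.Scalar) -> (utf8: Int, utf16: Int, utf32: Int) {
    let s = String(Character(scalar))
    // One Unicode scalar is always exactly one UTF-32 code unit.
    return (utf8: s.utf8.count, utf16: s.utf16.count, utf32: s.unicodeScalars.count)
}

let encodingSamples: [Unicode.Scalar] = ["\u{41}", "\u{3A9}", "\u{8A9E}", "\u{10384}"]
for scalar in encodingSamples {
    let counts = codeUnitCounts(of: scalar)
    // Bytes = code units × bytes per code unit (1, 2, and 4 respectively).
    print("U+\(String(scalar.value, radix: 16, uppercase: true)):",
          "UTF-8 \(counts.utf8 * 1) bytes,",
          "UTF-16 \(counts.utf16 * 2) bytes,",
          "UTF-32 \(counts.utf32 * 4) bytes")
}
```

The UTF-32 size stays at 4 bytes for every sample, while the UTF-8 and UTF-16 sizes grow with the code point's value.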

  22. "Teacher!"

  23. "All this talk of code points is hard to follow."

  24. "Basically, one code point = one character, right?"

  25. "That's what you'd think, isn't it?"

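  26. Abstract character
    - A "character" as a human imagines it
    - A useful way to think of it: the unit by which the cursor keys move
    - Abstract characters and encoded characters have a many-to-many relationship

  27. On top of that, some code points are meant to be used in combination
    = grapheme cluster [3]
    [3] Source: What Is Unicode? Its History and Evolution, and the Basics Developers Need to Know - Build Insider
    http://www.buildinsider.net/language/csharpunicode/01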

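  28. Summary
    - How abstract characters are built
      - Code points live inside the 21-bit codespace
      - The encoded characters assigned to them are combined
      - to form an abstract character / grapheme cluster
    - What an encoding is
      - A way of packing a given code point (up to 21 bits)
      - into code units of a fixed size (8 / 16 / 32 bits)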
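
To make the first half of the summary concrete, here is a small sketch (assuming Swift 4 or later, where `String` is a collection of `Character`) that breaks a string into grapheme clusters and lists the code points inside each. The sample string and the helper name `scalarsPerCharacter(in:)` are illustrative choices, not taken from the deck.

```swift
/// Hypothetical helper: for each grapheme cluster (Character) in `text`,
/// returns the code points that make it up.
func scalarsPerCharacter(in text: String) -> [[UInt32]] {
    return text.map { character in
        String(character).unicodeScalars.map { $0.value }
    }
}

// "e" + COMBINING ACUTE ACCENT, followed by two regional indicator symbols.
let clusterSample = "e\u{301}\u{1F1EF}\u{1F1F5}"
let breakdown = scalarsPerCharacter(in: clusterSample)

for codePoints in breakdown {
    // Print each cluster's code points in hexadecimal.
    print(codePoints.map { "U+" + String($0, radix: 16, uppercase: true) })
}
```

Four code points end up as two grapheme clusters: the accented letter and the flag.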

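  29. Aside
    There is also yet another concept, the glyph
    - Information about how a character looks (position, size)
    - The point being that text does not always flow left to right
    - Glyphs are handled by Core Text, but that is out of scope for this talk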

  30. With that, the String API finally makes sense

  31. To repeat:
    Swift.String provides views of grapheme clusters and of the code units of each Unicode encoding
    - String.CharacterView
    - String.UnicodeScalarView
    - String.UTF8View
    - String.UTF16View

  32. let str = "\u{41}\u{3A9}\u{8A9E}\u{10384}"
    str.characters.count     // 4: grapheme clusters
    str.unicodeScalars.count // 4: UTF-32 code units
    str.utf16.count          // 5: UTF-16 code units
    str.utf8.count           // 10: UTF-8 code units
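
The slide is written against Swift 3. In Swift 4 and later, `String` is itself a collection of `Character` and the `characters` view is deprecated, so the same counts would be obtained roughly as follows. This is a sketch for readers on newer toolchains, not part of the original talk.

```swift
let str = "\u{41}\u{3A9}\u{8A9E}\u{10384}"

let characterCount = str.count                // grapheme clusters
let scalarCount = str.unicodeScalars.count    // code points (UTF-32 code units)
let utf16Count = str.utf16.count              // UTF-16 code units
let utf8Count = str.utf8.count                // UTF-8 code units

print(characterCount, scalarCount, utf16Count, utf8Count) // prints "4 4 5 10"
```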

  33. String.CharacterView is powerful
    Even combined characters compare as equal with ==
    let cafe1 = "Cafe\u{301}"
    let cafe2 = "Café"
    print(cafe1 == cafe2) // true
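
The equality shown here holds even though the two strings are stored differently. A brief sketch of that contrast, with both spellings written as explicit escapes so the difference is visible in the source (the variable names are illustrative):

```swift
let decomposed = "Cafe\u{301}"  // "e" followed by U+0301 COMBINING ACUTE ACCENT
let precomposed = "Caf\u{E9}"   // U+00E9, the precomposed "é"

// String equality uses Unicode canonical equivalence.
let equalAsStrings = decomposed == precomposed

// The underlying UTF-16 code units, which NSString works with, are not the same.
let sameUTF16 = decomposed.utf16.elementsEqual(precomposed.utf16)

print(equalAsStrings, sameUTF16, decomposed.utf16.count, precomposed.utf16.count)
// prints "true false 5 4"
```

So two strings that compare equal in Swift can still have different lengths once they are viewed as UTF-16.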

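  34. However, some things are not recognized as a single character
    "<emoji>".characters.count // 4 (the emoji in this literal did not survive in the transcript)

  35. Unicode's rules and characters keep growing.
    In other words, what counts as "one character" depends on the environment. [4]
    [4] What Is Unicode? Its History and Evolution, and the Basics Developers Need to Know - Build Insider
    http://www.buildinsider.net/language/csharpunicode/01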
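
One way to see this environment dependence is with an emoji sequence joined by ZERO WIDTH JOINER. The example below is an assumption chosen for illustration and is not necessarily the emoji used on the slide. The scalar and UTF-16 counts are fixed by the encoding, while the `Character` count depends on the Unicode segmentation rules built into the Swift runtime in use, so no expected value is given for it.

```swift
// A family emoji written as explicit scalars:
// man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

let familyScalarCount = family.unicodeScalars.count // 7: four people plus three joiners
let familyUTF16Count = family.utf16.count           // 11: 4 × 2 surrogates + 3 joiners
let familyCharacterCount = family.count             // depends on the runtime's grapheme rules

print(familyScalarCount, familyUTF16Count, familyCharacterCount)
```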

  36. Either way, Swift's String API is well thought out.

  37. So forget all about NSString and enjoy life with Swift.String!

  38. The End

  39. ...or not.

  41. NSAttributedString, UITextView:
    "Hi there!"

  42. "Put down roots in Foundation, and live together with NSString."
    © 1986 Studio Ghibli / Castle in the Sky (天空の城ラピュタ)

  43. "No matter how well designed a value type you create, you cannot live apart from Cocoa!"
    © 1986 Studio Ghibli / Castle in the Sky (天空の城ラピュタ)

  44. NSString is UTF-16
    Its counts differ from grapheme-cluster counts
    let flag = "\u{1F1EF}\u{1F1F5}" // "🇯🇵"
    flag.characters.count     // 1: one grapheme cluster
    flag.unicodeScalars.count // 2: two code points
    flag.utf16.count          // 4: 2 code points × 2 code units
    flag.utf8.count           // 8: 2 code points × 4 code units
    (flag as NSString).length // 4: matches utf16.count
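
To see where the length of 4 comes from, the sketch below reads the individual UTF-16 code units out of an `NSString` with `character(at:)` and counts the lead surrogates. It is an illustration only; the variable names are not from the deck, and `NSString(string:)` is used instead of `as NSString` so that the snippet does not depend on Objective-C bridging.

```swift
import Foundation

let flag = "\u{1F1EF}\u{1F1F5}" // regional indicators J and P
let nsFlag = NSString(string: flag)

// character(at:) returns one UTF-16 code unit (unichar), not a whole character.
let units: [UInt16] = (0..<nsFlag.length).map { nsFlag.character(at: $0) }
let hexUnits = units.map { String($0, radix: 16, uppercase: true) }
print(hexUnits) // prints ["D83C", "DDEF", "D83C", "DDF5"]

// Each supplementary-plane code point is stored as a surrogate pair.
let leadSurrogateCount = units.filter { UTF16.isLeadSurrogate($0) }.count
print(leadSurrogateCount) // prints "2"
```

Indexing an `NSString` in the middle of such a pair yields half a code point, which is why ranges computed in one world cannot be reused in the other without conversion.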

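  45. So should we just use String.utf16?
    Not so fast...
    - Even granting that NSString is compatible with String.utf16, converting Range to NSRange is painful
    - Converting String.Index to Int is a hassle... [5]
    - More to the point, that is not a place where I want to risk making a mistake
    [5] For the reasoning behind this deliberate design, see:
    Why You Cannot Subscript a Swift 3 String View with Int – Swift・iOSコラム – Medium
    https://medium.com/swift-column/swift-string-7147f3f496b1#.x1n3vrh1p

  46. In that case, it is better to keep the processing inside the Cocoa world from the start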


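  47. // Option 1: search with String, then convert Range to NSRange by hand.
    // Caution: this counts Characters, while NSRange counts UTF-16 code units,
    // so the result is only correct when every Character is one UTF-16 code unit.
    let range = str.range(of: target)!
    let convertedRange = NSRange(
        location: str.distance(from: str.startIndex,
                               to: range.lowerBound),
        length: target.characters.count
    )

    // Option 2: step into the NSString world once...
    let nsStr = NSString(string: str)
    let nsRange = nsStr.range(of: target)
    // ...and no Range is involved, so no conversion is needed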
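
For readers on Swift 4 or later, Foundation also offers `NSRange(_:in:)` and `Range(_:in:)`, which do the UTF-16 offset arithmetic for you. The following is a hedged sketch of that route, not the approach shown in the talk; `utf16Range(of:in:)` is a hypothetical helper and the sample string is an assumption chosen so that Character offsets and UTF-16 offsets differ.

```swift
import Foundation

/// Hypothetical helper: finds `target` in `str` and returns the match as an
/// NSRange measured in UTF-16 code units, or nil when there is no match.
func utf16Range(of target: String, in str: String) -> NSRange? {
    guard let range = str.range(of: target) else { return nil }
    return NSRange(range, in: str)
}

let str = "\u{1F1EF}\u{1F1F5} Swift" // a flag (4 UTF-16 code units), a space, then "Swift"
let target = "Swift"

if let nsRange = utf16Range(of: target, in: str) {
    print(nsRange.location, nsRange.length) // prints "5 5"

    // Converting back yields a Range<String.Index> that String can use again.
    if let range = Range(nsRange, in: str) {
        print(str[range]) // prints "Swift"
    }
}
```

Counting Characters instead would put the match at offset 2, which is the kind of mismatch slide 45 warns about.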

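  51. Once more, today's conclusion
    - String and NSString are different things
    - Range and NSRange are different things too
    - Any string operation that touches the Cocoa world should always go through the NSString API
    - Also, be on your guard whenever a spec says something like "count the number of characters and then..."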
