What Is a "Character Count"? The Basics of String, NSString, and Unicode

takasek
November 16, 2016

Slides presented at the 5th Startup iOS Study Group (第5回スタートアップiOS勉強会, http://connpass.com/event/43260/ ).

## References

What is a Unicode surrogate pair? - ひだまりソケットは壊れない
http://vividcode.hatenablog.com/entry/unicode/surrogate-pair

Why is Swift's String API so hard? | Programming | POSTD
http://postd.cc/why-is-swifts-string-api-so-hard/

What is Unicode? Its history and evolution, and the basics for developers - Build Insider
http://www.buildinsider.net/language/csharpunicode/01

Unicode and string handling in C# - Build Insider
http://www.buildinsider.net/language/csharpunicode/02

Why you cannot subscript a String view with Int in Swift 3 - Swift・iOSコラム - Medium
https://medium.com/swift-column/swift-string-7147f3f496b1#.x1n3vrh1p


Transcript

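  1. What Is a "Character Count"? The Basics of String, NSString, and Unicode. By takasek, 2016/11/16, 5th Startup iOS Study Group (第5回スタートアップiOS勉強会)

  2. (no text on this slide)

  3. (no text on this slide)

  4. Conclusion: this is the one thing to remember from today
      - String and NSString are different things
      - Range<String.Index> and NSRange are different things too
      - Any string manipulation that involves the Cocoa world should always go through the NSString API

  5. (no text on this slide)

  6. (no text on this slide)

  7. I can't build a correct range!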

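  8. String vs. NSString: what is the difference?
      - struct vs. class?
      - Swift vs. Obj-C?

  9. String vs. NSString: what is the difference?
      - struct vs. class?
      - Swift vs. Obj-C?
      - There is a more important difference than either of those

  10. String vs. NSString: what is the difference?
      - NSString
          - Internally holds its contents as a UTF-16 byte sequence
          - Provides an API for manipulating that UTF-16 byte sequence
      - String
          - Hides its internal byte sequence
          - For string manipulation, provides views over grapheme clusters and over the code units of the various Unicode encodings

  11. "❓❓❓"

  12. Views? Grapheme clusters?? Code units of the various Unicode encodings???

  13. "..."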

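  14. What is Unicode? "An industry standard for character codes that defines coded character sets, character encoding schemes, and more. One of its distinguishing features is that its character set is a single, large-scale character set (the name 'Uni' comes from this)." Source: https://ja.wikipedia.org/wiki/Unicode

  15. In other words?
      - Set aside a 21-bit space of integer values (the code space)
      - Assign characters to integer values (encoding)
      - A value that has been assigned is called a code point

  16. (no text on this slide)

  17. Code points and planes
      - A code point is written as U+ followed by hexadecimal digits
      - BMP (Basic Multilingual Plane) = U+0000 to U+FFFF
      - Supplementary planes (the SMP and beyond) = U+10000 to U+10FFFF
      - But... how do you represent a 21-bit code space as an actual byte sequence?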
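
As a rough illustration of the code point notation above, the following sketch lists the code points of a short string in U+ form and checks which of them fall inside the BMP. It assumes Swift 4 or later, and `formatCodePoint` is a hypothetical helper written for this example, not a standard library function.

```swift
// Sample string: A (U+0041), Ω (U+03A9), 語 (U+8A9E), and U+10384 from a supplementary plane.
let sample = "\u{41}\u{3A9}\u{8A9E}\u{10384}"

// Hypothetical helper: formats a scalar in the U+XXXX convention (at least four hex digits).
func formatCodePoint(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}

// The unicodeScalars view walks the string one code point at a time.
let codePoints = sample.unicodeScalars.map(formatCodePoint)
print(codePoints) // ["U+0041", "U+03A9", "U+8A9E", "U+10384"]

// A code point belongs to the BMP when its value is at most 0xFFFF.
let isInBMP = sample.unicodeScalars.map { $0.value <= 0xFFFF }
print(isInBMP) // [true, true, true, false]
```
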
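  18. Encoding: in short, it is the question of how many bits to use for the smallest unit (the code unit) that represents a given code point. Even for the same code point, the byte-sequence representation varies. Figure source: http://www.unicode.org/versions/Unicode6.2.0/ch02.pdf

  19. UTF-32
      - 1 code unit = 32 bits (4 bytes)
      - 1 code point = fixed length of 4 bytes
      - Note: the code unit (32 bits) is larger than the code space (21 bits), so a fixed length is possible

  20. UTF-16
      - 1 code unit = 16 bits (2 bytes)
      - 1 code point = variable length, 2 bytes or 4 bytes
      - Note: back in the Unicode 1.0.0 days, all the world's characters were expected to fit in this range, but...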
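
The "2 bytes or 4 bytes" rule comes from surrogate pairs: a code point above U+FFFF is split across two 16-bit code units. Below is a minimal sketch of that arithmetic, compared with what Swift's own `utf16` view produces. `surrogatePair(for:)` is a hypothetical helper written for this illustration.

```swift
// Hypothetical helper: splits a supplementary-plane code point into a UTF-16 surrogate pair.
// Returns nil for BMP code points, which fit in a single code unit.
func surrogatePair(for codePoint: UInt32) -> (high: UInt16, low: UInt16)? {
    guard codePoint > 0xFFFF, codePoint <= 0x10FFFF else { return nil }
    let offset = codePoint - 0x10000             // 20 significant bits remain
    let high = UInt16(0xD800 + (offset >> 10))   // upper 10 bits
    let low = UInt16(0xDC00 + (offset & 0x3FF))  // lower 10 bits
    return (high, low)
}

let pair = surrogatePair(for: 0x10384)
let fromStandardLibrary = Array("\u{10384}".utf16)

if let pair = pair {
    print(String(pair.high, radix: 16), String(pair.low, radix: 16)) // d800 df84
}
print(fromStandardLibrary.map { String($0, radix: 16) })             // ["d800", "df84"]
```
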
  21. UTF-8
      - 1 code unit = 8 bits (1 byte)
      - 1 code point = variable length, 1 to 4 bytes
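
To compare the three encodings side by side, this sketch counts the code units that each encoding needs for a single code point. The four sample code points are chosen for illustration so that each one falls in a different UTF-8 length class.

```swift
// One code point per string: U+0041, U+03A9, U+8A9E, U+10384.
let samples = ["\u{41}", "\u{3A9}", "\u{8A9E}", "\u{10384}"]

// UTF-8: 1 to 4 code units (bytes) per code point.
let utf8Lengths = samples.map { $0.utf8.count }
print(utf8Lengths)  // [1, 2, 3, 4]

// UTF-16: 1 code unit inside the BMP, 2 (a surrogate pair) outside it.
let utf16Lengths = samples.map { $0.utf16.count }
print(utf16Lengths) // [1, 1, 1, 2]

// UTF-32: always exactly 1 code unit per code point.
let utf32Lengths = samples.map { $0.unicodeScalars.count }
print(utf32Lengths) // [1, 1, 1, 1]
```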

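  22. "Teacher!"

  23. "All this talk about code points is hard to follow."

  24. "Basically, one code point = one character, right?"

  25. "That's what you'd think, isn't it?"

  26. Abstract character
      - The "character" that a human has in mind
      - A good way to think of it: the unit by which a cursor key moves
      - Abstract characters and coded characters have a many-to-many relationship

  27. On top of that, some code points are meant to be used in combination = grapheme cluster. Source: What is Unicode? Its history and evolution, and the basics for developers http://www.buildinsider.net/language/csharpunicode/01

  28. Summary
      - How an abstract character is made
          - Code points live inside the (21-bit) code space
          - The coded characters assigned to them are combined
          - The result is an abstract character / grapheme cluster
      - What an encoding is
          - A way of packing a given code point (at most 21 bits)
          - into code units of a prescribed size (8 bits / 16 bits / 32 bits)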
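
The whole chain in the summary (abstract character, code points, code units) can be traced for one character. This is a minimal sketch assuming Swift 4 or later, where `String.count` counts grapheme clusters. The Hangul syllable used here is only an illustrative choice.

```swift
// One abstract character, "한", written two ways.
let precomposed = "\u{D55C}"                 // a single code point
let decomposed = "\u{1112}\u{1161}\u{11AB}"  // three conjoining jamo code points

// Both spellings are a single grapheme cluster.
print(precomposed.count, decomposed.count)                               // 1 1

// The number of code points differs...
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 1 3

// ...and so does the number of code units in each encoding.
print(precomposed.utf16.count, decomposed.utf16.count)                   // 1 3
print(precomposed.utf8.count, decomposed.utf8.count)                     // 3 9
```
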
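  29. Aside: there is yet another concept called a glyph
      - Information about how a character looks (position, size)
      - The point being that text does not always flow from left to right
      - Managed by CoreText... but omitted this time

  30. In any case, now we can understand String's API

  31. To repeat: Swift.String provides views over grapheme clusters and over the code units of the various Unicode encodings
      - String.CharacterView
      - String.UnicodeScalarView
      - String.UTF8View
      - String.UTF16View

  32. Counting the same string through each view

      ```swift
      // Swift 3 API as shown in the talk; in Swift 4 and later, str.count replaces str.characters.count.
      let str = "\u{41}\u{3A9}\u{8A9E}\u{10384}"
      str.characters.count     // 4: grapheme clusters
      str.unicodeScalars.count // 4: code units in UTF-32
      str.utf16.count          // 5: code units in UTF-16
      str.utf8.count           // 10: code units in UTF-8
      ```

  33. String.CharacterView is strong: it even treats combined characters as equal

      ```swift
      let cafe1 = "Cafe\u{301}"
      let cafe2 = "Café"
      print(cafe1 == cafe2) // true
      ```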
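  34. However, depending on the character, some things are not recognized as a single character

      ```swift
      // <emoji> stands for an emoji that is missing from this transcript.
      "<emoji>".characters.count // 4
      ```

  35. Unicode's rules and characters keep growing. In other words, what counts as "one character" depends on the environment. Source: What is Unicode? Its history and evolution, and the basics for developers - Build Insider http://www.buildinsider.net/language/csharpunicode/01

  36. Either way, Swift's String API is well thought out.

  37. So forget all about NSString and enjoy your time with Swift.String!

  38. The End

  39. ...or not.

  40. (no text on this slide)

  41. NSAttributedString and UITextView: "Hi there!"

  42. "Put down roots in Foundation, and live together with NSString" © 1986 Studio Ghibli / 天空の城ラピュタ (Castle in the Sky)

  43. "No matter how well designed a value type you create, you cannot live apart from Cocoa!" © 1986 Studio Ghibli / 天空の城ラピュタ (Castle in the Sky)

  44. NSString is UTF-16, so its count differs from the grapheme cluster count

      ```swift
      let flag = "\u{1F1EF}\u{1F1F5}" // "🇯🇵"
      flag.characters.count      // 1: one grapheme cluster
      flag.unicodeScalars.count  // 2: two code points
      flag.utf16.count           // 4: 2 code points × 2 code units
      flag.utf8.count            // 8: 2 code points × 4 code units
      (flag as NSString).length  // 4: matches utf16.count
      ```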
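
Because NSString is measured in UTF-16 code units, indexing into it can land on half of a surrogate pair. A minimal sketch of that behavior, assuming Swift 4 or later and using `NSString(string:)` for the conversion:

```swift
import Foundation

let flag = "\u{1F1EF}\u{1F1F5}" // "🇯🇵"
let nsFlag = NSString(string: flag)

// NSString measures its length in UTF-16 code units.
print(nsFlag.length == flag.utf16.count) // true

// Indexing an NSString yields one UTF-16 code unit, here the high surrogate of U+1F1EF.
let firstUnit = nsFlag.character(at: 0)
print(String(firstUnit, radix: 16, uppercase: true)) // D83C

// Swift's String sees the same text as a single grapheme cluster.
print(flag.count) // 1
```
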
  45. So should we just use String.utf16? Not so sure...
      - Even if it is compatible with String.utf16, converting Range<String.Index> → NSRange is painful
      - Converting String.Index to Int is a hassle...
      - More to the point, that is not a place where I want to make a mistake
      - Reference on why the API is deliberately designed this way: Why you cannot subscript a String view with Int in Swift 3 - Swift・iOSコラム - Medium https://medium.com/swift-column/swift-string-7147f3f496b1#.x1n3vrh1p
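
Later Swift releases (Swift 4 and after, so not available when this talk was given) added Foundation initializers for exactly this conversion: `NSRange(_:in:)` and `Range(_:in:)`. The sketch below uses a made-up sample string and is offered as an illustration rather than as the approach the talk recommends.

```swift
import Foundation

let str = "I love \u{1F1EF}\u{1F1F5} sushi" // the flag emoji occupies 4 UTF-16 code units
let target = "sushi"

guard let range = str.range(of: target) else {
    fatalError("target not found in sample string")
}

// Range<String.Index> -> NSRange, measured in UTF-16 code units.
let nsRange = NSRange(range, in: str)
print(nsRange.location, nsRange.length) // 12 5

// NSRange -> Range<String.Index>; the result is nil if the NSRange does not fit the string.
let back = Range(nsRange, in: str)
if let back = back {
    print(str[back]) // sushi
}
```
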
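  46. In that case, it is better to keep the processing inside the Cocoa world from the start

  47. ❌ Converting by hand

      ```swift
      let range = str.range(of: target)!
      // Convert Range<String.Index> to NSRange.
      // This counts grapheme clusters, whereas NSRange counts UTF-16 code units.
      let nsRange = NSRange(
          location: str.distance(from: str.startIndex, to: range.lowerBound),
          length: target.characters.count
      )
      ```

      ⭕ Staying in NSString

      ```swift
      let nsStr = NSString(string: str)     // once you are in the NSString world...
      let nsRange = nsStr.range(of: target) // ...no Range<String.Index> is involved, so no conversion is needed
      ```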
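
To see why the first approach is marked ❌, the two computations can be run on a string that contains an emoji. This is a minimal sketch with a made-up sample string, assuming Swift 4 or later (so `target.count` stands in for `target.characters.count`).

```swift
import Foundation

let str = "\u{1F1EF}\u{1F1F5}Swift" // a flag emoji followed by "Swift"
let target = "Swift"

// ❌ Offsets counted in grapheme clusters.
let range = str.range(of: target)! // force-unwrapped only because the sample is known to match
let naive = NSRange(
    location: str.distance(from: str.startIndex, to: range.lowerBound),
    length: target.count
)
print(naive.location, naive.length) // 1 5

// ⭕ Staying inside NSString yields UTF-16 offsets directly.
let viaNSString = NSString(string: str).range(of: target)
print(viaNSString.location, viaNSString.length) // 4 5

// The naive location lands inside the flag emoji instead of at the start of "Swift".
print(naive.location == viaNSString.location) // false
```
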
  48. (no text on this slide)

  49. (no text on this slide)

  50. (no text on this slide)

  51. Once more, today's conclusion
      - String and NSString are different things
      - Range<String.Index> and NSRange are different things too
      - Any string manipulation that involves the Cocoa world should always go through the NSString API
      - Also, brace yourself whenever a spec says something like "count the number of characters and then..."

  52. References
      - What is a Unicode surrogate pair? - ひだまりソケットは壊れない http://vividcode.hatenablog.com/entry/unicode/surrogate-pair
      - Why is Swift's String API so hard? | Programming | POSTD http://postd.cc/why-is-swifts-string-api-so-hard/
      - What is Unicode? Its history and evolution, and the basics for developers - Build Insider http://www.buildinsider.net/language/csharpunicode/01
      - Unicode and string handling in C# - Build Insider http://www.buildinsider.net/language/csharpunicode/02
      - Why you cannot subscript a String view with Int in Swift 3 - Swift・iOSコラム - Medium https://medium.com/swift-column/swift-string-7147f3f496b1#.x1n3vrh1p