What is a "character count," anyway? ~The basics of String, NSString, and Unicode~

takasek
November 16, 2016

Slides from my talk at the 5th Startup iOS Study Meetup (第5回スタートアップiOS勉強会, http://connpass.com/event/43260/).

## Reference links

Unicode のサロゲートペアとは何か (What is a Unicode surrogate pair?) - ひだまりソケットは壊れない
http://vividcode.hatenablog.com/entry/unicode/surrogate-pair

なぜSwiftの文字列APIは難しいのか (Why is Swift's string API so hard?) - POSTD
http://postd.cc/why-is-swifts-string-api-so-hard/

Unicodeとは? その歴史と進化、開発者向け基礎知識 (What is Unicode? Its history, evolution, and basics for developers) - Build Insider
http://www.buildinsider.net/language/csharpunicode/01

Unicodeと、C#での文字列の扱い (Unicode and string handling in C#) - Build Insider
http://www.buildinsider.net/language/csharpunicode/02

Swift 3のStringのViewに対して、Intでsubscript出来ない理由 (Why you can't subscript Swift 3's String views with Int) - Swift・iOSコラム (Swift/iOS Column) - Medium
https://medium.com/swift-column/swift-string-7147f3f496b1#.x1n3vrh1p

Transcript

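  3. Conclusion: today, just remember this much

     - `String` and `NSString` are different things
     - `Range<String.Index>` and `NSRange` are different things too
     - For any string operation that touches the Cocoa world, always go through `NSString`'s API
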
  6. What's the difference between `String` and `NSString`?

     - `struct` vs. `class`?
     - Swift vs. Obj-C?

     There is a difference more important than either of those.
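  7. What's the difference between `String` and `NSString`?

     - `NSString`
       - Internally holds its contents as a UTF-16 byte sequence
       - Provides APIs for manipulating that UTF-16 byte sequence
     - `String`
       - Hides its internal byte sequence
       - For string manipulation, provides views over grapheme clusters and over the code units of each Unicode encoding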
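  9. Code points are written as U+ followed by a hexadecimal number.

     - BMP (Basic Multilingual Plane) = U+0000 to U+FFFF
     - Supplementary planes, starting with the SMP (Supplementary Multilingual Plane) = U+10000 to U+10FFFF

     But... how do you actually represent this 21-bit code space as a byte sequence?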
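  10. UTF-16

      - 1 code unit = 16 bits (2 bytes)
      - 1 code point = 2 or 4 bytes (variable length; code points above U+FFFF take two code units, a surrogate pair)

      Note: back in Unicode 1.0.0, the assumption was that every character in the world would fit within this 16-bit range, but...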
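  11. Summary

      - How abstract characters are made: code points sit in the 21-bit code space, and the assigned encoded characters are combined to form abstract characters / grapheme clusters
      - What an encoding is: a method for packing a given code point (up to 21 bits) into code units of a fixed size (8, 16, or 32 bits)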
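  12. `let str = "\u{41}\u{3A9}\u{8A9E}\u{10384}"`

      - `str.characters.count // 4` ← grapheme clusters
      - `str.unicodeScalars.count // 4` ← code units in UTF-32
      - `str.utf16.count // 5` ← code units in UTF-16
      - `str.utf8.count // 10` ← code units in UTF-8

      (Swift 3 syntax; in Swift 4 and later, `str.characters.count` is written `str.count`.)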
  14. `NSString` is UTF-16, so its count differs from the grapheme-cluster count

      - `let flag = "\u{1F1EF}\u{1F1F5}" // "🇯🇵"`
      - `flag.characters.count // 1` ← 1 grapheme cluster
      - `flag.unicodeScalars.count // 2` ← 2 code points
      - `flag.utf16.count // 4` ← 2 code points × 2 code units
      - `flag.utf8.count // 8` ← 2 code points × 4 code units
      - `(flag as NSString).length // 4` ← matches `utf16.count`
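  15. So should we just use `String.utf16`? I'm not so sure...

      - Even if `NSString` is compatible with `String.utf16`, converting `Range<String.Index>` → `NSRange` is painful
      - Converting a `String.Index` to an `Int` is tedious, too
      - More to the point, that conversion is exactly where I don't want to make mistakes

      Reference article on why Swift is deliberately designed this way: Swift 3のStringのViewに対して、Intでsubscript出来ない理由 (Why you can't subscript Swift 3's String views with Int) - Swift・iOSコラム - Medium, https://medium.com/swift-column/swift-string-7147f3f496b1#.x1n3vrh1p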
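  16. ❌

      - `let range = str.range(of: target)!`
      - Convert `Range<String.Index>` to `NSRange`: `let nsRange = NSRange(location: str.distance(from: str.startIndex, to: range.lowerBound), length: target.characters.count)`

      ⭕

      - `let nsStr = NSString(string: str) // Once you're in the NSString world...`
      - `let nsRange = nsStr.range(of: target) // No Range<String.Index> in between, so no conversion is needed`

      The ❌ version goes wrong because `distance(from:to:)` and `characters.count` count grapheme clusters, while `NSRange` counts UTF-16 code units; the two disagree whenever the text up to the end of the match contains a character that takes more than one UTF-16 code unit, such as an emoji.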
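  19. Today's conclusion, once more

      - `String` and `NSString` are different things
      - `Range<String.Index>` and `NSRange` are different things too
      - For any string operation that touches the Cocoa world, always go through `NSString`'s API
      - Also: if a spec says something like "count the number of characters and then...", be on your guard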
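The UTF-16 slide says a code point above U+FFFF takes 4 bytes. As a sketch of why, assuming Swift 5 and the standard surrogate-pair formula (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves), here is a hypothetical helper `surrogatePair(for:)`. It is an illustration only, not a standard-library API, and it is cross-checked against `String.utf16`.

```swift
// Hypothetical helper (not a standard API): split a code point above the BMP
// into a UTF-16 surrogate pair using the standard UTF-16 formula.
func surrogatePair(for codePoint: UInt32) -> (high: UInt16, low: UInt16)? {
    // BMP code points (and values beyond U+10FFFF) don't form a pair.
    guard (0x10000...0x10FFFF).contains(codePoint) else { return nil }
    let offset = codePoint - 0x10000               // a 20-bit value
    let high = UInt16(0xD800 + (offset >> 10))     // upper 10 bits
    let low = UInt16(0xDC00 + (offset & 0x3FF))    // lower 10 bits
    return (high, low)
}

// U+10384, the last code point in the slide's `str` example
if let pair = surrogatePair(for: 0x10384) {
    print(String(pair.high, radix: 16), String(pair.low, radix: 16))  // d800 df84
}

// Cross-check against the standard library's UTF-16 view.
let supplementary = Character(Unicode.Scalar(UInt32(0x10384))!)
print(String(supplementary).utf16.map { String($0, radix: 16) })  // ["d800", "df84"]
```

This is why `str.utf16.count` is 5 for four code points: only U+10384 needs a second code unit.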
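The summary slide defines an encoding as a way to pack a code point of up to 21 bits into fixed-size code units. The hypothetical `utf8Units(of:)` below sketches that for UTF-8, assuming Swift 5. It reimplements the standard UTF-8 bit layout by hand only so it can be compared with `String.utf8`, which is what real code should use.

```swift
// Hypothetical hand-written UTF-8 encoder for one code point, showing how up
// to 21 bits are packed into 8-bit code units. Taking a Unicode.Scalar
// guarantees a valid code point (no surrogates, nothing above U+10FFFF).
func utf8Units(of scalar: Unicode.Scalar) -> [UInt8] {
    let v = scalar.value
    switch v {
    case 0..<0x80:          // up to 7 bits  -> 0xxxxxxx
        return [UInt8(v)]
    case 0x80..<0x800:      // up to 11 bits -> 110xxxxx 10xxxxxx
        return [UInt8(0xC0 | (v >> 6)),
                UInt8(0x80 | (v & 0x3F))]
    case 0x800..<0x10000:   // up to 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
        return [UInt8(0xE0 | (v >> 12)),
                UInt8(0x80 | ((v >> 6) & 0x3F)),
                UInt8(0x80 | (v & 0x3F))]
    default:                // up to 21 bits -> 11110xxx + three 10xxxxxx
        return [UInt8(0xF0 | (v >> 18)),
                UInt8(0x80 | ((v >> 12) & 0x3F)),
                UInt8(0x80 | ((v >> 6) & 0x3F)),
                UInt8(0x80 | (v & 0x3F))]
    }
}

let sample = "\u{41}\u{3A9}\u{8A9E}\u{10384}"   // the slide's `str`
let manual = sample.unicodeScalars.flatMap { utf8Units(of: $0) }
print(manual.count)                  // 10
print(manual == Array(sample.utf8))  // true
```

The four code points need 1, 2, 3, and 4 bytes respectively, which is where the slide's `str.utf8.count // 10` comes from.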
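The `str` example in the transcript uses Swift 3's `characters`, which no longer compiles in Swift 5. A sketch of the same counts in Swift 4 and later, assuming only the standard library, with a per-code-point breakdown of where the totals come from:

```swift
// Swift 4+ form of the slide's Swift 3 snippet (`str.characters.count`
// became `str.count`), plus a per-code-point breakdown of the totals.
let str = "\u{41}\u{3A9}\u{8A9E}\u{10384}"

print(str.count, str.unicodeScalars.count, str.utf16.count, str.utf8.count)
// 4 4 5 10

for scalar in str.unicodeScalars {
    let single = String(Character(scalar))
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex): UTF-16 units = \(single.utf16.count), UTF-8 units = \(single.utf8.count)")
}
// U+41: UTF-16 units = 1, UTF-8 units = 1
// U+3A9: UTF-16 units = 1, UTF-8 units = 2
// U+8A9E: UTF-16 units = 1, UTF-8 units = 3
// U+10384: UTF-16 units = 2, UTF-8 units = 4
```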
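To see the flag example's `NSString` behavior directly, here is a sketch assuming Swift 5 with Foundation available. It builds the `NSString` with `NSString(string:)` rather than `as NSString`, a choice made so the sketch does not depend on platform bridging support. It shows that `NSString` both counts and indexes by UTF-16 code unit.

```swift
import Foundation

// Japanese flag: two regional-indicator code points, U+1F1EF and U+1F1F5.
let flag = "\u{1F1EF}\u{1F1F5}"
let nsFlag = NSString(string: flag)

print(flag.count)                 // 1  grapheme cluster (Swift 4+ spelling)
print(flag.unicodeScalars.count)  // 2  code points
print(flag.utf16.count)           // 4  UTF-16 code units
print(nsFlag.length)              // 4  NSString counts UTF-16 code units

// NSString also indexes by UTF-16 code unit: character(at: 0) is a lone
// high surrogate (0xD83C), not a displayable character.
print(String(nsFlag.character(at: 0), radix: 16))  // d83c
```

Indexing an `NSString` by an integer can therefore land in the middle of a surrogate pair, which `String`'s opaque `String.Index` is designed to prevent.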
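The ❌/⭕ slide can be checked with a string whose prefix contains a multi-unit character. This is a sketch assuming Swift 5 and Foundation; the sample text and variable names are made up for illustration. The last conversion uses `NSRange(_:in:)`, a Foundation initializer added in Swift 4, after this talk was given.

```swift
import Foundation

let text = "\u{1F1EF}\u{1F1F5} Swift"   // flag = 1 Character but 4 UTF-16 units
let target = "Swift"

// ❌ The slide's first approach (Swift 4+ spelling): Character distances are
// not UTF-16 offsets, so `location` comes out too small here.
let range = text.range(of: target)!
let wrong = NSRange(location: text.distance(from: text.startIndex, to: range.lowerBound),
                    length: target.count)

// ⭕ The slide's second approach: search in NSString's (UTF-16) world.
let viaNSString = NSString(string: text).range(of: target)

// Since Swift 4, Foundation can also convert the Swift range directly.
let converted = NSRange(range, in: text)

print(wrong.location, viaNSString.location, converted.location)  // 2 5 5
```

With the flag in front, the ❌ version reports location 2 while the real UTF-16 offset is 5, so any Cocoa API given that `NSRange` would operate on the wrong text.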