
Fully Understanding How Swift's String Counts Characters

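Swift is exceptionally good at handling Unicode. It computes the correct number of characters even for strings that contain emoji.
The flip side is that Unicode's complexity sometimes gets in the way of intuitive operations. For example, writing string[2] does not give you the third character. In fact it does not even compile, because a String cannot be subscripted with an Int.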
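Both claims above can be checked with a short sketch, assuming Swift 5. The string literal is an illustrative choice and does not come from the talk. It holds a flag emoji, which is two Unicode scalars, followed by "abc". The sketch compares `count` with the lower-level views, then reaches the third character through a `String.Index` instead of an `Int`.

```swift
// Illustrative string: 🇯🇵 (U+1F1EF U+1F1F5) followed by "abc".
let text = "\u{1F1EF}\u{1F1F5}abc"

// count counts Characters (extended grapheme clusters), not scalars or code units.
let characterCount = text.count              // 4
let scalarCount = text.unicodeScalars.count  // 5
let utf16Count = text.utf16.count            // 7 (each flag scalar needs two UTF-16 units)
let utf8Count = text.utf8.count              // 11 (each flag scalar needs four UTF-8 bytes)

// text[2] does not compile; step from startIndex with a String.Index instead.
let thirdIndex = text.index(text.startIndex, offsetBy: 2)
let third = text[thirdIndex]                 // "b"

print(characterCount, scalarCount, utf16Count, utf8Count, third)
```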

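Swift's string handling does have these complicated parts. Still, accepting that complexity and understanding it properly is not as hard as it might seem.

In this talk, I explain how Swift counts characters and the thinking behind it, keeping its relationship to Unicode in view throughout.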

Takanori Hirobe

September 05, 2019


Transcript

  1. Collection
    let array = [10, 20, 30, 40, 50]
    let string = "abcあいうえお"
    let first = array[0]
  2. let first = array[array.startIndex]  (or better)
  4. Int: the type of the array's elements
  5. array: index access: Int, element type: Int
  6. let first = string[0]  (does not compile: a String cannot be subscripted with an Int)
  8. let first = string[string.startIndex]
  9. For string, the index is a String.Index and the element is a Character.
  10. Getting the fifth character:
    let startIdx = string.startIndex
    let index = string.index(startIdx, offsetBy: 4)
    let fifthChar = string[index]
  12. array: index access: Int, element type: Int
    string: index access: String.Index, element type: Character
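The slides reach the fifth character with `index(_:offsetBy:)`. In real code you usually also need to handle positions that do not exist. The following is a minimal sketch, assuming Swift 5, that uses the standard `index(_:offsetBy:limitedBy:)` to return `nil` instead of trapping. `character(at:)` is a hypothetical helper written for this example and is not part of the standard library.

```swift
// Hypothetical helper (not in the standard library): the n-th Character
// (0-based) of a string, or nil when n is out of range.
extension String {
    func character(at n: Int) -> Character? {
        guard n >= 0,
              let i = index(startIndex, offsetBy: n, limitedBy: endIndex),
              i != endIndex else { return nil }
        return self[i]
    }
}

let array = [10, 20, 30, 40, 50]
let string = "abcあいうえお"

// Array: the index is an Int and the element is an Int.
let firstNumber: Int = array[0]
// String: the index is a String.Index and the element is a Character.
let firstChar: Character = string[string.startIndex]
// The fifth character sits at offset 4 from startIndex.
let fifthChar = string.character(at: 4)      // "い"
let outOfRange = string.character(at: 8)     // nil: only 8 characters (offsets 0...7)
```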
  13. Collection: change 10 to 100 in the array, and change a to X in the string.
    var array = [10, 20, 30, 40, 50]
    var string = "abcあいうえお"
  15. array[0] = 100  →  array becomes [100, 20, 30, 40, 50]
  16. This works.
  17. Goal: string becomes "Xbcあいうえお"
  18. string[string.startIndex] = "X"
  19. This does not compile: String's subscript is get-only.
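Subscript assignment is rejected for String, but the same edit is still possible. The following is a minimal sketch, assuming Swift 5. It replaces the range that covers the first Character using `replaceSubrange(_:with:)`, which String provides as a RangeReplaceableCollection. This is one possible alternative, not necessarily the approach the talk recommends.

```swift
var array = [10, 20, 30, 40, 50]
var string = "abcあいうえお"

// Array is a MutableCollection, so assigning through the subscript works.
array[0] = 100

// string[string.startIndex] = "X" does not compile (the subscript is get-only).
// Instead, replace the range that spans exactly the first Character.
let firstRange = string.startIndex..<string.index(after: string.startIndex)
string.replaceSubrange(firstRange, with: "X")

print(array)   // [100, 20, 30, 40, 50]
print(string)  // Xbcあいうえお
```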
  20. Collection
    array = [10, 20, 30, 40, 50]: index access: Int, element type: Int, mutation via subscript: OK
    string = "abcあいうえお": index access: String.Index, element type: Character, mutation via subscript: NG
  23. array.count is 5; string.count is 8.
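Array and String differ in their index and element types, yet both are Collections. Code written against the Collection protocol therefore works for either one. The following is a small sketch, assuming Swift 5. `secondElement(of:)` is a hypothetical generic helper used only for illustration.

```swift
// Hypothetical generic helper: written once against Collection, it works for
// Array (Int index, Int element) and String (String.Index index, Character element).
func secondElement<C: Collection>(of collection: C) -> C.Element? {
    guard !collection.isEmpty else { return nil }
    let i = collection.index(after: collection.startIndex)
    return i < collection.endIndex ? collection[i] : nil
}

let array = [10, 20, 30, 40, 50]
let string = "abcあいうえお"

print(array.count, string.count)   // 5 8
print(secondElement(of: array) as Any)
print(secondElement(of: string) as Any)
```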
  25. Character vs. Unicode Scalar: the character "ぱ" can be represented in two ways.
    U+3071 "ぱ": a scalar value (a kind of code) that represents "ぱ" on its own.
  26. U+306F "は" + U+309A "◌゚": "は" and "◌゚" are combined to represent "ぱ".
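The two spellings of "ぱ" behave identically at the Character level but differ underneath. The following is a minimal sketch, assuming Swift 5, whose String equality compares canonical equivalence. It builds both forms from the scalars on the slide.

```swift
// "ぱ" as one precomposed scalar, and as は + combining semi-voiced sound mark.
let precomposed = "\u{3071}"          // ぱ
let decomposed = "\u{306F}\u{309A}"   // は + ◌゚

// Both are a single Character...
print(precomposed.count, decomposed.count)                                // 1 1
// ...but they differ at the Unicode scalar and UTF-8 levels.
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)  // 1 2
print(precomposed.utf8.count, decomposed.utf8.count)                      // 3 6
// String's == compares canonical equivalence, so the two are equal.
print(precomposed == decomposed)                                          // true
```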
  27. Character:      ぱ            | a    | \r\n      | 🇯🇵
    Unicode Scalar: U+306F U+309A | U+61 | U+0D U+0A | U+1F1EF U+1F1F5
  28. 1 2 3 4: four Characters, made of seven Unicode scalars.
  31. U+306F: hiragana "は"
  34. U+309A: "◌゚"
  35. U+309A is a Nonspacing Mark.
  38. U+61: "a"
  39. U+306F and U+309A together form the single Character "ぱ".
  43. U+0D: Carriage Return
  44. U+61 on its own forms the Character "a".
  48. U+0A: Line Feed
  49. The \r\n combination is treated specially: the two scalars form a single Character.
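The table above can be reproduced by asking each Character for its scalars. The following is a minimal sketch, assuming Swift 5. `scalarLabels(_:)` is a hypothetical helper, and it zero-pads to four hex digits (the usual Unicode notation), so it prints U+0061 where the slide writes U+61.

```swift
// Hypothetical helper: each Character's scalars as "U+XXXX" labels,
// zero-padded to at least four hex digits.
func scalarLabels(_ s: String) -> [[String]] {
    return s.map { ch in
        String(ch).unicodeScalars.map { (scalar: Unicode.Scalar) -> String in
            let hex = String(scalar.value, radix: 16, uppercase: true)
            return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        }
    }
}

// ぱ written as は + ◌゚, then "a", CR LF, and the flag 🇯🇵 (J + P).
let sample = "\u{306F}\u{309A}a\r\n\u{1F1EF}\u{1F1F5}"

print(sample.count)                  // 4
print(sample.unicodeScalars.count)   // 7
for labels in scalarLabels(sample) {
    print(labels.joined(separator: " "))
}
```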
  53. U+1F1EF: "J"
  54. The "J" of a country code (REGIONAL INDICATOR SYMBOL LETTER J)
  57. U+1F1F5: "P"
  58. Country code "J" + "P" makes Japan: the two scalars form the single Character 🇯🇵.
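Regional indicator symbols run contiguously from U+1F1E6 (A) to U+1F1FF (Z), so a flag can be computed from a two-letter code. The following is a minimal sketch, assuming Swift 5. `flagEmoji(forRegionCode:)` is a hypothetical helper and does not check whether the code is an assigned region. The last lines show that appending "P" to a lone "J" merges them into one Character.

```swift
// Hypothetical helper: maps A-Z to regional indicator symbols U+1F1E6...U+1F1FF.
func flagEmoji(forRegionCode code: String) -> String? {
    var scalars = String.UnicodeScalarView()
    for letter in code.uppercased().unicodeScalars {
        guard letter.value >= 0x41 && letter.value <= 0x5A,   // "A"..."Z"
              let indicator = Unicode.Scalar(0x1F1E6 + (letter.value - 0x41)) else {
            return nil
        }
        scalars.append(indicator)
    }
    return String(scalars)
}

let japan = flagEmoji(forRegionCode: "JP")!
print(japan, japan.count, japan.unicodeScalars.count)   // 🇯🇵 1 2

// A lone J is one Character; appending P does not add a second one.
var flag = "\u{1F1EF}"
print(flag.count)   // 1
flag += "\u{1F1F5}"
print(flag.count)   // 1
```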
  60. ぱ, a, \r\n, 🇯🇵: characters 1, 2, 3, 4.
  61. What if it were "b" instead of \n?
  62. Character:      ぱ            | a    | \r   | b    | 🇯🇵
    Unicode Scalar: U+306F U+309A | U+61 | U+0D | U+62 | U+1F1EF U+1F1F5
  63. \r and b do not combine.
  64. 1 2 3 4 5: now there are five Characters.
  65. What was the 4th character is now the 5th.
  66. To know the Nth character, you need to know the characters that come before it.
  67. → The cost of getting the Nth character is O(N).
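The effect of swapping \n for b can be observed directly. The following is a minimal sketch, assuming Swift 5, whose `index(_:offsetBy:)` on String is documented as O(n) in the distance. Converting to an `Array` of Characters is shown as one common option when many random accesses are needed. The talk does not prescribe it.

```swift
// The example string: ぱ (は + ◌゚), "a", CR LF, 🇯🇵.
let withCRLF = "\u{306F}\u{309A}a\r\n\u{1F1EF}\u{1F1F5}"
// The same scalars, except LF is replaced by "b".
let withB = "\u{306F}\u{309A}a\rb\u{1F1EF}\u{1F1F5}"

print(withCRLF.count, withB.count)   // 4 5

// index(_:offsetBy:) walks grapheme boundaries from the start: O(n).
let flagInFirst = withCRLF[withCRLF.index(withCRLF.startIndex, offsetBy: 3)]
let flagInSecond = withB[withB.index(withB.startIndex, offsetBy: 4)]
print(flagInFirst == flagInSecond)   // true: 🇯🇵 moved from 4th to 5th

// For repeated random access, pay the O(n) walk once and keep the Characters
// in an Array; each later access by Int is then O(1).
let characters = Array(withB)
print(characters[4])
```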
  69. var string = "うabc"
    string[string.startIndex] = "X"   (hypothetical; this does not compile)
    Character:      う      | a    | b    | c
    Unicode Scalar: U+3046 | U+61 | U+62 | U+63
  71. Character:      X    | a    | b    | c
    Unicode Scalar: U+58 | U+61 | U+62 | U+63
  73. UTF-8
    "うabc": E3 81 86 | 61 | 62 | 63
    "Xabc": 58 | 61 | 62 | 63
  75. Byte width
    "うabc": 3 | 1 | 1 | 1
    "Xabc": 1 | 1 | 1 | 1
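The byte widths on these slides can be checked against Swift's UTF-8 view. The following is a minimal sketch, assuming Swift 5. `utf8Widths(_:)` is a hypothetical helper. The sketch shows that replacing a 3-byte Character with a 1-byte one changes the total length, so every following byte would have to shift. That is presumably why String does not offer subscript assignment.

```swift
// Hypothetical helper: UTF-8 byte width of each Character.
func utf8Widths(_ s: String) -> [Int] {
    return s.map { String($0).utf8.count }
}

let before = "\u{3046}abc"   // "うabc"; う is U+3046
let after = "Xabc"

// UTF-8 bytes in hex.
print(before.utf8.map { String($0, radix: 16, uppercase: true) })
print(after.utf8.map { String($0, radix: 16, uppercase: true) })
print(utf8Widths(before), utf8Widths(after))   // [3, 1, 1, 1] [1, 1, 1, 1]

// Replacing the 3-byte first Character with the 1-byte "X" shrinks the
// storage from 6 to 4 bytes; the bytes of "abc" all move.
var edited = before
edited.replaceSubrange(edited.startIndex..<edited.index(after: edited.startIndex), with: "X")
print(edited == after, before.utf8.count, edited.utf8.count)   // true 6 4
```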