Fully Understanding How Swift's String Counts Characters


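Swift handles Unicode very well and computes the correct character count even for strings that contain emoji.
The flip side is that Unicode's complexity carries over, so some operations are less intuitive than you might expect. For example, writing string[2] does not give you the third character.

Swift's string handling has its complicated parts, but accepting that complexity and understanding it properly is not that hard.

This talk explains how Swift counts characters and the thinking behind it, keeping the relationship with Unicode in mind.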


Takanori Hirobe

September 05, 2019

Transcript

  1. 9.
  2. 16.

    Collection
    let array = [10, 20, 30, 40, 50]
    let string = "abcあいうえお"
    let first = array[0]
  3. 17.

    Collection
    let array = [10, 20, 30, 40, 50]
    let string = "abcあいうえお"
    let first = array[0]
    or better:
    let first = array[array.startIndex]
  4. 18.

    Collection
    let array = [10, 20, 30, 40, 50]
    let string = "abcあいうえお"
    let first = array[0]
  5. 19.

    Collection
    let array = [10, 20, 30, 40, 50]
    let string = "abcあいうえお"
    let first = array[0]
    Int: the element type of the array
  6. 20.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
  7. 21.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
    let first = string[0]
  8. 22.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
    let first = string[0]
  9. 23.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
    let first = string[string.startIndex]
  10. 24.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
    let first = string[string.startIndex]
    String.Index / Character
  11. 25.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
    let startIdx = str.startIndex
    let index = str.index(startIdx, offsetBy: 5)
    let fifthChar = str[index]
  12. 26.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
    let startIdx = str.startIndex
    let index = str.index(startIdx, offsetBy: 5)
    let fifthChar = str[index]
  13. 27.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    let string = "abcあいうえお"
    Subscript access: String.Index / Element type: Character
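The contrast between the two collections can be tried out directly. The following is a minimal sketch rather than code from the talk; the names firstNumber, firstCharacter, and characterAtOffset5 are made up for illustration.

```swift
let array = [10, 20, 30, 40, 50]
let string = "abcあいうえお"

// Array: the subscript takes an Int and returns an Int element.
let firstNumber: Int = array[array.startIndex]

// String: the subscript takes a String.Index and returns a Character.
let firstCharacter: Character = string[string.startIndex]

// There is no Int subscript, so an index is derived by walking
// forward from startIndex by a number of characters.
let index: String.Index = string.index(string.startIndex, offsetBy: 5)
let characterAtOffset5: Character = string[index]

print(firstNumber, firstCharacter, characterAtOffset5)
```

Note that an offset of 5 lands on the character at zero-based position 5, which is the sixth character of the string.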
  14. 28.

    Collection
    var array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    var string = "abcあいうえお"
    Subscript access: String.Index / Element type: Character
    Change 10 to 100. Change a to X.
  15. 29.

    Collection
    var array = [  , 20, 30, 40, 50]    100
    Subscript access: Int / Element type: Int
    var string = " bcあいうえお"    X
    Subscript access: String.Index / Element type: Character
  16. 30.

    Collection
    var array = [100, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    var string = "Xbcあいうえお"
    Subscript access: String.Index / Element type: Character
    array[0] = 100
  17. 31.

    Collection
    var array = [100, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    var string = "Xbcあいうえお"
    Subscript access: String.Index / Element type: Character
    array[0] = 100 (OK)
  18. 32.

    Collection
    var array = [100, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    var string = "Xbcあいうえお"
    Subscript access: String.Index / Element type: Character
  19. 33.

    Collection
    var array = [100, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    var string = "Xbcあいうえお"
    Subscript access: String.Index / Element type: Character
    string[string.startIndex] = "X"
  20. 34.

    Collection
    var array = [100, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int
    var string = "Xbcあいうえお"
    Subscript access: String.Index / Element type: Character
    string[string.startIndex] = "X" (NG)
  21. 35.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int / Mutation via subscript: OK
    let string = "abcあいうえお"
    Subscript access: String.Index / Element type: Character / Mutation via subscript: NG
  22. 36.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int / Mutation via subscript: OK
    let string = "abcあいうえお"
    Subscript access: String.Index / Element type: Character / Mutation via subscript: NG
  23. 37.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int / Mutation via subscript: OK
    let string = "abcあいうえお"
    Subscript access: String.Index / Element type: Character / Mutation via subscript: NG
  24. 38.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int / Mutation via subscript: OK
    count is 5
    let string = "abcあいうえお"
    Subscript access: String.Index / Element type: Character / Mutation via subscript: NG
    count is 8
  25. 39.

    Collection
    let array = [10, 20, 30, 40, 50]
    Subscript access: Int / Element type: Int / Mutation via subscript: OK
    let string = "abcあいうえお"
    Subscript access: String.Index / Element type: Character / Mutation via subscript: NG
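The OK/NG difference can be illustrated with a short sketch. The slides only say that assignment through the subscript is not available for String; using replaceSubrange as the workaround is an assumption made here for illustration, not something shown in the talk.

```swift
var array = [10, 20, 30, 40, 50]
var string = "abcあいうえお"

// OK: Array supports assignment through its subscript.
array[0] = 100

// NG: String's subscript is read-only, so the next line would not compile.
// string[string.startIndex] = "X"

// One alternative (an assumption, not from the slides):
// replace the range that covers only the first character.
let first = string.startIndex
string.replaceSubrange(first..<string.index(after: first), with: "X")

print(array, array.count)    // the array still has 5 elements
print(string, string.count)  // the string still has 8 characters
```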
  26. 43.
  27. 44.
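  28. 47.

    Character:      ぱ        ぱ
    Unicode Scalar: U+3071    U+306F (は) U+309A (゜)
    U+3071: a scalar value (a kind of code) that represents "ぱ". It represents "ぱ" on its own.
  29. 49.

    Character:      ぱ        ぱ
    Unicode Scalar: U+3071    U+306F (は) U+309A (゜)
    U+306F U+309A: "は" and "゜" are combined to represent "ぱ".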
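Both spellings of "ぱ" can be compared in code. This is a small sketch under the assumption that the standard String API is used as-is; the names precomposed and combined are invented for the example.

```swift
// "ぱ" written as a single scalar.
let precomposed = "\u{3071}"
// "ぱ" written as "は" followed by the combining handakuten.
let combined = "\u{306F}\u{309A}"

// Each is one Character, even though the scalar counts differ.
let precomposedCounts = (precomposed.count, precomposed.unicodeScalars.count)
let combinedCounts = (combined.count, combined.unicodeScalars.count)

// String comparison treats canonically equivalent sequences as equal.
let areEqual = (precomposed == combined)

print(precomposedCounts, combinedCounts, areEqual)
```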
  30. 61.
  31. 62.
  32. 63.
  33. 64.
  34. 79.

    U+306F U+309A U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
  35. 80.

    U+306F U+309A U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    Character number: 1 2 3 4
  36. 81.

    U+306F U+309A U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
  37. 82.

    U+309A U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F
  38. 83.

    U+309A U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
  39. 84.

    U+309A U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
  40. 85.

    U+309A U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
  41. 86.

    U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
    U+309A: "゜"
  42. 87.

    U+0A U+1F146 U+200D U+1F692 U+61 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
    U+309A: "゜" (Nonspacing Mark)
  43. 88.

    U+0A U+1F146 U+200D U+1F692 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
    U+309A: "゜"
    U+61
  44. 89.

    U+0A U+1F146 U+200D U+1F692 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
    U+309A: "゜"
    U+61
  45. 90.

    U+0A U+1F146 U+200D U+1F692 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
    U+309A: "゜"
    U+61: "a"
  46. 91.

    U+0A U+1F146 U+200D U+1F692 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+306F: hiragana "は"
    U+309A: "゜"
    U+61: "a"
    ぱ
  47. 92.

    U+0A U+1F146 U+200D U+1F692 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+61: "a"
  48. 93.

    U+0A U+1F146 U+200D U+1F692 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+61: "a"
  49. 94.

    U+0A U+1F146 U+200D U+1F692 U+0D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+61: "a"
  50. 95.

    U+0A U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+61: "a"
    U+0D: Carriage Return
  51. 96.

    U+0A U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+61: "a"
    U+0D: Carriage Return
    a
  52. 97.

    U+0A U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+0D: Carriage Return
  53. 98.

    U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+0D: Carriage Return
    U+0A
  54. 99.

    U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+0D: Carriage Return
    U+0A
  55. 100.

    U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+0D: Carriage Return
    U+0A: Line Feed
  56. 101.

    U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+0D: Carriage Return
    U+0A: Line Feed
    The \r\n combination is treated specially.
  57. 102.

    U+1F146 U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
  58. 103.

    U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF
  59. 104.

    U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF
  60. 105.

    U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF: J
  61. 106.

    U+200D U+1F692
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF: J (the country-code letter "J")
  62. 107.

    U+200D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF: J
    U+1F1F5
  63. 108.

    U+200D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF: J
    U+1F1F5
  64. 109.

    U+200D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF: J
    U+1F1F5: P
  65. 110.

    U+200D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    U+1F1EF: J
    U+1F1F5: P
    Country code "J" + "P" means Japan.
  66. 111.

    U+200D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
  67. 112.

    U+200D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    Character number: 1 2 3 4
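The grouping walked through on these slides (seven scalars forming four characters) can be checked with a short sketch. The variable names are invented for the example, and the string is written with scalar escapes so that the exact sequence is unambiguous.

```swift
// Seven Unicode scalars: は + combining handakuten, a, CR, LF,
// and the regional indicators for J and P.
let string = "\u{306F}\u{309A}a\r\n\u{1F1EF}\u{1F1F5}"

// Number of Unicode scalars versus number of Characters.
let scalarCount = string.unicodeScalars.count
let characterCount = string.count

// How many scalars make up each Character, in order.
let scalarsPerCharacter = string.map { $0.unicodeScalars.count }

print(scalarCount, characterCount, scalarsPerCharacter)
```

The third Character here is the "\r\n" pair, which Swift treats as a single Character.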
  68. 113.

    U+200D
    Character: ぱ a \r \n 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+0A U+1F1EF U+1F1F5
    Character number: 1 2 3 4
    What if it were "b" instead of \n?
  69. 114.

    U+200D
    Character: ぱ a \r b 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+62 U+1F1EF U+1F1F5
    Character number: 1 2 3 4
  70. 115.

    U+200D
    Character: ぱ a \r b 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+62 U+1F1EF U+1F1F5
    Character number: 1 2 3 4
    \r and b do not combine.
  71. 116.

    U+200D
    Character: ぱ a \r b 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+62 U+1F1EF U+1F1F5
    Character number: 1 2 3 4 5
  72. 117.

    U+200D
    Character: ぱ a \r b 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+62 U+1F1EF U+1F1F5
    Character number: 1 2 3 4 5
    What used to be the 4th character is now the 5th.
  73. 118.

    U+200D
    Character: ぱ a \r b 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+62 U+1F1EF U+1F1F5
    Character number: 1 2 3 4 5
    To know the Nth character, you need to know the characters before it.
  74. 119.

    U+200D
    Character: ぱ a \r b 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+62 U+1F1EF U+1F1F5
    Character number: 1 2 3 4 5
    To know the Nth character, you need to know the characters before it.
    - Getting the Nth character costs O(N).
  75. 120.

    U+200D
    Character: ptimes a \r b 🇯🇵
    Unicode Scalar: U+306F U+309A U+61 U+0D U+62 U+1F1EF U+1F1F5
    Character number: 1 2 3 4 5
    To know the Nth character, you need to know the characters before it.
    - Getting the Nth character costs O(N).
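The effect of swapping \n for "b" can be reproduced in a few lines. This is an illustrative sketch with made-up names (withLineFeed, withB, flag), not code from the talk.

```swift
// Same scalar count in both strings; only the scalar after CR differs.
let withLineFeed = "\u{306F}\u{309A}a\r\n\u{1F1EF}\u{1F1F5}"
let withB = "\u{306F}\u{309A}a\rb\u{1F1EF}\u{1F1F5}"

let flag: Character = "\u{1F1EF}\u{1F1F5}"

// Zero-based position of the flag among the Characters of each string.
let flagPositionWithLineFeed = Array(withLineFeed).firstIndex(of: flag)
let flagPositionWithB = Array(withB).firstIndex(of: flag)

// index(_:offsetBy:) has to walk the string from the start,
// because character boundaries depend on the preceding scalars.
let flagIndex = withB.index(withB.startIndex, offsetBy: 4)
let characterAtFlagIndex = withB[flagIndex]

print(withLineFeed.count, withB.count)
```

Because String is not a random-access collection, the standard library documents index(_:offsetBy:) as O(n) in the offset, which matches the O(N) cost stated on the slide.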
  76. 142.

    Character: う a b c
    Unicode Scalar: U+3046 U+61 U+62 U+63
    var string = "うabc"
    string[string.startIndex] = "X"
  77. 143.

    Character: う a b c
    Unicode Scalar: U+3046 U+61 U+62 U+63
    var string = "うabc"
    string[string.startIndex] = "X"
  78. 144.

    Character: X a b c
    Unicode Scalar: U+58 U+61 U+62 U+63
    var string = "うabc"
    string[string.startIndex] = "X"
  79. 145.

    Character: う a b c
    Unicode Scalar: U+3046 U+61 U+62 U+63

    Character: X a b c
    Unicode Scalar: U+58 U+61 U+62 U+63
  80. 146.

    Character: う a b c
    Unicode Scalar: U+3046 U+61 U+62 U+63
    UTF-8: E3 81 86 61 62 63

    Character: X a b c
    Unicode Scalar: U+58 U+61 U+62 U+63
    UTF-8: 58 61 62 63
  81. 147.

    Character: う a b c
    Unicode Scalar: U+3046 U+61 U+62 U+63
    UTF-8: E3 81 86 61 62 63

    Character: X a b c
    Unicode Scalar: U+58 U+61 U+62 U+63
    UTF-8: 58 61 62 63
  82. 148.

    Character: う a b c
    Unicode Scalar: U+3046 U+61 U+62 U+63
    UTF-8: E3 81 86 61 62 63
    Byte width: 3 1 1 1

    Character: X a b c
    Unicode Scalar: U+58 U+61 U+62 U+63
    UTF-8: 58 61 62 63
    Byte width: 1 1 1 1
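The byte widths in this comparison can be observed through the utf8 view. The sketch below is an illustration under assumptions: replaceSubrange stands in for the subscript assignment (which does not compile), and the helper name byteWidths is invented. Note that U+3046 encodes as E3 81 86 in UTF-8.

```swift
// Hypothetical helper: UTF-8 byte width of each Character in a string.
func byteWidths(of text: String) -> [Int] {
    return text.map { String($0).utf8.count }
}

var string = "\u{3046}abc"   // "うabc"
let bytesBefore = Array(string.utf8)
let widthsBefore = byteWidths(of: string)

// Replace the 3-byte first character with the 1-byte "X".
let first = string.startIndex
string.replaceSubrange(first..<string.index(after: first), with: "X")

let bytesAfter = Array(string.utf8)
let widthsAfter = byteWidths(of: string)

print(bytesBefore, widthsBefore)
print(bytesAfter, widthsAfter)
```

One likely reading of these slides is that replacing a character can change the total byte length of the string, so it cannot be a simple in-place write to one storage slot the way array[0] = 100 is.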
  83. 149.
  84. 150.
  85. 151.