Natural Language Framework

d_date
June 11, 2018

This slide only contains public information.

Publication
2018/06/11 🦍
2018/06/14 STT

Transcript

  1. Natural Language Framework. Daiki Matsudate, Freelancer. #WWDC18, WWDC Gorigori Catch-up Meetup (WWDCゴリゴリキャッチアップ会)

  2. Daiki Matsudate @d_date

  3. Note • WWDC 2018Ͱެද͞Εͨbeta൛Ͱఏڙ͞Ε͍ͯΔAPIͷ৘ใͰߏ੒ͯ͠ ͍·͢ • ࠓޙͷbeta releaseͰAPIͷڍಈ͕มΘΔ͔΋͠Ε·ͤΜɻࢀর͢Δࡍ͸͝ ஫ҙ͍ͩ͘͞

  4. NaturalLanguage.framework

  5. Natural Language Intelligence (diagram; source: WWDC18 session 713, Introducing Natural Language)

    Diagram labels: Natural Language Input, NaturalLanguage.framework, Linguistics, Machine Learning
    Tasks: Language Identification, Tokenization, Part of Speech, Lemmatization, Named Entity Recognition
    Units: Word, Sentence, Paragraph
  6. NaturalLanguage.framework

    • For processing natural language
    • NLTokenizer
    • NLLanguageRecognizer
    • NLTagger
    • NLModel
  7. NLTokenizer: Tokenize text into words, sentences, paragraphs, or a document. Example text: チャウダー食べたい！ ("I want to eat chowder!")

  8. NLTokenizer: Tokenize text into words

    Input: チャウダー食べたい！

        import NaturalLanguage

        let text = "チャウダー食べたい！"
        let tokenizer = NLTokenizer(unit: .word)
        tokenizer.string = text
        tokenizer.tokens(for: text.startIndex..<text.endIndex)
            .forEach { (range) in print(text[range]) }

    Output: チャウダー / 食べ / たい
  10. NLTokenizer: Tokenize text into words

        // Initialize NLTokenizer with an NLTokenUnit
        let tokenizer = NLTokenizer(unit: .word)
        // Set the string on the tokenizer
        tokenizer.string = text
        // Get the ranges of the tokens
        tokenizer.tokens(for: text.startIndex..<text.endIndex)
            // Subscript text with each range
            .forEach { (range) in print(text[range]) }
  11. NLTokenizer: Tokenize text into sentences

        let tokenizer = NLTokenizer(unit: .sentence)
        tokenizer.string = text
        tokenizer.tokens(for: text.startIndex..<text.endIndex)
            .forEach { (range) in print(text[range]) }
  12. NLTokenizer: Tokenize text into sentences

    Input: チャウダー食べたい！

        let tokenizer = NLTokenizer(unit: .sentence)
        tokenizer.string = text
        tokenizer.tokens(for: text.startIndex..<text.endIndex)
            .forEach { (range) in print(text[range]) }

    Output (as laid out on the slide): チャウダー食べたい ！
  13. NLTokenizer: Tokenize text into sentences

    Input (Japanese sample passage, two paragraphs):

        あのイーハトーヴォのすきとおった風、夏でも底に冷たさをもつ青いそら、うつくしい森で飾られたモリーオ市、郊外のぎらぎらひかる草の波。
        　またそのなかでいっしょになったたくさんのひとたち、ファゼーロとロザーロ、羊飼のミーロや、顔の赤いこどもたち、地主のテーモ、山猫博士のボーガント・デストゥパーゴなど、いまこの暗い巨きな石の建物のなかで考えていると、みんなむかし風のなつかしい青い幻燈のように思われます。では、わたくしはいつかの小さなみだしをつけながら、しずかにあの年のイーハトーヴォの五月から十月までを書きつけましょう。

        let tokenizer = NLTokenizer(unit: .sentence)
        tokenizer.string = text
        tokenizer.tokens(for: text.startIndex..<text.endIndex)
            .forEach { (range) in print(text[range]) }

    Output: the passage, printed one sentence range at a time.
  15. NLTokenizer: Tokenize text into paragraphs

    Input: the same two-paragraph Japanese sample passage used on the sentence tokenization slide.

        let tokenizer = NLTokenizer(unit: .paragraph)
        tokenizer.string = text
        tokenizer.tokens(for: text.startIndex..<text.endIndex)
            .forEach { (range) in print(text[range]) }

    Output: the passage, printed one paragraph range at a time.
  16. NLTokenizer: Tokenize text into a document

    Input: the same two-paragraph Japanese sample passage used on the sentence tokenization slide.

        let tokenizer = NLTokenizer(unit: .document)
        tokenizer.string = text
        tokenizer.tokens(for: text.startIndex..<text.endIndex)
            .forEach { (range) in print(text[range]) }

    Output: the passage, printed per document range.
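The tokenizer slides above repeat the same three steps for each unit. A minimal sketch that wraps them in one function is shown here; it is an illustration rather than code from the deck. The helper name `tokens(in:unit:)` and the English sample string are assumptions, and it uses `enumerateTokens(in:using:)` instead of the `tokens(for:)` call shown on the slides.

```swift
import NaturalLanguage

/// Hypothetical helper (not from the slides): returns the tokens of `text`
/// for the given unit as strings.
func tokens(in text: String, unit: NLTokenUnit) -> [String] {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text
    var result: [String] = []
    // The closure is called once per token; returning true continues the enumeration.
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        result.append(String(text[range]))
        return true
    }
    return result
}

// Assumed sample text, chosen only for illustration.
let sample = "Chowder time. I could not go to WWDC this year."
let units: [NLTokenUnit] = [.word, .sentence, .paragraph, .document]
for unit in units {
    print(unit.rawValue, tokens(in: sample, unit: unit))
}
```

Returning an array of strings keeps the call sites short; for long texts, working with the ranges directly avoids copying each token.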
  17. NLLanguageRecognizer: Automatically identify the language of text. Example text: チャウダー食べたい！

  18. NLLanguageRecognizer: Automatically identify the language of text

    Input: チャウダー食べたい！

        let recognizer = NLLanguageRecognizer()
        recognizer.processString("チャウダー食べたい！")
        print(recognizer.dominantLanguage!) // ja

    Result: NLLanguage.japanese
  20. NLLanguageRecognizer: Specify language hints with a factor

        let recognizer = NLLanguageRecognizer()
        recognizer.processString("Lorem ipsum dolor sit amet")
        // Specify language hints: [NLLanguage: Double]
        recognizer.languageHints = [.english: 0.3, .portuguese: 0.1]
        // Check the dominant language
        print(recognizer.dominantLanguage!) // en

    Result: NLLanguage.english
  21. NLLanguageRecognizer: Specify language hints with a factor

        let recognizer = NLLanguageRecognizer()
        recognizer.processString("Lorem ipsum dolor sit amet")
        // Specify language hints: [NLLanguage: Double]
        recognizer.languageHints = [.french: 0.3, .english: 0.00001]
        // Check the dominant language
        print(recognizer.dominantLanguage!) // fr

    Result: NLLanguage.french
  22. NLLanguageRecognizer: Specify language hints with a factor

        let recognizer = NLLanguageRecognizer()
        recognizer.processString("Lorem ipsum dolor sit amet")
        // Specify language probabilities: [NLLanguage: Double]
        recognizer.languageHints = [.french: 0.3, .english: 0.1]
        // Check the dominant language
        print(recognizer.dominantLanguage!) // en

    Result: NLLanguage.english
  23. NLLanguageRecognizer: Specify language hints with a factor

        let recognizer = NLLanguageRecognizer()
        recognizer.processString("チャウダー食べたい！")
        // Specify language probabilities: [NLLanguage: Double]
        recognizer.languageHints = [.french: 0.3, .english: 0.1]
        // Check the dominant language
        print(recognizer.dominantLanguage!) // ja

    Result: NLLanguage.japanese
  24. NLLanguageRecognizer: Language hypotheses

        let recognizer = NLLanguageRecognizer()
        recognizer.processString("Lorem ipsum dolor sit amet")
        // Hypotheses with a maximum language count
        let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
        print(hypotheses)
        // [pt: 0.32105109095573425, en: 0.4647941291332245]

    The hypotheses say this is probably English.
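The recognizer slides above force-unwrap `dominantLanguage` and print the hypotheses dictionary unsorted. A sketch of the same calls behind two small helpers follows; it is an illustration under stated assumptions, not the author's code. The names `dominantLanguage(of:hints:)` and `rankedHypotheses(for:maximum:)` are hypothetical, and setting the hints before `processString` is a choice made here, whereas the slides set them afterwards.

```swift
import NaturalLanguage

/// Hypothetical helper: the most likely language of `text`, or nil if none is found.
/// `hints` are prior probabilities, set here before the text is processed (an assumption).
func dominantLanguage(of text: String, hints: [NLLanguage: Double] = [:]) -> NLLanguage? {
    let recognizer = NLLanguageRecognizer()
    recognizer.languageHints = hints
    recognizer.processString(text)
    return recognizer.dominantLanguage
}

/// Hypothetical helper: up to `maximum` candidate languages, most probable first.
func rankedHypotheses(for text: String, maximum: Int) -> [(language: NLLanguage, probability: Double)] {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.languageHypotheses(withMaximum: maximum)
        .sorted { $0.value > $1.value }
        .map { (language: $0.key, probability: $0.value) }
}

let loremText = "Lorem ipsum dolor sit amet"
if let language = dominantLanguage(of: loremText, hints: [.english: 0.3, .portuguese: 0.1]) {
    print(language.rawValue)
}
for hypothesis in rankedHypotheses(for: loremText, maximum: 2) {
    print(hypothesis.language.rawValue, hypothesis.probability)
}
```

Using `if let` avoids a crash when the recognizer cannot decide on a language, which can happen for very short or empty input.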
  26. NLTagger: Analyzes natural language text

    Input: チャウダー食べたい！

        let text = "チャウダー食べたい！"
        let tagger = NLTagger(tagSchemes: [.lexicalClass])
        tagger.string = text
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass)
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: チャウダー → OtherWord, 食べ → OtherWord, たい → OtherWord

    Japanese is not available for tagging text with the standard model.
  29. NLTagger: Analyzes natural language text

        let text = "Chowder time"
        // Pass the tag schemes you need
        let tagger = NLTagger(tagSchemes: [.lexicalClass])
        // Set the text on tagger.string
        tagger.string = text
        // Get tags for a range, unit, and scheme
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass)
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: Chowder → Noun, " " → Whitespace, time → Noun
  30. NLTagger: Analyzes natural language text - Lexical class

        let text = "I could not go WWDC in this year"
        let tagger = NLTagger(tagSchemes: [.lexicalClass])
        tagger.string = text
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass)
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: I → Pronoun, could → Verb, not → Adverb, go → Verb, WWDC → Noun, in → Preposition, this → Determiner, year → Noun
  31. NLTagger: Analyzes natural language text - Lemma

        let text = "I could not go WWDC in this year"
        let tagger = NLTagger(tagSchemes: [.lemma])
        tagger.string = text
        // The requested scheme must be one of the schemes passed to the initializer
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lemma)
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: I → I, could → can, not → not, go → go, WWDC → WWDC, in → in, this → this, year → year
  32. NLTagger: Analyzes natural language text - Name type

        let text = "I could not go WWDC in this year"
        let tagger = NLTagger(tagSchemes: [.nameType])
        tagger.string = text
        // The requested scheme must be one of the schemes passed to the initializer
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType)
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: I → OtherWord, could → OtherWord, not → OtherWord, go → OtherWord, WWDC → OrganizationName, in → OtherWord, this → OtherWord, year → OtherWord
  33. NLTagger: Analyzes natural language text - Name type or lexical class

        let text = "I could not go WWDC in this year"
        let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
        tagger.string = text
        // The requested scheme must be one of the schemes passed to the initializer
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass)
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: I → Pronoun, could → Verb, not → Adverb, go → Verb, WWDC → OrganizationName, in → Preposition, this → Determiner, year → Noun
  34. NSLinguisticTagger in iOS 11

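  36. NSLinguisticTagger: Analyzes natural language text - Name type or lexical class

        import Foundation

        let text = "I could not go WWDC in this year"
        let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: 0)
        tagger.string = text
        // NSLinguisticTagger works with NSRange, so convert to and from Range<String.Index>
        let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
        tagger.enumerateTags(in: fullRange, unit: .word, scheme: .nameTypeOrLexicalClass, options: []) { (tag, tokenRange, _) in
            if let tag = tag, let range = Range(tokenRange, in: text) {
                print(tag.rawValue, text[range])
            }
        }

    Output: I → Pronoun, could → Verb, not → Adverb, go → Verb, WWDC → OrganizationName, in → Preposition, this → Determiner, year → Noun

    Same as NLTagger?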
  37. Compare NLTagger and NSLinguisticTagger

        let text = "I could not go WWDC in this year."

        func testNLTagIsEqualToNSLinguisticTag() {
            let tSchemes: [NLTagScheme] = [.language, .lemma, .lexicalClass, .nameTypeOrLexicalClass, .nameType, .tokenType, .script]
            let lSchemes: [NSLinguisticTagScheme] = [.language, .lemma, .lexicalClass, .nameTypeOrLexicalClass, .nameType, .tokenType, .script]
            let tTagger = NLTagger(tagSchemes: tSchemes)
            let lTagger = NSLinguisticTagger(tagSchemes: lSchemes, options: 0)
            zip(tSchemes, lSchemes).forEach { (tScheme, lScheme) in
                print("---- \(tScheme.rawValue) ----")
                XCTAssertEqual(tScheme.rawValue, lScheme.rawValue)
                // tags(text:unit:scheme:options:) is not a framework API; it is presumably
                // a helper extension that is not shown on the slide.
                let tags = tTagger.tags(text: text, unit: .word, scheme: tScheme, options: [.omitPunctuation, .omitWhitespace])
                let lTags = lTagger.tags(text: text, unit: .word, scheme: lScheme, options: [.omitPunctuation, .omitWhitespace])
                zip(tags, lTags).forEach { (tTag, lTag) in
                    print(tTag.0.rawValue, lTag.0.rawValue, tTag.0.rawValue == lTag.0.rawValue)
                    XCTAssertEqual(tTag.0.rawValue, lTag.0.rawValue)
                    XCTAssertEqual(text[tTag.1], text[Range(lTag.1, in: text)!])
                }
            }
        }
  39. Compare NLTagger and NSLinguisticTagger

    token | NLTagger | NSLinguisticTagger
    I     | I        | I
    ca    | nil      | ""
    n't   | not      | not
  40. Contractions recognition

    Actual: "I / ca / n't / go / WWDC / in / this / year."
    Ideal:  "I / can't / go / WWDC / in / this / year."
  41. Contractions recognition with .joinContractions

    Actual: "I / can't / go / WWDC / in / this / year."
    Ideal:  "I / can't / go / WWDC / in / this / year."
  42. NLTagger with joining contractions

        let text = "I could not go WWDC in this year"
        let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
        tagger.string = text
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass)
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: I → Pronoun, could → Verb, not → Adverb, go → Verb, WWDC → OrganizationName, in → Preposition, this → Determiner, year → Noun

    Only available in the Natural Language framework.
  43. NLTagger with joining contractions

        // The slide's output shows "couldn't", so the input is assumed to contain the contraction
        let text = "I couldn't go WWDC in this year"
        let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
        tagger.string = text
        tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass, options: [.joinContractions])
            .forEach { (tag, range) in
                if let tag = tag { print(tag.rawValue, text[range]) }
            }

    Output: I → Pronoun, couldn't → Verb, go → Verb, WWDC → OrganizationName, in → Preposition, this → Determiner, year → Noun

    Only available in the Natural Language framework.
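To see the effect of `.joinContractions` side by side, the tagging call from the slides can be wrapped in a helper and run twice on the same sentence. This is a rough sketch, not the author's code: `wordTags(in:scheme:options:)` and the sample sentence are assumptions, and the exact tags depend on the OS version's language model, so no output is listed.

```swift
import NaturalLanguage

/// Hypothetical helper: tags each word of `text` under a single scheme and
/// returns (token, tag) pairs, skipping tokens that have no tag.
func wordTags(in text: String,
              scheme: NLTagScheme,
              options: NLTagger.Options = []) -> [(token: String, tag: String)] {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.string = text
    var result: [(token: String, tag: String)] = []
    let tagged = tagger.tags(in: text.startIndex..<text.endIndex,
                             unit: .word,
                             scheme: scheme,
                             options: options)
    for (tag, range) in tagged {
        if let tag = tag {
            result.append((token: String(text[range]), tag: tag.rawValue))
        }
    }
    return result
}

// Assumed sample sentence containing a contraction.
let sentence = "I couldn't go to WWDC this year."
let baseOptions: NLTagger.Options = [.omitWhitespace, .omitPunctuation]
let splitTags = wordTags(in: sentence, scheme: .lexicalClass, options: baseOptions)
let joinedTags = wordTags(in: sentence, scheme: .lexicalClass,
                          options: baseOptions.union(.joinContractions))

print(splitTags.map { $0.token })
print(joinedTags.map { $0.token })
```

Passing the same scheme to both the initializer and the `tags` call keeps the two in sync, which is the mismatch that is easiest to introduce when copying the tagging code between schemes.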
  44. NLModel: A custom model trained to classify or tag natural language text.

    Diagram labels: User Data, Create ML, NLModel, NLTagger
  45. NLModel: A custom model trained to classify or tag natural language text.

    WIP: waiting for a sample from Apple. See: Introducing Create ML (WWDC18)
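The deck leaves NLModel as work in progress. As a rough sketch of the two uses it names (classifying with the model on its own, and attaching it to a tagger), the calls could look like the following. Everything specific here is an assumption: the file name `MyTextClassifier.mlmodelc`, the helper names, and the custom scheme are hypothetical, and a real compiled model trained with Create ML is needed for the classification branch to run.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper: loads a compiled model, or returns nil if it cannot be loaded.
func loadModel(at url: URL) -> NLModel? {
    return try? NLModel(contentsOf: url)
}

/// Uses the model on its own to classify a whole string.
func classify(_ text: String, with model: NLModel) -> String? {
    return model.predictedLabel(for: text)
}

/// Attaches a word-tagging model to an NLTagger under a custom tag scheme.
func makeTagger(with model: NLModel, scheme: NLTagScheme) -> NLTagger {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.setModels([model], forTagScheme: scheme)
    return tagger
}

// Hypothetical path; replace with the location of a real compiled model.
let modelURL = URL(fileURLWithPath: "MyTextClassifier.mlmodelc")
if let model = loadModel(at: modelURL) {
    print(classify("Chowder time", with: model) ?? "no label")
} else {
    print("No model found at \(modelURL.path)")
}
```

A tagger created by `makeTagger` is then used like the NLTagger examples on the earlier slides, passing the same custom scheme to the `tags` call.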
  46. Summary

    • Natural Language
    • Tokenization: NLTokenizer separates text into tokens
    • Language recognition: NLLanguageRecognizer detects the language of text
    • Tagging: NLTagger tags tokens with lexical class, name type, and so on
    • NLTagger and NSLinguisticTagger behave the same, but only NLTagger can join contractions into one token
    • Custom model: NLModel can be set on a tagger for tagging, or used on its own
  47. Introducing Natural Language WWDC18 Natural Language Processing and your apps

    WWDC17 Introducing Core ML WWDC17 Core ML in depth WWDC17 Introducing Create ML WWDC18 What’s new in Core ML, Part 1 WWDC18 iOS 11 Programming, ୈ3ষ PEAKS Resources
  48. WWDC18