Natural Language Framework

Daiki Matsudate, Freelancer
#WWDC18 WWDC Gorigori Catch-up Meetup (WWDCゴリゴリキャッチアップ会)

Daiki Matsudate @d_date

Note • WWDC 2018Ͱެද͞Εͨbeta൛Ͱఏڙ͞Ε͍ͯΔAPIͷ৘ใͰߏ੒ͯ͠ ͍·͢ • ࠓޙͷbeta releaseͰAPIͷڍಈ͕มΘΔ͔΋͠Ε·ͤΜɻࢀর͢Δࡍ͸͝ ஫ҙ͍ͩ͘͞

NaturalLanguage.framework

Natural language intelligence, built on linguistics and machine learning:
• Language identification
• Tokenization
• Part of speech
• Lemmatization
• Named entity recognition
Units: word, sentence, paragraph
(Diagram: natural language input → NaturalLanguage.framework. From WWDC18 session 713, "Introducing Natural Language".)

NaturalLanguage.framework
• For processing natural language
• NLTokenizer
• NLLanguageRecognizer
• NLTagger
• NLModel

NLTokenizer: tokenizes text into words, sentences, paragraphs, or a whole document
Example: チャウダー食べたい！ ("I want to eat chowder!")

NLTokenizer: tokenize text into words

    import NaturalLanguage

    let text = "チャウダー食べたい！"
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .forEach { range in
            print(text[range])
        }

Output: チャウダー / 食べ / たい


How the word tokenizer works:
• Initialize NLTokenizer with an NLTokenUnit (.word here).
• Set the string on the tokenizer.
• Get the ranges of the tokens with tokens(for:).
• Subscript the text with each range to get the token itself.
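The slides use tokens(for:), which returns every range at once. NLTokenizer also offers enumerateTokens(in:using:), whose closure can return false to stop early. A minimal sketch; firstWords(in:limit:) is a hypothetical helper name, not part of the framework:

```swift
import NaturalLanguage

// Hypothetical helper: collects at most `limit` word tokens,
// stopping the enumeration as soon as enough have been found.
func firstWords(in text: String, limit: Int) -> [String] {
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text
    var words: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        words.append(String(text[range]))
        return words.count < limit   // returning false stops the enumeration
    }
    return words
}

print(firstWords(in: "I could not go WWDC in this year", limit: 3))
```

Stopping early matters mostly for long documents where only the beginning is needed.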


NLTokenizer: tokenize text into sentences

    let tokenizer = NLTokenizer(unit: .sentence)
    tokenizer.string = text
    tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .forEach { range in
            print(text[range])
        }

Input: チャウダー食べたい！ ("I want to eat chowder!")
Output: a single sentence, チャウダー食べたい！

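NLTokenizer: tokenize a longer text into sentences

The same .sentence tokenizer, applied to the opening of Kenji Miyazawa's ポラーノの広場 (The Polano Square), two paragraphs:

あのイーハトーヴォのすきとおった風、夏でも底に冷たさをもつ青いそら、うつくしい森で飾られたモリーオ市、郊外のぎらぎらひかる草の波。
　またそのなかでいっしょになったたくさんのひとたち、ファゼーロとロザーロ、羊飼のミーロや、顔の赤いこどもたち、地主のテーモ、山猫博士のボーガント・デストゥパーゴなど、いまこの暗い巨きな石の建物のなかで考えていると、みんなむかし風のなつかしい青い幻燈のように思われます。では、わたくしはいつかの小さなみだしをつけながら、しずかにあの年のイーハトーヴォの五月から十月までを書きつけましょう。

(Translation: "That clear, transparent wind of Ihatovo; the blue sky that keeps a coolness in its depths even in summer; the city of Morio, adorned with beautiful forests; the glittering waves of grass on its outskirts.
And the many people I came to be with there: Fazero and Rozaro, Miro the shepherd, the red-cheeked children, Temo the landowner, Dr. Bogant Destupago the wildcat doctor, and others. Thinking of them now inside this dark, huge stone building, they all seem like the nostalgic blue images of an old magic lantern. Now, adding a few small headings as I go, let me quietly write down that year in Ihatovo, from May to October.")

Output: three sentences, printed one after another: the first paragraph (ending 草の波。), then 「またそのなかで…思われます。」 and 「では、…書きつけましょう。」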

NLTokenizer: tokenize text into paragraphs

    let tokenizer = NLTokenizer(unit: .paragraph)
    tokenizer.string = text
    tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .forEach { range in
            print(text[range])
        }

Input: the same two-paragraph passage as in the sentence example above.
Output: two paragraphs; the first ends with 草の波。 and the second (which starts with a full-width space) runs through 書きつけましょう。

NLTokenizer: tokenize text into a document

    let tokenizer = NLTokenizer(unit: .document)
    tokenizer.string = text
    tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .forEach { range in
            print(text[range])
        }

Input: the same two-paragraph passage.
Output: the entire passage as a single token.
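To compare the units at a glance, a small sketch can count the tokens each unit produces for one string. The helper tokens(of:unit:) and the sample text are illustrative assumptions, not from the slides:

```swift
import NaturalLanguage

// Hypothetical helper: returns the substrings NLTokenizer produces for `unit`.
func tokens(of text: String, unit: NLTokenUnit) -> [String] {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text
    return tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .map { String(text[$0]) }
}

// Two sentences on the first line, one on the second (a separate paragraph).
let sample = "First sentence. Second sentence.\nNew paragraph here."

let units: [(String, NLTokenUnit)] = [
    ("word", .word), ("sentence", .sentence),
    ("paragraph", .paragraph), ("document", .document)
]
for (name, unit) in units {
    print(name, tokens(of: sample, unit: unit).count)
}
```

Only the unit passed to NLTokenizer(unit:) changes; the rest of the code is identical, as on the slides.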

NLLanguageRecognizer: automatically identifies the language of text
Example: チャウダー食べたい！

    import NaturalLanguage

    let recognizer = NLLanguageRecognizer()
    recognizer.processString("チャウダー食べたい！")
    print(recognizer.dominantLanguage!.rawValue) // ja

Result: NLLanguage.japanese

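For a one-off check, NLLanguageRecognizer also has a class method, dominantLanguage(for:). When one recognizer instance is reused for several strings, reset() clears the previous input. A minimal sketch; the English sample string is an illustrative choice:

```swift
import NaturalLanguage

// One-shot detection without keeping a recognizer instance around.
let detected = NLLanguageRecognizer.dominantLanguage(for: "チャウダー食べたい！")
print(detected?.rawValue ?? "undetermined")

// Reusing one recognizer: reset() between inputs so results do not mix.
let recognizer = NLLanguageRecognizer()
var results: [NLLanguage?] = []
for text in ["チャウダー食べたい！", "I could not go WWDC in this year"] {
    recognizer.processString(text)
    results.append(recognizer.dominantLanguage)
    recognizer.reset()
}
print(results.map { $0?.rawValue ?? "undetermined" })
```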

NLLanguageRecognizer: specify language hints with weights

    let recognizer = NLLanguageRecognizer()
    recognizer.processString("Lorem ipsum dolor sit amet")
    recognizer.languageHints = [.english: 0.3, .portuguese: 0.1]   // [NLLanguage: Double]
    print(recognizer.dominantLanguage!.rawValue) // en

Result: NLLanguage.english

With recognizer.languageHints = [.french: 0.3, .english: 0.00001] in the same code, the dominant language becomes NLLanguage.french.

With recognizer.languageHints = [.french: 0.3, .english: 0.1], the dominant language is NLLanguage.english again: the hints act as weights (probabilities), not as overrides.

The same hints do not change the result for Japanese text:

    let recognizer = NLLanguageRecognizer()
    recognizer.processString("チャウダー食べたい！")
    recognizer.languageHints = [.french: 0.3, .english: 0.1]
    print(recognizer.dominantLanguage!.rawValue) // ja

Result: NLLanguage.japanese
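The slides assign languageHints after processString(_:). To avoid depending on when the recognizer reads its hints, this sketch sets them before processing, and also shows languageConstraints, which limits the result to a given set of languages. dominantLanguage(of:hints:constraints:) is a hypothetical wrapper:

```swift
import NaturalLanguage

// Hypothetical wrapper: configure hints and constraints first, then process.
func dominantLanguage(of text: String,
                      hints: [NLLanguage: Double] = [:],
                      constraints: [NLLanguage] = []) -> NLLanguage? {
    let recognizer = NLLanguageRecognizer()
    recognizer.languageHints = hints              // relative weights per language
    recognizer.languageConstraints = constraints  // empty means no restriction
    recognizer.processString(text)
    return recognizer.dominantLanguage
}

let japanese = dominantLanguage(of: "チャウダー食べたい！", hints: [.french: 0.3, .english: 0.1])
let constrained = dominantLanguage(of: "Lorem ipsum dolor sit amet", constraints: [.english, .french])
print(japanese?.rawValue ?? "undetermined", constrained?.rawValue ?? "undetermined")
```

Hints only nudge the probabilities; constraints, by contrast, rule out every language not in the list.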

NLLanguageRecognizer: language hypotheses

    let recognizer = NLLanguageRecognizer()
    recognizer.processString("Lorem ipsum dolor sit amet")
    let hypotheses = recognizer.languageHypotheses(withMaximum: 2)   // at most 2 candidate languages
    print(hypotheses)
    // [pt: 0.32105109095573425, en: 0.4647941291332245]

The hypotheses say this is most probably English.
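languageHypotheses(withMaximum:) returns an unordered dictionary, so picking the most likely language means sorting by probability. A sketch with a hypothetical rankedHypotheses(for:maximum:) helper:

```swift
import Foundation
import NaturalLanguage

// Hypothetical helper: hypotheses sorted from most to least probable.
func rankedHypotheses(for text: String, maximum: Int) -> [(language: NLLanguage, probability: Double)] {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.languageHypotheses(withMaximum: maximum)
        .map { (language: $0.key, probability: $0.value) }
        .sorted { $0.probability > $1.probability }
}

let ranked = rankedHypotheses(for: "Lorem ipsum dolor sit amet", maximum: 2)
for (language, probability) in ranked {
    print(language.rawValue, String(format: "%.3f", probability))
}
```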


NLTagger: analyzes natural language text

    let text = "チャウダー食べたい！"
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass)
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

The words チャウダー, 食べ, and たい are all tagged OtherWord: Japanese is not available for lexical-class tagging with the standard model.

NLTagger: analyzes natural language text (English)

    import NaturalLanguage

    let text = "Chowder time"
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass)
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

Output: Noun Chowder / Whitespace (the space) / Noun time


How the tagger works:
• Pass the tag schemes you need to NLTagger(tagSchemes:).
• Set the text on tagger.string.
• Get the tags for a range, unit, and scheme with tags(in:unit:scheme:).
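Whitespace and punctuation tokens are often noise. NLTagger.Options can drop them; this sketch returns (word, tag) pairs only. lexicalClasses(of:) is a hypothetical helper:

```swift
import NaturalLanguage

// Hypothetical helper: (word, lexical class) pairs without whitespace/punctuation.
func lexicalClasses(of text: String) -> [(String, String)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation]
    return tagger.tags(in: text.startIndex..<text.endIndex, unit: .word,
                       scheme: .lexicalClass, options: options)
        .compactMap { pair -> (String, String)? in
            guard let tag = pair.0 else { return nil }   // skip untagged tokens
            return (String(text[pair.1]), tag.rawValue)
        }
}

let tagged = lexicalClasses(of: "Chowder time")
tagged.forEach { print($0.0, $0.1) }
```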

NLTagger: lexical class

    let text = "I could not go WWDC in this year"
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass)
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

Tags for the words (the spaces are also printed, as Whitespace):
I Pronoun, could Verb, not Adverb, go Verb, WWDC Noun, in Preposition, this Determiner, year Noun

NLTagger: lemma

    let text = "I could not go WWDC in this year"
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lemma)
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

Lemmas: I → I, could → can, not → not, go → go, WWDC → WWDC, in → in, this → this, year → year
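Lemmas are often used to normalize words before counting or searching. A sketch, assuming that falling back to the surface form is acceptable when the tagger returns no lemma; lemmas(of:) is a hypothetical helper:

```swift
import NaturalLanguage

// Hypothetical helper: one lemma per word, falling back to the word itself.
func lemmas(of text: String) -> [String] {
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = text
    return tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lemma,
                       options: [.omitWhitespace, .omitPunctuation])
        .map { pair in pair.0?.rawValue ?? String(text[pair.1]) }
}

print(lemmas(of: "I could not go WWDC in this year").joined(separator: " "))
```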

NLTagger: name type

    let text = "I could not go WWDC in this year"
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType)
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

WWDC is tagged OrganizationName; every other word is tagged OtherWord.
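A common use of the name-type scheme is extracting named entities. This sketch keeps only personal, place, and organization names and passes .joinNames so that a multi-word name comes back as one token; namedEntities(in:) is a hypothetical helper:

```swift
import NaturalLanguage

// Hypothetical helper: (name, kind) pairs for people, places, and organizations.
func namedEntities(in text: String) -> [(String, NLTag)] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text
    let wanted: Set<NLTag> = [.personalName, .placeName, .organizationName]
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    var result: [(String, NLTag)] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word,
                         scheme: .nameType, options: options) { tag, range in
        if let tag = tag, wanted.contains(tag) {
            result.append((String(text[range]), tag))
        }
        return true   // keep enumerating
    }
    return result
}

namedEntities(in: "I could not go WWDC in this year").forEach { print($0.0, $0.1.rawValue) }
```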

NLTagger: name type or lexical class

    let text = "I could not go WWDC in this year"
    let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass)
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

Tags for the words: I Pronoun, could Verb, not Adverb, go Verb, WWDC OrganizationName, in Preposition, this Determiner, year Noun

NSLinguisticTagger in iOS 11


NSLinguisticTagger: name type or lexical class

    import Foundation

    let text = "I could not go WWDC in this year"
    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: 0)
    tagger.string = text
    tagger.enumerateTags(in: NSRange(text.startIndex..<text.endIndex, in: text), unit: .word,
                         scheme: .nameTypeOrLexicalClass, options: []) { tag, range, _ in
        // NSLinguisticTagger reports NSRange; convert back to Range<String.Index> to subscript
        if let tag = tag, let r = Range(range, in: text) { print(tag.rawValue, text[r]) }
    }

Tags for the words: I Pronoun, could Verb, not Adverb, go Verb, WWDC OrganizationName, in Preposition, this Determiner, year Noun

Is it the same as NLTagger?
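Porting between the two APIs mostly comes down to ranges: NSLinguisticTagger uses NSRange, counted in UTF-16 code units, while NLTagger uses Range<String.Index>. Foundation converts between them; the helper names nsRange(of:) and substring(_:_:) are hypothetical:

```swift
import Foundation

// Hypothetical helper: NSRange covering the whole string (UTF-16 based).
func nsRange(of text: String) -> NSRange {
    return NSRange(text.startIndex..<text.endIndex, in: text)
}

// Hypothetical helper: the substring for an NSRange, or nil if it does not
// fall on valid character boundaries.
func substring(_ text: String, _ range: NSRange) -> Substring? {
    guard let r = Range(range, in: text) else { return nil }
    return text[r]
}

let s = "チャウダー食べたい！"
print(nsRange(of: s).length, s.count)
print(substring(s, NSRange(location: 0, length: 5)) ?? "invalid range")
```

For Japanese text like this the UTF-16 length equals the character count, but characters outside the Basic Multilingual Plane (most emoji, for example) take two UTF-16 units, which is why the conversion should go through Foundation rather than integer arithmetic.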

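Compare NLTagger and NSLinguisticTagger

    import XCTest
    import NaturalLanguage

    // Helpers the test relies on: run a tagger over `text` and collect (tag, range) pairs.
    extension NLTagger {
        func tags(text: String, unit: NLTokenUnit, scheme: NLTagScheme,
                  options: NLTagger.Options) -> [(NLTag?, Range<String.Index>)] {
            string = text
            return tags(in: text.startIndex..<text.endIndex, unit: unit, scheme: scheme, options: options)
        }
    }

    extension NSLinguisticTagger {
        func tags(text: String, unit: NSLinguisticTaggerUnit, scheme: NSLinguisticTagScheme,
                  options: NSLinguisticTagger.Options) -> [(NSLinguisticTag?, NSRange)] {
            string = text
            var result: [(NSLinguisticTag?, NSRange)] = []
            enumerateTags(in: NSRange(text.startIndex..<text.endIndex, in: text), unit: unit,
                          scheme: scheme, options: options) { tag, range, _ in result.append((tag, range)) }
            return result
        }
    }

    class TaggerComparisonTests: XCTestCase {
        let text = "I can't go WWDC in this year."

        func testNLTagIsEqualToNSLinguisticTag() {
            let tSchemes: [NLTagScheme] = [.language, .lemma, .lexicalClass, .nameTypeOrLexicalClass, .nameType, .tokenType, .script]
            let lSchemes: [NSLinguisticTagScheme] = [.language, .lemma, .lexicalClass, .nameTypeOrLexicalClass, .nameType, .tokenType, .script]
            let tTagger = NLTagger(tagSchemes: tSchemes)
            let lTagger = NSLinguisticTagger(tagSchemes: lSchemes, options: 0)
            zip(tSchemes, lSchemes).forEach { (tScheme, lScheme) in
                print("---- \(tScheme.rawValue) ----")
                XCTAssertEqual(tScheme.rawValue, lScheme.rawValue)
                let tTags = tTagger.tags(text: text, unit: .word, scheme: tScheme, options: [.omitPunctuation, .omitWhitespace])
                let lTags = lTagger.tags(text: text, unit: .word, scheme: lScheme, options: [.omitPunctuation, .omitWhitespace])
                zip(tTags, lTags).forEach { (tTag, lTag) in
                    print(tTag.0?.rawValue ?? "nil", lTag.0?.rawValue ?? "nil")
                    XCTAssertEqual(tTag.0?.rawValue, lTag.0?.rawValue)
                    XCTAssertEqual(text[tTag.1], text[Range(lTag.1, in: text)!])
                }
            }
        }
    }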

Compare NLTagger and NSLinguisticTagger

    token | NLTagger | NSLinguisticTagger
    I     | I        | I
    ca    | nil      | ""
    n’t   | not      | not

Contractions recognition
Actual: "I / ca / n’t / go / WWDC / in / this / year."
Ideal: "I / can’t / go / WWDC / in / this / year."

Contractions recognition with .joinContractions
Actual: "I / can’t / go / WWDC / in / this / year."
Ideal: "I / can’t / go / WWDC / in / this / year."

NLTagger with joining contractions: without the option

    let text = "I couldn't go WWDC in this year"
    let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameTypeOrLexicalClass)
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

couldn't is split into two tokens, could (Verb) and n't (Adverb). The other words: I Pronoun, go Verb, WWDC OrganizationName, in Preposition, this Determiner, year Noun.

NLTagger with joining contractions: with .joinContractions

    let text = "I couldn't go WWDC in this year"
    let tagger = NLTagger(tagSchemes: [.nameTypeOrLexicalClass])
    tagger.string = text
    tagger.tags(in: text.startIndex..<text.endIndex, unit: .word,
                scheme: .nameTypeOrLexicalClass, options: [.joinContractions])
        .forEach { (tag, range) in
            if let tag = tag { print(tag.rawValue, text[range]) }
        }

Tags for the words: I Pronoun, couldn't Verb, go Verb, WWDC OrganizationName, in Preposition, this Determiner, year Noun
Only available in the Natural Language framework.
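To see the effect of .joinContractions directly, a sketch can tokenize the same sentence with and without it. words(in:joinContractions:) is a hypothetical helper, and the straight apostrophe in the sample is an assumption:

```swift
import NaturalLanguage

// Hypothetical helper: word tokens from NLTagger, optionally joining contractions.
func words(in text: String, joinContractions: Bool) -> [String] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var options: NLTagger.Options = [.omitWhitespace, .omitPunctuation]
    if joinContractions { options.insert(.joinContractions) }
    return tagger.tags(in: text.startIndex..<text.endIndex, unit: .word,
                       scheme: .lexicalClass, options: options)
        .map { String(text[$0.1]) }
}

let sentence = "I couldn't go WWDC in this year"
print(words(in: sentence, joinContractions: false))
print(words(in: sentence, joinContractions: true))
```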

NLModel: a custom model trained to classify or tag natural language text
(Diagram: user data → Create ML → NLModel → NLTagger)

Work in progress: waiting for a sample from Apple. See "Introducing Create ML" (WWDC18).
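Until Apple's sample is available, here is a hedged sketch of the two uses the summary describes: attaching an NLModel to an NLTagger through setModels(_:forTagScheme:), or using it on its own. The model file path and the custom scheme name are hypothetical; any compiled Create ML text model would stand in for them:

```swift
import Foundation
import NaturalLanguage

// Hypothetical: load a compiled model (.mlmodelc), e.g. one produced by Create ML.
// Returns nil if the file is missing or is not a Natural Language model.
func loadModel(at url: URL) -> NLModel? {
    return try? NLModel(contentsOf: url)
}

// Attach a word-tagging model to an NLTagger under a custom tag scheme.
func makeTagger(with model: NLModel, scheme: NLTagScheme) -> NLTagger {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.setModels([model], forTagScheme: scheme)
    return tagger
}

let url = URL(fileURLWithPath: "/path/to/MyTextModel.mlmodelc")   // hypothetical path
if let model = loadModel(at: url) {
    // A text classifier can also be used on its own:
    print(model.predictedLabel(for: "I want to eat chowder!") ?? "no label")
    _ = makeTagger(with: model, scheme: NLTagScheme("MyScheme"))   // hypothetical scheme name
}
```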

Summary
• Natural Language
  • Tokenization: NLTokenizer separates text into tokens.
  • Language recognition: NLLanguageRecognizer detects the language of text.
  • Tagging: NLTagger tags tokens with lexical class, name type, and so on.
  • NLTagger and NSLinguisticTagger behave the same, but only NLTagger can join contractions into one token.
  • Custom models: an NLModel can be used for tagging by setting it on a tagger, or used on its own.

Introducing Natural Language WWDC18 Natural Language Processing and your apps
WWDC17 Introducing Core ML WWDC17 Core ML in depth WWDC17 Introducing Create ML WWDC18 What’s new in Core ML, Part 1 WWDC18 iOS 11 Programming, ୈ3ষ PEAKS Resources
