Natural Language Processing in Objective-C

* Audio of this talk available here: http://soundcloud.com/mattt-thompson/objc-nlp

Apple has provided some truly remarkable language APIs in its frameworks. It's almost unfair how good they are, considering how most languages struggle just to handle Unicode correctly. From tokenizers and part-of-speech taggers to transcription, data detectors, and document classification using latent semantic analysis, this session will cover the APIs as well as the linguistic theory behind them, so that you may leverage these insanely powerful technologies in your application.

Mattt Thompson

October 27, 2012

Transcript

  1. There are two indicators that can tell you (with startling accuracy) how nice a language is to use:
  2. PHP

  3. NSMutableString *string = [@"I wîsh the Énġlišh långuãge hađ mørē iñteŕêßţing çharäčtèrş" mutableCopy];
     NSLog(@"Before: %@", string);
     CFStringTransform((__bridge CFMutableStringRef)string, NULL, kCFStringTransformStripCombiningMarks, NO);
     NSLog(@"After: %@", string);
  4. • đ - d with stroke
     • ø - o with stroke
     • ß - eszett
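The three letters on this slide are presumably the ones that survive the transform on the previous slide: each is a single code point with no canonical decomposition into a base letter plus a combining mark, so there is nothing to strip. A minimal sketch to check this, where `StripCombiningMarks` is a hypothetical helper name and not something from the talk:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the talk): returns a copy of `input`
// with combining marks removed via CFStringTransform.
static NSString *StripCombiningMarks(NSString *input) {
    NSMutableString *result = [input mutableCopy];
    CFStringTransform((__bridge CFMutableStringRef)result,
                      NULL, // a NULL range transforms the whole string
                      kCFStringTransformStripCombiningMarks,
                      false);
    return [result copy];
}
```

Calling `StripCombiningMarks(@"Énġlišh")` should give `English`, while `ø`, `đ`, and `ß` pass through unchanged.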
  5. NSMutableString *string = [@"" mutableCopy];
     NSLog(@"Emoji: %@", string);
     CFStringTransform((__bridge CFMutableStringRef)string, NULL, kCFStringTransformToUnicodeName, NO);
     NSLog(@"Unicode Name: %@", string);
  6. NSMutableString *string = [@"য়ࡇ ъթ झఋੌ" mutableCopy];
     NSLog(@"Before: %@", string);
     CFStringTransform((__bridge CFMutableStringRef)string, NULL, kCFStringTransformToLatin, NO);
     NSLog(@"After: %@", string);
  7. | Transformation | Input | Output |
     |---|---|---|
     | kCFStringTransformLatinArabic | mrḥbạ | مرحبا |
     | kCFStringTransformLatinCyrillic | privet | привет |
     | kCFStringTransformLatinGreek | geiá sou | γειά σου |
     | kCFStringTransformLatinHangul | annyeonghaseyo | 안녕하세요 |
     | kCFStringTransformLatinHebrew | şlwm | שלום |
     | kCFStringTransformLatinHiragana | hiragana | ひらがな |
     | kCFStringTransformLatinKatakana | katakana | カタカナ |
     | kCFStringTransformLatinThai | s̄wạs̄dī | สวัสดี |
     | kCFStringTransformHiraganaKatakana | にほんご | ニホンゴ |
     | kCFStringTransformMandarinLatin | 中文 | zhōng wén |
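The last argument to `CFStringTransform` (`NO` in the slides above) is the `reverse` flag, which asks for the inverse of the named transform. A small sketch, assuming a hypothetical `Transliterate` wrapper (not part of the talk) that returns nil when the transform does not succeed:

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper (not from the talk) around CFStringTransform.
// Returns the transformed copy, or nil if the transform failed.
static NSString *Transliterate(NSString *input, CFStringRef transform, BOOL reverse) {
    NSMutableString *result = [input mutableCopy];
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)result,
                                   NULL,      // whole string
                                   transform, // e.g. kCFStringTransformLatinCyrillic
                                   reverse);  // YES requests the inverse direction
    return ok ? [result copy] : nil;
}
```

`Transliterate(@"privet", kCFStringTransformLatinCyrillic, NO)` corresponds to the Cyrillic row of the table; passing `YES` with Cyrillic input requests the Cyrillic-to-Latin direction instead.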
  8. NSString *string = @"日本語の言葉ですべてが一緒にいる";
     NSMutableArray *mutableTokens = [NSMutableArray array];
     CFLocaleRef locale = CFLocaleCopyCurrent();
     CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(NULL, (__bridge CFStringRef)string, CFRangeMake(0, [string length]), kCFStringTokenizerUnitWord, locale);
     CFStringTokenizerTokenType tokenType = kCFStringTokenizerTokenNone;
     while ((tokenType = CFStringTokenizerAdvanceToNextToken(tokenizer)) != kCFStringTokenizerTokenNone) {
         CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
         CFStringRef token = CFStringCreateWithSubstring(kCFAllocatorDefault, (__bridge CFStringRef)string, tokenRange);
         // Hand ownership of the created substring to ARC so it is not leaked.
         [mutableTokens addObject:(__bridge_transfer NSString *)token];
     }
     // Objects returned by Create/Copy functions are not managed by ARC.
     CFRelease(tokenizer);
     CFRelease(locale);
     NSLog(@"Tokens: %@", mutableTokens);
  9. 日本, 語, の, 言葉, で, すべて, が, 一緒, に, いる

    ರЧ∽Ƒ࿽∋ƊżƜƉů ၂⇞ƎŧƮ
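For simple word splitting, Foundation also offers `-enumerateSubstringsInRange:options:usingBlock:` on `NSString`, which avoids manual Core Foundation memory management. This is a swapped-in alternative to the `CFStringTokenizer` code in the talk, not the speaker's approach, and `WordsInString` is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the talk): splits `text` into words
// using NSString's built-in word enumeration.
static NSArray<NSString *> *WordsInString(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // `substring` is the word itself; punctuation and whitespace
        // fall into `enclosingRange` and are skipped here.
        [words addObject:substring];
    }];
    return [words copy];
}
```

The trade-off is less control: `CFStringTokenizer` lets you pass an explicit locale and inspect token types, which this sketch does not.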
  10. NSLinguisticTagger
      • Tokenize
      • Part of Speech
      • Word Stem
      • Named Entity Recognition
      • Language & Script Detection
  11. NSString *question = @"How is the weather in Portland?";
      NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;
      NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"] options:options];
      tagger.string = question;
  12. [tagger enumerateTagsInRange:NSMakeRange(0, [question length])
                            scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                           options:options
                        usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
          NSString *token = [question substringWithRange:tokenRange];
          NSLog(@"%@: %@", token, tag);
      }];
  13. NSLinguisticTagSchemeLexicalClass
      • NSLinguisticTagNoun
      • NSLinguisticTagVerb
      • NSLinguisticTagAdjective
      • NSLinguisticTagAdverb
      • NSLinguisticTagPronoun
      • NSLinguisticTagDeterminer
      -- Snip 20 Other Parts of Speech --
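The slides show part-of-speech and name tagging, but not the "Word Stem" or "Language & Script Detection" items from the NSLinguisticTagger list. The sketch below uses `NSLinguisticTagSchemeLemma` and `NSLinguisticTagSchemeLanguage` for those two; the helper names `LemmasInString` and `LanguageOfString` are hypothetical, and the tags returned depend on the linguistic data installed on the system.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the talk): one entry per word, using the
// tagger's lemma when it has one and the word itself otherwise.
static NSArray<NSString *> *LemmasInString(NSString *text) {
    NSLinguisticTaggerOptions options =
        NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation;
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLemma]
                                               options:options];
    tagger.string = text;

    NSMutableArray<NSString *> *lemmas = [NSMutableArray array];
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeLemma
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        // `tag` is nil when no lemma is available for this token.
        [lemmas addObject:tag ?: [text substringWithRange:tokenRange]];
    }];
    return [lemmas copy];
}

// Hypothetical helper: the tagger's language guess at the start of the
// text (a language code string), or nil for empty input.
static NSString *LanguageOfString(NSString *text) {
    if (text.length == 0) return nil;
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLanguage]
                                               options:0];
    tagger.string = text;
    return [tagger tagAtIndex:0
                       scheme:NSLinguisticTagSchemeLanguage
                   tokenRange:NULL
                sentenceRange:NULL];
}
```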
  14. NSString *string = @"Speak at CocoaConf at 7900 82nd Avenue Portland, OR 97220 starting 4:00 on October 27, 2012";
      NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingAllSystemTypes error:nil];
      [detector enumerateMatchesInString:string
                                 options:0
                                   range:NSMakeRange(0, [string length])
                              usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
          NSLog(@"Result: %@", [string substringWithRange:result.range]);
      }];
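Each `NSTextCheckingResult` also carries a `resultType`, so matches can be told apart rather than only logged as substrings. A minimal sketch under that assumption, with `DescribeMatches` as a hypothetical helper name that is not from the talk:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the talk): returns "Kind: matched text"
// for every link, date, address, or phone number found in `text`.
static NSArray<NSString *> *DescribeMatches(NSString *text) {
    NSTextCheckingTypes types = NSTextCheckingTypeLink | NSTextCheckingTypeDate |
                                NSTextCheckingTypeAddress | NSTextCheckingTypePhoneNumber;
    NSError *error = nil;
    NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:types error:&error];
    if (detector == nil) return @[]; // creation failed; `error` says why

    NSMutableArray<NSString *> *descriptions = [NSMutableArray array];
    [detector enumerateMatchesInString:text
                               options:0
                                 range:NSMakeRange(0, text.length)
                            usingBlock:^(NSTextCheckingResult *result,
                                         NSMatchingFlags flags, BOOL *stop) {
        NSString *kind = @"Other";
        if (result.resultType == NSTextCheckingTypeLink) kind = @"Link";
        else if (result.resultType == NSTextCheckingTypeDate) kind = @"Date";
        else if (result.resultType == NSTextCheckingTypeAddress) kind = @"Address";
        else if (result.resultType == NSTextCheckingTypePhoneNumber) kind = @"PhoneNumber";
        NSString *matched = [text substringWithRange:result.range];
        [descriptions addObject:[NSString stringWithFormat:@"%@: %@", kind, matched]];
    }];
    return [descriptions copy];
}
```

Typed results expose parsed values as well, such as `result.date` for dates, `result.URL` for links, and `result.addressComponents` for addresses.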
  15. ???