SQLiteで高速全文検索〜日本語編〜

A003bc0b49c7e5adf746b6d73cb57956?s=47 shoby
August 20, 2014

 SQLiteで高速全文検索〜日本語編〜

【第8回】potatotips (iOS/Android開発Tips共有会)

A003bc0b49c7e5adf746b6d73cb57956?s=128

shoby

August 20, 2014
Tweet

Transcript

  1. 3.
  2. 5.
  3. 13.

    جຊతͳQuery SELECT * FROM article WHERE body MATCH 'iPhone5s'; term

    match SELECT * FROM article WHERE body MATCH 'iPhone*'; term-prefix match
  4. 14.

    phrase match SELECT * FROM article WHERE body MATCH 'have

    iPhone*'; ◦I have iPhone5s ×I have Android and iPhone5s
  5. 15.

    ಛघͳQuery SELECT * FROM article WHERE article MATCH 'iPhone*'; all

    columns SELECT * FROM article WHERE article MATCH 'title:Apple iPhone*'; specified column
  6. 16.
  7. 22.

    CREATE TABLE article (id, title, body); CREATE VIRTUAL TABLE articleTokens

    USING fts4 (articleId, titleTokens, bodyTokens); ݩσʔλΛೖΕΔςʔϒϧ FTSςʔϒϧ
  8. 23.

    INSERT͍ͨ͠σʔλ INSERT INTO article(id, title, body) VALUES (1, 'Ξοϓϧ', 'ࢲ͸ΞΠϑΥʔϯ4ͱΞΠϑΥʔϯ5sΛ͍࣋ͬͯ·

    ͢'); ! INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) VALUES (1, 'Ξοϓϧ', 'ࢲ͸ ΞΠϑΥʔϯ4 ͱ ΞΠϑΥʔϯ5s Λ ࣋ͬ ͍ͯ·͢');
  9. 24.

    ౤͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId FROM

    articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY article.id;
  10. 28.

    - (NSArray *)tokenArrayWithString:(NSString *)string { NSLocale *locale = [[NSLocale alloc]

    initWithLocaleIdentifier:@"ja"]; CFRange range = CFRangeMake(0, CFStringGetLength((CFStringRef)string)); CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, (CFStringRef)string, range, kCFStringTokenizerUnitWordBoundary, (CFLocaleRef)locale); NSMutableArray *tokenArray = [NSMutableArray array]; while(CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) { CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer); if(range.location != kCFNotFound) { NSString *token = [string substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)]; [tokenArray addObject:token]; } } CFRelease(tokenizer); return tokenArray; }
  11. 29.
  12. 31.
  13. 32.

    ֤Tokenʹ * Λ௥Ճ͠ ൒֯εϖʔεͰ࿈݁ͯ͠ QueryΛ૊ΈཱͯΔ NSMutableArray *searchTokens = [NSMutableArray arrayWithCapacity:tokenArray.count];

    for (NSString *token in tokenArray) { [searchTokens addObject:[token stringByAppendingString:@"*"]]; } [searchTokens componentsJoinedByString:@" "];
  14. 37.

    ౤͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId, rank(matchinfo(articleTokens,

    'pcnalx')) AS rank FROM articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY rank DESC;
  15. 45.
  16. 46.
  17. 48.