Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SQLiteで高速全文検索〜日本語編〜
Search
shoby
August 20, 2014
Technology
4.1k
3
Share
SQLiteで高速全文検索〜日本語編〜
【第8回】potatotips (iOS/Android開発Tips共有会)
shoby
August 20, 2014
More Decks by shoby
See All by shoby
フリルのCMを支える分析 (メディアバイイング編)
shoby
0
490
FirebaseとBigQueryによるアプリのKPI分析(CMの効果検証編)
shoby
2
2.7k
ユーザーに受け入れられ、問題を起こしづらい大規模リニューアルの進め方
shoby
60
14k
ReactiveCocoaで作る快適な登録フォーム
shoby
0
370
フリルの商品を色で検索できるようにした話
shoby
5
4.3k
Search fashion items by colors
shoby
0
200
iOS 7をサポート対象外にして開発を健全化する
shoby
3
4.1k
Apple Watch Tips
shoby
0
4.2k
WWDC 2015で 発表された新機能と サービスへの活かし方
shoby
2
7.4k
Other Decks in Technology
See All in Technology
Databricksで構築するログ検索基盤とアーキテクチャ設計
cscengineer
0
190
Rebirth of Software Craftsmanship in the AI Era
lemiorhan
PRO
2
390
シン・リスコフの置換原則 〜現代風に考えるSOLIDの原則〜
jinwatanabe
0
210
JEDAI in Osaka 2026イントロ
taka_aki
0
210
昔はシンプルだった_AmazonS3
kawaji_scratch
0
270
BigQuery × dbtでコスト削減した話
rightcode
0
140
AIを共同作業者にして書籍を執筆する方法 / How to Write a Book with AI as a Co-Creator
ama_ch
2
110
インフラを Excel 管理していた組織が 3 ヶ月で IaC 化されるまで
geekplus_tech
3
190
終盤で崩壊させないAI駆動開発
j5ik2o
2
2.1k
会社紹介資料 / Sansan Company Profile
sansan33
PRO
17
410k
AIペネトレーションテスト・ セキュリティ検証「AgenticSec」ご紹介資料
laysakura
0
2.2k
ふりかえりがなかった職能横断チームにふりかえりを導入してみて学んだこと 〜チームのふりかえりを「みんなで未来を考える場」にするプロローグ設計〜
masahiro1214shimokawa
0
410
Featured
See All Featured
Skip the Path - Find Your Career Trail
mkilby
1
100
Marketing to machines
jonoalderson
1
5.2k
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
490
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
120
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
97
Building Flexible Design Systems
yeseniaperezcruz
330
40k
Stop Working from a Prison Cell
hatefulcrawdad
274
21k
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
120
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
sira's awesome portfolio website redesign presentation
elsirapls
0
210
Accessibility Awareness
sabderemane
0
98
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
140
Transcript
42-JUFͰ ߴશจݕࡧ ʙຊޠฤʙ @shobyshoby
ࣗݾհ pixivͰΞϓϦͭͬͯ͘·͢
ຊ
iOSΞϓϦͰ SQLiteΛͬͯ ຊޠΛ ߴʹશจݕࡧ͍ͨ͠
֓ཁ
42-JUFͷશจݕࡧ֦ு ຊޠͷτʔΫϯׂ ݕࡧॱҐ σϞ
42-JUFͷ શจݕࡧ֦ு
SQLite FTS3 and FTS4 Extensions http://www.sqlite.org/fts3.html
iOS 6 SDK͔Β σϑΥϧτͰ͑Δ
جຊతͳ͍ํ
CREATE VIRTUAL TABLE article USING fts4 (title, body); FTS4 ςʔϒϧ
INSERT INSERT INTO article(title, body) VALUES ('Apple', 'I have iPhone4
and iPhone5s');
جຊతͳQuery SELECT * FROM article WHERE body MATCH 'iPhone5s'; term
match SELECT * FROM article WHERE body MATCH 'iPhone*'; term-prefix match
phrase match SELECT * FROM article WHERE body MATCH 'have
iPhone*'; ◦I have iPhone5s ×I have Android and iPhone5s
ಛघͳQuery SELECT * FROM article WHERE article MATCH 'iPhone*'; all
columns SELECT * FROM article WHERE article MATCH 'title:Apple iPhone*'; specified column
؆୯ʂ
※ͨͩ͠ӳޠʹݶΔ
ຊޠͷ߹ɺ FTSͷTokenizer͕ ͏·͘tokenʹ͚ΒΕͳ͍
FTSͷTokenizer͕ ͏·͘ѻ͑Δܗࣜʹ Ճͯ͠σʔλΛೖΕΑ͏ʂ
ຊޠͷ τʔΫϯׂ
FTSͷςʔϒϧʹ ֯εϖʔεͰ ׂͨ͠σʔλΛೖΕΔ
CREATE TABLE article (id, title, body); CREATE VIRTUAL TABLE articleTokens
USING fts4 (articleId, titleTokens, bodyTokens); ݩσʔλΛೖΕΔςʔϒϧ FTSςʔϒϧ
INSERT͍ͨ͠σʔλ INSERT INTO article(id, title, body) VALUES (1, 'Ξοϓϧ', 'ࢲΞΠϑΥʔϯ4ͱΞΠϑΥʔϯ5sΛ͍࣋ͬͯ·
͢'); ! INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) VALUES (1, 'Ξοϓϧ', 'ࢲ ΞΠϑΥʔϯ4 ͱ ΞΠϑΥʔϯ5s Λ ࣋ͬ ͍ͯ·͢');
͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId FROM
articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY article.id;
τʔΫϯׂ͢Δ
CFStringTokenizer
෦ͰMeCabΛͬͯ ܗଶૉղੳͯ͠ΔΒ͍͠ http://stackoverflow.com/questions/8280824/how-to-use- cfstringtokenizer-with-chinese-and-japanese
- (NSArray *)tokenArrayWithString:(NSString *)string { NSLocale *locale = [[NSLocale alloc]
initWithLocaleIdentifier:@"ja"]; CFRange range = CFRangeMake(0, CFStringGetLength((CFStringRef)string)); CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, (CFStringRef)string, range, kCFStringTokenizerUnitWordBoundary, (CFLocaleRef)locale); NSMutableArray *tokenArray = [NSMutableArray array]; while(CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) { CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer); if(range.location != kCFNotFound) { NSString *token = [string substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)]; [tokenArray addObject:token]; } } CFRelease(tokenizer); return tokenArray; }
INSERT
tokenArrayΛ ֯εϖʔεͰ࿈݁ͯ͠ INSERT͢Δ [tokenArray componentsJoinedByString:@" "];
ݕࡧ
֤Tokenʹ * ΛՃ͠ ֯εϖʔεͰ࿈݁ͯ͠ QueryΛΈཱͯΔ NSMutableArray *searchTokens = [NSMutableArray arrayWithCapacity:tokenArray.count];
for (NSString *token in tokenArray) { [searchTokens addObject:[token stringByAppendingString:@"*"]]; } [searchTokens componentsJoinedByString:@" "];
ݕࡧॱҐ
Ϛον͕ߴ͍ॱʹ ݁ՌΛฦ͍ͨ͠
SQLite ಠࣗͷSQLؔΛ ఆٛͰ͖Δ
rank()ͱ͍͏ SQL͕ؔ ͋ͬͨͱ͢Δͱ…
͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId, rank(matchinfo(articleTokens,
'pcnalx')) AS rank FROM articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY rank DESC;
match_info() ! ϚονใΛฦ͢ FTSͷؔ http://www.sqlite.org/fts3.html#matchinfo
match_info()͔Β ϚονείΞΛฦ͢ SQLؔΛ࡞Ζ͏ʂ
CݴޠͰ…
ϚονείΞͷ ΞϧΰϦζϜ
Okapi BM25 http://en.wikipedia.org/wiki/Okapi_BM25
sqlite-okapi-bm25 https://github.com/rads/sqlite-okapi-bm25
ंྠͷ࠶ൃ໌ ආ͚ΒΕͨ ※ҙ ୯Ұͷcolumnʹ͔͠ରԠ͍ͯ͠ͳ͍ͷͰ ඞཁͳΒFork͢Δ
σϞ
·ͱΊ
42-JUFͷ'54Λ͏ͱɺߴʹ શจݕࡧͰ͖Δ ຊޠࣄલʹ5PLFOׂ͕ඞཁ ݕࡧॱҐಠࣗ42-ؔΛ࡞ͬͯ ରԠ͢Δ
͓͠·͍