Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
SQLiteで高速全文検索〜日本語編〜
Search
shoby
August 20, 2014
Technology
3
4.1k
SQLiteで高速全文検索〜日本語編〜
【第8回】potatotips (iOS/Android開発Tips共有会)
shoby
August 20, 2014
Tweet
Share
More Decks by shoby
See All by shoby
フリルのCMを支える分析 (メディアバイイング編)
shoby
0
480
FirebaseとBigQueryによるアプリのKPI分析(CMの効果検証編)
shoby
2
2.7k
ユーザーに受け入れられ、問題を起こしづらい大規模リニューアルの進め方
shoby
60
14k
ReactiveCocoaで作る快適な登録フォーム
shoby
0
370
フリルの商品を色で検索できるようにした話
shoby
5
4.3k
Search fashion items by colors
shoby
0
200
iOS 7をサポート対象外にして開発を健全化する
shoby
3
4.1k
Apple Watch Tips
shoby
0
4.2k
WWDC 2015で 発表された新機能と サービスへの活かし方
shoby
2
7.4k
Other Decks in Technology
See All in Technology
[JAWS DAYS 2026]私の AWS DevOps Agent 推しポイント
furuton
0
120
男(監査)はつらいよ - Policy as CodeからAIエージェントへ
ken5scal
5
780
Claude Codeの進化と各機能の活かし方
oikon48
20
10k
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
11k
Dr. Werner Vogelsの14年のキーノートから紐解くエンジニアリング組織への処方箋@JAWS DAYS 2026
p0n
1
120
オレ達はAWS管理をやりたいんじゃない!開発の生産性を爆アゲしたいんだ!!
wkm2
4
430
マルチプレーンGPUネットワークを実現するシャッフルアーキテクチャの整理と考察
markunet
2
180
開発組織の課題解決を加速するための権限委譲 -する側、される側としての向き合い方-
daitasu
5
320
OpenClawで回す組織運営
jacopen
3
640
[AEON TECH HUB #24] お客様の長期的興味の理解に向けて
alpicola
0
120
Security Diaries of an Open Source IAM
ahus1
0
210
製造業ドメインにおける LLMプロダクト構築: 複雑な文脈へのアプローチ
caddi_eng
1
530
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
The SEO identity crisis: Don't let AI make you average
varn
0
400
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Building Adaptive Systems
keathley
44
2.9k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Typedesign – Prime Four
hannesfritz
42
3k
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
0
230
SEO Brein meetup: CTRL+C is not how to scale international SEO
lindahogenes
0
2.4k
Joys of Absence: A Defence of Solitary Play
codingconduct
1
300
Conquering PDFs: document understanding beyond plain text
inesmontani
PRO
4
2.4k
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
4.1k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Transcript
42-JUFͰ ߴશจݕࡧ ʙຊޠฤʙ @shobyshoby
ࣗݾհ pixivͰΞϓϦͭͬͯ͘·͢
ຊ
iOSΞϓϦͰ SQLiteΛͬͯ ຊޠΛ ߴʹશจݕࡧ͍ͨ͠
֓ཁ
42-JUFͷશจݕࡧ֦ு ຊޠͷτʔΫϯׂ ݕࡧॱҐ σϞ
42-JUFͷ શจݕࡧ֦ு
SQLite FTS3 and FTS4 Extensions http://www.sqlite.org/fts3.html
iOS 6 SDK͔Β σϑΥϧτͰ͑Δ
جຊతͳ͍ํ
CREATE VIRTUAL TABLE article USING fts4 (title, body); FTS4 ςʔϒϧ
INSERT INSERT INTO article(title, body) VALUES ('Apple', 'I have iPhone4
and iPhone5s');
جຊతͳQuery SELECT * FROM article WHERE body MATCH 'iPhone5s'; term
match SELECT * FROM article WHERE body MATCH 'iPhone*'; term-prefix match
phrase match SELECT * FROM article WHERE body MATCH 'have
iPhone*'; ◦I have iPhone5s ×I have Android and iPhone5s
ಛघͳQuery SELECT * FROM article WHERE article MATCH 'iPhone*'; all
columns SELECT * FROM article WHERE article MATCH 'title:Apple iPhone*'; specified column
؆୯ʂ
※ͨͩ͠ӳޠʹݶΔ
ຊޠͷ߹ɺ FTSͷTokenizer͕ ͏·͘tokenʹ͚ΒΕͳ͍
FTSͷTokenizer͕ ͏·͘ѻ͑Δܗࣜʹ Ճͯ͠σʔλΛೖΕΑ͏ʂ
ຊޠͷ τʔΫϯׂ
FTSͷςʔϒϧʹ ֯εϖʔεͰ ׂͨ͠σʔλΛೖΕΔ
CREATE TABLE article (id, title, body); CREATE VIRTUAL TABLE articleTokens
USING fts4 (articleId, titleTokens, bodyTokens); ݩσʔλΛೖΕΔςʔϒϧ FTSςʔϒϧ
INSERT͍ͨ͠σʔλ INSERT INTO article(id, title, body) VALUES (1, 'Ξοϓϧ', 'ࢲΞΠϑΥʔϯ4ͱΞΠϑΥʔϯ5sΛ͍࣋ͬͯ·
͢'); ! INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) VALUES (1, 'Ξοϓϧ', 'ࢲ ΞΠϑΥʔϯ4 ͱ ΞΠϑΥʔϯ5s Λ ࣋ͬ ͍ͯ·͢');
͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId FROM
articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY article.id;
τʔΫϯׂ͢Δ
CFStringTokenizer
෦ͰMeCabΛͬͯ ܗଶૉղੳͯ͠ΔΒ͍͠ http://stackoverflow.com/questions/8280824/how-to-use- cfstringtokenizer-with-chinese-and-japanese
- (NSArray *)tokenArrayWithString:(NSString *)string { NSLocale *locale = [[NSLocale alloc]
initWithLocaleIdentifier:@"ja"]; CFRange range = CFRangeMake(0, CFStringGetLength((CFStringRef)string)); CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, (CFStringRef)string, range, kCFStringTokenizerUnitWordBoundary, (CFLocaleRef)locale); NSMutableArray *tokenArray = [NSMutableArray array]; while(CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) { CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer); if(range.location != kCFNotFound) { NSString *token = [string substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)]; [tokenArray addObject:token]; } } CFRelease(tokenizer); return tokenArray; }
INSERT
tokenArrayΛ ֯εϖʔεͰ࿈݁ͯ͠ INSERT͢Δ [tokenArray componentsJoinedByString:@" "];
ݕࡧ
֤Tokenʹ * ΛՃ͠ ֯εϖʔεͰ࿈݁ͯ͠ QueryΛΈཱͯΔ NSMutableArray *searchTokens = [NSMutableArray arrayWithCapacity:tokenArray.count];
for (NSString *token in tokenArray) { [searchTokens addObject:[token stringByAppendingString:@"*"]]; } [searchTokens componentsJoinedByString:@" "];
ݕࡧॱҐ
Ϛον͕ߴ͍ॱʹ ݁ՌΛฦ͍ͨ͠
SQLite ಠࣗͷSQLؔΛ ఆٛͰ͖Δ
rank()ͱ͍͏ SQL͕ؔ ͋ͬͨͱ͢Δͱ…
͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId, rank(matchinfo(articleTokens,
'pcnalx')) AS rank FROM articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY rank DESC;
match_info() ! ϚονใΛฦ͢ FTSͷؔ http://www.sqlite.org/fts3.html#matchinfo
match_info()͔Β ϚονείΞΛฦ͢ SQLؔΛ࡞Ζ͏ʂ
CݴޠͰ…
ϚονείΞͷ ΞϧΰϦζϜ
Okapi BM25 http://en.wikipedia.org/wiki/Okapi_BM25
sqlite-okapi-bm25 https://github.com/rads/sqlite-okapi-bm25
ंྠͷ࠶ൃ໌ ආ͚ΒΕͨ ※ҙ ୯Ұͷcolumnʹ͔͠ରԠ͍ͯ͠ͳ͍ͷͰ ඞཁͳΒFork͢Δ
σϞ
·ͱΊ
42-JUFͷ'54Λ͏ͱɺߴʹ શจݕࡧͰ͖Δ ຊޠࣄલʹ5PLFOׂ͕ඞཁ ݕࡧॱҐಠࣗ42-ؔΛ࡞ͬͯ ରԠ͢Δ
͓͠·͍