Fast Full-Text Search with SQLite: Japanese Edition
shoby
August 20, 2014
potatotips #8 (iOS/Android development tips sharing meetup)
Transcript
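Fast full-text search with SQLite: Japanese edition @shobyshoby
About me: I build apps at pixiv.
Main topic
In an iOS app, I want to use SQLite to run fast full-text search over Japanese text.
Overview
SQLite's full-text search extension, tokenizing Japanese, search ranking, demo
SQLite's full-text search extension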
SQLite FTS3 and FTS4 Extensions http://www.sqlite.org/fts3.html
Available by default since the iOS 6 SDK
Basic usage
FTS4 table:
CREATE VIRTUAL TABLE article USING fts4 (title, body);
INSERT:
INSERT INTO article(title, body) VALUES ('Apple', 'I have iPhone4 and iPhone5s');
Basic queries:
SELECT * FROM article WHERE body MATCH 'iPhone5s'; -- term match
SELECT * FROM article WHERE body MATCH 'iPhone*'; -- term-prefix match
Phrase match (the phrase goes inside double quotes):
SELECT * FROM article WHERE body MATCH '"have iPhone*"';
Matches: I have iPhone5s
Does not match: I have Android and iPhone5s
Special queries:
SELECT * FROM article WHERE article MATCH 'iPhone*'; -- all columns
SELECT * FROM article WHERE article MATCH 'title:Apple iPhone*'; -- specified column
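The query types above can be tried outside of iOS as well. The following is a minimal sketch using Python's standard `sqlite3` module as a stand-in for the SQLite C API, assuming the bundled SQLite was compiled with FTS4. The two sample rows and the `match` helper are illustrative and not part of the slides.

```python
import sqlite3

# In-memory database; assumes the bundled SQLite was compiled with FTS4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE article USING fts4 (title, body)")
conn.executemany(
    "INSERT INTO article(title, body) VALUES (?, ?)",
    [
        ("Apple", "I have iPhone5s"),              # rowid 1
        ("Mixed", "I have Android and iPhone5s"),  # rowid 2 (illustrative row)
    ],
)


def match(column, query):
    """Return the sorted rowids whose `column` (or whole table) matches `query`."""
    sql = f"SELECT rowid FROM article WHERE {column} MATCH ?"
    return sorted(row[0] for row in conn.execute(sql, (query,)))


term = match("body", "iPhone5s")                      # term match
prefix = match("body", "iPhone*")                     # term-prefix match
phrase = match("body", '"have iPhone*"')              # phrase match
all_columns = match("article", "iPhone*")             # any column
one_column = match("article", "title:Apple iPhone*")  # column filter

print(term, prefix, phrase, all_columns, one_column)
```

Only the phrase query and the column-filtered query narrow the result to the first row; the other three match both rows.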
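Easy!
* But only for English.
With Japanese, which is written without spaces between words, the FTS tokenizer cannot split the text into tokens properly.
So preprocess the data into a form the FTS tokenizer can handle before inserting it!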
Tokenizing Japanese
Insert data that has been split with half-width (ASCII) spaces into the FTS table.
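To see why the pre-splitting matters, here is a small sketch, again using Python's `sqlite3` module and assuming FTS4 is available. It stores the same sentence once as-is and once split with spaces. The table names `raw` and `spaced` and the hand-made segmentation are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE raw USING fts4 (body)")
conn.execute("CREATE VIRTUAL TABLE spaced USING fts4 (body)")

# The same sentence, unsegmented and segmented by hand with ASCII spaces.
conn.execute(
    "INSERT INTO raw(body) VALUES (?)",
    ("私はアイフォーン4とアイフォーン5sを持っています",),
)
conn.execute(
    "INSERT INTO spaced(body) VALUES (?)",
    ("私 は アイフォーン4 と アイフォーン5s を 持っ ています",),
)

query = "アイフォーン*"
# The default tokenizer sees the unsegmented sentence as a single token
# starting with 私, so the prefix query cannot match it.
raw_hits = conn.execute(
    "SELECT count(*) FROM raw WHERE body MATCH ?", (query,)
).fetchone()[0]
# With spaces, アイフォーン4 and アイフォーン5s become separate tokens.
spaced_hits = conn.execute(
    "SELECT count(*) FROM spaced WHERE body MATCH ?", (query,)
).fetchone()[0]

print(raw_hits, spaced_hits)
```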
Table for the original data:
CREATE TABLE article (id, title, body);
FTS table:
CREATE VIRTUAL TABLE articleTokens USING fts4 (articleId, titleTokens, bodyTokens);
Data to INSERT:
INSERT INTO article(id, title, body) VALUES (1, 'アップル', '私はアイフォーン4とアイフォーン5sを持っています');
INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) VALUES (1, 'アップル', '私 は アイフォーン4 と アイフォーン5s を 持っ ています');
The SELECT we want to run:
SELECT * FROM article
JOIN (
  SELECT articleId FROM articleTokens
  WHERE bodyTokens MATCH 'アイフォーン*'
  LIMIT 100
) AS result ON article.id = result.articleId
ORDER BY article.id;
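The two-table layout can be exercised end to end with a short sketch. It uses Python's `sqlite3` module in place of the iOS SQLite API and assumes FTS4 is compiled in. The second article and all token strings are hand-written sample data, not output of a real tokenizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id, title, body);
    CREATE VIRTUAL TABLE articleTokens
        USING fts4 (articleId, titleTokens, bodyTokens);
""")

# (id, title, body, pre-split title, pre-split body); segmentation is by hand.
samples = [
    (1, "アップル", "私はアイフォーン4とアイフォーン5sを持っています",
     "アップル", "私 は アイフォーン4 と アイフォーン5s を 持っ ています"),
    (2, "その他", "私はアンドロイドを持っています",
     "その他", "私 は アンドロイド を 持っ ています"),
]
for article_id, title, body, title_tokens, body_tokens in samples:
    conn.execute(
        "INSERT INTO article(id, title, body) VALUES (?, ?, ?)",
        (article_id, title, body),
    )
    conn.execute(
        "INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) "
        "VALUES (?, ?, ?)",
        (article_id, title_tokens, body_tokens),
    )

# Search the token table, then join back to the readable original rows.
found = conn.execute("""
    SELECT article.id, article.body FROM article
    JOIN (
        SELECT articleId FROM articleTokens
        WHERE bodyTokens MATCH ?
        LIMIT 100
    ) AS result ON article.id = result.articleId
    ORDER BY article.id
""", ("アイフォーン*",)).fetchall()

print(found)
```

Keeping the original text in `article` means the app can display unsegmented Japanese while only `articleTokens` carries the inserted spaces.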
Splitting into tokens
CFStringTokenizer
Apparently it uses MeCab internally for morphological analysis: http://stackoverflow.com/questions/8280824/how-to-use-cfstringtokenizer-with-chinese-and-japanese
- (NSArray *)tokenArrayWithString:(NSString *)string {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"ja"];
    // __bridge casts are required under ARC and are harmless without it.
    CFStringRef cfString = (__bridge CFStringRef)string;
    CFRange range = CFRangeMake(0, CFStringGetLength(cfString));
    CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault,
                                                             cfString,
                                                             range,
                                                             kCFStringTokenizerUnitWordBoundary,
                                                             (__bridge CFLocaleRef)locale);
    NSMutableArray *tokenArray = [NSMutableArray array];
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        // Check the current token's range, not the range of the whole string.
        if (tokenRange.location != kCFNotFound) {
            NSString *token = [string substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)];
            [tokenArray addObject:token];
        }
    }
    CFRelease(tokenizer);
    return tokenArray;
}
INSERT
Join tokenArray with half-width spaces and INSERT the result:
[tokenArray componentsJoinedByString:@" "];
Search
Append * to each token, join the tokens with half-width spaces, and build the query:
NSMutableArray *searchTokens = [NSMutableArray arrayWithCapacity:tokenArray.count];
for (NSString *token in tokenArray) {
    [searchTokens addObject:[token stringByAppendingString:@"*"]];
}
[searchTokens componentsJoinedByString:@" "];
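The same two string-building steps, sketched in Python for comparison. The function names and the sample token list are hypothetical; in the app the tokens would come from the tokenizer method above.

```python
def build_insert_text(tokens):
    """Join tokens with ASCII spaces, ready to store in the FTS table."""
    return " ".join(tokens)


def build_match_query(tokens):
    """Turn every token into a prefix term and join them with ASCII spaces."""
    return " ".join(token + "*" for token in tokens)


# Hypothetical tokenizer output for the search string "アイフォーン5s".
search_tokens = ["アイフォーン", "5s"]

insert_text = build_insert_text(search_tokens)
match_query = build_match_query(search_tokens)
print(insert_text)
print(match_query)
```

This sketch does not escape characters that have meaning in FTS query syntax, such as double quotes, so user input containing them would need extra handling.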
Search ranking
We want to return results in descending order of how well they match.
SQLite lets you define your own SQL functions.
Suppose there were a SQL function called rank()...
The SELECT we want to run:
SELECT * FROM article
JOIN (
  SELECT articleId, rank(matchinfo(articleTokens, 'pcnalx')) AS rank
  FROM articleTokens
  WHERE bodyTokens MATCH 'アイフォーン*'
  LIMIT 100
) AS result ON article.id = result.articleId
ORDER BY rank DESC;
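As a rough illustration of how a custom ranking function plugs into such a query, the sketch below registers a Python function through `sqlite3.Connection.create_function`, standing in for the C function the talk has in mind. The name `hit_rank`, the scoring rule (total number of hits in the row) and the sample rows are all assumptions made for this example, and FTS4 support is assumed.

```python
import sqlite3
import struct


def hit_rank(matchinfo):
    """Score a row by its total number of phrase hits (deliberately simple)."""
    # matchinfo() returns a blob of native-endian unsigned 32-bit integers.
    ints = struct.unpack(f"={len(matchinfo) // 4}I", matchinfo)
    n_phrases, n_cols = ints[0], ints[1]  # 'p' and 'c'
    x_start = 3 + 2 * n_cols              # skip 'n' (1), 'a' and 'l' (n_cols each)
    # 'x' holds 3 integers per (phrase, column); the first is "hits in this row".
    return sum(ints[x_start + 3 * i] for i in range(n_phrases * n_cols))


conn = sqlite3.connect(":memory:")
conn.create_function("hit_rank", 1, hit_rank)
conn.execute(
    "CREATE VIRTUAL TABLE articleTokens "
    "USING fts4 (articleId, titleTokens, bodyTokens)"
)
conn.executemany(
    "INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) "
    "VALUES (?, ?, ?)",
    [
        (1, "x", "アイフォーン5s と タブレット"),      # one hit
        (2, "y", "アイフォーン4 と アイフォーン5s"),   # two hits
    ],
)

ranked = conn.execute("""
    SELECT articleId, hit_rank(matchinfo(articleTokens, 'pcnalx')) AS score
    FROM articleTokens
    WHERE bodyTokens MATCH ?
    ORDER BY score DESC
""", ("アイフォーン*",)).fetchall()

print(ranked)
```

The row with two matching tokens sorts first. A hit count ignores document length and term rarity, which is what a scoring scheme such as BM25 adds.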
matchinfo(): an FTS function that returns match information. http://www.sqlite.org/fts3.html#matchinfo
Let's write a SQL function that turns the output of matchinfo() into a match score!
In C...
The match-score algorithm
Okapi BM25 http://en.wikipedia.org/wiki/Okapi_BM25
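For reference, the commonly cited form of the BM25 score for a document D and a query Q with terms q_1, ..., q_n is given below. Here f(q_i, D) is the number of occurrences of q_i in D, |D| is the document length in tokens, avgdl is the average document length, N is the number of documents and n(q_i) is the number of documents containing q_i. The free parameters k_1 and b are often set to about 1.2 and 0.75; those values are conventional defaults, not something stated on the slides.

```latex
\[
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
\]
\[
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
\]
```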
sqlite-okapi-bm25 https://github.com/rads/sqlite-okapi-bm25
Reinventing the wheel was avoided. * Note: it supports only a single column, so fork it if you need more.
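To show how the 'pcnalx' matchinfo fields line up with the inputs BM25 needs, here is a single-column sketch in Python. It is not the code of sqlite-okapi-bm25. The function name `bm25_rank`, the default column index, the parameters k1 = 1.2 and b = 0.75, and the sample rows are all assumptions, and FTS4 support is assumed.

```python
import math
import sqlite3
import struct


def bm25_rank(matchinfo, column=2, k1=1.2, b=0.75):
    """BM25 score for one column, computed from matchinfo(..., 'pcnalx')."""
    ints = struct.unpack(f"={len(matchinfo) // 4}I", matchinfo)
    n_phrases, n_cols, n_rows = ints[0], ints[1], ints[2]  # 'p', 'c', 'n'
    avg_len = max(ints[3 + column], 1)                     # 'a': average tokens
    doc_len = ints[3 + n_cols + column]                    # 'l': tokens in this row
    x_start = 3 + 2 * n_cols                               # start of 'x'
    score = 0.0
    for phrase in range(n_phrases):
        base = x_start + 3 * (column + phrase * n_cols)
        hits_this_row = ints[base]         # term frequency f(q, D)
        docs_with_hits = ints[base + 2]    # n(q)
        idf = math.log(
            (n_rows - docs_with_hits + 0.5) / (docs_with_hits + 0.5) + 1.0
        )
        norm = 1 - b + b * doc_len / avg_len
        score += idf * hits_this_row * (k1 + 1) / (hits_this_row + k1 * norm)
    return score


conn = sqlite3.connect(":memory:")
conn.create_function("bm25_rank", 1, bm25_rank)
conn.execute(
    "CREATE VIRTUAL TABLE articleTokens "
    "USING fts4 (articleId, titleTokens, bodyTokens)"
)
conn.executemany(
    "INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) "
    "VALUES (?, ?, ?)",
    [
        (1, "x", "アイフォーン5s と タブレット"),
        (2, "y", "アイフォーン4 と アイフォーン5s"),
        (3, "z", "アンドロイド と タブレット"),
    ],
)

scored = conn.execute("""
    SELECT articleId, bm25_rank(matchinfo(articleTokens, 'pcnalx')) AS score
    FROM articleTokens
    WHERE bodyTokens MATCH ?
    ORDER BY score DESC
""", ("アイフォーン*",)).fetchall()

print(scored)
```

The column index 2 corresponds to bodyTokens, the third column of the FTS table. Supporting several columns would mean summing, and possibly weighting, the per-column scores.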
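Demo
Summary
With SQLite's FTS you can run fast full-text search. Japanese text must be split into tokens beforehand. Search ranking is handled by writing your own SQL function.
The end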