Slide 1

Slide 1 text

42-JUFͰ ߴ଎શจݕࡧ ʙ೔ຊޠฤʙ @shobyshoby

Slide 2

Slide 2 text

ࣗݾ঺հ pixivͰΞϓϦͭͬͯ͘·͢

Slide 3

Slide 3 text

ຊ୊

Slide 4

Slide 4 text

iOSΞϓϦͰ SQLiteΛ࢖ͬͯ ೔ຊޠΛ ߴ଎ʹશจݕࡧ͍ͨ͠

Slide 5

Slide 5 text

֓ཁ

Slide 6

Slide 6 text

42-JUFͷશจݕࡧ֦ு ೔ຊޠͷτʔΫϯ෼ׂ ݕࡧॱҐ σϞ

Slide 7

Slide 7 text

42-JUFͷ શจݕࡧ֦ு

Slide 8

Slide 8 text

SQLite FTS3 and FTS4 Extensions http://www.sqlite.org/fts3.html

Slide 9

Slide 9 text

iOS 6 SDK͔Β σϑΥϧτͰ࢖͑Δ

Slide 10

Slide 10 text

جຊతͳ࢖͍ํ

Slide 11

Slide 11 text

CREATE VIRTUAL TABLE article USING fts4 (title, body); FTS4 ςʔϒϧ

Slide 12

Slide 12 text

INSERT INSERT INTO article(title, body) VALUES ('Apple', 'I have iPhone4 and iPhone5s');

Slide 13

Slide 13 text

جຊతͳQuery SELECT * FROM article WHERE body MATCH 'iPhone5s'; term match SELECT * FROM article WHERE body MATCH 'iPhone*'; term-prefix match

Slide 14

Slide 14 text

phrase match SELECT * FROM article WHERE body MATCH 'have iPhone*'; ○I have iPhone5s ×I have Android and iPhone5s

Slide 15

Slide 15 text

ಛघͳQuery SELECT * FROM article WHERE article MATCH 'iPhone*'; all columns SELECT * FROM article WHERE article MATCH 'title:Apple iPhone*'; specified column

Slide 16

Slide 16 text

؆୯ʂ

Slide 17

Slide 17 text

※ͨͩ͠ӳޠʹݶΔ

Slide 18

Slide 18 text

೔ຊޠͷ৔߹ɺ FTSͷTokenizer͕ ͏·͘tokenʹ෼͚ΒΕͳ͍

Slide 19

Slide 19 text

FTSͷTokenizer͕ ͏·͘ѻ͑Δܗࣜʹ Ճ޻ͯ͠σʔλΛೖΕΑ͏ʂ

Slide 20

Slide 20 text

೔ຊޠͷ τʔΫϯ෼ׂ

Slide 21

Slide 21 text

FTSͷςʔϒϧʹ͸ ൒֯εϖʔεͰ ෼ׂͨ͠σʔλΛೖΕΔ

Slide 22

Slide 22 text

CREATE TABLE article (id, title, body); CREATE VIRTUAL TABLE articleTokens USING fts4 (articleId, titleTokens, bodyTokens); ݩσʔλΛೖΕΔςʔϒϧ FTSςʔϒϧ

Slide 23

Slide 23 text

INSERT͍ͨ͠σʔλ INSERT INTO article(id, title, body) VALUES (1, 'Ξοϓϧ', 'ࢲ͸ΞΠϑΥʔϯ4ͱΞΠϑΥʔϯ5sΛ͍࣋ͬͯ· ͢'); ! INSERT INTO articleTokens(articleId, titleTokens, bodyTokens) VALUES (1, 'Ξοϓϧ', 'ࢲ͸ ΞΠϑΥʔϯ4 ͱ ΞΠϑΥʔϯ5s Λ ࣋ͬ ͍ͯ·͢');

Slide 24

Slide 24 text

౤͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId FROM articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY article.id;

Slide 25

Slide 25 text

τʔΫϯ෼ׂ͢Δ

Slide 26

Slide 26 text

CFStringTokenizer

Slide 27

Slide 27 text

಺෦ͰMeCabΛ࢖ͬͯ ܗଶૉղੳͯ͠ΔΒ͍͠ http://stackoverflow.com/questions/8280824/how-to-use- cfstringtokenizer-with-chinese-and-japanese

Slide 28

Slide 28 text

- (NSArray *)tokenArrayWithString:(NSString *)string { NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:@"ja"]; CFRange range = CFRangeMake(0, CFStringGetLength((CFStringRef)string)); CFStringTokenizerRef tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, (CFStringRef)string, range, kCFStringTokenizerUnitWordBoundary, (CFLocaleRef)locale); NSMutableArray *tokenArray = [NSMutableArray array]; while(CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) { CFRange tokenRange = CFStringTokenizerGetCurrentTokenRange(tokenizer); if(range.location != kCFNotFound) { NSString *token = [string substringWithRange:NSMakeRange(tokenRange.location, tokenRange.length)]; [tokenArray addObject:token]; } } CFRelease(tokenizer); return tokenArray; }

Slide 29

Slide 29 text

INSERT

Slide 30

Slide 30 text

tokenArrayΛ ൒֯εϖʔεͰ࿈݁ͯ͠ INSERT͢Δ [tokenArray componentsJoinedByString:@" "];

Slide 31

Slide 31 text

ݕࡧ

Slide 32

Slide 32 text

֤Tokenʹ * Λ௥Ճ͠ ൒֯εϖʔεͰ࿈݁ͯ͠ QueryΛ૊ΈཱͯΔ NSMutableArray *searchTokens = [NSMutableArray arrayWithCapacity:tokenArray.count]; for (NSString *token in tokenArray) { [searchTokens addObject:[token stringByAppendingString:@"*"]]; } [searchTokens componentsJoinedByString:@" "];

Slide 33

Slide 33 text

ݕࡧॱҐ

Slide 34

Slide 34 text

Ϛον౓͕ߴ͍ॱʹ ݁ՌΛฦ͍ͨ͠

Slide 35

Slide 35 text

SQLite͸ ಠࣗͷSQLؔ਺Λ ఆٛͰ͖Δ

Slide 36

Slide 36 text

rank()ͱ͍͏ SQLؔ਺͕ ͋ͬͨͱ͢Δͱ…

Slide 37

Slide 37 text

౤͍͛ͨSELECT SELECT * FROM article JOIN ( SELECT articleId, rank(matchinfo(articleTokens, 'pcnalx')) AS rank FROM articleTokens WHERE bodyTokens MATCH 'ΞΠϑΥʔϯ*' LIMIT 100; ) AS result ON article.id = result.articleId ORDER BY rank DESC;

Slide 38

Slide 38 text

match_info() ! Ϛον৘ใΛฦ͢
 FTSͷؔ਺ http://www.sqlite.org/fts3.html#matchinfo

Slide 39

Slide 39 text

match_info()͔Β ϚονείΞΛฦ͢ SQLؔ਺Λ࡞Ζ͏ʂ

Slide 40

Slide 40 text

CݴޠͰ…

Slide 41

Slide 41 text

ϚονείΞͷ ΞϧΰϦζϜ

Slide 42

Slide 42 text

Okapi BM25 http://en.wikipedia.org/wiki/Okapi_BM25

Slide 43

Slide 43 text

sqlite-okapi-bm25 https://github.com/rads/sqlite-okapi-bm25

Slide 44

Slide 44 text

ंྠͷ࠶ൃ໌͸ ආ͚ΒΕͨ ※஫ҙ ୯Ұͷcolumnʹ͔͠ରԠ͍ͯ͠ͳ͍ͷͰ ඞཁͳΒFork͢Δ

Slide 45

Slide 45 text

σϞ

Slide 46

Slide 46 text

·ͱΊ

Slide 47

Slide 47 text

42-JUFͷ'54Λ࢖͏ͱɺߴ଎ʹ શจݕࡧͰ͖Δ ೔ຊޠ͸ࣄલʹ5PLFO෼ׂ͕ඞཁ ݕࡧॱҐ͸ಠࣗ42-ؔ਺Λ࡞ͬͯ ରԠ͢Δ

Slide 48

Slide 48 text

͓͠·͍