Full-Text Search with FM-index
Sho Iizuka
February 02, 2015
Programming
https://kujira16.hateblo.jp/entry/2015/02/06/210630
Transcript
Full-Text Search with FM-Index (Computer Lab E, free-choice assignment)
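• Methods for searching a document for a string fall into two classes
  A. Methods that need no preprocessing (brute force, KMP, BM)
  B. Methods that need preprocessing (inverted index, suffix array)
• B spends time on preprocessing, but is faster than A when the same document is searched many times
• FM-Index belongs to class B and can search in time that does not depend on the length of the document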
Preprocessing 1: Building the suffix array
• Document: mississippi
• Append the end marker $: mississippi$
• Enumerate the suffixes: mississippi$ ississippi$ ssissippi$ sissippi$ issippi$ ssippi$ sippi$ ippi$ ppi$ pi$ i$ $
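Preprocessing 1: Building the suffix array
Suffixes with their start positions:
0   mississippi$
1   ississippi$
2   ssissippi$
3   sissippi$
4   issippi$
5   ssippi$
6   sippi$
7   ippi$
8   ppi$
9   pi$
10  i$
11  $
Sorted in lexicographic order (* $ is assumed to rank lower than any letter of the alphabet):
11  $
10  i$
7   ippi$
4   issippi$
1   ississippi$
0   mississippi$
9   pi$
8   ppi$
6   sippi$
3   sissippi$
5   ssippi$
2   ssissippi$
The column of start positions is the suffix array SA = 11 10 7 4 1 0 9 8 6 3 5 2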
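The construction on this slide can be sketched in a few lines of Python. This is only a naive illustration that sorts the suffixes directly. The function name `build_suffix_array` is made up for this sketch, and it is not the method used in the deck, which relies on the sais.hxx library.

```python
def build_suffix_array(text: str) -> list:
    """Return the start positions of the suffixes of text + '$' in sorted order."""
    s = text + "$"
    # '$' has a smaller code point than any ASCII letter, so plain string
    # comparison matches the slide's rule that '$' ranks lowest.
    return sorted(range(len(s)), key=lambda i: s[i:])


sa = build_suffix_array("mississippi")
print(sa)  # [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
```

Sorting whole suffixes is far slower than a dedicated suffix array algorithm on long documents, so this form is only suitable for small examples.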
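Preprocessing 2: BWT (Burrows-Wheeler Transform)
Replace each sorted suffix with the character that comes immediately before it in the original string:
SA  suffix        BWT
11  $             i
10  i$            p
7   ippi$         s
4   issippi$      s
1   ississippi$   m
0   mississippi$  $
9   pi$           p
8   ppi$          i
6   sippi$        s
3   sissippi$     s
5   ssippi$       i
2   ssissippi$    i
BWT string T = ipssm$pissii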
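As a rough sketch of this step, the BWT string can be read off the suffix array by taking the character just before each suffix. The function name `bwt_from_text` is hypothetical, and the suffix array is again built by naive sorting rather than with sais.hxx.

```python
def bwt_from_text(text: str) -> str:
    """Return the BWT string of text + '$' using a naively sorted suffix array."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # For the suffix starting at position 0, s[i - 1] is s[-1], which wraps
    # around to the end marker '$', as in the slide.
    return "".join(s[i - 1] for i in sa)


print(bwt_from_text("mississippi"))  # ipssm$pissii
```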
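Search
• For the BWT string T = ipssm$pissii, define the following functions
• Rank(c,p): returns the number of occurrences of the character c in the range T[0,p)
• RankLT(c): returns the number of occurrences, in the whole of T, of characters that rank lower than c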
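The two functions can be written down directly from their definitions. The sketch below scans T each time, so it only illustrates what the functions return and makes no attempt at the efficiency discussed later in the deck. The names `rank` and `rank_lt` are chosen here to mirror Rank and RankLT.

```python
T = "ipssm$pissii"  # BWT string from the slides


def rank(c: str, p: int) -> int:
    """Number of occurrences of the character c in T[0, p)."""
    return T[:p].count(c)


def rank_lt(c: str) -> int:
    """Number of characters in the whole of T that rank lower than c."""
    return sum(1 for ch in T if ch < c)


print(rank("s", 5))   # 2  (T[0, 5) = "ipssm")
print(rank_lt("i"))   # 1  (only '$' ranks lower than 'i')
print(rank_lt("s"))   # 8  ('$' x1, 'i' x4, 'm' x1, 'p' x2)
```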
Search
Sorted suffixes (rows of the suffix array SA) and the BWT string T:
row  suffix        T
0    $             i
1    i$            p
2    ippi$         s
3    issippi$      s
4    ississippi$   m
5    mississippi$  $
6    pi$           p
7    ppi$          i
8    sippi$        s
9    sissippi$     s
10   ssippi$       i
11   ssissippi$    i
• Question: at which row of the suffix array does 'i'+"ppi$" appear?
• LF-mapping: for c = T[p], the row in SA of the string made of c followed by the suffix at row p is RankLT(c)+Rank(c,p)
• For "ppi$" (row p = 7), c = T[7] = 'i', so the row is RankLT('i')+Rank('i', 7) = 1 + 1 = 2, which is "ippi$"
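A minimal sketch of LF-mapping, using naive versions of Rank and RankLT over the BWT string from the slides. The function name `lf_mapping` is an assumption made for this example.

```python
T = "ipssm$pissii"  # BWT string from the slides


def rank(c: str, p: int) -> int:
    """Number of occurrences of c in T[0, p)."""
    return T[:p].count(c)


def rank_lt(c: str) -> int:
    """Number of characters in all of T that rank lower than c."""
    return sum(1 for ch in T if ch < c)


def lf_mapping(p: int) -> int:
    """Row of the suffix array holding T[p] followed by the suffix at row p."""
    c = T[p]
    return rank_lt(c) + rank(c, p)


# Row 7 holds "ppi$" and T[7] is 'i', so the result is the row of "ippi$".
print(lf_mapping(7))  # 2
```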
Search
Searching for "ssi" over the same sorted suffixes and BWT string T, one character at a time from the end of the query:
• Rows starting with 'i': [RankLT('i')+Rank('i', 0), RankLT('i')+Rank('i', 12)) = [1, 5)
• Rows starting with 's'+"i": [RankLT('s')+Rank('s', 1), RankLT('s')+Rank('s', 5)) = [8, 10)
• Rows starting with 's'+"si": [RankLT('s')+Rank('s', 8), RankLT('s')+Rank('s', 10)) = [10, 12)
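The three steps for "ssi" generalize to a loop over the query from its last character to its first. The sketch below is one way to write that loop with naive Rank and RankLT. The function name `backward_search` and the final lookup of text positions through SA are assumptions added for illustration.

```python
T = "ipssm$pissii"                            # BWT string from the slides
SA = [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]   # suffix array from the slides


def rank(c: str, p: int) -> int:
    """Number of occurrences of c in T[0, p)."""
    return T[:p].count(c)


def rank_lt(c: str) -> int:
    """Number of characters in all of T that rank lower than c."""
    return sum(1 for ch in T if ch < c)


def backward_search(query: str) -> tuple:
    """Return the half-open range [lo, hi) of SA rows that start with query."""
    lo, hi = 0, len(T)
    for c in reversed(query):
        # Apply the LF-mapping formula to both ends of the current range.
        lo = rank_lt(c) + rank(c, lo)
        hi = rank_lt(c) + rank(c, hi)
        if lo >= hi:
            return lo, lo  # empty range: the query does not occur
    return lo, hi


lo, hi = backward_search("ssi")
print(lo, hi)              # 10 12
print(sorted(SA[lo:hi]))   # [2, 5], the start positions of "ssi" in mississippi
```

The number of loop iterations equals the length of the query, which is why the overall cost depends on the query and on the cost of Rank and RankLT rather than on the length of the document.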
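Search
• FM-index narrows down the positions that match the query string by repeating LF-mapping
• LF-mapping can be computed with Rank and RankLT
• With a wavelet tree or a wavelet matrix, these two operations take O(log σ) time (σ is the number of distinct characters in the alphabet)
• LF-mapping is repeated as many times as the length of the query string Q, so one search takes O(m log σ) time (m is the number of characters in Q)
• The search time does not depend on the length of the document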
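What I built
• An incremental search, usable from a web browser, over 500 popular books on Aozora Bunko
• sais.hxx (a fast library) is used to build the suffix array
• The wavelet matrix and the FM-Index are my own implementation (C++), turned into a Python extension module with boost-python
• Called from Flask (a web application framework for Python)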
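What did not go well
• I tried to implement fuzzy search and looked through the literature → it apparently takes time exponential in the edit distance…
• When loading the built index from a file, using an existing library made memory usage explode (cause unknown)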
Summary
• Implemented a fast string search algorithm
• Made it usable from a browser
References
• Daisuke Okanohara. 高速文字列解析の世界. Iwanami Shoten, 2012.
(Supplement) Wavelet tree
                    3101212213
2nd-lowest bit →    1000101101
                  0 /        \ 1
                10111        32223
lowest bit →    10111        10001
               0 /  \ 1     0 /  \ 1
               0    1111    222    33
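The figure can be reproduced with a small wavelet tree over 2-bit integers. The sketch below is a simplified illustration under several assumptions: the class name `WaveletTree` and its layout are invented here, the bit vectors are plain Python lists with prefix counts rather than succinct bit vectors, and it is not the wavelet matrix implemented in the deck.

```python
class WaveletTree:
    """Wavelet tree over small non-negative integers, split from the highest bit down."""

    def __init__(self, seq, bits):
        self.bits = bits
        # Maps the path from the root (a tuple of 0/1 branches) to that node's
        # bit list and the running count of 1 bits.
        self.nodes = {}
        self._build(list(seq), bits - 1, ())

    def _build(self, seq, bit, path):
        if bit < 0 or not seq:
            return
        flags = [(v >> bit) & 1 for v in seq]
        ones = [0]
        for f in flags:
            ones.append(ones[-1] + f)
        self.nodes[path] = (flags, ones)
        # Stable partition: values keep their relative order in each child.
        self._build([v for v in seq if not (v >> bit) & 1], bit - 1, path + (0,))
        self._build([v for v in seq if (v >> bit) & 1], bit - 1, path + (1,))

    def rank(self, c, p):
        """Number of occurrences of the value c in seq[0:p]."""
        path = ()
        for bit in range(self.bits - 1, -1, -1):
            if path not in self.nodes:
                return 0
            flags, ones = self.nodes[path]
            b = (c >> bit) & 1
            # Map position p into the child that holds values with bit b.
            p = ones[p] if b else p - ones[p]
            path += (b,)
        return p


wt = WaveletTree([3, 1, 0, 1, 2, 1, 2, 2, 1, 3], bits=2)
print("".join(map(str, wt.nodes[()][0])))      # 1000101101
print("".join(map(str, wt.nodes[(0,)][0])))    # 10111
print("".join(map(str, wt.nodes[(1,)][0])))    # 10001
print(wt.rank(1, 10))                          # 4
```

Each `rank` call visits one node per bit, which corresponds to the O(log σ) bound on the search slide when counting 1 bits in a prefix takes constant time.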