Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
FM-index による全文検索
Search
Sho Iizuka
February 02, 2015
Programming
0
20
FM-index による全文検索
https://kujira16.hateblo.jp/entry/2015/02/06/210630
Sho Iizuka
February 02, 2015
Tweet
Share
More Decks by Sho Iizuka
See All by Sho Iizuka
半年前の自分に教えたい systemd のハマりどころ
arosh
18
14k
Osaka.Stan#5 LT プログラミングコンテストのデータを分析した話
arosh
1
5.9k
簡潔データ構造輪講資料(順列)
arosh
1
7k
Pythonにおける日本語処理
arosh
1
2k
円と円の外接線の求め方
arosh
0
25
円と円の交点の求め方
arosh
0
34
Other Decks in Programming
See All in Programming
Cloudflare Workers x AWS Lambdaの組み合わせユースケース / Cloudflare Workers x AWS Lambda Combination Use Case
seike460
PRO
2
310
Temporalを取り巻く仕様を整理する
sajikix
0
120
feature環境をGitHub ActionsとCloudFormationでいい感じに管理する
nealle
2
310
英語
s_shimotori
1
220
유연한 Composable 설계
l2hyunwoo
0
380
HMSコンペ 11th Solution (team : kansai-kaggler)
t88
1
680
入社1ヶ月でここまでやった!Findy Toolsインフラ支援の最適化
rvirus0817
6
1.4k
CSC307 Lecture 14
javiergs
PRO
0
220
Architectures with Lightweight Stores: New Rules and Options
manfredsteyer
PRO
0
100
Google's Recipe for Scaling (Web) Security – LocoMocoSec 2024
lweichselbaum
0
170
AHC035解説
terryu16
0
730
今こそ始める、CDKコンストラクトライブラリ開発 ― 入門から実践まで
tmokmss
1
930
Featured
See All Featured
In The Pink: A Labor of Love
frogandcode
139
22k
Done Done
chrislema
179
15k
Principles of Awesome APIs and How to Build Them.
keavy
124
16k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
155
14k
Large-scale JavaScript Application Architecture
addyosmani
506
110k
Visualization
eitanlees
139
14k
Product Roadmaps are Hard
iamctodd
PRO
48
10k
Making the Leap to Tech Lead
cromwellryan
127
8.7k
Design by the Numbers
sachag
277
18k
Six Lessons from altMBA
skipperchong
24
3.2k
Web Components: a chance to create the future
zenorocha
307
41k
Designing Experiences People Love
moore
136
23k
Transcript
FM-IndexʹΑΔશจݕࡧ ܭࢉػ࣮शE ࣗ༝՝
• จॻ͔ΒจࣈྻΛݕࡧ͢Δํ๏2௨ΓʹྨͰ͖Δ A. લॲཧ͕ෆཁͳํ๏ (ྗͤͳํ๏, KMP๏, BM๏) B. લॲཧ͕ඞཁͳํ๏ (సஔΠϯσοΫε,
ඌࣙྻ) • Bલॲཧͷ͕࣌ؒඞཁͳ͔ΘΓʹ, ಉ͡จॻ͔ΒԿճݕࡧ͢Δ߹AΑΓߴ • FM-IndexBʹྨ͞ΕΔํ๏Ͱ, จॻͷ͞ʹґଘ͠ͳ͍࣌ؒͰݕࡧͰ͖Δ
લॲཧ̍ɿඌࣙྻͷߏங จॻ mississippi mississippi$ ΤϯυϚʔΧ$ΛՃ mississippi$ ississippi$ ssissippi$ sissippi$ issippi$
ssippi$ sippi$ ippi$ ppi$ pi$ i$ $ ඌࣙͷྻڍ
લॲཧ̍ɿඌࣙྻͷߏங 0 mississippi$ 1 ississippi$ 2 ssissippi$ 3 sissippi$ 4
issippi$ 5 ssippi$ 6 sippi$ 7 ippi$ 8 ppi$ 9 pi$ 10 i$ 11 $ 11 $ 10 i$ 7 ippi$ 4 issippi$ 1 ississippi$ 0 mississippi$ 9 pi$ 8 ppi$ 6 sippi$ 3 sissippi$ 5 ssippi$ 2 ssissippi$ ࣙॻॱͰιʔτ͢Δ ※$ҙͷΞϧϑΝϕοτΑΓ ॱҐ͕খ͍͞ͱ͢Δ ඌࣙྻSA
લॲཧ̎ɿBWT (Burrows-Wheeler Transform) 11 $ 10 i$ 7 ippi$ 4
issippi$ 1 ississippi$ 0 mississippi$ 9 pi$ 8 ppi$ 6 sippi$ 3 sissippi$ 5 ssippi$ 2 ssissippi$ ݩͷจࣈྻʹ͓͚Δ ͻͱͭલͷจࣈʹ͢Δ i p s s m $ p i s s i i BWTจࣈྻT
ݕࡧॲཧ • BWTจࣈྻT = ipssm$pissii ʹ͍ͭͯ, ࣍ͷؔΛఆٛ͢Δ • Rank(c,p) :
T[0,p)ͷൣғͰ, ΞϧϑΝϕοτcͷग़ݱΛฦ͢ • RankLT(c) : TશମͰ, cΑΓॱҐ͕খ͍͞ ΞϧϑΝϕοτͷग़ݱΛฦ͢
ݕࡧॲཧ $ i$ ippi$ issippi$ ississippi$ mississippi$ pi$ ppi$ sippi$
sissippi$ ssippi$ ssissippi$ i p s s m $ p i s s i i BWTจࣈྻT ඌࣙྻSA
ݕࡧॲཧ $ i$ ippi$ issippi$ ississippi$ mississippi$ pi$ ppi$ sippi$
sissippi$ ssippi$ ssissippi$ i p s s m $ p i s s i i BWTจࣈྻT ඌࣙྻSA 'i'+"ppi$"ͷ ඌࣙྻ্Ͱͷ ग़ݱҐஔʁ
ݕࡧॲཧ $ i$ ippi$ issippi$ ississippi$ mississippi$ pi$ ppi$ sippi$
sissippi$ ssippi$ ssissippi$ i p s s m $ p i s s i i BWTจࣈྻT ඌࣙྻSA 'i'+"ppi$"ͷ ඌࣙྻ্Ͱͷ ग़ݱҐஔʁ
ݕࡧॲཧ $ i$ ippi$ issippi$ ississippi$ mississippi$ pi$ ppi$ sippi$
sissippi$ ssippi$ ssissippi$ i p s s m $ p i s s i i BWTจࣈྻT ඌࣙྻSA 'i'+"ppi$"ͷ ඌࣙྻ্Ͱͷ ग़ݱҐஔʁ LF-mapping c=T[p] ʹଓ͘จࣈྻͷ SA্Ͱͷग़ݱҐஔ RankLT(c)+Rank(c,p)
ݕࡧॲཧ $ i$ ippi$ issippi$ ississippi$ mississippi$ pi$ ppi$ sippi$
sissippi$ ssippi$ ssissippi$ i p s s m $ p i s s i i BWTจࣈྻT ඌࣙྻSA "ssi"ͷݕࡧ [RankLT('i')+Rank('i', 0), RankLT('i')+Rank('i', 12)) 'i'Ͱ࢝·Δ จࣈྻ
ݕࡧॲཧ $ i$ ippi$ issippi$ ississippi$ mississippi$ pi$ ppi$ sippi$
sissippi$ ssippi$ ssissippi$ i p s s m $ p i s s i i BWTจࣈྻT ඌࣙྻSA "ssi"ͷݕࡧ [RankLT('s')+Rank('s', 1), RankLT('s')+Rank('s', 5)) 's'+"i"Ͱ࢝·Δ จࣈྻ
ݕࡧॲཧ $ i$ ippi$ issippi$ ississippi$ mississippi$ pi$ ppi$ sippi$
sissippi$ ssippi$ ssissippi$ i p s s m $ p i s s i i BWTจࣈྻT ඌࣙྻSA "ssi"ͷݕࡧ [RankLT('s')+Rank('s', 8), RankLT('s')+Rank('s', 10)) 's'+"si"Ͱ࢝·Δ จࣈྻ
ݕࡧॲཧ • FM-index, ݕࡧจࣈྻʹରԠ͢ΔҐஔͷߜΓࠐΈΛ LF-mappingͷ܁Γฦ͠ʹΑͬͯߦ͏ • LF-mapping Rank ͱ RankLT
Ͱߦ͑Δ • ͜ͷ2ͭͷॲཧ, ΣʔϒϨοτΣʔϒϨοτߦྻΛ͑ O(log σ) ࣌ؒͰՄೳ (σ ΞϧϑΝϕοτͷछྨ) • LF-mappingΛݕࡧจࣈྻQͷ͚ͩ͞܁Γฦ͢ͷͰ, Ұճͷݕࡧ͕O(m log σ) ࣌ؒͰՄೳ (m Q ͷจࣈ) • ݕࡧ͕࣌ؒจॻͷ͞ʹґଘ͠ͳ͍
੍࡞ • ੨ۭจݿͰਓؾ͕͋Δਤॻ500Λରͱͨ͠ Σϒϒϥβ͔Β͑ΔΠϯΫϦϝϯλϧݕࡧΛ੍࡞ • ඌࣙྻͷߏஙsais.hxx (ߴͳϥΠϒϥϦ) Λ༻ • ΣʔϒϨοτߦྻͱFM-IndexࣗͰ࣮
(C++), boost-pythonʹΑΓPython༻ͷ֦ுϞδϡʔϧʹม • Flask (Web App Framework@Python) ͔Βݺͼग़͢
͏·͍͔͘ͳ͔ͬͨͱ͜Ζ • ͍͋·͍ݕࡧΛ࣮͠Α͏ͱͯ͠จݙΛ୳ͯ͠Έͨ → ฤूڑʹରͯ͠ࢦ͔͔࣌ؒΔΒ͍͠… • ࡞ͨ͠ࡧҾΛϑΝΠϧ͔ΒಡΈࠐΉॲཧͰ, طଘͷϥΠϒϥϦΛͬͨΒ༻ϝϞϦͷྔ͕രൃ (ݪҼෆ໌)
·ͱΊ • ߴͳจࣈྻݕࡧͷΞϧΰϦζϜΛ࣮ͯ͠Έͨ • ϒϥβ͔Β͑ΔΑ͏ʹͯ͠Έͨ ! • ࢀߟจݙ • Ԭݪ
େี. ߴจࣈྻղੳͷੈք. ؠॻళ. 2012.
(ิ) ΣʔϒϨοτ 3101212213 1000101101 10111 32223 10111 10001 ԼҐ2Ϗοτ →
ԼҐ1Ϗοτ → 0 1111 222 33 0 1 0 1 0 1