Full-Text Search with FM-index
Sho Iizuka
February 02, 2015
https://kujira16.hateblo.jp/entry/2015/02/06/210630
Transcript
Full-Text Search with FM-Index
Computer Practicum E, free-topic assignment
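• Methods for searching a document for a string fall into two classes:
  A. Methods that need no preprocessing (brute force, KMP (Knuth-Morris-Pratt), BM (Boyer-Moore))
  B. Methods that need preprocessing (inverted index, suffix array)
• B pays a preprocessing cost up front, but is faster than A when the same document is searched many times
• FM-Index belongs to class B and can search in time that does not depend on the length of the document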
Preprocessing 1: Building the suffix array
Document: mississippi
Append the end marker $: mississippi$
Enumerate the suffixes:
  mississippi$
  ississippi$
  ssissippi$
  sissippi$
  issippi$
  ssippi$
  sippi$
  ippi$
  ppi$
  pi$
  i$
  $
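Preprocessing 1: Building the suffix array
Number each suffix by its starting position:
   0 mississippi$
   1 ississippi$
   2 ssissippi$
   3 sissippi$
   4 issippi$
   5 ssippi$
   6 sippi$
   7 ippi$
   8 ppi$
   9 pi$
  10 i$
  11 $
Sort them in lexicographic order (note: $ is assumed to rank lower than every letter of the alphabet):
  11 $
  10 i$
   7 ippi$
   4 issippi$
   1 ississippi$
   0 mississippi$
   9 pi$
   8 ppi$
   6 sippi$
   3 sissippi$
   5 ssippi$
   2 ssissippi$
The starting positions in sorted order form the suffix array SA = [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2].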
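The sorting step on this slide can be sketched in a few lines of Python. This is a naive construction for illustration only (the deck itself uses sais.hxx for this step); the function name `build_suffix_array` is an assumption, not something from the slides.

```python
def build_suffix_array(text):
    """Return the starting positions of all suffixes of `text`, sorted lexicographically.

    Naive approach: sort positions by the suffix that starts there.
    Assumes `text` already ends with the end marker '$'.
    """
    # In ASCII, '$' sorts before every lowercase letter, which matches the
    # slide's convention that '$' ranks lowest.
    return sorted(range(len(text)), key=lambda i: text[i:])


sa = build_suffix_array("mississippi$")
print(sa)  # [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
```

Comparing whole suffixes makes this far slower than a dedicated suffix-array library on long texts, so it is only suitable for small examples like this one.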
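Preprocessing 2: BWT (Burrows-Wheeler Transform)
Replace each sorted suffix with the character that comes immediately before it in the original string:
  SA  Suffix         Preceding character
  11  $              i
  10  i$             p
   7  ippi$          s
   4  issippi$       s
   1  ississippi$    m
   0  mississippi$   $
   9  pi$            p
   8  ppi$           i
   6  sippi$         s
   3  sissippi$      s
   5  ssippi$        i
   2  ssissippi$     i
BWT string T = ipssm$pissii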
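A minimal sketch of this step, assuming the suffix array is built naively by sorting (the helper names are hypothetical): the BWT character for each row is simply the character one position before the suffix start.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """Return the BWT string: for each sorted suffix, the preceding character."""
    # For the suffix starting at 0, text[-1] wraps around to the end marker '$'.
    return "".join(text[i - 1] for i in sa)


text = "mississippi$"
bwt = bwt_from_suffix_array(text, build_suffix_array(text))
print(bwt)  # ipssm$pissii
```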
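Search
• Define the following functions on the BWT string T = ipssm$pissii:
• Rank(c, p): returns the number of occurrences of character c in the range T[0, p)
• RankLT(c): returns the number of occurrences, in all of T, of characters that rank lower than c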
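The two functions can be written directly from their definitions. The sketch below scans the string each time, so it only illustrates the semantics, not the fast wavelet-based versions; the names `rank` and `rank_lt` are assumptions chosen to mirror Rank and RankLT.

```python
def rank(bwt, c, p):
    """Number of occurrences of character c in bwt[0:p] (half-open range)."""
    return bwt[:p].count(c)


def rank_lt(bwt, c):
    """Number of characters in the whole of bwt that rank lower than c."""
    return sum(1 for ch in bwt if ch < c)


T = "ipssm$pissii"
print(rank(T, "s", 5))   # 2: "ipssm" contains two 's'
print(rank_lt(T, "i"))   # 1: only '$' ranks lower than 'i'
print(rank_lt(T, "s"))   # 8: one '$', four 'i', one 'm', two 'p'
```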
Search: LF-mapping
The sorted suffixes (suffix array SA) and the BWT string T, row by row:
  Row p  T[p]  Suffix
   0     i     $
   1     p     i$
   2     s     ippi$
   3     s     issippi$
   4     m     ississippi$
   5     $     mississippi$
   6     p     pi$
   7     i     ppi$
   8     s     sippi$
   9     s     sissippi$
  10     i     ssippi$
  11     i     ssissippi$
Question: at which row of the suffix array does 'i' + "ppi$" appear?
LF-mapping: for c = T[p], the row in SA of the string formed by c followed by the suffix at row p is RankLT(c) + Rank(c, p).
For example, "ppi$" is at row 7 and T[7] = 'i', so 'i' + "ppi$" is at row RankLT('i') + Rank('i', 7) = 1 + 1 = 2, which is "ippi$".
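As a rough sketch of LF-mapping under the definitions of Rank and RankLT given earlier (using naive scans instead of a wavelet structure; the function names are assumptions), the 'i' + "ppi$" question can be checked like this:

```python
def rank(bwt, c, p):
    # Occurrences of c in bwt[0:p].
    return bwt[:p].count(c)


def rank_lt(bwt, c):
    # Characters in bwt that rank lower than c.
    return sum(1 for ch in bwt if ch < c)


def lf_mapping(bwt, p):
    """Row of the suffix obtained by prepending bwt[p] to the suffix at row p."""
    c = bwt[p]
    return rank_lt(bwt, c) + rank(bwt, c, p)


T = "ipssm$pissii"
# Row 7 holds "ppi$" and T[7] == 'i', so this is the row of "ippi$".
print(lf_mapping(T, 7))  # 2
```

Because every row maps to a distinct row, applying `lf_mapping` to all rows yields a permutation of the row numbers.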
Search: looking up "ssi"
The query is processed from its last character to its first, narrowing a half-open range of SA rows at each step. The initial range is [0, 12), i.e. every row.
• Strings starting with 'i':
  [RankLT('i') + Rank('i', 0), RankLT('i') + Rank('i', 12)) = [1, 5)
• Strings starting with 's' + "i":
  [RankLT('s') + Rank('s', 1), RankLT('s') + Rank('s', 5)) = [8, 10)
• Strings starting with 's' + "si":
  [RankLT('s') + Rank('s', 8), RankLT('s') + Rank('s', 10)) = [10, 12)
The final range covers the suffixes "ssippi$" and "ssissippi$", so "ssi" occurs twice.
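The three narrowing steps for "ssi" can be traced with a short loop. This is a sketch using naive Rank and RankLT scans; the function names and the printed trace are illustrative assumptions, and the SA literal is the suffix array of mississippi$.

```python
def rank(bwt, c, p):
    # Occurrences of c in bwt[0:p].
    return bwt[:p].count(c)


def rank_lt(bwt, c):
    # Characters in bwt that rank lower than c.
    return sum(1 for ch in bwt if ch < c)


def backward_search(bwt, query, trace=False):
    """Return the half-open range [lo, hi) of SA rows whose suffix starts with query."""
    lo, hi = 0, len(bwt)
    for c in reversed(query):
        base = rank_lt(bwt, c)
        lo, hi = base + rank(bwt, c, lo), base + rank(bwt, c, hi)
        if trace:
            print(f"after {c!r}: [{lo}, {hi})")
        if lo >= hi:
            return lo, lo  # empty range: the query does not occur
    return lo, hi


T = "ipssm$pissii"
SA = [11, 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
lo, hi = backward_search(T, "ssi", trace=True)
# after 'i': [1, 5)
# after 's': [8, 10)
# after 's': [10, 12)
print(sorted(SA[lo:hi]))  # [2, 5]: start positions of "ssi" in mississippi$
```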
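Search
• FM-index narrows down the positions that match the query string by repeating LF-mapping
• LF-mapping can be computed with Rank and RankLT
• Both operations take O(log σ) time with a wavelet tree or wavelet matrix (σ is the number of distinct characters in the alphabet)
• LF-mapping is repeated once per character of the query string Q, so a single search takes O(m log σ) time (m is the number of characters in Q)
• Search time does not depend on the length of the document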
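To make the "independent of document length" point concrete, here is a small sketch that precomputes Rank and RankLT so that each query character costs a constant number of table lookups. Note the substitution: it uses plain occurrence tables (O(σn) memory) rather than the wavelet matrix the deck describes, and the class name `FMIndex` and its `count` method are assumptions for illustration.

```python
from collections import Counter


class FMIndex:
    def __init__(self, text):
        # Assumes text ends with the end marker '$'.
        sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
        self.bwt = "".join(text[i - 1] for i in sa)
        counts = Counter(self.bwt)
        # less_than[c]: number of characters in bwt ranking lower than c (RankLT).
        self.less_than = {}
        total = 0
        for c in sorted(counts):
            self.less_than[c] = total
            total += counts[c]
        # occ[c][p]: occurrences of c in bwt[0:p] (Rank), one column per character.
        self.occ = {c: [0] for c in counts}
        for ch in self.bwt:
            for c, column in self.occ.items():
                column.append(column[-1] + (ch == c))

    def count(self, query):
        """Number of occurrences of query; the loop runs once per query character."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(query):
            if c not in self.less_than:
                return 0
            lo = self.less_than[c] + self.occ[c][lo]
            hi = self.less_than[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("mississippi$")
print(index.count("ssi"))  # 2
```

The loop in `count` touches only the query, never the document, which is why the work per search grows with m and not with the text length. Replacing the tables with a wavelet structure trades the O(1) lookup for O(log σ) in exchange for much less memory.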
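What I built
• An incremental search, usable from a web browser, over 500 popular books from Aozora Bunko
• The suffix array is built with sais.hxx (a fast library)
• The wavelet matrix and FM-Index are implemented from scratch (C++) and turned into a Python extension module with boost-python
• The module is called from Flask (a web application framework for Python)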
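What did not go well
• I searched the literature for a way to implement fuzzy search → it apparently takes time exponential in the edit distance…
• When loading the built index from a file with an existing library, memory usage exploded (cause unknown)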
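Summary
• Implemented a fast string search algorithm
• Made it usable from a browser!
References
• Daisuke Okanohara. 高速文字列解析の世界 (The World of Fast String Analysis). Iwanami Shoten, 2012.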
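(Supplement) Wavelet tree
Input sequence: 3101212213

                 3101212213
                 1000101101          ← second-lowest bit
              0 /          \ 1
           10111            32223
           10111            10001    ← lowest bit
          0 /  \ 1         0 /  \ 1
           0   1111       222    33

Each node stores one bit of every element (lower row); elements with bit 0 go to the left child and elements with bit 1 go to the right child.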
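A compact sketch of how a wavelet tree answers Rank on the sequence 3101212213 from this slide. It stores plain prefix-sum lists in place of succinct bit vectors, so it illustrates the traversal rather than the space savings; the class and attribute names are assumptions.

```python
class WaveletTree:
    def __init__(self, seq, bits):
        self.bits = bits  # number of bits per symbol (2 for symbols 0..3)
        self.root = self._build(list(seq), bits - 1)

    def _build(self, seq, level):
        if level < 0 or not seq:
            return None
        b = [(x >> level) & 1 for x in seq]
        # prefix[i] = number of 1 bits in b[0:i]
        prefix = [0]
        for bit in b:
            prefix.append(prefix[-1] + bit)
        left = self._build([x for x in seq if not (x >> level) & 1], level - 1)
        right = self._build([x for x in seq if (x >> level) & 1], level - 1)
        return (b, prefix, left, right)

    def rank(self, c, p):
        """Number of occurrences of symbol c in the first p elements."""
        node = self.root
        for level in range(self.bits - 1, -1, -1):
            if node is None or p == 0:
                return 0
            _, prefix, left, right = node
            if (c >> level) & 1:
                p, node = prefix[p], right       # count of 1 bits before p
            else:
                p, node = p - prefix[p], left    # count of 0 bits before p
        return p


seq = [3, 1, 0, 1, 2, 1, 2, 2, 1, 3]
wt = WaveletTree(seq, bits=2)
print(wt.root[0])      # [1, 0, 0, 0, 1, 0, 1, 1, 0, 1], the root bit row
print(wt.rank(2, 8))   # 3
```

Each `rank` call visits one node per bit of the symbol, which is where the O(log σ) bound comes from once the prefix counts are available in constant time.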