
Introduction to Building Your Own Search Engine (検索エンジン自作入門), Go Conference 2021 Spring

Slides for a talk given at Go Conference 2021 Spring.

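Outline
1. What is a search engine? How search engines generally work and what they are made of
2. Introducing the self-built search engine: its concrete components and example behavior
3. Implementing the self-built search engine: algorithms, data structures, and libraries
4. Closing: impressions from building a search engine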

kotaroooo0

April 24, 2021

Transcript

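  1. Go Conference 2021 Spring, 2021/04/24, @kotaroooo0
    Introduction to Building Your Own Search Engine
  2. Self-introduction
    Kotaro Adachi (@kotaroooo0), server-side engineer
    Builds CLI tools, bots, and a search engine
  3. This talk: the story of a self-built search engine
    https://kotaroooo0-dev.hatenablog.com/
  4. Target audience
    • People interested in search engines
    • People who want to build their own search engine
    Goals
    • Understand how a search engine works
    • Be able to start building a search engine in Go
  5. Outline
    1. What is a search engine? How search engines generally work and what they are made of
    2. Introducing the self-built search engine: its concrete components and example behavior
    3. Implementing the self-built search engine: algorithms, data structures, and libraries
    4. Closing: impressions from building a search engine
  6. Outline: 1. What is a search engine? (how search engines generally work and what they are made of)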
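  7. What is a search engine?
    It returns the documents that match a query from a collection of documents.
    Figure: the provider registers documents with the search engine (crawling, in Google Search); the user enters a query and gets documents back (web pages, in Google Search).
    In particular, searching multiple documents for a specific string is called full-text search.
  8. Types of full-text search
    Example: check whether the document "GO RUBY JS" contains the string "JS".
    • Grep type: scan the document text sequentially for the string
    • Index type: build index data for the target documents in advance
  9. Components of an index-type search engine
    It consists of the following elements (details follow).
    • Registering documents: document → Analyzer → Indexer → index
    • Querying: query → Analyzer → Searcher → index → Sorter → returned documents
    ※ The Sorter is not implemented, so it is omitted from the rest of the talk.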
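  10. Index
    An inverted index (dictionary + posting lists) is the common choice.
    Figure: each dictionary term (go, js, ruby) points to a posting list of document IDs.
  11. Indexing: registering documents in the inverted index
    Documents: 1: "I have pens." 2: "We have a desk."
    • Analyzer: breaks the string into words
    • Indexer: builds the inverted index
    Resulting inverted index: have → 1, 2; pen → 1; we → 2; desk → 2
  12. Search: traversing the inverted index to find matching documents
    Query: "PENS"
    • Analyzer: breaks the string into words ("PENS" → "pen")
    • Searcher: looks up the inverted index (have → 1, 2; pen → 1; we → 2; desk → 2)
    Document 1 is a hit.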
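
The indexing and search flow on the two slides above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `analyze`, `InvertedIndex`, `Add`, and `Search` are hypothetical names, the analyzer is a crude stand-in (lowercasing, a two-word stop list, stripping a trailing "s"), and documents are assumed to be added in ascending ID order.

```go
package main

import (
	"fmt"
	"strings"
)

// analyze is a stand-in analyzer (an assumption, not the talk's code):
// it lowercases, splits on anything outside a-z, drops two stop words,
// and strips a trailing "s" as a crude substitute for stemming.
func analyze(text string) []string {
	stop := map[string]bool{"i": true, "a": true}
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return r < 'a' || r > 'z'
	})
	var tokens []string
	for _, w := range words {
		if stop[w] {
			continue
		}
		tokens = append(tokens, strings.TrimSuffix(w, "s"))
	}
	return tokens
}

// InvertedIndex maps a token to the ascending list of document IDs containing it.
type InvertedIndex map[string][]int

// Add registers a document. Documents are assumed to arrive in ascending ID order.
func (idx InvertedIndex) Add(docID int, text string) {
	for _, tok := range analyze(text) {
		ids := idx[tok]
		if len(ids) > 0 && ids[len(ids)-1] == docID {
			continue // this document is already recorded for the token
		}
		idx[tok] = append(ids, docID)
	}
}

// Search handles a single-term query: analyze it, then look up the posting list.
func (idx InvertedIndex) Search(query string) []int {
	tokens := analyze(query)
	if len(tokens) == 0 {
		return nil
	}
	return idx[tokens[0]]
}

func main() {
	idx := InvertedIndex{}
	idx.Add(1, "I have pens.")
	idx.Add(2, "We have a desk.")
	fmt.Println(idx["have"])        // posting list for "have"
	fmt.Println(idx.Search("PENS")) // documents matching the query
}
```

Because the same `analyze` function runs at indexing time and at query time, "PENS" and "pens." end up as the same token, which is the point made on the Analyzer slide.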
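  13. Analyzer
    Absorbs spelling variations and splits a string into words (string → multiple words). Char Filter, Tokenizer, and Token Filter together make up the Analyzer.
    • Char Filter: adjusts characters before the string is split into words (example: rewriting a character such as &)
    • Tokenizer: splits the string into words (example: I have pens → I, have, pens)
    • Token Filter: adjusts the split words (example: lowercasing, stop word removal)
    Used for both indexing and search. ※ It has little effect unless both sides use the same Analyzer.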
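  14. Outline: 2. Introducing the self-built search engine (its concrete components and example behavior)
  15. Why build a search engine?
    • Elasticsearch is interesting both technically and as a product → I wanted to know how a full-text search engine works
    • I took a Go training course → I wanted to build something in Go
    So: implement a search engine in Go 💪
  16. Introducing the self-built search engine
    ※ The Sorter is not yet implemented.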

  17. Example: Analyzer (processing flow)
    Before analysis:   I feel TIRED :(
    MappingCharFilter: I feel TIRED sad
    StandardTokenizer: I, feel, TIRED, sad
    LowercaseFilter:   i, feel, tired, sad
    StemmerFilter:     i, feel, tire, sad
    StopWordFilter:    feel, tire, sad
  18. Example: Analyzer (processing flow)
    Before analysis:        東京と京都
    MorphologicalTokenizer: 東京, と, 京都
    ReadingformFilter:      tokyo, to, kyoto
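  19. Example: Indexing & Search
    • Add to the index: 1: "Go Ruby PHP", 2: "Go PHP Python", 3: "Go Python Ruby"
    • AND search "go php": documents 1 and 2 hit
    • Phrase search "go php": only document 2 hits
  20. Example: configuration and initialization
    • Configure MySQL, which persists the inverted index
    • Initialize the persistence layer that accesses MySQL
    • Initialize the analyzer
      • CharFilter: none
      • Tokenizer: split on spaces (Standard Tokenizer)
      • Token Filter: lowercasing
  21. Example: Indexing
    • Initialize the Indexer, which builds the inverted index
    • Index the following documents (generate the inverted index and save it to MySQL)
      1: "Go Ruby PHP" (after analysis: go, ruby, php)
      2: "Go PHP Python" (after analysis: go, php, python)
      3: "Go Python Ruby" (after analysis: go, python, ruby)
    Resulting postings (document ID:position): go → 1:1, 2:1, 3:1; ruby → 1:2, 3:3; php → 1:3, 2:2; ...
  22. Example: Search, AND search ("GO PHP")
    • The analyzer absorbs the upper/lower case difference
    • Documents 1 and 2, which contain both "Go" and "PHP", hit
      1: "Go Ruby PHP"
      2: "Go PHP Python"
  23. Example: Search, phrase search ("GO PHP")
    • The analyzer absorbs the upper/lower case difference
    • Only document 2, which contains the phrase "Go PHP", hits
      2: "Go PHP Python"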
  24. Outline: 3. Implementing the self-built search engine (algorithms, data structures, and libraries)
  25. Implementing the Analyzer
    Analyzer struct
    • A slice of the CharFilter interface
    • A Tokenizer interface (exactly one)
    • A slice of the TokenFilter interface
    Analyze method
    • string → TokenStream (a slice of Token)
    • Applies the CharFilters, the Tokenizer, and the TokenFilters, in that order
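
The struct and method described on the slide above might look roughly like the following. It is a sketch, not the code from the stalefish repository: the interface and type names follow the slide, but the method signatures, the `Token` fields, and the three small implementations (`mappingCharFilter`, `spaceTokenizer`, `lowercaseFilter`) plus `newExampleAnalyzer` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Token and TokenStream: the slide only says a TokenStream is a slice of Token;
// the Term field is an assumption.
type Token struct{ Term string }
type TokenStream []Token

// The three pluggable stages. Method signatures are assumed.
type CharFilter interface{ Filter(s string) string }
type Tokenizer interface{ Tokenize(s string) TokenStream }
type TokenFilter interface{ Filter(ts TokenStream) TokenStream }

// Analyzer holds any number of char filters and token filters, and exactly one tokenizer.
type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

// Analyze applies the char filters, then the tokenizer, then the token filters.
func (a Analyzer) Analyze(s string) TokenStream {
	for _, cf := range a.CharFilters {
		s = cf.Filter(s)
	}
	ts := a.Tokenizer.Tokenize(s)
	for _, tf := range a.TokenFilters {
		ts = tf.Filter(ts)
	}
	return ts
}

// mappingCharFilter rewrites each configured key to its value.
type mappingCharFilter struct{ replacer *strings.Replacer }

func (f mappingCharFilter) Filter(s string) string { return f.replacer.Replace(s) }

// spaceTokenizer splits on whitespace only.
type spaceTokenizer struct{}

func (spaceTokenizer) Tokenize(s string) TokenStream {
	var ts TokenStream
	for _, w := range strings.Fields(s) {
		ts = append(ts, Token{Term: w})
	}
	return ts
}

// lowercaseFilter lowercases every token.
type lowercaseFilter struct{}

func (lowercaseFilter) Filter(ts TokenStream) TokenStream {
	out := make(TokenStream, len(ts))
	for i, t := range ts {
		out[i] = Token{Term: strings.ToLower(t.Term)}
	}
	return out
}

// newExampleAnalyzer wires the stages together for the demo below.
func newExampleAnalyzer() Analyzer {
	return Analyzer{
		CharFilters:  []CharFilter{mappingCharFilter{strings.NewReplacer(":(", "sad")}},
		Tokenizer:    spaceTokenizer{},
		TokenFilters: []TokenFilter{lowercaseFilter{}},
	}
}

func main() {
	for _, t := range newExampleAnalyzer().Analyze("I feel TIRED :(") {
		fmt.Println(t.Term)
	}
}
```

Keeping each stage behind an interface means a different tokenizer (for example a morphological one for Japanese) can be swapped in without touching `Analyze`.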
  26. Implementing the Char Filter
    CharFilter interface
    • Filter method
    Example implementation: MappingCharFilter
    • Maps each Key to its Value
  27. Implementing the Tokenizer
    Tokenizer interface
    • Tokenize method
    Example implementation: StandardTokenizer
    • String → TokenStream
    • Splits wherever a character is neither a letter nor a digit
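
The splitting rule on the slide above (split wherever a character is neither a letter nor a digit) maps directly onto `strings.FieldsFunc` from the standard library. The function name `tokenize` and the plain `[]string` return type are simplifications for this sketch; the talk's tokenizer returns a TokenStream.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits s at every rune that is neither a letter nor a digit.
// Runs of such separators produce no empty tokens.
func tokenize(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	fmt.Println(tokenize("I have 2 pens, a desk."))
}
```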
  28. Implementing the Token Filter
    TokenFilter interface
    • Filter method
    Example implementation: LowercaseFilter
    • Lowercases with strings.ToLower
    Example implementation: StopWordFilter
    • Removes the target words from the TokenStream
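
The two filters named on the slide above can be sketched as plain functions over a token slice. The function names, the `[]string` token representation, and the stop word list are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// lowercaseFilter returns a copy of tokens with every token lowercased.
func lowercaseFilter(tokens []string) []string {
	out := make([]string, len(tokens))
	for i, t := range tokens {
		out[i] = strings.ToLower(t)
	}
	return out
}

// stopWordFilter returns the tokens that are not in stopWords.
func stopWordFilter(tokens []string, stopWords map[string]bool) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if !stopWords[t] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	stop := map[string]bool{"i": true, "a": true, "the": true} // hypothetical stop word list
	tokens := lowercaseFilter([]string{"I", "feel", "TIRED", "sad"})
	fmt.Println(stopWordFilter(tokens, stop))
}
```

Order matters here: the stop word list is lowercase, so the lowercase filter has to run first or "I" would slip through.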
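  29. Implementing the Token Filter: ReadingformFilter
    • A filter that converts kanji to romaji
    • kanji → hiragana → romaji
      • kanji → hiragana (done by the morphological analyzer)
      • hiragana → romaji (self-made)
    Implementation
    • A naive mapping plus Hepburn romanization rules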
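  30. Implementing the inverted index
    Inverted index (dictionary + posting lists)
    • Map: token ID → posting list
    Posting list
    • A linked list of postings
    Posting
    • Document ID
    • Positions within the document (for phrase search)
    • Pointer to the next posting
  31. Implementing the inverted index
    Figure: a map keyed by TokenID (0, 1, 6, 17, 90, 198); the PostingList for TokenID 0 holds the Postings 3 → 5 → 8 → 14 → nil, linked by Next.
    ※ The tail points to nil.
    This shows that the token with TokenID 0 appears in documents 3, 5, 8, and 14.
  32. Implementing the inverted index
    Figure: a Postings node for document 7 is inserted between 5 and 8 (3 → 5 → 7 → 8 → 14 → nil) by re-linking pointers only.
    Linked list: adding or deleting data right after a given node is O(1).
    A posting list must keep document IDs in ascending order, so this operation is frequent and a linked list is a good fit.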
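
A minimal sketch of the data structure on the three slides above, assuming field and method names (`Posting`, `PostingList`, `Insert`, `DocIDs`) that are not taken from the talk's source code. `Insert` walks to the right place and then re-links two pointers, which is the O(1) step the slide refers to.

```go
package main

import "fmt"

// Posting is one node of a posting list.
type Posting struct {
	DocID     int
	Positions []int    // positions in the document, used for phrase search
	Next      *Posting // nil at the tail
}

// PostingList is a singly linked list of postings in ascending DocID order.
type PostingList struct{ Head *Posting }

// Insert adds docID while keeping ascending order. It does nothing if docID is present.
func (l *PostingList) Insert(docID int) {
	if l.Head == nil || docID < l.Head.DocID {
		l.Head = &Posting{DocID: docID, Next: l.Head}
		return
	}
	cur := l.Head
	for cur.Next != nil && cur.Next.DocID <= docID {
		cur = cur.Next
	}
	if cur.DocID == docID {
		return
	}
	// Only pointers change: the new node goes between cur and cur.Next.
	cur.Next = &Posting{DocID: docID, Next: cur.Next}
}

// DocIDs collects the document IDs by following Next until nil.
func (l *PostingList) DocIDs() []int {
	var ids []int
	for p := l.Head; p != nil; p = p.Next {
		ids = append(ids, p.DocID)
	}
	return ids
}

func main() {
	// The inverted index: token ID -> posting list.
	index := map[int]*PostingList{0: &PostingList{}}
	for _, id := range []int{3, 5, 8, 14} {
		index[0].Insert(id)
	}
	index[0].Insert(7) // lands between 5 and 8
	fmt.Println(index[0].DocIDs())
}
```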
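  33. Implementing the Indexer (a concrete example)
    • The Analyzer analyzes the document to add: "Go PHP" → go, php
    • The Indexer adds the tokens to the in-memory inverted index
    • When the size of the in-memory inverted index exceeds a threshold: read the inverted index from MySQL, merge the in-memory and MySQL inverted indexes, and write the result back to MySQL (persistence)
    (The threshold is a trade-off between the number of storage accesses and memory usage.)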
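
The buffer-then-merge flow above can be sketched as follows. Everything here is hypothetical: the `Storage` interface and its `Load`/`Save` methods stand in for the MySQL-backed persistence layer, `memStorage` is a map-based fake so the example runs without a database, posting lists are plain slices, and the threshold is counted in distinct buffered tokens, which the talk does not specify.

```go
package main

import "fmt"

// Storage is a hypothetical persistence layer (MySQL plays this role in the talk).
type Storage interface {
	Load(token string) []int
	Save(token string, docIDs []int)
}

// memStorage is an in-memory fake used only for this example.
type memStorage map[string][]int

func (m memStorage) Load(token string) []int         { return m[token] }
func (m memStorage) Save(token string, docIDs []int) { m[token] = docIDs }

// Indexer buffers postings in memory and flushes them once the buffer is large enough.
type Indexer struct {
	storage   Storage
	buffer    map[string][]int // in-memory inverted index
	threshold int              // flush when this many distinct tokens are buffered (assumed unit)
}

func NewIndexer(s Storage, threshold int) *Indexer {
	return &Indexer{storage: s, buffer: map[string][]int{}, threshold: threshold}
}

// Add records the analyzed tokens of one document. Document IDs are assumed to increase.
func (ix *Indexer) Add(docID int, tokens []string) {
	for _, t := range tokens {
		ids := ix.buffer[t]
		if len(ids) == 0 || ids[len(ids)-1] != docID {
			ix.buffer[t] = append(ids, docID)
		}
	}
	if len(ix.buffer) >= ix.threshold {
		ix.Flush()
	}
}

// Flush reads each stored list, appends the buffered IDs, and writes the result back.
func (ix *Indexer) Flush() {
	for t, ids := range ix.buffer {
		merged := append(append([]int{}, ix.storage.Load(t)...), ids...)
		ix.storage.Save(t, merged)
	}
	ix.buffer = map[string][]int{}
}

func main() {
	store := memStorage{}
	ix := NewIndexer(store, 3)
	ix.Add(1, []string{"go", "ruby", "php"})
	ix.Add(2, []string{"go", "php", "python"})
	fmt.Println(store.Load("go"), store.Load("ruby"))
}
```

A higher threshold means fewer round trips to storage but more memory held by the buffer, which is the trade-off noted on the slide.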
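  34. Implementing the Indexer
    The Indexer source code is mostly traversal of the posting lists' linked lists.
  35. Implementing the Searcher
    • The Analyzer analyzes the search query: "Go PHP" → go, php
    • The Searcher reads the posting lists for those tokens from MySQL into memory
    • It traverses the posting lists and returns the matching documents
  36. Implementing the Searcher
    How should the posting lists be traversed to find matching documents?
    DAAT (Document At A Time)
    • Open cursors on the posting lists of every token in the query and process them simultaneously
    • Traverses document by document
    • Also used by Elasticsearch (Lucene)
    ※ The counterpart of DAAT is TAAT (Term At A Time), which the main talk does not cover. Reference material: https://www.slideshare.net/tsubosaka/top-kquery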
  37. AND search with DAAT
    Example: "PHP" AND "Ruby", with ruby → 2, 5 and php → 1, 2, 4, 5
    The cursors point to different documents (ruby: 2, php: 1), so advance a cursor. Result = []
  38. AND search with DAAT
    The cursors point to the same document (2), so add it to the result. Result = [2]
  39. AND search with DAAT
    The cursors point to different documents (ruby: 5, php: 4), so advance a cursor. Result = [2]
  40. AND search with DAAT
    The cursors point to the same document (5), so add it to the result. Result = [2, 5]
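
The four steps above amount to a two-cursor intersection of sorted lists. The sketch below uses plain sorted slices instead of the linked lists from the talk, and the name `intersect` is an assumption; when the cursors disagree, the one pointing at the smaller document ID is the one that advances.

```go
package main

import "fmt"

// intersect walks two ascending posting lists with one cursor each
// and collects the document IDs present in both.
func intersect(a, b []int) []int {
	var result []int
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]: // same document: it matches the AND query
			result = append(result, a[i])
			i++
			j++
		case a[i] < b[j]: // a is behind: advance its cursor
			i++
		default: // b is behind: advance its cursor
			j++
		}
	}
	return result
}

func main() {
	ruby := []int{2, 5}
	php := []int{1, 2, 4, 5}
	fmt.Println(intersect(ruby, php)) // [2 5]
}
```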
  41. OR search with DAAT
    Example: "PHP" OR "Ruby", with ruby → 2, 5 and php → 1, 2, 4, 5
    Move the cursors and add every ID without duplicates (a primitive approach).
    ※ Real search engines optimize OR search (not covered in this talk).
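  42. Phrase search
    Search for "Amazon Prime" over the following two documents:
    • D1: "Amazon Prime movies"
    • D2: "a prime concern of Amazon"
    Postings carry positions within the document → word order can be taken into account → only D1 is made to hit.
  43. Phrase search
    Searching with the query "Go PHP"
    Documents registered in the inverted index: 1: "Go Ruby PHP", 2: "Go PHP Python"
    Postings (document ID:position): go → 1:1, 2:1; php → 1:3, 2:2
    In the query, Go is at position 0 and PHP at position 1, so compute relative positions: go → 1:1-0, 2:1-0; php → 1:3-1, 2:2-1
    Relative positions (document ID:relative position): go → 1:1, 2:1; php → 1:2, 2:1
    Equal relative positions → the document contains the phrase (document 2, where both are 1).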
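
The relative-position check above can be sketched like this. The index layout (token → document ID → positions), the 1-based positions, and the names `Index`, `phraseSearch`, `matchesAt`, and `exampleIndex` are assumptions for the example. A document matches when some start position `p` of the first term has term `k` of the query at position `p + k`, which is the same as saying all relative positions are equal.

```go
package main

import (
	"fmt"
	"sort"
)

// Index maps token -> document ID -> positions of the token in that document.
type Index map[string]map[int][]int

// matchesAt reports whether terms[k] occurs at position start+k in docID for every k.
func matchesAt(idx Index, terms []string, docID, start int) bool {
	for offset, term := range terms {
		found := false
		for _, pos := range idx[term][docID] {
			if pos == start+offset {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// phraseSearch returns the sorted IDs of documents containing terms as a consecutive phrase.
func phraseSearch(idx Index, terms []string) []int {
	if len(terms) == 0 {
		return nil
	}
	var hits []int
	for docID, starts := range idx[terms[0]] {
		for _, start := range starts {
			if matchesAt(idx, terms, docID, start) {
				hits = append(hits, docID)
				break
			}
		}
	}
	sort.Ints(hits) // map iteration order is random, so sort for stable output
	return hits
}

// exampleIndex holds documents 1: "Go Ruby PHP" and 2: "Go PHP Python".
func exampleIndex() Index {
	return Index{
		"go":     {1: {1}, 2: {1}},
		"ruby":   {1: {2}},
		"php":    {1: {3}, 2: {2}},
		"python": {2: {3}},
	}
}

func main() {
	fmt.Println(phraseSearch(exampleIndex(), []string{"go", "php"})) // [2]
}
```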
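  44. Implementing the Searcher
    The Searcher source code, too, is mostly operations on the posting lists' linked lists.
  45. Implementing the Storage
    MySQL is used for persistence
    • I wanted an easy way to assign IDs to tokens and documents
      • AUTO_INCREMENT
    • I did not want to spend effort on persistence
      • MySQL is what I am used to
    • The persistence layer is abstracted so that it is easy to replace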
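  46. Compressing the inverted index
    The time to read the inverted index from storage can become the bottleneck → store the inverted index compressed.
    Compress so that the following inequality holds:
    (time to read the compressed inverted index from storage) + (time to restore the inverted index) < (time to read the uncompressed inverted index from storage)
    Figure: 600MB (compressed): read → restore → done; 1GB (uncompressed): read → done.
    ※ This was not actually measured for the talk.
  47. Compressing the inverted index
    Figure: the posting list for "go" holds a long run of document IDs (1, 4, 21, 241, 412, 500, 560, 600, 888, 1324, 2000, 2321, ...).
    A posting list can hold hundreds of thousands of document IDs, so it gets large → compress by combining variable-length encoding with a delta sequence.
  48. Compressing the inverted index: variable-length codes
    Integers are normally given a fixed-length code of 4 or 8 bytes. For example, in a 64-bit int, 1 and 9223372036854775807 take the same number of bytes.
    Posting lists, however, mostly hold small integers.
    → Use a variable-length encoding that represents small integers in fewer bytes.
  49. Compressing the inverted index: delta sequence
    Document IDs for "go": 1, 5, 50, 250, 500, 800, 1000, 2500, 2800
    Document IDs are sorted in ascending order. Taking the differences → less information to encode under variable-length encoding.
    Document ID deltas for "go": 1, 4, 45, 200, 250, 300, 200, 1500, 300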
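
The delta step above is easy to try out. In this sketch `toDeltas` and `fromDeltas` implement the difference sequence from the slide, and `varintSize` measures the encoded size using `binary.PutUvarint` from the standard `encoding/binary` package. That varint is a stand-in chosen for the illustration; the talk itself relies on `encoding/gob` for variable-length encoding. The `dense` list is made-up data.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// toDeltas turns ascending IDs into first-ID-then-differences.
func toDeltas(ids []uint64) []uint64 {
	deltas := make([]uint64, len(ids))
	var prev uint64
	for i, id := range ids {
		deltas[i] = id - prev
		prev = id
	}
	return deltas
}

// fromDeltas restores the original IDs with a running sum.
func fromDeltas(deltas []uint64) []uint64 {
	ids := make([]uint64, len(deltas))
	var sum uint64
	for i, d := range deltas {
		sum += d
		ids[i] = sum
	}
	return ids
}

// varintSize returns how many bytes the values take as unsigned varints.
func varintSize(values []uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	total := 0
	for _, v := range values {
		total += binary.PutUvarint(buf, v)
	}
	return total
}

func main() {
	ids := []uint64{1, 5, 50, 250, 500, 800, 1000, 2500, 2800} // from the slide
	fmt.Println(toDeltas(ids))

	dense := []uint64{100000, 100003, 100010, 100200} // hypothetical large, closely spaced IDs
	fmt.Println(varintSize(dense), varintSize(toDeltas(dense)))
}
```

How much is saved depends on the data and on the encoding: the gain is largest when IDs are large but close together, as in the `dense` list.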
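  50. Compressing the inverted index: variable-length encoding
    • Variable-length encoding is not hand-written; the standard package "encoding/gob" is used
    • JSON, XML, Protocol Buffers and others exist, but in a Go-only environment gob is more convenient
      • No tags such as json:"hoge" are needed
      • Binary rather than text-based, so nothing is wasted
      • Self-describing
      • Go values such as structs, nil fields included, can be encoded (a linked list worked too)
    • How gob encodes an unsigned int
      • 0 → 00000000 (1 byte)
      • 7 → 00000111 (1 byte)
      • 256 → 11111110 00000001 00000000 (3 bytes)
    https://blog.golang.org/gob
    https://golang.org/pkg/encoding/gob/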
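
The remark above that a linked list can be gob-encoded can be checked with a short round trip. The `Posting` type and the `encode`, `decode`, and `exampleList` helpers are assumptions for this sketch; only `gob.NewEncoder`, `gob.NewDecoder`, `Encode`, and `Decode` come from the standard library. Fields must be exported for gob to see them, and the nil `Next` at the tail is simply left out of the stream and comes back as nil.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Posting is a linked-list node. gob only encodes exported fields.
type Posting struct {
	DocID     uint64
	Positions []uint64
	Next      *Posting
}

// encode serializes the list starting at head (head must not be nil).
func encode(head *Posting) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(head); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode restores a list produced by encode.
func decode(data []byte) (*Posting, error) {
	var head Posting
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&head); err != nil {
		return nil, err
	}
	return &head, nil
}

// exampleList builds the two-node list 3 -> 5 -> nil.
func exampleList() *Posting {
	return &Posting{
		DocID:     3,
		Positions: []uint64{1},
		Next:      &Posting{DocID: 5, Positions: []uint64{2, 7}},
	}
}

func main() {
	data, err := encode(exampleList())
	if err != nil {
		panic(err)
	}
	restored, err := decode(data)
	if err != nil {
		panic(err)
	}
	for p := restored; p != nil; p = p.Next {
		fmt.Println(p.DocID, p.Positions)
	}
}
```

This works for a list that ends in nil. A list with a cycle would not, since gob handles recursive types but not recursive values.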
  51. Compressing the inverted index
  52. Outline: 4. Closing (impressions from building a search engine)
  53. Closing
    • Reinventing the wheel let me keep a cycle of input and output going
    • Building things yourself is fun (motivation)
    • You have to touch technology you do not know well (input)
      • Example: encoding
        • encoding/gob
        • encoding/binary
        • protobuf
    • There is demand for it (output)
      • Blog post → conference talk
      • Output-driven development
  54. References
    • 検索エンジン自作入門 〜手を動かしながら見渡す検索の舞台裏 (book)
    • 情報検索: 検索エンジンの実装と評価 (book)
    • https://logmi.jp/tech/articles/322211
    • https://www.slideshare.net/tsubosaka/top-kquery
    • https://artem.krylysov.com/blog/2020/07/28/lets-build-a-full-text-search-engine/
    • https://github.com/blevesearch/bleve
    Repository: https://github.com/kotaroooo0/stalefish
    Blog post: https://kotaroooo0-dev.hatenablog.com/entry/toy-search-engine
  55. Appendix

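  56. TAAT (Term At A Time)
    • The posting-list traversal method that is the counterpart of DAAT
    • Processes posting lists one at a time
    • Only one posting-list cursor is open at any time
    • Traverses term by term
  57. AND search with TAAT
    Example: "PHP" AND "Ruby", with ruby → 2, 5 and php → 1, 2, 4, 5
    1. Pick the smallest posting list, "ruby", and create the Accumulator [2, 5]
    2. Scan the "php" list
      • It is enough to check only: is ID 2 present? is ID 5 present?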
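
The two TAAT steps above can be sketched as follows. The function name `taatAnd` is an assumption, posting lists are plain sorted slices, and the membership check uses binary search (`sort.SearchInts`), which is one possible way to answer "is this ID present?" and not necessarily what the talk's code does.

```go
package main

import (
	"fmt"
	"sort"
)

// taatAnd intersects ascending posting lists one term at a time.
// The shortest list seeds the accumulator; every other list only filters it.
func taatAnd(lists [][]int) []int {
	if len(lists) == 0 {
		return nil
	}
	// Copy before sorting so the caller's slice order is left alone.
	sorted := append([][]int{}, lists...)
	sort.Slice(sorted, func(i, j int) bool { return len(sorted[i]) < len(sorted[j]) })

	acc := append([]int{}, sorted[0]...) // the accumulator
	for _, list := range sorted[1:] {
		kept := acc[:0] // filter in place
		for _, id := range acc {
			i := sort.SearchInts(list, id)
			if i < len(list) && list[i] == id {
				kept = append(kept, id)
			}
		}
		acc = kept
	}
	return acc
}

func main() {
	ruby := []int{2, 5}
	php := []int{1, 2, 4, 5}
	fmt.Println(taatAnd([][]int{php, ruby})) // [2 5]
}
```

Starting from the shortest list keeps the accumulator as small as possible, so the later lists are probed only for the few IDs that can still match.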
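  58. OR search with TAAT
    Example: "PHP" OR "Ruby", with ruby → 2, 5 and php → 1, 2, 5
    1. Any key will do; create the Accumulator [1, 2, 5] (php)
    2. Add every ID that is not already present to the Accumulator (a primitive approach)
    ※ Real search engines optimize OR search (not covered in this talk).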