Upgrade to Pro — share decks privately, control downloads, hide ads and more …

検索エンジン自作入門 Go Conference 2021 Spring

検索エンジン自作入門 Go Conference 2021 Spring

Go Conference 2021 Springの登壇資料です

アウトライン
1. 検索エンジンとは ~ 一般的な検索エンジンの仕組みと構成要素
2. 自作した検索エンジンの紹介 ~ 具体的に自作した検索エンジンの構成要素と動作例
3. 自作した検索エンジンの実装 ~ アルゴリズムとデータ構造、ライブラリ
4. おわりに ~ 検索エンジンを自作した感想

kotaroooo0

April 24, 2021
Tweet

More Decks by kotaroooo0

Other Decks in Programming

Transcript

  1. Go Conference 2021 Spring


    2021/04/24


    @kotaroooo0
    ݕࡧΤϯδϯ


    ࣗ࡞ೖ໳

    View Slide

  2. ҆ୡޫଠ࿠ @kotaroooo0


    αʔόʔαΠυΤϯδχΞ


    CLIπʔϧ, Bot, ݕࡧΤϯδϯ
    ࣗݾ঺հ
    2

    View Slide

  3. ͜ͷτʔΫ: ࣗ࡞ݕࡧΤϯδϯͷ࿩
    https://kotaroooo0-dev.hatenablog.com/
    3

    View Slide

  4. λʔήοτ
    ݕࡧΤϯδϯʹڵຯ͋Δ


    ݕࡧΤϯδϯΛࣗ࡞͍ͨ͠
    ΰʔϧ
    ݕࡧΤϯδϯͷ࢓૊Έ͕෼͔Δ


    GoͰݕࡧΤϯδϯΛࣗ࡞࢝͠ΊΒΕΔ
    4

    View Slide

  5. Ξ΢τϥΠϯ
    1. ݕࡧΤϯδϯͱ͸


    ҰൠతͳݕࡧΤϯδϯͷ࢓૊Έͱߏ੒ཁૉ


    2. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ঺հ


    ۩ମతʹࣗ࡞ͨ͠ݕࡧΤϯδϯͷߏ੒ཁૉͱಈ࡞ྫ


    3. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ࣮૷


    ΞϧΰϦζϜͱσʔλߏ଄ɺϥΠϒϥϦ


    4. ͓ΘΓʹ


    ݕࡧΤϯδϯΛࣗ࡞ͨ͠ײ૝
    5

    View Slide

  6. Ξ΢τϥΠϯ
    1. ݕࡧΤϯδϯͱ͸


    ҰൠతͳݕࡧΤϯδϯͷ࢓૊Έͱߏ੒ཁૉ


    2. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ঺հ


    3. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ࣮૷


    4. ͓ΘΓʹ
    6

    View Slide

  7. ݕࡧΤϯδϯͱ͸
    υΩϡϝϯτͷू߹͔ΒΫΤϦʹద͢ΔυΩϡϝϯτΛฦ͢
    ݕࡧΤϯδϯ
    ΫΤϦೖྗ
    υΩϡϝϯτฦ٫ υΩϡϝϯτొ࿥
    🤖
    👦
    Ϣʔβʔଆ ఏڙଆ
    ಛʹෳ਺υΩϡϝϯτ͔ΒಛఆͷจࣈྻΛݕࡧ͢Δ͜ͱΛશจݕࡧ
    7
    (GoogleݕࡧͰ͸Webϖʔδ)
    (GoogleݕࡧͰ͸ΫϩʔϦϯά)

    View Slide

  8. શจݕࡧͷछྨ
    G O R U B Y J S
    J S
    J S
    J S
    J S
    J S
    G O
    R U B Y G O R U B Y J S
    Grepܕ
    ΠϯσοΫεܕ
    ྫ: υΩϡϝϯτʮGO RUBY JSʯʹจࣈྻʮJSʯؚ͕·Ε͍ͯΔ͔ݕࡧ
    8
    ࣄલʹݕࡧର৅ͷυΩϡϝϯτͷࡧҾσʔλΛ࡞੒

    View Slide

  9. ΠϯσοΫεܕݕࡧΤϯδϯͷߏ੒
    ΠϯσοΫε
    ΫΤϦೖྗ
    υΩϡϝϯτొ࿥
    Analyzer
    Analyzer
    Indexer
    Searcher
    υΩϡϝϯτฦ٫
    9
    Sorter
    ҎԼͷཁૉͰߏ੒͞Ε͍ͯ·͢(ৄࡉ͸͜ͷޙઆ໌͠·͢)


    ※Sorter͸࣮૷͍ͯ͠ͳ͍ͨΊɺҎޙͷઆ໌Ͱ͸লུ

    View Slide

  10. ΠϯσοΫε
    సஔΠϯσοΫε(ࣙॻ+ϙεςΟϯάϦετ)͕Ұൠత
    go
    js
    ruby
    1
    1
    2 3
    4
    5
    5
    ࣙॻ ϙεςΟϯάϦετ
    10

    View Slide

  11. Indexing
    “I have pens.”


    “We have a desk.”
    Indexer
    Analyzer
    จࣈྻΛ୯ޠ΁෼ղ సஔΠϯσοΫε࡞੒
    11
    1
    2
    have
    pen
    we
    desk
    1
    1
    2
    2
    2
    సஔΠϯσοΫε
    υΩϡϝϯτΛసஔΠϯσοΫεʹొ࿥͢Δ

    View Slide

  12. Search
    ΫΤϦ


    “PENS”
    υΩϡϝϯτ1͕ώοτ
    Searcher
    Analzyer
    จࣈྻΛ୯ޠ΁෼ղ


    “PENS” → “pen”
    సஔΠϯσοΫεΛ୳ࡧ
    12
    have
    pen
    we
    desk
    1
    1
    2
    2
    2
    సஔΠϯσοΫεΛ୳ࡧͯ͠ద߹͢ΔυΩϡϝϯτΛ୳͢

    View Slide

  13. Analyzer
    Char Filter


    จࣈྻΛ୯ޠ΁෼ׂ͢Δલͷจࣈͷௐ੔


    ྫ: & → &
    13
    Token Filter


    ෼ׂͨ͠୯ޠΛௐ੔


    ྫ: খจࣈ΁౷Ұɺετοϓϫʔυআڈ
    Tokenizer


    จࣈྻΛ୯ޠʹ෼ׂɺදه༳Εٵऩ


    ྫ: I have pens → I, have, pens


    Char
    Filter
    Tokenizer Token
    Filter
    3ͭ߹ΘͤͯAnalyzer
    จࣈྻ ෳ਺୯ޠ
    දه༳ΕΛٵऩ &จࣈྻΛ୯ޠʹ෼ׂ


    Indexing, Search྆ํͰར༻


    ※྆ํͰಉ͡AnalyzerΛ࢖Θͳ͍ͱޮՌ͸ബ͍

    View Slide

  14. Ξ΢τϥΠϯ
    1. ݕࡧΤϯδϯͱ͸


    2. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ঺հ


    ۩ମతʹࣗ࡞ͨ͠ݕࡧΤϯδϯͷߏ੒ཁૉͱಈ࡞ྫ


    3. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ࣮૷


    4. ͓ΘΓʹ
    14

    View Slide

  15. ͳͥݕࡧΤϯδϯΛࣗ࡞͍ͯ͠Δͷ͔
    Elasticsearch͕ٕज़తɺϓϩμΫτతʹ໘ന͍


    → શจݕࡧΤϯδϯͷ࢓૊ΈΛ஌Γ͍ͨ
    GoݚमΛडߨ


    → GoͰԿ͔࡞Γ͍ͨ
    GoͰݕࡧΤϯδϯΛ࣮૷͠Α͏💪
    15

    View Slide

  16. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ঺հ
    16
    ※Sorter͸ະ࣮૷

    View Slide

  17. ಈ࡞ྫ Analyzer
    Analyzeલ MappingCharFilter StandardTokenizer LowercaseFilter StemmerFilter StopWordFilter
    I feel TIRED :( I feel TIRED sad I, feel, TIRED, sad i, feel, tired, sad i, feel, tire, sad feel, tire, sad
    ॲཧͷྲྀΕ
    17

    View Slide

  18. Analyzeલ MorphologicalTokenizer ReadingformFilter
    ౦ژͱژ౎ ౦ژ, ͱ, ژ౎ tokyo, to, kyoto
    ॲཧͷྲྀΕ
    ಈ࡞ྫ Analyzer
    18

    View Slide

  19. ಈ࡞ྫ Indexing & Search
    19
    ΠϯσοΫε΁௥Ճ
    1: “Go Ruby PHP”
    2: “Go PHP Python”
    ϑϨʔζݕࡧ ”go php”
    ANDݕࡧ ”go php”
    2ͷΈώοτ
    1,2͕ώοτ
    1⃣
    2⃣
    3⃣
    1⃣
    2⃣
    3⃣
    3: “Go Python Ruby”

    View Slide

  20. ಈ࡞ྫ ઃఆ΍ॳظԽ
    • సஔΠϯσοΫεΛӬଓԽ͢ΔMySQLઃఆ


    • MySQL΁ΞΫηε͢ΔӬଓԽ૚ͷॳظԽ


    • ΞφϥΠβͷॳظԽ


    • CharFilter: ͳ͠


    • Tokenizer: εϖʔεͰ۠੾Δ(Standard Tokenizer)


    • Token Filter: খจࣈԽ
    20

    View Slide

  21. ಈ࡞ྫ Indexing
    • సஔΠϯσοΫεΛ࡞੒͢ΔIndexerͷॳظԽ


    • ҎԼͷυΩϡϝϯτΛIndexing(సஔΠϯσοΫεΛੜ੒͠MySQL΁อଘ)


    1: “Go Ruby PHP”


    • Analyzeޙ: go, ruby, php


    2: “Go PHP Python”


    • Analyzeޙ: go, php, python


    3: “Go Python Ruby”


    • Analyzeޙ: go, python, ruby
    21
    go
    1
    1:1
    ruby
    php
    2:1 3:1
    1:2
    1:3
    3:3
    2:2
    ɾ


    ɾ


    ɾ
    υΩϡϝϯτID:ग़ݱҐஔ

    View Slide

  22. ಈ࡞ྫ Search
    ANDݕࡧ(“GO PHP”)


    • ΞφϥΠβͰେจࣈখจࣈͷදه༳ΕΛٵऩ


    • “Go”ͱ”PHP”͕྆ํؚ·ΕΔυΩϡϝϯτ1,2͕ώοτ
    22
    1: “Go Ruby PHP”
    2: “Go PHP Python”

    View Slide

  23. ಈ࡞ྫ Search
    ϑϨʔζݕࡧ(“GO PHP”)


    • ΞφϥΠβͰେจࣈখจࣈͷදه༳ΕΛٵऩ


    • ϑϨʔζ”Go PHP”ΛؚΉυΩϡϝϯτ2ͷΈώοτ
    23
    2: “Go PHP Python”

    View Slide

  24. Ξ΢τϥΠϯ
    1. ݕࡧΤϯδϯͱ͸


    2. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ঺հ


    3. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ࣮૷


    ΞϧΰϦζϜͱσʔλߏ଄ɺϥΠϒϥϦ


    4. ͓ΘΓʹ
    24

    View Slide

  25. Analyzerͷ࣮૷
    25
    Analyzerߏ଄ମ


    • CharFilter InterfaceͷSlice


    • Tokenizer Interface(1ͭͷΈ)


    • ToknenFilter InterfaceͷSlice


    Analyzeϝιου


    • string → TokenStream(TokenͷSlice)


    • CharFilter, Tokenizer, TokenFilterͷॱʹ
    దԠ

    View Slide

  26. Char Filterͷ࣮૷
    26
    CharFilter Interface


    • Filterϝιου


    ࣮૷ྫ: MappingCharFilter


    • Key → Value΁Ϛοϐϯά͢Δ

    View Slide

  27. Tokenizerͷ࣮૷
    27
    Tokenizer Interface


    • Tokenizeϝιου


    ࣮૷ྫ: StandardTokenizer


    • String → TokenStream


    • จࣈͰ΋਺ࣈͰ΋ͳ͚Ε͹෼ׂ

    View Slide

  28. Token Filterͷ࣮૷
    28
    TokenFilter Interface


    • Filterϝιου


    ࣮૷ྫ: LowercaseFilter


    • strings.ToLowerͰখจࣈԽ


    ࣮૷ྫ: StopWordFilter


    • ର৅ͷจࣈΛTokenStream͔Βল͘

    View Slide

  29. Token Filterͷ࣮૷
    ReadingformFilter


    • ׽ࣈ → ϩʔϚࣈ΁ม׵͢ΔFilter


    • ׽ࣈ → ฏԾ໊ → ϩʔϚࣈ


    • ׽ࣈ → ฏԾ໊(ܗଶૉղੳث͕ߦ͏)


    • ฏԾ໊ → ϩʔϚࣈ(ࣗ࡞)


    ࣮૷


    • ۪௚ͳϚοϐϯά+ ϔϘϯࣜͷϧʔϧ
    29

    View Slide

  30. సஔΠϯσοΫεͷ࣮૷
    సஔΠϯσοΫε(ࣙॻ+ϙεςΟϯάϦετ)


    • Ϛοϓ: τʔΫϯID → ϙεςΟϯάϦετ


    ϙεςΟϯάϦετ


    • ϙεςΟϯάͷϦϯΫτϦετ


    ϙεςΟϯά


    • υΩϡϝϯτID


    • υΩϡϝϯτதͷҐஔ৘ใ(ϑϨʔζݕࡧ༻)


    • ࣍ͷϙεςΟϯά΁ͷϙΠϯλ
    30

    View Slide

  31. సஔΠϯσοΫεͷ࣮૷
    31
    0
    TokenID(mapͷkey)
    3
    PostingListͷPostings Postings
    1
    6
    17
    90
    198
    5 8 14
    Postings Postings
    nil
    Next Next Next Next
    ※ऴ୺͸nilΛࢦ͢
    TokenID 0ͷτʔΫϯ͕


    υΩϡϝϯτID3,5,8,14ʹؚ·Ε͍ͯΔ͜ͱΛࣔ͢

    View Slide

  32. సஔΠϯσοΫεͷ࣮૷
    32
    0
    TokenID
    3
    PostingListͷPostings Postings
    1
    3
    7
    90
    198
    5 8 14
    Postings Postings
    nil
    Next
    Next
    Next Next
    ※ऴ୺͸nilΛࢦ͢
    7
    Postings
    Next
    ϦϯΫτϦετ: ͋Δϊʔυ௚ޙͷσʔλͷ௥Ճ࡟আO(1)


    ϙεςΟϯάϦετͰ͸υΩϡϝϯτIDͷঢॱΛΩʔϓ͢ΔͨΊ


    ͜ͷૢ࡞͕ଟ͘ϦϯΫτϦετ͕޲͍͍ͯΔ
    ϙΠϯλͷ෇͚ସ͑ͷΈ

    View Slide

  33. Indexerͷ࣮૷ ( ͷ۩ମྫ)
    ϝϞϦ্ͷసஔΠϯσοΫε
    ௥Ճ͍ͨ͠υΩϡϝϯτ
    go
    php
    go
    php
    MySQL্ͷసஔΠϯσοΫε
    సஔΠϯσοΫε΁௥Ճ
    ϝϞϦͱMySQLͷసஔΠϯσοΫε


    Ϛʔδ
    ӬଓԽ
    ”Go PHP”
    go, php
    Analyze
    33
    Analyzer Indexer
    MySQL͔ΒRead
    1⃣ 2⃣
    3⃣ ϝϞϦ্ͷసஔΠϯσοΫεͷ


    αΠζ͕ᮢ஋Λ௒͑Δͱ…
    4⃣
    3⃣
    MySQL΁Write
    (ᮢ஋͸ετϨʔδ΁ͷΞΫηεճ਺ͱϝϞϦ࢖༻ྔͷτϨʔυΦϑ)

    View Slide

  34. Indexerͷ࣮૷
    Indexerͷιʔείʔυ͸


    ϙεςΟϯάϦετͷ


    ϦϯΫτϦετΛ૸͕ࠪओ
    34

    View Slide

  35. Searcherͷ࣮૷
    35
    go
    php
    go
    php
    MySQL্ͷసஔΠϯσοΫε
    ݕࡧΫΤϦ


    ”Go PHP”
    go, php
    Analyzer
    Analyze
    Searcher
    ద߹ͨ͠υΩϡϝϯτ
    τʔΫϯʹରԠ͢ΔϙεςΟϯάϦετΛ


    MySQL͔ΒϝϞϦ΁Read
    ϙεςΟϯάϦετ


    ૸ࠪ

    View Slide

  36. ద߹͢ΔυΩϡϝϯτΛ୳ͨ͢ΊʹͲ͏ϙεςΟϯάϦετΛͲ͏૸ࠪ͢Δ͔?


    DAAT(Document At A Time)


    • ΫΤϦʹؚ·ΕΔτʔΫϯͷϙεςΟϯάϦετͷΧʔιϧΛ͢΂ͯ։͖ಉ
    ࣌ʹॲཧ


    • υΩϡϝϯτ͝ͱʹ૸ࠪ͢Δ


    • Elasticsearch(Lucene)Ͱ΋࠾༻


    ※DAATʹରͯ͠TAAT(Team At A Time)͕͋Δ͕ࠓճ͸ϊʔλον


    ࢀߟࢿྉΛఴ෇ https://www.slideshare.net/tsubosaka/top-kquery
    Searcherͷ࣮૷
    36

    View Slide

  37. DAATͰͷANDݕࡧ
    Χʔιϧ͕ࢦ͢υΩϡϝϯτ͕ҟͳΔͷͰΧʔιϧΛਐΊΔ


    Result = []
    37
    ྫ “PHP” AND “Ruby” ݕࡧ
    ruby
    php
    1
    2
    5
    5
    2
    4

    View Slide

  38. DAATͰͷANDݕࡧ
    Χʔιϧ͕ࢦ͢υΩϡϝϯτ౳͍͠ͷͰ݁ՌʹυΩϡϝϯτΛՃ͑Δ


    Result = [2]
    38
    ྫ “PHP” AND “Ruby” ݕࡧ
    ruby
    php
    1 5
    5
    2
    4
    2

    View Slide

  39. DAATͰͷANDݕࡧ
    Χʔιϧ͕ࢦ͢υΩϡϝϯτ͕ҟͳΔͷͰΧʔιϧΛਐΊΔ


    Result = [2]
    39
    ྫ “PHP” AND “Ruby” ݕࡧ
    ruby
    php
    1 5
    2
    4
    2 5

    View Slide

  40. DAATͰͷANDݕࡧ
    Χʔιϧ͕ࢦ͢υΩϡϝϯτ౳͍͠ͷͰ݁ՌʹυΩϡϝϯτΛՃ͑Δ


    Result = [2, 5]
    40
    ྫ “PHP” AND “Ruby” ݕࡧ
    ruby
    php
    1 5
    2
    4
    2 5

    View Slide

  41. DAATͰͷORݕࡧ
    ΧʔιϧΛಈ͔ͯ͠ɺશͯͷIDΛॏෳͳ͘௥Ճɹ
    ✌︎
    (‘ω'
    ✌︎
    ) ݪ࢝త (
    ✌︎
    ’ω')
    ✌︎


    ※࣮ࡍͷݕࡧΤϯδϯͰ͸ORݕࡧ͸࠷దԽ͕ߦΘΕ͍ͯΔ(͜ͷτʔΫͰ͸ϊʔλον)
    41
    ྫ “PHP” OR “Ruby” ݕࡧ
    ruby
    php
    1
    2
    5
    5
    2
    4

    View Slide

  42. ҎԼೋͭͷυΩϡϝϯτ΁“Amazon Prime”ͱݕࡧ


    • D1: “Amazon Prime movies”


    • D2: “a prime concern of Amazon”
    ϙεςΟϯάʹυΩϡϝϯτதͷҐஔ৘ใ


    ୯ޠͷॱংΛߟྀͰ͖Δ


    D1ͷΈΛώοτͤ͞ΒΕΔ
    42
    ϑϨʔζݕࡧ

    View Slide

  43. ΫΤϦ“Go PHP”Ͱݕࡧ͢Δ৔߹
    43
    ϑϨʔζݕࡧ
    go
    1:1
    php
    2:1
    1:3 2:2
    υΩϡϝϯτID:ग़ݱҐஔ
    go
    1:1-0
    php
    2:1-0
    1:3-1 2:2-1
    υΩϡϝϯτID:૬ରग़ݱҐஔ
    go
    1:1
    php
    2:1
    1:2 2:1
    υΩϡϝϯτID:૬ରग़ݱҐஔ
    ૬ରग़ݱҐஔ͕౳͍͠ → ϑϨʔζΛؚΜͰ͍Δ
    1: “Go Ruby PHP”
    2: “Go PHP Python”
    ΫΤϦ্ͰGo͸0൪໨ɺPHP͸1൪໨ͳͷͰ૬ରग़ݱҐஔΛܭࢉ͢Δ
    సஔΠϯσοΫεʹొ࿥ͨ͠υΩϡϝϯτ

    View Slide

  44. Searcherͷιʔείʔυ΋


    ϙεςΟϯάϦετͷ


    ϦϯΫτϦετͷૢ࡞͕ओ
    44
    Searcherͷ࣮૷

    View Slide

  45. Storageͷ࣮૷
    ӬଓԽʹMySQLΛ࠾༻


    • τʔΫϯ΍υΩϡϝϯτͷ࠾൪Λ؆୯ʹߦ͍͔ͨͬͨ


    • AUTO_INCREMENT


    • ӬଓԽʹ࿑ྗΛׂ͖ͨ͘ͳ͍


    • ࢖͍׳Ε͍ͯΔMySQL


    • औΓସ͑΍͍͢Α͏ʹӬଓԽ૚Λந৅Խ
    45

    View Slide

  46. సஔΠϯσοΫεͷѹॖ
    సஔΠϯσοΫεΛετϨʔδ͔ΒಡΈग़͕࣌ؒ͢ϘτϧωοΫʹͳΔ͜ͱ͕͋Δ


    → సஔΠϯσοΫεΛѹॖͯ͠อଘ͢Δ
    46
    ѹॖࡁͷసஔΠϯσοΫεΛ


    ετϨʔδ͔ΒಡΈग़࣌ؒ͢
    ະѹॖͷసஔΠϯσοΫεΛ


    ετϨʔδ͔ΒಡΈग़࣌ؒ͢
    <
    ্ͷෆ౳͕ࣜ੒ΓཱͭΑ͏ʹѹॖ ※ࠓճ͸ܭଌ·Ͱ͸͍ͯ͠ͳ͍
    600MB 1GB
    ׬ྃ
    ಡΈग़͠ ෮ݩ ಡΈग़͠ ׬ྃ
    సஔΠϯσοΫεͷ෮ݩ࣌ؒ
    ʴ

    View Slide

  47. సஔΠϯσοΫεͷѹॖ
    go
    υΩϡϝϯτID


    ̍ 4 21 241 412 500 560 600 888
    1324 2000 2321 10940 23131 29898
    3001 8090 9012 3001 8090 9012
    901 1201
    ϙεςΟϯάϦετʹυΩϡϝϯτID͕Կेສͱ͋Δ৔߹΋͋Γ༰ྔ͕େ͖͘ͳΔ


    → Մม௕ූ߸Խͱࠩ෼ྻΛ૊Έ߹Θͤͯѹॖ
    47

    View Slide

  48. సஔΠϯσοΫεͷѹॖ Մม௕ූ߸
    ੔਺஋ʹରͯ͠4όΠτ΍8όΠτͷݻఆ௕ූ߸ΛׂΓ౰ͯΔͷ͕ී௨


    ྫ͑͹64ϏοτͷintͰ͸1΋9223372036854775807΋ಉ͡όΠτ਺


    ͔͠͠


    ϙεςΟϯάϦετ͸খ͍͞੔਺஋Λଟ͘ѻ͏
    48
    খ͍͞੔਺஋ΛΑΓখ͍͞όΠτ਺Ͱදݱ͢ΔՄม௕ූ߸ԽΛߦ͏

    View Slide

  49. సஔΠϯσοΫεͷѹॖ ࠩ෼ྻ
    go
    ̍ 5 50 250 500 800 1000 2500 2800
    υΩϡϝϯτID͸ঢॱʹιʔτ͞Ε͍ͯΔ


    ࠩ෼ΛͱΔ → Մม௕ූ߸Խʹ͓͍ͯ৘ใྔ͕গͳ͘ͳΔ
    go
    ̍ 4 45 200 250 300 200 1500 300
    υΩϡϝϯτIDࠩ෼
    49

    View Slide

  50. సஔΠϯσοΫεͷѹॖ Մม௕ූ߸Խ
    • Մม௕ූ߸Խ͸ࣗ࡞ͤͣʹඪ४ύοέʔδ”encoding/gob"Λར༻


    • JSON΍XML, Protocol buffersͳͲ͋Δ͕Goʹดͨ͡؀ڥͰ͸gobͷํ͕ศར


    • json:”hoge"ͷΑ͏ͳλά͕ෆཁ


    • ςΩετϕʔεͰͳ͘όΠφϦʔͰແବ͕ͳ͍


    • ࣗݾهड़త


    • Goͷม਺ͳΒstructͰ΋nilͰ΋ΤϯίʔυͰ͖Δ(ϦϯΫτϦετ΋Ͱ͖ͨ)


    • gobͷunsigned intͷΤϯίʔυ


    • 0 → 00000000(1byte)


    • 7 → 00000111(1byte)


    • 256 → 11111110 000000001 00000000(3byte)
    https://blog.golang.org/gob


    https://golang.org/pkg/encoding/gob/ 50

    View Slide

  51. సஔΠϯσοΫεͷѹॖ
    51

    View Slide

  52. Ξ΢τϥΠϯ
    1. ݕࡧΤϯδϯͱ͸


    2. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ঺հ


    3. ࣗ࡞ͨ͠ݕࡧΤϯδϯͷ࣮૷


    4. ͓ΘΓʹ


    ݕࡧΤϯδϯΛࣗ࡞ͨ͠ײ૝
    52

    View Slide

  53. ͓ΘΓʹ
    53
    • ंྠͷ࠶ൃ໌ͰΠϯϓοτͱΞ΢τϓοτͷαΠΫϧΛճͤͨ


    • ࣗ࡞ָ͕͍͠(Ϟνϕʔγϣϯ)


    • ৄ͘͠ͳ͍ٕज़ʹ৮ΕΔඞཁ(Πϯϓοτ)


    • (ྫ)ΤϯίʔσΟϯά


    • encoding/gob


    • encoding/binary


    • protobuf


    • धཁ͕͋Δ(Ξ΢τϓοτ)


    • ϒϩά౤ߘ→ొஃ


    • Ξ΢τϓοτۦಈ։ൃ
    INPUT OUTPUT

    View Slide

  54. ࢀߟ
    54
    ݕࡧΤϯδϯࣗ࡞ೖ໳ɹʙखΛಈ͔͠ͳ͕Βݟ౉͢ݕࡧͷ෣୆ཪ


    ৘ใݕࡧ ݕࡧΤϯδϯͷ࣮૷ͱධՁ


    https://logmi.jp/tech/articles/322211


    https://www.slideshare.net/tsubosaka/top-kquery


    https://artem.krylysov.com/blog/2020/07/28/lets-build-a-full-text-search-engine/


    https://github.com/blevesearch/bleve


    ϦϙδτϦ: https://github.com/kotaroooo0/stalefish


    ϒϩάهࣄ: https://kotaroooo0-dev.hatenablog.com/entry/toy-search-engine


    View Slide

  55. Appendix
    55

    View Slide

  56. • DAATͱରʹͳΔϙεςΟϯάϦετͷ૸ࠪํ๏


    • ϙεςΟϯάϦετΛ̍ͭͣͭॲཧ


    • ಉ࣌ʹ։͘ϙεςΟϯάϦετͷΧʔιϧ͸̍ͭͷΈ


    • ୯ޠ͝ͱʹ૸ࠪ
    56
    TAAT(Term At A Time)

    View Slide

  57. TAATͰͷANDݕࡧ
    1. ϙεςΟϯάϦετͷαΠζ͕࠷খͷ΋ͷ”ruby”Λબ୒͠ɺAccumulator࡞੒ [2, 5]


    2. “php”ͷϦετΛ૸ࠪ


    ɾID2͸ؚ·Ε͍ͯΔ͔?ID5͸ؚ·Ε͍ͯΔ͔?ͷΈͷνΣοΫͰOK
    go
    ruby
    php
    1
    1
    2
    4
    5
    5
    2
    57
    “PHP” AND “Ruby” ݕࡧ

    View Slide

  58. TAATͰͷORݕࡧ
    1. ͲͷΩʔͰ΋ྑ͍ͷͰAccumlatorΛ࡞੒ [1,2,5] (php)


    2. ॏෳ͠ͳ͍શͯͷIDΛAccumlatorʹ௥Ճ


    ✌︎
    (‘ω'
    ✌︎
    ) ݪ࢝త (
    ✌︎
    ’ω')
    ✌︎


    ※࣮ࡍͷݕࡧΤϯδϯͰ͸ORݕࡧ͸࠷దԽ͕ߦΘΕ͍ͯΔ(͜ͷτʔΫͰ͸৮Ε·ͤΜ)
    58
    “PHP” OR “Ruby” ݕࡧ
    ruby
    php
    1
    2
    5
    5
    2

    View Slide