Create my own search engine.

RubyKaigi 2022 day 2.

seki at druby.org

September 11, 2022
Transcript

  1. Create my own search engine.


    The story of building a search system for my own use.

  2. Pokémon TCG similar deck search
    I built a system that searches Pokémon TCG decks!

  3. Pokémon TCG similar deck search
    https://github.com/seki/Masaki


    https://hamana.herokuapp.com/
    (Diagram components: Web Browser, Heroku App, deck similarity, Heroku Scheduler, Crawler, Search Engine)

  4. Pokémon TCG similar deck search
    You can see newly tweeted decks, and even the diff against similar decks!!

  5. Pokémon TCG similar deck search
    You can find the decks you tweeted yourself.

  6. Pokémon TCG similar deck search
    You can also look up your own change history!

  7. Pokémon TCG similar deck search
    Find decks that run the card 「セキ」 (Seki).

  8. Agenda
    About me, ruby and Pokémon TCG


    Pokémon TCG


    deck similarity


    whole system

  9. Agenda
    About me, ruby and Pokémon TCG


    Pokémon TCG


    implementation (engine) - deck vector, cos similarity


    implementation (system)

  10. About me, ruby and Pokémon TCG
    Masatoshi Seki, Ruby Core Committer (dRuby, Rinda, ERB), Programmer
    Winning the 2010 WCS Tochigi Prefecture qualifier is the only result I can brag about.
    Year  Ruby                    Pokémon TCG
    1996  ruby-1.0 ⭐              Pokémon Red/Blue, Pokémon TCG ⭐
    1999  ruby-1.4.0, ERB, dRuby
    2006  RubyKaigi 2006          @m_seki started Pokémon TCG ⭐
    2010                          WCS Tochigi pref. winner ⭐
    2022  RubyKaigi 2022
    2023                          WCS Yokohama
    From the interview episode on りっくちゃんねる (Rikku Channel)

  11. About me, ruby and Pokémon TCG
    For more about my Pokémon TCG activities, see here.
    From the interview episode on りっくちゃんねる (Rikku Channel)

  12. Agenda
    About me, ruby and Pokémon TCG


    Pokémon TCG


    deck similarity


    whole system

  13. Pokémon TCG
    Card types: Pokémon, Trainer, Energy
    Build a deck of 60 cards

  14. www.pokemon-card.com - card search
    15717 cards
    card-id = 1..42091
    card-ids are integers up to 42091 with gaps, for 15717 distinct cards in total.

  15. www.pokemon-card.com - deck build
    deck-code: kkFFfv-WaK14L-VkFkdv
    A hash value? A random number?

  16. deck internal
    A deck is represented as tuples of card-id and number of copies ... I feel like I've seen this before.
    [[40942, 4], [41111, 2], [40616, 2], [41486, 2], [40966, 2], [38020, 2],
     [39193, 4], [41490, 1], [40992, 2], [40292, 4], [38377, 4], [38128, 1],
     [39728, 1], [38392, 2], [41340, 2], [40998, 1], [40137, 2], [40304, 2],
     [40995, 2], [41295, 1], [39652, 3], [40885, 8], [37980, 2], [38002, 4]]

  17. I studied with NLP textbooks!
    Bag-of-Cards?
    Bag-of-Words?
    Vectorized text!
    Isn't this something I've seen in natural language processing?
    [[40942, 4], [41111, 2], [40616, 2], [41486, 2], [40966, 2], [38020, 2],
     [39193, 4], [41490, 1], [40992, 2], [40292, 4], [38377, 4], [38128, 1],
     [39728, 1], [38392, 2], [41340, 2], [40998, 1], [40137, 2], [40304, 2],
     [40995, 2], [41295, 1], [39652, 3], [40885, 8], [37980, 2], [38002, 4]]
    💡

  18. Agenda
    About me, ruby and Pokémon TCG


    Pokémon TCG


    deck similarity


    whole system
    It looks just like a vectorized document.

  19. similar document search (NLP)
    word segmentation
    vectorize
    cosine similarity
    (Opinions vary.) 📖
    {"I" => 2,
    "like" => 3,
    "ruby" => 12,
    ... }
    v = [0, 0, 0, 0, 4, 2, 24, 0, 0, 1, ...]
    cos = v1.dot(v2) / (v1.norm * v2.norm)
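The three NLP steps above (segment, vectorize, compare) can be sketched in Ruby. This is a minimal illustration, not the engine's code: it assumes naive whitespace tokenization in place of real word segmentation, and it uses `Vector` from Ruby's `matrix` library (a bundled gem since Ruby 3.1), whose `dot` and `norm` match the formula on the slide.

```ruby
require 'matrix'

# Bag-of-words: naive whitespace tokenization stands in for real word segmentation (assumption).
def bag_of_words(text)
  text.downcase.split.tally
end

# Lay the word counts out along a shared, sorted vocabulary.
def to_vector(bag, vocabulary)
  Vector.elements(vocabulary.map { |word| bag.fetch(word, 0) })
end

# Cosine similarity, as on the slide.
def cosine(v1, v2)
  v1.dot(v2) / (v1.norm * v2.norm)
end

doc1 = bag_of_words("I like ruby and I like pokemon")
doc2 = bag_of_words("I like ruby")
vocabulary = (doc1.keys | doc2.keys).sort
v1 = to_vector(doc1, vocabulary)
v2 = to_vector(doc2, vocabulary)
puts cosine(v1, v2)
```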

  20. deck ⊆ natural language text
    card ≒ word


    60 words


    unordered
    Only 60 words, a vocabulary of 15717, no word order, and no word segmentation needed.

  21. Vectorization is easy
    no word segmentation required


    TF-IDF


    TF - number of copies of the card


    IDF - infrequently used cards have a higher weight
    Vectorizing a deck is easier than NLP. Using TF-IDF as the vector components works the same way.
    v = [0, 0, 0, 0, 4, 2, 24, 0, 0, 1, ...]
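A sketch of computing IDF for every normalized card across a collection of decks. The slides do not give the exact formula, so this assumes the common variant idf(c) = log(N / df(c)), where N is the number of decks and df(c) is the number of decks containing card c. The toy decks are hypothetical.

```ruby
# IDF over decks; each deck is a list of [normalized_card_id, tf] tuples.
# Assumption: idf(c) = log(N / df(c)); the real engine may use another variant.
def compute_idf(decks)
  df = Hash.new(0)
  decks.each do |deck|
    deck.each { |card_id, _tf| df[card_id] += 1 }  # count decks, not copies
  end
  n = decks.size.to_f
  df.transform_values { |count| Math.log(n / count) }
end

decks = [                        # hypothetical toy decks
  [[1, 4], [2, 2], [3, 8]],
  [[1, 4], [4, 2], [3, 8]],
  [[1, 3], [5, 1], [3, 6]]
]
idf = compute_idf(decks)
# Cards 1 and 3 appear in every deck, so their IDF is 0.0;
# card 2 appears in one of three decks, so its IDF is log(3).
p idf
```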

  22. Generate deck titles with IDF
    Also used to describe a deck's characteristics (rarely used cards are shown first).
    sort_by -IDF
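One way to read "sort_by -IDF": order a deck's cards by descending IDF and use the rarest few as a title. `deck_title`, the top-3 cut-off, and the toy IDF values and names are illustrative assumptions.

```ruby
# Describe a deck by its rarest cards (highest IDF first).
# idf:   normalized-card-id => IDF (toy values below)
# names: normalized-card-id => card name
def deck_title(deck, idf, names, k = 3)
  deck.map(&:first)                                   # normalized-card-ids in the deck
      .sort_by { |card_id| -idf.fetch(card_id, 0.0) } # sort_by -IDF
      .first(k)
      .map { |card_id| names.fetch(card_id, card_id.to_s) }
      .join(" / ")
end

idf   = { 1 => 0.1, 2 => 2.3, 3 => 0.0, 4 => 1.5 }
names = { 1 => "Card A", 2 => "Card B", 3 => "Card C", 4 => "Card D" }
puts deck_title([[1, 4], [2, 2], [3, 8], [4, 1]], idf, names)
```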

  23. Normalization


    Some cards have the same meaning, so they need to be normalized.

  24. Normalization


    Some cards have the same meaning, so they need to be normalized.
    Foil Card
    Full Art Card

  25. Identify the card
    Pokémon: identify by attribute
    Trainer, Energy: identify by name
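A sketch of how the two identification rules could produce a normalization map: group printings by an identity key (attributes for Pokémon, name for Trainer and Energy) and map every member to one canonical id. The card fields, the Pokémon attributes used as the key, the hypothetical Pokémon ids, and the choice of the smallest id as canonical are all assumptions; the smallest-id rule happens to agree with the sample dictionary on the next slide.

```ruby
# Identity key: Pokémon by attributes, Trainer/Energy by name.
# :hp and :attacks are hypothetical attribute fields.
def identity_key(card)
  if card[:kind] == :pokemon
    [card[:name], card[:hp], card[:attacks]]
  else
    card[:name]
  end
end

# Map every card-id to the canonical (here: smallest) id of its group.
def build_id_norm(cards)
  id_norm = {}
  cards.group_by { |card| identity_key(card) }.each_value do |group|
    canonical = group.map { |card| card[:id] }.min
    group.each { |card| id_norm[card[:id]] = canonical }
  end
  id_norm
end

cards = [
  { id: 22032, kind: :energy, name: "SPエネルギー" },
  { id: 22064, kind: :energy, name: "SPエネルギー" },
  { id: 39130, kind: :energy, name: "いちげきエネルギー" },
  { id: 39242, kind: :energy, name: "いちげきエネルギー" },
  # Hypothetical Pokémon: same name, different attributes => different cards.
  { id: 101, kind: :pokemon, name: "Pikachu", hp: 60, attacks: ["Gnaw"] },
  { id: 102, kind: :pokemon, name: "Pikachu", hp: 70, attacks: ["Thunder Jolt"] }
]
p build_id_norm(cards)
```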

  26. card-id normalize dictionary
    After normalization, there are 9275 distinct cards.
    Intermediate data (source), produced by downloading the card pages (HTML), then scraping and sorting:
    data/uniq_pokemon.txt
    data/uniq_energy_trainer_all.txt
    [[22032, "SPエネルギー"],
     [22064, 22032],
     [22261, 22032],
     [23027, 22032],
     [23054, 22032],
     [23253, 22032],
     [23268, 22032],
     [39130, "いちげきエネルギー"],
     [39242, 39130],
     [39556, 39130],
     [40329, 39130],
     [40879, 39130],
     [39200, "れんげきエネルギー"],
     [39245, 39200],
     [39580, 39200],
     [40333, 39200],
     [40483, 39200],
     ...
    Loaded in memory as:
    @id_norm (card-id to normalized-card-id)
    {22032=>22032, 22064=>22032, 22261=>22032, 23027=>22032, 23054=>22032,
     23253=>22032, 23268=>22032, 39130=>39130, 39242=>39130, 39556=>39130,
     40329=>39130, 40879=>39130, 39200=>39200, 39245=>39200, 39580=>39200,
     40333=>39200, 40483=>39200, 40881=>39200, ...}
    @name (normalized-card-id to name)
    {22032=>"SPエネルギー", 39130=>"いちげきエネルギー", 39200=>"れんげきエネルギー", ...}
    in-memory
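Reading the dictionary back into the two in-memory hashes is simple if, as the sample data suggests, an entry whose second element is a String marks a canonical card (and carries its name), while an Integer points an alias at its canonical id. `load_dictionary` is a hypothetical helper, not necessarily how the project loads the file.

```ruby
# Build @id_norm-style and @name-style hashes from [id, name_or_canonical_id] entries.
def load_dictionary(entries)
  id_norm = {}
  name = {}
  entries.each do |id, value|
    if value.is_a?(String)
      id_norm[id] = id        # canonical card maps to itself
      name[id] = value
    else
      id_norm[id] = value     # alias => canonical card-id
    end
  end
  [id_norm, name]
end

entries = [[22032, "SPエネルギー"], [22064, 22032], [22261, 22032],
           [39130, "いちげきエネルギー"], [39242, 39130]]
id_norm, name = load_dictionary(entries)
p id_norm
p name
```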

  27. card-id normalize dictionary


    I used a format that makes diffs easy to read.
    Intermediate data (source), produced by downloading the card pages (HTML), then scraping and sorting:
    data/uniq_pokemon.txt
    data/uniq_energy_trainer_all.txt
    [[22032, "SPエネルギー"],
     [22064, 22032],
     [22261, 22032],
     [23027, 22032],
     [23054, 22032],
     [23253, 22032],
     [23268, 22032],
     [39130, "いちげきエネルギー"],
     [39242, 39130],
     [39556, 39130],
     [40329, 39130],
     [40879, 39130],
     [39200, "れんげきエネルギー"],
     [39245, 39200],
     [39580, 39200],
     [40333, 39200],
     [40483, 39200],
    git diff
    @@ -1,4 +1,6 @@
    -[[39130, "いちげきエネルギー"],
    +[[42029, "Vガードエネルギー"],
    + [42075, 42029],
    + [39130, "いちげきエネルギー"],
      [39242, 39130],
      [39556, 39130],
      [40329, 39130],
    @@ -36,6 +38,8 @@
      [40323, 37978],
      [41002, "ダブルターボエネルギー"],
      [41319, 41002],
    + [42028, 41002],
    + [42074, 41002],
      [37867, "ツインエネルギー"],
      [38399, 37867],
      [38482, 37867],
    @@ -557,6 +561,7 @@
      [35704, 34759],
      [36967, 34759],
      [41900, 34759],
    + [42089, 34759],
      [38476, "おとなのおねえさん"],
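The diff-friendly layout amounts to one tuple per line, so adding a card shows up as a single added line in `git diff`. A minimal writer, assuming the layout shown on the slide (`dump_dictionary` is a hypothetical name):

```ruby
# Write one [id, name_or_canonical_id] tuple per line, in the slide's layout:
# "[[a, b],\n [c, d],\n ...]"
def dump_dictionary(entries)
  "[" + entries.map(&:inspect).join(",\n ") + "]\n"
end

entries = [[42029, "Vガードエネルギー"], [42075, 42029],
           [39130, "いちげきエネルギー"], [39242, 39130]]
print dump_dictionary(entries)
```

Because entries are grouped by canonical card, a new printing lands next to its siblings and the diff stays local to that group.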

  28. Deck vector
    9275-dimensional vector
    Vector[0, 0, 0, 0, 8, 0, 0, ..., 0, 4, 0, ..., 0, 2, 0, ...]
    (Only a handful of the 9275 components are nonzero.)

  29. Deck vector implementation


    sorted array of tuples (normalized-card-id, TF)
    @idf : normalized-card-id → IDF
    @norm : deck-code → norm
    [[4, 9], [1114, 2], [2111, 2], [25254, 2], [27549, 4], [36161, 2],
     [36356, 1], [37497, 3], [37515, 3], [37628, 3], [37632, 4], [37976, 2],
     [38131, 3], [39191, 1], [39283, 3], [39285, 3], [39334, 1], ...


  30. inner_product implementation


    intersection of two sets
    move the smaller cursor
    if the card-ids match, move both cursors
    This is the same processing you often see in the merge step of merge sort.
    Left vector:
    [[4, 8], [2111, 2], [27549, 4], [37497, 2], [37501, 2], [37667, 4],
     [37976, 2], [37978, 4], [37980, 2], [38128, 1], [38230, 4], [38232, 2],
     [39652, 3], [39728, 1], [39762, 2], [39763, 2], [40942, 4], ...
    Right vector:
    [[4, 9], [1114, 2], [2111, 2], [25254, 2], [27549, 4], [36161, 2],
     [36356, 1], [37497, 3], [37515, 3], [37628, 3], [37632, 4], [37976, 2],
     [38131, 3], [39191, 1], [39283, 3], [39285, 3], [39334, 1], ...

  31.-46. inner_product implementation (cursor walkthrough)
    move the smaller cursor
    if the card-ids match, move both cursors
    Slide  Left cursor  Right cursor
    30-31  [4, 8]       [4, 9]        match
    32     [2111, 2]    [1114, 2]
    33     [2111, 2]    [2111, 2]     match
    34     [27549, 4]   [25254, 2]
    35     [27549, 4]   [27549, 4]    match
    36     [37497, 2]   [36161, 2]
    37     [37497, 2]   [36356, 1]
    38     [37497, 2]   [37497, 3]    match
    39     [37501, 2]   [37515, 3]
    40     [37667, 4]   [37515, 3]
    41     [37667, 4]   [37628, 3]
    42     [37667, 4]   [37632, 4]
    43     [37667, 4]   [37976, 2]
    44     [37976, 2]   [37976, 2]    match
    45     [37978, 4]   [38131, 3]
    46     [37978, 4]   [39283, 3]    (The end.)
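The cursor walk above can be written as a merge-style loop over two sorted `[normalized-card-id, TF]` arrays. This is a sketch in the spirit of the slides rather than the project's code; the optional `weight` table (for example IDF squared) is an assumption, and it defaults to 1 so the plain TF inner product is computed.

```ruby
# Inner product of two sparse vectors, each a sorted array of [card_id, tf].
# weight: hypothetical per-card multiplier; defaults to 1 for every card.
def dot(a, b, weight = Hash.new(1))
  ia = ib = 0
  s = 0
  while ia < a.size && ib < b.size
    ka, kb = a[ia][0], b[ib][0]
    if ka < kb
      ia += 1                  # move the smaller cursor
    elsif ka > kb
      ib += 1
    else
      s += a[ia][1] * b[ib][1] * weight[ka]
      ia += 1                  # matched: move both cursors
      ib += 1
    end
  end
  s
end

a = [[4, 8], [2111, 2], [27549, 4], [37497, 2], [37501, 2]]
b = [[4, 9], [1114, 2], [2111, 2], [25254, 2], [27549, 4], [37497, 3]]
p dot(a, b)   # 8*9 + 2*2 + 4*4 + 2*3
```

Each tuple is visited at most once, so the loop runs in O(|a| + |b|), which is what keeping the vectors sorted buys.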

  47. inner_product implementation


    intersection of two sets


    I explained this in my book.
    Come to think of it, I had explained this before.
    Figure: intersecting the line-number lists of two words
    def:        3 7 8 13 16
    initialize: 2 3 12 13
    Steps: Start → fwd(['initialize', fname, 3]) → Forward Both → fwd(['def', fname, 12]) → fwd(['initialize', fname, 13])
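The book's figure applies the same idea to posting lists, with `fwd` jumping the lagging list forward. A sketch, where the hypothetical `forward` helper skips ahead with a binary search (`Array#bsearch_index`) instead of one element at a time; the lists are the `def` and `initialize` line numbers from the figure.

```ruby
# Index of the first element >= target at or after position `from`
# (list.size if there is none). Hypothetical stand-in for fwd.
def forward(list, from, target)
  offset = list[from..].bsearch_index { |x| x >= target }
  offset ? from + offset : list.size
end

# Intersect two sorted lists, skipping the lagging one forward.
def intersect(a, b)
  ia = ib = 0
  result = []
  while ia < a.size && ib < b.size
    if a[ia] == b[ib]
      result << a[ia]
      ia += 1                     # forward both
      ib += 1
    elsif a[ia] < b[ib]
      ia = forward(a, ia, b[ib])  # skip a ahead to b's current value
    else
      ib = forward(b, ib, a[ia])  # skip b ahead to a's current value
    end
  end
  result
end

def_lines        = [3, 7, 8, 13, 16]
initialize_lines = [2, 3, 12, 13]
p intersect(def_lines, initialize_lines)   # => [3, 13]
```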

  48. inner_product implementation


    TF-IDF


    Integer
    idf is only ever used squared, so I should have memoized the squared value instead.
    idf = @idf[a[ia][0]]
    s += (a[ia][1] * b[ib][1] * idf * idf)
    TF
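Following the note, the squared IDF can be computed once per card rather than multiplied on every match. A minimal sketch with toy values; `idf2` and `term` are illustrative names, and in the slides the IDF table lives in `@idf`.

```ruby
# Toy IDF table; in the slides this is @idf.
idf  = { 4 => 0.5, 2111 => 1.5, 27549 => 2.0 }
idf2 = idf.transform_values { |x| x * x }   # memoized IDF squared

# The inner-loop term now needs one lookup and no idf * idf.
def term(tf_a, tf_b, card_id, idf2)
  tf_a * tf_b * idf2.fetch(card_id, 0.0)
end

p term(8, 9, 4, idf2)
p term(2, 2, 2111, idf2)
```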

  49. cos implementation


    @norm : deck-code → norm-of-vector


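    do not use unit vectors
    (In image processing and similar work, vectors are often normalized to unit vectors up front; here each deck's norm is kept in @norm instead.)
    def cos(a, b)
      left = @deck[a]
      right = @deck[b]
      dot(left, right) / (@norm[a] * @norm[b])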
    end
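A sketch of `cos` with norms cached per deck instead of pre-normalized unit vectors. `DeckIndex`, the toy decks, and the IDF values are hypothetical; to keep it short, `dot` uses a hash lookup in place of the sorted-cursor walk shown earlier, and the norm is assumed to be the Euclidean norm of the TF-IDF vector.

```ruby
# Cosine similarity over TF vectors with cached TF-IDF norms.
class DeckIndex
  def initialize(decks, idf)
    @deck = decks                                          # deck-code => [[card_id, tf], ...]
    @idf = idf                                             # card_id => IDF
    @norm = decks.transform_values { |vec| norm_of(vec) }  # computed once per deck
  end

  def cos(a, b)
    dot(@deck[a], @deck[b]) / (@norm[a] * @norm[b])
  end

  private

  def norm_of(vec)
    Math.sqrt(vec.sum { |id, tf| (tf * @idf.fetch(id, 0.0))**2 })
  end

  # Hash join for brevity; weights each matching pair by idf squared.
  def dot(left, right)
    l = left.to_h
    right.sum { |id, tf| l.key?(id) ? l[id] * tf * @idf.fetch(id, 0.0)**2 : 0.0 }
  end
end

deck_index = DeckIndex.new(
  { "deck-A" => [[1, 4], [2, 2]], "deck-B" => [[1, 4], [2, 2]], "deck-C" => [[3, 4]] },
  { 1 => 1.0, 2 => 2.0, 3 => 1.0 }
)
p deck_index.cos("deck-A", "deck-B")
p deck_index.cos("deck-A", "deck-C")
```

Keeping raw TFs plus a cached norm means each deck's norm is computed once, and a query only pays for the sparse inner product.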

  50. Basic Energy cards
    Basic Energy card problem
    - decks can contain more than 4 copies of a basic Energy card
    - these large counts skew the similarity
    s += (a[ia][1].clamp(..5) * b[ib][1].clamp(..5) * idf * idf)
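A toy illustration of why the cap matters. Two hypothetical decks share nothing but a large stack of one basic Energy (card 1), and every IDF is taken as 1. This sketch also applies the cap when computing the norms, which is an assumption: the slide only shows the cap inside the inner product.

```ruby
# Cosine over {card_id => tf} hashes, optionally capping each TF.
def capped_cos(a, b, cap: nil)
  tf = ->(n) { cap ? [n, cap].min : n }   # same effect as n.clamp(..cap) on the slide
  dot = (a.keys | b.keys).sum { |k| tf.(a.fetch(k, 0)) * tf.(b.fetch(k, 0)) }
  norm_a = Math.sqrt(a.values.sum { |n| tf.(n)**2 })
  norm_b = Math.sqrt(b.values.sum { |n| tf.(n)**2 })
  dot / (norm_a * norm_b)
end

deck_a = { 1 => 12, 2 => 4, 3 => 4 }   # card 1: a basic Energy (hypothetical ids)
deck_b = { 1 => 12, 4 => 4, 5 => 4 }   # shares only the Energy with deck_a
p capped_cos(deck_a, deck_b)
p capped_cos(deck_a, deck_b, cap: 5)
```

Without the cap, the shared Energy alone makes the decks look about 0.82 similar (144/176); capping each count at 5 brings it down to about 0.44 (25/57).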

  51. Deck similarity
    We can calculate deck similarity

  52. Search


    _ko1 taught me about max(n), which returns the top n items! Ruby really has everything.
    max(n) by deck similarity
    def search(v, n=5)
      norm = vec_to_norm(v)
      return [] if norm == 0
      @deck.map do |b, deck_b|
        cos = dot(v, deck_b) / (norm * @norm[b])
        [cos, b]
      end.max(n)
    end
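`Enumerable#max(n)` returns the n largest elements in descending order, and `[cos, deck_code]` pairs compare by `cos` first (`Array#<=>`), so the search returns the most similar decks first. A tiny demonstration with made-up scores and deck codes:

```ruby
# Made-up similarity scores per deck code.
scores = { "deck-A" => 0.91, "deck-B" => 0.35, "deck-C" => 0.78, "deck-D" => 0.12 }

# Same shape as the search: build [cos, code] pairs, keep the top n.
top = scores.map { |code, cos| [cos, code] }.max(2)
p top
```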

  53. Search by deck, card-id


    Search by deck code


    Search by card-id
    search(@deck[code])
    search([[card_id, 1]])
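Searching by card-id passes a one-card query vector. Assuming the query's norm is computed like the deck norms (the Euclidean norm of the TF-IDF vector) and that idf(c) > 0, the cosine for a deck b simplifies as follows:

```latex
\lVert q \rVert = \sqrt{(1 \cdot \mathrm{idf}(c))^2} = \mathrm{idf}(c)

\cos(q, b) = \frac{1 \cdot \mathrm{tf}_b(c)\,\mathrm{idf}(c)^2}{\mathrm{idf}(c)\,\lVert b \rVert}
           = \frac{\mathrm{tf}_b(c)\,\mathrm{idf}(c)}{\lVert b \rVert}
```

Because idf(c) is the same for every deck, the ranking depends only on tf_b(c) / ‖b‖: decks that play more copies of the card, relative to the size of their own vector, come first.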

  54. Agenda
    About me, ruby and Pokémon TCG


    Pokémon TCG


    deck similarity


    whole system

  55. R.I.P. Heroku Free Dyno


    Dyno


    VM 512MB/1core (Free)


    Shut down at least once every 24 hours


    Heroku Postgres: Hobby Basic


    Heroku Scheduler
    I use Heroku a lot in @awazeki (information design) products! The No. 1 Heroku user in my city in Tochigi Prefecture.

  56. R.I.P. Heroku Free Dyno


    Dyno


    VM 512MB/1core (Free)


    Shut down at least once every 24 hours


    Heroku Postgres: Hobby Basic


    Heroku Scheduler
    Don't measure! Feel.

  57. R.I.P. Heroku Free Dyno


    Dyno


    VM 512MB/1core (Free)


    Shut down at least once every 24 hours


    Heroku Postgres: Hobby Basic


    Heroku Scheduler

  58. R.I.P. Heroku Free Dyno


    Dyno


    VM 512MB/1core (Free)


    Shut down at least once every 24 hours


    Heroku Postgres: Hobby Basic


    Heroku Scheduler
    Normally, this is where I would use dRuby.

  59. System overview


    (Diagram components: Web Browser, Heroku App, deck similarity, Heroku Scheduler, Crawler, Search Engine)

  60. Data


    (Data-flow diagram.
     Places: My MacBook, S3, Heroku Scheduler, Heroku PG, Heroku App.
     Data: card normalize map (data/uniq_*.txt), new deck, metadata, known deck.
     Processes: Crawler, Update Deck, Initialize, Make Vector, Search, Web UI, Build page.
     In memory: @deck, @idf, @norm, @name, @id_norm, deck similarity.)

  69. if ...


    If I hadn't been using Heroku...
    (The same data-flow diagram as slide 60.)

  70. without Heroku


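    If I hadn't been using Heroku... everything except backups would be in memory.
    (Data-flow diagram.
     Places: My MacBook, S3, cron, App.
     Data: card normalize map (data/uniq_*.txt), known deck.
     Processes: Crawler, Update Deck, Make Vector, Search, Web UI, Build page.
     In memory: @deck, @idf, @norm, @name, @id_norm, @meta, deck similarity.
     Protocol: dRuby)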
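A hypothetical sketch of the "without Heroku" layout, where the engine keeps its data in memory in one process and the crawler and web UI talk to it over dRuby. `SearchEngine` and its methods are illustrative names; `DRb.start_service`, `DRb.uri`, and `DRbObject.new_with_uri` are the standard `drb` library API (a bundled gem since Ruby 3.4). For brevity, server and client run in the same process here.

```ruby
require 'drb/drb'

# Hypothetical in-memory engine shared over dRuby.
class SearchEngine
  def initialize
    @deck = {}                       # deck-code => sparse vector
  end

  def update_deck(code, vector)      # would be called by the crawler
    @deck[code] = vector
    @deck.size
  end

  def deck_codes                     # would be called by the web UI
    @deck.keys
  end
end

# Server side: publish the engine; port 0 lets the OS pick a free port.
DRb.start_service('druby://localhost:0', SearchEngine.new)
uri = DRb.uri

# Client side (normally a separate process, such as the crawler).
engine = DRbObject.new_with_uri(uri)
engine.update_deck('kkFFfv-WaK14L-VkFkdv', [[4, 9], [1114, 2]])
p engine.deck_codes
```

In a real deployment the crawler and web UI would run as separate processes and connect to the engine's druby:// URI.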

  71. Create my own search engine.


    I talked about building a search engine for myself!
    Year  Ruby                    Pokémon TCG
    1996  ruby-1.0                Pokémon Red/Blue, Pokémon TCG
    1999  ruby-1.4.0, ERB, dRuby
    2006  RubyKaigi 2006          @m_seki started Pokémon TCG
    2010                          WCS Tochigi pref. winner
    2022  RubyKaigi 2022
    2023                          WCS Yokohama

  72. Create your own search engine.


    The end.
    Year  Ruby                            Pokémon TCG
    1996  ruby-1.0                        Pokémon Red/Blue, Pokémon TCG
    1999  ruby-1.4.0, ERB, dRuby
    2006  RubyKaigi 2006                  @m_seki started Pokémon TCG
    2010                                  WCS Tochigi pref. winner
    2022  RubyKaigi 2022
    2023  Create your own search engine   WCS Yokohama