A Study on Speeding Up Full-Text Search Tools Using a Lightweight Index Mechanism / wsa6_sifter

2020.04.26 Web System Architecture Research Group (WSA研) #6
https://websystemarchitecture.hatenablog.jp/entry/2019/12/11/165624

monochromegane

April 26, 2020

Transcript

  1. Yusuke Miyake / Pepabo R&D Institute, GMO Pepabo, Inc.
    2020.04.26 Web System Architecture Research Group (WSA研) #6
    A Study on Speeding Up Full-Text Search Tools
    Using a Lightweight Index Mechanism

  2. Principal engineer
    Yusuke MIYAKE @monochromegane
    Pepabo R&D Institute, GMO Pepabo, Inc.
    https://blog.monochromegane.com

  3. Table of contents
    1. Introduction
    2. Challenges in speeding up full-text search tools
    3. A study on speeding up full-text search tools using a lightweight index mechanism
    4. Evaluation
    5. Summary

  4. The "question" of this research
    • Is it possible to realize a lightweight, fast, and general-purpose index mechanism that can be used even from the command line?

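  5. The "question" of this research
    • Is it possible to realize a lightweight, fast, and general-purpose index mechanism that can be used even from the command line?
      • This report takes full-text search tools as the command-line use case and examines a concrete realization of such an index mechanism.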

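  6. The "question" of this research
    • Is it possible to realize a lightweight, fast, and general-purpose index mechanism that can be used even from the command line?
      • This report takes full-text search tools as the command-line use case and examines a concrete realization of such an index mechanism.
      • It also confirms that combining a full-text search tool with this index mechanism makes the tool more useful.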

  7. 2.
    Challenges in speeding up full-text search tools

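  8. Full-text search tools
    • Command-line tools that search one or more text files for a given string
      • grep, ag, pt, etc.
    • Used to search the source code under a project
    • Differentiated by a wide variety of options
      • Colored results, showing the lines before and after a match, honoring gitignore, character-encoding support, and so on
    • The main differentiator is "search speed"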

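  9. Speeding up full-text search tools
    • Recursive full-text search consists of three elements: "find", "grep", and "print"
    • Each element offers its own fun when it comes to optimization [1]
      • find: fewer stat system calls by using readdirent, parallelization
      • grep: searching fixed-length chunks instead of line by line and then restoring the lines, SIMD, efficient algorithms, parallelization (the OS file cache also helps a great deal)
      • print: buffering, and avoiding becoming a bottleneck for the preceding stages
    • Use computing resources efficiently and to the fullest, including the degree of parallelism [2]
    [1] Yusuke Miyake, Optimization for Number of goroutines Using Feedback Control, GopherCon, Marriott Marquis San Diego Marina, California, July
    [2] Yusuke Miyake, the_platinum_searcher, https://github.com/monochromegane/the_platinum_searcher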
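The find / grep / print decomposition above can be sketched as a small concurrent pipeline. This is only an illustration under stated assumptions: it scans line by line (not the fixed-length search the slide mentions), and `Search`, `demoFS`, and the in-memory file tree are hypothetical names that do not come from the_platinum_searcher.

```go
package main

import (
	"bufio"
	"fmt"
	"io/fs"
	"sort"
	"strings"
	"sync"
	"testing/fstest"
)

// Search runs the three stages over fsys and returns "path:line:text" matches.
func Search(fsys fs.FS, pattern string, workers int) []string {
	paths := make(chan string)
	matches := make(chan string)

	// find: walk the tree and emit every non-directory entry.
	go func() {
		defer close(paths)
		fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
			if err == nil && !d.IsDir() {
				paths <- p
			}
			return nil
		})
	}()

	// grep: scan files line by line in parallel workers.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				f, err := fsys.Open(p)
				if err != nil {
					continue
				}
				sc := bufio.NewScanner(f)
				for n := 1; sc.Scan(); n++ {
					if strings.Contains(sc.Text(), pattern) {
						matches <- fmt.Sprintf("%s:%d:%s", p, n, sc.Text())
					}
				}
				f.Close()
			}
		}()
	}
	go func() { wg.Wait(); close(matches) }()

	// print: collect results; sorted only to make the output deterministic.
	var out []string
	for m := range matches {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

// demoFS is a hypothetical in-memory tree used only for this example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"main.go":     {Data: []byte("package main\nfunc init() {}\n")},
		"lib/util.go": {Data: []byte("package lib\n// init helpers\n")},
		"README":      {Data: []byte("nothing here\n")},
	}
}

func main() {
	for _, line := range Search(demoFS(), "init", 2) {
		fmt.Println(line)
	}
}
```

Every file is still opened and scanned on each run, which is the cost the following slides try to avoid with an index.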

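  10. Challenges in speeding up full-text search tools
    • Speedups come from an accumulation of steady optimization work rather than from a breakthrough algorithm
    • Performance gains tend to plateau
      • Because the source code being searched can change at any time, the full text of every file has to be searched each time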

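  11. Challenges in speeding up full-text search tools
    • Speedups come from an accumulation of steady optimization work rather than from a breakthrough algorithm
    • Performance gains tend to plateau
      • Because the source code being searched can change at any time, the full text of every file has to be searched each time
    • A speedup can be expected if the full text is searched only in files that may contain a match
      → Consider an index-based approach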

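  12. Index for command-line tools (ctags)
    • Generates an index of programming-language "objects" (functions, structs, etc.) (I have actually hardly used it. *Needs a survey)
    • Treats each object as a "tag" and associates it with the name of the file that defines it
    • The tag file format is essentially an inverted index
      • It is a text format that includes tag names and file names, so its size grows easily
      • Loading the tag file starts to take time
    • Source code search can also target things other than objects
      • For example, searching by comment or error message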

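  13. Speeding up full-text search engines (inverted index)
    • A dictionary made of terms and, for each term, the list of IDs of the documents in which it appears
    • Term frequency and term positions can also be managed
    • Good at queries over intersections
    • Index size grows in proportion to the number of terms and the number of documents
      • However, there seem to be many compression techniques [3] (*Needs a survey)
    [3] Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze; Japanese translation by Kazuo Iwano, Toshiaki Kurokawa, Seiji Hamada, Akiko Murakami. 情報検索の基礎 (Introduction to Information Retrieval). Kyoritsu Shuppan.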
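As a rough illustration of the inverted index described above, the following sketch maps each term to the IDs of the documents that contain it and answers an intersection query. Whitespace tokenization and the names `InvertedIndex`, `BuildInvertedIndex`, and `Intersect` are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"strings"
)

// InvertedIndex maps each term to the ascending list of document IDs containing it.
type InvertedIndex map[string][]int

// BuildInvertedIndex tokenizes each document on whitespace (a simplifying
// assumption) and records, per term, the IDs of the documents it appears in.
func BuildInvertedIndex(docs []string) InvertedIndex {
	idx := InvertedIndex{}
	for id, doc := range docs {
		seen := map[string]bool{}
		for _, term := range strings.Fields(doc) {
			if !seen[term] {
				seen[term] = true
				idx[term] = append(idx[term], id)
			}
		}
	}
	return idx
}

// Intersect returns the IDs of the documents that contain both terms.
func (idx InvertedIndex) Intersect(a, b string) []int {
	inB := map[int]bool{}
	for _, id := range idx[b] {
		inB[id] = true
	}
	var out []int
	for _, id := range idx[a] {
		if inB[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	docs := []string{"full text search", "bloom filter index", "text index"}
	idx := BuildInvertedIndex(docs)
	fmt.Println(idx["text"])                    // [0 2]
	fmt.Println(idx.Intersect("text", "index")) // [2]
}
```

The map grows with every distinct term and every posting, which is the size concern raised on the slide.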

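  14. Speeding up full-text search engines (Bloom filter)
    • A probabilistic data structure that answers whether a given element is contained in a set
    • The size of the filter does not depend on the number of terms
    • Adding an element and querying an element both take constant time
    • However, queries can produce false positives
    • Create one Bloom filter per document, and search this set of filters for the documents that contain a term
    • A greatly improved version of this idea was used in Bing's search engine (BitFunnel) [4][5]
    [4] Bing検索の裏側―BitFunnelのアルゴリズム (Behind Bing search: the BitFunnel algorithm), https://developer.hatenastaff.com/entry/…
    [5] Bob Goodwin, Michael Hopcroft, Dan Luu, Alex Clemmer, Mihaela Curmei, Sameh Elnikety, and Yuxiong He. BitFunnel: Revisiting Signatures for Search. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). Association for Computing Machinery, New York, NY, USA. DOI: https://doi.org/…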

  15. Bloom filter
    • A Bloom filter consists of an array of m bits
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    Bloom filter (m = 10)

  16. Bloom filter (adding an element)
    • A Bloom filter consists of an array of m bits
    • An element is converted into a set of array index positions obtained from k hash functions
    element1
    H1(element1) = 0
    H2(element1) = 9
    Hash function (k = 2)

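  17. Bloom filter (adding an element)
    • A Bloom filter consists of an array of m bits
    • An element is converted into a set of array index positions obtained from k hash functions
    • The set is represented as an array in which the union of the index positions of all elements is set to 1
    element1
    H1(element1) = 0
    H2(element1) = 9
    1, 0, 0, 0, 0, 0, 0, 0, 0, 1
    Bloom filter (m = 10)
    Hash function (k = 2)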

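  18. Bloom filter (adding an element)
    • A Bloom filter consists of an array of m bits
    • An element is converted into a set of array index positions obtained from k hash functions
    • The set is represented as an array in which the union of the index positions of all elements is set to 1
    element2
    H1(element2) = 1
    H2(element2) = 9
    1, 1, 0, 0, 0, 0, 0, 0, 0, 1
    Bloom filter (m = 10)
    Hash function (k = 2)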

  19. Bloom filter (querying an element)
    • For the queried element, use the index positions obtained from the k hash functions
    • If even one of them is 0, the element is "definitely" not contained
    element3
    H1(element3) = 1
    H2(element3) = 8
    1, 1, 0, 0, 0, 0, 0, 0, 0, 1
    Bloom filter (m = 10)
    Hash function (k = 2)

  20. Bloom filter (querying an element)
    • For the queried element, use the index positions obtained from the k hash functions
    • If even one of them is 0, the element is "definitely" not contained
    • If all of them are 1, the element is "____" contained
    element1
    H1(element1) = 0
    H2(element1) = 9
    1, 1, 0, 0, 0, 0, 0, 0, 0, 1
    Bloom filter (m = 10)
    Hash function (k = 2)

  21. Bloom filter (querying an element)
    • For the queried element, use the index positions obtained from the k hash functions
    • If even one of them is 0, the element is "definitely" not contained
    • If all of them are 1, the element is "probably" contained
    element4
    H1(element4) = 0
    H2(element4) = 1
    1, 1, 0, 0, 0, 0, 0, 0, 0, 1
    Bloom filter (m = 10)
    Hash function (k = 2)
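The add and query steps walked through above can be sketched as follows. This is a minimal illustration, not sifter's code: the `BloomFilter` type and the way the k hash functions are derived (FNV-1a prefixed with the hash number) are assumptions for the example. The element4 case on the last slide is a false positive: both of its positions were set by other elements, so a `true` answer can only mean "probably".

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is an m-bit array queried through k hash functions.
type BloomFilter struct {
	bits []bool
	k    int
}

// NewBloomFilter returns an empty filter with m bits and k hash functions.
func NewBloomFilter(m, k int) *BloomFilter {
	return &BloomFilter{bits: make([]bool, m), k: k}
}

// indexes derives k array positions for s. Prefixing FNV-1a input with the
// hash number is an illustrative choice, not a scheme taken from the slides.
func (b *BloomFilter) indexes(s string) []int {
	out := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		h := fnv.New32a()
		h.Write([]byte{byte(i)})
		h.Write([]byte(s))
		out[i] = int(h.Sum32() % uint32(len(b.bits)))
	}
	return out
}

// Add sets the bit at every position derived from s.
func (b *BloomFilter) Add(s string) {
	for _, i := range b.indexes(s) {
		b.bits[i] = true
	}
}

// MayContain reports false when s is definitely absent and true when s is
// probably present (false positives are possible, false negatives are not).
func (b *BloomFilter) MayContain(s string) bool {
	for _, i := range b.indexes(s) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	bf := NewBloomFilter(10, 2) // m = 10, k = 2 as on the slides
	bf.Add("element1")
	bf.Add("element2")
	fmt.Println(bf.MayContain("element1")) // true: added elements are never missed
	fmt.Println(bf.MayContain("element3")) // false, unless it is a false positive
}
```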

  22. 3.
    A study on speeding up full-text search tools
    using a lightweight index mechanism

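  23. Requirements for an index mechanism in full-text search tools
    • Lightweight
      • The smaller the index, the faster it loads (that is, the faster a query starts)
    • Fast
      • At build time: if the index can be built quickly, it can follow changes in the search target without becoming a burden
      • At search time: if queries are fast, the wall-clock time of the whole full-text search is shortened
    • General-purpose
      • Usefulness improves when the index does not depend on a specific tool and can be combined with any of them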

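  24. Design of the index mechanism for full-text search tools
    • Lightweight / Fast
      • Use a Bloom filter
      • Exploit the properties that its size does not depend on the number of terms and that queries take constant time
    • General-purpose
      • Provide a command that searches the index and returns the list of files that contain the keyword
      • Any full-text search tool then runs its search with that list as its Args
      • Misdetections caused by false positives are filtered out by the full-text search tool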

  25. monochromegane/sifter
    • A lightweight index for full text search tools using bloom filter. [6]
    • Go implementation of the proposed method (WIP)
    • "sifter" means a kitchen sieve, or a person who sorts things out
    [6] monochromegane/sifter, https://github.com/monochromegane/sifter

  26. monochromegane/sifter
    • Generate one Bloom filter for each text file under a directory (number of files * m bits)
    • Because the search keyword is not known in advance, and because the files may contain Japanese, which is hard to tokenize, n-grams are used (up to 3-grams)
    $ sifter -m 5 -k 3 build
    func init() {
    3-grams: fun, unc, nc_, c_i
    Hk → 1, 1, 0, 0, 0
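The n-gram step can be sketched as below. `NGrams` is a hypothetical helper, and splitting on runes rather than bytes is an assumption made here so that Japanese text is cut per character; the slides do not say which unit sifter uses.

```go
package main

import "fmt"

// NGrams returns every n-gram of s for n = 1..maxN. It works on runes so that
// multi-byte text such as Japanese is split per character (an assumption).
func NGrams(s string, maxN int) []string {
	r := []rune(s)
	var out []string
	for n := 1; n <= maxN; n++ {
		for i := 0; i+n <= len(r); i++ {
			out = append(out, string(r[i:i+n]))
		}
	}
	return out
}

func main() {
	fmt.Println(NGrams("func", 3)) // [f u n c fu un nc fun unc]
}
```

Each n-gram would then be hashed k times and the resulting bits set in the filter of the file being indexed.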

  27. monochromegane/sifter
    • At query time every Bloom filter would have to be read, so the filters are converted into bit-sliced signatures to reduce the data that is read and speed up the query
    $ sifter -m 5 -k 3 build
    H1 = 4
    1, 1, 0, 0, 0  & 00001
    1, 0, 1, 0, 0  & 00001
    1, 0, 0, 1, 0  & 00001
    A query is issued against every Bloom filter

  28. monochromegane/sifter
    • At query time every Bloom filter would have to be read, so the filters are converted into bit-sliced signatures to reduce the data that is read and speed up the query
    $ sifter -m 5 -k 3 build
    H1 = 4
    1, 1, 0, 0, 0  & 00001
    1, 0, 1, 0, 0  & 00001
    1, 0, 0, 1, 0  & 00001
    In practice, only the bit at the hashed index seems to be used

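  29. monochromegane/sifter
    • At query time every Bloom filter would have to be read, so the filters are converted into bit-sliced signatures to reduce the data that is read and speed up the query
    $ sifter -m 5 -k 3 build
    H1 = 4
    Per-file Bloom filters (|F| × m):
    1, 1, 0, 0, 0
    1, 0, 1, 0, 0
    1, 0, 0, 1, 0
    Bit-sliced signatures (m × |F|):
    1, 1, 1
    1, 0, 0
    0, 1, 0
    0, 0, 1
    0, 0, 0
    Collect only the bits at each index (if the set of Bloom filters is viewed as a matrix, this corresponds to its transpose)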

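  30. monochromegane/sifter
    • At query time every Bloom filter would have to be read, so the filters are converted into bit-sliced signatures to reduce the data that is read and speed up the query
    $ sifter -m 5 -k 3 build
    H1 = 4
    Per-file Bloom filters (|F| × m):
    1, 1, 0, 0, 0
    1, 0, 1, 0, 0
    1, 0, 0, 1, 0
    Bit-sliced signatures (m × |F|):
    1, 1, 1
    1, 0, 0
    0, 1, 0
    0, 0, 1
    0, 0, 0  & 111
    Only the slice that collects the bits at the hashed index needs to be queried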
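The transposition into bit-sliced signatures can be sketched with the same three filters shown on the slides. `Transpose` and `Candidates` are hypothetical names, and `[][]bool` is used for readability; a real index would presumably pack the bits.

```go
package main

import "fmt"

// Transpose turns |F| per-file filters of m bits into m bit slices of |F| bits.
func Transpose(filters [][]bool) [][]bool {
	if len(filters) == 0 {
		return nil
	}
	m := len(filters[0])
	slices := make([][]bool, m)
	for j := range slices {
		slices[j] = make([]bool, len(filters))
		for f := range filters {
			slices[j][f] = filters[f][j]
		}
	}
	return slices
}

// Candidates ANDs the slices selected by indexes and returns the positions of
// the files whose bit is set in every selected slice.
func Candidates(slices [][]bool, indexes []int) []int {
	if len(slices) == 0 {
		return nil
	}
	var out []int
	for f := 0; f < len(slices[0]); f++ {
		match := true
		for _, j := range indexes {
			if !slices[j][f] {
				match = false
				break
			}
		}
		if match {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	filters := [][]bool{
		{true, true, false, false, false},
		{true, false, true, false, false},
		{true, false, false, true, false},
	}
	slices := Transpose(filters)
	fmt.Println(Candidates(slices, []int{4}))    // []  (no file has bit 4 set)
	fmt.Println(Candidates(slices, []int{0, 1})) // [0] (only the first file)
}
```

A query touches only the slices named by its hash positions, so the amount of data read no longer depends on m.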

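  31. monochromegane/sifter
    • At query time the pattern string is split into 3-grams, and the query is made with the union of the index positions obtained from the hash functions for each 3-gram
    $ sifter -m 5 -k 3 find PATTERN
    PATTERN
    3-grams: PAT, ATT, TTE, TER, ERN
    Hk → 1, 1, 0, 0, 0
    Bit-sliced signatures (m × |F|):
    1, 1, 1  & 111
    1, 0, 0  & 111
    0, 1, 0
    0, 0, 1
    0, 0, 0
    Result: ⭕ ❌ ❌
    The file at the matching position "probably" contains the pattern
    Output the file name that corresponds to that position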
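The query path (pattern → 3-grams → union of hash positions → AND over the selected bit slices) can be put together in a small end-to-end sketch. Everything here is an assumption for illustration: the constants `m` and `k`, the FNV-based hashing, indexing only 3-grams at build time, and the names `Build` and `Find`. It does not reproduce sifter's actual implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	m = 64 // bits per filter (hypothetical value)
	k = 3  // hash functions per 3-gram
)

// trigrams splits s into overlapping 3-rune substrings.
func trigrams(s string) []string {
	r := []rune(s)
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		out = append(out, string(r[i:i+3]))
	}
	return out
}

// positions returns the union of bit positions over every 3-gram of s.
func positions(s string) map[int]bool {
	set := map[int]bool{}
	for _, g := range trigrams(s) {
		for i := 0; i < k; i++ {
			h := fnv.New32a()
			h.Write([]byte{byte(i)})
			h.Write([]byte(g))
			set[int(h.Sum32()%m)] = true
		}
	}
	return set
}

// Build creates m bit slices; slices[j][f] is true when file f sets bit j.
func Build(files []string) [][]bool {
	slices := make([][]bool, m)
	for j := range slices {
		slices[j] = make([]bool, len(files))
	}
	for f, content := range files {
		for j := range positions(content) {
			slices[j][f] = true
		}
	}
	return slices
}

// Find returns the positions of the files that may contain pattern. A pattern
// shorter than three runes selects no slices, so every file is returned.
func Find(slices [][]bool, nFiles int, pattern string) []int {
	pos := positions(pattern)
	var out []int
	for f := 0; f < nFiles; f++ {
		ok := true
		for j := range pos {
			if !slices[j][f] {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []string{"func init() {}", "package main", "a PATTERN here"}
	slices := Build(files)
	// The third file (position 2) is always reported; others only as false positives.
	fmt.Println(Find(slices, len(files), "PATTERN"))
}
```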

  32. monochromegane/sifter
    • The full-text search tool searches only the candidates narrowed down by sifter
    • Misdetections caused by false positives are filtered out by the full-text search tool
    • False negatives never occur, so no match is missed
    $ pt PATTERN `sifter -m 5 -k 3 find PATTERN`
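The division of labor above can be illustrated with a tiny verification step that stands in for the full-text search tool. `Verify` and the in-memory `contents` map are hypothetical; in practice this role is played by pt, grep, or a similar tool reading real files.

```go
package main

import (
	"fmt"
	"strings"
)

// Verify plays the role of the full-text search tool: it scans only the
// candidate files and keeps those that really contain pattern.
// contents is a hypothetical in-memory stand-in for files on disk.
func Verify(contents map[string]string, candidates []string, pattern string) []string {
	var hits []string
	for _, name := range candidates {
		if strings.Contains(contents[name], pattern) {
			hits = append(hits, name)
		}
	}
	return hits
}

func main() {
	contents := map[string]string{
		"a.go": "// SPDX-License-Identifier: GPL-2.0-or-later",
		"b.go": "package b",
	}
	// Suppose the index returned both files; b.go is a false positive.
	candidates := []string{"a.go", "b.go"}
	fmt.Println(Verify(contents, candidates, "GPL-2.0-or-later")) // [a.go]
}
```

Because the index never drops a file that truly matches, the verified result equals what a scan of every file would have produced.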

  33. Evaluation
    • General-purpose
      • Reduction in processing time when combined with full-text search tools
    • Lightweight
      • Size of the index
    • Fast
      • Query time
      • Index build time

  34. Evaluation environment
    • CentOS Linux release 8.1.1911 (Core) on Vagrant
    • CPU: 4, Memory: 5,120MB
    • https://github.com/torvalds/linux (c578ddb)
    • Total number of files: 67,947
    • Bloom filter (k = 3, m = 10,000)

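  35. Evaluation: combination with full-text search tools
    • Search keyword: 'GPL-2.0-or-later' (8,168/67,947 = approx. 12%)
    [Bar charts: execution time in seconds, without cache and with cache, for grep, grep + sifter, pt, and pt + sifter]
    Narrowing down the search targets in advance with the proposed method gave a large improvement in overall search speed, even after accounting for sifter's own execution time. Note that sifter returns … candidates in … s.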

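  36. Evaluation: combination with full-text search tools
    • Search keyword: 'GPL-2.0-or-later' (8,168/67,947 = approx. 12%)
    [Bar charts: execution time in seconds, without cache and with cache, for grep, grep + sifter, pt, and pt + sifter]
    • Search keyword: '#define BYT_RT5640_MAP(quirk)' (2/67,947 = approx. 0.003%)
    [Bar charts: execution time in seconds, without cache and with cache, for grep, grep + sifter, pt, and pt + sifter]
    When the advance narrowing by the proposed method is highly effective, an even more pronounced reduction in execution time was confirmed (sifter: … candidates, … s).
    Note that the improvement seen with plain pt comes from an implementation technique that skips detailed inspection unless the pattern matches.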

  37. Evaluation: index size
    • 67,947 bits = 8,494 bytes per bit slice; 8,494 bytes * 10,000 = 84.94MB
    • du -h linux: 1.2G
    • Even as a whole, the index is sufficiently small compared with the size of the repository
    • A query only needs to read k * 8,494 bytes
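The size arithmetic above can be checked with a few lines of Go. The function names are hypothetical, and the per-query figure assumes a single n-gram that selects k slices, as in the last bullet.

```go
package main

import "fmt"

// SliceBytes returns the bytes needed to store one bit slice of nFiles bits.
func SliceBytes(nFiles int) int { return (nFiles + 7) / 8 }

// IndexBytes returns the size of the whole index: m slices of nFiles bits.
func IndexBytes(nFiles, m int) int { return SliceBytes(nFiles) * m }

// QueryBytes returns the bytes read when a query selects k slices.
func QueryBytes(nFiles, k int) int { return SliceBytes(nFiles) * k }

func main() {
	const nFiles, m, k = 67947, 10000, 3
	fmt.Println(SliceBytes(nFiles))                             // 8494
	fmt.Printf("%.2f MB\n", float64(IndexBytes(nFiles, m))/1e6) // 84.94 MB
	fmt.Println(QueryBytes(nFiles, k))                          // 25482
}
```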

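  38. Evaluation: index construction
    • Currently it takes about 20 minutes for a 1.2G repository, so improvement is needed…
    • The bottleneck is the hash functions [7][8]
      • For each character, the hash function is executed {1,2,3}-gram * k (3) times (= 0.01 ms)
      • This works out to roughly 1 s for a 98 KB file
    • To speed up index construction, caching of hash results, a faster hash function [9], more efficient tokenization, and similar measures need to be considered
    $g_i(x) = h_1(x) + i\,h_2(x) \bmod m$
    [7] Kirsch, Adam, and Michael Mitzenmacher. Less hashing, same performance: building a better bloom filter. European Symposium on Algorithms. Springer, Berlin, Heidelberg.
    [8] GolangでBloomFilterを実装してみた (Implementing a Bloom filter in Golang), https://cipepser.hatenablog.com/entry/…
    [9] MurmurHash, https://tanjent.livejournal.com/….html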
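The formula on this slide derives all k positions from two base hashes instead of running k independent hash functions. A minimal sketch follows; splitting one 64-bit FNV-1a value into h1 and h2 is an assumption made for this example, and the slides do not state how sifter obtains the two base hashes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// baseHashes derives h1 and h2 from one 64-bit FNV-1a hash of s
// (an illustrative choice of base hashes).
func baseHashes(s string) (uint32, uint32) {
	h := fnv.New64a()
	h.Write([]byte(s))
	v := h.Sum64()
	return uint32(v >> 32), uint32(v)
}

// Indexes returns g_i(x) = (h1(x) + i*h2(x)) mod m for i = 0..k-1,
// so only one underlying hash computation is needed per element.
func Indexes(s string, k, m int) []int {
	h1, h2 := baseHashes(s)
	out := make([]int, k)
	for i := 0; i < k; i++ {
		out[i] = int((uint64(h1) + uint64(i)*uint64(h2)) % uint64(m))
	}
	return out
}

func main() {
	// k = 3 and m = 10,000 match the evaluation settings on the slides.
	fmt.Println(Indexes("fun", 3, 10000))
}
```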

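  39. Summary
    • Proposed a lightweight, fast, and general-purpose index mechanism that full-text search tools can use
      • Adopting a Bloom filter made the index lightweight and the queries fast
      • Providing a separate tool that returns only the candidates increased generality
    • On the other hand, the execution time of the hash functions is a bottleneck, and building the index for a large repository takes time, so further improvement is needed
    • In the future, I would like to extend this research to the field of Web systems by also considering integration with architectures in which the query itself incurs overhead [10]
    [10] Hiroshi Abe, Keiichi Shima, Daisuke Miyamoto, Yuji Sekiya, Tomohiro Ishihara, Kazuya Okada, Ryo Nakamura, Satoshi Matsuura, Yoichi Shinoda. 時間軸検索に最適化したスケールアウト可能な高速ログ検索エンジンの実現と評価 (Realization and evaluation of a scale-out capable, high-speed log search engine optimized for time-axis search). IPSJ Journal (情報処理学会論文誌).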