Implementing Approximate Nearest Neighbor Search in Go (Goによる近似最近傍探索の実装 / Approximate_nearest_neighbor_search_written_in_Golang)

monochromegane

April 24, 2017

Transcript

  1. Yusuke Miyake (三宅悠介) @monochromegane
    GMO Pepabo, Inc.
    Fukuoka.go
    Implementing Approximate Nearest Neighbor Search in Go

  2. Principal Engineer
    Yusuke Miyake @monochromegane
    GMO Pepabo, Inc., Pepabo Research and Development Institute
    http://blog.monochromegane.com

  3. Similar image search

  4. Similar image search

  5. General object recognition?

  6. General object recognition
    Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." arXiv preprint.

  7. General object recognition
    Extract the features of objects from an image
    Match the extracted features against the identification targets

  8. Deep CNN
    Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks."

  9. General object recognition
    Extract the features of objects from an image
    Match the extracted features against the identification targets

  10. Features

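  11. Nearest neighbor search over high-dimensional vectors
    - Image features are generally represented as high-dimensional vectors
    - Nearest neighbor search means retrieving, from a set of high-dimensional vectors, the few vectors closest in distance to a query vector
    - When there are many target feature vectors, computing the distance to every one of them on each query takes too long, so a tree or hash structure is built in advance to speed up the search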
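
To make the cost on this slide concrete, here is a minimal sketch of exhaustive nearest neighbor search in Go. It is not taken from the talk: the function names and the choice of Euclidean distance are assumptions made for illustration. Every query touches every vector, which is the work that a pre-built tree or hash is meant to avoid.

```go
package main

import (
	"math"
	"sort"
)

// euclidean returns the Euclidean distance between two vectors of equal length.
func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// bruteForceKNN returns the indices of the k vectors closest to query,
// nearest first. It computes the distance to every vector, so one query
// costs O(n * dim) distance work plus a sort.
func bruteForceKNN(vectors [][]float64, query []float64, k int) []int {
	ids := make([]int, len(vectors))
	dists := make([]float64, len(vectors))
	for i, v := range vectors {
		ids[i] = i
		dists[i] = euclidean(v, query)
	}
	// Order the candidate indices by their distance to the query.
	sort.SliceStable(ids, func(x, y int) bool { return dists[ids[x]] < dists[ids[y]] })
	if k > len(ids) {
		k = len(ids)
	}
	return ids[:k]
}
```

Calling `bruteForceKNN(vectors, query, 10)` returns up to ten indices into `vectors`; with 2048-dimensional features and a large catalog, this per-query scan is what becomes too slow.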

  12. Approximate nearest neighbor search

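  13. Approximate nearest neighbor search
    - Guaranteeing that the group closest to the query contains the true nearest neighbor adds computation time, and with many dimensions the search can end up no faster than an exhaustive scan
    - Approximate nearest neighbor search relaxes this constraint
    - It has to be adopted with the balance between accuracy and speed in mind
    - Tree-based: ANN, randomized kd-tree, FLANN, etc.
    - Hash-based: LSH, etc.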
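
One common way to put a number on the accuracy side of that trade-off is recall against an exact search. The sketch below is an illustration rather than something shown in the talk; `recallAtK` is a hypothetical helper name.

```go
package main

// recallAtK returns the fraction of the exact top-k neighbor IDs that also
// appear in the approximate result. 1.0 means the approximation missed nothing.
func recallAtK(exact, approx []int) float64 {
	if len(exact) == 0 {
		return 1.0
	}
	found := make(map[int]struct{}, len(approx))
	for _, id := range approx {
		found[id] = struct{}{}
	}
	hits := 0
	for _, id := range exact {
		if _, ok := found[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(exact))
}
```

Measuring this value alongside query latency for a few index settings (for example, the number of trees) gives a concrete basis for choosing where to sit between accuracy and speed.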

  14. spotify/annoy

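  15. spotify/annoy
    - Approximate Nearest Neighbors Oh Yeah
    - Provides a C++ library with bindings for Python and other languages
    - Builds multiple trees by repeatedly splitting the space in two, each split based on two randomly sampled points
    - At search time, the query is routed down the trees using the split vectors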
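
The splitting step described on this slide can be sketched as follows. This illustrates the idea only and is not Annoy's or gannoy's actual code: the `split` type, the function names, and the use of the hyperplane equidistant from the two sampled points are assumptions.

```go
package main

import "math/rand"

// split describes a hyperplane: the set of x with dot(normal, x) == offset.
type split struct {
	normal []float64
	offset float64
}

// newSplit builds the hyperplane equidistant from points a and b.
func newSplit(a, b []float64) split {
	normal := make([]float64, len(a))
	var offset float64
	for i := range a {
		normal[i] = a[i] - b[i]
		offset += normal[i] * (a[i] + b[i]) / 2 // accumulates dot(normal, midpoint)
	}
	return split{normal: normal, offset: offset}
}

// side reports true when v lies strictly on a's side of the hyperplane.
func (s split) side(v []float64) bool {
	var dot float64
	for i := range v {
		dot += s.normal[i] * v[i]
	}
	return dot > s.offset
}

// partition samples two distinct entries of ids at random and divides ids
// into the two half-spaces they define. It requires len(ids) >= 2. If the
// two sampled vectors happen to be identical, every id lands in right.
func partition(vectors [][]float64, ids []int, rng *rand.Rand) (s split, left, right []int) {
	i := rng.Intn(len(ids))
	j := rng.Intn(len(ids) - 1)
	if j >= i {
		j++ // skip i so that the two sampled positions differ
	}
	s = newSplit(vectors[ids[i]], vectors[ids[j]])
	for _, id := range ids {
		if s.side(vectors[id]) {
			left = append(left, id)
		} else {
			right = append(right, id)
		}
	}
	return s, left, right
}
```

Applying `partition` recursively to `left` and `right` until the groups are small yields one tree; repeating the whole process with fresh random samples yields the multiple trees mentioned above. At query time the same `side` test decides which child to descend into.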

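  16. spotify/annoy
    import random
    from annoy import AnnoyIndex
    f = 40  # dimensionality of each item vector
    t = AnnoyIndex(f, 'angular')  # recent Annoy releases expect an explicit metric
    for i in range(1000):
        v = [random.gauss(0, 1) for z in range(f)]
        t.add_item(i, v)
    t.build(10)  # build 10 trees
    t.save('test.ann')
    # ...
    u = AnnoyIndex(f, 'angular')
    u.load('test.ann')  # memory-maps the saved index
    print(u.get_nns_by_item(0, 1000))  # the 1000 nearest neighbors of item 0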

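  17. Pepabo Research and Development Institute (abbreviated "Pepaken") is an
    organization that pursues research and development under the concept of
    "smooth systems" in order to create technology that differentiates the
    business. It conducts research that pursues novelty, effectiveness, and
    reliability at an academic level, and contributes to the growth of the
    business by implementing and delivering the resulting technology as real
    systems.
    Pepaken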

  18. monochromegane/mruby-annoy

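  19. mruby-annoy
    - Bringing approximate nearest neighbor search into a service requires an HTTP-based API server, so that the data is centralized and requests from multiple application servers can be handled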
    - Provided on top of ngx_mruby
    mruby-annoy
    on ngx_mruby
    NNS

  20. ngx_mruby
    class NNS
      def call(env)
        params = env['QUERY_STRING'].split('&')
          .map {|kv| kv.split('=') }.to_h
        category_id = params['category_id'].to_i
        product_id = params['product_id'].to_i
        limit = (params['limit'] || 10).to_i
        userdata = Userdata.new "annoy_data_key"
        annoy = userdata.send("category_#{category_id}")
        return not_found unless annoy
        nns = annoy.get_nns_by_item(product_id, limit)
        [200, content_type, [nns.to_json]]
      end
      private
      def not_found
        return [404, content_type,
          [{'error' => 'not_found'}.to_json]]
      end
      def content_type
        {'Content-Type' => 'application/json;charset=utf-8'}
      end
    end
    run NNS.new

  21. Fukuoka.{ml, py, rb}

  22. ANN on service

  23. Using ANN in a service
    API server
    Dynamic index
    Mapping to service-side IDs
    Speed-up

  24. monochromegane/gannoy

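  25. gannoy
    HTTP server built on net/http
    Dynamic index through a reimplemented node structure
    Built-in mapping between internal IDs (used for optimization) and service IDs
    Speed-up using goroutines and system calls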
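
As a rough picture of what a mapping between service IDs and internal IDs involves, here is a minimal sketch. It is not gannoy's implementation: the `idMap` type, its in-memory storage, and the use of string keys for service IDs are all assumptions made for illustration.

```go
package main

import "sync"

// idMap is a hypothetical two-way mapping between service-side keys and
// dense internal IDs (0, 1, 2, ...) that an index can use as array positions.
type idMap struct {
	mu      sync.RWMutex
	toInner map[string]int // service key -> internal ID
	toOuter []string       // internal ID -> service key
}

func newIDMap() *idMap {
	return &idMap{toInner: make(map[string]int)}
}

// internalID returns the internal ID for key, assigning the next free ID
// the first time a key is seen.
func (m *idMap) internalID(key string) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	if id, ok := m.toInner[key]; ok {
		return id
	}
	id := len(m.toOuter)
	m.toInner[key] = id
	m.toOuter = append(m.toOuter, key)
	return id
}

// serviceKey translates an internal ID back to the service-side key.
// ok is false when the ID has never been assigned.
func (m *idMap) serviceKey(id int) (key string, ok bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if id < 0 || id >= len(m.toOuter) {
		return "", false
	}
	return m.toOuter[id], true
}
```

With such a layer, callers keep using their own identifiers in requests and responses, while the index is free to lay out nodes by compact internal IDs.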

  26. gannoy
    # Create database
    $ gannoy -d database -dim=2048 -tree=10
    # Start ANN server
    $ gannoy-server

  27. gannoy
    # Add item
    $ curl \
      'http://localhost:1323/databases/hoge/features/100' \
      -H "Content-type: application/json" \
      -X PUT \
      -d '{"features": [1.0, 0.5, 0.2,..]}'
    # Search
    $ curl \
      'http://localhost:1323/search?database=hoge&id=100'

  28. gannoy: Goroutine
    var wg sync.WaitGroup
    wg.Add(g.tree)
    buildChan := make(chan int, g.tree)
    // Each worker builds the trees whose indexes it receives.
    worker := func(n Node) {
        for index := range buildChan {
            g.build(index, g.meta.roots()[index], n)
            wg.Done()
        }
    }
    // Start three workers.
    for i := 0; i < 3; i++ {
        go worker(n)
    }
    // Queue one build job per tree root.
    for index := range g.meta.roots() {
        buildChan <- index
    }
    wg.Wait()
    close(buildChan)
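
The worker-pool pattern on this slide can also be written as a small self-contained function. The sketch below is a variation, not gannoy's code: `buildAll` and its parameters are hypothetical names, and it counts workers in the `WaitGroup` and closes the channel before waiting, instead of counting jobs.

```go
package main

import "sync"

// buildAll runs build(index) for every index in [0, n) on a fixed number of
// worker goroutines and returns once every call has finished.
func buildAll(n, workers int, build func(index int)) {
	jobs := make(chan int, n) // buffered so queuing never blocks
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for index := range jobs {
				build(index)
			}
		}()
	}
	for index := 0; index < n; index++ {
		jobs <- index
	}
	close(jobs) // lets each worker's range loop end
	wg.Wait()
}
```

For example, `buildAll(10, 3, func(i int) { /* build tree i */ })` builds ten trees with at most three running at once. Each tree is independent of the others, which is what makes this kind of fan-out straightforward.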

  29. gannoy: System calls
    // Take a shared (read) lock on only this node's byte range,
    // blocking until the lock is granted.
    err := syscall.FcntlFlock(f.file.Fd(), syscall.F_SETLKW,
        &syscall.Flock_t{
            Start:  f.offset(index),
            Len:    f.nodeSize(),
            Type:   syscall.F_RDLCK,
            Whence: io.SeekStart,
        })
    if err != nil {
        fmt.Printf("fcntl error %v\n", err)
    }
    // Release the range lock when the enclosing function returns.
    defer syscall.FcntlFlock(f.file.Fd(), syscall.F_SETLKW,
        &syscall.Flock_t{
            Start:  f.offset(index),
            Len:    f.nodeSize(),
            Type:   syscall.F_UNLCK,
            Whence: io.SeekStart,
        })
    b := make([]byte, f.nodeSize())
    // Read the node at its offset without moving the file position.
    syscall.Pread(int(f.file.Fd()), b, f.offset(index))

  30. Summary

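  31. Summary
    - Approximate nearest neighbor search is going to matter more and more
    - There are many implementations, but beyond speed and accuracy, ease of use in a service matters too
    - Write simple, robust, high-performance applications with Go and system calls
    - gannoy gannoy gannoy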

  32. Fukuoka.
    {ml, py, rb, syscall, go}

  33. The End

  34. Why not come work at Pepabo too?
    Check the latest hiring information → [email protected]
