Implementing Approximate Nearest Neighbor Search in Go / Approximate_nearest_neighbor_search_written_in_Golang.

monochromegane

April 24, 2017

Transcript

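  1. Yusuke Miyake / @monochromegane, GMO Pepabo, Inc. Fukuoka.go: Implementing Approximate Nearest Neighbor Search in Go

  2. Yusuke Miyake (@monochromegane), Principal Engineer, Pepabo R&D Institute, GMO Pepabo, Inc. http://blog.monochromegane.com

  3. Similar image search

  4. Similar image search

  5. Generic object recognition?

  6. Generic object recognition. Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." arXiv preprint.

  7. Generic object recognition
     1. Find the features of the objects in an image
     2. Match the features found against the recognition targets

  8. Deep CNN. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks."

  9. Generic object recognition
     1. Find the features of the objects in an image
     2. Match the features found against the recognition targets

  10. Features

  11. Neighbor search over high-dimensional vectors
      - Image features are generally represented as high-dimensional vectors.
      - Neighbor search means retrieving, from a set of high-dimensional vectors, the few vectors closest in distance to a query vector.
      - When there are many feature vectors, computing the distance to every one of them on each query takes too long, so a tree or hash structure is built in advance to speed up the search.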
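
As a baseline for the exhaustive approach described on slide 11, here is a minimal Go sketch of brute-force neighbor search. It is not taken from the talk: the function names (`euclidean`, `bruteForceNN`) and the choice of Euclidean distance are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// euclidean returns the Euclidean distance between two equal-length vectors.
func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// bruteForceNN returns the indices of the k vectors closest to query,
// nearest first. It computes the distance to every vector, which is the
// per-query cost that tree and hash indexes try to avoid.
func bruteForceNN(vectors [][]float64, query []float64, k int) []int {
	dists := make([]float64, len(vectors))
	idx := make([]int, len(vectors))
	for i, v := range vectors {
		dists[i] = euclidean(v, query)
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return dists[idx[a]] < dists[idx[b]]
	})
	if k < 0 {
		k = 0
	}
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}

func main() {
	vectors := [][]float64{{0, 0}, {1, 1}, {5, 5}, {0.9, 1.2}}
	fmt.Println(bruteForceNN(vectors, []float64{1, 1}, 2)) // prints [1 3]
}
```

Every query touches all vectors, so the cost grows linearly with both the number of items and the dimension.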

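  12. Approximate nearest neighbor search

  13. Approximate nearest neighbor search
      - Techniques that guarantee the true nearest neighbor falls in the group close to the query add computation time; with many dimensions the search can end up no faster than an exhaustive scan.
      - Approximate nearest neighbor search relaxes that guarantee.
      - It has to be adopted with the balance between accuracy and speed in mind.
      - Tree-based: ANN, Randomized kd-tree, FLANN, etc.
      - Hash-based: LSH, etc.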
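
One common way to put a number on the accuracy side of that trade-off is recall: the fraction of the exact nearest neighbors that the approximate search also returned. The sketch below is an illustration rather than something shown in the talk; `recallAtK` is a hypothetical helper name.

```go
package main

import "fmt"

// recallAtK returns the fraction of the exact neighbor IDs that also appear
// in the approximate result. 1.0 means the approximate search found them all.
func recallAtK(exact, approx []int) float64 {
	if len(exact) == 0 {
		return 0
	}
	found := make(map[int]struct{}, len(approx))
	for _, id := range approx {
		found[id] = struct{}{}
	}
	hits := 0
	for _, id := range exact {
		if _, ok := found[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(exact))
}

func main() {
	exact := []int{3, 7, 1, 9}  // e.g. from an exhaustive search
	approx := []int{3, 1, 4, 9} // e.g. from an approximate index
	fmt.Printf("recall@4 = %.2f\n", recallAtK(exact, approx)) // prints recall@4 = 0.75
}
```

Measuring recall alongside query latency on a sample of real queries gives a concrete basis for choosing index parameters.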

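  14. spotify/annoy

  15. spotify/annoy
      - Approximate Nearest Neighbors Oh Yeah
      - Provides a C++ library with bindings for Python and other languages
      - Builds multiple trees by repeatedly splitting the space in two, based on two randomly sampled points
      - At search time, the query is routed down each tree by the split vectors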
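
The two-point splitting idea from slide 15 can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not Annoy's or gannoy's actual code: the `node` type and the `newTree`, `build`, and `candidates` functions are hypothetical, the split plane is taken as the hyperplane equidistant from the two sampled points, and only a single tree is built.

```go
package main

import (
	"fmt"
	"math/rand"
)

// node is either an internal split (normal, offset) or a leaf holding item IDs.
type node struct {
	normal      []float64
	offset      float64
	left, right *node
	items       []int
}

func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// side reports whether v lies on the positive side of the node's hyperplane.
func (n *node) side(v []float64) bool {
	return dot(n.normal, v)-n.offset > 0
}

// build recursively splits items with the hyperplane that is equidistant
// from two randomly sampled points, until a leaf holds at most leafSize items.
func build(vectors [][]float64, items []int, leafSize int, rng *rand.Rand) *node {
	if len(items) <= leafSize {
		return &node{items: items}
	}
	p := vectors[items[rng.Intn(len(items))]]
	q := vectors[items[rng.Intn(len(items))]]
	normal := make([]float64, len(p))
	mid := make([]float64, len(p))
	for i := range p {
		normal[i] = p[i] - q[i]
		mid[i] = (p[i] + q[i]) / 2
	}
	n := &node{normal: normal, offset: dot(normal, mid)}
	var l, r []int
	for _, id := range items {
		if n.side(vectors[id]) {
			r = append(r, id)
		} else {
			l = append(l, id)
		}
	}
	if len(l) == 0 || len(r) == 0 {
		// Degenerate split (e.g. the same point was sampled twice):
		// keep the items in one oversized leaf instead of recursing.
		return &node{items: items}
	}
	n.left = build(vectors, l, leafSize, rng)
	n.right = build(vectors, r, leafSize, rng)
	return n
}

// newTree builds one tree over all vectors using a seeded random source.
func newTree(vectors [][]float64, leafSize int, seed int64) *node {
	items := make([]int, len(vectors))
	for i := range items {
		items[i] = i
	}
	return build(vectors, items, leafSize, rand.New(rand.NewSource(seed)))
}

// candidates routes the query down the tree by the split vectors and
// returns the item IDs stored in the leaf it reaches.
func candidates(n *node, query []float64) []int {
	for n.left != nil {
		if n.side(query) {
			n = n.right
		} else {
			n = n.left
		}
	}
	return n.items
}

func main() {
	vectors := [][]float64{
		{0, 0}, {0, 1}, {1, 0}, {1, 1},
		{10, 10}, {10, 11}, {11, 10}, {11, 11},
	}
	tree := newTree(vectors, 2, 1)
	fmt.Println(candidates(tree, []float64{10.2, 10.1}))
}
```

A single tree can miss the true nearest neighbor when it sits just across a split. Building several trees from different seeds, merging their candidate sets, and ranking the merged set by exact distance is the usual remedy, which is the role of the "multiple trees" on the slide.
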
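  16. spotify/annoy

          from annoy import AnnoyIndex
          import random

          f = 40  # dimensionality of the vectors
          t = AnnoyIndex(f)
          for i in range(1000):
              v = [random.gauss(0, 1) for z in range(f)]
              t.add_item(i, v)

          t.build(10)  # build 10 trees
          t.save('test.ann')

          # ...

          u = AnnoyIndex(f)
          u.load('test.ann')
          print(u.get_nns_by_item(0, 1000))  # the 1000 nearest neighbors of item 0
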
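  17. Pepabo R&D Institute (abbreviated "Pepaken") is an organization that conducts research and development under the concept of a "smooth system" in order to create technology that differentiates the business. It pursues research with novelty, effectiveness, and reliability at an academic standard, and contributes to business growth by implementing and delivering the resulting technology as real systems.

  18. monochromegane/mruby-annoy

  19. mruby-annoy
      - Introducing approximate nearest neighbor search into a service requires an HTTP-based API server, so that the data can be centralized and requests from multiple application servers can be handled.
      - Provided as mruby-annoy on ngx_mruby.
      - (Diagram labels: mruby-annoy on ngx_mruby, NNS)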

  20. mruby-annoy on ngx_mruby

          class NNS
            def call(env)
              # Parse the query string into a hash of parameters.
              params = env['QUERY_STRING'].split('&')
                                          .map {|kv| kv.split('=') }.to_h
              category_id = params['category_id'].to_i
              product_id = params['product_id'].to_i
              limit = (params['limit'] || 10).to_i

              # Look up the Annoy index for the requested category.
              userdata = Userdata.new "annoy_data_key"
              annoy = userdata.send("category_#{category_id}")
              return not_found unless annoy

              nns = annoy.get_nns_by_item(product_id, limit)
              [200, content_type, [nns.to_json]]
            end

            private

            def not_found
              return [404, content_type, [{'error' => 'not_found'}.to_json]]
            end

            def content_type
              {'Content-Type' => 'application/json;charset=utf-8'}
            end
          end

          run NNS.new

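  21. Fukuoka.{ml, py, rb}

  22. ANN on service

  23. Using ANN in a service
      1. API server
      2. Dynamic index
      3. Mapping to service-side IDs
      4. Speed

  24. monochromegane/gannoy

  25. gannoy
      1. HTTP server built on net/http
      2. Dynamic index backed by a reimplemented node structure
      3. Built-in mapping between internal IDs (used for optimization) and service IDs
      4. Speedups using goroutines and system calls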
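
To illustrate the ID mapping item on slide 25, here is a minimal Go sketch of a two-way mapping between arbitrary service-side IDs and dense internal IDs. It is only an assumption about how such a mapping might look, not gannoy's actual implementation; the `idMap` type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// idMap maps sparse service-side IDs to dense internal IDs (0, 1, 2, ...)
// and back. Dense internal IDs can be used directly as offsets into an index.
type idMap struct {
	mu      sync.RWMutex
	toInner map[int]int // service ID -> internal ID
	toOuter []int       // internal ID -> service ID
}

func newIDMap() *idMap {
	return &idMap{toInner: make(map[int]int)}
}

// internalID returns the internal ID for a service ID, allocating the next
// free internal ID the first time a service ID is seen.
func (m *idMap) internalID(serviceID int) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	if id, ok := m.toInner[serviceID]; ok {
		return id
	}
	id := len(m.toOuter)
	m.toInner[serviceID] = id
	m.toOuter = append(m.toOuter, serviceID)
	return id
}

// serviceID translates an internal ID back to the service-side ID.
// The boolean is false when the internal ID has never been allocated.
func (m *idMap) serviceID(internal int) (int, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if internal < 0 || internal >= len(m.toOuter) {
		return 0, false
	}
	return m.toOuter[internal], true
}

func main() {
	m := newIDMap()
	a := m.internalID(100200)
	b := m.internalID(987)
	fmt.Println(a, b, m.internalID(100200)) // prints 0 1 0
	sid, ok := m.serviceID(b)
	fmt.Println(sid, ok) // prints 987 true
}
```

With a mapping like this, callers can keep using their own product or record IDs while the index works with compact, contiguous ones.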

  26. gannoy

          # Create database
          $ gannoy -d database -dim=2048 -tree=10

          # Start ANN server
          $ gannoy-server

  27. gannoy

          # Add item
          $ curl \
            'http://localhost:1323/databases/hoge/features/100' \
            -H "Content-type: application/json" \
            -X PUT \
            -d '{"features": [1.0, 0.5, 0.2,..]}'

          # Search
          $ curl \
            'http://localhost:1323/search?database=hoge&id=100'

  28. gannoy: Goroutine

          var wg sync.WaitGroup
          wg.Add(g.tree)
          buildChan := make(chan int, g.tree)

          // Each worker builds the trees whose indexes it receives.
          worker := func(n Node) {
              for index := range buildChan {
                  g.build(index, g.meta.roots()[index], n)
                  wg.Done()
              }
          }

          // Start three workers.
          for i := 0; i < 3; i++ {
              go worker(n)
          }

          // Queue one build job per tree root.
          for index := range g.meta.roots() {
              buildChan <- index
          }

          wg.Wait()
          close(buildChan)

  29. gannoy: System calls

          // Take a shared (read) lock on the byte range of this node only.
          err := syscall.FcntlFlock(f.file.Fd(), syscall.F_SETLKW, &syscall.Flock_t{
              Start:  f.offset(index),
              Len:    f.nodeSize(),
              Type:   syscall.F_RDLCK,
              Whence: io.SeekStart,
          })
          if err != nil {
              fmt.Printf("fcntl error %v\n", err)
          }
          // Release the lock when the function returns.
          defer syscall.FcntlFlock(f.file.Fd(), syscall.F_SETLKW, &syscall.Flock_t{
              Start:  f.offset(index),
              Len:    f.nodeSize(),
              Type:   syscall.F_UNLCK,
              Whence: io.SeekStart,
          })

          // Read the node at its offset without moving the file position.
          b := make([]byte, f.nodeSize())
          syscall.Pread(int(f.file.Fd()), b, f.offset(index))

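  30. Summary

  31. Summary
      - Approximate nearest neighbor search is going to matter more and more.
      - There are many implementations, but ease of use in a service matters as much as speed and accuracy.
      - Write simple, robust, high-performance applications with Go and system calls.
      - gannoy, gannoy, gannoy

  32. Fukuoka.{ml, py, rb, syscall, go}

  33. The End

  34. Why not come work at Pepabo? Check the latest hiring information → @pb_recruit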