Behind the Scenes of Similar Image Search: Machine Learning and Approximate Nearest Neighbor Search / similar_image_search_inside

monochromegane
September 12, 2018

Pepabo R&D Institute (ペパボ研究所) in da house #1


Transcript

  1. Behind the Scenes of Similar Image Search
    - Machine Learning and Approximate Nearest Neighbor Search -
    Yusuke Miyake / Pepabo R&D Institute, GMO Pepabo, Inc.
    2018.09.11 Pepabo R&D Institute in da house #1

  2. Principal Engineer
    Yusuke Miyake (@monochromegane)
    Pepabo R&D Institute, GMO Pepabo, Inc.
    http://blog.monochromegane.com

  3. Contents
    1. Why similar image search?
    2. Technologies behind similar image search
    3. Introducing similar image search
    4. The future of similar image search

  4. 1.
    Why similar image search?

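  5. Toward smooth matching
    • The information-seeking process involves not only system-side technical details but also the user, who is the agent driving the process
    • Prior research on user-centered models of the information-seeking process shows that information needs change with a variety of contexts
    [Figure: relationship between the technologies surrounding information needs and the process models]
    If the optimal response changes with context, then choosing a technology to suit the situation may be better than ranking technologies against one another.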

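  6. Toward smooth matching
    • A mechanism that, assuming the actions available to a user change with context, can respond to intuitive queries through a fine-grained understanding of that context and through support for clarifying the user's problem
    1. Confirm that the evaluation of a response changes with context
    2. Classify patterns of user context and establish methods for identifying them
    3. Introduce search and information technologies that can respond to intuitive queries where needed
    4. Examine ways to support clarifying the problem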

  7. 2.
    Technologies behind similar image search

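  8. Technologies behind similar image search
    • Feature transformation
      • Convert each image into a form in which similarity can be compared quantitatively
    • Nearest neighbor search
      • Compare the resulting features with one another and identify the set of similar items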

  9. Feature transformation

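  10. Neural networks: model
    • A learning method that estimates a function f mapping an m-dimensional vector x to an n-dimensional vector y
    [Figure: inputs x_1 ... x_m feed h hidden nodes through weights W and biases b, producing outputs y_1 ... y_n]
    Each node multiplies its inputs by the weights w, adds a bias, and passes the result through an activation function:
    $o_1 = \sigma\left(\sum_{i=1}^{m} w_{1i} x_i + b_1\right)$
    The computation for all nodes can be handled as a single matrix operation:
    $o = \sigma(Wx + b)$, where
    $W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & & \vdots \\ w_{h1} & w_{h2} & \cdots & w_{hm} \end{pmatrix}$, $x = (x_1, x_2, \ldots, x_m)^\top$, $b = (b_1, b_2, \ldots, b_h)^\top$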
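
As a minimal sketch of the layer computation o = σ(Wx + b) on slide 10, the following Ruby code evaluates one fully connected layer. The sigmoid activation, the function names `sigmoid` and `dense_forward`, and the weight, bias, and input values are assumptions chosen for illustration; the slides do not fix any of them.

```ruby
# One fully connected layer: o = sigmoid(W x + b).
# All names and numbers are illustrative assumptions.

def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# w: h x m weight matrix (array of rows), x: m inputs, b: h biases.
def dense_forward(w, x, b)
  w.each_with_index.map do |row, j|
    # Weighted sum of the inputs for node j, plus its bias.
    z = row.zip(x).sum { |w_ji, x_i| w_ji * x_i } + b[j]
    sigmoid(z)
  end
end

w = [[0.5, -0.5], [1.0, 1.0]]
x = [2.0, 2.0]
b = [0.0, -4.0]
p dense_forward(w, x, b)
```

With these values both weighted sums are 0, so both outputs are 0.5.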

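  11. Neural networks: training
    • A learning method that estimates a function f mapping an m-dimensional vector x to an n-dimensional vector y
    [Figure: a network with parameters W^(1), b^(1), W^(2), b^(2) maps x to outputs y_1 ... y_n, which are compared with targets t_1 ... t_n]
    Define a loss function that expresses the error between the output and the correct answer: the sum-of-squares error (first) or the cross-entropy error (second).
    $E = \frac{1}{2} \sum_n (y_n - t_n)^2$
    $E = -\sum_n t_n \ln y_n$
    Gradient descent updates the parameters so that the value of the loss function decreases, using the gradient, that is, the collection of partial derivatives of the loss with respect to each parameter. Stochastic gradient descent, which uses only a subset of the samples for each update, is especially common.
    $\nabla E_k = \left(\frac{\partial E_k}{\partial \theta_1}, \ldots, \frac{\partial E_k}{\partial \theta_V}\right)$
    $\theta^{(i+1)} = \theta^{(i)} - \alpha \nabla E_k$
    For this reason, smooth activation functions whose gradients do not vanish are preferred.
    Backpropagation avoids computing each parameter's partial derivative separately: by treating the function f that represents the network as a composite function, it obtains the gradient efficiently from only the upstream error and each node's own input, which keeps the computational cost down.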
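
The two loss functions and the update rule θ ← θ − α∇E on slide 11 can be sketched as follows. The one-parameter model y = θx, the data, the learning rate, and all function names are assumptions for illustration. For brevity the sketch uses the whole data set in every step (plain gradient descent); the stochastic variant would compute the gradient on a sampled subset instead.

```ruby
# Sum-of-squares error: E = 1/2 * sum_n (y_n - t_n)^2
def sum_squared_error(y, t)
  0.5 * y.zip(t).sum { |y_n, t_n| (y_n - t_n)**2 }
end

# Cross-entropy error: E = -sum_n t_n * ln(y_n)
def cross_entropy_error(y, t)
  -1.0 * y.zip(t).sum { |y_n, t_n| t_n * Math.log(y_n) }
end

# Toy model y = theta * x with a single parameter (an assumption for brevity).
# For the sum-of-squares error, dE/dtheta = sum_n (theta * x_n - t_n) * x_n.
def gradient(theta, xs, ts)
  xs.zip(ts).sum { |x_n, t_n| (theta * x_n - t_n) * x_n }
end

xs = [1.0, 2.0, 3.0]
ts = [2.0, 4.0, 6.0]   # targets generated with theta = 2
theta = 0.0
alpha = 0.05           # learning rate

50.times do
  theta -= alpha * gradient(theta, xs, ts)   # theta <- theta - alpha * grad E
end

ys = xs.map { |x_n| theta * x_n }
puts "theta=#{theta.round(6)} loss=#{sum_squared_error(ys, ts).round(6)}"
```

In a real network the gradient is not derived by hand for each parameter as it is here; backpropagation computes it layer by layer.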

  12. Neural networks: training
    • References
      Gradient Descent with Go: Theory and Practice
      https://speakerdeck.com/monochromegane/gradient-descent-in-golang
      Deep Learning from Scratch (ゼロから作るDeep Learning)

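  13. Convolutional neural networks
    • Neural networks that take spatial information into account
    Convolution (multiply-accumulate)
    Applies filters that perform a multiply-accumulate over the input and outputs the result as a feature map. This lets the network take spatial information into account. The best filter weights are acquired through training.
    Pooling (for example, the maximum)
    Applies a filter that aggregates the input spatially. Because it uses the maximum or the mean, it needs no training. It contributes to robustness against small changes in position.
    [Figure: convolution and pooling layers form the feature extractor, followed by the classifier]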
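
A rough sketch of the two operations on slide 13, written in plain Ruby with nested arrays. The 5x5 input, the 2x2 kernel, and the function names `conv2d` and `max_pool2d` are assumptions; in a real CNN the kernel weights come from training and a framework such as TensorFlow performs these operations.

```ruby
# Stride-1 2D convolution without padding, as used in CNNs
# (strictly a cross-correlation: the kernel is not flipped).
def conv2d(input, kernel)
  kh = kernel.size
  kw = kernel[0].size
  out_h = input.size - kh + 1
  out_w = input[0].size - kw + 1
  Array.new(out_h) do |i|
    Array.new(out_w) do |j|
      # Multiply-accumulate over the window whose top-left corner is (i, j).
      (0...kh).sum do |di|
        (0...kw).sum { |dj| input[i + di][j + dj] * kernel[di][dj] }
      end
    end
  end
end

# Non-overlapping max pooling with a size x size window.
def max_pool2d(input, size)
  (0...(input.size / size)).map do |i|
    (0...(input[0].size / size)).map do |j|
      rows = input[i * size, size]
      rows.flat_map { |row| row[j * size, size] }.max
    end
  end
end

image = [
  [1, 2, 0, 1, 3],
  [0, 1, 2, 3, 1],
  [1, 0, 1, 2, 0],
  [2, 1, 0, 1, 1],
  [0, 3, 1, 0, 2]
]
kernel = [[1, 0], [0, -1]]   # hand-picked; a CNN would learn these weights

feature_map = conv2d(image, kernel)   # 4x4 feature map
p feature_map
p max_pool2d(feature_map, 2)          # 2x2 after pooling
```

Pooling has no weights of its own, which matches the slide's point that it requires no training.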

  14. [Figure only; no text was extracted for this slide]

  15. Nearest neighbor search

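  16. Nearest neighbor search over high-dimensional vectors
    • Image features are generally represented as high-dimensional vectors
    • Nearest neighbor search means retrieving, from a set of high-dimensional vectors, the few vectors that are closest to a query vector
    • When there are many feature vectors, computing the distance to every one of them on each query takes too long, so a tree or hash structure is built in advance to speed up the search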
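
For reference, the exhaustive search that trees and hashes are meant to avoid can be sketched as a linear scan. The squared Euclidean distance, the function names, and the sample IDs and vectors are assumptions for illustration.

```ruby
def squared_distance(a, b)
  a.zip(b).sum { |a_i, b_i| (a_i - b_i)**2 }
end

# Exhaustive search: computes the distance to every vector on each query
# and returns the ids of the k closest ones.
# vectors: Hash of id => Array of Floats
def nearest_neighbors(vectors, query, k)
  vectors.min_by(k) { |_id, vec| squared_distance(vec, query) }.map(&:first)
end

features = {
  101 => [0.0, 0.0, 1.0],
  102 => [0.9, 0.1, 0.0],
  103 => [1.0, 0.0, 0.0],
  104 => [0.0, 1.0, 0.0]
}
p nearest_neighbors(features, [1.0, 0.0, 0.1], 2)
```

The cost grows linearly with both the number of vectors and their dimensionality, which is the motivation for building an index ahead of time.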

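  17. Approximate nearest neighbor search
    • Guaranteeing that the group close to the query contains the true nearest neighbor increases computation time, and with many dimensions the search can become no faster than an exhaustive one
    • Approximate nearest neighbor search relaxes this constraint
    • A method has to be chosen by weighing accuracy against speed
    • Tree-based: ANN, Randomized kd-tree, FLANN, and others
    • Hash-based: LSH and others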
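
One common way to put a number on the accuracy side of that balance is recall@k: the share of the true k nearest neighbors that the approximate search also returned. The slides do not name a metric, so the metric choice, the function name `recall_at_k`, and the sample IDs are assumptions.

```ruby
# recall@k: fraction of the exact top-k ids that also appear
# in the approximate top-k result.
def recall_at_k(exact_ids, approx_ids)
  (exact_ids & approx_ids).size.to_f / exact_ids.size
end

exact  = [12, 7, 33, 5]    # ids from an exhaustive search (made up)
approx = [7, 5, 90, 41]    # ids from an approximate search (made up)
p recall_at_k(exact, approx)
```

Measuring this alongside query latency for each candidate method gives a concrete basis for the trade-off.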

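  18. Spotify/annoy
    • Approximate Nearest Neighbors Oh Yeah
    • Provides a C++ library with bindings for Python and other languages
    • Builds multiple trees by repeatedly splitting the space in two, based on two randomly sampled points
    • At search time, routes the query using the split vectors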
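
The tree construction described on slide 18 can be sketched in a few dozen lines. This is an illustration of the idea only, not Annoy's actual implementation: the hash-based node layout, the leaf size, the function names, and the random data are all assumptions.

```ruby
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Recursively split ids by the hyperplane equidistant from two sampled points.
# Leaves are { ids: [...] }; inner nodes hold the split vector and two children.
def build_tree(vectors, ids, leaf_size, rng)
  return { ids: ids } if ids.size <= leaf_size

  a, b = ids.sample(2, random: rng).map { |id| vectors[id] }
  normal = a.zip(b).map { |x, y| x - y }
  midpoint = a.zip(b).map { |x, y| (x + y) / 2.0 }
  offset = dot(normal, midpoint)
  near_a, near_b = ids.partition { |id| dot(normal, vectors[id]) >= offset }
  # Degenerate split (for example duplicate points): stop and keep a leaf.
  return { ids: ids } if near_a.empty? || near_b.empty?

  { normal: normal, offset: offset,
    near_a: build_tree(vectors, near_a, leaf_size, rng),
    near_b: build_tree(vectors, near_b, leaf_size, rng) }
end

# Route the query down one tree using the stored split vectors.
def leaf_ids(node, query)
  until node[:ids]
    side = dot(node[:normal], query) >= node[:offset] ? :near_a : :near_b
    node = node[side]
  end
  node[:ids]
end

# Union the leaves reached in every tree, then rank candidates by true distance.
def tree_search(trees, vectors, query, k)
  candidates = trees.flat_map { |tree| leaf_ids(tree, query) }.uniq
  candidates.min_by(k) { |id| squared_distance(vectors[id], query) }
end

rng = Random.new(42)
vectors = Array.new(200) { Array.new(8) { rng.rand } }
trees = Array.new(5) { build_tree(vectors, (0...vectors.size).to_a, 10, rng) }
p tree_search(trees, vectors, vectors[7], 3)
```

Using several trees raises the chance that a true neighbor separated from the query by one unlucky split is still found through another tree.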

  19. 3.
    Introducing similar image search

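  20. Technologies behind similar image search
    • Feature transformation
      • Convert each image into a form in which similarity can be compared quantitatively
      → Deep CNN as a feature transformation mechanism that comes close to qualitative evaluation
      ➕ Scalability for handling training and prediction, and independence from the service
    • Nearest neighbor search
      • Compare the resulting features with one another and identify the set of similar items
      → ANN (approximate nearest neighbor search), which balances accuracy and speed
      ➕ Consolidation of data, and independence from the service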

  21. Similar images search system using ANN
    [Diagram labels: register, Deep CNN, Features ([n][2048]float64), index, ANN; query, Deep CNN, Feature ([2048]float64), find similar features, Similar items, response]

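  22. Google Cloud ML as the machine learning platform
    • Can integrate with service assets such as logs and databases
      → Input and output go through Cloud Storage
    • Models can be built and tried out with relative ease
      → Uses TensorFlow
    • Provides an API as the way to use training results
      → The online prediction service turns the model into an API
    • All of the above must be scalable
      → Distributed training infrastructure combined with a load-balancing service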

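  23. Feature transformation
    • Convert the list of item images up to a given point in time into features
    • Load the list of converted features into the approximate nearest neighbor search database
    [Diagram labels: Service, Object Storage, GCP, image to data, data to feature, vectorizer by Deep CNN, ANN, Workers]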

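  24. Implementing feature transformation with TensorFlow
    Load a pre-trained network
    Treat an intermediate layer as the output layer to obtain a feature extractor
    Feed in an image and convert it into features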

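  25. monochromegane/mruby-annoy + ngx_mruby
    • Introducing approximate nearest neighbor search into the service requires an HTTP-based API server, so that the data is centralized and requests from multiple application servers can be handled
    • Provided as mruby-annoy on ngx_mruby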

  26. mruby_annoy on ngx_mruby
    # Rack-style handler that returns a product's nearest neighbors as JSON.
    class NNS
      def call(env)
        # Parse the query string, e.g. "category_id=1&product_id=2&limit=5".
        params = env['QUERY_STRING'].split('&')
                   .map {|kv| kv.split('=') }.to_h
        category_id = params['category_id'].to_i
        product_id = params['product_id'].to_i
        limit = (params['limit'] || 10).to_i

        # Look up the index registered for this category under the shared key.
        userdata = Userdata.new "annoy_data_key"
        annoy = userdata.send("category_#{category_id}")
        return not_found unless annoy

        nns = annoy.get_nns_by_item(product_id, limit)
        [200, content_type, [nns.to_json]]
      end

      private

      def not_found
        return [404, content_type,
                [{'error' => 'not_found'}.to_json]]
      end

      def content_type
        {'Content-Type' => 'application/json;charset=utf-8'}
      end
    end

    run NNS.new

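  27. monochromegane/gannoy(-yahoo/NGT)
    • On an e-commerce site, products are added and updated frequently, so updating the index sequentially or by differences is useful
    • Mapping between the index's internal IDs and the service-side IDs
    • gannoy provides an approximate nearest neighbor search API server that satisfies these requirements and is also faster
      • At present, mainly because of performance problems with sequential updates, it runs in production with the backend swapped for yahoo/NGT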
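
The ID mapping in the second bullet can be sketched as a pair of lookup tables. This only illustrates the idea under assumed names (`new_id_map`, `internal_id_for`, `service_id_for`) and assumes the index wants dense, zero-based internal IDs; it does not describe how gannoy stores its mapping.

```ruby
# Two-way mapping between service-side ids and dense internal index ids.
def new_id_map
  { to_internal: {}, to_service: [] }
end

# Returns the internal id for a service id, assigning the next free one if new.
def internal_id_for(map, service_id)
  map[:to_internal].fetch(service_id) do
    map[:to_service] << service_id
    map[:to_internal][service_id] = map[:to_service].size - 1
  end
end

# Translates an internal id from a search result back to the service id.
def service_id_for(map, internal_id)
  map[:to_service][internal_id]
end

map = new_id_map
p internal_id_for(map, 900_123)   # first item registered
p internal_id_for(map, 42)        # second item registered
p internal_id_for(map, 900_123)   # same service id, same internal id
p service_id_for(map, 1)
```

Keeping the mapping next to the index lets the API accept and return service-side IDs, so callers never see internal IDs.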

  28. monochromegane/gannoy(-yahoo/NGT)
    # Add item
    $ curl \
      'http://localhost:1323/databases/hoge/features/100' \
      -H "Content-type: application/json" \
      -X PUT \
      -d '{"features": [1.0, 0.5, 0.2,..]}'
    # Search
    $ curl \
      'http://localhost:1323/search?database=hoge&id=100'

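  29. Similar image search
    [Diagram labels: Nyah, products#show, product_id, nearest products, products#update, Gannoy, Object Storage, GCP, data to feature, vectorizer by Inception-V3]
    Note: an approximate nearest neighbor search accesses most of the index, so practical speed requires enough memory for the entire database file to fit in the page cache
    • On search, returns the set of neighbors of the registered feature, looked up by item ID
    • On update, the new feature is saved temporarily; the differences are applied periodically and the index is switched over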
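
The update flow in the last bullet can be sketched as a buffer plus a periodic swap. This is only an illustration under assumed names (`new_store`, `record_update`, `apply_pending`), with an in-memory Hash standing in for the index; it is not how Gannoy is implemented.

```ruby
# live: the snapshot that searches read; pending: updates not yet applied.
def new_store(initial)
  { live: initial.dup.freeze, pending: {} }
end

# Updates are only recorded; searches keep using the live snapshot.
def record_update(store, id, feature)
  store[:pending][id] = feature
end

# Periodic job: build the next snapshot from live + pending, then switch.
def apply_pending(store)
  return if store[:pending].empty?

  store[:live] = store[:live].merge(store[:pending]).freeze
  store[:pending] = {}
end

store = new_store(100 => [1.0, 0.5])
record_update(store, 100, [0.9, 0.6])   # changed feature for item 100
record_update(store, 101, [0.1, 0.2])   # new item
p store[:live].keys                     # still the old snapshot
apply_pending(store)
p store[:live].keys                     # now includes the new item
```

Separating recording from applying keeps searches fast while updates accumulate, at the price of results that lag by up to one apply interval.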

  30. Related item search by similar images

  31. Related item search by similar images

  32. Related item search by similar images

  33. 4.
    The future of similar image search

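  34. Technologies behind similar image search
    • Feature transformation
      • Convert each image into a form in which similarity can be compared quantitatively
      → Deep CNN as a feature transformation mechanism that comes close to qualitative evaluation
      ↪︎ Acquire a notion of "similarity" specialized for the service
    • Nearest neighbor search
      • Compare the resulting features with one another and identify the set of similar items
      → ANN (approximate nearest neighbor search), which balances accuracy and speed
      ↪︎ A faster and more accurate ANN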

  35. SSD
    • Single Shot MultiBox Detector
    • A model that performs object localization and class classification in a single pass
    • The service's own categories were used as training data

  36. Sanny
    A distributable approximate nearest neighbor search engine for large-scale e-commerce sites that balances speed and accuracy

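  37. Proposed method (speed)
    • Split the query and the set of high-dimensional vectors evenly into sub-vectors of an arbitrary number of dimensions, run nearest neighbor search on each sub-vector in parallel, take the union of the results as the neighbor candidates, and run nearest neighbor search once more over those candidates.
    [Figure: before decomposition, the query q is searched against N records X. After decomposition, q is split into q_1, q_2, q_3 and X into X_1, X_2, X_3; each part computes NN(q_i, x_n) for x ∈ X_i, the resulting candidate sets are combined with ∪, and the answer is argmin d(q, x) over x in that union.]
    ① Parallel nearest neighbor search in low-dimensional spaces
    ② Aggregation of a fixed number of neighbor candidates
    ③ Linear search over the neighbor candidates
    [Figure label: ① ② ③ speed improvement]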
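
Steps ① to ③ can be sketched as follows. This is a reading of the slide rather than Sanny's actual code: the function names, the keyword parameters `parts`, `top`, and `k`, the use of threads, and the exhaustive per-part search are all assumptions, and the vector length is assumed to be divisible by `parts`.

```ruby
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Split a vector into `parts` equal-length sub-vectors.
def split_vector(vec, parts)
  vec.each_slice(vec.size / parts).to_a
end

# Stand-in for any NN or ANN engine: exhaustive search within one sub-space.
def sub_search(sub_vectors, sub_query, top)
  sub_vectors.each_index.min_by(top) do |id|
    squared_distance(sub_vectors[id], sub_query)
  end
end

def sanny_search(vectors, query, parts:, top:, k:)
  sub_queries = split_vector(query, parts)
  # (1) Search each low-dimensional sub-space independently (threads here;
  #     separate processes or hosts would follow the same pattern).
  per_part = (0...parts).map do |part|
    Thread.new do
      sub_vectors = vectors.map { |v| split_vector(v, parts)[part] }
      sub_search(sub_vectors, sub_queries[part], top)
    end
  end.map(&:value)
  # (2) Aggregate the fixed number of candidates from each part into a union.
  candidates = per_part.flatten.uniq
  # (3) Linear search over the candidates using the full vectors.
  candidates.min_by(k) { |id| squared_distance(vectors[id], query) }
end

rng = Random.new(7)
vectors = Array.new(100) { Array.new(8) { rng.rand } }
p sanny_search(vectors, vectors[3], parts: 4, top: 10, k: 5)
```

A larger `top` widens the candidate set and so trades speed for accuracy; the final linear search touches at most `parts * top` vectors instead of all of them.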

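  38. Proposed method (accuracy)
    • Targets approximate nearest neighbor search over sets of high-dimensional, dense vectors that represent product characteristics well and allow similarity to be compared with high accuracy
      • Feature sets obtained from images by using a pre-trained CNN as a feature extractor
      • Feature sets obtained from Word2vec, which converts text into distributed representations
    • For such data, the top results of a nearest neighbor search over the full high-dimensional vectors for a search query tend to resemble the top results of searches run per sub-vector, where the query and the vectors are split evenly into an arbitrary number of dimensions. The method builds on this observation
    A data property: when the parts are similar, the whole is likely to be similar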

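  39. Sanny: implementation of the proposed method
    [Diagram labels: Query, Sanny, NN, Algorithm]
    • Sanny is responsible for splitting the query and the search target data evenly into an arbitrary number of dimensions, and for aggregating the results
    • Any nearest neighbor search algorithm can be used for the sub-vectors
    • The search for each sub-vector is independent, so a distributed configuration is possible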
