


How I Clustered Search Keywords with Python's scikit-learn: Using Machine Learning to Build a Site That Ranks Well in Organic Search

Clustering organic-search suggest keywords with Python's scikit-learn in order to build a site that ranks well for those search keywords.


soheiyagi

July 14, 2019

Transcript

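  1. Self-introduction

    Sohei Yagi (八木創平), @soheiyagi / soheiyagi
    Representative Director of Sou Inc. (株式会社ソウ), sou-co.jp: content marketing, Web advertising operations, marketing tool development
    Operator of Honmachi Open Source Lab (本町オープンソースラボ), a community space for engineers
    Organizer of Osaka Python no Kai (大阪Pythonの会)
    Registered expert in the Mirasapo expert dispatch program of the Small and Medium Enterprise Agency
    Started out in sales for Web advertising and staffing businesses, and now writes programs while handling Web operations and ad operations. A marketer who can write code.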
  2. Self-introduction (same profile as slide 1, with one addition)

    My latest hobby is weight training.
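  3. Obtaining suggest keywords

    Target keyword: "innovation case studies" (イノベーション事例). The number of suggest keywords shown on the slide is not preserved in this transcript. The keywords, translated from Japanese in slide order:

    innovation case studies; innovation case studies Japan; innovation case studies recent; innovation case studies; innovation case studies overseas; innovation case studies iphone; innovation case studies latest; innovation case studies everyday; innovation case studies America; innovation case studies Walkman;
    ecosystem innovation case studies; sales innovation case studies; innovation office case studies; open innovation case studies; open innovation case studies Japan; open data innovation case studies; Osaka Gas innovation case studies; innovation company case studies; innovation environment case studies; value innovation case studies;
    student innovation case studies; innovation case studies enterprise; innovation finance case studies; large enterprise innovation case studies; technology innovation case studies; industry innovation case studies; innovation case studies combination; R&D innovation case studies; innovation economy case studies; innovation management case studies;
    retail industry innovation case studies; retail innovation case studies; children innovation case studies; retail industry innovation case studies; innovation case studies service; industry-academia collaboration innovation case studies; service industry innovation case studies; innovation case study collection; innovation case studies failure; innovation merchandise (商品) case studies;
    innovation new combination case studies; ecosystem innovation case studies; social issues innovation case studies; in-house innovation case studies; innovation dilemma case studies; innovation sustainable growth case studies; innovation case studies world; innovation product (製品) case studies; manufacturing innovation case studies; diversity (多様性) innovation case studies;
    innovation organization case studies; innovation creation case studies; social innovation case studies Japan; emergent innovation case studies; diversity (多様性) innovation case studies; large enterprise innovation case studies; diversity (ダイバーシティ) innovation case studies; SME innovation case studies; innovation team case studies; innovation knowledge creation case studies;
    regional innovation case studies; innovation case studies design thinking; design innovation case studies; digital innovation case studies; disruptive innovation case studies; discontinuous innovation case studies; Hitachi innovation case studies; business model innovation case studies; business innovation case studies; big data innovation case studies;
    Fujifilm innovation case studies; Fujitsu field innovation case studies; Fujitsu innovation case studies; process innovation case studies; product (プロダクト) innovation case studies; healthcare innovation case studies; innovation case studies book; innovation case studies Microsoft; marketing innovation case studies; user innovation case studies;
    reverse innovation case studies; architectural innovation case studies; industry-academia collaboration innovation case studies; pg innovation case studies; innovation case studies; m innovation case studies; innovation opportunities case studies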
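  4. Actual accuracy

    Each line below is one cluster returned by the clustering (cluster numbers are not preserved in this transcript).

    innovation management case studies; innovation economy case studies; innovation case studies; innovation case studies Walkman; process innovation case studies
    discontinuous innovation case studies
    innovation company case studies; innovation case studies America; innovation case studies service; innovation case studies enterprise; innovation case study collection; innovation case studies everyday; innovation case studies world; innovation merchandise (商品) case studies; product (プロダクト) innovation case studies; technology innovation case studies; children innovation case studies
    diversity (ダイバーシティ) innovation case studies; diversity (多様性) innovation case studies; diversity (多様性) innovation case studies
    innovation case studies failure; innovation organization case studies; student innovation case studies
    innovation case studies overseas; open innovation case studies; open innovation case studies Japan
    innovation case studies design thinking; design innovation case studies
    innovation case studies; innovation case studies recent; innovation case studies latest; innovation case studies Japan; innovation product (製品) case studies; service industry innovation case studies; business innovation case studies; industry innovation case studies; manufacturing innovation case studies
    innovation environment case studies; innovation finance case studies; social innovation case studies Japan; business model innovation case studies
    innovation dilemma case studies; innovation sustainable growth case studies; disruptive innovation case studies
    retail innovation case studies; retail industry innovation case studies; retail industry innovation case studies
    Fujitsu innovation case studies; Fujitsu field innovation case studies
    m innovation case studies; pg innovation case studies; architectural innovation case studies; innovation office case studies; innovation team case studies; innovation case studies Microsoft; innovation case studies book; innovation creation case studies; innovation knowledge creation case studies; open data innovation case studies; digital innovation case studies; big data innovation case studies; healthcare innovation case studies; user innovation case studies; reverse innovation case studies; sales innovation case studies; R&D innovation case studies; industry-academia collaboration innovation case studies; industry-academia collaboration innovation case studies; social issues innovation case studies; in-house innovation case studies; emergent innovation case studies; Osaka Gas innovation case studies; regional innovation case studies; Hitachi innovation case studies; Fujifilm innovation case studies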
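  5. Actual accuracy (continued)

    Each line below is one cluster (cluster numbers are not preserved in this transcript).

    ecosystem innovation case studies; ecosystem innovation case studies
    large enterprise innovation case studies; large enterprise innovation case studies; SME innovation case studies
    innovation case studies iphone; innovation case studies combination; innovation new combination case studies; marketing innovation case studies; value innovation case studies
    innovation opportunities case studies

    The accuracy still needs a lot of improvement, but this saves far more time than sorting every keyword by eye from scratch.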
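The deck judges accuracy by eye. As a rough complement, cluster quality can also be given a number. The following is a minimal sketch, assuming a silhouette score is an acceptable proxy; the deck does not mention this metric, and `vectors` is a hypothetical random stand-in for the vectorized keywords.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for vectorized keywords (not data from the deck).
rng = np.random.default_rng(0)
vectors = rng.random((30, 6))

# Same heuristic as the deck's code: one cluster per five items.
n_clusters = int(len(vectors) / 5)

pred = KMeans(
    n_clusters=n_clusters,
    init="k-means++",
    n_init=10,        # set explicitly so behaviour matches across versions
    random_state=0,   # added for reproducibility
).fit_predict(vectors)

# Silhouette ranges from -1 (poor separation) to 1 (well separated).
score = silhouette_score(vectors, pred)
print(f"silhouette score: {score:.3f}")
```

A score like this only measures geometric separation of the vectors, so it would supplement the manual review described on the slide rather than replace it.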
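  6. Public experiment on the Honmachi Open Source Lab website

    Columns: keyword, rank, search volume. The rank and search-volume figures are not preserved in this transcript, so only the keywords (translated from Japanese) are listed.

    First table: programming study group beginner; programming study group beginner; IT study group beginner; IT beginner seminar; udemy python recommended; Compass (コンパス) study group; IT study group; IT study group; compass study group; study group site; engineer study group; study group IT; dots study group; study group IT; IT study group calendar; engineer study group; IT study beginner; Nagoya python; python beginner study group

    Second table: IT study group Osaka; Osaka IT study group; programming study group; Osaka IT study group; IT study group Osaka; python study group Tokyo; Osaka study group IT; Fukuoka IT study group; programming study group; ruby Kansai; Osaka IT study group; Nagoya study group; python seminar; python udemy; udemy python; IT study

    Result: top rankings for the targeted keywords, and conversions (CV) from them.
    Note: source is Ahrefs; the as-of date is not preserved in this transcript.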
  7. Public experiment on the Honmachi Open Source Lab website (same keyword tables and Ahrefs note as slide 6)

    This is not directly related to today's topic, but search keyword volume does not correlate with CVR, so in practice what matters is ranking highly for keywords that are close to a conversion (CV).
    Note: CVR = conversion rate (closing rate); CV = conversion.
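To make the CVR point concrete, here is a small worked sketch. It assumes CVR is computed as conversions divided by visits (the deck only gives the term, not the denominator), and the figures are invented purely for illustration.

```python
def conversion_rate(conversions: int, visits: int) -> float:
    """CVR = CV / visits; returns 0.0 when there are no visits."""
    return conversions / visits if visits else 0.0


# Hypothetical figures, for illustration only (not from the deck).
broad = conversion_rate(conversions=2, visits=1000)    # high-volume keyword
specific = conversion_rate(conversions=5, visits=100)  # low-volume keyword

print(f"{broad:.1%} vs {specific:.1%}")  # 0.2% vs 5.0%
```

In this made-up case the lower-volume keyword brings more conversions from a tenth of the traffic, which is the situation the slide is describing.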
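  8. What the code looks like (code excerpt)

        # -*- coding: utf-8 -*-
        import pandas as pd
        from sklearn.cluster import KMeans

        # Excerpt only: query_len (number of suggest keywords) and
        # cust_array (the vectorized keywords) are defined elsewhere.
        pred = KMeans(
            n_clusters=int(query_len/5),
            init='k-means++'
        ).fit_predict(cust_array)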
  9. What the code looks like (same code excerpt as slide 8)

    KMeans( ... ): the clustering uses KMeans.
  10. What the code looks like (same code excerpt as slide 8)

    n_clusters: the number of clusters.
  11. What the code looks like (same code excerpt as slide 8)

    int(query_len/5): the number of clusters is set to the number of suggest keywords divided by 5.
    Note: the divisor is an arbitrary choice.
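The heuristic on this slide can be sketched as a small helper. The function name `choose_n_clusters` and the lower bound of 1 are assumptions added here; the deck's code uses `int(query_len/5)` directly.

```python
def choose_n_clusters(query_len: int, keywords_per_cluster: int = 5) -> int:
    """Cluster count from the slide: query_len / 5, truncated to an int.

    The floor of 1 is an added guard (not in the original code) so that
    KMeans is never asked for zero clusters on a very short keyword list.
    """
    return max(1, int(query_len / keywords_per_cluster))


print(choose_n_clusters(100))  # 20
print(choose_n_clusters(3))    # 1
```

Because the divisor is arbitrary, exposing it as a parameter makes it easy to try coarser or finer groupings.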
  12. What the code looks like (same code excerpt as slide 8)

    init='k-means++': the initialization setting.
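`'k-means++'` is already scikit-learn's default for `init`, so the setting mainly documents intent. The initial centroids are still chosen with randomness, which means results can differ between runs. The sketch below shows one way to pin that down with `random_state`; that parameter is not in the deck's code, and `cust_array` here is a hypothetical random stand-in for the keyword vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the vectorized keywords (cust_array on the slide).
rng = np.random.default_rng(0)
cust_array = rng.random((20, 8))


def cluster(seed: int) -> np.ndarray:
    """Run KMeans with k-means++ initialization and a fixed seed."""
    return KMeans(
        n_clusters=4,
        init="k-means++",
        n_init=10,          # number of restarts, set explicitly
        random_state=seed,  # assumption: added to make runs repeatable
    ).fit_predict(cust_array)


first = cluster(seed=42)
second = cluster(seed=42)
print(np.array_equal(first, second))  # same seed should give the same labels
```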
  13. What the code looks like (same code excerpt as slide 8)

    fit_predict: returns the cluster number for each data point.
  14. What the code looks like (same code excerpt as slide 8)

    cust_array: each keyword is vectorized, and the resulting list of vectors (one per keyword) is passed to fit_predict. The numeric vectors shown on the slide are not preserved in this transcript.
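The deck does not say how the keywords are vectorized. The following is a minimal sketch of one possibility, assuming character n-gram TF-IDF, which suits Japanese keywords because they contain no spaces. The choice of `TfidfVectorizer` and its settings are assumptions, not the author's method; the keywords are taken from the slides.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# A few suggest keywords from the slides.
keywords = [
    "オープンイノベーション事例",
    "オープンイノベーション事例日本",
    "小売イノベーション事例",
    "小売業イノベーション事例",
]

# Assumption: character 1- and 2-grams, since there is no whitespace to split on.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))

# Dense 2-D array: one row per keyword, one column per distinct n-gram.
cust_array = vectorizer.fit_transform(keywords).toarray()

print(cust_array.shape)
```

Any representation that yields one fixed-length numeric row per keyword would fit the `fit_predict(cust_array)` call in the excerpt.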
  15. What the code looks like (same code excerpt as slide 8)

    pred: the cluster number for each keyword comes back as an array, in the same order as the keywords. Most example values on the slide are not preserved in this transcript; one of the keywords shown is assigned cluster 16.
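Putting the annotated pieces together, the returned labels can be joined back to the keywords for review. This is a sketch under stated assumptions, not the author's full script: the TF-IDF vectorization, `n_init`, `random_state`, and the DataFrame layout are all added here, while the `KMeans(...).fit_predict(cust_array)` call follows the excerpt.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Ten suggest keywords taken from the slides.
keywords = [
    "オープンイノベーション事例",
    "オープンイノベーション事例日本",
    "オープンデータイノベーション事例",
    "小売イノベーション事例",
    "小売業イノベーション事例",
    "製造業イノベーション事例",
    "サービス業イノベーション事例",
    "富士通イノベーション事例",
    "富士通フィールドイノベーション事例",
    "富士フイルムイノベーション事例",
]
query_len = len(keywords)

# Assumption: character n-gram TF-IDF stands in for the unspecified vectorization.
cust_array = (
    TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
    .fit_transform(keywords)
    .toarray()
)

pred = KMeans(
    n_clusters=int(query_len / 5),  # 10 keywords -> 2 clusters
    init="k-means++",
    n_init=10,        # set explicitly so behaviour matches across versions
    random_state=0,   # added for reproducibility
).fit_predict(cust_array)

# pred[i] is the cluster number of keywords[i].
clustered = pd.DataFrame({"keyword": keywords, "cluster": pred})
for cluster_id, group in clustered.groupby("cluster"):
    print(cluster_id, group["keyword"].tolist())
```

Printing one line per cluster gives the same kind of grouped view as the "Actual accuracy" slides, which makes the manual check quicker.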