motemote-data-science-2

kur0cky
August 01, 2020

This is a joke talk, but it shows how to use elasticsearch from R and introduces some of elasticsearch's handy features (lol).

Transcript

  1. Hack "popularity" with data, at TokyoR, kur0cky (Dinner edition)

  2. Self-introduction
    • Twitter: @kur0cky_y
    • Affiliation: hogehoge (business card management), … months
    • Hobbies: music, movies, drinking, typing, etc.
    • Worry: I want to be popular

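  3. Last time (Movie date edition)
    • The goal was to estimate review scores from natural conversation, in order to provide a one-to-one movie date experience.
    • To that end, a polarity dictionary specific to the movie domain was built and used for recommendation.
    https://speakerdeck.com/kur0cky/motemote-data-science-1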

  4. Was it really enough to just watch the movie and call it a day?

  5. The track record speaks for itself

  6. What went wrong?

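  7. According to a marriage-hunting site:
    "After the movie is a perfect opportunity to communicate: you can savor the afterglow of the film and learn more about your date through the shared topic of the movie. To make the most of this chance, plan a meal or tea after the movie."
    This is riddled with bias, but let's believe it for now.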

  8. A one-to-one movie alone is not enough. The conversation afterwards is what matters.

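  9. Role-playing
    "That was fun!!"
    • "Shall we just drop in somewhere?"
      This is a gamble. A novice has no idea what "somewhere decent" looks like.
    • "I booked a tasty Italian place!!"
      Too well prepared, which comes across as heavy, and your date may not even be hungry.
    • "Let's talk over something to eat! Shall we walk a little and look for a nice-looking place?"
      Depends heavily on your date's personality and on the location. Walking too far is a no-go. Risky.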

  10. What matters is an app that lets you search quickly, on the spot

  11. R and elasticsearch, at TokyoR, kur0cky

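  12. • A full-text search system
    • Features:
      • Fast. Distributed and scalable. Open source. REST API. Flexible data structures via JSON. Customizable scoring.
    Diagram labels: visualization and analysis, search, data collection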
  13. Installation and startup
    • On a Mac:
        brew tap elastic/tap
        brew install elastic/tap/elasticsearch-full
        brew install elastic/tap/kibana-full
        brew install elastic/tap/logstash-full
        brew install elastic/tap/metricbeat-full
        elasticsearch &
        kibana &
    • If http://localhost:9200 returns information, startup succeeded.
    • The Console in Kibana (http://localhost:5601, Kibana's default port) makes it easy to try things out.
    For other operating systems: https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html
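  14. Basic terminology
    Term            Meaning                                                        In RDB terms
    Index           Where the data that elasticsearch searches and analyzes lives   Database
    Document type   A group within an index                                         Table
    Document        A piece of data stored in elasticsearch                         Record
    Field           An attribute contained in a document                            Column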
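  15. Simple usage
    • Configuration, indexing (ingestion), and search are all done with JSON and the REST API.
    • Search ("デート" means "date"):
        GET index_name/_search
        { "query" : { "match" : { "comment" : "デート" } } }
    • Indexing (the values are the restaurant name "Ramen Jiro" and the genre "Bar / Dining"):
        POST index_name/_doc
        { "name" : "ラーメン二郎", "genre" : "バー・ダイニング" }
    • Configuration:
        PUT index_name/
        { "settings" : { hogehoge }, "mappings" : { fugafuga } }
      settings: describes various settings such as morphological analysis and unifying full-width and half-width characters.
      mappings: describes which fields the ingested data can have, and their types.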
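As a rough illustration of what could replace the `hogehoge` and `fugafuga` placeholders, the sketch below builds an index definition as an R list and renders it with jsonlite. The analyzer name `ja_text`, the choice of token filters, and the field types are assumptions made for this example; only the field names come from the slides, and the typeless `mappings` layout assumes Elasticsearch 7 or later with the kuromoji plugin installed.

```r
library(jsonlite)

# Hypothetical index definition. The analyzer name "ja_text" and the field
# types are assumptions; the field names are the ones that appear in the deck.
index_body <- list(
  settings = list(
    analysis = list(
      analyzer = list(
        ja_text = list(
          type = "custom",
          tokenizer = "kuromoji_tokenizer",   # Japanese morphological tokenizer
          filter = c("cjk_width",             # unify full-width / half-width forms
                     "kuromoji_baseform",     # reduce inflected words to base form
                     "kuromoji_stemmer")      # strip trailing long-vowel marks
        )
      )
    )
  ),
  mappings = list(
    properties = list(
      name      = list(type = "text", analyzer = "ja_text"),
      comment   = list(type = "text", analyzer = "ja_text"),
      genre     = list(type = "keyword"),
      score     = list(type = "float"),
      latitude  = list(type = "double"),
      longitude = list(type = "double")
    )
  )
)

# auto_unbox = TRUE keeps scalars as JSON scalars; the filter vector stays an array.
index_json <- toJSON(index_body, auto_unbox = TRUE, pretty = TRUE)
print(index_json)
```

The printed JSON is the kind of body that would accompany `PUT index_name/` in the Kibana Console.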
  16. Hitting elasticsearch from R
    • Use the elastic package.
    • Data frames can be indexed directly.
    • I want to do everything in R!!
    • Queries are written as lists and converted to JSON with jsonlite.
    • Basic operations:
        1. conn <- connect(host="127.0.0.1", port=9200)
        2. docs_bulk(conn, df, index)
        3. Search(conn, index, body = <query>)
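The "write the query as a list, then convert it with jsonlite" step can be sketched as follows, using the match query from the simple-usage slide. The `size` value is a hypothetical addition for illustration. The sketch also shows why `auto_unbox = TRUE` matters: without it, jsonlite wraps every scalar in a one-element array.

```r
library(jsonlite)

# The match query from the "simple usage" slide, written as a nested R list.
query <- list(
  query = list(match = list(comment = "デート")),
  size = 10  # hypothetical page size, not taken from the slides
)

# auto_unbox = TRUE turns length-1 vectors into JSON scalars.
unboxed <- toJSON(query, auto_unbox = TRUE)

# Without it, every scalar is wrapped in a one-element array.
boxed <- toJSON(query)

print(unboxed)
print(boxed)
```

The unboxed string is the form that would be supplied as the `<query>` in step 3.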
  17. In practice

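  18. Goal
    • A movie date is completed by the post-movie debrief.
    • The role-play shows that several requirements have to be met:
      • Deciding quickly, to fit the situation at hand, is important.
      • As a rule, do not make your date walk much.
      • Get into a restaurant of the best quality possible.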
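  19. What I did
    • Data used:
      • … restaurants around Shibuya
      • Restaurant name, genre, popular reviews, and other restaurant information
    Diagram labels: scraping, geocoding API, search, search UI, getting the current location, map via leaflet, clean table via DT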
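  20. Meeting the requirements
    Requirement                 Approach
    Japanese-language search    Preprocessing with the kuromoji plugin (tokenization, full-width/half-width unification, stemming, etc.)
    Match with the query        Use the default score (BM25)
    Not much walking            Also use a decay function based on proximity
    Getting into a good place   Also use the Tabelog score
    https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji-analyzer.html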
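For intuition about the default BM25 score, here is a minimal sketch of the textbook formula for a single query term. It is not Lucene's exact implementation, which differs in constant factors; `k1 = 1.2` and `b = 0.75` are the commonly cited defaults, and all document counts below are made-up numbers.

```r
# Textbook BM25 for a single query term (a sketch, not Lucene's exact code).
bm25_term <- function(tf, doc_len, avg_doc_len, n_docs, doc_freq,
                      k1 = 1.2, b = 0.75) {
  # Rare terms (small doc_freq) get a larger inverse document frequency.
  idf <- log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
  # Documents longer than average are penalized through b.
  norm <- 1 - b + b * doc_len / avg_doc_len
  idf * tf * (k1 + 1) / (tf + k1 * norm)
}

# A rare term scores higher than a common one, all else being equal.
rare   <- bm25_term(tf = 2, doc_len = 100, avg_doc_len = 100,
                    n_docs = 1000, doc_freq = 10)
common <- bm25_term(tf = 2, doc_len = 100, avg_doc_len = 100,
                    n_docs = 1000, doc_freq = 500)

# Term frequency saturates: the second occurrence adds more than the tenth.
gain_low  <- bm25_term(2, 100, 100, 1000, 10) - bm25_term(1, 100, 100, 1000, 10)
gain_high <- bm25_term(10, 100, 100, 1000, 10) - bm25_term(9, 100, 100, 1000, 10)

print(c(rare = rare, common = common, gain_low = gain_low, gain_high = gain_high))
```

The saturation in term frequency is one reason a review that repeats a keyword many times does not automatically dominate the ranking.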
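  21. Decay functions based on distance
    • This time, a Gaussian is applied to latitude and to longitude separately.
    • Exponential and linear decay are also available.
    https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html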
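The three decay shapes can be sketched in base R from the formulas in the function_score documentation. This assumes the documented defaults `decay = 0.5` and `offset = 0`. The origin and scale are the latitude values used in the deck's query; the sample latitudes are invented.

```r
# Distance from the origin, with an optional offset inside which no decay applies.
decay_distance <- function(value, origin, offset = 0) {
  pmax(0, abs(value - origin) - offset)
}

# Gaussian decay: the score equals `decay` at a distance of `scale`.
gauss_decay <- function(value, origin, scale, offset = 0, decay = 0.5) {
  sigma2 <- -scale^2 / (2 * log(decay))
  exp(-decay_distance(value, origin, offset)^2 / (2 * sigma2))
}

# Exponential decay.
exp_decay <- function(value, origin, scale, offset = 0, decay = 0.5) {
  lambda <- log(decay) / scale
  exp(lambda * decay_distance(value, origin, offset))
}

# Linear decay: reaches zero at scale / (1 - decay).
linear_decay <- function(value, origin, scale, offset = 0, decay = 0.5) {
  s <- scale / (1 - decay)
  pmax((s - decay_distance(value, origin, offset)) / s, 0)
}

origin_lat <- 35.6591  # latitude origin from the deck's query
scale <- 0.003         # scale from the deck's query
lat <- origin_lat + c(0, 0.003, 0.006)  # invented sample points

print(round(gauss_decay(lat, origin_lat, scale), 4))
print(round(exp_decay(lat, origin_lat, scale), 4))
print(round(linear_decay(lat, origin_lat, scale), 4))
```

With these settings every function returns 0.5 at a distance of one `scale`. A scale of 0.003 degrees of latitude is on the order of 300 m, so a restaurant a few hundred metres away already loses half of its proximity score.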

  22. Combining scores with function_score
    {
      "query": {
        "function_score": {
          "query": { an ordinary query goes here },
          "functions": [
            { "gauss": { "latitude": { "origin": 35.6591, "scale": 0.003 } } },
            { "gauss": { "longitude": { "origin": 139.7003, "scale": 0.003 } } },
            { "field_value_factor": { "field": "score", "factor": 3, "modifier": "log", "missing": 1 } }
          ],
          "score_mode": "multiply"
        }
      },
      "size": 1000,
      "_source": ["name", "score", "genre", "tel_number"]
    }
    • First gauss: decay over latitude.
    • Second gauss: decay over longitude.
    • field_value_factor: uses the value of the field itself.
    • score_mode: the product of these becomes the final score.
    Sorry the JSON is hard to read...
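To see how the three functions combine, the sketch below recomputes the score in base R for three invented restaurants. It assumes the documented defaults (`decay = 0.5`, `boost_mode = "multiply"`) and that the `log` modifier is the common logarithm of `factor * value`. The restaurant names, coordinates, and scores are hypothetical, and `query_score` is a stand-in for the BM25 score of the inner query.

```r
# Gaussian decay with the documented default decay of 0.5.
gauss_decay <- function(value, origin, scale, decay = 0.5) {
  sigma2 <- -scale^2 / (2 * log(decay))
  exp(-(value - origin)^2 / (2 * sigma2))
}

# field_value_factor with modifier "log": log10(factor * value),
# falling back to `missing` when the field is absent.
field_value_factor <- function(value, factor = 3, missing = 1) {
  value[is.na(value)] <- missing
  log10(factor * value)
}

# Hypothetical restaurants: names, coordinates and scores are invented.
shops <- data.frame(
  name = c("near_average", "far_excellent", "near_unrated"),
  latitude = c(35.6595, 35.6700, 35.6590),
  longitude = c(139.7005, 139.7100, 139.7000),
  score = c(3.2, 3.9, NA),
  query_score = c(1.0, 1.0, 1.0),  # stand-in for the BM25 score
  stringsAsFactors = FALSE
)

# score_mode "multiply": the three function values are multiplied together.
function_score <- gauss_decay(shops$latitude, 35.6591, 0.003) *
  gauss_decay(shops$longitude, 139.7003, 0.003) *
  field_value_factor(shops$score)

# boost_mode defaults to "multiply": query score times function score.
shops$final <- shops$query_score * function_score

print(shops[order(-shops$final), c("name", "final")])
```

In this toy data the distant restaurant ranks last despite its high rating, because the product of the two Gaussian terms collapses once a place is several scales away.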
  23. Demo

  24. None
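  25. Summary
    • To provide a one-to-one movie date experience, smoothly quantifying your date's preferences was not enough.
    • Under the hypothesis that "a movie date is completed by the post-movie debrief", I built an app that can quickly find a good restaurant.
    • In particular, the scoring takes into account the distance from the current location, the Tabelog score, and the degree of match with the query.
    elasticsearch is great! (The official documentation is really helpful.)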
  26. Enjoy