
motemote-data-science-2

kur0cky
August 01, 2020


This is a joke talk, but it shows how to use elasticsearch from R and introduces a few of elasticsearch's handy features. (lol)


Transcript

 1. Hack "mote" (romantic popularity) with data: Dinner Edition. At Tokyo.R, kur0cky

 2. About me
    • Twitter: @kur0cky_y
    • Affiliation: business card management at hogehoge (... months)
    • Hobbies: music, movies, drinking, typing, etc.
    • Worry: I want to be popular

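 3. Last time: Movie Date Edition
    • To provide a one-to-one movie date experience, the goal was to estimate review scores from natural conversation.
    • To that end, I built a sentiment polarity dictionary specific to the movie domain and used it for recommendation.
    https://speakerdeck.com/kur0cky/motemote-data-science-1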

 4. Was it really enough to stop at watching the movie?

 5. The track record speaks for itself

 6. What went wrong?

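 7. According to a matchmaking site:
    "After the movie is a perfect opportunity to communicate: you can savor the afterglow of the film and learn more about the woman you are with through the shared topic of the movie. To make the most of this chance, plan a meal or tea for after the movie."
    It is riddled with bias, but let's take it at face value for now.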

 8. A one-to-one movie alone is not enough. The conversation afterward is what matters.

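 9. Role-playing
    "That was fun!!"
    • "Shall we just pop in somewhere decent?"
      This is a gamble. The weak have no idea what "decent" means.
    • "I booked a great Italian place!!"
      Too well prepared, so it comes across as heavy, and they may not even be hungry.
    • "Let's talk over something to eat! Shall we walk a little and look for a nice place?"
      Depends heavily on the other person's personality and on the location. Too much walking is a no-go. Risky.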

 10. What matters is an app that lets you search quickly on the spot

 11. R and elasticsearch. At Tokyo.R, kur0cky

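 12. • Full-text search system
    • Features:
      • Fast. Distributed and scalable. Open source. REST API. Flexible data structures via JSON. Customizable scoring.
    Figure labels: visualization and analysis; search; data collection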
 13. Install and start
    • On a Mac:
        brew tap elastic/tap
        brew install elastic/tap/elasticsearch-full
        brew install elastic/tap/kibana-full
        brew install elastic/tap/logstash-full
        brew install elastic/tap/metricbeat-full
        elasticsearch &
        kibana &
    • If http://localhost:9200 returns information, startup succeeded
    • The Console in Kibana (http://localhost:5601 by default) makes it easy to try things out
    For other operating systems: https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html
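 14. Basic terms (with the RDB equivalent)
    • Index (RDB: database): where elasticsearch stores the data it searches and analyzes
    • Document type (RDB: table): a group within an index
    • Document (RDB: record): data stored in elasticsearch
    • Field (RDB: column): an attribute contained in a document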
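 15. Basic usage
    • Settings, indexing (loading data), and search are all done with JSON and the REST API
    • Search ("デート" means "date")
        GET index_name/_search
        { "query" : { "match" : { "comment" : "デート" } } }
    • Indexing (a sample restaurant document: a name and a "bar/dining" genre)
        POST index_name/_doc
        { "name" : "ラーメン二郎", "genre" : "バー・ダイニング" }
    • Settings
        PUT index_name/
        { "settings" : { hogehoge }, "mappings" : { fugafuga } }
      settings: various options such as morphological analysis and full-width/half-width normalization
      mappings: which fields the loaded data can have, and their types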
 16. Calling elasticsearch from R
    • Use the elastic package.
    • Data frames can be indexed directly.
    • I want to do everything in R!!
    • Write queries as lists and convert them to JSON with jsonlite
    • Basic operations
      1. conn <- connect(host="127.0.0.1", port=9200)
      2. docs_bulk(conn, df, index)
      3. Search(conn, index, body = <query>)
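The "write the query as a list, convert it with jsonlite" step can be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the `comment` field and the search term come from the basic usage slide, while the `size` value and the choice of `auto_unbox = TRUE` are assumptions made here.

```r
library(jsonlite)

# Query written as a nested R list (mirrors the match query from the usage slide).
query <- list(
  query = list(
    match = list(comment = "デート")
  ),
  size = 10  # hypothetical page size, not from the slides
)

# auto_unbox = TRUE turns length-one vectors into JSON scalars ("size": 10)
# instead of one-element arrays ("size": [10]).
body_json <- toJSON(query, auto_unbox = TRUE, pretty = TRUE)
cat(body_json, "\n")
```

The resulting string is the kind of value that would be passed as `body` in step 3 above. Without `auto_unbox = TRUE`, jsonlite wraps every scalar in an array, which is worth checking for if a query is unexpectedly rejected.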
 17. In practice

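 18. Goal
    • A movie date is completed by the post-movie debrief
    • The role-play shows that several requirements must be met
    • It is important to decide quickly, based on the situation at hand
    • As a rule, don't make them walk much
    • Get into as good a restaurant as possible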
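 19. What I did
    • Data used
      • ... restaurants in the Shibuya area
      • Restaurant name, genre, popular reviews, and other restaurant information
    Figure labels: scraping; geocoding API; search; search UI (map with leaflet, clean tables with DT, current location lookup)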
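 20. Meeting the requirements
    • Japanese-language search: preprocessing with the kuromoji plugin (tokenization, full-width/half-width normalization, stemming, etc.)
    • Match with the query: use the default score (BM25)
    • Don't make them walk much: also use a decay function based on proximity
    • Get into a good restaurant: also use the Tabelog score
    https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji-analyzer.html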
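As a rough sketch of what the kuromoji preprocessing could look like as index settings, the list below builds a settings/mappings body in R. Everything specific here is an assumption rather than the speaker's configuration: the analyzer name `ja_analyzer` is hypothetical, the filter selection is one plausible choice among the filters shipped with the analysis-kuromoji plugin and core Elasticsearch, and the field names are taken from the query slides. It also assumes typeless mappings as in Elasticsearch 7 and later.

```r
library(jsonlite)

index_body <- list(
  settings = list(
    analysis = list(
      analyzer = list(
        ja_analyzer = list(                 # hypothetical analyzer name
          type = "custom",
          tokenizer = "kuromoji_tokenizer", # tokenization (needs the kuromoji plugin)
          filter = c(
            "kuromoji_baseform",            # reduce inflected words to base form
            "cjk_width",                    # unify full-width and half-width characters
            "kuromoji_stemmer",             # strip trailing long-vowel marks in katakana
            "lowercase"
          )
        )
      )
    )
  ),
  mappings = list(
    properties = list(
      name       = list(type = "text", analyzer = "ja_analyzer"),
      genre      = list(type = "text", analyzer = "ja_analyzer"),
      comment    = list(type = "text", analyzer = "ja_analyzer"),
      score      = list(type = "double"),   # restaurant rating used for scoring
      latitude   = list(type = "double"),
      longitude  = list(type = "double"),
      tel_number = list(type = "keyword")
    )
  )
)

index_json <- toJSON(index_body, auto_unbox = TRUE, pretty = TRUE)
cat(index_json, "\n")
```

A body like this corresponds to the `PUT index_name/` settings request from the basic usage slide.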
 21. Distance-based decay functions
    • Here, a Gaussian is applied to latitude and longitude separately
    • Exponential and linear decay are also available
    https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
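To make the three decay shapes concrete, here is a small base R sketch of the formulas as described in the Elasticsearch function_score documentation. It is an illustration for intuition, not code from the talk: the function name `decay_score` is hypothetical, and `decay = 0.5` and `offset = 0` are the documented defaults.

```r
# Decay functions following the formulas in the Elasticsearch function_score docs.
# `scale` is the distance from `origin` at which the score falls to `decay`.
decay_score <- function(value, origin, scale, offset = 0, decay = 0.5,
                        type = c("gauss", "exp", "linear")) {
  type <- match.arg(type)
  # Distance from the origin; anything within `offset` counts as distance 0.
  d <- pmax(0, abs(value - origin) - offset)
  switch(type,
    gauss  = exp(-d^2 / (2 * (-scale^2 / (2 * log(decay))))),
    exp    = exp(log(decay) / scale * d),
    linear = pmax((scale / (1 - decay) - d) / (scale / (1 - decay)), 0)
  )
}

# Latitudes at 0, 1, and 2 scale lengths north of the origin used in the talk.
lat <- c(35.6591, 35.6621, 35.6651)
print(decay_score(lat, origin = 35.6591, scale = 0.003, type = "gauss"))
print(decay_score(lat, origin = 35.6591, scale = 0.003, type = "exp"))
print(decay_score(lat, origin = 35.6591, scale = 0.003, type = "linear"))
```

All three return 1 at the origin and 0.5 at one `scale` away; they differ in how quickly the score drops beyond that. A scale of 0.003 degrees of latitude is roughly 330 m.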

 22. Combining scores with function_score
    {
      "query": {
        "function_score": {
          "query": { (the ordinary query goes here) },
          "functions": [
            { "gauss": { "latitude": { "origin": 35.6591, "scale": 0.003 } } },
            { "gauss": { "longitude": { "origin": 139.7003, "scale": 0.003 } } },
            { "field_value_factor": { "field": "score", "factor": 3, "modifier": "log", "missing": 1 } }
          ],
          "score_mode": "multiply"
        }
      },
      "size": 1000,
      "_source": ["name", "score", "genre", "tel_number"]
    }
    • First gauss: decay on latitude
    • Second gauss: decay on longitude
    • field_value_factor: uses the field value itself
    • score_mode multiply: the product of these becomes the final score
    Sorry the JSON is hard to read...
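The arithmetic behind this query can be sketched in base R. This is an approximation for intuition, not Elasticsearch's implementation and not the speaker's code: it assumes the default `boost_mode` (multiply), so the query's relevance score is multiplied by the product of the three functions, and it takes the `log` modifier to be the common logarithm of `factor * field value`. The function names, the sample restaurants, and their query scores are made up, and the `missing` fallback is left out.

```r
# Gaussian decay with the default decay of 0.5 at one `scale` from the origin.
gauss <- function(value, origin, scale, decay = 0.5) {
  sigma2 <- -scale^2 / (2 * log(decay))
  exp(-(value - origin)^2 / (2 * sigma2))
}

# query_score stands in for the BM25 score of the ordinary query.
combined_score <- function(query_score, lat, lon, rating,
                           origin = c(lat = 35.6591, lon = 139.7003),
                           scale = 0.003, factor = 3) {
  fn <- gauss(lat, origin[["lat"]], scale) *   # decay on latitude
        gauss(lon, origin[["lon"]], scale) *   # decay on longitude
        log10(factor * rating)                 # field_value_factor, modifier "log"
  query_score * fn                             # boost_mode "multiply" (assumed default)
}

# Hypothetical restaurants: A is at the origin, B and C are one scale length away.
shops <- data.frame(
  name        = c("A", "B", "C"),
  query_score = c(2.0, 2.0, 2.0),
  latitude    = c(35.6591, 35.6621, 35.6591),
  longitude   = c(139.7003, 139.7003, 139.7033),
  score       = c(3.2, 3.8, 3.5)
)
shops$final <- with(shops, combined_score(query_score, latitude, longitude, score))
print(shops[order(-shops$final), c("name", "final")])
```

With these numbers, being one scale length away halves the score, which outweighs the modest gain from a higher rating, so the nearest restaurant ranks first. That matches the stated priority of not making the other person walk.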
 23. Demo

 24. None
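 25. Summary
    • Providing a one-to-one movie date experience took more than smoothly quantifying the other person's preferences.
    • Under the hypothesis that "a movie date is completed by the post-movie debrief," I built an app for quickly finding good restaurants.
    • In particular, the scoring takes into account distance from the current location, the Tabelog score, and how well the restaurant matches the query.
    elasticsearch is great! (The official documentation is really helpful.)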
 26. Enjoy