Save 37% off PRO during our Black Friday Sale! »

Hivemallを使ってSQLで機械学習

7730865ba5b8e2f7703fc9151184d1b6?s=47 babahr
January 16, 2020

 Hivemallを使ってSQLで機械学習

Data Gateway Talk vol.5のLT登壇資料です.
(イベントURL https://data-gateway-talk.connpass.com/event/155457/)

7730865ba5b8e2f7703fc9151184d1b6?s=128

babahr

January 16, 2020
Tweet

Transcript

  1. )JWFNBMMΛ࢖ͬͯ
 42-Ͱػցֶश ϒϨΠϯύουഅ৔͸Δ͔ ೥݄೔
 %BUB(BUFXBZ5BMLWPM
 ෼-5 1% |❚ | 01/22

  2. ࠓ೔ͷ಺༰ ͓఻͍͑ͨ͜͠ͱ w )JWFNBMMͱ͸ԿͰɺͲ͜ͰͲͷΑ͏ʹ࢖͏ͷ͔ w )JWFNBMMΛ࢖͏ͱԿ͕خ͍͠ͷ͔ ࿩͞ͳ͍͜ͱ w )JWFNBMMͰ࣮૷͞Ε͍ͯΔΞϧΰϦζϜͷৄࡉ w

    ࣮૷͢Δͱ͖ͷৄࡉͳ࢖͍ํ 9% |❚❚ | 02/22
  3. ࠓ೔ͷ಺༰ ͓఻͍͑ͨ͜͠ͱ w )JWFNBMM͸Ͳ͜ͰͲͷΑ͏ʹ࢖͏ͷ͔ w )JWFNMMΛ࢖͏ͱԿ͕خ͍͠ͷ͔ ࿩͞ͳ͍͜ͱ w )JWFNBMMͰ࣮૷͞Ε͍ͯΔΞϧΰϦζϜͷৄࡉ w

    )JWFNBMM EJHEBHͷৄࡉͳ࢖͍ํ ɹ)JWFNBMM ΍EJHEBH ͷ࢖͍ํʹؔͯ͠͸ ɹɾެࣜυΩϡϝϯτ ɹɾެࣜHJUIVCͷ࣮૷ྫ ɹɾ։ൃऀͷ༉Ҫ͞Μͷॻ͔Ε͍ͯΔRJJUB ɹɹˢݸਓతʹ͔ͳΓΦεεϝ ɹ౳Λ࢝Ίɺ༗༻ͳهࣄ͕͋Γ·͢ͷͰͦͪΒޚཡ͍ͩ͘͞ 14% |❚❚❚ | 03/22
  4. ࣗݾ঺հ ˗ ໊લɿഅ৔͸Δ͔ ˗ ग़਎ɿ૯߹ݚڀେֶӃେֶఱจՊֶઐ߈
 ଠཅܥ֎࿭੕ ׊৭ᛙ੕ ࿭੕࣭ྔఱମ ˗ σʔλαΠΤϯςΟετΛ͍ͯ͠Δ


    ΫϥΠΞϯτاۀͷΦ΢ϯυϝσ ΟΞͷձһσʔλ΍Ξϯέʔτ σʔλͷ෼ੳͳͲ 18% |❚❚❚❚ | 04/22 גࣜձࣾϒϨΠϯύουʮ "*ʢϓϥεΤʔΞΠʣʯ΢ΣϒαΠτΑΓ
  5. એ఻͍ͤͯͩ͘͞͞ 23% |❚❚❚❚❚ | 05/22

  6.  ϒϨΠϯύου നۚ୆ΦϑΟε ͜͜ 27% |❚❚❚❚❚❚ | 06/22

  7.  ϒϨΠϯύου നۚ୆ΦϑΟε ͜͜ 27% |❚❚❚❚❚❚ | 06/22 ςʔϚ ෺ཧֶʷػցֶश

    ಺༰ ɾػցֶशΛ࢖ͬͨ෺ཧֶ෼໺ͷݚڀ঺հ ɾ෺ཧग़਎σʔλαΠΤϯςΟετͷ͓࢓ࣄͷ࿩ ɾ࿦จ঺հ࿮ ର৅ऀ ɾ෺ཧ͕޷͖ͳσʔλΞφϦετ σʔλαΠΤϯςΟετ .-ΤϯδχΞ ɾ෺ཧ෼໺ͷݚڀऀ ɾ෺ཧֶઐ߈ͷֶੜ
  8. )JWFNBMM 32% |❚❚❚❚❚❚❚ | 07/22

  9. )JWF )JWF2- ˗ )BEPPQɿϏοάσʔλΛѻ͏ͨΊͷϛυϧ΢ΣΞͰ
 ɹɹɹɹฒྻ෼ࢄॲཧ͕Ͱ͖Δ ˗ )JWFɹɿ)BEPPQ্Ͱ%#.4ͷػೳΛఏڙ ˗ )JWF2-ɿ)JWFͰ࢖ΘΕΔ42-νοΫͳσʔλૢ࡞ݴޠ )BEPPQͷ

    ϩΰ )JWFͷϩΰ 36% |❚❚❚❚❚❚❚❚ | 08/22
  10. )JWFNBMM ˗ ػցֶशϥΠϒϥϦ 5SFBTVSF%BUB͕ࣾ։ൃ  ˗ )JWF2-ΫΤϦΛ࢖ֶͬͯश͔Β༧ଌ·ͰػցֶशͷҰ࿈ ͷॲཧΛ࣮ߦ͢Δ͜ͱ͕Մೳ ˗ ϥΠϒϥϦͷத਎͸KBWBͰهड़

    41% |❚❚❚❚❚❚❚❚❚ | 09/22
  11. 42- 45% |❚❚❚❚❚❚❚❚❚❚ | 10/22

  12. SELECT item_id ,order_id FROM order LIMIT 3 ; 50% |❚❚❚❚❚❚❚❚❚❚❚

    | 11/22
  13. +----------+----------+ | item_id | order_id | +----------+----------+ | 1 |

    1 | | 2 | 1 | | 3 | 2 | +----------+----------+ 55% |❚❚❚❚❚❚❚❚❚❚❚❚ | 12/22
  14. )JWFNBMMͰ9(#PPTU 59% |❚❚❚❚❚❚❚❚❚❚❚❚❚ | 13/22 /&8೥݄ʙ

  15. SELECT train_xgboost(features,target) FROM train_table ; ֶश 60% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚ | 14/22

  16. +----------+----------------------+ | model_id | model | +----------+----------------------+ | 1 |L?jrjk,O5v%zkfspel……

    | +----------+----------------------+ 68% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚ | 15/22
  17. SELECT xgboost_predict
 (row_id,features,model_id,model) FROM xgb_model ; ༧ଌ 73% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚ |

    16/22
  18. σʔλϚʔτ xgb_input ֶश༻ΫΤϦ SELECT train_xgboost(features, label) as (model_id, model) FROM

    (SELECT features ,label FROM xgb_input cluster by rand(43) -- shuffle ) shuffled; xgb_model Ϟσϧ model_id Model 0 L?jrjk,O5v%z|kfspel…… 1 L?jrjk>PS;fkjsa^8~6|…… 2 lsa;p0”>02qJOel~2=*…… ộ ộ xgb_pred ༧ଌ݁Ռ ༧ଌ༻ΫΤϦ SELECT rowid, majority_vote(cast(predicted as int)) as label FROM ( SELECT xgboost_predict_one(rowid, dense_features, model_id, model) as (rowid, predicted) FROM xgb_model l LEFT OUTER JOIN xgb_input r ) t GROUP BY rowid; row_id Label 0 0 1 2 2 1 ộ ộ Ϟσϧ΍༧ଌ݁Ռ͸ςʔϒϧͰ؅ཧ 77% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚ | 17/22
  19. )JWFNBMM͕༗ޮͳ৔໘ 82% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚ | 18/22

  20. ྫ σδλϧϚʔέςΟϯάྖҬʹ͓͚Δ
 ػցֶशγεςϜ૊ΈࠐΈ ֤ج൫γεςϜ σʔλϕʔε &$ ސ٬৘ใ 8FC ෼ੳ༻؀ڥ Ճ޻ɾసૹ

    લॲཧɹˠɹֶशɹˠɹਪ࿦ Ճ޻ɾసૹ &$ ׆༻ઌ ؀ڥΛ༻ҙ͢ΔϦιʔε Ճ޻ɾసૹͷ࣮૷ ӡ༻ͷίετ 91% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚ | 20/22
  21. ֤ج൫γεςϜ σʔλϕʔε &$ ސ٬৘ใ 8FC લॲཧɹˠɹֶशɹˠɹਪ࿦ &$ ׆༻ઌ ✓σʔλϕʔε಺෦ͰػցֶशͷҰ࿈ͷϑϩʔ͕׬݁ ✓σʔλ࿈ܞͷखؒΛলུՄೳ

    ӡ༻໨ઢͰ͸େ͖ͳϝϦοτ 95% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚ | 21/22 )JWF͕࢖͑Δ ৔߹ ෼ੳ༻؀ڥ͕ෆཁ
  22. ·ͱΊ w )JWFNBMMͱ͍͏)JWF2-ϕʔεͰॻ͚ΔػցֶशϥΠϒϥϦ͕͋Γɺ σʔλసૹϓϩηε͕ෆཁͳ఺ɺࠓ͋Δ؀ڥͷ··࣮૷Ͱ͖Δ఺͕େ ͖ͳϝϦοτ w )JWF͕࢖͑Δ؀ڥͰػցֶशγεςϜΛ؆୯ʹࢼ͍ͨ͠৔߹ɺ )JWFNBMM͕Φεεϝ w ৮Ε·ͤΜͰ͕ͨ͠

    ࢖͑ΔΞϧΰϦζϜ΋ൺֱతଟذʹΘͨΔ )JWF͕࢖͑Δ؀ڥPS5SFBTVSF%BUBಋೖاۀͰ
 ͓࢓ࣄ͞Ε͍ͯΔ%4ͷํ͸ͥͻ͓ࢼ͠Λʂ 100% |❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚❚| 22/22