Upgrade to Pro — share decks privately, control downloads, hide ads and more …

SQLを活用したデータ分析におけるChatGPTの活用法

Sponsored · Ship Features Fearlessly Turn features on and off without deploys. Used by thousands of Ruby developers.

 SQLを活用したデータ分析におけるChatGPTの活用法

Avatar for hikarut

hikarut

May 20, 2023
Tweet

More Decks by hikarut

Other Decks in Technology

Transcript

  1. © )JLBSV5BLBIBTIJ  MJHIU ߴڮޫ )JLBSV5BLBIBTIJ ࣗݾ঺հ 1SPpMF ෳۀͰϓϩόεέοτϘʔϧϦʔάͷΫϥϒνʔϜͰϚʔέςΟϯάετϥςδετͱͯ͠ σʔλ෼ੳͷαϙʔτΛߦͬͨΓɺͦͷଞෳ਺ࣾͰ42-Λ׆༻ͨ͠σʔλ෼ੳͷαϙʔτܦ

    ݧ͋Γɻʰσʔλ෼ੳྗΛߴΊΔϏδωεύʔιϯͷͨΊͷ42-ೖ໳ʱͷஶॻɻ ୈճҙࢥܾఆͷͨΊͷσʔλ෼ੳษڧձʹͯʰҙࢥܾఆʹܨ͛ΔͨΊͷҼՌਪ࿦ʢ%*%ʣ ͱ1ZUIPOʹΑΔ$BVTBM*NQBDUʱͰൃදɻ ຊۀ ෳۀɾݸਓ׆ಈ ೥ʹ৽ଔͰϠϑʔגࣜձࣾʹೖࣾ͠ɺ8FCΤϯδχΞͱͯ͠αʔϏεͷ։ൃɺӡ༻ɺاըɺ ఏҊͳͲΛ୲౰ɻ೥͔Β͸גࣜձࣾΠʔϒοΫΠχγΞςΟϒδϟύϯʹग़޲͠ɺࣾ಺Ͱ σʔλαΠΤϯεάϧʔϓΛ্ཱͪ͛άϧʔϓϚωδϟʔ ෦௕ ͱͯࣾ͠಺ͷσʔλ׆༻Λଅਐɻ ೥͔Β֎ࢿܥίϯαϧςΟϯάاۀʹೖࣾ͠σʔλΛ׆༻ͨ͠ϚʔέςΟϯάࢧԉΛ࣮ࢪɻ
  2. © )JLBSV5BLBIBTIJ  σʔλ෼ੳͰҙࢥܾఆʹͭͳ͛ΔͨΊͷͭͷTUFQ σʔλ෼ੳʹ͓͚ΔͭͷTUFQ     

      TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ ՝୊ Ծઆ ૝ఆ ෼ੳ ݁Ռ ߟ࡯ ߦಈ w ͳͥ෼ੳΛ͢Δ ͷ͔ʢ8IZʣ w ͦ΋ͦ΋ͷ՝୊ Λઃఆ͢Δ w ෼ੳΛ͢Δ͜ͱ Ͱ໌Β͔ʹͨ͠ ͍͜ͱ͸Կ͔ɺ ͦͷͨΊʹͲΜ ͳԾઆ͕͋Δͷ ͔Λߟ͑Δ w ෼ੳͱ͸ԾઆΛ ݕূ͢Δ͜ͱ w ෼ੳ݁Ռ͕෼ ͔ͬͨΒͲΜͳ ҙࢥܾఆ͕Ͱ͖ Δ͔ʁ w ࣄલʹ૝ఆΞΫ γϣϯΛߟ͑ͯ ͓͘͜ͱͰ෼ੳ ͔Βҙࢥܾఆ͠ ΍͘͢͢Δ w ͲͷΑ͏ʹ෼ੳ Λߦ͏͔ʁ w ۩ମతͳ෼ੳख ஈɻ42-Λ࢖ͬ ͨ෼ੳ΍ 1ZUIPOΛ࢖ͬ ͨ෼ੳͳͲ w ࡍͷ෼ੳ݁Ռ͕ Ͳ͏ͩͬͨͷ ͔ʁ w ਺஋ͱͯ͠ͷ ϑΝΫτ৘ใΛ ·ͱΊΔ w ෼ੳ݁Ռ͔ΒͲ ͷΑ͏ͳղऍ͕ Ͱ͖Δͷ͔ɺͲ ΜͳఏҊ͕Ͱ͖ ͦ͏͔ʁ w ϑΝΫτ৘ใΛ ͲͷΑ͏ʹղऍ ͯ͠ɺͲΜͳࣔ ࠦΛग़͔͢ w ෼ੳ݁ՌΛड͚ ͯ࣍ͷΞΫγϣ ϯ͸Ͳ͏͢Δͷ ͔ʁ w ͲΜͳҙࢥܾఆ Λͨ͠ͷ͔ʢ͢ Δͷ͔ʣ
  3. © )JLBSV5BLBIBTIJ  σʔλ෼ੳʹ͓͚ΔͭͷTUFQɹ۩ମྫ σʔλ෼ੳʹ͓͚ΔͭͷTUFQ     

      TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ ՝୊ Ծઆ ૝ఆ ෼ੳ ݁Ռ ߟ࡯ ߦಈ w ೥͔Β&$ αΠτͷച্͕ Լ͕͍ͬͯΔɻ &$αΠτͷച্ Λ্͛ΔͨΊʹ Ͳ͏͢Ε͹ྑ͍ ͔ w ঎඼"Λߪೖ͠ ͍ͯΔϢʔβʔ ͸঎඼#΋Ұॹ ʹߪೖ͍ͯ͠Δ ͷͰ͸ͳ͍͔ w ঎඼"Λങͬͯ Δਓʹ঎඼#Λ ϝʔϧͰ͓קΊ ͢Δ w %8)͔Β42- Λ࢖ͬͯόε έοτ෼ੳΛ࣮ ࢪ w ঎඼"Λߪೖ͠ ͍ͯΔਓ͕ ਓ w ͦͷ͏ͪ঎඼# Λߪೖ͍ͯ͠Δ ਓ͕ਓ w ͦͷ͏ͪ঎඼$ Λߪೖ͍ͯ͠Δ ਓ͕ਓ w ঎඼"͸঎඼# ΑΓ΋঎඼$ͷ ํ͕ҰॹʹങΘ Ε͓ͯΓɺ঎඼ "ͱ঎඼$ͷ૬ ੑ͕Α͍ w ঎඼"Λߪೖ͠ ͯ঎඼$Λߪೖ ͍ͯ͠ͳ͍ਓʹ ঎඼$ͷׂҾ ΫʔϙϯΛϝʔ ϧͰ഑෍͢Δ͜ ͱΛܾఆ
  4. © )JLBSV5BLBIBTIJ  $IBU(15ʹΑͬͯஔ͖׵ΘΔ΋ͷɾஔ͖׵ΘΒͳ͍΋ͷ σʔλ෼ੳʹ͓͚ΔͭͷTUFQ     

      TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ ՝୊ Ծઆ ૝ఆ ෼ੳ ݁Ռ ߟ࡯ ߦಈ º ˓ ˚ ˕ ˕ ˚ º ՝୊ͷઃఆ͸ਓ͕ߦ͏͕ɺ՝୊͕ܾ· Ε͹ԾઆΛग़͢͜ͱ΋$IBU(15Ͱ୅ ସՄೳ σʔλͷղऍ͸Ұൠ࿦ ͱͯ͠͸$IBU(15ͰҰ ఆͷߟ࡯ม͑ΒΕΔ͕ ҙࢥܾఆ͸ਓ͕ؒߦ͏ 1ZUIPO΍42-Λ࢖ͬͨ ۩ମతͳ෼ੳ͸ $IBU(15ʹΑͬͯ୅ସ ͞Ε΍͍͢
  5. © )JLBSV5BLBIBTIJ  42-Λ࢖ͬͨσʔλ෼ੳ σʔλ෼ੳʹ͓͚ΔͭͷTUFQʮ෼ੳɾ݁Ռʯ େاۀͳͲͰΑ͋͘ΔҰൠతͳσʔλ෼ੳ؀ڥ 42- 42- ෼ੳ݁Ռ σʔλ

    42-Λ࢖͏͜ͱͰͦͷ࣌ʑʹԠͯ͡ඞཁͳσʔλ Λૉૣ͘ूܭ͢Δ͜ͱ͕Ͱ͖Δ w 42-͕࢖͑Δͱ൚༻తʹσʔλ෼ੳʹ࢖͑Δ w ͔͠͠42-Λ࢖ͬͨ෼ੳ͸$IBU(15ʹஔ͖׵Θͬͯ͠·͏ʂʁ جװγεςϜ %8)ʢσʔλ΢ΣΞϋ΢εʣ #*πʔϧ
  6. © )JLBSV5BLBIBTIJ  $IBU(15ͷϓϩϯϓτ $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ࣍ͷ৚݅ʹै͍ɺ೥ͷ೥ؒͷϢʔβʔ͝ͱͷߪ ೖ݅਺ʹԠͯ͡).- ϔϏʔɺϛυϧɺϥΠτ ʹ෼ ྨ͠ɺ).-͝ͱͷϢʔβʔ਺Λूܭ͍ͯͩ͘͠͞ɻ

    ·ͨɺ͜ͷͱ͖Ϣʔβʔ৘ใ͕ۭͷσʔλͱΩϟϯη ϧͷ͋Δߪೖσʔλ͸আ֎͍ͯͩ͘͠͞ɻ ϔϏʔ೥ؒͷߪೖ݅਺͕݅Ҏ্ ϛυϧ೥ؒͷߪೖ݅਺͕݅Ҏ্ɺ݅ະຬ ϥΠτ೥ؒͷߪೖ݅਺͕݅ະຬ ˞ʰσʔλ෼ੳྗΛߴΊΔϏδωεύʔιϯͷͨΊͷ42-ೖ໳ʱୈষԋश໰୊ΑΓ ࣄલʹςʔϒϧͷεΩʔϚΛΠϯϓοτ ໰୊ʢϓϩϯϓτʣ &$αΠτͰ).-෼ੳΛߦ͏ྫ
  7. © )JLBSV5BLBIBTIJ  ਓ͕ॻ͍ͨ42-ͱ$IBU(15͕ॻ͍ͨ42-ͷൺֱ $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ᶃਓؒ ᶄ$IBU(15 ᶅ42-$3&"503 WITH hml_users

    AS( SELECT user_id, CASE WHEN COUNT(DISTINCT order_id) >= 500 THEN 'ϔϏʔ' WHEN COUNT(DISTINCT order_id) >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS hml FROM orders WHERE user_id IS NOT NULL AND order_date BETWEEN '2022-01-01' AND '2022-12-31' AND is_canceled = 0 GROUP BY user_id ) SELECT hml, COUNT(DISTINCT user_id) AS uu FROM hml_users GROUP BY hml SELECT CASE WHEN order_counts >= 500 THEN 'ϔϏʔ' WHEN order_counts >= 300 AND order_counts < 500 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS hml, COUNT(DISTINCT orders.user_id) AS user_count FROM orders JOIN ( SELECT user_id, COUNT(*) AS order_counts FROM orders JOIN products ON orders.order_product_id = products.product_id WHERE orders.is_canceled = 0 AND orders.order_date >= '2022-01-01' AND orders.order_date < '2023-01-01' GROUP BY user_id ) AS user_orders ON orders.user_id = user_orders.user_id WHERE orders.is_canceled = 0 AND orders.order_date >= '2022-01-01' AND orders.order_date < '2023-01-01' GROUP BY hml; SELECT CASE WHEN purchases_per_user.user_purchases >= 500 THEN 'ϔϏʔ' WHEN purchases_per_user.user_purchases >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS HML, COUNT(DISTINCT orders.user_id) AS user_count FROM orders INNER JOIN ( SELECT user_id, COUNT(*) AS user_purchases FROM orders WHERE orders.is_canceled = 0 AND orders.order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY user_id ) AS purchases_per_user ON orders.user_id = purchases_per_user.user_id INNER JOIN users ON orders.user_id = users.user_id AND users.is_deleted = 0 GROUP BY HML;
  8. © )JLBSV5BLBIBTIJ  ਓ͕ॻ͍ͨ42-ͱ$IBU(15͕ॻ͍ͨ42-ͷൺֱ $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ᶃਓؒ ᶄ$IBU(15 ᶅ42-$3&"503 ˓ ˕

    ˚ w ݁Ռ͸ਖ਼ղ͕ͩগ͠42- ͕৑௕ w ༨෼ͳ৚͕݅ೖͬͯඍົ ʹҧ͏݁Ռ w 42-͕গ͠৑௕ w XJUI۟Λ࢖ͬͯՄಡੑͷ ߴ͍42-Λҙࣝ
  9. © )JLBSV5BLBIBTIJ  $IBU(15Ͱ42-Λग़ྗ͢Δ࣌ͷϙΠϯτ ϓϩϯϓτΛมߋͯ͠42-Λग़ྗ ࣍ͷ৚݅ʹै͍ɺ೥ͷ೥ؒͷϢʔβʔ͝ͱͷߪೖ݅਺ʹԠͯ͡).- ϔϏʔɺϛυϧɺϥΠτ ʹ ෼ྨ͠ɺ).-͝ͱͷϢʔβʔ਺Λूܭ͍ͯͩ͘͠͞ɻ·ͨɺ͜ͷͱ͖Ϣʔβʔ৘ใ͕ۭͷσʔλͱΩϟ ϯηϧͷ͋Δߪೖσʔλ͸আ֎͍ͯͩ͘͠͞ɻσʔλϕʔε͸42-JUFͰ͢ɻ·ͨɺ42-͸XJUI۟Λ࢖ͬ

    ͯՄಡੑͷߴ͍42-ʹ͍ͯͩ͘͠͞ɻ ϔϏʔ೥ؒͷߪೖ݅਺͕݅Ҏ্ ϛυϧ೥ؒͷߪೖ݅਺͕݅Ҏ্ɺ݅ະຬ ϥΠτ೥ؒͷߪೖ݅਺͕݅ະຬ ໰୊ʢϓϩϯϓτʣ ✔࢖͍ͬͯΔσʔλϕʔεΛࢦఆ͢Δʢ42-JUFɺ.Z42-ɺ1PTUHSF42-ͳͲʣ ✔ʮXJUI۟Λ࢖ͬͯʯͳͲՄಡੑΛߴΊΔΑ͏ʹ໌ࣔతʹࢦఆ͢Δ $IBU(15Ͱ42-Λग़ྗ͢Δ࣌ͷϙΠϯτ
  10. © )JLBSV5BLBIBTIJ  ଟগ͸վળ͞ΕΔ͕·ͩ৑௕ͳ42- $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ᶄ$IBU(15 ᶅ42-$3&"503 WITH -- 2022೥ʹ஫จ͞Εͨߪೖσʔλͷநग़

    order_2022 AS ( SELECT * FROM orders WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31' AND is_canceled = 0 ), -- Ϣʔβʔ͝ͱͷ೥ؒߪೖ݅਺ͷܭࢉ user_purchase_count AS ( SELECT user_id, COUNT(DISTINCT order_id) AS purchase_count FROM order_2022 GROUP BY user_id ), -- ϢʔβʔΛHML͝ͱʹ෼ྨ͢Δ user_hml AS ( SELECT user_id, CASE WHEN purchase_count >= 500 THEN 'ϔϏʔ' WHEN purchase_count >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS hml FROM user_purchase_count ), -- HML͝ͱͷϢʔβʔ਺Λूܭ͢Δ hml_user_count AS ( SELECT hml, COUNT(DISTINCT user_id) AS user_count FROM user_hml GROUP BY hml ) -- ࠷ऴతͳग़ྗ݁Ռ SELECT hml, user_count FROM hml_user_count ORDER BY hml DESC WITH purchase_counts AS ( SELECT user_id, COUNT(*) AS cnt FROM orders WHERE is_canceled = 0 AND order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY user_id ) SELECT CASE WHEN cnt >= 500 THEN 'ϔϏʔ' WHEN cnt >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS HML, COUNT(*) AS user_count FROM purchase_counts JOIN users ON purchase_counts.user_id = users.user_id WHERE users.is_deleted = 0 GROUP BY HML;
  11. © )JLBSV5BLBIBTIJ  (PPHMFͷੜ੒"*ʰ#BSEʱ 0OFNPSFUIJOH IUUQTCBSEHPPHMFDPN #BSE w (PPHMF͕։ൃͨ͠ੜ੒త"*Ͱɺ࠷৽ͷ"* ج൫Ϟσϧʮ1B-.ʯΛར༻

    w ೥݄೔࣌఺Ͱ΢ΣΠτϦετ͕ ͳ͘ͳΓΧࠃҎ্ͰӳޠͰͷར༻͕ ՄೳʹͳΓɺ೔ຊޠʹ΋ରԠ w ೥݄೔࣌఺Ͱ͸ʮࢼݧӡ༻൛ʯ
  12. © )JLBSV5BLBIBTIJ  (PPHMFͷੜ੒"*ʰ#BSEʱ 0OFNPSFUIJOH #BSE SELECT COUNT(DISTINCT user_id) AS

    user_count, CASE WHEN COUNT(order_id) >= 500 THEN 'ϔϏʔ' WHEN COUNT(order_id) >= 300 AND COUNT(order_id) < 500 THEN 'ϛυϧ' WHEN COUNT(order_id) < 300 THEN 'ϥΠτ' END AS category FROM orders WHERE user_id IS NOT NULL AND is_canceled = FALSE AND order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY category; w ΤϥʔͰ࣮ߦ݁Ռ͕ฦͬͯ͜ͳ͍ w ͨͩ͠ɺ#BSE͸ෳ਺ͷճ౴͕ฦͬ ͯ͘ΔͷͰɺҰ෦ਖ਼ղ΋ฦͬͯ͘Δ º
  13. © )JLBSV5BLBIBTIJ  ·ͱΊ 4VNNBSZ σʔλ෼ੳʹ͸̓ͭͷTUFQ͕͋Γ$IBU(15ͳͲͷ"*ʹΑͬͯ୅ ସ͞ΕΔ෦෼ͱ͞Εͳ͍෦෼͕͋Δ  w ద੾ͳ໰͍ͷઃఆʢ՝୊ʣͱҙࢥܾఆʢߦಈʣ͸୅ସ͞Εͳ͍

    w 42-ͷੜ੒ͳͲ෼ੳ΍݁Ռͷநग़͸$IBU(15ʹΑͬͯ୅ସ͞ΕΔՄೳੑ͸͋Δ $IBU(15΍#BSEΛਖ਼͘͠׆༻͢ΔͨΊͷೳྗ͕ඞཁ  w ద੾ͳ໰͍Λઃఆͯ͠ϓϩϯϓτΛ౤͛Δٕज़ w ؒҧͬͨ42-͕ฦͬͯ͘Δ͜ͱ΋͋ΔͷͰɺग़ྗ͞Εͨ42-͕ਖ਼͍͔͠Λ൑அ͢Δ͚ͩͷೳྗ͕ඞཁ w ͢Ͱʹ஌ͬͯΔ΋ͷΛΑΓޮ཰తʹऔಘ͍ͨ͠৔߹ʹศར w ͦͷͨΊʹجຊతͳ42-ͳͲͷٕज़͸ֶΜͩํ͕Α͍ʢద੾ͳࢦࣔΛग़ͨ͢Ίɺ݁Ռ͕ਖ਼͍͔͠Λ൑அ͢Δ ͨΊʣ
  14. &/%