BigQuery ML Hands-on

BigQuery ML Hands-on

BigQuery ML hands-on @ wantedly

Makoto Tanji @tan_z_tan

Cecfa78bd810db9409151a7ee9aaa346?s=128

Makoto Tanji

August 09, 2018
Tweet

Transcript

  1. ©2018 Wantedly, Inc. BigQuery MLϋϯζΦϯ ଎शձ Makoto Tanji 2018-08-09

  2. ©2018 Wantedly, Inc. ࣗݾ঺հ Makoto Tanji ୮࣏ ৴ @tan_z_tan (@tan-z-tan)

    ػցֶशνʔϜ@Wantedly • SQL͸େֶ࣌୅શ͘ॻ͔ͳ͔͚ͬͨ Ͳ࢓ࣄ͢Δ༻ʹͳ֮ͬͯ͑ͨ • झຯ: νΣϩ, ϫΧαΪ௼Γ
  3. ©2018 Wantedly, Inc. 1. Hello BigQuery ML (10 min.) •

    What is BQ ML? • BQ ML in Wantedly 2. Hands on (35 min.) • Model Creation • Model Evaluation • +Spreadsheet 3. Summary (10 min.) ࠓ೔΍Δ͜ͱ
  4. ©2018 Wantedly, Inc. 8IBUJT#JH2VFSZ.- #2.-

  5. ©2018 Wantedly, Inc. • Α͋͘Γͦ͏ͳ໰୊ • ʢϚʔέςΟϯά԰͞Μʣσʔλ͸BigQueryʹཷ·͍͚ͬͯ͘Ͳɺେن໛ͩ͠෼ੳνʔ ϜʹγΰτΛཔ·ͳ͍ͱɻɻɻ • ʢ෼ੳ԰͞ΜʣBigQuery͔Β෼ੳϓϥοτϑΥʔϜʹσʔλΛίϐʔͯ͠෼ੳͨ͠ɻͰ

    ΋଍Γͳ͍σʔλ͕͔͋ͬͨΒɺ·ͨσʔλͷίϐʔ͔ΒͰ͢ɻɻɻ →γΰτͰίίϩΦυϥͳ͍ BigQuery ML
  6. ©2018 Wantedly, Inc. BigQuery ML Data Scientist΍ ML Engineer͕͍ͳͯ͘΋ σʔλ͔Βҙࢥܾఆ͍ͨ͠ɻ

    BigQuery MLͰͰ͖·͢ɻαʔόϨεͳ؀ڥ Ͱ͋ͳͨ̍ਓͰͰ͖·͢ɻ
  7. ©2018 Wantedly, Inc. 1. σʔλͷҠಈ͕͍Βͳ͍ 1. σʔλίϐʔ͕ͳ͍ͷͰ୭͔ʹཔΉ͜ͱ͕ඞཁͳ͍ 2. ηΩϡϦςΟϦεΫܰݮ 2.

    ΞϧΰϦζϜ͸༻ҙ͞Ε͍ͯΔ • σʔλͱ΍Γ͍ͨ͜ͱΛࢦఆ͢Δ͚ͩ • ػցֶशͷຽओԽ ΤϯδχΞ޲͚ͱ͍͏ΑΓɺࠓ͋Δσʔλ͔ΒϏδωεʹ׆͔͍ͨ͠ਓʹɻ BigQuery ML
  8. ©2018 Wantedly, Inc. WantedlyͷதͰͷར༻ ࢼݧతʹWantedly PeopleͷPushʹར༻ • Ϣʔβͷߦಈϩά͔Βʮࠓ೔ૹΔPushΛ։෧͢Δ֬཰ʯΛ༧ଌ • ։෧཰͕ᮢ஋Ҏ্Ͱ͋Ε͹ૹΔ

    • ແବͳPushૹ৴͕ݮΒ͍ͨ͠ • Ͱ͖ͨ͜ͱ • ᮢ஋Λௐ੔͢Δ͜ͱͰɺૹ৴਺ͱ։෧཰ͷτϨʔυΦϑ͕ௐ੔Ͱ͖Δ • ֶशɾ࢓૊ΈԽ;͘Ίͯ3೔͘Β͍Ͱ࣮૷Ͱ͖ͨ BQ ML in Wantedly
  9. ©2018 Wantedly, Inc. )BOETPO࢖ͬͯΈΑ͏

  10. ©2018 Wantedly, Inc. 1. ࣄલ४උ 1. BigQuery Account + Google

    Spreadsheet 2. Browser (Chrome) 2. Ϟσϧͷஔ͖৔Λ࡞Δ 3. ֶश 4. ධՁɾՄࢹԽ BigQuery ML
  11. ©2018 Wantedly, Inc. https://console.cloud.google.com/bigquery (࠷ۙ৽͍͠UIʹͳͬͨ)

  12. ©2018 Wantedly, Inc. • ֶशͨ͠ϞσϧΛஔ͘৔ॴΛܾΊΔɻࠓ೔͸tmpͱ͔bqml_tutorialͱ͔ͰOK • ྫ͑͹ “bqml_tutorial.model_1”Έ͍ͨͳ৔ॴʹϞσϧ͕อଘ͞ΕΔɻ • ӡ༻ɾ؅ཧ͸Ͳ͏͢ΔΜͩΖ͏ʁe.g.

    ୭͔͕ϞσϧΛফͯ͠͠·ͬͨΒʁ • →͜͜Β΁Μ͸ӡ༻ͷ࠷దԽ͕ඞཁ͔΋ ʢͳ͚Ε͹ʣϞσϧͷஔ͖৔ॴΛ࡞Δ
  13. ©2018 Wantedly, Inc. • CREATE MODEL • ML.EVALUATE • ML.PREDICT

    ఏڙ͞Ε͍ͯΔػೳ • ML.WEIGHTS • ML.ROC_CURVE • ML.TRAINING_INFO • ML.FEATURE_INFO
  14. ©2018 Wantedly, Inc. • CREATE MODEL • ML.EVALUATE • ML.PREDICT

    ఏڙ͞Ε͍ͯΔػೳ • ML.WEIGHTS • ML.ROC_CURVE • ML.TRAINING_INFO • ML.FEATURE_INFO σʔλΛ༩ֶ͑ͯश͠ϞσϧΛ࡞Δ Ϟσϧͷਫ਼౓Λදࣔ ৽ͨͳσʔλʹର͢Δ༧ଌΛग़ྗ ϞσϧͷॏΈΛग़ྗ ᮢ஋Λม͑ͨͱ͖ͷਫ਼౓Λग़ྗ ֶशIterationͷ৘ใ ಛ௃ྔ
  15. ©2018 Wantedly, Inc. TutorialΛ΍ͬͯΈΔ https://cloud.google.com/bigquery/docs/bigqueryml-natality?hl=ja https://cloud.google.com/bigquery/docs/bigqueryml-analyst-start

  16. ©2018 Wantedly, Inc. Predict Birth Weight using Linear Regression

  17. ©2018 Wantedly, Inc. Predict Birth Weight #standardSQL CREATE MODEL `tmp.natality_model`

    OPTIONS (model_type='linear_reg', input_label_cols=['weight_pounds']) AS SELECT weight_pounds, is_male, gestation_weeks, mother_age, CAST(mother_race AS string) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE weight_pounds IS NOT NULL AND RAND() < 0.001 ग़࢈࣌ͷମॏΛઢܗճؼ ମॏ = f[ੑผɺ೛৷ظؒɺ฼਌ͷ೥ྸɺ฼਌ͷਓछ] STRINGͷม਺Λࢦఆ͢ΔͱࣗಈͰΧςΰϦม਺ʹͯ͘͠ΕΔ ͏Ε͍͠ ༧ଌ͍ͨ͠ΧϥϜ໊ આ໌ม਺ CREATE MODEL
  18. ©2018 Wantedly, Inc. ML.EVALUATE Predict Birth Weight #standardSQL SELECT *

    FROM ML.EVALUATE(MODEL `tmp.natality_model`, ( SELECT weight_pounds, is_male, gestation_weeks, mother_age, CAST(mother_race AS STRING) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE weight_pounds IS NOT NULL)) ༩͑ͨσʔλʹର͢ΔճؼޡࠩͳͲ
  19. ©2018 Wantedly, Inc. ML.PREDICT Predict Birth Weight #standardSQL SELECT predicted_weight_pounds

    FROM ML.PREDICT(MODEL `tmp.natality_model`, ( SELECT is_male, gestation_weeks, mother_age, CAST(mother_race AS STRING) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE state = "WY"))
  20. ©2018 Wantedly, Inc. ML.WEIGHTS Predict Birth Weight SELECT * FROM

    ML.WEIGHTS(MODEL `tmp.natality_model`) ΧςΰϦม਺ͷ֤ཁૉ ϞσϧͷॏΈΛදࣔ
  21. ©2018 Wantedly, Inc. Predict whether user makes transaction or not

    using Logistic Regression
  22. ©2018 Wantedly, Inc. #standardSQL CREATE MODEL `tmp.sample_make_transaction` OPTIONS(model_type='logistic_reg') AS SELECT

    IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630' αΠτʹ๚ΕͨϢʔβ͕औҾ͢Δ͔Ͳ͏͔Λ൑ఆ CREATE [OR REPLACE] MODEL User makes transaction or not? औҾ͔ͨ͠ {0, 1} defaultͰ’label’͕ ໨తม਺ʹͳΔ
  23. ©2018 Wantedly, Inc. ML.EVALUATE #standardSQL SELECT * FROM ML.EVALUATE(MODEL `tmp.sample_make_transaction`,

    (SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801')) ͓ͳ͡ΈͷPrecision, Recall, F scoreͳͲ͕Ͱͯ͘Δ User makes transaction or not?
  24. ©2018 Wantedly, Inc. ML.ROC_CURVE User makes transaction or not? #standardSQL

    SELECT * FROM ML.ROC_CURVE(MODEL `tmp.sample_make_transaction`, ( SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801')) thresholdΛม͑ͯTP, TF, FP, FNͳͲΛग़ྗͯ͘͠ΕΔ →͋ͱ͸google spreadsheetͳͲʹग़ྗͯ͠άϥϑԽ
  25. ©2018 Wantedly, Inc. ROC Curve, PR Curve User makes transaction

    or not? Google Spreadsheetʹॻ͖ग़ͯ͠ɺROCΧʔϒ΍PRΧʔϒ͕ඳ͚Δ ͋Δᮢ஋ʹͨ͠ͱ͖ͷrecall, precisionͳͲ͕ͩͤΔˠτϨʔυΦϑͷՄࢹԽ
  26. ©2018 Wantedly, Inc. User makes transaction or not? • ML.WEIGHTS

    ֤ม਺ͷॏΈΛग़ྗͯ͘͠ΕΔ SELECT * FROM ML.WEIGHTS(MODEL `tmp.sample_make_transaction`) ஋͕ߴ͍΄ͲऔҾ͢Δ֬཰͕ߴ͍ MacϢʔβ͸SunOSϢʔβʢʁʣΑΓऔҾ͢Δ֬཰͕ߴ͘༧૝͞ΕΔ
  27. ©2018 Wantedly, Inc. 4VNNBSZ

  28. ©2018 Wantedly, Inc. 1. Hello BigQuery ML (10 min.) •

    What is BQ ML? • BQ ML in Wantedly 2. Hands on (35 min.) • Model Creation • Model Evaluation • +Spreadsheet 3. Summary (10 min.) ࠓ೔΍Δ͜ͱ
  29. ©2018 Wantedly, Inc. ·ͱΊ • σʔλΛҠಈͤͣʹBQ্͚ͩͰֶशɾ༧ଌ͕Մೳ • ࠷௿ݶ͜ΕΛ֮͑ͨΒ࢖͑Δ MODEL CREATE,

    EVALUATE, PREDICT • ৄࡉʹத਎Λݟ͍ͨ WEIGHTSɺROC_CURVE, TRAINING_INFO… • ՄࢹԽ Google εϓϨουγʔτͰRecall, Precision, ROCΧʔϒ • ָ͍͠MLϥΠϑΛʂ
  30. ©2018 Wantedly, Inc. ৮ͬͯΈͯͷmisc. 1. ਖ਼ଇԽ߲΋ࢦఆͰ͖Δ l1_reg, l2_reg 2. STRING→ΧςΰϦม਺ͷม׵͕ศར

    “FEMALE” / “MALE” → 2ͭͷdummyม਺Λ࡞ͬͯ͘ΕΔ 3. Unbalancedσʔλʹରͯ͠͸ʁ ಛʹΫϥεʹର͢Δweightͷࢦఆ͸ݟ͔ͭΒͳ͔ͬͨ 4. લճͷଓ͖͔Β࢝ΊΔʹ͸ʁ warm_start = True 5. SQLͰͲ͏΍ͬͯಈ͍͍ͯΔ͔ʁ https://www.wantedly.com/companies/wantedly/post_articles/129482
  31. ©2018 Wantedly, Inc. "OZ2VFTUJPOTʁ BigQuery MLϋϯζΦϯ଎शձ