Upgrade to Pro — share decks privately, control downloads, hide ads and more …

BigQuery ML Hands-on

BigQuery ML Hands-on

BigQuery ML hands-on @ wantedly

Makoto Tanji @tan_z_tan

Makoto Tanji

August 09, 2018
Tweet

More Decks by Makoto Tanji

Other Decks in Programming

Transcript

  1. ©2018 Wantedly, Inc. ࣗݾ঺հ Makoto Tanji ୮࣏ ৴ @tan_z_tan (@tan-z-tan)

    ػցֶशνʔϜ@Wantedly • SQL͸େֶ࣌୅શ͘ॻ͔ͳ͔͚ͬͨ Ͳ࢓ࣄ͢Δ༻ʹͳ֮ͬͯ͑ͨ • झຯ: νΣϩ, ϫΧαΪ௼Γ
  2. ©2018 Wantedly, Inc. 1. Hello BigQuery ML (10 min.) •

    What is BQ ML? • BQ ML in Wantedly 2. Hands on (35 min.) • Model Creation • Model Evaluation • +Spreadsheet 3. Summary (10 min.) ࠓ೔΍Δ͜ͱ
  3. ©2018 Wantedly, Inc. 1. σʔλͷҠಈ͕͍Βͳ͍ 1. σʔλίϐʔ͕ͳ͍ͷͰ୭͔ʹཔΉ͜ͱ͕ඞཁͳ͍ 2. ηΩϡϦςΟϦεΫܰݮ 2.

    ΞϧΰϦζϜ͸༻ҙ͞Ε͍ͯΔ • σʔλͱ΍Γ͍ͨ͜ͱΛࢦఆ͢Δ͚ͩ • ػցֶशͷຽओԽ ΤϯδχΞ޲͚ͱ͍͏ΑΓɺࠓ͋Δσʔλ͔ΒϏδωεʹ׆͔͍ͨ͠ਓʹɻ BigQuery ML
  4. ©2018 Wantedly, Inc. WantedlyͷதͰͷར༻ ࢼݧతʹWantedly PeopleͷPushʹར༻ • Ϣʔβͷߦಈϩά͔Βʮࠓ೔ૹΔPushΛ։෧͢Δ֬཰ʯΛ༧ଌ • ։෧཰͕ᮢ஋Ҏ্Ͱ͋Ε͹ૹΔ

    • ແବͳPushૹ৴͕ݮΒ͍ͨ͠ • Ͱ͖ͨ͜ͱ • ᮢ஋Λௐ੔͢Δ͜ͱͰɺૹ৴਺ͱ։෧཰ͷτϨʔυΦϑ͕ௐ੔Ͱ͖Δ • ֶशɾ࢓૊ΈԽ;͘Ίͯ3೔͘Β͍Ͱ࣮૷Ͱ͖ͨ BQ ML in Wantedly
  5. ©2018 Wantedly, Inc. 1. ࣄલ४උ 1. BigQuery Account + Google

    Spreadsheet 2. Browser (Chrome) 2. Ϟσϧͷஔ͖৔Λ࡞Δ 3. ֶश 4. ධՁɾՄࢹԽ BigQuery ML
  6. ©2018 Wantedly, Inc. • CREATE MODEL • ML.EVALUATE • ML.PREDICT

    ఏڙ͞Ε͍ͯΔػೳ • ML.WEIGHTS • ML.ROC_CURVE • ML.TRAINING_INFO • ML.FEATURE_INFO
  7. ©2018 Wantedly, Inc. • CREATE MODEL • ML.EVALUATE • ML.PREDICT

    ఏڙ͞Ε͍ͯΔػೳ • ML.WEIGHTS • ML.ROC_CURVE • ML.TRAINING_INFO • ML.FEATURE_INFO σʔλΛ༩ֶ͑ͯश͠ϞσϧΛ࡞Δ Ϟσϧͷਫ਼౓Λදࣔ ৽ͨͳσʔλʹର͢Δ༧ଌΛग़ྗ ϞσϧͷॏΈΛग़ྗ ᮢ஋Λม͑ͨͱ͖ͷਫ਼౓Λग़ྗ ֶशIterationͷ৘ใ ಛ௃ྔ
  8. ©2018 Wantedly, Inc. Predict Birth Weight #standardSQL CREATE MODEL `tmp.natality_model`

    OPTIONS (model_type='linear_reg', input_label_cols=['weight_pounds']) AS SELECT weight_pounds, is_male, gestation_weeks, mother_age, CAST(mother_race AS string) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE weight_pounds IS NOT NULL AND RAND() < 0.001 ग़࢈࣌ͷମॏΛઢܗճؼ ମॏ = f[ੑผɺ೛৷ظؒɺ฼਌ͷ೥ྸɺ฼਌ͷਓछ] STRINGͷม਺Λࢦఆ͢ΔͱࣗಈͰΧςΰϦม਺ʹͯ͘͠ΕΔ ͏Ε͍͠ ༧ଌ͍ͨ͠ΧϥϜ໊ આ໌ม਺ CREATE MODEL
  9. ©2018 Wantedly, Inc. ML.EVALUATE Predict Birth Weight #standardSQL SELECT *

    FROM ML.EVALUATE(MODEL `tmp.natality_model`, ( SELECT weight_pounds, is_male, gestation_weeks, mother_age, CAST(mother_race AS STRING) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE weight_pounds IS NOT NULL)) ༩͑ͨσʔλʹର͢ΔճؼޡࠩͳͲ
  10. ©2018 Wantedly, Inc. ML.PREDICT Predict Birth Weight #standardSQL SELECT predicted_weight_pounds

    FROM ML.PREDICT(MODEL `tmp.natality_model`, ( SELECT is_male, gestation_weeks, mother_age, CAST(mother_race AS STRING) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE state = "WY"))
  11. ©2018 Wantedly, Inc. ML.WEIGHTS Predict Birth Weight SELECT * FROM

    ML.WEIGHTS(MODEL `tmp.natality_model`) ΧςΰϦม਺ͷ֤ཁૉ ϞσϧͷॏΈΛදࣔ
  12. ©2018 Wantedly, Inc. #standardSQL CREATE MODEL `tmp.sample_make_transaction` OPTIONS(model_type='logistic_reg') AS SELECT

    IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630' αΠτʹ๚ΕͨϢʔβ͕औҾ͢Δ͔Ͳ͏͔Λ൑ఆ CREATE [OR REPLACE] MODEL User makes transaction or not? औҾ͔ͨ͠ {0, 1} defaultͰ’label’͕ ໨తม਺ʹͳΔ
  13. ©2018 Wantedly, Inc. ML.EVALUATE #standardSQL SELECT * FROM ML.EVALUATE(MODEL `tmp.sample_make_transaction`,

    (SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801')) ͓ͳ͡ΈͷPrecision, Recall, F scoreͳͲ͕Ͱͯ͘Δ User makes transaction or not?
  14. ©2018 Wantedly, Inc. ML.ROC_CURVE User makes transaction or not? #standardSQL

    SELECT * FROM ML.ROC_CURVE(MODEL `tmp.sample_make_transaction`, ( SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801')) thresholdΛม͑ͯTP, TF, FP, FNͳͲΛग़ྗͯ͘͠ΕΔ →͋ͱ͸google spreadsheetͳͲʹग़ྗͯ͠άϥϑԽ
  15. ©2018 Wantedly, Inc. ROC Curve, PR Curve User makes transaction

    or not? Google Spreadsheetʹॻ͖ग़ͯ͠ɺROCΧʔϒ΍PRΧʔϒ͕ඳ͚Δ ͋Δᮢ஋ʹͨ͠ͱ͖ͷrecall, precisionͳͲ͕ͩͤΔˠτϨʔυΦϑͷՄࢹԽ
  16. ©2018 Wantedly, Inc. User makes transaction or not? • ML.WEIGHTS

    ֤ม਺ͷॏΈΛग़ྗͯ͘͠ΕΔ SELECT * FROM ML.WEIGHTS(MODEL `tmp.sample_make_transaction`) ஋͕ߴ͍΄ͲऔҾ͢Δ֬཰͕ߴ͍ MacϢʔβ͸SunOSϢʔβʢʁʣΑΓऔҾ͢Δ֬཰͕ߴ͘༧૝͞ΕΔ
  17. ©2018 Wantedly, Inc. 1. Hello BigQuery ML (10 min.) •

    What is BQ ML? • BQ ML in Wantedly 2. Hands on (35 min.) • Model Creation • Model Evaluation • +Spreadsheet 3. Summary (10 min.) ࠓ೔΍Δ͜ͱ
  18. ©2018 Wantedly, Inc. ·ͱΊ • σʔλΛҠಈͤͣʹBQ্͚ͩͰֶशɾ༧ଌ͕Մೳ • ࠷௿ݶ͜ΕΛ֮͑ͨΒ࢖͑Δ MODEL CREATE,

    EVALUATE, PREDICT • ৄࡉʹத਎Λݟ͍ͨ WEIGHTSɺROC_CURVE, TRAINING_INFO… • ՄࢹԽ Google εϓϨουγʔτͰRecall, Precision, ROCΧʔϒ • ָ͍͠MLϥΠϑΛʂ
  19. ©2018 Wantedly, Inc. ৮ͬͯΈͯͷmisc. 1. ਖ਼ଇԽ߲΋ࢦఆͰ͖Δ l1_reg, l2_reg 2. STRING→ΧςΰϦม਺ͷม׵͕ศར

    “FEMALE” / “MALE” → 2ͭͷdummyม਺Λ࡞ͬͯ͘ΕΔ 3. Unbalancedσʔλʹରͯ͠͸ʁ ಛʹΫϥεʹର͢Δweightͷࢦఆ͸ݟ͔ͭΒͳ͔ͬͨ 4. લճͷଓ͖͔Β࢝ΊΔʹ͸ʁ warm_start = True 5. SQLͰͲ͏΍ͬͯಈ͍͍ͯΔ͔ʁ https://www.wantedly.com/companies/wantedly/post_articles/129482