Slide 1

Slide 1 text

©2018 Wantedly, Inc. BigQuery MLϋϯζΦϯ ଎शձ Makoto Tanji 2018-08-09

Slide 2

Slide 2 text

©2018 Wantedly, Inc. ࣗݾ঺հ Makoto Tanji ୮࣏ ৴ @tan_z_tan (@tan-z-tan) ػցֶशνʔϜ@Wantedly • SQL͸େֶ࣌୅શ͘ॻ͔ͳ͔͚ͬͨ Ͳ࢓ࣄ͢Δ༻ʹͳ֮ͬͯ͑ͨ • झຯ: νΣϩ, ϫΧαΪ௼Γ

Slide 3

Slide 3 text

©2018 Wantedly, Inc. 1. Hello BigQuery ML (10 min.) • What is BQ ML? • BQ ML in Wantedly 2. Hands on (35 min.) • Model Creation • Model Evaluation • +Spreadsheet 3. Summary (10 min.) ࠓ೔΍Δ͜ͱ

Slide 4

Slide 4 text

©2018 Wantedly, Inc. 8IBUJT#JH2VFSZ.- #2.-

Slide 5

Slide 5 text

©2018 Wantedly, Inc. • Α͋͘Γͦ͏ͳ໰୊ • ʢϚʔέςΟϯά԰͞Μʣσʔλ͸BigQueryʹཷ·͍͚ͬͯ͘Ͳɺେن໛ͩ͠෼ੳνʔ ϜʹγΰτΛཔ·ͳ͍ͱɻɻɻ • ʢ෼ੳ԰͞ΜʣBigQuery͔Β෼ੳϓϥοτϑΥʔϜʹσʔλΛίϐʔͯ͠෼ੳͨ͠ɻͰ ΋଍Γͳ͍σʔλ͕͔͋ͬͨΒɺ·ͨσʔλͷίϐʔ͔ΒͰ͢ɻɻɻ →γΰτͰίίϩΦυϥͳ͍ BigQuery ML

Slide 6

Slide 6 text

©2018 Wantedly, Inc. BigQuery ML Data Scientist΍ ML Engineer͕͍ͳͯ͘΋ σʔλ͔Βҙࢥܾఆ͍ͨ͠ɻ BigQuery MLͰͰ͖·͢ɻαʔόϨεͳ؀ڥ Ͱ͋ͳͨ̍ਓͰͰ͖·͢ɻ

Slide 7

Slide 7 text

©2018 Wantedly, Inc. 1. σʔλͷҠಈ͕͍Βͳ͍ 1. σʔλίϐʔ͕ͳ͍ͷͰ୭͔ʹཔΉ͜ͱ͕ඞཁͳ͍ 2. ηΩϡϦςΟϦεΫܰݮ 2. ΞϧΰϦζϜ͸༻ҙ͞Ε͍ͯΔ • σʔλͱ΍Γ͍ͨ͜ͱΛࢦఆ͢Δ͚ͩ • ػցֶशͷຽओԽ ΤϯδχΞ޲͚ͱ͍͏ΑΓɺࠓ͋Δσʔλ͔ΒϏδωεʹ׆͔͍ͨ͠ਓʹɻ BigQuery ML

Slide 8

Slide 8 text

©2018 Wantedly, Inc. WantedlyͷதͰͷར༻ ࢼݧతʹWantedly PeopleͷPushʹར༻ • Ϣʔβͷߦಈϩά͔Βʮࠓ೔ૹΔPushΛ։෧͢Δ֬཰ʯΛ༧ଌ • ։෧཰͕ᮢ஋Ҏ্Ͱ͋Ε͹ૹΔ • ແବͳPushૹ৴͕ݮΒ͍ͨ͠ • Ͱ͖ͨ͜ͱ • ᮢ஋Λௐ੔͢Δ͜ͱͰɺૹ৴਺ͱ։෧཰ͷτϨʔυΦϑ͕ௐ੔Ͱ͖Δ • ֶशɾ࢓૊ΈԽ;͘Ίͯ3೔͘Β͍Ͱ࣮૷Ͱ͖ͨ BQ ML in Wantedly

Slide 9

Slide 9 text

©2018 Wantedly, Inc. )BOETPO࢖ͬͯΈΑ͏

Slide 10

Slide 10 text

©2018 Wantedly, Inc. 1. ࣄલ४උ 1. BigQuery Account + Google Spreadsheet 2. Browser (Chrome) 2. Ϟσϧͷஔ͖৔Λ࡞Δ 3. ֶश 4. ධՁɾՄࢹԽ BigQuery ML

Slide 11

Slide 11 text

©2018 Wantedly, Inc. https://console.cloud.google.com/bigquery (࠷ۙ৽͍͠UIʹͳͬͨ)

Slide 12

Slide 12 text

©2018 Wantedly, Inc. • ֶशͨ͠ϞσϧΛஔ͘৔ॴΛܾΊΔɻࠓ೔͸tmpͱ͔bqml_tutorialͱ͔ͰOK • ྫ͑͹ “bqml_tutorial.model_1”Έ͍ͨͳ৔ॴʹϞσϧ͕อଘ͞ΕΔɻ • ӡ༻ɾ؅ཧ͸Ͳ͏͢ΔΜͩΖ͏ʁe.g. ୭͔͕ϞσϧΛফͯ͠͠·ͬͨΒʁ • →͜͜Β΁Μ͸ӡ༻ͷ࠷దԽ͕ඞཁ͔΋ ʢͳ͚Ε͹ʣϞσϧͷஔ͖৔ॴΛ࡞Δ

Slide 13

Slide 13 text

©2018 Wantedly, Inc. • CREATE MODEL • ML.EVALUATE • ML.PREDICT ఏڙ͞Ε͍ͯΔػೳ • ML.WEIGHTS • ML.ROC_CURVE • ML.TRAINING_INFO • ML.FEATURE_INFO

Slide 14

Slide 14 text

©2018 Wantedly, Inc. • CREATE MODEL • ML.EVALUATE • ML.PREDICT ఏڙ͞Ε͍ͯΔػೳ • ML.WEIGHTS • ML.ROC_CURVE • ML.TRAINING_INFO • ML.FEATURE_INFO σʔλΛ༩ֶ͑ͯश͠ϞσϧΛ࡞Δ Ϟσϧͷਫ਼౓Λදࣔ ৽ͨͳσʔλʹର͢Δ༧ଌΛग़ྗ ϞσϧͷॏΈΛग़ྗ ᮢ஋Λม͑ͨͱ͖ͷਫ਼౓Λग़ྗ ֶशIterationͷ৘ใ ಛ௃ྔ

Slide 15

Slide 15 text

©2018 Wantedly, Inc. TutorialΛ΍ͬͯΈΔ https://cloud.google.com/bigquery/docs/bigqueryml-natality?hl=ja https://cloud.google.com/bigquery/docs/bigqueryml-analyst-start

Slide 16

Slide 16 text

©2018 Wantedly, Inc. Predict Birth Weight using Linear Regression

Slide 17

Slide 17 text

©2018 Wantedly, Inc. Predict Birth Weight #standardSQL CREATE MODEL `tmp.natality_model` OPTIONS (model_type='linear_reg', input_label_cols=['weight_pounds']) AS SELECT weight_pounds, is_male, gestation_weeks, mother_age, CAST(mother_race AS string) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE weight_pounds IS NOT NULL AND RAND() < 0.001 ग़࢈࣌ͷମॏΛઢܗճؼ ମॏ = f[ੑผɺ೛৷ظؒɺ฼਌ͷ೥ྸɺ฼਌ͷਓछ] STRINGͷม਺Λࢦఆ͢ΔͱࣗಈͰΧςΰϦม਺ʹͯ͘͠ΕΔ ͏Ε͍͠ ༧ଌ͍ͨ͠ΧϥϜ໊ આ໌ม਺ CREATE MODEL

Slide 18

Slide 18 text

©2018 Wantedly, Inc. ML.EVALUATE Predict Birth Weight #standardSQL SELECT * FROM ML.EVALUATE(MODEL `tmp.natality_model`, ( SELECT weight_pounds, is_male, gestation_weeks, mother_age, CAST(mother_race AS STRING) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE weight_pounds IS NOT NULL)) ༩͑ͨσʔλʹର͢ΔճؼޡࠩͳͲ

Slide 19

Slide 19 text

©2018 Wantedly, Inc. ML.PREDICT Predict Birth Weight #standardSQL SELECT predicted_weight_pounds FROM ML.PREDICT(MODEL `tmp.natality_model`, ( SELECT is_male, gestation_weeks, mother_age, CAST(mother_race AS STRING) AS mother_race FROM `bigquery-public-data.samples.natality` WHERE state = "WY"))

Slide 20

Slide 20 text

©2018 Wantedly, Inc. ML.WEIGHTS Predict Birth Weight SELECT * FROM ML.WEIGHTS(MODEL `tmp.natality_model`) ΧςΰϦม਺ͷ֤ཁૉ ϞσϧͷॏΈΛදࣔ

Slide 21

Slide 21 text

©2018 Wantedly, Inc. Predict whether user makes transaction or not using Logistic Regression

Slide 22

Slide 22 text

©2018 Wantedly, Inc. #standardSQL CREATE MODEL `tmp.sample_make_transaction` OPTIONS(model_type='logistic_reg') AS SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630' αΠτʹ๚ΕͨϢʔβ͕औҾ͢Δ͔Ͳ͏͔Λ൑ఆ CREATE [OR REPLACE] MODEL User makes transaction or not? औҾ͔ͨ͠ {0, 1} defaultͰ’label’͕ ໨తม਺ʹͳΔ

Slide 23

Slide 23 text

©2018 Wantedly, Inc. ML.EVALUATE #standardSQL SELECT * FROM ML.EVALUATE(MODEL `tmp.sample_make_transaction`, (SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801')) ͓ͳ͡ΈͷPrecision, Recall, F scoreͳͲ͕Ͱͯ͘Δ User makes transaction or not?

Slide 24

Slide 24 text

©2018 Wantedly, Inc. ML.ROC_CURVE User makes transaction or not? #standardSQL SELECT * FROM ML.ROC_CURVE(MODEL `tmp.sample_make_transaction`, ( SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801')) thresholdΛม͑ͯTP, TF, FP, FNͳͲΛग़ྗͯ͘͠ΕΔ →͋ͱ͸google spreadsheetͳͲʹग़ྗͯ͠άϥϑԽ

Slide 25

Slide 25 text

©2018 Wantedly, Inc. ROC Curve, PR Curve User makes transaction or not? Google Spreadsheetʹॻ͖ग़ͯ͠ɺROCΧʔϒ΍PRΧʔϒ͕ඳ͚Δ ͋Δᮢ஋ʹͨ͠ͱ͖ͷrecall, precisionͳͲ͕ͩͤΔˠτϨʔυΦϑͷՄࢹԽ

Slide 26

Slide 26 text

©2018 Wantedly, Inc. User makes transaction or not? • ML.WEIGHTS ֤ม਺ͷॏΈΛग़ྗͯ͘͠ΕΔ SELECT * FROM ML.WEIGHTS(MODEL `tmp.sample_make_transaction`) ஋͕ߴ͍΄ͲऔҾ͢Δ֬཰͕ߴ͍ MacϢʔβ͸SunOSϢʔβʢʁʣΑΓऔҾ͢Δ֬཰͕ߴ͘༧૝͞ΕΔ

Slide 27

Slide 27 text

©2018 Wantedly, Inc. 4VNNBSZ

Slide 28

Slide 28 text

©2018 Wantedly, Inc. 1. Hello BigQuery ML (10 min.) • What is BQ ML? • BQ ML in Wantedly 2. Hands on (35 min.) • Model Creation • Model Evaluation • +Spreadsheet 3. Summary (10 min.) ࠓ೔΍Δ͜ͱ

Slide 29

Slide 29 text

©2018 Wantedly, Inc. ·ͱΊ • σʔλΛҠಈͤͣʹBQ্͚ͩͰֶशɾ༧ଌ͕Մೳ • ࠷௿ݶ͜ΕΛ֮͑ͨΒ࢖͑Δ MODEL CREATE, EVALUATE, PREDICT • ৄࡉʹத਎Λݟ͍ͨ WEIGHTSɺROC_CURVE, TRAINING_INFO… • ՄࢹԽ Google εϓϨουγʔτͰRecall, Precision, ROCΧʔϒ • ָ͍͠MLϥΠϑΛʂ

Slide 30

Slide 30 text

©2018 Wantedly, Inc. ৮ͬͯΈͯͷmisc. 1. ਖ਼ଇԽ߲΋ࢦఆͰ͖Δ l1_reg, l2_reg 2. STRING→ΧςΰϦม਺ͷม׵͕ศར “FEMALE” / “MALE” → 2ͭͷdummyม਺Λ࡞ͬͯ͘ΕΔ 3. Unbalancedσʔλʹରͯ͠͸ʁ ಛʹΫϥεʹର͢Δweightͷࢦఆ͸ݟ͔ͭΒͳ͔ͬͨ 4. લճͷଓ͖͔Β࢝ΊΔʹ͸ʁ warm_start = True 5. SQLͰͲ͏΍ͬͯಈ͍͍ͯΔ͔ʁ https://www.wantedly.com/companies/wantedly/post_articles/129482

Slide 31

Slide 31 text

©2018 Wantedly, Inc. "OZ2VFTUJPOTʁ BigQuery MLϋϯζΦϯ଎शձ