Building and Operating a Machine Learning Platform with Google Cloud ML / pepabo_ml_infrastructure_starchart

GCPUG Fukuoka 5th: Machine Learning Festival
https://gcpugfukuoka.connpass.com/event/46049/

monochromegane

January 28, 2017
Transcript

  1. Yusuke Miyake / Pepabo R&D Institute, GMO Pepabo, Inc. 2017.01.28 GCPUG Fukuoka 5th: Machine Learning Festival

    Building and Operating a Machine Learning Platform with Google Cloud ML
  2. Principal Engineer Yusuke Miyake (@monochromegane), Researcher at Pepabo R&D Institute. http://blog.monochromegane.com

  3. Today's topics
    • Pepabo R&D Institute and the "smooth system"
    • A machine learning platform that realizes the smooth system
    • Operating that platform with Google Cloud ML and StarChart

    Icons: https://icons8.com/
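  4. Pepabo R&D Institute

  5. About Pepabo R&D Institute: Pepabo R&D Institute ("Pepaken" for short) is an organization that pursues research and development under the concept of the "smooth system" in order to create technology that differentiates our business. http://rand.pepabo.com/

  6. The smooth system

  7. The smooth system: a system in which each element recognizes characteristics without requiring explicit operation and, based on those characteristics and their relationships, provides the service best suited to the situation at hand. http://rand.pepabo.com/

  8. Machine learning and the smooth system

  9. For example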

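  10. Predicting access counts for a web service
    • When operating pay-as-you-go virtual resources, accurately predicting resource demand leads directly to cost savings
    • A web service's resource demand should correlate with the number of requests processed, that is, with the access count
    • Scaling resources up or down itself takes some time, so predicting access counts at fixed intervals rather than in real time is considered sufficient

  11. Predicting access counts for a web service: once access counts can be predicted, we can move from sizing the server count for peak time to estimating the (presumably) optimal server count for each time slot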
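A minimal sketch of the sizing idea on slide 11, assuming each server can absorb a fixed number of requests per interval. The capacity figure and the hourly predictions below are made-up values for illustration; none of them come from the talk.

```python
import math

# Hypothetical figures for illustration only; not taken from the talk.
REQUESTS_PER_SERVER_PER_HOUR = 10000
MIN_SERVERS = 1


def servers_needed(predicted_accesses):
    """Return the server count required to absorb one interval's predicted accesses."""
    return max(MIN_SERVERS, math.ceil(predicted_accesses / REQUESTS_PER_SERVER_PER_HOUR))


def hourly_plan(predictions):
    """Map each interval's predicted access count to a server count."""
    return [servers_needed(p) for p in predictions]


predicted = [4000, 12000, 35000, 90000, 20000]  # made-up hourly predictions
plan = hourly_plan(predicted)
peak_sized = [max(plan)] * len(plan)  # sizing every hour for the peak hour

print(plan)
print(sum(peak_sized) - sum(plan), "server-hours saved versus peak sizing")
```

The gap between `peak_sized` and `plan` is the cost that per-interval prediction is meant to recover.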

  12. LSTM Long Short Term Memory

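  13. Predicting access counts for a web service: access count prediction with LSTM
    • Input: access counts for … periods (… days' worth) plus calendar information
    • Output: the predicted access count … period(s) ahead
    • To predict … periods ahead, each prediction is fed back into the input until that horizon is reached
    • The figure on the left shows repeated prediction over … week(s)

    http://rand.pepabo.com/article/…/monochromegane…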
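The feedback loop on slide 13, where each prediction becomes part of the next input, can be sketched as follows. This is only an illustration: `naive_one_step` is a stand-in that averages the window, not the LSTM from the talk, and calendar information is left out. The window size and history values are assumptions.

```python
def naive_one_step(window):
    """Stand-in for the trained model: predict the next value as the window mean."""
    return sum(window) / len(window)


def forecast(history, steps, window_size, predict_one_step=naive_one_step):
    """Predict `steps` periods ahead, feeding each prediction back into the input."""
    window = list(history[-window_size:])
    predictions = []
    for _ in range(steps):
        next_value = predict_one_step(window)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return predictions


history = [100.0, 120.0, 140.0, 160.0]  # made-up access counts
print(forecast(history, steps=3, window_size=2))
```

Because later predictions depend on earlier ones, errors can compound as the horizon grows, which is one reason to re-predict periodically from fresh observations.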
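  14. Pepaken

  15. About Pepabo R&D Institute: we pursue research that meets academic standards of novelty, effectiveness, and reliability, and we contribute to business growth by implementing and delivering the resulting technology as real systems. http://rand.pepabo.com/

  16. Service meets ML. It only counts once it is in use.

  17. What is required of a machine learning platform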

  18. For example, using access count prediction in a service

    [Diagram labels: Users, Service, log, rack-bigfoot, Bigfoot, Activity, ML Platform, training dataset, train, input data, prediction, tune; Access count prediction and schedule scaling]
  19. For example, using access count prediction in a service

    [Same diagram as slide 18, annotated with:]
    • Recognize precisely what characteristics are present
    • Impose no explicit operation on humans
    • Provide the service best suited to the situation at hand
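  20. OK, smooth

  21. Requirements for a machine learning platform
    1. It can integrate with service assets such as logs and databases
    2. Models can be built and tried out with relative ease
    3. It provides an API as the means of using training results
    4. Ideally, training results can also be used locally
    5. All of the above is scalable
  22. Considering Google Cloud ML
    1. Input and output go through Cloud Storage
    2. TensorFlow is adopted for training programs
    3. The online prediction service turns models into APIs
    4. Training results are stored in Cloud Storage and can also be used locally
    5. Distributed training infrastructure, integrated with a load-balanced service

    Note: details of the evaluation: http://rand.pepabo.com/article/2017/01/18/pepabo-ml-platform-and-workflow/
  23. Operating the machine learning platform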

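  24. Training results are not permanent
    1. Build a model and publish it
    2. Improve the model as the service's assets change

    [Diagram labels: create, expose, tune]

  25. Is that model really safe to serve?

    We want to prevent unintended results from being used because newly trained content was applied without verification.
    • "I retrained the model, so the API results have changed."
    • "Doesn't the accuracy seem lower?"
    • "Hasn't the number of output dimensions changed?"

    [Diagram labels: create, expose, tune]

  26. Is that model really safe to serve? -> Version management for models
    • "I trained the model and am creating a new version. OK to switch over?"
    • "It's handy that nothing changes all of a sudden, but I can't tell what changed."

    [Diagram labels: create, expose, tune, default]
  27. Is that model really safe to serve? -> Managing models as code
    • "Here is a new version of the model. Please review the changes!"
    • "LGTM!!! Switching versions!!"

    [Diagram labels: create, expose, tune, default, management]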
  28. StarChart https://github.com/monochromegane/starchart

  29. StarChart: StarChart is a tool to manage Google Cloud Machine Learning training programs and model versions.
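  30. StarChart

    [Diagram labels: Train job, model default version; Train programs and model versions on GitHub; train, expose, apply]
    • Everything that informs a version-switch decision (training program, parameters, and job information) is managed as code
    • It also smooths over small usability issues in Cloud ML around retrieving the training job ID, the Cloud Storage paths, and the parameters tied to each version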
  31. Let’s try

  32. DCGAN on Cloud ML using StarChart

  33. DCGAN: generating idol face images with DCGAN in TensorFlow. http://memo.sugyan.com/entry/20160516/1463359395

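  34. Managing the training program in Git

        .
        └── dcgan
            ├── setup.py
            └── trainer
                ├── __init__.py
                ├── dcgan.py
                └── task.py

    • Place the training program under a directory named after the model, laid out as a Python package
    • If there are dependency packages, prepare a setup.py
    • This example uses dcgan.py and main.py from face-generator: https://github.com/sugyan/face-generator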
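  35. Registering the training program as a job

        $ starchart train \
            -m dcgan \
            -M trainer.task \
            -- \
            --train_dir=TRAIN_PATH/model \
            --images_dir=TRAIN_PATH/images \
            --data_dir=gs://$BUCKET_NAME/data/dcgan

    Here -m takes MODEL_NAME, -M takes MODULE_NAME, and everything after -- is YOUR_TRAIN_PARAMS.
    • The command packages the program, uploads it to Cloud Storage, and registers the job
    • `TRAIN_PATH` is interpreted as the name of the per-job directory created on Cloud Storage
    • The project ID, region, and credentials are conveniently supplied as environment variables via direnv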
  36. Waiting for the job to run

        $ starchart state -m dcgan
        jobId: dcgan_20170125191521 (FAILED)

    • Check the job ID and its status
    • Log display is not yet implemented
  37. FAILED??

  38. Adapting the training program to Cloud ML

    Local-only file I/O:

        import os
        os.listdir()
        os.mkdir(path)
        os.path.exists()
        with open(filename, 'wb') as f:

    • File I/O during job execution targets Cloud Storage
    • The tensorflow.python.lib.io.file_io package handles local paths and Cloud Storage paths (gs://) transparently
    • It is convenient to implement the program so that paths can be passed as command-line arguments

    Replacement using file_io:

        from tensorflow.python.lib.io import file_io
        file_io.list_directory()
        file_io.create_dir(path)
        file_io.file_exists()
        with file_io.FileIO(filename, 'w') as f:
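One way to follow the last bullet on slide 38 is to take every path from the command line, so the same program runs against a local directory or a gs:// URI. The flag names below match the ones used with `starchart train` in this talk. The default paths, the bucket name, and the helper functions are hypothetical and exist only for this sketch.

```python
import argparse


def parse_args(argv=None):
    """Parse path flags; each value may be a local directory or a gs:// URI."""
    parser = argparse.ArgumentParser(description="DCGAN trainer (sketch)")
    # The defaults are hypothetical local paths for running outside Cloud ML.
    parser.add_argument("--train_dir", default="/tmp/dcgan/model")
    parser.add_argument("--images_dir", default="/tmp/dcgan/images")
    parser.add_argument("--data_dir", default="/tmp/dcgan/data")
    return parser.parse_args(argv)


def is_gcs_path(path):
    """Return True when the path points at Cloud Storage."""
    return path.startswith("gs://")


def export_path(train_dir):
    """Join with '/' so the result is valid for both local paths and gs:// URIs."""
    return train_dir.rstrip("/") + "/export"


# "my-bucket" is a made-up bucket name.
args = parse_args(["--train_dir=gs://my-bucket/dcgan/model"])
print(args.train_dir, is_gcs_path(args.train_dir), export_path(args.train_dir))
```

The actual reads and writes would then go through `file_io`, as on the slide, so that neither branch needs special handling.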
  39. Registering the training program as a job

        $ starchart train \
            -m dcgan \
            -M trainer.task \
            -- \
            --train_dir=TRAIN_PATH/model \
            --images_dir=TRAIN_PATH/images \
            --data_dir=gs://$BUCKET_NAME/data/dcgan

    Here -m takes MODEL_NAME, -M takes MODULE_NAME, and everything after -- is YOUR_TRAIN_PARAMS.
  40. Waiting for the job to run

        $ starchart state -m dcgan
        jobId: dcgan_20170125194440 (SUCCEEDED)
        jobId: dcgan_20170125191521 (FAILED)

    • Check the job ID and its status
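  41. Publishing the model

        $ starchart expose -m dcgan

    • Registers a model based on the job that succeeded
    • Registers a version along with the model and publishes it as a prediction service API
    • The version name is v + the job name

    [Diagram: model, v20170125194440 (default); Cloud ML & Storage]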
  42. Not working…

  43. How the prediction service API works (pseudocode)

        import json
        import numpy as np
        import tensorflow as tf

        request_params = {'instances': [{'sample_inputs': np.zeros((1, 40)).tolist()}]}

        def feed_from_request(request, tensor_keys):
            feed = {}
            request_keys = request['instances'][0].keys()
            for key in request_keys:
                feed[tensor_keys[key]] = [instance[key] for instance in request['instances']]
            return feed

        with tf.Session() as sess:
            # Restore the MetaGraph of the training result
            new_saver = tf.train.import_meta_graph('TRAIN_PATH/model/export.meta')
            new_saver.restore(sess, 'TRAIN_PATH/model/export')
            # Bind the tensors in the 'inputs' collection to the API request parameters
            tensor_keys = json.loads(tf.get_collection('inputs')[0])
            feed = feed_from_request(request_params, tensor_keys)
            # Run the tensors in the 'outputs' collection as the operation, passing the feed
            op = json.loads(tf.get_collection('outputs')[0])
            result = sess.run(op, feed_dict=feed)
            print(result)
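The request-to-feed mapping in the pseudocode on slide 43 can be exercised without TensorFlow. The sketch below reuses `feed_from_request` as shown on the slide and feeds it a two-instance request. The tensor name "Placeholder:0" and the input values are made-up examples.

```python
import json


def feed_from_request(request, tensor_keys):
    """Batch each input key across instances and key the batch by tensor name."""
    feed = {}
    for key in request["instances"][0].keys():
        feed[tensor_keys[key]] = [inst[key] for inst in request["instances"]]
    return feed


# What a training program might have stored in the 'inputs' collection.
# The tensor name "Placeholder:0" is a made-up example.
inputs_collection = [json.dumps({"sample_inputs": "Placeholder:0"})]
tensor_keys = json.loads(inputs_collection[0])

request = {"instances": [{"sample_inputs": [[0.0, 0.0]]},
                         {"sample_inputs": [[1.0, 1.0]]}]}
feed = feed_from_request(request, tensor_keys)
print(feed)
```

Each instance contributes one row of the batch, which is why the input placeholder in the training program needs a leading batch dimension.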
  44. Making the training program API-ready (1)

        saver.save(sess, os.path.join(FLAGS.train_dir, 'export'))

    • Export the final training result so that the API can use it
    • The export name must be `export`
    • With StarChart, it must be written to `TRAIN_PATH/model/export`
  45. Making the training program API-ready (2)
    • Add the input tensor to a collection. This specifies the parameters passed as feed_dict when the output tensor is run
    • Add the output tensor to a collection. This specifies the operation run when the prediction service API is called
    • Using `tf.image.encode_jpeg` for the output tensor caused an Internal Server Error, so tf.reshape(tf.squeeze(image, [0]), [1, -1]) was used instead

        # Input
        sample_inputs = tf.placeholder(tf.float32, shape=(None, 1, dcgan.z_dim))
        tf.add_to_collection('inputs', json.dumps({'sample_inputs': sample_inputs.name}))
        # Output
        sample_outputs = dcgan.sample_image_vectors(1, 1, inputs=sample_inputs[0])
        tf.add_to_collection('outputs', json.dumps({'sample_outputs': sample_outputs.name}))
  46. Registering the training program as a job

        $ starchart train \
            -m dcgan \
            -M trainer.task \
            -- \
            --train_dir=TRAIN_PATH/model \
            --images_dir=TRAIN_PATH/images \
            --data_dir=gs://$BUCKET_NAME/data/dcgan

    Here -m takes MODEL_NAME, -M takes MODULE_NAME, and everything after -- is YOUR_TRAIN_PARAMS.
  47. Waiting for the job to run

        $ starchart state -m dcgan
        jobId: dcgan_20170125201233 (SUCCEEDED)
        jobId: dcgan_20170125194440 (SUCCEEDED)
        jobId: dcgan_20170125191521 (FAILED)

    • Check the job ID and its status
  48. Publishing the model

        $ starchart expose -m dcgan

    [Diagram: model, v20170125194440 (default), v20170125201233; Cloud ML & Storage]
  49. Managing the model file in Git

        .
        ├── dcgan
        │   ├── setup.py
        │   └── trainer
        │       ├── __init__.py
        │       ├── dcgan.py
        │       └── task.py
        └── dcgan.json

    The result of expose is saved to `MODEL_NAME.json`. It is used to switch the default version, so it is managed in Git as well.
  50. Managing the model file in Git

        {
          "model": "MODEL_NAME",
          "versions": [
            {
              "version": {
                "name": "projects/PROJECT_ID/models/MODEL_NAME/versions/v20170111170842",
                "deploymentUri": "gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/model",
                "createTime": "2017-01-11T09:12:54Z",
                "job": {
                  "jobId": "MODEL_NAME_20170111170842",
                  "trainingInput": {
                    "packageUris": [
                      "gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/packages/trainer-0.0.0.tar.gz"
                    ],
                    "pythonModule": "trainer.task",
                    "args": [
                      "--model_dir=gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/model",
                      "--train_dir=gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/train"
                    ],
                    "region": "us-central1"
                  },
                  "createTime": "2017-01-11T08:08:49Z",
                  "startTime": "2017-01-11T08:13:55Z",
                  "endTime": "2017-01-11T08:40:55Z",
                  "state": "SUCCEEDED",
                  "trainingOutput": {
                    "consumedMLUnits": 0.45
                  }
                },
                "isDefault": true
              }
            }
          ]
        }

    For each API version, the file shows the associated job, its runtime parameters, and whether it is the default version.
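  51. Using the prediction service API

        # Imports are not shown on the slide; these are the assumed sources of the names used.
        import numpy as np
        import tensorflow as tf
        from googleapiclient import discovery, errors
        from oauth2client.client import GoogleCredentials
        from tensorflow.python.lib.io import file_io

        project = 'project-123456'
        model = 'dcgan'
        version = 'v20170125194440'

        credentials = GoogleCredentials.get_application_default()
        ml = discovery.build('ml', 'v1beta1', credentials=credentials)
        body = {'instances': [{'sample_inputs': np.zeros((1, 40)).tolist()}]}
        name = 'projects/{}/models/{}/versions/{}'.format(project, model, version)
        request = ml.projects().predict(name=name, body=body)
        try:
            response = request.execute()
            output = response['predictions'][0]['sample_outputs']
            # Rebuild the 96x96 RGB image from the flattened output and encode it as JPEG
            with tf.Session() as sess:
                image = sess.run(tf.image.encode_jpeg(
                    tf.reshape(tf.constant(output, dtype=tf.uint8), [96, 96, 3])))
            with file_io.FileIO('out.jpg', 'w') as f:
                f.write(image)
        except errors.HttpError as err:
            print(err._get_reason())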
  52. It works !!

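  53. Changing the model's default version

        $ starchart apply -m dcgan

    [Diagram: model, v20170125194440, v20170125201233 (default); Cloud ML & Storage]
    • Edit the model file and set `isDefault` to true on the version you want as the default.
    • Open a pull request containing the training program and the model file as of this point
    • If the review confirms that the version meets the criteria for a default version, merge and apply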
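The edit described in the first bullet of slide 53 can also be scripted. The sketch below flips `isDefault` so that exactly one version carries it. It assumes the model file layout shown on slide 50 and writes an explicit `false` on the other versions, which may differ from how StarChart itself represents non-default versions. `set_default` is a hypothetical helper, not a StarChart command.

```python
import json


def set_default(model, version_id):
    """Flag exactly one version as default; raise if version_id is unknown."""
    ids = [entry["version"]["name"].rsplit("/", 1)[-1] for entry in model["versions"]]
    if version_id not in ids:
        raise ValueError("unknown version: " + version_id)
    for entry, current_id in zip(model["versions"], ids):
        entry["version"]["isDefault"] = current_id == version_id
    return model


# Two versions as in the talk's example; other fields are omitted for brevity.
model = {
    "model": "dcgan",
    "versions": [
        {"version": {"name": "projects/PROJECT_ID/models/dcgan/versions/v20170125194440",
                     "isDefault": True}},
        {"version": {"name": "projects/PROJECT_ID/models/dcgan/versions/v20170125201233"}},
    ],
}
set_default(model, "v20170125201233")
print(json.dumps(model, indent=2))
```

The resulting diff is small and easy to read in a pull request, which is the point of keeping the model file in Git.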
  54. Recap: operating with StarChart
    • Manage each model's training program in Git
    • Repeat train -> expose -> review -> apply

    Easy & Useful
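  55. Summary

  56. Summary
    • We evaluated a machine learning platform so that machine learning can be used in our services
    • We built and operated it with Google Cloud ML
    • We used StarChart to improve the inconvenient parts
    • Let's improve our services with machine learning!!

  57. The end

  58. Pepabo College is now recruiting its … cohort: for web application engineers who want to thrive in Fukuoka! Check the latest hiring information → @pb_recruit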