Building and Operating a Machine Learning Platform with Google Cloud ML / pepabo_ml_infrastructure_starchart

GCPUG Fukuoka 5th ~Machine Learning Festival~
https://gcpugfukuoka.connpass.com/event/46049/

monochromegane

January 28, 2017

Transcript

  1. Building and Operating a Machine Learning Platform with Google Cloud ML

    Yusuke Miyake / Pepabo R&D Institute, GMO Pepabo, Inc. 2017.01.28 GCPUG Fukuoka 5th ~Machine Learning Festival~
  2. For example, using access count prediction in a service

    (Diagram) Users Service log rack-bigfoot Bigfoot Activity ML Platform training dataset train input data prediction tune Access count prediction and schedule scaling
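  3. For example, using access count prediction in a service

    (Diagram) Users Service log rack-bigfoot Bigfoot Activity ML Platform training dataset train input data prediction tune Access count prediction and schedule scaling

    • Precisely recognize what characteristics each user has
    • Do not require explicit operations from people
    • Provide the optimal service for the situation at each moment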
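  4. Evaluating Google Cloud ML

    • Input and output go through Cloud Storage
    • TensorFlow is used for training programs
    • The online prediction service exposes models as APIs
    • Training results are saved to Cloud Storage and can also be used locally
    • Integrates with distributed training infrastructure and a load-balancing service

    ※ Details of the evaluation: http://rand.pepabo.com/article/2017/01/18/pepabo-ml-platform-and-workflow/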
  5. StarChart

    StarChart is a tool to manage Google Cloud Machine Learning training programs and model versions.
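  6. StarChart

    (Diagram) Train job, model default version; Train programs and model versions on GitHub; train, expose, apply; StarChart

    • Everything that informs the decision to switch versions is managed as code: training programs, parameters, and job information
    • It also smooths over small usability issues in Cloud ML around retrieving the training job ID, the Cloud Storage path, and the parameters tied to a version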
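  7. Managing the training program in Git

        .
        └── dcgan
            ├── setup.py
            └── trainer
                ├── __init__.py
                ├── dcgan.py
                └── task.py

    • Place the training program under a directory named after the model, laid out as a package
    • If it has dependency packages, provide a setup.py
    • This example uses dcgan.py and main.py from face-generator: https://github.com/sugyan/face-generator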
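  8. Registering the training program as a job

        $ starchart train \
            -m dcgan \
            -M trainer.task \
            -- \
            --train_dir=TRAIN_PATH/model \
            --images_dir=TRAIN_PATH/images \
            --data_dir=gs://$BUCKET_NAME/data/dcgan

    Here -m takes MODEL_NAME, -M takes MODULE_NAME, and everything after -- is YOUR_TRAIN_PARAMS.

    • Runs packaging, upload to Cloud Storage, and job registration
    • `TRAIN_PATH` is interpreted as the name of the directory created on Cloud Storage for each job
    • Project ID, region, and credentials are conveniently supplied as environment variables via direnv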
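The `TRAIN_PATH` substitution described on slide 8 can be pictured with a short sketch. This is not StarChart's actual implementation: the helper name `resolve_train_path` is hypothetical, and the directory layout `gs://PROJECT_ID-ml/MODEL_NAME/TIMESTAMP` is an assumption modeled on the paths that appear in the model file on slide 18.

```python
def resolve_train_path(args, project_id, model_name, timestamp):
    """Replace the TRAIN_PATH placeholder with a per-job Cloud Storage directory.

    Hypothetical helper; the bucket layout is an assumption, not documented
    StarChart behavior.
    """
    job_dir = "gs://{}-ml/{}/{}".format(project_id, model_name, timestamp)
    return [arg.replace("TRAIN_PATH", job_dir) for arg in args]


args = [
    "--train_dir=TRAIN_PATH/model",
    "--images_dir=TRAIN_PATH/images",
    "--data_dir=gs://my-bucket/data/dcgan",  # no placeholder, left untouched
]
resolved = resolve_train_path(args, "project-123456", "dcgan", "20170125191521")
for arg in resolved:
    print(arg)
```

Under these assumptions each job writes into its own directory, so a failed run and a later successful run never overwrite each other's output.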
  9. Waiting for the job to run

        $ starchart state -m dcgan
        jobId: dcgan_20170125191521 (FAILED)

    • Check the job ID and status
    • Log display is not implemented yet
  10. Adapting the training program to Cloud ML

    Standard library file I/O:

        import os
        os.listdir()
        os.mkdir(path)
        os.path.exists()
        with open(filename, 'wb') as f:

    Equivalent calls that work on Cloud ML:

        from tensorflow.python.lib.io import file_io
        file_io.list_directory()
        file_io.create_dir(path)
        file_io.file_exists()
        with file_io.FileIO(filename, 'w') as f:

    • File I/O during job execution targets Cloud Storage
    • The tensorflow.python.lib.io.file_io package handles local paths and Cloud Storage paths (gs://) transparently
    • It is convenient to implement the program so that paths can be passed as command-line arguments
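The last point on slide 10, passing paths as command-line arguments, might look like the following sketch. The flag names follow the `starchart train` example; the default values and the `parse_args` helper are assumptions for illustration, and the parsed strings would then be handed to `file_io` unchanged whether they are local or `gs://` paths.

```python
import argparse


def parse_args(argv=None):
    # Flag names follow the starchart train example; the defaults are assumptions.
    parser = argparse.ArgumentParser(description="DCGAN trainer (sketch)")
    parser.add_argument("--train_dir", default="/tmp/dcgan/model")
    parser.add_argument("--images_dir", default="/tmp/dcgan/images")
    parser.add_argument("--data_dir", default="/tmp/dcgan/data")
    return parser.parse_args(argv)


# Local run: no flags, so the local defaults are used.
local = parse_args([])
# Cloud ML run: the job passes Cloud Storage paths instead.
cloud = parse_args(["--train_dir=gs://my-bucket/dcgan/model"])

print(local.train_dir)
print(cloud.train_dir)
```

Because the program never hard-codes a location, the same entry point can be exercised locally before it is registered as a job.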
  11. Registering the training program as a job

        $ starchart train \
            -m dcgan \
            -M trainer.task \
            -- \
            --train_dir=TRAIN_PATH/model \
            --images_dir=TRAIN_PATH/images \
            --data_dir=gs://$BUCKET_NAME/data/dcgan
  12. Waiting for the job to run

        $ starchart state -m dcgan
        jobId: dcgan_20170125194440 (SUCCEEDED)
        jobId: dcgan_20170125191521 (FAILED)

    • Check the job ID and status
  13. How the prediction service API works (pseudocode)

        import json

        import numpy as np
        import tensorflow as tf

        request_params = {'instances': [{'sample_inputs': np.zeros((1, 40)).tolist()}]}

        def feed_from_request(request, tensor_keys):
            feed = {}
            request_keys = request['instances'][0].keys()
            for key in request_keys:
                feed[tensor_keys[key]] = [instance[key] for instance in request['instances']]
            return feed

        with tf.Session() as sess:
            # Restore the MetaGraph of the training result.
            new_saver = tf.train.import_meta_graph('TRAIN_PATH/model/export.meta')
            new_saver.restore(sess, 'TRAIN_PATH/model/export')
            # Map the tensors in the 'inputs' collection to the API request parameters.
            tensor_keys = json.loads(tf.get_collection('inputs')[0])
            feed = feed_from_request(request_params, tensor_keys)
            # Run the tensors in the 'outputs' collection as the operation, passing the mapped feed.
            op = json.loads(tf.get_collection('outputs')[0])
            result = sess.run(op, feed_dict=feed)
            print(result)
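To make the mapping step on slide 13 concrete, here is `feed_from_request` run on its own, without TensorFlow. The tensor name `Placeholder:0` and the two-instance request are made-up values; the JSON string stands in for what `tf.get_collection('inputs')[0]` would return.

```python
import json

# Stand-in for tf.get_collection('inputs')[0]: a JSON string mapping request
# keys to tensor names. The tensor name is invented for this example.
inputs_collection_entry = json.dumps({"sample_inputs": "Placeholder:0"})


def feed_from_request(request, tensor_keys):
    """Group each request key's values across instances under its tensor name."""
    feed = {}
    for key in request["instances"][0].keys():
        feed[tensor_keys[key]] = [inst[key] for inst in request["instances"]]
    return feed


request = {
    "instances": [
        {"sample_inputs": [[0.0, 0.0]]},
        {"sample_inputs": [[1.0, 1.0]]},
    ]
}
tensor_keys = json.loads(inputs_collection_entry)
feed = feed_from_request(request, tensor_keys)
print(feed)
```

Each instance contributes one entry along the leading axis, which is why the input placeholder on the training side needs a batch dimension.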
  14. Adapting the training program to the API

        # Input
        sample_inputs = tf.placeholder(tf.float32, shape=(None, 1, dcgan.z_dim))
        tf.add_to_collection('inputs', json.dumps({'sample_inputs': sample_inputs.name}))
        # Output
        sample_outputs = dcgan.sample_image_vectors(1, 1, inputs=sample_inputs[0])
        tf.add_to_collection('outputs', json.dumps({'sample_outputs': sample_outputs.name}))

    • Add the input tensor to a collection. This specifies the parameter passed as feed_dict when the output tensor is run
    • Add the output tensor to a collection. This specifies the operation executed when the prediction service API is called
    • Using `tf.image.encode_jpeg` for the output tensor resulted in an Internal Server Error, so the output was changed to tf.reshape(tf.squeeze(image, [0]), [1, -1])
  15. Registering the training program as a job

        $ starchart train \
            -m dcgan \
            -M trainer.task \
            -- \
            --train_dir=TRAIN_PATH/model \
            --images_dir=TRAIN_PATH/images \
            --data_dir=gs://$BUCKET_NAME/data/dcgan
  16. Waiting for the job to run

        $ starchart state -m dcgan
        jobId: dcgan_20170125201233 (SUCCEEDED)
        jobId: dcgan_20170125194440 (SUCCEEDED)
        jobId: dcgan_20170125191521 (FAILED)

    • Check the job ID and status
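The job IDs listed by `starchart state` share their timestamp with the version names used later in the deck (job `dcgan_20170125194440`, version `v20170125194440`). The sketch below parses lines in that format and derives candidate version names. It is an illustration only: the regular expression, the helper names, and the "v plus timestamp" rule are assumptions inferred from those examples, and the state name `SUCCEEDED` is taken from the model file on slide 18.

```python
import re

# Assumed line format: "jobId: <model>_<14-digit timestamp> (<STATE>)"
STATE_LINE = re.compile(
    r"^jobId: (?P<model>.+)_(?P<timestamp>\d{14}) \((?P<state>[A-Z]+)\)$"
)


def parse_state_line(line):
    match = STATE_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized line: {!r}".format(line))
    return match.groupdict()


def version_name(job):
    # Assumption: a version is named "v" followed by the job timestamp.
    return "v" + job["timestamp"]


lines = [
    "jobId: dcgan_20170125201233 (SUCCEEDED)",
    "jobId: dcgan_20170125194440 (SUCCEEDED)",
    "jobId: dcgan_20170125191521 (FAILED)",
]
jobs = [parse_state_line(line) for line in lines]
usable = [version_name(job) for job in jobs if job["state"] == "SUCCEEDED"]
print(usable)
```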
  17. Managing the model file in Git

        .
        ├── dcgan
        │   ├── setup.py
        │   └── trainer
        │       ├── __init__.py
        │       ├── dcgan.py
        │       └── task.py
        └── dcgan.json

    The result of expose is saved to `<model name>.json`. Because this file is used to switch the default version, it is managed in Git as well.
  18. Managing the model file in Git

        {
          "model": "MODEL_NAME",
          "versions": [
            {
              "version": {
                "name": "projects/PROJECT_ID/models/MODEL_NAME/versions/v20170111170842",
                "deploymentUri": "gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/model",
                "createTime": "2017-01-11T09:12:54Z",
                "job": {
                  "jobId": "MODEL_NAME_20170111170842",
                  "trainingInput": {
                    "packageUris": [
                      "gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/packages/trainer-0.0.0.tar.gz"
                    ],
                    "pythonModule": "trainer.task",
                    "args": [
                      "--model_dir=gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/model",
                      "--train_dir=gs://PROJECT_ID-ml/MODEL_NAME/20170111170842/train"
                    ],
                    "region": "us-central1"
                  },
                  "createTime": "2017-01-11T08:08:49Z",
                  "startTime": "2017-01-11T08:13:55Z",
                  "endTime": "2017-01-11T08:40:55Z",
                  "state": "SUCCEEDED",
                  "trainingOutput": {
                    "consumedMLUnits": 0.45
                  }
                },
                "isDefault": true
              }
            }
          ]
        }

    The file shows the job and runtime parameters tied to each API version, and whether that version is the default.
  19. Using the prediction service API

        import numpy as np
        import tensorflow as tf
        from googleapiclient import discovery, errors
        from oauth2client.client import GoogleCredentials
        from tensorflow.python.lib.io import file_io

        project = 'project-123456'
        model = 'dcgan'
        version = 'v20170125194440'

        credentials = GoogleCredentials.get_application_default()
        ml = discovery.build('ml', 'v1beta1', credentials=credentials)
        body = {'instances': [{'sample_inputs': np.zeros((1, 40)).tolist()}]}
        request = ml.projects().predict(
            name='projects/{}/models/{}/versions/{}'.format(project, model, version),
            body=body)
        try:
            response = request.execute()
            output = response['predictions'][0]['sample_outputs']
            # The API returns a flat vector; reshape it to 96x96 RGB and encode as JPEG.
            with tf.Session() as sess:
                image = sess.run(tf.image.encode_jpeg(
                    tf.reshape(tf.constant(output, dtype=tf.uint8), [96, 96, 3])))
            with file_io.FileIO('out.jpg', 'w') as f:
                f.write(image)
        except errors.HttpError as err:
            print(err._get_reason())
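The client on slide 19 receives the image as one flat vector and reshapes it to `[96, 96, 3]`. The pure-Python sketch below shows the same row-major regrouping on a tiny 2x2 RGB example; the `unflatten` helper is hypothetical and only illustrates what the reshape does, it is not part of the original client.

```python
def unflatten(flat, height, width, channels):
    """Regroup a flat, row-major vector into rows of pixels of channel values."""
    expected = height * width * channels
    if len(flat) != expected:
        raise ValueError("expected {} values, got {}".format(expected, len(flat)))
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            start = (y * width + x) * channels
            row.append(flat[start:start + channels])
        rows.append(row)
    return rows


flat = list(range(2 * 2 * 3))
image = unflatten(flat, 2, 2, 3)
print(image)

# A 96x96 RGB image corresponds to a vector of 96 * 96 * 3 values.
full = unflatten([0] * (96 * 96 * 3), 96, 96, 3)
print(len(full), len(full[0]), len(full[0][0]))
```

The length check is the useful part in practice: a response whose vector is not exactly `height * width * channels` long cannot be a valid image of that size.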
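  20. Changing the model's default version

        $ starchart apply -m dcgan

    (Diagram) model: v20170125194440; Cloud ML & Storage: v20170125201233 (default)

    • Edit the model file and set `isDefault` to true on the version that should become the default
    • Open a pull request containing the training program and model file as of this point
    • If review confirms the version meets the criteria for a default version, merge and run apply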
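The model-file edit described on slide 20 could be scripted along the following lines. This is a sketch under assumptions, not part of StarChart: the helpers `default_version` and `set_default` are hypothetical, and the sample file is trimmed to the `name` and `isDefault` fields of the structure shown on slide 18.

```python
import copy
import json

# Trimmed model file; only the fields this sketch needs are included.
MODEL_FILE = """
{
  "model": "dcgan",
  "versions": [
    {"version": {"name": "projects/project-123456/models/dcgan/versions/v20170125194440",
                 "isDefault": false}},
    {"version": {"name": "projects/project-123456/models/dcgan/versions/v20170125201233",
                 "isDefault": true}}
  ]
}
"""


def default_version(model):
    """Return the short name (e.g. v2017...) of the version marked as default."""
    for entry in model["versions"]:
        if entry["version"].get("isDefault"):
            return entry["version"]["name"].rsplit("/", 1)[-1]
    return None


def set_default(model, version_id):
    """Return a copy of the model with exactly one version marked as default."""
    updated = copy.deepcopy(model)
    found = False
    for entry in updated["versions"]:
        version = entry["version"]
        is_target = version["name"].endswith("/versions/" + version_id)
        version["isDefault"] = is_target
        found = found or is_target
    if not found:
        raise KeyError(version_id)
    return updated


model = json.loads(MODEL_FILE)
print(default_version(model))
updated = set_default(model, "v20170125194440")
print(default_version(updated))
```

Writing the result back with `json.dump` would produce the diff that goes into the pull request, so reviewers see exactly which version is about to become the default.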