Managing Machine Learning Workflows with Argo Workflow

2019/06/27
Data Pipeline Casual Talk #3

Livesense Inc.

Transcript

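  1. What this talk covers
    Why we needed Argo Workflow
    • Livesense's services and ML systems
    • The realities of developing and operating ML systems
    • Splitting ML systems into components and containerizing them
    How we use Argo Workflow
    • Basic features of Argo Workflow
    • Running Argo Workflow at Livesense
    ※ This talk assumes basic knowledge of Kubernetes.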
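  2. Choosing languages and libraries to fit the purpose
    Compared with services such as advertising, job-listing services have higher unit prices and lower CVR
    → General-purpose ML libraries based on maximum likelihood estimation are sometimes a poor fit
    → We implement models and algorithms ourselves, choosing the language and library to suit each task
    • Recommendation algorithms implemented in Julia
      • Parameter estimation for Factorization Machines via Alternating Least Squares
      • Computing predicted ratings when applying Factorization Machines to recommendation
    • Stan used for estimation and prediction models
      • Hierarchical Bayesian estimation of proportions from small-sample data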
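For reference, the textbook second-order Factorization Machine model is written out below. The slide does not say which variant the Julia implementation uses, so treat this as the standard form only. Here x is a feature vector of length n, w_0 and w_i are the global bias and per-feature weights, and v_i is the k-dimensional latent factor vector of feature i. The second line is the usual rewriting that makes the pairwise term computable in O(kn) time, and the last line shows how a predicted rating falls out when x one-hot encodes a user u and an item m.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad
\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \Bigl( \sum_{i=1}^{n} v_{i,f} \, x_i \Bigr)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2} \, x_i^{2} \right]

% one-hot user u and item m (x_u = x_m = 1, every other feature 0):
\hat{y}(u, m) = w_0 + w_u + w_m + \langle \mathbf{v}_u, \mathbf{v}_m \rangle
```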
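  3. Splitting into components
    First, we split the system into single-purpose components like the following:
    • Each component can be run on its own from the CLI
    • All input and output between components goes through files

      name         role                          input file   output file
      sqlkit       DBIO                          SQL          CSV
      nlpkit       natural language processing   text         BoW vectors
      recommender  recommendation                ratings      recommendation scores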
  4. Containerizing the components
    Next, we packaged each component as its own container image:
    • Each container component can be run with docker run or kubectl run
    • Differences between systems are expressed almost entirely in config files and SQL

      # load dataset
      docker run -v "$(pwd)":/workdir sqlkit select ratings.sql /workdir/ratings.csv
      docker run -v "$(pwd)":/workdir sqlkit select content.sql /workdir/content.csv
      # preprocess
      docker run -v "$(pwd)":/workdir nlpkit vectorize /workdir/content.csv /workdir/features.csv
      # run recommender
      docker run -v "$(pwd)":/workdir recommender predict config.yaml /workdir
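  5. How do we implement the workflow?
    Splitting and containerizing the components solved several problems:
    • Removed tight coupling and made it possible to run steps individually
    • Made shared parts reusable
    • Made it easy to use different languages and libraries
    However, the question of how to build and manage complex workflows remains:
    • For simple batch jobs, you can just run docker run or kubectl run one after another
    • Some of our systems actually run in production this way
    • But what if we want advanced workflows with parallelism, retries, and so on?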
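  6. Argo Workflow
    "Container native workflow engine for Kubernetes"
    Runs workflows made up of multiple containers on Kubernetes.
    In a nutshell: "a feature-rich k8s Job"
    • Sequential, parallel, and DAG execution of multiple containers
    • Control flow such as branching, loops, and hooks
    • Retries, timeouts, and worker-node selection
    • Web UI for monitoring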
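  7. Characteristics of Argo Workflow
    Implemented as a CRD controller
    • The controller executes Workflow resources created with argo submit
    • Each step of a workflow runs as a Pod
    Focused solely on executing workflows; it has no triggering or scheduling features
    • In terms of usability it is closer to Luigi than to Airflow or Digdag
    • A separate tool, Argo Events, provides a variety of triggers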
  8. The simplest workflow: running a single container
      apiVersion: argoproj.io/v1alpha1
      kind: Workflow
      metadata:
        generateName: hello-world-
      spec:
        entrypoint: entrypoint   # the container template to run first
        templates:               # define one or more container templates
        - name: entrypoint
          container:
            image: alpine:latest
            command: ["echo", "hello world"]
    ※ The examples that follow show only the contents under spec.
  9. Passing parameters to a workflow
      entrypoint: entrypoint
      arguments:
        # values are passed at submit time, e.g. argo submit -p message=hello
        parameters:
        - name: message
      templates:
      - name: entrypoint
        container:
          image: alpine:latest
          command: ["echo", "{{workflow.parameters.message}}"]   # embed the parameter
  10. Passing parameters to a step
      entrypoint: entrypoint
      templates:
      - name: entrypoint
        inputs:   # passed in from steps, dag, etc. (covered below)
          parameters:
          - name: message
            default: hello   # used when the caller passes no value
        container:
          image: alpine:latest
          command: ["echo", "{{inputs.parameters.message}}"]   # embed the parameter
  11. steps: sequential and parallel execution of steps
      templates:
      - name: entrypoint
        steps:
        - - name: hello1
            template: echo   # the container template to use
            arguments: {parameters: [{name: "message", value: "hello1"}]}
        - - name: hello2a    # hello2a and hello2b run after hello1
            template: echo
            arguments: {parameters: [{name: "message", value: "hello2a"}]}
          - name: hello2b    # hello2a and hello2b run in parallel
            template: echo
            arguments: {parameters: [{name: "message", value: "hello2b"}]}
      - name: echo
        inputs: {parameters: [{name: "message"}]}
        container:
          image: alpine:latest
          command: ["echo", "{{inputs.parameters.message}}"]
  12. dag: DAG execution of tasks
      templates:
      - name: entrypoint
        dag:
          tasks:
          - name: A
            template: echo
            arguments: {parameters: [{name: message, value: A}]}
          - name: B
            dependencies: [A]      # the task this one depends on
            template: echo
            arguments: {parameters: [{name: message, value: B}]}
          - name: C
            dependencies: [A]
            template: echo
            arguments: {parameters: [{name: message, value: C}]}
          - name: D
            dependencies: [B, C]   # multiple dependencies
            template: echo
            arguments: {parameters: [{name: message, value: D}]}
      - name: echo                 # same echo template as in the steps example
        inputs: {parameters: [{name: message}]}
        container:
          image: alpine:latest
          command: ["echo", "{{inputs.parameters.message}}"]
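  13. artifact: passing files between steps
      templates:
      - name: entrypoint
        steps:
        - - {name: generate-artifact, template: generate-artifact}
        - - name: consume-artifact
            template: consume-artifact
            arguments:   # hand the previous step's output artifact to this step's input
              artifacts:
              - name: result
                from: "{{steps.generate-artifact.outputs.artifacts.result}}"
      - name: generate-artifact
        container:
          image: alpine:latest
          command: ["sh", "-c", "echo hello > /tmp/output.txt"]
        outputs:
          artifacts:
          - {name: "result", path: "/tmp/output.txt"}
      - name: consume-artifact
        container:
          image: alpine:latest
          command: ["sh", "-c", "cat /tmp/input.txt"]
        inputs:
          artifacts:
          - {name: "result", path: "/tmp/input.txt"}
    ※ Passing artifacts between steps requires an artifact repository (e.g. S3-compatible storage) configured for the workflow controller.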
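  14. when: branching a workflow
      templates:
      - name: entrypoint
        steps:
        - - name: flip-coin
            template: flip-coin
        # with when, branch on the result of the previous step
        - - name: heads
            template: heads
            when: "{{steps.flip-coin.outputs.result}} == heads"
          - name: tails
            template: tails
            when: "{{steps.flip-coin.outputs.result}} == tails"
      - name: flip-coin
        script:
          image: python:latest
          command: [python]
          source: "import random; print(random.choice(['heads', 'tails']))"
      - name: heads
        container:
          image: alpine:latest
          command: ["echo", "it was heads"]
      - name: tails
        container:
          image: alpine:latest
          command: ["echo", "it was tails"]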
  15. withItems, withParam: repeating steps
      templates:
      - name: entrypoint
        steps:
        # withItems: run the step in parallel, once per item passed
        - - name: each-item
            template: echo
            arguments: {parameters: [{name: "message", value: "{{item}}"}]}
            withItems: ["hello world", "goodbye world", "ok world"]
        # withParam: pass a JSON list such as ["hello world", "goodbye world"] instead
        # (here taken from a workflow parameter named params)
        - - name: each-param
            template: echo
            arguments: {parameters: [{name: "message", value: "{{item}}"}]}
            withParam: "{{workflow.parameters.params}}"
  16. exitHandler: handling workflow success and failure
      onExit: exit-handler   # template to run when the workflow finishes
      templates:
      - name: entrypoint
        container:
          image: alpine:latest
          command: ["sh", "-c", "exit 1"]   # always fails, to exercise the error branch
      - name: exit-handler
        steps:
        # branch on workflow.status
        - - name: success
            template: echo
            arguments: {parameters: [{name: "message", value: "SUCCESS"}]}
            when: "{{workflow.status}} == Succeeded"
          - name: failure
            template: echo
            arguments: {parameters: [{name: "message", value: "ERROR!"}]}
            when: "{{workflow.status}} != Succeeded"
      # (echo is the template defined in the steps example)
  17. Retries, timeouts, and more
      templates:
      - name: entrypoint
        retryStrategy:                  # number of retries, etc.
          limit: 2
        activeDeadlineSeconds: 28800    # timeout (same syntax as a Pod)
        nodeSelector:                   # node selection (same syntax as a Pod)
          cloud.google.com/gke-nodepool: highmem-pool
        container:                      # (image and command omitted)
          resources:                    # resource limits (same syntax as a Pod)
            limits:
              memory: "32Gi"
  18. Other features
    • Capping how many steps run in parallel
    • Passing data through volumes
    • Auxiliary containers (sidecars, daemons, ...)
    • Using external storage
    • etc. See the official examples for details: https://github.com/argoproj/argo/tree/master/examples
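Two of the features listed above, the parallelism cap and passing data through a volume, can be sketched in one Workflow. This is a minimal illustration, not a configuration from the talk: the workflow name, step names, and the 1Gi volume size are made up. `spec.parallelism` limits how many pods of the workflow run at once, and `volumeClaimTemplates` creates a PersistentVolumeClaim for the workflow that steps then mount by name.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallelism-volume-
spec:
  entrypoint: entrypoint
  parallelism: 2                # at most 2 pods of this workflow run at the same time
  volumeClaimTemplates:         # PVC created for this workflow; steps mount it by name
  - metadata:
      name: workdir
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  templates:
  - name: entrypoint
    steps:
    - - name: write             # writes a file into the shared volume
        template: write
    - - name: read              # a later step reads the same file
        template: read
    - - name: fan-out           # 4 parallel steps, throttled to 2 at a time
        template: echo
        arguments: {parameters: [{name: message, value: "{{item}}"}]}
        withItems: ["a", "b", "c", "d"]
  - name: write
    container:
      image: alpine:latest
      command: ["sh", "-c", "echo hello > /workdir/hello.txt"]
      volumeMounts:
      - {name: workdir, mountPath: /workdir}
  - name: read
    container:
      image: alpine:latest
      command: ["sh", "-c", "cat /workdir/hello.txt"]
      volumeMounts:
      - {name: workdir, mountPath: /workdir}
  - name: echo
    inputs: {parameters: [{name: message}]}
    container:
      image: alpine:latest
      command: ["echo", "{{inputs.parameters.message}}"]
```

The fan-out steps deliberately do not mount the volume: a ReadWriteOnce disk (such as a GKE persistent disk) can be attached read-write to only one node at a time, so parallel steps that share a volume either have to land on the same node or need ReadWriteMany storage.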
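  19. The execution platform for our ML systems
    We built a machine learning platform on GCP, centered on GKE
    • Multiple ML systems are consolidated onto a single GKE cluster
    • Not just batch jobs: web apps and other services run on the same cluster
    How we use Argo Workflow
    • Container components are built with GCB (Cloud Build) and registered in GCR (Container Registry)
    • Workflow definitions live in the same repository as our other manifests
    • Workflows that run on a schedule are launched with argo submit from a CronJob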
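A scheduled submission along these lines might look like the CronJob below. The slide shows no manifest, so everything here is an assumption: the `argoproj/argocli` image tag (pick the one matching your controller), the `argo-submitter` ServiceAccount, and a ConfigMap named `train-workflow` whose `train.yaml` key holds the Workflow definition. It also assumes the image's entrypoint is the `argo` binary and that the CLI uses the pod's in-cluster ServiceAccount credentials.

```yaml
apiVersion: batch/v1              # batch/v1beta1 on clusters older than Kubernetes 1.21
kind: CronJob
metadata:
  name: daily-train               # hypothetical name
spec:
  schedule: "0 3 * * *"           # every day at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: argo-submitter   # hypothetical; needs RBAC to create workflows
          restartPolicy: Never
          containers:
          - name: submit
            image: argoproj/argocli:v2.3.0     # assumed tag; entrypoint assumed to be argo
            args: ["submit", "/workflows/train.yaml"]
            volumeMounts:
            - {name: workflows, mountPath: /workflows}
          volumes:
          - name: workflows
            configMap:
              name: train-workflow             # hypothetical ConfigMap with a train.yaml key
```

The ServiceAccount needs RBAC permission to create `workflows` in the `argoproj.io` API group. Because `argo submit` returns as soon as the Workflow is created, the CronJob's own success says nothing about whether the workflow succeeds; failures are better reported from the workflow itself, for example with an exit handler.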
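  20. Operating guidelines
    Define batch jobs as Workflows by default
    • Implement systems so they can be test-run locally with nothing more than docker run
    • Start by putting them into production as single-step Workflows
    Break them into components while operating them, and slim down the main program
    • Gradually extract shared processing such as DBIO and notifications
    • Implement parallelism, retries, and the like with Workflow features wherever possible
    Below, we introduce case studies and operational know-how.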
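  21. CASE: combining components
    • The recommendation engine is especially far along in componentization
    • SQL and config files are bundled into one container and unpacked first
      templates:
      - name: entrypoint
        steps:
        - - name: load-config
        - - name: sqlkit
            withItems:
            - sqlfile: /workspace/sql/ratings.sql
            - sqlfile: /workspace/sql/contents.sql
        - - name: nlpkit
        - - name: recommender
      # (template: fields and the step templates are omitted)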
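The slide omits the templates. One possible way to consume the map items is sketched below as a fragment under spec, like the other examples. It is hypothetical throughout: the `output` key added to each item, the `sqlkit` image name (reused from the docker run example), its `select <sql> <csv>` arguments, and a volume named `workspace` are all assumptions. Each key of a map item is available in the step as `{{item.<key>}}`.

```yaml
templates:
- name: entrypoint
  steps:
  # load-config (not shown) runs first and unpacks SQL/config files into /workspace
  - - name: sqlkit
      template: sqlkit
      arguments:
        parameters:
        - {name: sqlfile, value: "{{item.sqlfile}}"}   # keys of a map item: item.<key>
        - {name: output, value: "{{item.output}}"}     # hypothetical extra key
      withItems:                  # one sqlkit pod per query, run in parallel
      - {sqlfile: /workspace/sql/ratings.sql, output: /workspace/data/ratings.csv}
      - {sqlfile: /workspace/sql/contents.sql, output: /workspace/data/contents.csv}
  # nlpkit and recommender steps (not shown) follow
- name: sqlkit
  inputs:
    parameters:
    - name: sqlfile
    - name: output
  container:
    image: sqlkit                 # hypothetical image whose entrypoint is the sqlkit CLI
    args: ["select", "{{inputs.parameters.sqlfile}}", "{{inputs.parameters.output}}"]
    volumeMounts:
    - {name: workspace, mountPath: /workspace}   # assumes a volume named workspace in spec
```

Because the two sqlkit pods run in parallel and share the volume, the storage choice matters: with a ReadWriteOnce volume they must be scheduled onto the same node.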
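  22. CASE: slimming down the main batch job
    • For estimation and prediction models, DBIO, notifications, and similar concerns are split out so the main batch job stays slim
    • This makes it easier to divide work between ML engineers and ML platform engineers
      onExit: exit-handler
      templates:
      - name: entrypoint
        steps:
        - - name: train-predict   # implemented by ML engineers (outputs CSV)
        - - name: import-to-db    # implemented by ML platform engineers
      - name: exit-handler        # implemented by ML platform engineers
        steps:
        - - name: notify-error
            when: "{{workflow.status}} != Succeeded"
      # (template: fields and the step templates are omitted)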
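The notify-error template is not shown on the slide. A minimal sketch, assuming notifications go to a Slack incoming webhook whose URL is stored in a hypothetical Secret named `slack-webhook` (key `url`), could look like this; the message text is made up and `curlimages/curl` is a public image that ships curl and sh. Argo substitutes `{{workflow.name}}` and `{{workflow.status}}` before the container starts.

```yaml
templates:
- name: exit-handler
  steps:
  - - name: notify-error
      template: notify-error
      when: "{{workflow.status}} != Succeeded"
- name: notify-error
  container:
    image: curlimages/curl:latest
    command: ["sh", "-c"]
    args:
    # folded into one shell command: POST a JSON message to the webhook
    - >-
      curl -sS -X POST -H 'Content-Type: application/json'
      --data "{\"text\": \"Workflow {{workflow.name}} finished with status {{workflow.status}}\"}"
      "$SLACK_WEBHOOK_URL"
    env:
    - name: SLACK_WEBHOOK_URL
      valueFrom:
        secretKeyRef: {name: slack-webhook, key: url}   # hypothetical Secret
```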
  23. CASE: simple CD for ML models
    • The results viewer for estimation and prediction models runs as a Deployment
    • When estimation finishes, kubectl set env makes the viewer load the new model
    • Rolling updates also make it possible to update the model with no downtime
      templates:
      - name: entrypoint
        steps:
        - - name: train-predict
        - - name: import-to-db
        - - name: update-viewer
            template: update-viewer
      - name: update-viewer
        container:
          image: kubectl           # an image containing the kubectl binary
          command: ["sh", "-c"]
          args: ["kubectl set env deployment/viewer-app MODEL={{workflow.parameters.model}}"]
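For kubectl set env to work from a workflow step, the step's ServiceAccount needs permission to patch the Deployment, and the Deployment's rollout settings decide whether the update is really downtime-free. A sketch of both under assumptions: the viewer image, its /healthz endpoint on port 8080, the replica count, and the `workflow-runner` ServiceAccount (which the Workflow would reference via `spec.serviceAccountName`) are all hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: viewer-app
spec:
  replicas: 2
  selector:
    matchLabels: {app: viewer-app}
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # never remove an old pod before a new one is ready
      maxSurge: 1
  template:
    metadata:
      labels: {app: viewer-app}
    spec:
      containers:
      - name: viewer
        image: viewer-app:latest                 # hypothetical image that loads the model named in $MODEL
        env:
        - {name: MODEL, value: "initial-model"}  # replaced by kubectl set env, which triggers a rollout
        readinessProbe:
          httpGet: {path: /healthz, port: 8080}  # hypothetical endpoint; ready once the model is loaded
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: viewer-updater
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  resourceNames: ["viewer-app"]
  verbs: ["get", "patch"]        # kubectl set env reads the Deployment, then patches it
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: viewer-updater
subjects:
- kind: ServiceAccount
  name: workflow-runner          # hypothetical ServiceAccount used by the Workflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: viewer-updater
```

The same ServiceAccount also needs whatever pod permissions Argo's executor requires in your installation.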
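  24. CASE: handling heavy, unstable ML jobs
    • Stan is used heavily, e.g. in estimation and prediction models
    • Jobs that consume a lot of memory or CPU run on dedicated nodes
    • Sampling fails nondeterministically, so retries and timeouts are required
      - name: train-predict
        activeDeadlineSeconds: 28800   # 8h
        retryStrategy:
          limit: 2
        nodeSelector:
          cloud.google.com/gke-nodepool: highmem-pool
        container:                     # (image and command omitted)
          resources:
            limits:
              memory: "32Gi"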
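A nodeSelector steers this pod onto highmem-pool but does not keep other pods off those nodes. If the pool is meant to be dedicated, one common approach is to taint it (for example with the `--node-taints=dedicated=highmem:NoSchedule` flag of `gcloud container node-pools create`) and add a matching toleration to the template. The taint key and value, the image name, and the CPU request below are assumptions.

```yaml
- name: train-predict
  activeDeadlineSeconds: 28800     # 8h
  retryStrategy:
    limit: 2
  nodeSelector:
    cloud.google.com/gke-nodepool: highmem-pool
  tolerations:                     # lets this pod schedule onto the tainted pool
  - key: dedicated
    operator: Equal
    value: highmem
    effect: NoSchedule
  container:
    image: stan-model:latest       # hypothetical image
    resources:
      requests:
        cpu: "4"                   # hypothetical CPU request
        memory: "32Gi"
      limits:
        memory: "32Gi"
```

Setting requests explicitly lets the scheduler account for the job's size; when only a memory limit is given, Kubernetes uses that limit as the request as well.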
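  25. CASE: dynamic parallel execution of model estimation
    • Our bandit tool needs an estimation job for each test currently running
    • The estimation jobs for the individual tests are run in parallel, determined dynamically
      templates:
      - name: entrypoint
        steps:
        # list the tests that need estimation
        - - name: list-experiments
        # read the list of parameters from the previous step's output
        # and run the following step once per list element
        - - name: calc-weights
            withParam: "{{steps.list-experiments.outputs.parameters.experiments}}"
            arguments:
              parameters: [{name: experimentId, value: "{{item.experimentId}}"}]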
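The slide omits the two templates. A hypothetical sketch of how list-experiments could produce the `experiments` output parameter is below: the script writes a JSON list to a file, and `valueFrom.path` turns that file's contents into the parameter value that `withParam` expands. The hard-coded experiment IDs, the image names, and the estimator's command-line arguments are placeholders, not the talk's implementation.

```yaml
templates:
- name: list-experiments
  script:
    image: python:3.7-slim
    command: [python]
    source: |
      import json
      # Placeholder data: a real job would query the bandit tool for running tests.
      experiments = [{"experimentId": "exp-001"}, {"experimentId": "exp-002"}]
      with open("/tmp/experiments.json", "w") as f:
          json.dump(experiments, f)
  outputs:
    parameters:
    - name: experiments
      valueFrom:
        path: /tmp/experiments.json   # file contents become the parameter value
- name: calc-weights
  inputs:
    parameters:
    - name: experimentId
  container:
    image: bandit-estimator:latest    # hypothetical image
    args: ["estimate", "--experiment-id", "{{inputs.parameters.experimentId}}"]   # hypothetical CLI
```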
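  26. Operational tips
    Accessing the Argo Workflow Web UI
    • By default, you have to reach it through kubectl port-forward
    • To make it reachable over the internet, expose it through a load balancer with an Ingress
    • With GCP's Identity-Aware Proxy, authentication can be enforced at the load balancer
    Cleaning up old workflows
    • Completed Workflows and the Pods they manage stay in the cluster in the Succeeded state
    • We run a CronJob that periodically deletes old Workflows
    • The argo delete --older option is handy for this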
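A cleanup CronJob in the spirit of the tip above might look like this. The schedule, the seven-day retention, the image tag, and the `argo-cleaner` name are assumptions, and as with a submit CronJob it relies on the `argoproj/argocli` image's entrypoint being `argo`. Deleting a Workflow also removes its pods, because Argo makes the Workflow their owner.

```yaml
apiVersion: batch/v1                 # batch/v1beta1 on clusters older than Kubernetes 1.21
kind: CronJob
metadata:
  name: argo-cleanup
spec:
  schedule: "0 0 * * *"              # daily at midnight
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: argo-cleaner      # hypothetical; bound to the Role below
          restartPolicy: Never
          containers:
          - name: cleanup
            image: argoproj/argocli:v2.3.0      # assumed tag; match your Argo version
            args: ["delete", "--older", "7d"]   # delete completed workflows older than 7 days
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-cleaner
rules:
- apiGroups: ["argoproj.io"]
  resources: ["workflows"]
  verbs: ["get", "list", "delete"]
```

A RoleBinding from the `argo-cleaner` ServiceAccount to this Role is also required.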
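  27. Argo Workflow - Cons
    Not as programmable as other workflow engines
    • You can't write workflows in a Python DSL as you can with Airflow or Luigi
    • There are no dedicated operators for individual cloud services
    No features beyond immediately running the workflows you create
    • Scheduled execution requires something like a CronJob
    • The Web UI is for monitoring only; you can't retry or perform other operations from it
    • Workflows themselves are hard to template and reuse
    • WorkflowTemplate has been proposed, so we're looking forward to it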
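  28. Summary
    Why we needed Argo Workflow
    • Our ML systems are used by multiple services
    • We have several batch jobs made up of many steps
    • To make development and operations more efficient, we split them into components and containerized them
    How we use Argo Workflow
    • We build workflows by combining container components
    • We use its various features to fit the practical needs of developing and operating ML systems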