RPC Library for Distributed Processing Used for Machine Learning

Machine Learning Job with Kubernetes

- Machine learning jobs run on multiple nodes that communicate with each other.
- CPU nodes fetch data from storage.
- After preprocessing, the CPU nodes send the data to the GPU node.
- Pods can get access credentials for the storage from a Secret.
- Resource limits (CPU, memory, GPU) can be set.
- Logs are collected and read from each Pod.

[Diagram labels: CPU Node, GPU Node, Pod, Train, data storage.]

    apiVersion: batch/v1
    kind: Job
    metadata:
      generateName: hello-
    spec:
      backoffLimit: 0
      ttlSecondsAfterFinished: 10
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: job
            image: bash
            args:
            - echo
            - Hello, world!
            resources:
              limits:
                cpu: 200m
                memory: 200Mi

Notes on the manifest:

- generateName: add suffixes to make it re-runnable.
- backoffLimit: 0: set for all jobs to avoid retry.
- ttlSecondsAfterFinished: the Job is cleaned up this many seconds after it has finished.
- image, args: image name and command arguments.
- resources.limits: CPU and memory usage can be limited.

Create the Job and follow its logs:

    kubectl create -f example.yaml
    kubectl logs -f -l job-name=hello-xxxxx
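As a rough illustration of the same manifest built programmatically, here is a minimal sketch that assembles it as a Python dictionary and serializes it to JSON, which kubectl create -f also accepts. The function name hello_job and its parameters are hypothetical and are not part of any library mentioned in the talk.

```python
import json


def hello_job(prefix="hello-", cpu="200m", memory="200Mi"):
    """Return a Job manifest equivalent to the YAML example (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        # generateName lets the API server append a random suffix,
        # so the same manifest can be created repeatedly.
        "metadata": {"generateName": prefix},
        "spec": {
            "backoffLimit": 0,              # do not retry failed pods
            "ttlSecondsAfterFinished": 10,  # clean up shortly after finishing
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "job",
                            "image": "bash",
                            "args": ["echo", "Hello, world!"],
                            "resources": {
                                "limits": {"cpu": cpu, "memory": memory}
                            },
                        }
                    ],
                }
            },
        },
    }


manifest_json = json.dumps(hello_job(), indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and passed to kubectl create -f in place of example.yaml.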

A Job template for GPU work, with placeholders that envsubst fills in:

    apiVersion: batch/v1
    kind: Job
    metadata:
      generateName: ${job_prefix}-
    spec:
      backoffLimit: 0
      template:
        spec:
          restartPolicy: Never
          nodeSelector:
            example/gpu: "v100"
          tolerations:
          - key: nvidia.com/gpu
            effect: NoSchedule
            operator: Exists
          volumes:
          - name: dshm
            emptyDir:
              medium: Memory
          containers:
          - name: job
            image: ${IMAGE}
            args: [bash, -c, "${args}"]
            env:
            - name: ACCESS_ID
              valueFrom:
                secretKeyRef:
                  name: my-storage
                  key: access_id
            - name: SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: my-storage
                  key: secret_key
            resources:
              limits:
                nvidia.com/gpu: 1
                cpu: 30
                memory: 100Gi
            volumeMounts:
            - mountPath: /dev/shm
              name: dshm

The launcher script:

    set -eux
    export template_yaml=$1
    export job_prefix=$2
    echo IMAGE: $IMAGE
    if [ "$#" -gt 3 ]; then
      shift 2
      export args="$@"
    fi
    basedir=$(dirname "$0")
    jobyaml=$job_prefix.yaml
    tmp_job=$(mktemp)
    envsubst < $basedir/$template_yaml > $jobyaml
    cat $jobyaml
    trap 'kubectl delete job $job_name; rm -f $tmp_job' EXIT
    kubectl create -f $jobyaml -o json > $tmp_job
    namespace=$(jq -r .metadata.namespace $tmp_job)
    job_name=$(jq -r .metadata.name $tmp_job)
    for ((k = 0; k < 20; ++k)); do
      # wait until pod is ready
      if kubectl wait --for=condition=ready pod -l job-name=$job_name --timeout=10s -n ${namespace}; then
        kubectl logs -f -n ${namespace} -l job-name=$job_name
        break
      else
        # check init status
        initStatus=$(kubectl get po -o go-template='{{range .items}}{{range .status.containerStatuses}}\
    {{if .state.terminated}}{{"Exit"}}{{end}}{{end}}{{"\n"}}{{end}}' -n ${namespace} -l job-name=${job_name})
        if [ X"${initStatus}" != X ]; then
          break
        fi
      fi
    done
    errorStatus=$(kubectl get po -o go-template='{{range .items}}{{range .status.containerStatuses}}\
    {{.state.terminated.exitCode}}{{end}}{{end}}' -n ${namespace} -l job-name=${job_name} | sed 's/0\.//g')
    if [ -n "${errorStatus}" ]; then
      echo "${errorStatus}"
      exit 1
    fi
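The placeholder substitution that envsubst performs in the script can be sketched in Python with string.Template, which uses the same ${NAME} syntax. This is only an illustration of the templating step: the shortened template, the render_job helper, and the example values are assumptions, not taken from the talk.

```python
from string import Template

# A shortened version of the Job template, kept only to show the placeholders.
JOB_TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  generateName: ${job_prefix}-
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: job
        image: ${IMAGE}
        args: [bash, -c, "${args}"]
"""


def render_job(template_text, variables):
    """Fill ${...} placeholders (hypothetical helper).

    Unlike envsubst, which replaces an unset variable with an empty string,
    substitute() raises KeyError, so a missing value is caught early.
    """
    return Template(template_text).substitute(variables)


rendered = render_job(
    JOB_TEMPLATE,
    {
        "job_prefix": "train",             # example value
        "IMAGE": "example.com/trainer",    # hypothetical image name
        "args": "python train.py",         # example command
    },
)
print(rendered)
```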

Notes on the template and the script:

- An image and args can be replaced: image: ${IMAGE} and args: [bash, -c, "${args}"].
- Common configurations for GPU: nodeSelector, tolerations, the /dev/shm volume, and the nvidia.com/gpu limit.
- For object storage: ACCESS_ID and SECRET_KEY are read from the my-storage Secret.
- "kubectl logs" fails if it is called before the pod has started.
- "kubectl wait" waits until the pod is running. If the pod fails immediately after it is created, "wait" waits forever, so the script uses a timeout and checks for terminated containers.

Pod states and the two commands:

- Preparing -> Running -> Finished: "wait" returns once the pod is running, then "logs" follows it.
- Preparing -> Failed: "wait" waits forever.
- "logs" called while the pod is still Preparing: "logs" fails.
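The decision the script makes on each loop iteration can be sketched as a small pure function over a pod's status, using the field names from the Kubernetes Pod status JSON (phase, containerStatuses, state.running, state.terminated). The function name next_action and its return values are hypothetical and serve only to illustrate the wait/logs race described above.

```python
def next_action(pod_status):
    """Decide what a launcher should do next for one pod (hypothetical helper).

    pod_status is the pod's .status object as a dict.
    Returns "logs" when logs can be followed, "stop" when the pod has
    already ended (waiting longer would never succeed), else "wait".
    """
    states = [s.get("state", {}) for s in pod_status.get("containerStatuses", [])]

    # A terminated container or a final phase means "wait" would hang.
    if any("terminated" in state for state in states):
        return "stop"
    if pod_status.get("phase") in ("Succeeded", "Failed"):
        return "stop"

    # Logs are only available once every container is running.
    if states and all("running" in state for state in states):
        return "logs"

    # Still preparing: calling "logs" now would fail.
    return "wait"


print(next_action({"phase": "Pending"}))
print(next_action({"phase": "Running", "containerStatuses": [{"state": {"running": {}}}]}))
```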

Depending on the kubectl version, the go-template prints the exitCode as 0 or 0.0, so the script post-processes the value with sed before testing it.
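One way to make the exit-code check independent of the 0 versus 0.0 formatting is to parse the value as a number before comparing. The sketch below assumes a single exit code per call; the helper name exit_code_failed is hypothetical.

```python
def exit_code_failed(raw):
    """Interpret an exit code printed as "0", "0.0", "1", "137.0", ...

    Returns True for a non-zero code, False for zero, and None when the
    text is not a number (for example when no container has terminated).
    """
    try:
        return int(float(raw.strip())) != 0
    except ValueError:
        return None


for text in ("0", "0.0", "1", "137.0", "<no value>"):
    print(repr(text), exit_code_failed(text))
```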

swimmy.cmd

    python -m swimmy.cmd --name example-predict --image example.com/predictor \
      --cpu 4000m --mem 4Gi \
      -e KEY_SECRET -e STAGE=dev \
      --template object-storage.yaml --template gpu-v100.yaml \
      python run-predict.py --all

- No need to deploy anything beforehand
- Executes a command on the image
- Prints pod logs and events
- Resource limits (--cpu, --mem)
- Environment variables (-e)
- Pod template system (--template)

object-storage.yaml:

    containers:
    - env:
      - name: ACCESS_ID
        valueFrom:
          secretKeyRef:
            name: my-storage
            key: access_id
      - name: SECRET_KEY
        valueFrom:
          secretKeyRef:
            name: my-storage
            key: secret_key

gpu-v100.yaml:

    nodeSelector:
      example/gpu: "v100"
    tolerations:
    - key: nvidia.com/gpu
      effect: NoSchedule
      operator: Exists
    containers:
    - resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
      - mountPath: /dev/shm
        name: dshm
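The talk does not show how Swimmy combines several pod templates, so the following is only a sketch of one plausible merge strategy, under stated assumptions: dictionaries are merged key by key, the containers list is merged position by position so that a template can add fields to the first container, and every other list is concatenated. The merge function and these rules are hypothetical.

```python
import copy


def merge(base, overlay, key=None):
    """Merge a pod-template overlay into a base spec (hypothetical rules)."""
    if isinstance(base, dict) and isinstance(overlay, dict):
        out = copy.deepcopy(base)
        for k, v in overlay.items():
            out[k] = merge(base[k], v, k) if k in base else copy.deepcopy(v)
        return out
    if isinstance(base, list) and isinstance(overlay, list):
        if key == "containers":
            # Assumption: containers are matched by position.
            merged = [merge(b, o) for b, o in zip(base, overlay)]
            tail = base[len(merged):] + overlay[len(merged):]
            return merged + copy.deepcopy(tail)
        # Assumption: other lists (env, tolerations, ...) are appended.
        return copy.deepcopy(base) + copy.deepcopy(overlay)
    # Scalars: the overlay value wins.
    return copy.deepcopy(overlay)


base = {
    "restartPolicy": "Never",
    "containers": [
        {
            "name": "job",
            "image": "example.com/predictor",
            "resources": {"limits": {"cpu": "4000m", "memory": "4Gi"}},
        }
    ],
}
object_storage = {
    "containers": [
        {
            "env": [
                {
                    "name": "ACCESS_ID",
                    "valueFrom": {
                        "secretKeyRef": {"name": "my-storage", "key": "access_id"}
                    },
                }
            ]
        }
    ]
}
gpu_v100 = {
    "nodeSelector": {"example/gpu": "v100"},
    "containers": [
        {
            "resources": {"limits": {"nvidia.com/gpu": 1}},
            "volumeMounts": [{"mountPath": "/dev/shm", "name": "dshm"}],
        }
    ],
}

pod = base
for template in (object_storage, gpu_v100):
    pod = merge(pod, template)
print(pod)
```

With these rules, both templates contribute to the single job container while the CPU and memory limits from the command line are preserved.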

The --template option also accepts a bare name and a ConfigMap reference:

    --template object-storage --template configmap:mycm/gpu-v100.yaml

Template location priority:

1. prefix with "configmap:"
2. local file
3. read as a ConfigMap: swimmy templates, {template} yaml (punctuation lost in the source)
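The lookup order can be sketched as a small resolver. This is an assumption-laden illustration, not Swimmy's code: the function name resolve_template, the returned tuples, and the idea of also trying the name with a .yaml suffix are all hypothetical, and the name of the fallback ConfigMap is left out because the source does not preserve it.

```python
import os


def resolve_template(name, local_dir="."):
    """Resolve a --template value by priority (hypothetical helper).

    1. An explicit "configmap:<configmap>/<key>" reference.
    2. A local file, tried as given and with ".yaml" appended (assumption).
    3. Otherwise, fall back to a default ConfigMap lookup.
    """
    prefix = "configmap:"
    if name.startswith(prefix):
        configmap, _, key = name[len(prefix):].partition("/")
        return ("configmap", configmap, key)

    for candidate in (name, name + ".yaml"):
        path = os.path.join(local_dir, candidate)
        if os.path.isfile(path):
            return ("file", path)

    return ("default-configmap", name)


print(resolve_template("configmap:mycm/gpu-v100.yaml"))
```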

Further options:

    python -m swimmy.cmd --name example-predict --image example.com/predictor \
      --cpu 4000m --mem 4Gi \
      -e KEY_SECRET -e STAGE=dev \
      --template object-storage.yaml --template gpu-v100.yaml \
      --files run-predict.py \
      --connect-timeout 120 --ttl-seconds 3600 \
      python run-predict.py --all

- Send any files (--files)
- TTL and startup time limit (--ttl-seconds, --connect-timeout)

The same version of Python is required in the image.

Hello, Swimmy!

    import asyncio
    import logging

    import swimmy

    import foo


    @swimmy.remotefn
    def hello(name: str) -> str:
        return foo.bang(f"Hello, {name}")


    async def main():
        async with swimmy.KubeCluster(name="swimmy-test") as cluster:
            x = await cluster.run(
                swimmy.PodSpec(
                    image="example.com/py37",
                    cpu="200m",
                    mem="200Mi",
                    count=2,
                    pyfiles=["foo.py"],
                )
            )
            print(await x.remote(hello("Swimmy")))


    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())
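In the example, hello("Swimmy") is passed to x.remote(...) rather than executed locally, which suggests that the decorator turns a call into a description of the call. The talk does not show how swimmy.remotefn is implemented; the sketch below is a generic, hypothetical illustration of that pattern using only the standard library, with a name-based registry standing in for whatever mechanism Swimmy actually uses.

```python
import functools
import pickle

# Hypothetical registry mapping function names to the real functions.
REGISTRY = {}


def remotefn(func):
    """Make calls return a (name, args, kwargs) description instead of running."""
    REGISTRY[func.__name__] = func

    @functools.wraps(func)
    def describe(*args, **kwargs):
        return (func.__name__, args, kwargs)

    return describe


def serialize(call):
    """Turn a call description into bytes that could be sent to a worker."""
    return pickle.dumps(call)


def execute(payload):
    """What a worker might do: decode the description and run the function."""
    name, args, kwargs = pickle.loads(payload)
    return REGISTRY[name](*args, **kwargs)


@remotefn
def hello(name):
    return f"Hello, {name}"


call = hello("Swimmy")             # nothing runs yet
print(call)
print(execute(serialize(call)))    # runs the registered function
```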

Internals

The slides step through what happens when the driver script above runs. The diagram shows the Driver, the Kubernetes API Server, the Job, its Pod with the Agent, and PyPI.

1. Create Job (Driver to Kubernetes API Server)
2. Job and Pod
3. Fetch Swimmy (Agent, PyPI)
4. Keep minimal dependencies (the slide shows Hadoop, Kafka, DataFrame, Redis, NumPy)
5. Open ZMQ connection
6. HTTP for files
7. Health check
8. Extend activeDeadlineSeconds
9. Read pod logs and events
10. RPC

Why ZMQ?

- Fast: high-speed asynchronous I/O
- Small: tiny single wheel
- Flexible: fine-grained flow control

The Role shown alongside the Create Job, Extend activeDeadlineSeconds, and Read pod logs and events steps:

    kind: Role
    rules:
    - apiGroups:
      - ""
      resources:
      - configmaps
      - pods
      - pods/status
      - pods/log
      - events
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - jobs
      verbs:
      - create
      - patch
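The "Extend activeDeadlineSeconds" step relies on the patch verb on jobs. A Job's activeDeadlineSeconds is measured from the Job's start time, so extending it means writing a value larger than the time already elapsed. The talk does not show how Swimmy computes the value; the sketch below is one hypothetical way to build such a patch body, with deadline_patch and grace_seconds as assumed names.

```python
import json


def deadline_patch(job_start, now, grace_seconds):
    """Build a patch body that keeps the Job alive for grace_seconds more.

    job_start and now are Unix timestamps in seconds. If the driver stops
    sending such patches, the Job ends once the last deadline passes.
    """
    elapsed = int(now - job_start)
    return {"spec": {"activeDeadlineSeconds": elapsed + grace_seconds}}


# Example: the Job started 90.5 seconds ago and gets another 60 seconds.
patch = deadline_patch(job_start=1000.0, now=1090.5, grace_seconds=60)
print(json.dumps(patch))
```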

What is it used for?

- Distributed Machine Learning Library
- Redis Monitoring
- Scalable Load Testing

Ghee

    async def mpi_train(train, preprocess):
        async with swimmy.KubeCluster(name="mpi-train") as cluster:
            cp = await cluster.run(preprocess.podspec, name='preprocess')
            ct = await cluster.run(train.podspec, name='train')
            conn = connection_manager(cp, ct)
            fp = cp.remote(mpi_preprocess(preprocess.func))
            ft = ct.remote(mpi_fit(train.func))
            tasks = [asyncio.ensure_future(t) for t in [conn, fp, ft]]
            await asyncio.gather(*tasks)

[Diagram labels: CPU Node, GPU Node, Pod, Train, HDFS, PyArrow, MPI, ZMQ.]
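The last two lines of mpi_train use a standard asyncio pattern: wrap each awaitable in a task so they all start, then wait for all of them together. The self-contained sketch below shows just that pattern with placeholder coroutines; the stage function and its delays are stand-ins for the connection manager, preprocessing, and training, not part of Ghee or Swimmy.

```python
import asyncio


async def stage(name, delay, finished):
    """Placeholder for a long-running remote call."""
    await asyncio.sleep(delay)
    finished.append(name)  # records the order in which stages complete
    return name


async def run_all():
    finished = []
    awaitables = [
        stage("conn", 0.15, finished),
        stage("preprocess", 0.05, finished),
        stage("train", 0.10, finished),
    ]
    # ensure_future schedules every awaitable immediately, so they overlap.
    tasks = [asyncio.ensure_future(a) for a in awaitables]
    # gather returns results in the order the tasks were passed in,
    # regardless of which one finished first.
    results = await asyncio.gather(*tasks)
    return results, finished


results, finished = asyncio.run(run_all())
print(results)
print(finished)
```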

Redis Monitoring

[Diagram labels: six Redis nodes, Swimmy Driver, Client, three Agents.]

- Scanning 100M keys from the Redis cluster took more than an hour.
- 10x performance.

Load Testing

[Diagram labels: Script, Swimmy Driver, wrk, Python, Target servers.]

- Python: scenario-based testing and validation
- wrk: load testing with Lua

Thank you!

- Python library for Kubernetes Job
- Send function and files
- Minimal dependencies
- Pod template
- Health check


More Decks by LINE DEVDAY 2021
