
Kubeflow: Tus herramientas de ML en Kubernetes


ML tools are increasingly present and help us solve more and more problems in our day-to-day work. That is why we need to be able to build infrastructures that simplify the workflow of data scientists and enthusiasts of these technologies.

In this talk we will see how Kubeflow lets us run and deploy several of these tools quickly and easily on a Kubernetes cluster. This way we can take advantage of the portability and hardware diversity that Kubernetes provides, and use it to train and serve our models.

Laura Morillo-Velarde

November 24, 2018

Transcript

  1. Laura Morillo-Velarde • Tech Lead at seedtag • GDG Madrid

    Organizer / WTM Lead • GDE Google Cloud • 11 years working with different technologies • Twitter: @Laura_Morillo
  2. Kubernetes cluster — diagram of Node 1, Node 2 and Node 3
    (each a VM or physical machine).
  3. Kubeflow — a project to make deployment of machine learning on
    Kubernetes simple, portable and scalable. Based on ksonnet.
  4. TensorFlow — a machine learning framework, primarily conceived for deep
    neural network models. It provides several high-level APIs that make
    building complex, powerful estimators fairly accessible to developers.
    It also packs a variety of tools for serving, logging and tracking the
    performance of models.
  5. [training] parameter servers and workers — diagram contrasting the two
    roles: workers and parameter servers (slide body was placeholder text).
  6. [training] cluster spec and servers

    cluster_spec = tf.train.ClusterSpec({
        "ps": [
            "127.0.0.1:2221",  # /job:ps/task:0
        ],
        "worker": [
            "127.0.0.1:2223",  # /job:worker/task:0
            "127.0.0.1:2224",  # /job:worker/task:1
        ],
    })

    The cluster specification dictionary maps job names to lists of network
    addresses. A ClusterSpec represents the set of processes that participate
    in a distributed TensorFlow computation.
  7. [training] cluster spec and servers

    server = tf.train.Server(
        cluster_spec,
        job_name="worker",
        task_index=0,
    )

    A Server object contains a set of local devices, a set of connections to
    other tasks in its ClusterSpec, and a Session that can use these to
    perform a distributed computation. Each server is a member of a specific
    named job and has a task index within that job. A server can communicate
    with any other server in the cluster.
  8. [training] distributed training with TensorFlow — diagram of the
    distributed TensorFlow architecture: a master (m), workers (w1, w2) and
    parameter servers (ps) (slide body was placeholder text).
  9. [training] TFJob

    apiVersion: kubeflow.org/v1alpha2
    kind: TFJob
    metadata:
      name: my-tfjob
    spec:
      tfReplicaSpecs:
        Worker:
          replicas: 2
          template:
            spec:
              containers:
              - image: repo/image
                name: tensorflow
              restartPolicy: OnFailure

    kind: TFJob
    tfReplicaSpecs: one entry per TfReplicaType (WORKER, PS)
    replicas: number of replicas for this job
    template: as usual
  10. [training] TF_CONFIG

    {
      "master": [
        "distributed-mnist-master-5oz2-0:2222"
      ],
      "ps": [
        "distributed-mnist-ps-5oz2-0:2222"
      ],
      "worker": [
        "distributed-mnist-worker-5oz2-0:2222",
        "distributed-mnist-worker-5oz2-1:2222"
      ]
    }

    cluster_spec = ...
  11. [serving] docker and local server — TensorFlow Serving

    TensorFlow Serving is a flexible, high-performance serving system for
    machine learning models, designed for production environments. It makes
    it easy to deploy new algorithms and experiments while keeping the same
    server architecture and APIs. It provides out-of-the-box integration with
    TensorFlow models, but can be easily extended to serve other types of
    models and data.
  12. [serving]

    $ ks generate tf-serving dino-serving --name=trex
    $ ks param set dino-serving modelPath gs://demo-dino
    $ ks param set dino-serving deployHttpProxy true