
Kubeflow: Tus herramientas de ML en Kubernetes


ML tools are increasingly present and help us solve more and more problems in our day-to-day work. That is why we need to be able to build infrastructures that simplify the workflow of data scientists and enthusiasts of these technologies.

In this talk we will see how Kubeflow lets us run and deploy several of these tools quickly and easily on a Kubernetes cluster. This way we can take advantage of the portability and hardware diversity that Kubernetes provides, and use it to train and serve our models.

Laura Morillo-Velarde

November 24, 2018

Transcript

  1. Laura Morillo-Velarde • Tech Lead at seedtag • GDG Madrid

    Organizer / WTM Lead • GDE Google Cloud • 11 years working with different technologies • Twitter: @Laura_Morillo
  2. Kubernetes cluster — diagram of Node 1, Node 2 and Node 3
    (each a VM or physical machine).
  3. Kubeflow — a project to make deployment of machine learning on
    Kubernetes simple, portable and scalable. Based on ksonnet.
  4. TensorFlow — a machine learning framework, primarily conceived for deep
    neural network models. It provides several high-level APIs that make
    building complex, powerful estimators fairly accessible to developers.
    It also packs a variety of tools for serving, logging and tracking the
    performance of models.
  5. [training] parameter servers and workers — diagram contrasting the two
    roles: workers and parameter servers (slide body was placeholder text).
  6. [training] cluster spec and servers

    cluster_spec = tf.train.ClusterSpec({
        "ps": [
            "127.0.0.1:2221",  # /job:ps/task:0
        ],
        "worker": [
            "127.0.0.1:2223",  # /job:worker/task:0
            "127.0.0.1:2224",  # /job:worker/task:1
        ],
    })

    The cluster specification dictionary maps job names to lists of network
    addresses. A ClusterSpec represents the set of processes that participate
    in a distributed TensorFlow computation.
  7. [training] cluster spec and servers

    server = tf.train.Server(
        cluster_spec,
        job_name="worker",
        task_index=0,
    )

    A Server object contains a set of local devices, a set of connections to
    other tasks in its ClusterSpec, and a Session that can use these to
    perform a distributed computation. Each server is a member of a specific
    named job and has a task index within that job. A server can communicate
    with any other server in the cluster.
  8. [training] distributed training with TensorFlow — diagram of the
    distributed TensorFlow architecture: a master (m), workers (w1, w2) and
    parameter servers (ps) (slide body was placeholder text).
  9. [training] TFJob

    apiVersion: kubeflow.org/v1alpha2
    kind: TFJob
    metadata:
      name: my-tfjob
    spec:
      tfReplicaSpecs:
        Worker:
          replicas: 2
          template:
            spec:
              containers:
              - image: repo/image
                name: tensorflow
              restartPolicy: OnFailure

    kind: TFJob
    tfReplicaSpecs: one entry per TfReplicaType (WORKER, PS)
    replicas: number of replicas for this job
    template: as usual
  10. [training] TF_CONFIG

    {
      "master": [
        "distributed-mnist-master-5oz2-0:2222"
      ],
      "ps": [
        "distributed-mnist-ps-5oz2-0:2222"
      ],
      "worker": [
        "distributed-mnist-worker-5oz2-0:2222",
        "distributed-mnist-worker-5oz2-1:2222"
      ]
    }

    cluster_spec = ...
  11. [serving] docker and local server — TensorFlow Serving

    TensorFlow Serving is a flexible, high-performance serving system for
    machine learning models, designed for production environments. It makes
    it easy to deploy new algorithms and experiments while keeping the same
    server architecture and APIs. It provides out-of-the-box integration with
    TensorFlow models, but can be easily extended to serve other types of
    models and data.
  12. [serving]

    $ ks generate tf-serving dino-serving --name=trex
    $ ks param set dino-serving modelPath gs://demo-dino
    $ ks param set dino-serving deployHttpProxy true