Leveraging Kubernetes for Machine Learning

Christopher M Luciano

June 15, 2017

Transcript

  1. ./speakerQuery.sh • Christopher M Luciano • Advisory Software Engineer @ IBM • Part of Open Source Technology Team in IBM Digital Business Group • Contributor to Kubernetes (SIG Network & SIG Node) • Work on Cloud Native Computing Foundation (CNCF) projects • @cmluciano_ on Twitter • github.com/cmluciano
  2. The Goal • Base Knowledge -> • Points of Analysis -> • Corpus of Unstructured -> • Error Correction -> • Rinse and Repeat
  3. Stacks on Stacks on Stacks • Bare-metal -> OpenStack -> Virtual Machines -> Runtime -> Kubernetes -> TensorFlow
  4. GPU Characteristics • Multiple Video Cards • Driver installation • Heterogeneous model distribution • Resource fragmentation • Failure scenarios
  5. Kubernetes 1.6 GPU features • Multi-GPU pod scheduling • Video card discovery • Basic failure recovery • Only works with Docker
  6. Where Are We Going? • Advanced – Device Recovery – Health checking features – Topology – Metrics, metrics, metrics • Cleanup – Support in Container Runtime Interface (CRI) • Container runtime independence – Use of NVML or libnvidia-container
  7. Ask Me Anything on Kubernetes • Use Cases for GPU/HPC (High Performance Computing) • Kubernetes Networking • Kubernetes Features • Prometheus • Other CNCF
  8. If You Just Want to Talk • Cars • Coffee • Cooking • Fishing • World Culture
  9. Thank You! • Christopher M Luciano • Advisory Software Engineer @ IBM • Part of Open Source Technology Team in IBM Digital Business Group • Contributor to Kubernetes (SIG Network & SIG Node) • Work on Cloud Native Computing Foundation (CNCF) projects • @cmluciano_ on Twitter • github.com/cmluciano
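
Slide 5 lists multi-GPU pod scheduling among the Kubernetes 1.6 GPU features. As a minimal sketch of what such a request might look like, the Python below builds a Pod manifest that asks for two GPUs through the alpha resource name `alpha.kubernetes.io/nvidia-gpu` used in that release line. The helper `gpu_pod_manifest`, the pod name, and the image are hypothetical and do not come from the talk. Clusters of that era also needed the alpha accelerators feature enabled and the NVIDIA driver libraries made available to the container, which this sketch leaves out.

```python
import json

# Alpha resource name for NVIDIA GPUs in the Kubernetes 1.6 era.
GPU_RESOURCE = "alpha.kubernetes.io/nvidia-gpu"


def gpu_pod_manifest(name, image, gpus):
    """Return a Pod manifest (as a dict) whose single container requests `gpus` GPUs.

    Hypothetical helper for illustration; not part of any Kubernetes client library.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested as a limit on the alpha resource.
                    "resources": {"limits": {GPU_RESOURCE: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = gpu_pod_manifest("tf-trainer", "example.com/tf-gpu:latest", 2)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl create -f`, which accepts JSON as well as YAML manifests.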