Dask Summit: Native Cloud Deployment with Dask-Cloudprovider

Jacob Tomlinson

February 27, 2020
Transcript

  1. dask-cloudprovider

  2. Zero to Dask on the cloud

  3. Overview of a cluster manager

     [Diagram] A Cluster object manages a Scheduler and several Workers, each
     running on a cloud resource. The Client connects to the Scheduler and
     submits work built from Dask's high-level collections: Array, DataFrame,
     Bag, ML, Xarray, ...
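The diagram can be sketched as a toy in plain Python. This ToyCluster/ToyClient pair is illustrative only and is not the Dask API: real cluster managers provision cloud resources for the scheduler and workers, whereas this sketch stands in threads, but the interface shape (scale workers, point a client at the cluster, submit work) is the same idea:

```python
from concurrent.futures import ThreadPoolExecutor

class ToyCluster:
    """Toy stand-in for a cluster manager: owns a pool of 'workers'."""
    def __init__(self):
        self.n_workers = 0
        self._pool = None

    def scale(self, n):
        # In a real cluster manager this would provision n cloud workers.
        if self._pool is not None:
            self._pool.shutdown()
        self._pool = ThreadPoolExecutor(max_workers=n)
        self.n_workers = n

    def close(self):
        if self._pool is not None:
            self._pool.shutdown()
        self.n_workers = 0

class ToyClient:
    """Toy client: submits work to whichever cluster it is pointed at."""
    def __init__(self, cluster):
        self.cluster = cluster

    def submit(self, fn, *args):
        return self.cluster._pool.submit(fn, *args)

cluster = ToyCluster()
cluster.scale(4)                       # "provision" four workers
client = ToyClient(cluster)
futures = [client.submit(lambda x: x * x, i) for i in range(8)]
results = [f.result() for f in futures]
cluster.close()                        # tear everything down again
```

The point of the pattern is that the Client is decoupled from where the workers live: swapping ToyCluster for a cloud-backed manager should not change the submitting code.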
  4. Cloud platform service types

     Functions, Machine learning, Kubernetes, Containers, Batch, VMs
     ... as a service
  5. Ephemeral nature of clusters

     [Diagram] Clusters create cloud resources on demand: cost is $0 before a
     cluster exists, $$ while its resources are running, and back to $0 once
     it is torn down.
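That billing model can be sketched as a toy meter (the class and rate here are invented for illustration, not anything in dask-cloudprovider): charges accrue per worker-second, so an idle account and a torn-down cluster both cost nothing.

```python
class MeteredCluster:
    """Toy per-worker-second billing model: you pay only while workers exist."""
    def __init__(self, rate_per_worker_second=0.05):
        self.rate = rate_per_worker_second
        self.n_workers = 0
        self.total_cost = 0.0

    def scale(self, n):
        self.n_workers = n

    def tick(self, seconds):
        # Charge for the workers alive during this interval.
        self.total_cost += self.n_workers * self.rate * seconds

    def close(self):
        self.n_workers = 0

cluster = MeteredCluster()
cluster.tick(60)      # nothing provisioned yet: nothing billed
cluster.scale(10)
cluster.tick(120)     # 10 workers for 2 minutes
cluster.close()
cluster.tick(3600)    # torn down: back to $0/hour
print(cluster.total_cost)  # 10 * 0.05 * 120 = 60.0
```

This is why ephemeral clusters suit bursty workloads: the hour of idle time before and after contributes nothing to the bill.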
  6. AWS

     from dask_cloudprovider import FargateCluster
     cluster = FargateCluster()
     cluster.scale(10)

     AWS Fargate
     • Managed container platform
     • Scale by CPU and memory
     • Billing per CPU/memory second
     • Low account limits (~50 workers)

     from dask_cloudprovider import ECSCluster
     cluster = ECSCluster(cluster_arn="arn")
     cluster.scale(10)

     AWS Elastic Container Service
     • Unmanaged container platform
     • Full control over VM type (GPU, ARM)
     • Scale by VMs
     • Billing per VM second
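Because these managers create billable resources, it helps to guarantee teardown even when a job fails. A minimal sketch of that pattern, using an invented ToyFargateCluster rather than the real class (real Dask cluster managers support the same context-manager style, but this toy is not their API surface):

```python
class ToyFargateCluster:
    """Toy stand-in for a cloud cluster manager, used to show guaranteed teardown."""
    def __init__(self):
        self.n_workers = 0
        self.closed = False

    def scale(self, n):
        self.n_workers = n

    def close(self):
        # In a real manager this deletes the cloud resources, stopping billing.
        self.n_workers = 0
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

with ToyFargateCluster() as cluster:
    cluster.scale(10)
    # ... run work against the cluster ...
# Leaving the block tears the cluster down, even if the work raised.
```

With per-second billing, forgetting to call close() is the expensive mistake this pattern avoids.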
  7. AzureML

     - Targets data scientists from all backgrounds in enterprise settings
     - Easy-to-use interfaces for interacting with cloud resources (GUI,
       Python SDK, R SDK, ML CLI)
     - Powerful hundred-node clusters of Azure CPU or GPU VMs for various
       workloads
  8. AzureML (build slide: same bullets, adding audience labels)

     Data science, ML | Software development
  9. AzureML (build slide: same bullets, adding audience labels)

     Data science, ML | Software development | Distributed systems and HPC
  10. AzureML | dask-cloudprovider

      # import from Azure ML Python SDK and Dask
      from azureml.core import Workspace
      from dask.distributed import Client
      from dask_cloudprovider import AzureMLCluster

      # specify Workspace - authenticate interactively or otherwise
      ws = Workspace.from_config()  # see https://aka.ms/azureml/workspace

      # get (or create) desired Compute Target and Environment
      # (base image + conda/pip installs)
      ct = ws.compute_targets['cpu-cluster']  # see https://aka.ms/azureml/computetarget
      env = ws.environments['AzureML-Dask-CPU']  # see https://aka.ms/azureml/environments

      # start cluster, print widget and links
      cluster = AzureMLCluster(ws, ct, env, initial_node_count=100, jupyter=True)

      # optionally, use directly in a Client
      c = Client(cluster)
  11. V100s! DS14_V2

  12. AzureML | Architecture

      • Derives from the distributed.deploy.cluster.Cluster class
      • Starts the scheduler via an experiment run
      • Headnode also runs a worker (maximizes resource utilization)
      • Submits an experiment run for each worker
      • Port forwarding:
        • Port mapping via socat if on the same VNET
        • SSH-tunnel port forward otherwise (needs SSH creds)

      https://github.com/dask/dask-cloudprovider/pull/67
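The port-forwarding decision on this slide can be sketched as a small rule. The helper name and return strings below are invented for illustration and are not part of dask-cloudprovider:

```python
def choose_port_forwarding(same_vnet: bool, have_ssh_creds: bool) -> str:
    """Toy decision rule mirroring the slide: socat port mapping when the
    client is on the same VNET as the scheduler, an SSH tunnel otherwise."""
    if same_vnet:
        return "socat port mapping"
    if have_ssh_creds:
        return "SSH-tunnel port forward"
    raise RuntimeError("SSH credentials required when outside the scheduler's VNET")
```

For example, choose_port_forwarding(True, False) picks socat, while a client outside the VNET with credentials falls back to the SSH tunnel.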
  13. AzureML | Links

      - https://github.com/dask/dask-cloudprovider/pull/67
      - cody.peterson@microsoft.com - Cody - PM @ Azure ML
      - todrabas@microsoft.com - Tom - Senior Data Scientist @ Azure ML
      - https://github.com/lostmygithubaccount/dasky - CPU demos
      - https://github.com/drabastomek/GTC - GPU demos
      - @tomekdrabas @codydkdc - Twitter
      - NVIDIA's GTC in San Jose and Microsoft's //build in Seattle
  14. AzureML | GPU overview

  15. AzureML | Run architecture