Slide 1

Slide 1 text

dask-cloudprovider

Slide 2

Slide 2 text

Zero to Dask on the cloud

Slide 3

Slide 3 text

Overview of a cluster manager
[Diagram: a Cluster made up of a Scheduler and several Workers, each backed by a cloud resource; a Client connects to the cluster and drives Dask collections (Array, Dataframe, Bag, ML, Xarray, ...).]
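The pattern in the diagram can be sketched as a minimal mock in plain Python (no Dask or cloud APIs; every class and method name here is an illustrative stand-in, except `scale()`, which mirrors the method real Dask cluster managers expose):

```python
# Minimal mock of the cluster-manager pattern: a Cluster owns a scheduler
# address and a pool of workers, each notionally backed by a cloud resource;
# a Client connects by reading the scheduler address off the cluster object.
# All names are illustrative stand-ins, not the dask-cloudprovider API.

class MockCluster:
    def __init__(self, scheduler_address="tcp://10.0.0.1:8786"):
        self.scheduler_address = scheduler_address
        self.workers = []

    def scale(self, n):
        """Grow or shrink the worker pool to exactly n workers."""
        while len(self.workers) < n:
            self.workers.append(f"worker-{len(self.workers)}")
        del self.workers[n:]


class MockClient:
    def __init__(self, cluster):
        # The real dask.distributed.Client similarly accepts a cluster
        # object and connects to its scheduler.
        self.scheduler = cluster.scheduler_address


cluster = MockCluster()
cluster.scale(3)
client = MockClient(cluster)
print(len(cluster.workers))   # 3
print(client.scheduler)       # tcp://10.0.0.1:8786
```

Cluster managers like those in dask-cloudprovider fill in the provisioning details (which cloud resource backs each worker) behind this same scale-then-connect interface.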

Slide 4

Slide 4 text

Cloud platform service types ("...as a service"):
- Functions
- Machine learning
- Kubernetes
- Containers
- Batch
- VMs

Slide 5

Slide 5 text

Ephemeral nature of clusters
[Diagram: clusters come and go on cloud resources — spend sits at $0, rises while a cluster is running, and returns to $0 once it is torn down.]
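The pay-only-while-running point can be made concrete with back-of-the-envelope arithmetic (the per-node hourly rate below is a made-up placeholder, not a real cloud price):

```python
# Cost of an ephemeral cluster vs. leaving the same cluster up all month.
rate_per_node_hour = 0.50    # $ per node per hour (hypothetical rate)
nodes = 10

ephemeral_hours = 2          # cluster exists only while the job runs
always_on_hours = 24 * 30    # cluster left running for a 30-day month

ephemeral_cost = nodes * rate_per_node_hour * ephemeral_hours
always_on_cost = nodes * rate_per_node_hour * always_on_hours

print(ephemeral_cost)   # 10.0
print(always_on_cost)   # 3600.0
```

The gap is why per-second billing (as on Fargate and ECS, later in the deck) pairs naturally with clusters that are created per workload and destroyed afterwards.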

Slide 6

Slide 6 text

AWS

AWS Fargate
● Managed container platform
● Scale by CPU and memory
● Billing per CPU/memory-second
● Low account limits (~50 workers)

from dask_cloudprovider import FargateCluster
cluster = FargateCluster()
cluster.scale(10)

AWS Elastic Container Service
● Unmanaged container platform
● Full control over VM type (GPU, ARM)
● Scale by VMs
● Billing per VM-second

from dask_cloudprovider import ECSCluster
cluster = ECSCluster(
    cluster_arn="arn"
)
cluster.scale(10)

Slide 7

Slide 7 text

AzureML
- Targets Data Scientists from all backgrounds in enterprise settings
- Easy-to-use interfaces for interacting with cloud resources (GUI, Python SDK, R SDK, ML CLI)
- Powerful hundred-node clusters of Azure CPU or GPU VMs for various workloads

Slide 8

Slide 8 text

AzureML (build of slide 7; same bullets)
[Diagram: a skills spectrum labeled "Data science, ML" and "Software development".]

Slide 9

Slide 9 text

AzureML (build of slide 7; same bullets)
[Diagram: a skills spectrum labeled "Data science, ML", "Software development", and "Distributed systems and HPC".]

Slide 10

Slide 10 text

AzureML | dask-cloudprovider

# import from Azure ML Python SDK and Dask
from azureml.core import Workspace
from dask.distributed import Client
from dask_cloudprovider import AzureMLCluster

# specify Workspace - authenticate interactively or otherwise
ws = Workspace.from_config()  # see https://aka.ms/azureml/workspace

# get (or create) desired Compute Target and Environment (base image + conda/pip installs)
ct = ws.compute_targets['cpu-cluster']  # see https://aka.ms/azureml/computetarget
env = ws.environments['AzureML-Dask-CPU']  # see https://aka.ms/azureml/environments

# start cluster, print widget and links
cluster = AzureMLCluster(ws, ct, env, initial_node_count=100, jupyter=True)

# optionally, use directly in a Client
c = Client(cluster)
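Dask's distributed Client follows the submit/result pattern of the standard library's concurrent.futures, so what you would do next with `c` above can be sketched with the standard library alone (ThreadPoolExecutor stands in for the Client here; no Azure resources or Dask install required, and `square` is a made-up example function):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# With a real cluster this would be roughly:
#   c = Client(cluster)
#   future = c.submit(square, 4)
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 4)
    futures = [executor.submit(square, i) for i in range(5)]
    print(future.result())                 # 16
    print([f.result() for f in futures])   # [0, 1, 4, 9, 16]
```

The same code shape scales from threads on a laptop to the hundred-node AzureML cluster above, which is the point of the shared interface.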

Slide 11

Slide 11 text

V100s! DS14_V2

Slide 12

Slide 12 text

AzureML | Architecture
• Derives from the distributed.deploy.cluster.Cluster class
• Starts the scheduler via an experiment run
• Headnode also runs a worker (to maximize resource utilization)
• Submits an experiment run for each worker
• Port forwarding:
  • Port mapping via socat if on the same VNET
  • SSH-tunnel port forward otherwise (needs SSH creds)
https://github.com/dask/dask-cloudprovider/pull/67

Slide 13

Slide 13 text

AzureML | Links
- https://github.com/dask/dask-cloudprovider/pull/67
- [email protected] - Cody - PM @ Azure ML
- [email protected] - Tom - Senior Data Scientist @ Azure ML
- https://github.com/lostmygithubaccount/dasky - CPU demos
- https://github.com/drabastomek/GTC - GPU demos
- @tomekdrabas @codydkdc - Twitter
- NVIDIA's GTC in San Jose and Microsoft's //build in Seattle

Slide 14

Slide 14 text

AzureML | GPU overview

Slide 15

Slide 15 text

AzureML | Run architecture