
DevOps for Machine Learning

With machine learning becoming more and more an engineering problem, the need to track ML experiments, collaborate on them, and deploy them easily with integrated CI/CD tooling is more relevant than ever.

In this session we take a deep dive into the DevOps process that comes with Azure Machine Learning service, a cloud service you can use to track your work as you build, train, deploy and manage models. We zoom in on how the data science process can be made traceable, and deploy the model to a Kubernetes cluster with Azure DevOps.

At the end of this session you will have a good grasp of the technological building blocks of Azure Machine Learning service and be able to bring a machine learning project safely into production.

Henk Boelman

October 08, 2019

Transcript

  1. Machine Learning on Azure
     Sophisticated pretrained models, to simplify solution development: Cognitive Services (Vision, Speech, Language, Azure Search, …)
     Popular frameworks, to build advanced deep learning solutions: TensorFlow, Keras, PyTorch, ONNX
     Productive services, to empower data science and development teams: Azure Databricks, Machine Learning VMs, Azure Machine Learning
     Powerful infrastructure, to accelerate deep learning
     Flexible deployment, to deploy and manage models on intelligent cloud and edge: on-premises, cloud, edge

  2. “DevOps is the union of people, process, and products to enable continuous delivery of value to your end users.”

  3. The data science process
     Ask a sharp question → Collect the data → Prepare the data → Select the algorithm → Train the model → Use the answer

  4. Azure Machine Learning Service
     A fully-managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions.

  5. Machine Learning on Azure
     Sophisticated pretrained models, to simplify solution development: Cognitive Services (Vision, Speech, Language, Azure Search, …)
     Popular frameworks, to build advanced deep learning solutions: TensorFlow, Keras, PyTorch, ONNX
     Productive services, to empower data science and development teams: Azure Databricks, Machine Learning VMs, Azure Machine Learning
     Powerful infrastructure, to accelerate deep learning
     Flexible deployment, to deploy and manage models on intelligent cloud and edge: on-premises, cloud, edge

  6. Create a workspace

     from azureml.core import Workspace

     # create the workspace and save its config locally
     ws = Workspace.create(name='<NAME>',
                           subscription_id='<SUBSCRIPTION ID>',
                           resource_group='<RESOURCE GROUP>',
                           location='westeurope')
     ws.write_config()

     # later, reload the workspace from the saved config
     ws = Workspace.from_config()

  7. Azure Machine Learning Service
     Datasets – registered, known data sets
     Experiments – training runs
     Models – registered, versioned models
     Endpoints:
       Real-time Endpoints – deployed model endpoints
       Pipeline Endpoints – training workflows
     Compute – managed compute
     Datastores – connections to data

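These building blocks map directly onto classes in the Python SDK. A minimal sketch of how each is reached from code; the datastore, dataset, experiment and model names below are hypothetical:

     from azureml.core import Workspace, Experiment, Model, Datastore, Dataset

     ws = Workspace.from_config()                   # the workspace ties everything together
     ds = Datastore.get(ws, 'workspaceblobstore')   # a datastore: connection to data
     data = Dataset.get_by_name(ws, 'simpsons')     # a registered, known data set
     exp = Experiment(ws, 'simpsons-classifier')    # an experiment groups training runs
     model = Model(ws, name='MargeOrHomer')         # latest version of a registered model
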
  8. Create compute

     from azureml.core.compute import AmlCompute, ComputeTarget

     # provision an autoscaling GPU cluster (1–6 STANDARD_NC6 nodes)
     cfg = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                 min_nodes=1,
                                                 max_nodes=6)
     cc = ComputeTarget.create(ws, '<NAME>', cfg)

  9. Azure Machine Learning Service Pipelines
     Workflows of steps that can use data sources, datasets and compute targets
     Unattended runs
     Reusability
     Tracking and versioning

  10. Azure Pipelines
      Orchestration for continuous integration and continuous delivery
      Gates, tasks and processes for quality
      Integration with other services
      Trigger on code and non-code events

  11. Create a pipeline step
      A step runs a script on a compute target in a Docker container, wired up with inputs, outputs and parameters.

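The preparation step used later in slide 15 (preProcessStep) is not shown in the deck; a minimal sketch of such a step as a PythonScriptStep, assuming a hypothetical prep.py script and reusing ws, computeCluster and script_folder from the other slides:

     from azureml.pipeline.core import PipelineData
     from azureml.pipeline.steps import PythonScriptStep

     # intermediate output handed from this step to the training step
     processed_data = PipelineData('processed_data',
                                   datastore=ws.get_default_datastore())

     preProcessStep = PythonScriptStep(name='Prepare data',
                                       script_name='prep.py',   # hypothetical script
                                       arguments=['--output', processed_data],
                                       outputs=[processed_data],
                                       compute_target=computeCluster,
                                       source_directory=script_folder)
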
  12. Create a pipeline
      Dataset of Simpsons images (Blob Storage Account) → prepare data → processed dataset → train the model with a TensorFlow estimator → model → register the model (Model Management)

  13. Create an estimator

      from azureml.train.dnn import TensorFlow

      # mount the default datastore so train.py can read the training data
      params = {'--data-folder': ws.get_default_datastore().as_mount()}

      trainEstimator = TensorFlow(source_directory=script_folder,
                                  script_params=params,
                                  compute_target=computeCluster,
                                  entry_script='train.py',
                                  use_gpu=True,
                                  conda_packages=['scikit-learn', 'keras', 'opencv'],
                                  framework_version='1.10')

  14. Create an estimator step

      from azureml.pipeline.steps import EstimatorStep

      trainOnGpuStep = EstimatorStep(name='Train Estimator Step',
                                     estimator=trainEstimator,
                                     inputs=[training_data_location],
                                     outputs=[model],
                                     compute_target=computeCluster,
                                     estimator_entry_script_arguments=estimator_script_params)

  15. Create, publish and run a pipeline

      from azureml.pipeline.core import Pipeline

      prep_train_register = [preProcessStep, trainOnGpuStep, registerStep]
      pipeline = Pipeline(workspace=ws, steps=[prep_train_register])
      mlpipeline = pipeline.publish(name="Marge Or Homer")
      pipeline_run = mlpipeline.submit(ws, experiment_name)

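The registerStep script itself is not shown; a minimal sketch of what it could do, assuming the trained model was written to outputs/model and a hypothetical model name:

     # register.py – hypothetical register-step script
     from azureml.core import Run
     from azureml.core.model import Model

     run = Run.get_context()            # the step's run context
     ws = run.experiment.workspace
     Model.register(workspace=ws,
                    model_name='MargeOrHomer',   # hypothetical name
                    model_path='outputs/model')  # hypothetical path
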
  16. Demo: Create an Azure ML Pipeline
      Steps so far: create an experiment, create a training file, create an estimator, submit to the AI cluster

  17. Diagram of a run: Azure Notebook, Compute Target, Experiment, Docker Image, Data store
      1. Snapshot folder and send to experiment
      2. Create Docker image
      3. Deploy Docker image and snapshot to compute
      4. Mount datastore to compute
      5. Launch the script
      6. Stream stdout, logs, metrics
      7. Copy over outputs

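From the notebook side this whole sequence is kicked off by a single submit call; a minimal sketch, reusing trainEstimator from slide 13 with a hypothetical experiment name:

     from azureml.core import Experiment

     exp = Experiment(ws, 'simpsons-classifier')   # hypothetical experiment name
     run = exp.submit(trainEstimator)              # steps 1–5: snapshot, image, deploy, mount, launch
     run.wait_for_completion(show_output=True)     # step 6: stream stdout, logs and metrics
     # step 7: files written to ./outputs are copied back into the run history
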
  18. Demo: Test the model
      Steps so far: create an experiment, create a training file, create an estimator, submit to the AI cluster, register the model

  19. Source control
      Code and comments only (not Jupyter output)
      Plus every part of the pipeline
      And infrastructure and dependencies
      And maybe a subset of the data

  20. Everything should be in source control! Except your training data,
      which should be a known, shared data source.

  21. Continuous Integration
      Triggered on code change
      Refresh and execute the AML pipeline
      Code quality, linting, and unit testing
      Pull request process
      aka.ms/mitt/azuredevops

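A CI job can refresh and execute the published training pipeline with a few lines of SDK code; a minimal sketch, assuming the pipeline name from slide 15 and a hypothetical experiment name:

     from azureml.core import Workspace
     from azureml.pipeline.core import PublishedPipeline

     ws = Workspace.from_config()
     published = next(p for p in PublishedPipeline.list(ws)
                      if p.name == 'Marge Or Homer')
     run = published.submit(ws, experiment_name='ci-retrain')   # hypothetical name
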
  22. Project
      app.py => run the model as an API
      Dockerfile => Docker configuration
      requirements.txt => Python packages to be installed
      aml_config/config.json => Azure Machine Learning config
      downloadmodel.py => build step to retrieve the latest model (sketched below)

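A minimal sketch of the downloadmodel.py build step, assuming the workspace config is checked in under aml_config/ and reusing the hypothetical model name from above:

     # downloadmodel.py – fetch the latest registered model at build time
     from azureml.core import Workspace
     from azureml.core.model import Model

     ws = Workspace.from_config(path='aml_config/config.json')
     model = Model(ws, name='MargeOrHomer')             # latest version by default
     model.download(target_dir='model', exist_ok=True)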