

Starting & Scaling your Data Science Project with Azure by Salvatore Dino

You want to learn about Data Science and Machine Learning on Azure? You came to the right place. Salvatore, Cloud Solution Architect at Microsoft, is going to show us how you can get started with Data Science on the Microsoft Azure Cloud today. He will give an overview of the services in Azure and what they are capable of. Topics include Machine Learning, Bots and Intelligence.

Speaker: Salvatore Dino, Microsoft
Salvatore Dino has an enterprise Data Warehousing & Business Intelligence background in various sectors such as Banking, Public Sector, Telco, Fashion and Ecommerce. Since January 2016 he has worked at Microsoft as a Cloud Solution Architect, covering hot Data and AI topics such as Modern Data Warehousing, Advanced Analytics with ML, IoT, Bots and AI.

Azure Zurich User Group

October 26, 2017



Transcript

  1. APIs: 30 Azure Cognitive Services (vision, speech, language, knowledge, search, Bot Framework, etc.)

    Drag and drop: Azure ML (drag and drop, pre-built templates)
    Writing code using managed services: Spark on HDInsight (PySpark), ML Server, H2O, etc.
    E2E customization: deep learning algorithms on Azure GPUs (TensorFlow, CNTK, Torch, etc.)
  2. Agenda: Microsoft Machine Learning & AI Portfolio. When to use what?

    Questions: Build your own or consume pre-trained models? Which experience do you want? Deployment target? What engine(s) do you want to use?
    Microsoft ML & AI products:
    • Build your own (Azure Machine Learning)
      • Code first, on-prem: ML Server (engines: on-prem Hadoop, SQL Server)
      • Code first, cloud: AML services (Preview) (engines: SQL Server, Spark, Hadoop, Azure Batch, DSVM, Azure Container Service)
      • Visual tooling, cloud: AML Studio
    • Consume: Cognitive Services, bots
  3. Towards General AI

    AI-complete problems are hypothesized to include general computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real-world problem.
  4. Data cleaning & integration, Feature extraction & engineering, Model fitting & selection, Operationalization, Embedding in data products, Insight

    Scope labels on the slide: ML, Data Science, AI
  5. Supervised learning: finding the mapping between inputs and outputs, using correct values to "train" a model

    Unsupervised learning: finding patterns in the input data
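To make the distinction on this slide concrete, here is a minimal sketch in plain Python. The helper names (`fit_line`, `two_means`) and the toy data are invented for illustration and do not come from the talk. The first function learns from labelled pairs, the second only sees unlabelled values.

```python
from statistics import mean

def fit_line(xs, ys):
    """Supervised: learn y ~ slope * x + intercept from labelled (x, y) pairs
    using ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def two_means(values, iterations=10):
    """Unsupervised: split unlabelled 1-D values into two groups (k-means, k=2).
    Assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(left), mean(right)
    return lo, hi

# Toy data, invented for illustration.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # labels follow y = 2x + 1
centers = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])      # no labels supplied
```

The supervised function needs the "correct values" (`ys`) to fit anything, while the clustering function finds the two groups from the inputs alone.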
  6. Is this A or B? How much/many? Is this weird? How is it organized? What should I do next?

    • Classification (Is this A or B?)
      • Is this transaction legitimate or fraudulent?
      • Will this customer churn or renew the contract?
      • Will this machine part break or not?
    • Regression (How much/many?)
      • How much revenue will this customer generate for us?
      • How many of these items will we sell within the next 3 months?
      • How long will production of this batch take?
    • Anomaly Detection (Is this weird?)
      • Is this pressure reading unusual?
      • Is this combination of purchases very different from past purchases?
      • Are these voltages normal for this season and time of day?
    • Clustering (How is it organized?)
      • Which viewers like the same kind of movies?
      • What is a natural way to group these documents into 5 categories?
      • During which weekday does this power station face similar demand?
    • Recommenders (What should I do next?)
      • Should I adjust the temperature higher, lower or leave it as is?
      • How many shares of this stock should I buy right now?
      • Given a yellow light, should I accelerate, brake or maintain same speed?
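As one worked instance of the "Is this weird?" question, the sketch below flags a pressure reading that lies far from the historical mean. The z-score rule, the threshold of 3 standard deviations, the function name `is_unusual` and the sample readings are all assumptions chosen for illustration. The slides do not prescribe a specific anomaly detection method.

```python
from statistics import mean, stdev

def is_unusual(reading, history, threshold=3.0):
    """Return True when `reading` is more than `threshold` sample standard
    deviations away from the mean of `history` (a simple z-score rule)."""
    mu, sigma = mean(history), stdev(history)
    return abs(reading - mu) > threshold * sigma

# Invented pressure readings for illustration.
history = [101.2, 100.8, 101.0, 100.9, 101.1, 101.0]
normal_flag = is_unusual(101.05, history)   # close to the historical mean
spike_flag = is_unusual(108.0, history)     # far outside the usual range
```

A rule like this is only a baseline. Seasonal data, such as the voltage example on the slide, would need a history restricted to comparable seasons and times of day.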
  7. product recommendations, intelligent search, routing, robotics, ad placement, predictive maintenance, image and video recognition, sentiment analysis, text comprehension, natural language processing, robotics, bots, augmented reality, predictive maintenance

    Industries: Retail, Financial services, Healthcare, Manufacturing
    Use cases: loyalty programs, customer acquisition, pricing strategy, supply chain management, customer churn, fraud detection, risk & compliance, cross-sell & upsell, personalization, bill collection, operational efficiency, patient demographics, pay for performance, demand forecasting, pricing strategy, supply chain optimization, predictive maintenance, remote monitoring
  8. Cognitive Services and Bots
  9. Recommendations, Knowledge Exploration, Entity Linking, Academic Knowledge, QnA Maker, Custom Decision, Bing Entity Search, LUIS, Web Language Model, Text Analytics, Linguistic Analysis, Speaker Recognition, Custom Speech, Emotion, Video, Custom Vision, Video Indexer
  10. AML Studio
  11. Azure Machine Learning Studio

    Platform for emerging data scientists to graphically build and deploy experiments
    • Rapid experiment composition
    • > 100 easily configured modules for data prep, training, evaluation
    • Extensibility through R & Python
    • Serverless training and deployment
    Some numbers:
    • Hundreds of thousands of deployed models serving billions of requests
  12. Data Science Virtual Machine (DSVM)
  13. Data Science Virtual Machines (DSVM)

    Pre-configured environments in the cloud for Data Science & AI modeling, development & deployment.
  14. Data Science Virtual Machines (DSVM): Editions

    • DSVM – Windows Server 2016
    • DSVM – Linux – Ubuntu
    • Deep Learning Virtual Machines
  15. The Deep Learning VM

    • Windows Server 2016 and Ubuntu releases
    • Use NC class Azure VMs with GPUs
    • Great for Deep Learning workloads
    • Physical card = 2 x NVIDIA Tesla GPUs
  16. Single VM Development, Scale Up, Scale Out

    Single VM Development
    • Local tools
    • Local debug
    • Faster experimentation
    Scale Up
    • Larger VMs
    • GPU
    Scale Out
    • Multi node
    • Remote Spark
    • Batch nodes
    • VM Scale Sets
  17. ML on Hadoop
  18. HDInsight: fully managed Hadoop and Spark solutions on Microsoft Azure with 99.9% SLA

    • Partnership with Hortonworks, 100% open source
    • Create a cluster in just a few minutes
    • Save 63% of your cost compared with an on-prem Hadoop cluster
    • Scale compute and storage separately
  19. • Provisions Azure compute resources with Spark installed and configured.

    • Data is stored in Azure Blob storage (wasb://) or Azure Data Lake Store (adl://)
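The two URI schemes named on this slide can be sketched as small string helpers. The helper names and the sample account, container and path values are hypothetical. The host suffixes (`blob.core.windows.net` for Blob storage, `azuredatalakestore.net` for Data Lake Store) are the ones these schemes conventionally use, stated here as background rather than taken from the slide.

```python
def wasb_uri(container, account, path, secure=False):
    """Build a Blob storage URI: wasb[s]://<container>@<account>.blob.core.windows.net/<path>."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"

def adl_uri(account, path):
    """Build a Data Lake Store URI: adl://<account>.azuredatalakestore.net/<path>."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"

# Hypothetical account, container and path names.
blob_path = wasb_uri("data", "mystorageacct", "/taxi/2017/trips.csv")
lake_path = adl_uri("mydatalake", "/taxi/2017/trips.csv")
```

A Spark job on the cluster would typically be handed one of these strings wherever it would otherwise take an HDFS path.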
  20. • Enough data + simple algorithm > little data + complex algorithm

    • Spark, as a Big Data framework, fits machine learning tasks because of its in-memory caching and computation
  21. • 100% compatible with open source R

    • Ability to parallelize any R function
    • Wide range of scalable and distributed "rx"-prefixed functions in the "RevoScaleR" package
    • Simplify E2E solutions: R Server Operationalization (DeployR)
    • Switch between compute contexts easily for debugging
  22. • scikit-learn: powerful ML library on a single machine

    • PySpark + MLlib: ideal choice for distributed ML
  23. • Spark fits machine learning tasks because of its in-memory caching and computation

    • Doing Deep Learning on Big Data?
  24. • Lets you write deep learning applications as standard Spark programs

    https://github.com/hdinsight/BigDLonHDInsight
  25. Azure Data Factory (data ingestion)

    • Data Lake Store or Blob storage (data storage)
    • Distributed model training: Spark on HDInsight
    • Power BI (visualization)
    • On-prem data source
    • On-prem scoring
    • On-prem experimentation
    • Data Factory (data ingestion)
  26. Azure Data Lake Store or Blob storage (data storage)

    • Distributed model training: Spark on HDInsight
    • Power BI (visualization)
    • Experimentation: Azure Machine Learning, Microsoft R Server, Azure Data Science VM
    • Model operationalization (scoring): DSVM & AML
  27. Example of DL Deployment using Azure Containers and Azure Batch
  28. Drone-based electric grid inspector powered by deep learning

    Challenge
    • Traditional power line inspection services are costly
    • Demand for low-cost image scoring and support for multiple concurrent customers
    • Needed powerful AI to execute on a drone solution
    Solution
    • Deep learning to analyze multiple streaming data feeds
    • Azure GPUs support Single Shot MultiBox Detectors
    • Reliable, consistent, and highly elastic scalability with Azure Batch Shipyard
    https://www.youtube.com/watch?v=8kuPPYk1SOo
  29. The new Azure Machine Learning Services
  30. Key trends; data science & AI challenges

    KEY TRENDS
    • Accelerating adoption of AI by developers (consuming models)
    • Rise of hybrid training and scoring scenarios
    • Push scoring/inference to the event (edge, cloud, on-prem)
    • Some developers moving into deep learning as a non-traditional path to DS / AI dev
    • Growth of diverse hardware arms race across all form factors (CPU / GPU / FPGA / ASIC / device)
    DATA SCIENCE & AI CHALLENGES
    • Data prep
    • Model deployment & management
    • Model lineage & auditing
    • Explainability
  31. The AI development lifecycle

    Apps + insights; Social, LOB, Graph, IoT, Image, CRM
    INGEST, STORE, PREP & TRAIN, MODEL & SERVE
    Data orchestration and monitoring; Data lake and storage; Hadoop/Spark/SQL and ML; IoT; Azure Machine Learning
  32. New capabilities

    • Notebooks, IDEs, Azure Machine Learning Workbench, VS Code Tools for AI
    • AZURE MACHINE LEARNING SERVICES: Experimentation and Model Management Services
    • TRAIN & DEPLOY OPTIONS
      • AZURE: Spark, SQL Server, Virtual machines, GPUs, Container services
      • ON-PREMISES: SQL Server, Machine Learning Server
      • EDGE: Azure IoT Edge
  33. Azure ML Experimentation: Experiment Everywhere

    • Local machine
    • Scale up to DSVM
    • Scale out with Spark on HDInsight
    • Azure Batch AI (Coming Soon)
    • ML Server
    • Command line tools, IDEs, Notebooks in Workbench, VS Code Tools for AI
  34. Experimentation service

    • Manage project dependencies
    • Manage training jobs locally, scaled-up or scaled-out
    • Git-based checkpointing and version control
    • Service-side capture of run metrics, output logs and models
    • Use your favorite IDE, and any framework
    USE THE MOST POPULAR INNOVATIONS. USE ANY TOOL. USE ANY FRAMEWORK OR LIBRARY.
  35. Azure ML Model Management: Deploy Everywhere

    • Docker
    • Single node deployment (cloud/on-prem)
    • Azure Container Service
    • Azure IoT Edge
    • Microsoft ML Server
    • Spark clusters
    • SQL Server
  36. Manage models

    • Deployment and management of models as HTTP services
    • Container-based hosting of real-time and batch processing
    • Management and monitoring through Azure Application Insights
    • First-class support for SparkML, Python, Cognitive Toolkit, TF, R; extensible to support others (Caffe, MXNet)
    • Service authoring in Python
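"Service authoring in Python" for a model behind an HTTP endpoint often follows a load-once, score-per-request shape. The sketch below shows that generic pattern only. The function names `init` and `run`, the JSON request layout and the hard-coded linear model are assumptions for illustration, not the API of the Azure ML model management service.

```python
import json

MODEL = None  # populated once by init(), reused by every run() call

def init():
    """Load the model at service start-up.
    Here the 'model' is a hard-coded set of linear coefficients (hypothetical)."""
    global MODEL
    MODEL = {"intercept": 1.0, "weights": [2.0, -0.5]}

def run(request_body):
    """Score one JSON request shaped like {"features": [[...], ...]}
    and return the predictions as a JSON string."""
    rows = json.loads(request_body)["features"]
    predictions = [
        MODEL["intercept"] + sum(w * x for w, x in zip(MODEL["weights"], row))
        for row in rows
    ]
    return json.dumps({"predictions": predictions})

init()
response = run('{"features": [[1.0, 2.0], [0.0, 4.0]]}')
```

Keeping model loading out of the request path is the main design point: a container hosting this script pays the load cost once, then serves real-time or batch requests through the same `run` function.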
  37. VS Code Tools for AI

    • VS Code extension with deep integration to Azure ML
    • End-to-end development environment, from new project through training
    • Support for remote training
    • Job management
    • On top of all of the goodness of VS Code (Python, Jupyter, Git, etc.)
  38. What Is It?

    • Windows and Mac based companion for AI development
    • Full environment setup (Python, Jupyter, etc.)
    • Embedded notebooks
    • Run History and Comparison experience
    • New data wrangling tools
  39. AI Powered Data Wrangling

    • Rapidly sample, understand, and prep data
    • Leverage PROSE and more for intelligent data prep by example
    • Extend/customize transforms and featurization through Python
    • Generate Python and PySpark for execution at scale
  40. SQL Server (SQL DB) ML Services
  41. • Reduce or eliminate data movement with in-database analytics

    • Operationalize machine learning models
    • Get enterprise scale, performance, and security
  42. Regular Database + App VS Intelligence Database + App

    • Regular Database + App: Application + Intelligence, Database
    • Intelligence Database + App: Application, Intelligence + Database
  43. • Eliminate data movement

    • Operationalize ML scripts and models
    • Enterprise-grade performance and scale
    SQL transformations, Relational data, Analytics library
  44. Any R/Python IDE, Data Scientist Workstation, SQL Server

    Step 1: Pull data
        train <- sqlQuery(connection, "select * from nyctaxi_sample")
        model <- glm(formula, train)
    Step 2: Execution
    Step 3: Model output
  45. Any R/Python IDE, Data Scientist Workstation, SQL Server 2017 (SQL Server R/Python Runtime, Machine Learning Services)

    Step 1: Script
        cc <- RxInSqlServer(connectionString, computeContext)
        rxLogit(formula, cc)
    Step 2: Execution
    Step 3: rx* output
    Step 4: Model or predictions
  46. execute sp_execute_external_script

          @language = N'R'
        , @script = N'
            x <- as.matrix(InputDataSet);
            y <- array(dim1:dim2);
            OutputDataSet <- as.data.frame(x %*% y);'
        , @input_data_1 = N'SELECT [Col1] from MyData;'
        , @params = N'@dim1 int, @dim2 int'
        , @dim1 = 12, @dim2 = 15
        WITH RESULT SETS (([Col1] int, [Col2] int, [Col3] int, [Col4] int));

    • Choices are 'R' or 'Python'.
    • Script: use a @var or read from an R file or Python file.
    • Input data for script: can be any T-SQL SELECT. Traceable.
    • Parameters for script: OUTPUT supported. varbinary(max) used for trained models.
    • Result set binding (optional).
    • Messages can also be returned, including STDOUT and STDERR.
    • R dataframe or Python pandas dataframe.
    • R Primer, Python Primer.
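The embedded R script multiplies the single input column by the row vector `dim1:dim2`, which is why the result set declares four integer columns for `@dim1 = 12, @dim2 = 15`. The plain-Python sketch below reproduces that arithmetic. The function name and the sample column values (1, 2, 3 standing in for `MyData.Col1`) are invented for illustration.

```python
def outer_product_rows(column, dim1, dim2):
    """Mimic `as.matrix(InputDataSet) %*% array(dim1:dim2)` for a one-column input:
    every input value is multiplied by each integer from dim1 to dim2 inclusive."""
    row = list(range(dim1, dim2 + 1))  # R's dim1:dim2 includes both endpoints
    return [[value * factor for factor in row] for value in column]

# Hypothetical contents of MyData.Col1.
result = outer_product_rows([1, 2, 3], 12, 15)
# Each input row yields four output columns, matching the four-column RESULT SETS clause.
```

Changing `@dim1` or `@dim2` changes the number of output columns, so the `WITH RESULT SETS` clause has to be adjusted along with the parameters.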
  47. SQL Server and Machine Learning @ Scale: Jack Henry

    A leading provider of banking solutions for credit unions across the Americas
    • 20M vehicle loans: Age, Original Balance, Interest Rate, Loan Remaining Months, Credit Score
    • In-memory OLTP, ColumnStore
    • In-database analytics at scale with R
    • Prepare for analytics, store predictions, visualize (Power BI dashboard, business user)
    • 1 million predictions per second using SQL-R
  48. Deploy predictive analytics (Demo)

    • Develop: develop, explore and experiment in your favorite IDE
    • Train: train models with sp_execute_external_script and save the models in the database
    • Deploy: deploy your ML scripts with sp_execute_external_script and predict using the models
    • Consume: make your app/reports intelligent by consuming predictions
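Saving a model in the database, as the Train and Deploy steps describe, comes down to turning the trained object into bytes that fit a varbinary(max) column and turning those bytes back into a model at scoring time. The sketch below shows only that round trip in plain Python with `pickle`. The toy model, the function names and the data are assumptions, and the actual table write (for example through an output parameter of the stored procedure) is left out.

```python
import pickle

def train(xs, ys):
    """Fit y ~ slope * x through the origin (least squares) and return a plain dict model."""
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"slope": slope}

def predict(model, x):
    """Score a single value with the toy model."""
    return model["slope"] * x

# Train step: fit the model and serialize it to bytes (what a varbinary(max) column would hold).
model = train([1, 2, 3], [2, 4, 6])
model_blob = pickle.dumps(model)

# Deploy step: read the bytes back and score with the restored model.
restored = pickle.loads(model_blob)
prediction = predict(restored, 5)
```

Only unpickle bytes that your own training step produced, since `pickle.loads` can execute code embedded in untrusted input.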
  49. Application Developer - Model Operationalization

    Step 1: Stored proc call from the application
        exec sp_execute_external_script
          @language = N'Python'
        , @script = -- Python code --
    Step 2: Execution in SQL Server (R/Python Runtime, Machine Learning Services)
    Step 3: Results
    The stored procedure contains R or Python code and executes in-database.
  50. SQL Server Machine Learning Services

    • R Services, ML Services
    • SQL Server Developer Tutorials
    • SSMS Reports for ML Services
    • ML cheat sheet
    • https://docs.microsoft.com/en-us/sql/advanced-analytics/sql-native-scoring
    • https://docs.microsoft.com/en-us/sql/advanced-analytics/r/how-to-do-realtime-scoring
    • https://docs.microsoft.com/en-us/sql/advanced-analytics/real-time-scoring