A pragmatic overview of Azure Cortana Suite

Valdas Maksimavičius

April 11, 2017

Transcript

  1. About me
    Valdas Maksimavičius
    • MS Dynamics, .NET, Web, BI
    • Analytics Architect at Cognizant
    • Between Poland and Lithuania
  2. A pragmatic overview of Azure Cortana Suite
    1. Project overview
    2. Storage options
    3. Data preparation and analysis
    4. Automation and orchestration
    5. Deployment
  3. Before we start ...
    • PaaS
    • Release cycles (Preview, Available, On-demand)
    • No unified security mechanism
    • Help (Tech Evangelists, HiPo teams, support)
    • Pricing is not straightforward
    • Sometimes, instead of fixing issues, MS is baking new functionality
  4. General purpose storage
    • Azure Blob Storage
      • Stores any type of text or binary data
      • Locally redundant / geo-redundant / read-access geo-redundant
      • Not optimized for analytics workloads
    • Azure Data Lake Storage
      • Locally redundant (multiple copies of data in one Azure region)
      • Role-based security and auditing
      • WebHDFS-compatible REST API
  5. NoSQL
    • Azure DocumentDB (PaaS)
      • DocumentDB exposes resources through a REST API
      • Create UDFs, triggers, and stored procedures using JavaScript
      • MongoDB API available
      • Tricky pricing (Request Units)
    • Any DB on VM (IaaS)
      • Deployment scripts available
  6. Query Azure Document DB using SQL
    Document:
      {
        "id": "WakefieldFamily",
        "children": [
          { "givenName": "Katie", "gender": "female", "grade": 1 },
          { "givenName": "Lisa", "gender": "female", "grade": 8 }
        ],
        "address": { "county": "Manhattan", "city": "NY" }
      }
    Query:
      SELECT c.givenName
      FROM Families f
      JOIN c IN f.children
      WHERE f.id = 'WakefieldFamily'
      ORDER BY f.address.city ASC
    Result:
      [ { "givenName": "Katie" }, { "givenName": "Lisa" } ]
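To make the intra-document JOIN on this slide easier to follow, here is a minimal Python sketch that emulates the query in memory. It does not call DocumentDB; the function name `query_given_names` is hypothetical, and the data is the sample document from the slide. The JOIN iterates over each family's own `children` array, which is why one document yields two result rows.

```python
import json

# Sample document copied from the slide.
families = [
    {
        "id": "WakefieldFamily",
        "children": [
            {"givenName": "Katie", "gender": "female", "grade": 1},
            {"givenName": "Lisa", "gender": "female", "grade": 8},
        ],
        "address": {"county": "Manhattan", "city": "NY"},
    }
]


def query_given_names(docs, family_id):
    """Emulate the slide's query (hypothetical helper, in-memory only):
    SELECT c.givenName FROM Families f JOIN c IN f.children
    WHERE f.id = <family_id> ORDER BY f.address.city ASC
    """
    # WHERE f.id = ...
    matching = [f for f in docs if f["id"] == family_id]
    # ORDER BY f.address.city ASC
    matching.sort(key=lambda f: f["address"]["city"])
    # JOIN c IN f.children: one output row per child of each matching family.
    return [{"givenName": c["givenName"]} for f in matching for c in f["children"]]


result = query_given_names(families, "WakefieldFamily")
print(json.dumps(result))
```

The printed list has the same two entries as the result shown on the slide.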
  7. SQL
    • Azure SQL (PaaS)
      • Good fit for new cloud-based apps
      • Databases of up to 1 TB
      • Firewall, backups
    • Azure SQL Data Warehouse
      • No data limit
      • Pause/Resume functionality
    • Any DB on VM (IaaS)
  8. Microsoft Hadoop Stack
    • Analytics: Azure HDInsight, R Server
    • Storage: Local (HDFS) or Cloud (Azure Blob / Azure Data Lake Store)
  9. HDInsight cluster types
    • Hadoop
    • HBase
    • Storm
    • Spark
    • R Server
    • (preview) Kafka
    • (preview) Interactive Hive (LLAP, Live Long and Process)
  10. HDInsight cluster customizations
    • PaaS or IaaS?
    • Bash scripts run during cluster provisioning
    • Hue, Giraph, R, Solr, Zeppelin, Hive libs, etc.
    • Virtual Networks and firewalls
  11. HDInsight general info
    • 30 minutes to provision
    • No Pause, just Delete
    • External metastore for Hive or Oozie
    • Manual optimization of Hive ad-hoc / ETL queries required
    • Data and operational files saved in storage accounts
  12. HDInsight pricing
    Ex. D12_v2 (4 cores, 28 GB RAM, 200 GB disk): 0.405 Eur/h
    • Hadoop / Spark
      • Minimum 3 instances (Zookeeper for free)
      • 1,430.00 Eur/mo, or ~350 Eur/mo for working hours only
    • HBase
      • As Hadoop/Spark + min 3 instances for Zookeeper
      • 1,730.00 Eur/mo, or ~420 Eur/mo for working hours only
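The working-hours figures on this slide can be roughly reproduced by pro-rating the always-on monthly price. The sketch below is one possible reading, not the author's stated method: the 730 hours per month and the 176 working hours (22 days of 8 hours) are assumptions, and `working_hours_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730          # assumption: average number of hours in a month
WORKING_HOURS_PER_MONTH = 176  # assumption: 22 working days x 8 hours


def working_hours_cost(full_month_eur: float) -> float:
    """Pro-rate an always-on monthly price to working hours only."""
    return full_month_eur * WORKING_HOURS_PER_MONTH / HOURS_PER_MONTH


# Monthly prices taken from the slide.
hadoop_spark = working_hours_cost(1430.00)
hbase = working_hours_cost(1730.00)
print(f"Hadoop/Spark: ~{hadoop_spark:.0f} Eur, HBase: ~{hbase:.0f} Eur")
```

Under these assumptions the results are about 345 and 417 Eur, close to the rounded ~350 and ~420 Eur on the slide. Because HDInsight has no pause, getting this saving means deleting and re-provisioning the cluster.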
  13. Azure ML web services
    • Scoring + Training experiments
    • Batch + Request/Response
    • Default 20 concurrent requests per endpoint, can be increased to 200
    • Cumbersome re-training
    • European datacenters are less performant than US-based ones
  14. Azure ML pricing
    • 8.43 Eur per seat per month
    • 0.85 Eur per experimentation hour
    • Max experiment duration: 7 days
    • Web services
      • Standard S1: 100,000 transactions, 25 hours, 10 web services, 85 Eur
      • Standard S3: 50,000,000 transactions, 12,500 hours, 500 web services, 8,500 Eur
  15. Azure Stream Analytics
    • Fully managed event-processing engine
    • Supported inputs: Event Hub, IoT Hub, Azure Blob Storage
    • Supported outputs: Azure Blob, Azure SQL, PowerBI, DocumentDB
    • Queries are expressed in a SQL-like query language
    • Development in Azure Portal or via PowerShell (limited)
  16. Azure Stream Analytics - example query
    Two objectives:
    1. Store all messages to Azure Blob Storage
    2. Display the count of messages in PowerBI

      SELECT * INTO [blobOutput] FROM [input]

      SELECT count(*) INTO [dashboardOutput] FROM [input]
      GROUP BY TumblingWindow(minute, 1)
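The second query on this slide groups events into one-minute tumbling windows, which are fixed-size and non-overlapping, so every event is counted exactly once. The Python sketch below illustrates that grouping in memory; it is not the Stream Analytics engine. The function `tumbling_window_counts` and the sample timestamps are hypothetical, and keying each window by its start time is a simplification chosen for this example.

```python
from collections import Counter
from datetime import datetime, timedelta


def tumbling_window_counts(timestamps, window=timedelta(minutes=1)):
    """Count events per fixed, non-overlapping window, keyed by window start."""
    counts = Counter()
    for ts in timestamps:
        # Floor the timestamp to the start of the window that contains it.
        offset = (ts - datetime.min) % window
        counts[ts - offset] += 1
    return dict(sorted(counts.items()))


# Hypothetical event times: two in the 10:00 minute, one in the 10:01 minute.
events = [
    datetime(2017, 4, 11, 10, 0, 5),
    datetime(2017, 4, 11, 10, 0, 40),
    datetime(2017, 4, 11, 10, 1, 10),
]
counts = tumbling_window_counts(events)
for window_start, n in counts.items():
    print(window_start.isoformat(), n)
```

With this input, the 10:00 window holds two events and the 10:01 window holds one.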
  17. Azure Stream Analytics pricing
    • Price per streaming unit: 0.102 Eur/hr (max throughput of a streaming unit: 1 MB/s)
    • Plus data storage, data transfer, Event Hubs, PowerBI subscription
  18. Azure Automation
    • Automates tasks that are performed in a cloud environment
    • For PowerShell fans
    • Example usage
      • Start / Stop clusters, VMs
      • Act on alerts
      • Clean-ups
      • Deployments (web-hooks)
  19. Azure Data Factory
    • PaaS orchestration service
    • Runs Hive / Pig / Spark / SQL stored procedure / Copy / ML Scoring and Training
    • Multiple connectors: S3, Redshift, Cassandra, MongoDB, Teradata, SAP HANA, Salesforce
    • Provisions on-demand HDInsight clusters
    • Multi-tenant
    • JSON config files
  20. Azure Data Gateway
    • Enables moving data to/from an on-premises data store
    • Makes outbound HTTP-based connections to the open internet
    • A single gateway is tied to only one Azure Data Factory
  21. Azure Data Factory pricing
    • Low frequency activities (no more than once a day): 0.51 Eur per activity per month
    • High frequency activities (hourly, every 15 mins): 0.85 Eur per activity per month
    • Activities involving Data Management Gateway: multiply by 2.5
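The pricing rules on this slide can be turned into a small estimate. The sketch below reads "multiply by 2.5" as a multiplier on the per-activity monthly price, which is an interpretation rather than something the slide spells out. The function names and the example pipeline are hypothetical.

```python
LOW_FREQ_EUR = 0.51       # per activity per month, runs at most once a day
HIGH_FREQ_EUR = 0.85      # per activity per month, runs more often than daily
GATEWAY_MULTIPLIER = 2.5  # assumed to scale the per-activity price


def monthly_activity_cost(high_frequency: bool, uses_gateway: bool = False) -> float:
    """Monthly price of one activity under the slide's rules."""
    base = HIGH_FREQ_EUR if high_frequency else LOW_FREQ_EUR
    return base * (GATEWAY_MULTIPLIER if uses_gateway else 1.0)


def pipeline_monthly_cost(activities) -> float:
    """Sum the monthly price over (high_frequency, uses_gateway) pairs."""
    return sum(monthly_activity_cost(high, gateway) for high, gateway in activities)


# Hypothetical pipeline: a daily cloud copy, an hourly Hive job,
# and an hourly copy from an on-premises store through the gateway.
total = pipeline_monthly_cost([(False, False), (True, False), (True, True)])
print(f"{total:.2f} Eur per month")
```

The estimate covers activity charges only; data movement, compute, and storage are billed separately.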
  22. Deployment
    • Azure Management Certificate
    • Octopus, TeamCity, Jenkins, …
    • PowerShell 'Em All
    • Azure-CLI struggles
    • Deployment templates available
  23. Azure Bot Service
    Templates:
    • Basic - a bot that uses dialogs to respond to user input.
    • Form - a bot that uses a guided conversation to collect user input.
    • Language understanding - a bot that uses natural language models (LUIS) to understand user intent.
    • Proactive - a bot that uses Azure Functions to alert bot users of events.
  24. Summary
    • Great for prototyping
    • Features are important, but don't forget about security
    • Transaction, throughput, seat units: pricing headaches
    • Before using any PaaS, search Google for "AZURE_SERVICE_NAME + limitations"
  25. Azure Blob Storage vs Data Lake Store https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage
    Azure SQL vs Azure Warehouse http://www.jamesserra.com/archive/2016/08/azure-sql-database-vs-sql-data-warehouse/
    Azure Document DB https://docs.microsoft.com/en-us/azure/documentdb/documentdb-introduction
    Azure Document DB SQL query https://docs.microsoft.com/en-us/azure/documentdb/documentdb-sql-query
    Azure HDInsight https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-introduction
    Azure Machine Learning https://docs.microsoft.com/en-us/azure/machine-learning/
    Azure Machine Learning Blog http://blogs.technet.com/b/machinelearning
    Azure Stream Analytics https://docs.microsoft.com/en-us/azure/stream-analytics/
    Azure Deployment Templates https://github.com/Azure/azure-quickstart-templates
    Azure Cognitive Services https://docs.microsoft.com/en-us/azure/cognitive-services/
    Azure Bot Service https://docs.botframework.com/en-us/azure-bot-service/