Upgrade to Pro — share decks privately, control downloads, hide ads and more …

A pragmatic overview of Azure Cortana Suite

A pragmatic overview of Azure Cortana Suite

Avatar for Valdas Maksimavičius

Valdas Maksimavičius

April 11, 2017
Tweet

More Decks by Valdas Maksimavičius

Other Decks in Technology

Transcript

  1. About me Valdas Maksimavičius • MS Dynamics, .Net, Web, BI

    • Analytics Architect at Cognizant • Between Poland and Lithuania
  2. A pragmatic overview of Azure Cortana Suite 1. Project overview

    2. Storage options 3. Data preparation and analysis 4. Automation and orchestration 5. Deployment
  3. Before we start ... • PaaS • Release cycles (Preview,

    Available, On-demand) • No unified security mechanism • Help (Tech Evangelists, HiPo teams, support) • Pricing is not straightforward • Sometimes instead of fixing issues MS is baking new functionality
  4. General purpose storage • Azure Blob Storage • Stores any

    type of text or binary data • Locally / globally redundant / read-access globally redundant • Not optimized for analytics workloads • Azure Data Lake Storage • Locally redundant (multiple copies of data in one Azure region) • Role-based security and auditing • WebHDFS-compatible REST API
  5. NoSQL • Azure Document DB (PaaS) • DocumentDB exposes resources

    through a REST API • Create UDF / Trigger / Stored Procedures using JavaScript • Available MongoDB API • Tricky pricing (Requests units) • Any DB on VM (IaaS) • Deployment scripts available
  6. Query Azure Document DB using SQL [ { "givenName": "

    Katie" }, { "givenName": "Lisa"} ] SELECT c.givenName FROM Families f JOIN c IN f.children WHERE f.id = 'WakefieldFamily' ORDER BY f.address.city ASC { "id": "WakefieldFamily", "children": [ { "givenName": “Katie", "gender": "female", "grade": 1 }, { "givenName": "Lisa", "gender": "female", "grade": 8 } ], "address": {"county": "Manhattan", "city": "NY" } }
  7. SQL • Azure SQL (PaaS) • Good fit for new

    cloud-based apps • Databases of up to 1 TB • Firewall, Backups • Azure Data Warehouse • No data limit • Pause/Resume functionality • Any DB on VM (IaaS)
  8. Analytics Storage Microsoft Hadoop Stack Azure HDInsight R Server Local

    (HDFS) or Cloud (Azure Blob/Azure Data Lake Store)
  9. HDInsight cluster types • Hadoop • HBase • Storm •

    Spark • R Server • (preview) Kafka • (preview) Interactive Hive (LLAP Live Long and Process)
  10. HDInsight cluster customizations • PaaS or IaaS? • Bash scripts

    run during cluster provisioning • Hue, Giraph, R, Solr, Zeppelin, Hive libs, etc • Virtual Networks and firewalls
  11. HDInsight general info • 30 minutes to provision • No

    Pause, just Delete • External metastore for Hive or Oozie • Manual Hive Ad-hoc / ETL optimization required • Data and operational files saved in storage accounts
  12. HDInsight pricing Ex. D12_v2 - 4 cores, 28 GB RAM,

    200 GB disk - 0.405 Eur/h • Hadoop / Spark • Minimum 3 instances (Zookeeper for free) 1,430.00 Eur/Mo or ~350 Eur/working hours • HBase • As Hadoop/Spark + min 3 instances for Zookeeper 1,730.00 Eur/Mo or ~ 420 Eur/working hours
  13. Azure ML web services • Scoring + Training experiments •

    Batch + Request/Response • Default 20 concurrent requests per endpoint, can be increased to 200 • Cumbersome re-training • European datacenters less performant than USA based
  14. Azure ML pricing • 8.43 Eur per seat per month

    • 0.85 Eur per experimentation hour • Max experiment duration – 7 days • Web services • Standard S1 - 100,000 transactions, 25 hours, 10 web services – 85 Eur • Standard S3 – 50,000,000 transactions, 12,500 hours, 500 web services - 8500 Eur
  15. Azure Stream Analytics • Fully managed event-processing engine • Supported

    inputs: Event Hub, IoT Hub, Azure Blob Storage • Supported outputs: Azure Blob, Azure SQL, PowerBI, DocumentDB • Queries are expressed in a SQL-like query language • Development in Azure Portal or via Powershell (limited)
  16. Azure Stream Analytics - example query Two objectives 1. Store

    all messages to Azure Blob Storage 2. Display the count of messages in PowerBI SELECT * INTO [blobOutput] FROM [input] SELECT count(*) INTO [dashboardOutput] FROM [input] GROUP BY TumblingWindow(minute, 1)
  17. Azure Stream Analytics pricing • Price per streaming unit –

    0.102 Eur/hr (max throughput of a streaming unit 1 MB/s) • Plus data storage, data transfer, Event Hub’s, PowerBI subscription
  18. Azure Automation • Automates tasks that are performed in a

    cloud environment • For PowerShell fans • Example usage • Start / Stop clusters, VMs • Act on alerts • Clean-ups • Deployments (web-hooks)
  19. Azure Data Factory • PaaS orchestration service • Runs Hive

    / Pig / Spark / SQL SP / Copy / ML Scoring and Training • Multiple connectors: S3, Redshift, Cassandra, MongoDB, Teradata, SAP Hana, Salesforce • Provisions on-demand HDInsight clusters • Multi-tenant • JSON config files
  20. Azure Data Gateway • Enables moving data to/from an on-premises

    data store • Makes outbound HTTP-based connections to open internet • Single gateway is tied to only one Azure Data Factory
  21. Azure Data Factory pricing • Low frequency activities (no more

    than once in a day) 0.51 Eur per activity per month • High frequency activities (hourly, every 15 mins) 0.85 Eur per activity per month • Activities Involving Data Management Gateway Multiply by 2.5
  22. Deployment • Azure Management Certificate • Octopus, TeamCity, Jenkins, …

    • PowerShell ‘EM All • Azure-CLI struggles • Deployment templates available
  23. Azure Bot Service Templates: • Basic - a simple bot

    that uses dialogs to respond to user input. • Form - a bot that uses a guided conversation to collect user input. • Language understanding - A bot that uses natural language models (LUIS) to understand user intent. • Proactive - A bot that uses Azure Functions to alert bot users of events.
  24. Summary • Great for prototyping • Features are important, but

    don’t forget about security • Transaction, throughput, seat units – pricing headaches • Before using any PaaS search on Google AZURE_SERVICE_NAME + limitations [email protected]
  25. Azure Blob Storege vs Data Lake Store https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage Azure SQL

    vs Azure Warehouse http://www.jamesserra.com/archive/2016/08/azure-sql-database-vs-sql-data-warehouse/ Azure Document DB https://docs.microsoft.com/en-us/azure/documentdb/documentdb-introduction Azure Document DB SQL query https://docs.microsoft.com/en-us/azure/documentdb/documentdb-sql-query Azure HDInsight https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-introduction Azure Machine Learning https://docs.microsoft.com/en-us/azure/machine-learning/ Azure Machine Learning Blog http://blogs.technet.com/b/machinelearning Azure Stream Analytics https://docs.microsoft.com/en-us/azure/stream-analytics/ Azure Deployment Templates https://github.com/Azure/azure-quickstart-templates Azure Cognitive Services https://docs.microsoft.com/en-us/azure/cognitive-services/ Azure Bot Service https://docs.botframework.com/en-us/azure-bot-service/