Open source & Azure Data -- now and next | Open Azure Day 2020 | Sunil Kamath

This discussion with Sunil Kamath explores the Azure Data mission and our commitment to make it easier to build and deploy your apps in Azure on top of open source technologies such as Kubernetes and Postgres. Sunil will cover how open source fits into the Azure Data team’s efforts and how the Azure Data team is working on key technologies (e.g., MySQL, Redis, CosmosDB) to help you be productive and innovate. You’ll hear examples of how we’re working with partners and customers to make an impact, and how we’re working to make Azure the best place to run open source. You’ll learn about several recent projects, bringing real-world examples of how our customers are responding to an ever-changing environment and how Azure Data is enabling them.

Azure Postgres

November 18, 2020

Transcript

  1. Open source & Azure Data— now and next Sunil Kamath,

    Principal Director, Program Management, Azure Open Source Databases, Azure Data
  2. “Microsoft is all-in on open source. We have been on

    a journey with open source, and today we are active in the open source ecosystem, we contribute to open source projects, and some of our most vibrant developer tools and frameworks are open source.” - Satya Nadella, CEO Microsoft
  3. I imagine many of you each have your own journey

    with open source
  4. First worked on open source in Toronto, back in 2002

  5. Helped to launch MySQL, Postgres, & MariaDB open source database

    services on Azure
  6. Here today to represent our entire Azure Data team

  7. Our mission at Azure Data

    Deliver the one data platform that empowers the world to make sense of all data anywhere, responsibly, at any scale
  8. Contribute • Apache® Spark™, Postgres, ONNX

  9. Contribute • Apache® Spark™, Postgres, ONNX Enable • Making open

    source available as a service • So you can focus on your application • Partnerships = key
  10. Contribute • Apache® Spark™, Postgres, ONNX Enable • Making open

    source available as a service • So you can focus on your application • Partnerships = key Integrate with • Running on top of open source • APIs to get access to your data
  11. Door #1 - Contributing & giving back to the community

  12. Contributing & giving back to the community Apache® Spark™ Postgres

    ONNX
  13. Highlights of recent contributions to Apache Spark

    • 2020: Open sourced Hyperspace, an indexing subsystem for Apache® Spark™ • 2020: MASC, native Apache® Spark™ connector for Apache® Accumulo • 2019: Open source Data Accelerator • 2019: Invest in Databricks • 2019: Announce .NET for Apache® Spark™
  14. Postgres

  15. Microsoft Azure Welcomes PostgreSQL Committers aka.ms/blog-postgres-committers

  16. Analyzing the Limits of Connection Scalability in Postgres One common

    challenge with Postgres for those of you who manage busy Postgres databases, and those of you who foresee being in that situation, is that Postgres does not handle large numbers of connections particularly well. While it is possible to have a few thousand established connections without running into problems, there are some real and hard-to-avoid problems. Since joining Microsoft last year in the Azure Database for PostgreSQL team—where I work on open source Postgres—I have spent a lot of time analyzing and addressing some of the issues with connection scalability in Postgres. In this post I will explain why I think it is important to improve Postgres' handling of large number of connections. Followed by an analysis of the different limiting aspects to connection scalability in Postgres. aka.ms/pg-limits-connection-scalability
  17. Citus is an open source extension to Postgres

  18. Transforms Postgres into a distributed database

  19. aka.ms/citus

  20. What’s new in pg_auto_failover 1.4 for Postgres high availability Postgres

    is an amazing RDBMS implementation. Postgres is open source and it’s one of the most standard-compliant SQL implementations that you will find (if not the most compliant.) Postgres is packed with extensions to the standard, and it makes writing and deploying your applications simple and easy. After all, Postgres has your back and manages all the complexities of concurrent transactions for you. In this post I am excited to announce that a new version of pg_auto_failover has been released, pg_auto_failover 1.4. pg_auto_failover is an extension to Postgres built for high availability (HA), that monitors and manages failover for Postgres clusters. Our guiding principles from day one have been simplicity, and correctness. Since pg_auto_failover is open source, you can find it on GitHub and it’s easy to try out. Let’s walk through what’s new in pg_auto_failover, and let’s explore the new capabilities you can take advantage of. aka.ms/blog-pg-auto-failover-1.4
  21. ONNX

  22. Built-in optimizations that deliver up to 17X faster inferencing and

    up to 1.4X faster training Support for a variety of frameworks, operating systems and hardware platforms Used in Office 365, Visual Studio and Bing, delivering over 20 billion inferences every day
  23. Door #2 - Enabling you to use open source, as

    a service
  24. Enabling you to use open source as a service MySQL

    Redis™ Databricks™ & Apache Spark Hadoop, Apache Spark, Kafka, more Postgres
  25. None
  26. aka.ms/azure-mysql

  27. aka.ms/tbd-by-andrea aka.ms/blog-mysql-what-is-flexible

  28. [Diagram: Flexible Server Architecture]

    Labels: Region (North Europe, East US, West US 2); Availability Zones 1, 2, and 3 (AZ1, AZ2, AZ3); MySQL Flexible Server on a Linux VM; Premium Storage (data, logs); locally-redundant backup storage; clients on Azure VM, AKS, and App Service
  29. None
  30. Migrating Minecraft Realms from AWS to Azure Database for MySQL,

    with DMS
  31. Redis ™

  32. Redis Enterprise + Azure Cache for Redis Microsoft and Redis

    Labs partnered to create the first native integration between Redis Enterprise technology and a major cloud platform
  33. Redis Enterprise + Azure Cache for Redis

    • Powerful Redis Modules: Enjoy access to advanced Redis modules with search, timeseries, and data analysis functionality • 10X Larger Cache Sizes: Run on NVMe flash storage for caches up to 13 TB in size, all at a lower price per GB • Enhanced Availability: Run Redis with confidence by experiencing uptime of up to 99.99 percent
  34. None
  35. What is Azure Databricks? A fast, easy and collaborative Apache®

    Spark™ based analytics platform optimized for Azure
  36. Designed in collaboration with the founders of Apache® Spark™ •

    One-click set up; streamlined workflows • Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. • Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage) • Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
  37. Azure HDInsight

  38. Azure HDInsight: Hadoop, Spark, Kafka, & more Microsoft supported distribution

    of Apache Hadoop and Spark. Full compatibility with latest version of Hadoop
  39. Real-time data processing & intelligent insights for taxi hailing and

    food delivery company
  40. Postgres The World’s Most Advanced Open Source Relational Database

  41. aka.ms/azure-postgres

  42. aka.ms/tbd-by-andrea aka.ms/blog-pg-what-is-flexible

  43. About Flexible Server (Preview) & Postgres

    Simplified developer experience • Simple end-to-end deployment • Fully compatible with community MySQL & Postgres • Easy cost optimization with Stop/Start & Burstable servers

    Build resilient apps across availability zones • Zone redundant HA • Fast failover with zero data loss • Co-locate app & database in same zone

    Maximum control for your databases • Network isolation with VNET integration • More server parameters for fine-grained tuning • Custom maintenance windows
  44. Citus extension to Postgres

  45. Hyperscale (Citus) is now available as a built-in deployment option

    in Azure Database for PostgreSQL
  46. SELECT create_distributed_table( 'table_name', 'distribution_column');

  47. How Citus distributes queries across the database cluster

    APPLICATION: SELECT company_id, avg(spend) AS avg_campaign_spend FROM campaigns GROUP BY company_id;

    COORDINATOR NODE (METADATA)

    WORKER NODES W1 W2 W3 … Wn: SELECT company_id, sum(spend), count(spend) … FROM campaigns_2009 … / SELECT company_id, sum(spend), count(spend) … FROM campaigns_2001 … / SELECT company_id, sum(spend), count(spend) … FROM campaigns_2017 …
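The slide shows the application asking for avg(spend) while each worker query returns sum(spend) and count(spend). A minimal Python sketch of why that split works is given here. It is an illustration only, not Citus code: the shard contents, the `SHARDS` dictionary, and the function names `worker_partial` and `coordinator_avg` are all hypothetical. The point is that an average cannot be merged from per-shard averages, but it can be rebuilt from per-shard sums and counts.

```python
from collections import defaultdict

# Hypothetical shard contents: each shard holds (company_id, spend) rows.
# The shard names mirror the slide; the data is made up for illustration.
SHARDS = {
    "campaigns_2001": [(1, 10.0), (1, 30.0), (2, 5.0)],
    "campaigns_2009": [(2, 15.0), (3, 8.0)],
    "campaigns_2017": [(1, 20.0), (3, 12.0)],
}


def worker_partial(rows):
    """Worker step: per-shard sum(spend) and count(spend) by company_id."""
    partial = defaultdict(lambda: [0.0, 0])
    for company_id, spend in rows:
        partial[company_id][0] += spend
        partial[company_id][1] += 1
    return {cid: (total, count) for cid, (total, count) in partial.items()}


def coordinator_avg(shards):
    """Coordinator step: merge partial sums and counts, then divide."""
    totals = defaultdict(lambda: [0.0, 0])
    for rows in shards.values():
        for cid, (total, count) in worker_partial(rows).items():
            totals[cid][0] += total
            totals[cid][1] += count
    return {cid: total / count for cid, (total, count) in sorted(totals.items())}


avg_campaign_spend = coordinator_avg(SHARDS)
```

In this sketch the workers run one after another in a loop; in a real cluster the per-shard queries would run in parallel on separate nodes, which is where the speedup comes from.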
  48. “It was a whole different environment once we moved to

    Hyperscale (Citus). Queries that often took up to 10 minutes with the old system are now processed instantaneously.” - Sami Räsänen: Product Owner and Team Lead, HSL Read more at aka.ms/story-azure-postgres-hsl
  49. “Distributed PostgreSQL is a game changer. We can support more

    than 6M queries every day, on 2 PB of data. With Citus, response times for 75% of queries are less than 0.2 seconds.”
  50. Architecting petabyte-scale analytics by scaling out Postgres on Azure with

    the Citus extension aka.ms/blog-petabyte-scale-analytics
  51. Kubernetes

  52. Elastic scale PostgreSQL Hyperscale Scale up, scale out on demand

    Automation at scale Always current Self-service provisioning in seconds Automated updates Evergreen SQL Managed Instance Unified management Single view for on-prem, clouds, and edge Consistent tools and workflows Built-in monitoring and security Azure Arc enabled data services Azure data services in your datacenter, multi-cloud, and edge Connected or Disconnected
  53. Building a Hybrid data platform with Azure Arc enabled data

    services Travis Wright, Microsoft aka.ms/video-azure-arc-ignite20
  54. Door #3 – Integrate with open source

  55. Integrate with open source Cassandra & Gremlin

  56. Gremlin Cassandra

  57. Large retailer uses Gremlin API to ingest large volumes of

    localization & inventory data
  58. Spark Postgres Citus ONNX MySQL Redis Hadoop Kafka Kubernetes Gremlin

    Cassandra
  59. So now what?

  60. Useful data talks at Open Azure Day

    • Big Data industry best practices – a deep look into Cloudera Data Platform (Ram Venkatesh, Jonathan Hsieh-Demo, Priyank Patel / Cloudera) • Building resilient, mission-critical applications with Azure Database for PostgreSQL (Sridhar Ranganathan) • Architecting secure enterprise-ready solutions for the cloud with Azure and MySQL (Andrea Lam) • Open Source at Microsoft (John Gossman, Stormy Peters) • How Minecraft Realms migrated to MySQL on Azure for improved gameplay, better interoperability, & cost efficiency (Amol Bhatnagar) • Build Mission Critical Apps with the New Azure Cache for Redis, Enterprise Tiers (Amiram Mizne / Redis Labs)
  61. Stephen Hawking Inspires Developers Worldwide aka.ms/video-hawking-inspires-devs

  62. “If I have seen further it is by standing on

    the shoulders of giants.” —Isaac Newton
  63. © Copyright Microsoft Corporation. All rights reserved. danke schön dank

    u merci teşekkürler thank you grazie gracias tack @kamathsun @AzureDBMySQL @AzureDBPostgres Sunil Kamath