
Open source & Azure Data -- now and next | Open Azure Day 2020 | Sunil Kamath


This discussion with Sunil Kamath explores the Azure Data mission and our commitment to making it easier to build and deploy your apps in Azure on top of open source technologies such as Kubernetes and Postgres. Sunil will cover how open source fits into the Azure Data team’s efforts and how the team is working on key technologies (e.g., MySQL, Redis, Cosmos DB) to help you be productive and innovate. You’ll hear examples of how we’re working with partners and customers to make an impact, and how we’re working to make Azure the best place to run open source. You’ll learn about several recent projects, with real-world examples of how our customers are responding to an ever-changing environment and how Azure Data is enabling them.

Azure Database for PostgreSQL

November 18, 2020

Transcript

  1. Open source & Azure Data -- now and next

    Sunil Kamath, Principal Director, Program Management, Azure Open Source Databases, Azure Data
  2. “Microsoft is all-in on open source. We have been on a journey with open source, and today we are active in the open source ecosystem, we contribute to open source projects, and some of our most vibrant developer tools and frameworks are open source.”

    - Satya Nadella, CEO, Microsoft
  3. Our mission at Azure Data

    Deliver the one data platform that empowers the world to make sense of all data anywhere, responsibly, at any scale.
  4. Contribute: Apache® Spark™, Postgres, ONNX

    Enable: making open source available as a service, so you can focus on your application • Partnerships = key
  5. Contribute: Apache® Spark™, Postgres, ONNX

    Enable: making open source available as a service, so you can focus on your application • Partnerships = key • Integrate with: running on top of open source, with APIs to get access to your data
  6. Highlights of recent contributions to Apache Spark

    2020: Open-sourced Hyperspace, an indexing subsystem for Apache® Spark™ • 2020: MASC, a native Apache® Spark™ connector for Apache® Accumulo • 2019: Open-sourced Data Accelerator • 2019: Invested in Databricks • 2019: Announced .NET for Apache® Spark™
  7. Analyzing the Limits of Connection Scalability in Postgres

    One common challenge with Postgres for those of you who manage busy Postgres databases, and for those of you who foresee being in that situation, is that Postgres does not handle large numbers of connections particularly well. While it is possible to have a few thousand established connections without running into problems, there are some real and hard-to-avoid problems. Since joining the Azure Database for PostgreSQL team at Microsoft last year, where I work on open source Postgres, I have spent a lot of time analyzing and addressing some of the issues with connection scalability in Postgres. In this post I will explain why I think it is important to improve Postgres' handling of large numbers of connections, followed by an analysis of the different aspects that limit connection scalability in Postgres. aka.ms/pg-limits-connection-scalability
  8. What’s new in pg_auto_failover 1.4 for Postgres high availability

    Postgres is an amazing RDBMS implementation. Postgres is open source and it’s one of the most standards-compliant SQL implementations that you will find (if not the most compliant). Postgres is packed with extensions to the standard, and it makes writing and deploying your applications simple and easy. After all, Postgres has your back and manages all the complexities of concurrent transactions for you. In this post I am excited to announce that a new version of pg_auto_failover has been released: pg_auto_failover 1.4. pg_auto_failover is an extension to Postgres built for high availability (HA) that monitors and manages failover for Postgres clusters. Our guiding principles from day one have been simplicity and correctness. Since pg_auto_failover is open source, you can find it on GitHub, and it’s easy to try out. Let’s walk through what’s new in pg_auto_failover and explore the new capabilities you can take advantage of. aka.ms/blog-pg-auto-failover-1.4
  9. Built-in optimizations that deliver up to 17X faster inferencing and up to 1.4X faster training

    Support for a variety of frameworks, operating systems, and hardware platforms • Used in Office 365, Visual Studio, and Bing, delivering over 20 billion inferences every day
  10. Enabling you to use open source as a service

    MySQL • Postgres • Redis™ • Databricks™ & Apache Spark • Hadoop, Apache Spark, Kafka, and more
  11. Flexible Server Architecture (MySQL)

    [Diagram: a region (labels shown: East US, West US 2, North Europe) spanning Availability Zones 1, 2, and 3 (AZ1, AZ2, AZ3). Apps on a Linux VM, Azure VM, AKS, or App Service sit alongside a MySQL Flexible Server whose data and logs are on Premium Storage, with locally-redundant backup storage.]
  12. Redis Enterprise + Azure Cache for Redis

    Microsoft and Redis Labs partnered to create the first native integration between Redis Enterprise technology and a major cloud platform.
  13. Redis Enterprise + Azure Cache for Redis

    Powerful Redis modules: enjoy access to advanced Redis modules with search, time series, and data analysis functionality • 10X larger cache sizes: run on NVMe flash storage for caches up to 13 TB in size, all at a lower price per GB • Enhanced availability: run Redis with confidence, with uptime of up to 99.99 percent
  14. What is Azure Databricks?

    A fast, easy, and collaborative Apache® Spark™-based analytics platform optimized for Azure
  15. Designed in collaboration with the founders of Apache® Spark™

    One-click setup; streamlined workflows • Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts • Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage) • Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
  16. Azure HDInsight: Hadoop, Spark, Kafka, & more

    Microsoft-supported distribution of Apache Hadoop and Spark • Full compatibility with the latest version of Hadoop
  17. About Flexible Server (Preview) & Postgres

    Simplified developer experience: simple end-to-end deployment • fully compatible with community MySQL & Postgres • easy cost optimization with Stop/Start & Burstable servers

    Build resilient apps across availability zones: zone-redundant HA • fast failover with zero data loss • co-locate app & database in the same zone

    Maximum control for your databases: network isolation with VNET integration • more server parameters for fine-grained tuning • custom maintenance windows
  18. How Citus distributes queries across the database cluster

    [Diagram: the application sends SELECT company_id, avg(spend) AS avg_campaign_spend FROM campaigns GROUP BY company_id; to the coordinator node, which holds the metadata. The coordinator sends fragment queries to the worker nodes (W1, W2, W3 … Wn), one per shard, e.g. SELECT company_id, sum(spend), count(spend) … FROM campaigns_2009 …, SELECT company_id, sum(spend), count(spend) … FROM campaigns_2001 …, and SELECT company_id, sum(spend), count(spend) … FROM campaigns_2017 …]
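    The slide shows the workers returning sum(spend) and count(spend) instead of avg(spend). A minimal Python sketch of why that rewrite is needed, assuming toy in-memory shards: each worker computes per-company partial sums and counts, and the coordinator adds the partials and divides. The shard contents and the functions worker_partial and coordinator_avg are hypothetical illustrations, not Citus internals or APIs.

```python
from collections import defaultdict

# Hypothetical shard contents: (company_id, spend) rows held by different workers.
shards = {
    "campaigns_2009": [(1, 100.0), (2, 40.0)],
    "campaigns_2001": [(1, 300.0)],
    "campaigns_2017": [(2, 80.0), (2, 60.0), (3, 10.0)],
}


def worker_partial(rows):
    """Hypothetical per-shard fragment, mirroring
    SELECT company_id, sum(spend), count(spend) ... GROUP BY company_id."""
    partial = defaultdict(lambda: [0.0, 0])
    for company_id, spend in rows:
        partial[company_id][0] += spend  # running sum(spend)
        partial[company_id][1] += 1      # running count(spend)
    return {cid: (s, c) for cid, (s, c) in partial.items()}


def coordinator_avg(partials):
    """Hypothetical merge step: add sums and counts per company, then divide.

    Averaging the per-shard averages would be wrong whenever shards hold
    different numbers of rows, which is why workers return sum and count.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for company_id, (s, c) in partial.items():
            totals[company_id][0] += s
            totals[company_id][1] += c
    return {cid: s / c for cid, (s, c) in sorted(totals.items())}


if __name__ == "__main__":
    partials = [worker_partial(rows) for rows in shards.values()]
    print(coordinator_avg(partials))  # {1: 200.0, 2: 60.0, 3: 10.0}
```

    For company 2, the per-shard averages are 40.0 and 70.0, and averaging those would give 55.0; merging sums and counts gives the correct (40 + 80 + 60) / 3 = 60.0.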
  19. “It was a whole different environment once we moved to Hyperscale (Citus). Queries that often took up to 10 minutes with the old system are now processed instantaneously.”

    - Sami Räsänen, Product Owner and Team Lead, HSL. Read more at aka.ms/story-azure-postgres-hsl
  20. “Distributed PostgreSQL is a game changer. We can support more than 6M queries every day, on 2 PB of data. With Citus, response times for 75% of queries are less than 0.2 seconds.”
  21. Architecting petabyte-scale analytics by scaling out Postgres on Azure with the Citus extension

    aka.ms/blog-petabyte-scale-analytics
  22. Azure Arc enabled data services

    Azure data services in your datacenter, multi-cloud, and edge, connected or disconnected • Elastic scale: PostgreSQL Hyperscale; scale up, scale out on demand • Automation at scale: self-service provisioning in seconds; automated updates • Always current: Evergreen SQL Managed Instance • Unified management: single view for on-prem, clouds, and edge; consistent tools and workflows; built-in monitoring and security
  23. Building a hybrid data platform with Azure Arc enabled data services

    Travis Wright, Microsoft • aka.ms/video-azure-arc-ignite20
  24. Useful data talks at Open Azure Day

    Big Data industry best practices – a deep look into Cloudera Data Platform (Ram Venkatesh, Jonathan Hsieh-Demo, Priyank Patel / Cloudera) • Building resilient, mission-critical applications with Azure Database for PostgreSQL (Sridhar Ranganathan) • Architecting secure enterprise-ready solutions for the cloud with Azure and MySQL (Andrea Lam) • Open Source at Microsoft (John Gossman, Stormy Peters) • How Minecraft Realms migrated to MySQL on Azure for improved gameplay, better interoperability, & cost efficiency (Amol Bhatnagar) • Build Mission Critical Apps with the New Azure Cache for Redis, Enterprise Tiers (Amiram Mizne / Redis Labs)
  25. “If I have seen further it is by standing on the shoulders of giants.”

    - Isaac Newton
  26. Thank you

    Sunil Kamath • @kamathsun @AzureDBMySQL @AzureDBPostgres • © Copyright Microsoft Corporation. All rights reserved.