Open source & Azure Data -- now and next | Open Azure Day 2020

Slide 1

Slide 1 text

Open source & Azure Data -- now and next
Sunil Kamath, Principal Director, Program Management, Azure Open Source Databases, Azure Data

Slide 2

Slide 2 text

“Microsoft is all-in on open source. We have been on a journey with open source, and today we are active in the open source ecosystem, we contribute to open source projects, and some of our most vibrant developer tools and frameworks are open source.”
- Satya Nadella, CEO, Microsoft

Slide 3

Slide 3 text

I imagine each of you has your own journey with open source

Slide 4

Slide 4 text

First worked on open source in Toronto, back in 2002

Slide 5

Slide 5 text

Helped to launch MySQL, Postgres, & MariaDB open source database services on Azure

Slide 6

Slide 6 text

Here today to represent our entire Azure Data team

Slide 7

Slide 7 text

Our mission at Azure Data
Deliver the one data platform that empowers the world to make sense of all data anywhere, responsibly, at any scale

Slide 8

Slide 8 text

Contribute
• Apache® Spark™, Postgres, ONNX

Slide 9

Slide 9 text

Contribute
• Apache® Spark™, Postgres, ONNX
Enable
• Making open source available as a service
• So you can focus on your application
• Partnerships = key

Slide 10

Slide 10 text

Contribute
• Apache® Spark™, Postgres, ONNX
Enable
• Making open source available as a service
• So you can focus on your application
• Partnerships = key
Integrate with
• Running on top of open source
• APIs to get access to your data

Slide 11

Slide 11 text

Door #1 - Contributing & giving back to the community

Slide 12

Slide 12 text

Contributing & giving back to the community
• Apache® Spark™
• Postgres
• ONNX

Slide 13

Slide 13 text

Apache® Spark™

Slide 14

Slide 14 text

Highlights of recent contributions to Apache Spark
• 2020: Open sourced Hyperspace, an indexing subsystem for Apache® Spark™
• 2020: MASC native Apache® Spark™ connector for Apache® Accumulo
• 2019: Open source Data Accelerator
• 2019: Invest in Databricks
• 2019: Announce .NET for Apache® Spark™

Slide 15

Slide 15 text

Postgres

Slide 16

Slide 16 text

Microsoft Azure Welcomes PostgreSQL Committers
aka.ms/blog-postgres-committers

Slide 17

Slide 17 text

Analyzing the Limits of Connection Scalability in Postgres
One common challenge with Postgres for those of you who manage busy Postgres databases, and those of you who foresee being in that situation, is that Postgres does not handle large numbers of connections particularly well. While it is possible to have a few thousand established connections without running into problems, there are some real and hard-to-avoid problems. Since joining Microsoft last year in the Azure Database for PostgreSQL team, where I work on open source Postgres, I have spent a lot of time analyzing and addressing some of the issues with connection scalability in Postgres. In this post I will explain why I think it is important to improve Postgres' handling of large numbers of connections, followed by an analysis of the different limiting aspects to connection scalability in Postgres.
aka.ms/pg-limits-connection-scalability

Slide 18

Slide 18 text

Citus is an open source extension to Postgres

Slide 19

Slide 19 text

Transforms Postgres into a distributed database

Slide 20

Slide 20 text

aka.ms/citus

Slide 21

Slide 21 text

What’s new in pg_auto_failover 1.4 for Postgres high availability
Postgres is an amazing RDBMS implementation. Postgres is open source and it’s one of the most standard-compliant SQL implementations that you will find (if not the most compliant). Postgres is packed with extensions to the standard, and it makes writing and deploying your applications simple and easy. After all, Postgres has your back and manages all the complexities of concurrent transactions for you. In this post I am excited to announce that a new version of pg_auto_failover has been released, pg_auto_failover 1.4. pg_auto_failover is an extension to Postgres built for high availability (HA) that monitors and manages failover for Postgres clusters. Our guiding principles from day one have been simplicity and correctness. Since pg_auto_failover is open source, you can find it on GitHub and it’s easy to try out. Let’s walk through what’s new in pg_auto_failover, and let’s explore the new capabilities you can take advantage of.
aka.ms/blog-pg-auto-failover-1.4

Slide 22

Slide 22 text

ONNX

Slide 23

Slide 23 text

• Built-in optimizations that deliver up to 17X faster inferencing and up to 1.4X faster training
• Support for a variety of frameworks, operating systems and hardware platforms
• Used in Office 365, Visual Studio and Bing, delivering over 20 billion inferences every day

Slide 24

Slide 24 text

Door #2 - Enabling you to use open source, as a service

Slide 25

Slide 25 text

Enabling you to use open source as a service
• MySQL
• Redis™
• Databricks™ & Apache Spark
• Hadoop, Apache Spark, Kafka, more
• Postgres

Slide 27

Slide 27 text

aka.ms/azure-mysql

Slide 28

Slide 28 text

aka.ms/tbd-by-andrea aka.ms/blog-mysql-what-is-flexible

Slide 29

Slide 29 text

[Figure: Flexible Server Architecture. Labels shown: regions (North Europe, East US, West US 2); Availability Zone 1, Availability Zone 2, Availability Zone 3 (AZ1, AZ2, AZ3); Flexible Server; MySQL; Linux VM; Premium Storage (Data, Logs); Locally-redundant backup storage; Azure VM; AKS; App Service]

Slide 31

Slide 31 text

Migrating Minecraft Realms from AWS to Azure Database for MySQL, with DMS

Slide 32

Slide 32 text

Redis™

Slide 33

Slide 33 text

Redis Enterprise + Azure Cache for Redis
Microsoft and Redis Labs partnered to create the first native integration between Redis Enterprise technology and a major cloud platform.

Slide 34

Slide 34 text

Redis Enterprise + Azure Cache for Redis
• Powerful Redis Modules: Enjoy access to advanced Redis modules with search, timeseries, and data analysis functionality
• 10X Larger Cache Sizes: Run on NVMe flash storage for caches up to 13 TB in size, all at a lower price per GB
• Enhanced Availability: Run Redis with confidence by experiencing uptime of up to 99.99 percent

Slide 36

Slide 36 text

What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure

Slide 37

Slide 37 text

Designed in collaboration with the founders of Apache® Spark™
• One-click set up; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)

Slide 38

Slide 38 text

Azure HDInsight

Slide 39

Slide 39 text

Azure HDInsight: Hadoop, Spark, Kafka, & more
Microsoft-supported distribution of Apache Hadoop and Spark. Full compatibility with the latest version of Hadoop.

Slide 40

Slide 40 text

Real-time data processing & intelligent insights for a taxi-hailing and food-delivery company

Slide 41

Slide 41 text

Postgres: The World’s Most Advanced Open Source Relational Database

Slide 42

Slide 42 text

aka.ms/azure-postgres

Slide 43

Slide 43 text

aka.ms/tbd-by-andrea aka.ms/blog-pg-what-is-flexible

Slide 44

Slide 44 text

About Flexible Server (Preview) & Postgres
Simplified developer experience
• Simple end-to-end deployment
• Fully compatible with community MySQL & Postgres
• Easy cost optimization with Stop/Start & Burstable servers
Build resilient apps across availability zones
• Zone redundant HA
• Fast failover with zero data loss
• Co-locate app & database in same zone
Maximum control for your databases
• Network isolation with VNET integration
• More server parameters for fine-grained tuning
• Custom maintenance windows

Slide 45

Slide 45 text

Citus extension to Postgres

Slide 46

Slide 46 text

Hyperscale (Citus) is now available as a built-in deployment option in Azure Database for PostgreSQL

Slide 47

Slide 47 text

SELECT create_distributed_table( 'table_name', 'distribution_column');
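The call above tells Citus to split a table into shards keyed on the distribution column. The Python sketch below only illustrates the general idea of hash-based placement, under stated assumptions: the shard count of 4, the use of zlib.crc32, the modulo step, and the helper name shard_for are all illustrative stand-ins, not Citus's actual hash function, defaults, or internals.

```python
import zlib

SHARD_COUNT = 4  # illustrative value, not a Citus default


def shard_for(distribution_value, shard_count=SHARD_COUNT):
    """Map a distribution-column value to a shard index.

    Hypothetical helper: a stable hash of the value, reduced modulo the
    number of shards, so equal values always land on the same shard.
    """
    key = str(distribution_value).encode("utf-8")
    return zlib.crc32(key) % shard_count


# Made-up rows; company_id plays the role of 'distribution_column'.
rows = [
    {"company_id": 1, "spend": 10.0},
    {"company_id": 2, "spend": 5.0},
    {"company_id": 1, "spend": 30.0},
    {"company_id": 3, "spend": 7.5},
]

# Place every row on the shard chosen by its distribution-column value.
shards = {i: [] for i in range(SHARD_COUNT)}
for row in rows:
    shards[shard_for(row["company_id"])].append(row)
```

Because placement depends only on the distribution-column value, all rows for one company end up together, which is what lets per-company queries be answered from a single shard.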

Slide 48

Slide 48 text

How Citus distributes queries across the database cluster

APPLICATION
  SELECT company_id, avg(spend) AS avg_campaign_spend
  FROM campaigns
  GROUP BY company_id;

COORDINATOR NODE (with METADATA)

WORKER NODES (W1, W2, W3, … Wn)
  SELECT company_id, sum(spend), count(spend) … FROM campaigns_2009 …
  SELECT company_id, sum(spend), count(spend) … FROM campaigns_2001 …
  SELECT company_id, sum(spend), count(spend) … FROM campaigns_2017 …
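In the slide above, the application asks for avg(spend) but each worker query returns sum(spend) and count(spend). A minimal Python sketch of why that works follows: per-shard sums and counts can be added together and divided once at the end. The sample data and the function names worker_partial and coordinator_merge are hypothetical, and the sketch assumes rows for one company may sit on more than one shard; it is not Citus code.

```python
from collections import defaultdict


def worker_partial(rows):
    """Per-shard step: sum(spend) and count(spend) grouped by company_id."""
    partial = defaultdict(lambda: [0.0, 0])
    for company_id, spend in rows:
        partial[company_id][0] += spend
        partial[company_id][1] += 1
    return {cid: (total, count) for cid, (total, count) in partial.items()}


def coordinator_merge(partials):
    """Coordinator step: add up partial sums and counts, then divide once."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for cid, (total, count) in partial.items():
            totals[cid][0] += total
            totals[cid][1] += count
    return {cid: total / count for cid, (total, count) in totals.items()}


# Made-up (company_id, spend) rows, one list per shard.
shards = [
    [(1, 10.0), (1, 30.0), (2, 5.0)],
    [(1, 20.0), (2, 15.0)],
    [(2, 10.0)],
]

avg_campaign_spend = coordinator_merge(worker_partial(rows) for rows in shards)
```

Sending sums and counts rather than per-shard averages keeps the result exact even when shards hold different numbers of rows for the same company.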

Slide 49

Slide 49 text

“It was a whole different environment once we moved to Hyperscale (Citus). Queries that often took up to 10 minutes with the old system are now processed instantaneously.”
- Sami Räsänen, Product Owner and Team Lead, HSL
Read more at aka.ms/story-azure-postgres-hsl

Slide 50

Slide 50 text

“Distributed PostgreSQL is a game changer. We can support more than 6M queries every day, on 2 PB of data. With Citus, response times for 75% of queries are less than 0.2 seconds.”

Slide 51

Slide 51 text

Architecting petabyte-scale analytics by scaling out Postgres on Azure with the Citus extension
aka.ms/blog-petabyte-scale-analytics

Slide 52

Slide 52 text

Kubernetes

Slide 53

Slide 53 text

Azure Arc enabled data services
Azure data services in your datacenter, multi-cloud, and edge
• Elastic scale
• PostgreSQL Hyperscale
• Scale up, scale out on demand
• Automation at scale
• Always current
• Self-service provisioning in seconds
• Automated updates
• Evergreen SQL Managed Instance
• Unified management
• Single view for on-prem, clouds, and edge
• Consistent tools and workflows
• Built-in monitoring and security
• Connected or Disconnected

Slide 54

Slide 54 text

Building a Hybrid data platform with Azure Arc enabled data services
Travis Wright, Microsoft
aka.ms/video-azure-arc-ignite20

Slide 55

Slide 55 text

Door #3 – Integrate with open source

Slide 56

Slide 56 text

Integrate with open source
• Cassandra & Gremlin

Slide 57

Slide 57 text

Gremlin Cassandra

Slide 58

Slide 58 text

Large retailer uses Gremlin API to ingest large volumes of localization & inventory data

Slide 59

Slide 59 text

Spark Postgres Citus ONNX MySQL Redis Hadoop Kafka Kubernetes Gremlin Cassandra

Slide 60

Slide 60 text

So now what?

Slide 61

Slide 61 text

Useful data talks at Open Azure Day
• Big Data industry best practices – a deep look into Cloudera Data Platform
  Ram Venkatesh, Jonathan Hsieh-Demo, Priyank Patel / Cloudera
• Building resilient, mission-critical applications with Azure Database for PostgreSQL
  Sridhar Ranganathan
• Architecting secure enterprise-ready solutions for the cloud with Azure and MySQL
  Andrea Lam
• Open Source at Microsoft
  John Gossman, Stormy Peters
• How Minecraft Realms migrated to MySQL on Azure for improved gameplay, better interoperability, & cost efficiency
  Amol Bhatnagar
• Build Mission Critical Apps with the New Azure Cache for Redis, Enterprise Tiers
  Amiram Mizne / Redis Labs

Slide 62

Slide 62 text

Stephen Hawking Inspires Developers Worldwide
aka.ms/video-hawking-inspires-devs

Slide 63

Slide 63 text

“If I have seen further it is by standing on the shoulders of giants.” —Isaac Newton