Open source & Azure Data -- now and next | Open Azure Day 2020 | Sunil Kamath

Open source & Azure Data -- now and next
Sunil Kamath, Principal Director, Program Management, Azure Open Source Databases, Azure Data

“Microsoft is all-in on open source. We have been on a journey with open source, and today we are active in the open source ecosystem, we contribute to open source projects, and some of our most vibrant developer tools and frameworks are open source.”
- Satya Nadella, CEO, Microsoft

I imagine many of you each have your own journey
with open source

First worked on open source in Toronto, back in 2002

Helped to launch MySQL, Postgres, & MariaDB open source database
services on Azure

Here today to represent our entire Azure Data team

Our mission at Azure Data
Deliver the one data platform that empowers the world to make sense of all data anywhere, responsibly, at any scale

Contribute
• Apache® Spark™, Postgres, ONNX
Enable
• Making open source available as a service
• So you can focus on your application
• Partnerships = key
Integrate with
• Running on top of open source
• APIs to get access to your data

Door #1 - Contributing & giving back to the community

Contributing & giving back to the community: Apache® Spark™, Postgres, ONNX

Highlights of recent contributions to Apache Spark
• 2020: Open sourced Hyperspace, an indexing subsystem for Apache® Spark™
• 2020: MASC, a native Apache® Spark™ connector for Apache® Accumulo
• 2019: Open source Data Accelerator
• 2019: Invest in Databricks
• 2019: Announce .NET for Apache® Spark™

Postgres

Microsoft Azure Welcomes PostgreSQL Committers aka.ms/blog-postgres-committers

Analyzing the Limits of Connection Scalability in Postgres
One common challenge with Postgres for those of you who manage busy Postgres databases, and those of you who foresee being in that situation, is that Postgres does not handle large numbers of connections particularly well. While it is possible to have a few thousand established connections without running into problems, there are some real and hard-to-avoid problems. Since joining Microsoft last year in the Azure Database for PostgreSQL team, where I work on open source Postgres, I have spent a lot of time analyzing and addressing some of the issues with connection scalability in Postgres. In this post I will explain why I think it is important to improve Postgres' handling of large numbers of connections, followed by an analysis of the different limiting aspects to connection scalability in Postgres.
aka.ms/pg-limits-connection-scalability

Citus is an open source extension to Postgres

Transforms Postgres into a distributed database

aka.ms/citus

What’s new in pg_auto_failover 1.4 for Postgres high availability
Postgres is an amazing RDBMS implementation. Postgres is open source and it’s one of the most standard-compliant SQL implementations that you will find (if not the most compliant.) Postgres is packed with extensions to the standard, and it makes writing and deploying your applications simple and easy. After all, Postgres has your back and manages all the complexities of concurrent transactions for you. In this post I am excited to announce that a new version of pg_auto_failover has been released, pg_auto_failover 1.4. pg_auto_failover is an extension to Postgres built for high availability (HA), that monitors and manages failover for Postgres clusters. Our guiding principles from day one have been simplicity, and correctness. Since pg_auto_failover is open source, you can find it on GitHub and it’s easy to try out. Let’s walk through what’s new in pg_auto_failover, and let’s explore the new capabilities you can take advantage of.
aka.ms/blog-pg-auto-failover-1.4

• Built-in optimizations that deliver up to 17X faster inferencing and up to 1.4X faster training
• Support for a variety of frameworks, operating systems and hardware platforms
• Used in Office 365, Visual Studio and Bing, delivering over 20 billion inferences every day

Door #2 - Enabling you to use open source, as
a service

Enabling you to use open source as a service
• MySQL
• Redis™
• Databricks™ & Apache Spark
• Hadoop, Apache Spark, Kafka, more
• Postgres

aka.ms/azure-mysql

aka.ms/tbd-by-andrea aka.ms/blog-mysql-what-is-flexible

[Diagram: Flexible Server Architecture. Labels shown: regions North Europe, East US, West US 2; Availability Zones 1, 2, 3 (AZ1, AZ2, AZ3); MySQL Flexible Server on a Linux VM; Premium Storage (data, logs); locally-redundant backup storage; Azure VM, AKS, App Service.]

Migrating Minecraft Realms from AWS to Azure Database for MySQL,
with DMS

Redis™

Redis Enterprise + Azure Cache for Redis
Microsoft and Redis Labs partnered to create the first native integration between Redis Enterprise technology and a major cloud platform

Redis Enterprise + Azure Cache for Redis
• Powerful Redis Modules: Enjoy access to advanced Redis modules with search, timeseries, and data analysis functionality
• 10X Larger Cache Sizes: Run on NVMe flash storage for caches up to 13 TB in size, all at a lower price per GB
• Enhanced Availability: Run Redis with confidence by experiencing uptime of up to 99.99 percent

What is Azure Databricks? A fast, easy and collaborative Apache®
Spark™ based analytics platform optimized for Azure

Designed in collaboration with the founders of Apache® Spark™
• One-click set up; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)

Azure HDInsight

Azure HDInsight: Hadoop, Spark, Kafka, & more
Microsoft-supported distribution of Apache Hadoop and Spark. Full compatibility with the latest version of Hadoop.

Real-time data processing & intelligent insights for a taxi-hailing and food delivery company

Postgres: The World’s Most Advanced Open Source Relational Database

aka.ms/azure-postgres

aka.ms/tbd-by-andrea aka.ms/blog-pg-what-is-flexible

About Flexible Server (Preview) & Postgres
Simplified developer experience
• Simple end-to-end deployment
• Fully compatible with community MySQL & Postgres
• Easy cost optimization with Stop/Start & Burstable servers
Build resilient apps across availability zones
• Zone redundant HA
• Fast failover with zero data loss
• Co-locate app & database in same zone
Maximum control for your databases
• Network isolation with VNET integration
• More server parameters for fine-grained tuning
• Custom maintenance windows

Citus extension to Postgres

Hyperscale (Citus) is now available as a built-in deployment option
in Azure Database for PostgreSQL

SELECT create_distributed_table( 'table_name', 'distribution_column');
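
As a rough illustration of what choosing a distribution column means, the Python sketch below routes rows to shards by hashing that column, so rows with the same value always land together. It is a simplification and not Citus's implementation: the shard count, the CRC32 hash, the modulo placement, and the helper names shard_for and distribute are all assumptions made for this example.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical shard count, chosen only for illustration


def shard_for(distribution_value, shard_count=SHARD_COUNT):
    """Map a distribution-column value to a shard index with a stable hash."""
    # CRC32 is used because it is deterministic across runs; it is an
    # assumption of this sketch, not the hash function Citus uses.
    digest = zlib.crc32(str(distribution_value).encode("utf-8"))
    return digest % shard_count


def distribute(rows, distribution_column):
    """Group rows by the shard that their distribution-column value maps to."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[distribution_column])].append(row)
    return dict(shards)


# Made-up rows, distributed on company_id.
rows = [
    {"company_id": 1, "spend": 10.0},
    {"company_id": 2, "spend": 40.0},
    {"company_id": 1, "spend": 30.0},
]
placement = distribute(rows, "company_id")
for shard_index, shard_rows in sorted(placement.items()):
    print(shard_index, shard_rows)
```

Because both company_id = 1 rows hash to the same shard, a query filtered or grouped on company_id can be answered without moving rows between shards, which is the usual reason to pick such a column as the distribution column.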

How Citus distributes queries across the database cluster

APPLICATION
    SELECT company_id, avg(spend) AS avg_campaign_spend
    FROM campaigns
    GROUP BY company_id;

COORDINATOR NODE (METADATA)

WORKER NODES (W1, W2, W3, … Wn)
    SELECT company_id, sum(spend), count(spend) … FROM campaigns_2009 …
    SELECT company_id, sum(spend), count(spend) … FROM campaigns_2001 …
    SELECT company_id, sum(spend), count(spend) … FROM campaigns_2017 …
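
The slide above shows one avg(spend) query from the application becoming per-shard sum(spend), count(spend) queries on the workers. The Python sketch below mimics that pattern with an in-memory SQLite database standing in for the worker nodes. The shard table names come from the slide; the rows, the helper names make_worker and distributed_avg, and the use of SQLite are assumptions for illustration only and say nothing about how Citus is implemented internally.

```python
import sqlite3
from collections import defaultdict

# Shard table names mirror the slide; the (company_id, spend) rows are made up.
SHARDS = {
    "campaigns_2001": [(1, 10.0), (2, 40.0)],
    "campaigns_2009": [(1, 30.0)],
    "campaigns_2017": [(2, 20.0), (1, 20.0)],
}


def make_worker(shards):
    """Create an in-memory SQLite database that plays the role of the workers."""
    conn = sqlite3.connect(":memory:")
    for table, rows in shards.items():
        # Table names are fixed constants above, so f-string SQL is safe here.
        conn.execute(f"CREATE TABLE {table} (company_id INTEGER, spend REAL)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


def distributed_avg(conn, shard_tables):
    """Coordinator role: run a partial aggregate per shard, then merge them."""
    totals = defaultdict(lambda: [0.0, 0])  # company_id -> [sum, count]
    for table in shard_tables:
        query = (
            f"SELECT company_id, sum(spend), count(spend) "
            f"FROM {table} GROUP BY company_id"
        )
        for company_id, part_sum, part_count in conn.execute(query):
            totals[company_id][0] += part_sum
            totals[company_id][1] += part_count
    # avg = total sum / total count, computed only after merging all shards.
    return {cid: s / c for cid, (s, c) in sorted(totals.items())}


conn = make_worker(SHARDS)
avg_campaign_spend = distributed_avg(conn, SHARDS)
print(avg_campaign_spend)  # {1: 20.0, 2: 30.0}
```

The workers return sum and count rather than avg because per-shard averages cannot be combined correctly when shards hold different numbers of rows, whereas sums and counts can simply be added.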

“It was a whole different environment once we moved to Hyperscale (Citus). Queries that often took up to 10 minutes with the old system are now processed instantaneously.”
- Sami Räsänen: Product Owner and Team Lead, HSL
Read more at aka.ms/story-azure-postgres-hsl

“Distributed PostgreSQL is a game changer. We can support more than 6M queries every day, on 2 PB of data. With Citus, response times for 75% of queries are less than 0.2 seconds.”

Architecting petabyte-scale analytics by scaling out Postgres on Azure with
the Citus extension aka.ms/blog-petabyte-scale-analytics

Kubernetes

Azure Arc enabled data services
Azure data services in your datacenter, multi-cloud, and edge
Connected or Disconnected
Slide labels: Elastic scale; PostgreSQL Hyperscale; Scale up, scale out on demand; Automation at scale; Always current; Self-service provisioning in seconds; Automated updates; Evergreen SQL Managed Instance
Unified management
• Single view for on-prem, clouds, and edge
• Consistent tools and workflows
• Built-in monitoring and security

Building a Hybrid data platform with Azure Arc enabled data
services Travis Wright, Microsoft aka.ms/video-azure-arc-ignite20

Door #3 – Integrate with open source

Integrate with open source: Cassandra & Gremlin

Gremlin Cassandra

Large retailer uses Gremlin API to ingest large volumes of
localization & inventory data

Spark, Postgres, Citus, ONNX, MySQL, Redis, Hadoop, Kafka, Kubernetes, Gremlin, Cassandra

So now what?

Useful data talks at Open Azure Day
• Big Data industry best practices – a deep look into Cloudera Data Platform (Ram Venkatesh, Jonathan Hsieh-Demo, Priyank Patel / Cloudera)
• Building resilient, mission-critical applications with Azure Database for PostgreSQL (Sridhar Ranganathan)
• Architecting secure enterprise-ready solutions for the cloud with Azure and MySQL (Andrea Lam)
• Open Source at Microsoft (John Gossman, Stormy Peters)
• How Minecraft Realms migrated to MySQL on Azure for improved gameplay, better interoperability, & cost efficiency (Amol Bhatnagar)
• Build Mission Critical Apps with the New Azure Cache for Redis, Enterprise Tiers (Amiram Mizne / Redis Labs)

Stephen Hawking Inspires Developers Worldwide aka.ms/video-hawking-inspires-devs

“If I have seen further it is by standing on
the shoulders of giants.” —Isaac Newton
