Data Management for Serverless Apps

Danilo Poccia, Chief Evangelist (EMEA), @danilop
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Serverless functions
[Diagram: something happens → Event → AWS Lambda functions → any public or private resource]

What about data?
[Diagram: something happens → Event → AWS Lambda functions → Data: Unstructured, Structured, Semi-Structured, Transient]

How to store data?
Data: Unstructured, Structured, Semi-Structured, Transient
Storage options shown: Object Storage, Database LOB (Large OBject), NoSQL Database, Relational Database, Relational Database with JSON Document Extensions, NoSQL Database, Memory

Storage & database platforms on AWS
• Object Storage: Amazon S3
• NoSQL Database: Amazon DynamoDB, Amazon DocumentDB with MongoDB compatibility, Amazon Keyspaces (for Apache Cassandra), Amazon Neptune, Amazon QLDB
• Relational Database: Amazon Aurora with MySQL or PostgreSQL compatibility (Serverless)
• Memory: Amazon ElastiCache

Amazon Simple Storage Service (S3)
• An object is identified by a bucket + key combination
• Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket
• S3 URLs can be stored in any repository: “s3://bucket/key”
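Since an S3 URL stored in another repository has to be turned back into a bucket + key pair before the object can be fetched, a minimal sketch of that split could look like this. The names `S3Location` and `parse_s3_url` are made up for illustration and are not part of any AWS SDK.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    """Bucket + key pair that identifies one S3 object."""
    bucket: str
    key: str


def parse_s3_url(url: str) -> S3Location:
    """Split an "s3://bucket/key" URL into its bucket and key parts."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"not an S3 URL: {url!r}")
    # The bucket ends at the first slash; the key is everything after it
    # and may itself contain slashes, "?" or "#".
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return S3Location(bucket=bucket, key=key)
```

The resulting `bucket` and `key` values are what an SDK call such as a GET on the object would take as input.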

Amazon Simple Storage Service (S3)
• Read-after-write consistency for PUTs of new objects
  • Provided no HEAD or GET request was made to the key name before the object was created
• Can perform SQL-like SELECT on JSON, CSV, or Apache Parquet files
• Writes, updates, and deletes can send an event
• Reads can be traced with AWS CloudTrail, and can be used as events
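As a rough sketch of how a Lambda function might consume such an S3 event, the handler below pulls the bucket and key out of each notification record. The handler name and the choice to return a list of pairs are illustrative assumptions; the record layout follows the S3 event notification format.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return the (bucket, key) pairs named in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```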

Dynamo: Amazon’s Highly Available Key-value Store (2007)
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
Amazon.com
ABSTRACT
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Categories and Subject Descriptors: D.4.2 [Operating Systems]: Storage Management; D.4.5 [Operating Systems]: Reliability; D.4.2 [Operating Systems]: Performance
General Terms: Algorithms, Management, Measurement, Performance, Design, Reliability.
1. INTRODUCTION
Amazon runs a world-wide e-commerce platform that serves tens of millions customers at peak times using tens of thousands of servers located in many data centers around the world.
There are strict operational requirements on Amazon’s platform in terms of performance, reliability and efficiency, and to support continuous growth the platform needs to be highly scalable. Reliability is one of the most important requirements because even the slightest outage has significant financial consequences and impacts customer trust. In addition, to support continuous growth, the platform needs to be highly scalable. One of the lessons our organization has learned from operating Amazon’s platform is that the reliability and scalability of a system is dependent on how its application state is managed. Amazon uses a highly decentralized, loosely coupled, service oriented architecture consisting of hundreds of services. In this environment there is a particular need for storage technologies that are always available. For example, customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados. Therefore, the service responsible for managing shopping carts requires that it can always write to and read from its data store, and that its data needs to be available across multiple data centers. Dealing with failures in an infrastructure comprised of millions of components is our standard mode of operation; there are always a small but significant number of server and network components that are failing at any given time. As such Amazon’s software systems need to be constructed in a manner that treats failure handling as the normal case without impacting availability or performance. To meet the reliability and scaling needs, Amazon has developed a number of storage technologies, of which the Amazon Simple Storage Service (also available outside of Amazon and known as Amazon S3), is probably the best known. This paper presents the design and implementation of Dynamo, another highly available and scalable distributed data store built for Amazon’s platform. 
Dynamo is used to manage the state of services that have very high reliability requirements and need tight control over the tradeoffs between availability, consistency, cost-effectiveness and performance. Amazon’s platform has a very diverse set of applications with different storage requirements. A select set of applications requires a storage technology that is flexible enough to let application designers configure their data store appropriately based on these tradeoffs to achieve high availability and guaranteed performance in the most cost effective manner. There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications. Dynamo uses a synthesis of well known techniques to achieve scalability and availability: Data is partitioned and replicated using consistent hashing [10], and consistency is facilitated by object versioning [12]. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol. Dynamo employs ...
SOSP’07, October 14–17, 2007, Stevenson, Washington, USA. Copyright 2007 ACM.

Amazon DynamoDB
• Data is organized in tables
• Each item has a primary key
  • partition key
  • (optional) sort key
• Throughput can be on-demand / provisioned / auto scaling
• ACID transactions across one or more tables in a region
  • Atomicity, Consistency, Isolation, Durability
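To make the transaction bullet more concrete, here is a hedged sketch of a request body for the DynamoDB `TransactWriteItems` operation that writes two items atomically. The table layout (a partition key named `pk`), the `CUSTOMER#`/`EMAIL#` prefixes, and the function name are assumptions made only for this example.

```python
def build_signup_transaction(table_name: str, customer_id: str, email: str) -> dict:
    """Build a TransactWriteItems request that writes two items or none."""

    def put(item: dict) -> dict:
        return {
            "Put": {
                "TableName": table_name,
                "Item": item,
                # Fail the whole transaction if the item already exists.
                "ConditionExpression": "attribute_not_exists(pk)",
            }
        }

    return {
        "TransactItems": [
            # Hypothetical customer record.
            put({"pk": {"S": f"CUSTOMER#{customer_id}"}, "email": {"S": email}}),
            # Hypothetical marker item that keeps email addresses unique.
            put({"pk": {"S": f"EMAIL#{email}"}, "customerId": {"S": customer_id}}),
        ]
    }
```

With boto3 installed and credentials configured, the request could be sent as `boto3.client("dynamodb").transact_write_items(**request)`.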

Amazon DynamoDB
• NoSQL Workbench for Amazon DynamoDB is available
  • For Windows, macOS, and Linux
• Best Practices for Designing and Architecting with DynamoDB
• DynamoDB Streams can be consumed by Lambda functions
• TTL + DynamoDB Streams is a common architectural pattern
• Global Tables – multi-region, multi-master
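One way the TTL + DynamoDB Streams pattern might look inside a Lambda function is sketched below: it keeps only the stream records that correspond to TTL expirations, which DynamoDB marks as `REMOVE` events performed by the service principal `dynamodb.amazonaws.com`. The function name is hypothetical, and the sketch assumes the stream is configured to include old images.

```python
def expired_items(event: dict) -> list:
    """Return the old images of items that DynamoDB TTL deleted."""
    expired = []
    for record in event.get("Records", []):
        identity = record.get("userIdentity") or {}
        is_ttl_delete = (
            record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com"
        )
        if is_ttl_delete:
            # OldImage is present only if the stream view type includes it.
            expired.append(record["dynamodb"].get("OldImage", {}))
    return expired
```

A handler could then archive or post-process each expired item, for example by copying it to S3.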

Data modeling with NoSQL Workbench for Amazon DynamoDB
For example, let’s build an application managing URL bookmarks for multiple customers
https://aws.amazon.com/blogs/database/data-modeling-with-nosql-workbench-for-amazon-dynamodb/

Data modeling with NoSQL Workbench for Amazon DynamoDB
Customer: customerId, email, fullName, userPreferences, creationDate, updateDate
Bookmark: url, customerId, folder, title, description, creationDate, updateDate
https://aws.amazon.com/blogs/database/data-modeling-with-nosql-workbench-for-amazon-dynamodb/

NoSQL Workbench – Data modeler

NoSQL Workbench – Data modeler
CustomerBookmark: customerId, sk, email, fullName, userPreferences, folder, title, description, creationDate, updateDate
sk values: “CUST#id”, url
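The single CustomerBookmark table can be sketched in code as two item shapes that share a key schema. This assumes `customerId` is the partition key and `sk` the sort key, with `sk` set to "CUST#id" for the customer item and to the URL for each bookmark; the helper names and the reduced attribute set are illustrative only.

```python
def customer_item(customer_id: str, email: str, full_name: str) -> dict:
    """Item describing the customer; its sort key is "CUST#<customerId>"."""
    return {
        "customerId": customer_id,
        "sk": f"CUST#{customer_id}",
        "email": email,
        "fullName": full_name,
    }


def bookmark_item(customer_id: str, url: str, title: str, folder: str) -> dict:
    """Item describing one bookmark; its sort key is the bookmarked URL."""
    return {
        "customerId": customer_id,
        "sk": url,
        "title": title,
        "folder": folder,
    }
```

Because both item types share the same `customerId`, a single query on that key can return a customer together with all of their bookmarks.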

NoSQL Workbench – Visualizer

NoSQL Workbench – Visualizer by Index

NoSQL Workbench – Facets

Alex DeBrie – The DynamoDB Book
https://www.dynamodbbook.com

Amazon Keyspaces (for Apache Cassandra)
• Built on Apache Cassandra
• Your existing Cassandra Query Language (CQL) code works with little or no changes
• Data is organized in keyspaces and tables
• Each row has a primary key
  • partition key
  • (optional) clustering column(s)
• Throughput can be on-demand / provisioned / auto scaling

Amazon Keyspaces (for Apache Cassandra)
[Diagram: Amazon API Gateway (HTTP API endpoint) → AWS Lambda function → Amazon Keyspaces (for Apache Cassandra): keyspace, table, users]
https://aws.amazon.com/blogs/aws/new-amazon-keyspaces-for-apache-cassandra-is-now-generally-available/

Amazon Quantum Ledger Database (QLDB)
• A ledger database that provides a transparent, immutable, and cryptographically verifiable (SHA-256) transaction log
• Supports PartiQL, an open source, SQL-compatible query language designed to work with all data types and structures
• Implements a flexible document-oriented data model to store and process both structured and semi-structured data (Amazon Ion)
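To give an intuition for what "cryptographically verifiable" means, the toy example below chains SHA-256 digests over a list of entries so that changing, reordering, or removing any entry changes the final digest. This is a deliberately simplified hash chain written for illustration; it is not QLDB's actual journal format or verification API, and the function names are made up.

```python
import hashlib
import json


def chain_digest(previous_digest: bytes, entry: dict) -> bytes:
    """Hash one entry together with the digest of everything before it."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_digest + payload).digest()


def ledger_digest(entries: list) -> bytes:
    """Fold all entries into a single 32-byte digest."""
    digest = b"\x00" * 32  # arbitrary starting value for the toy chain
    for entry in entries:
        digest = chain_digest(digest, entry)
    return digest
```

Anyone who saved the digest earlier can recompute it later and detect whether the history was altered.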

Amazon Quantum Ledger Database (QLDB)
• Transactions are ACID compliant and have full serializability for the highest level of isolation
• Near real-time flow of any changes to your data stored in QLDB via Amazon Kinesis Data Streams

Amazon Quantum Ledger Database (QLDB)

Amazon DocumentDB
• MongoDB 3.6 compatible
• Role-based access control (RBAC) with built-in roles
• Integrated with AWS Identity and Access Management (IAM)
• Connecting Programmatically to Amazon DocumentDB
• Running AWS Lambda-based applications with Amazon DocumentDB

Amazon Neptune – Sample Use Cases: Recommendation Engines

Amazon Neptune – Sample Use Cases: Fraud Detection

Amazon Neptune – Sample Use Cases: Knowledge Graphs

Amazon Neptune – Graph Database
• Supports open graph APIs for both Gremlin and SPARQL
• Apache TinkerPop Gremlin
    // What are the names of Danilo's friends' friends?
    g.V().has("name","Danilo").
      out("knows").out("knows").values("name")
• W3C standard Resource Description Framework (RDF) model and its standard query language, SPARQL
    :x ns:p "cat"@en .
    SELECT ?v WHERE { ?v ?p "cat"@en }
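For readers new to graph traversals, the Gremlin query on this slide (two `out("knows")` hops from the vertex named Danilo) can be mimicked on a plain in-memory adjacency map. This is only an analogy in Python, not Neptune or TinkerPop code; the `knows` dictionary and the function name are invented for the example.

```python
def friends_of_friends(knows: dict, name: str) -> list:
    """Follow the "knows" edge twice, starting from the given person."""
    result = []
    for friend in knows.get(name, []):
        # Like the Gremlin traversal, this neither removes duplicates
        # nor filters out the starting person.
        result.extend(knows.get(friend, []))
    return result
```

A graph database runs this kind of multi-hop navigation natively, without the application loading the whole graph into memory.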

Amazon Neptune – Graph Database
• Purpose-built to store and navigate relationships
• High throughput, low latency for graph queries
• With Neptune Streams you can retrieve change records from the log stream using an HTTP REST API
  • Returns Gremlin or SPARQL change data

Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases (2017)
Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta, Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, Xiaofeng Bao
Amazon Web Services
ABSTRACT
Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share lessons we have learned from our customers on what modern cloud applications expect from their database tier.
Keywords: Databases; Distributed Systems; Log Processing; Quorum Models; Replication; Recovery; Performance; OLTP
1. INTRODUCTION
IT workloads are increasingly moving to public cloud providers. Significant reasons for this industry-wide transition include the ability to provision capacity on a flexible on-demand basis and to pay for this capacity using an operational expense as opposed to capital expense model.
Many IT workloads require a relational OLTP database; providing equivalent or superior capabilities to on-premise databases is critical to support this secular transition. In modern distributed cloud services, resilience and scalability are increasingly achieved by decoupling compute from storage [10][24][36][38][39] and by replicating storage across multiple nodes. Doing so lets us handle operations such as replacing misbehaving or unreachable hosts, adding replicas, failing over from a writer to a replica, scaling the size of a database instance up or down, etc. The I/O bottleneck faced by traditional database systems changes in this environment. Since I/Os can be spread across many nodes and many disks in a multi-tenant fleet, the individual disks and nodes are no longer hot. Instead, the bottleneck moves to the network between the database tier requesting I/Os and the storage tier that performs these I/Os. Beyond the basic bottlenecks of packets per second (PPS) and bandwidth, there is amplification of traffic since a performant database will issue writes out to the storage fleet in parallel. The performance of the outlier storage node, disk or network path can dominate response time. Although most operations in a database can overlap with each other, there are several situations that require synchronous operations. These result in stalls and context switches. One such situation is a disk read due to a miss in the database buffer cache. A reading thread cannot continue until its read completes. A cache miss may also incur the extra penalty of evicting and flushing a dirty cache page to accommodate the new page. Background processing such as checkpointing and dirty page writing can reduce the occurrence of this penalty, but can also cause stalls, context switches and resource contention. Transaction commits are another source of interference; a stall in committing one transaction can inhibit others from progressing. 
Handling commits with multi-phase synchronization protocols such as 2-phase commit (2PC) [3][4][5] is challenging in a cloud-scale distributed system. These protocols are intolerant of failure and high-scale distributed systems have a continual “background noise” of hard and soft failures. They are also high latency, as high scale systems are distributed across multiple data centers.
SIGMOD’17, May 14 – 19, 2017, Chicago, IL, USA. DOI: http://dx.doi.org/10.1145/3035918.3056101
Figure 1: Move logging and storage off the database engine

Move logging and storage off the database engine

Network IO in mirrored MySQL (not Amazon Aurora)

Network IO in Amazon Aurora

Amazon Aurora
• MySQL 5.6 and 5.7 compatible
• PostgreSQL 9.6 and 10 compatible
• Serverless
  • MySQL 5.6 and PostgreSQL 10.7 compatible
• Built-in synchronous Data API with an
  • HTTP endpoint
  • Integration with AWS SDKs
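As an illustration of the Data API's SDK integration, the sketch below builds the arguments for an `ExecuteStatement` call, converting Python values into the typed fields the API expects for named `:parameters`. The helper names and the ARNs used when calling it are placeholders chosen for this example.

```python
def to_field(value) -> dict:
    """Map a Python value to a Data API typed field."""
    if value is None:
        return {"isNull": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"booleanValue": value}
    if isinstance(value, int):
        return {"longValue": value}
    if isinstance(value, float):
        return {"doubleValue": value}
    return {"stringValue": str(value)}


def build_statement(cluster_arn: str, secret_arn: str, database: str,
                    sql: str, params: dict) -> dict:
    """Build keyword arguments for the rds-data ExecuteStatement operation."""
    return {
        "resourceArn": cluster_arn,
        "secretArn": secret_arn,
        "database": database,
        "sql": sql,
        "parameters": [
            {"name": name, "value": to_field(value)}
            for name, value in params.items()
        ],
    }
```

With boto3 available, the request could be sent as `boto3.client("rds-data").execute_statement(**request)`; because the call goes over HTTPS, the function does not need to hold a database connection open.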

Amazon Aurora
• Integrated with Machine Learning services
  • Amazon Comprehend
  • Amazon SageMaker
• Global Database
  • Sub-Second Data Access in Any Region
  • Cross-Region Disaster Recovery

Using Machine Learning directly from your databases
https://aws.amazon.com/blogs/aws/new-for-amazon-aurora-use-machine-learning-directly-from-your-databases/

Using Amazon Aurora Serverless
Sample workload using Aurora Serverless PostgreSQL

Amazon RDS Proxy – How It Works (Preview)
Preview support:
• Amazon RDS MySQL & PostgreSQL
• Amazon Aurora MySQL & PostgreSQL

Amazon ElastiCache
• Fully managed
  • Redis
  • Memcached
• For applications that require sub-millisecond response times
  • Caching
  • Session stores
  • Gaming
  • Geospatial services
  • Real-time analytics
  • Queuing
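For the caching use case, a common approach is the cache-aside pattern, sketched below. The `cache` argument is assumed to be any client exposing `get(key)` and `set(key, value, ex=seconds)`, which matches the redis-py client; `load` stands in for a slower lookup such as a database query. Both names, and the default TTL, are assumptions for this example.

```python
def get_with_cache(cache, key: str, load, ttl_seconds: int = 60):
    """Return the cached value for key, loading and caching it on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    # Cache miss: fetch from the slower source of truth.
    value = load(key)
    # Store with an expiry so stale entries eventually disappear.
    cache.set(key, value, ex=ttl_seconds)
    return value
```

The expiry bounds how long a stale value can be served after the underlying data changes.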

Amazon ElastiCache for Redis – Global Datastore
• Write locally
• Read globally
• Cross-region disaster recovery

GraphQL can be the entry point for storage and logic

Takeaways
• Object storage provides lots of benefits; plan for the right level of consistency
• You are free to use the best database for your use case: relational, key/value, document-oriented, graph, ledger
• When you have to manage connections, open them in the initialization of your serverless functions and handle reconnections
• Using IAM authentication & authorization can simplify configuration and management, and improve security
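The connection-management takeaway can be sketched as a small wrapper that is created once, outside the handler, so that warm invocations reuse the same connection and a broken one is replaced. `LazyConnection` and the `connect` callable are hypothetical; the sketch also assumes the driver signals a dropped connection with Python's built-in `ConnectionError`, whereas real database drivers usually raise their own exception types.

```python
class LazyConnection:
    """Open a connection on first use, reuse it, and reconnect once on failure."""

    def __init__(self, connect):
        self._connect = connect  # callable that returns a new connection
        self._connection = None

    def run(self, operation):
        """Call operation(connection), retrying once with a fresh connection."""
        if self._connection is None:
            self._connection = self._connect()
        try:
            return operation(self._connection)
        except ConnectionError:
            # The cached connection went stale between invocations.
            self._connection = self._connect()
            return operation(self._connection)
```

In a Lambda function, an instance would be created at module level (the initialization phase), for example `db = LazyConnection(open_connection)`, and the handler would then call `db.run(...)` on every invocation.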

Thank you!
@danilop
Please give me your feedback :)
