
Data Management for Serverless Apps

ServerlessDays Milan, June 4th, 2020

Danilo Poccia

June 04, 2020

Transcript

  1. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Danilo Poccia, Chief Evangelist (EMEA)
    @danilop
    Data Management for
    Serverless Apps


  2. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Serverless functions
    AWS Lambda
    Function
    Function
    Function
    Something
    happens!
    Event
    Any public or
    private resource


  3. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    What about data?
    AWS Lambda
    Function
    Function
    Function
    Something
    happens!
    Event
    Unstructured
    Structured
    Semi-Structured
    Transient
    Data


  4. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    How to store data?
    Data:
    Unstructured → Object Storage / Database LOB (Large OBject)
    Structured → NoSQL Database / Relational Database
    Semi-Structured → Relational Database with JSON Document Extensions / NoSQL Database
    Transient → Memory


  5. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Storage & database platforms on AWS
    Object Storage
    NoSQL Database
    Relational Database
    Amazon S3
    Amazon QLDB
    Amazon Aurora
    with MySQL or PostgreSQL
    compatibility (Serverless)
    Amazon DocumentDB
    with MongoDB
    compatibility
    Amazon DynamoDB
    Amazon Neptune
    Amazon ElastiCache
    Memory
    Amazon Keyspaces
    (for Apache Cassandra)


  6. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Simple Storage Service (S3)
    • An object is identified by a bucket + key combination
    • Your application can achieve at least
    • 3,500 PUT/COPY/POST/DELETE or
    • 5,500 GET/HEAD requests per second per prefix in a bucket
    • S3 URLs can be stored in any repository – “s3://bucket/key”
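    A minimal sketch of addressing objects from Python with boto3 (the bucket and key names are made up for illustration):

    import boto3

    s3 = boto3.client("s3")

    # An object is identified by bucket + key
    s3.put_object(Bucket="my-bucket", Key="customers/42/profile.json",
                  Body=b'{"fullName": "Danilo"}')

    obj = s3.get_object(Bucket="my-bucket", Key="customers/42/profile.json")
    print(obj["Body"].read())

    # The same object can then be referenced elsewhere as
    # "s3://my-bucket/customers/42/profile.json"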


  7. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Simple Storage Service (S3)
    • Read-after-write consistency for PUTs of new objects
    • Provided there were no HEAD or GET requests for the key name before the object was created
    • Can perform SQL-like SELECT on JSON, CSV, or Apache Parquet files (S3 Select)
    • Writes, updates, and deletes can send an event
    • Reads can be traced with AWS CloudTrail and can be used as events
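    A sketch of an S3 Select query from Python with boto3, assuming a CSV object with a header row (bucket, key, and column names are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    response = s3.select_object_content(
        Bucket="my-bucket",
        Key="bookmarks/2020-06.csv",
        ExpressionType="SQL",
        Expression="SELECT s.url, s.title FROM s3object s WHERE s.customerId = '42'",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"JSON": {}},
    )
    # The result arrives as an event stream of Records payloads
    for event in response["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())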


  8. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Dynamo: Amazon’s Highly Available Key-value Store (2007)
    Dynamo: Amazon’s Highly Available Key-value Store
    Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,
    Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall,
    and Werner Vogels – Amazon.com
    SOSP’07, October 14–17, 2007, Stevenson, Washington, USA
    ABSTRACT
    Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously, and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.
    This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.


  9. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon DynamoDB
    • Data is organized in tables
    • Each item has a primary key
    • partition key
    • (optional) sort key
    • Throughput can be on-demand / provisioned / auto scaling
    • ACID transactions across one or more tables in a region
    • Atomicity, Consistency, Isolation, Durability
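    A minimal sketch of an ACID transaction with boto3, assuming two hypothetical tables (Orders and Customers) and made-up key attributes:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Two writes to two tables committed atomically
    dynamodb.transact_write_items(
        TransactItems=[
            {"Put": {
                "TableName": "Orders",
                "Item": {"orderId": {"S": "ORDER#1001"}, "status": {"S": "PLACED"}},
            }},
            {"Update": {
                "TableName": "Customers",
                "Key": {"customerId": {"S": "CUST#42"}},
                "UpdateExpression": "SET orderCount = if_not_exists(orderCount, :zero) + :one",
                "ExpressionAttributeValues": {":zero": {"N": "0"}, ":one": {"N": "1"}},
            }},
        ]
    )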


  10. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon DynamoDB
    • NoSQL Workbench for Amazon DynamoDB is available
    • For Windows, macOS, and Linux
    • Best Practices for Designing and Architecting with DynamoDB
    • DynamoDB Streams can be consumed by Lambda functions
    • TTL + DynamoDB Streams is a common architectural pattern
    • Global Tables – multi-region, multi-master
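    A sketch of a Lambda handler consuming a DynamoDB stream and reacting to items expired by TTL; it assumes the stream view includes old images and that TTL deletions are identified by the DynamoDB service principal in userIdentity:

    def handler(event, context):
        for record in event["Records"]:
            # Items deleted by TTL arrive as REMOVE events issued by the DynamoDB service
            expired_by_ttl = (
                record["eventName"] == "REMOVE"
                and record.get("userIdentity", {}).get("principalId") == "dynamodb.amazonaws.com"
            )
            if expired_by_ttl:
                old_item = record["dynamodb"].get("OldImage", {})
                print("Archiving expired item:", old_item)  # e.g. copy it to S3 or another table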


  11. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Data modeling with NoSQL Workbench for Amazon DynamoDB
    For example, let’s build an application
    managing URL bookmarks
    for multiple customers
    https://aws.amazon.com/blogs/database/data-modeling-with-nosql-workbench-for-amazon-dynamodb/


  12. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Data modeling with NoSQL Workbench for Amazon DynamoDB
    Customer
    customerId
    email
    fullName
    userPreferences
    creationDate
    updateDate
    Bookmark
    url
    customerId
    folder
    title
    description
    creationDate
    updateDate
    https://aws.amazon.com/blogs/database/data-modeling-with-nosql-workbench-for-amazon-dynamodb/


  13. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    NoSQL Workbench – Data modeler


  14. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    NoSQL Workbench – Data modeler
    CustomerBookmark
    customerId
    sk
    email
    fullName
    userPreferences
    folder
    title
    description
    creationDate
    updateDate
    “CUST#id”
    url
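    A sketch of how this single-table design could be queried with boto3; it assumes the blog post’s CustomerBookmark table with customerId as partition key and sk as sort key, where the customer profile item uses sk = "CUST#id" and bookmark items use the URL as sk:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("CustomerBookmark")

    # One query returns the customer profile item and all of its bookmarks
    everything = table.query(KeyConditionExpression=Key("customerId").eq("123"))["Items"]

    # Only the bookmarks, assuming bookmark items use the URL as sort key
    bookmarks = table.query(
        KeyConditionExpression=Key("customerId").eq("123") & Key("sk").begins_with("https://")
    )["Items"]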


  15. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    NoSQL Workbench – Visualizer


  16. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    NoSQL Workbench – Visualizer by Index


  17. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    NoSQL Workbench – Visualizer by Index


  18. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    NoSQL Workbench – Visualizer by Index


  19. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    NoSQL Workbench – Facets


  20. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Alex DeBrie – The DynamoDB Book
    https://www.dynamodbbook.com


  21. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Keyspaces (for Apache Cassandra)
    • Built on Apache Cassandra
    • Your existing Cassandra Query Language (CQL) code works
    with little or no changes
    • Data is organized in keyspaces and tables
    • Each row has a primary key
    • partition key
    • (optional) clustering column(s)
    • Throughput can be on-demand / provisioned / auto scaling
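    A rough sketch of connecting with the open-source cassandra-driver for Python; the endpoint, port 9142, the credentials, and the bookmarks.users table are assumptions, and a real setup would also load the Amazon root CA certificate into the TLS context:

    import ssl
    from cassandra.cluster import Cluster
    from cassandra.auth import PlainTextAuthProvider

    # TLS is required; certificate checks are relaxed only to keep this sketch short
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE

    auth = PlainTextAuthProvider(username="alice-at-111122223333", password="generated-password")
    cluster = Cluster(["cassandra.eu-west-1.amazonaws.com"], port=9142,
                      ssl_context=ssl_context, auth_provider=auth)
    session = cluster.connect()

    # Existing CQL works with little or no changes (keyspace/table are hypothetical)
    session.execute(
        "INSERT INTO bookmarks.users (user_id, email) VALUES (%s, %s)",
        ("42", "danilo@example.com"),
    )
    row = session.execute("SELECT email FROM bookmarks.users WHERE user_id = %s", ("42",)).one()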


  22. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Keyspaces (for Apache Cassandra)
    Amazon API Gateway (HTTP API endpoint) → AWS Lambda function → Amazon Keyspaces (for Apache Cassandra): keyspace with a “Users” table
    https://aws.amazon.com/blogs/aws/new-amazon-keyspaces-for-apache-cassandra-is-now-generally-available/


  23. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Quantum Ledger Database (QLDB)
    • A ledger database that provides a transparent, immutable, and
    cryptographically verifiable (SHA-256) transaction log
    • Supports PartiQL – an open source, SQL-compatible query language
    designed to work with all data types and structures
    • Implements a flexible document-oriented data model to store and
    process both structured and semi-structured data (Amazon Ion)
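    A small sketch using the pyqldb driver; the ledger name, the Bookmarks table (assumed to already exist), and the document fields are hypothetical:

    from pyqldb.driver.qldb_driver import QldbDriver

    qldb = QldbDriver(ledger_name="bookmarks-ledger")   # hypothetical ledger

    def add_and_read(txn):
        # PartiQL statements run inside one ACID transaction
        txn.execute_statement("INSERT INTO Bookmarks ?",
                              {"url": "https://aws.amazon.com", "customerId": "42"})
        cursor = txn.execute_statement("SELECT * FROM Bookmarks WHERE customerId = ?", "42")
        return list(cursor)

    rows = qldb.execute_lambda(add_and_read)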


  24. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Quantum Ledger Database (QLDB)
    • Transactions are ACID compliant and have full serializability for the
    highest level of isolation
    • Near real-time flow of any changes to your data stored in QLDB via
    Amazon Kinesis Data Streams


  25. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Quantum Ledger Database (QLDB)


  26. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Quantum Ledger Database (QLDB)


  27. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Quantum Ledger Database (QLDB)


  28. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Quantum Ledger Database (QLDB)


  29. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon DocumentDB
    • MongoDB 3.6 compatible
    • Role-based access control (RBAC) with built-in roles
    • Integrated with AWS Identity and Access Management (IAM)
    • Connecting Programmatically to Amazon DocumentDB
    • Running AWS Lambda-based applications with Amazon DocumentDB
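    A sketch of connecting from a Lambda function with pymongo; the connection string, CA bundle path, and database/collection names are assumptions, and the client is created during initialization so connections are reused across invocations:

    import os
    import pymongo

    # Created once, outside the handler, so the connection is reused
    client = pymongo.MongoClient(
        os.environ["DOCDB_ENDPOINT"],        # e.g. mongodb://user:pass@cluster-endpoint:27017/
        tls=True,
        tlsCAFile="rds-combined-ca-bundle.pem",
        retryWrites=False,                   # DocumentDB 3.6 does not support retryable writes
    )
    collection = client["app"]["customers"]

    def handler(event, context):
        collection.insert_one({"customerId": event["customerId"], "email": event["email"]})
        return collection.find_one({"customerId": event["customerId"]}, {"_id": 0})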


  30. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Neptune – Sample Use Cases
    Recommendation
    Engines


  31. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Neptune – Sample Use Cases
    Fraud
    Detection


  32. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Neptune – Sample Use Cases
    Knowledge
    Graphs


  33. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Neptune – Graph Database
    • Supports open graph APIs for both Gremlin and SPARQL
    • Apache TinkerPop Gremlin
    // What are the names of Danilo's friends' friends?
    g.V().has("name", "Danilo").
      out("knows").out("knows").values("name")
    • W3C standard Resource Description Framework (RDF) model and its
    standard query language, SPARQL
    :x ns:p "cat"@en .
    SELECT ?v WHERE { ?v ?p "cat"@en }
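    The same Gremlin traversal could be issued from Python with the open-source gremlinpython driver; the cluster endpoint below is hypothetical:

    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    # Neptune serves Gremlin over WebSockets on port 8182
    conn = DriverRemoteConnection(
        "wss://my-neptune-cluster.cluster-xxxx.eu-west-1.neptune.amazonaws.com:8182/gremlin", "g")
    g = traversal().withRemote(conn)

    # Names of Danilo's friends' friends, as in the traversal above
    names = g.V().has("name", "Danilo").out("knows").out("knows").values("name").toList()
    conn.close()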


  34. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Neptune – Graph Database
    • Purpose-built to store and navigate relationships
    • High throughput, low latency for graph queries
    • With Neptune Streams you can retrieve change records from the log
    stream using an HTTP REST API
    • Returns Gremlin or SPARQL change data


  35. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Aurora: Design Considerations for High
    Throughput Cloud-Native Relational Databases (2017)
    Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
    Alexandre Verbitski, Anurag Gupta, Debanjan Saha, Murali Brahmadesam, Kamal Gupta,
    Raman Mittal, Sailesh Krishnamurthy, Sandor Maurice, Tengiz Kharatishvili, Xiaofeng Bao
    Amazon Web Services
    SIGMOD’17, May 14–19, 2017, Chicago, IL, USA
    ABSTRACT
    Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share lessons we have learned from our customers on what modern cloud applications expect from their database tier.
    Figure 1: Move logging and storage off the database engine


  36. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Move logging and storage off the database engine


  37. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Network IO in mirrored MySQL (not Amazon Aurora)


  38. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Network IO in Amazon Aurora


  39. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Aurora
    • MySQL 5.6 and 5.7 compatible
    • PostgreSQL 9.6 and 10 compatible
    • Serverless
    • MySQL 5.6 and PostgreSQL 10.7 compatible
    • Built-in synchronous Data API with an HTTP endpoint and integration with the AWS SDKs
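    A minimal sketch of calling the Data API with boto3; the cluster ARN, Secrets Manager secret, database, and table are hypothetical:

    import boto3

    rds_data = boto3.client("rds-data")

    response = rds_data.execute_statement(
        resourceArn="arn:aws:rds:eu-west-1:123456789012:cluster:my-aurora-serverless",
        secretArn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:my-db-secret",
        database="bookmarks",
        sql="SELECT title, url FROM bookmark WHERE customer_id = :id",
        parameters=[{"name": "id", "value": {"stringValue": "42"}}],
    )
    for row in response["records"]:
        print(row)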


  40. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon Aurora
    • Integrated with Machine Learning services
    • Amazon Comprehend
    • Amazon SageMaker
    • Global Database
    • Sub-Second Data Access in Any Region
    • Cross-Region Disaster Recovery


  41. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Using Machine Learning directly from your databases
    https://aws.amazon.com/blogs/aws/new-for-amazon-aurora-use-machine-learning-directly-from-your-databases/


  42. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Using Amazon Aurora Serverless
    Sample workload using Aurora Serverless PostgreSQL


  43. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon RDS Proxy – How It Works
    Preview support:
    Amazon RDS MySQL & PostgreSQL
    Amazon Aurora MySQL & PostgreSQL


  44. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon ElastiCache
    • Fully managed
    • Redis
    • Memcached
    • For applications that require sub-millisecond response times
    • Caching
    • Session stores
    • Gaming
    • Geospatial services
    • Real-time analytics
    • Queuing
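    A sketch of the cache-aside pattern with the redis-py client; the endpoint and the fallback read from the primary store are placeholders:

    import redis

    # Hypothetical ElastiCache for Redis endpoint; the client is created once and reused
    cache = redis.Redis(host="my-cluster.xxxxxx.0001.euw1.cache.amazonaws.com", port=6379)

    def fetch_from_primary_store(key):
        return b"..."                     # placeholder for a read from the primary database

    def get_with_cache(key):
        value = cache.get(key)
        if value is None:                 # cache miss
            value = fetch_from_primary_store(key)
            cache.setex(key, 300, value)  # keep it for 5 minutes
        return value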


  45. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Amazon ElastiCache for Redis - Global Datastore
    • Write locally
    • Read globally
    • Cross-region disaster recovery


  46. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    GraphQL can be the entry point for storage and logic


  47. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Takeaways
    Using object storage provides lots of benefits;
    plan for the right level of consistency
    You are free to use the best database for your use case:
    relational, key/value, document-oriented, graph, ledger
    When you have to manage connections, open them in the
    initialization of your serverless functions and handle reconnections
    Using IAM authentication & authorization
    can simplify configuration and management, and improve security


  48. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
    Thank you!
    @danilop Please give me your feedback 🙂
