Choosing The Right Database For The Right Applications in 2019 [AWS Community Day @ Sofia]

Slide 1

Slide 1 text

Slide 2

Slide 2 text

About me • Software Engineer & Web Developer • Worked in a startup for 4.5 years • ServerlessDays Organizer • AWS Customer since 2013

Slide 3

Slide 3 text

1970 1980 1990 2000 Oracle DB2 SQL Server MySQL PostgreSQL DynamoDB Redis MongoDB Elasticsearch Neptune Cassandra Access Aurora 2010 Timestream QLDB Amazon DocumentDB

Slide 4

Slide 4 text

Data categories and common use cases Relational Key value Document In-memory Graph Search Time series Ledger Referential integrity, ACID transactions, schema- on-write Low-latency, key lookups with high throughput and fast ingestion of data Indexing and storing documents with support for query on any attribute Microseconds latency, key- based queries, and specialized data structures Creating and navigating data relations easily and quickly Lift and shift, EMR, CRM, finance Real-time bidding, shopping cart, social Content management, personalization, mobile Leaderboards, real-time analytics, caching Fraud detection, social networks, recommendatio n engines Indexing and searching semistructured logs and data Product catalog, help, and FAQs, full text Collect, store, and process data sequenced by time IoT applications, event tracking Complete, immutable, and verifiable history of all changes to application data Systems of record, supply chain, healthcare, registrations, financial

Slide 5

Slide 5 text

Purpose built The right tool for the right job allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html

Slide 6

Slide 6 text

AWS: Purpose-built databases Relational Key value Document In-memory Graph Search Amazon DynamoDB Amazon Neptune Amazon RDS Aurora Commercial Community Amazon ElastiCache Amazon Elasticsearch Service Amazon DocumentDB Time series Ledger Amazon Timestream Amazon Quantum Ledger Database Memcached Redis

Slide 7

Slide 7 text

400,000+ customers using AWS databases and analytics

Slide 8

Slide 8 text

Two fundamental areas of focus “Lift and shift” existing apps to the cloud Quickly build new apps in the cloud

Slide 9

Slide 9 text

AWS Database Migration Service (AWS DMS) M i g r a t i n g d a t a b a s e s t o A W S Migrate between on-premises and AWS Migrate between databases Automated schema conversion Data replication for migration with zero downtime 100,000+ databases migrated

Slide 10

Slide 10 text

Modern apps create new requirements Users: 1 million+ Data volume: TB–PB–EB Locality: Global Performance: Milliseconds–microseconds Request rate: Millions Access: Web, mobile, IoT, devices Scale: Up-down, Out-in Economics: Pay for what you use Developer access: No assembly required Social media Ride hailing Media streaming Dating

Slide 11

Slide 11 text

User search history: Amazon DynamoDB • Massive data volume • Need quick lookups for personalized search Session state: Amazon ElastiCache • In-memory store for sub-millisecond fetch Relational data: Amazon RDS • Referential integrity • Primary transactional database

Slide 12

Slide 12 text

• DynamoDB: 31B items tracking language exercises • Aurora: Primary transactional db (user data) • ElastiCache: Instant access to common words and phrases 300M total users 7B exercises per month

Slide 13

Slide 13 text

Relational data

Slide 14

Slide 14 text

Traditional SQL • TCP based wire protocol • Well Known, lots of uses • Common drivers (JDBC) • Frequently used with ORMs • Scale UP individual instances • Scale OUT with read replicas • Sharding at application level • Lots of flavors but very similar language • Joins ! INSERT INTO users (id, first_name, last_name) VALUES (1, ‘Alex’, ‘Casalboni’); SELECT col1, col2, col3 FROM table1 WHERE col4 = 1 AND col5 = 2 GROUP BY col1 HAVING count(*) > 1 ORDER BY col2

Slide 15

Slide 15 text

Relational model Data model • Data is stored in rows and tables • Data is normalized • Strict schema • Relationships established via keys enforced by the system • Data accuracy and consistency • Complex queries to extract and reshape data on-demand Patient * Patient ID First Name Last Name Gender DOB * Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating

Slide 16

Slide 16 text

Patient * Patient ID First Name Last Name Gender DOB * Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating Query model: SQL SELECT d.first_name, d.last_name, count(*) FROM visit as v, hospital as h, doctor as d WHERE v.hospital_id = h.hospital_id AND h.hospital_id = d.hospital AND v.t_date > date_trunc('week’, CURRENT_TIMESTAMP - interval '1 week') GROUP BY d.first_name, d.last_name; Relational model

Slide 17

Slide 17 text

Amazon Relational Database Service (RDS) Managed relational database service with a choice of six popular database engines Easy to administer Available and durable Highly scalable Fast and secure No need for infrastructure provisioning, installing, and maintaining DB software Automatic Multi-AZ data replication; automated backup, snapshots, failover Scale database compute and storage with a few clicks with no app downtime SSD storage and guaranteed provisioned I/O; data encryption at rest and in transit

Slide 18

Slide 18 text

Amazon Aurora ascendant: How we designed a cloud-native relational database allthingsdistributed.com

Slide 19

Slide 19 text

Amazon Aurora MySQL and PostgreSQL-compatible relational database built for the cloud Performance and availability of commercial-grade databases at 1/10th the cost Performance and scalability Availability and durability Highly secure Fully managed 5x perf of standard MySQL 3x perf of standard PostgreSQL Up to 15 read replicas Self-healing storage 6 copies of data across 3 AZ Continuous backup to S3 Network isolation, encryption at rest/transit No hardware provisioning, software patching, setup, configuration, or backups

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

NoSQL: Key-Value and Documents

Slide 22

Slide 22 text

SQL vs NoSQL Optimized for storage Optimized for compute Normalized/relational Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Good for OLAP Built for OLTP at scale SQL NoSQL

Slide 23

Slide 23 text

Key-value data • Simple key-value pairs • Partitioned by keys • Resilient to failure • High throughput, low- latency reads and writes • Consistent performance at scale Table 1 … … Partitions … Highly partitionable data

Slide 24

Slide 24 text

Gamers Primary Key Attributes Gamer Tag Type Hammer57 Rank Level Points Tier 87 4050 Elite Status Health Progress 90 30 Weapon Class Damage Range Taser 87% 50 FluffyDuffy Rank Level Points Tier 5 1072 Trainee Status Health Progress 37 8 // Status of Hammer57 GET { TableName:"Gamers", Key: { "GamerTag":"Hammer57", "Type":"Status” } } // Return all Hammer57 Gamers GamerTag = :a :a Hammer57 Key-value data

Slide 25

Slide 25 text

Amazon DynamoDB Fast and flexible key value database service for any scale Performance at scale Serverless Comprehensive security Global database for global apps Single-digit millisecond response times at any scale; Virtually unlimited throughput No server provisioning, software patching/upgrades; automatic scaling and continuous data backups Encrypts all data by default; fully integrated with AWS IAM for robust security Build global apps by easily replicating tables across multiple regions

Slide 26

Slide 26 text

Amazon DynamoDB Data Structure Table Items Attributes Partition Key Sort Key Mandatory Key-value access pattern Determines data distribution Optional Model 1:N relationships Enables rich query capabilities All items for key ==, <, >, >=, <= “begins with” “between” “contains” “in” sorted results counts top/bottom N values

Slide 27

Slide 27 text

DynamoDB Schema and Queries • Connects over HTTP • Global Secondary Indexes and Local Secondary Indexes • Speed up queries with DAX • Global tables (multi-region-multi- master) • Transactions across multiple tables • Change Streams • Rich query language with expressions • Provision read and write capacity units separately • Also supports pay-per-request model import boto3 votes = boto3.resource("dynamodb").Table("votes") resp = votes.update_item( Key={"name": editor}, UpdateExpression="ADD votes :incr", ExpressionAttributeValues={":incr": 1}, ReturnValues="ALL_NEW" )

Slide 28

Slide 28 text

DynamoDB Advancements over the last 24 months VPC endpoints April 2017 Auto scaling June 2017 DynamoDB Accelerator (DAX) April 2017 Time To Live (TTL) February 2017 Global tables On-demand backup Encryption at rest November 2017 November 2017 November 2017 Point-in-time recovery March 2018 SLA June 2018 99.999% SLA August 2018 Adaptive capacity ACID November 2018 Transactions November 2018 On-demand

Slide 29

Slide 29 text

Document databases • Data is stored in JSON-like documents • Documents map naturally to how humans model data • Flexible schema and indexing • Expressive query language built for documents (ad hoc queries and aggregations) JSON documents are first-class objects of the database { id: 1, name: "sue", age: 26, email: "[email protected]", promotions: ["new user", "5%", "dog lover"], memberDate: 2018-2-22, shoppingCart: [ {product:"abc", quantity:2, cost:19.99}, {product:"edf", quantity:3, cost: 2.99} ] }

Slide 30

Slide 30 text

!= == Evolution of document databases JSON became the de facto data interchange format Friction when converting JSON to the relational model Object-relational mappings (ORMs) were created to help with this friction Document databases solved the problem (Client) (App) (Database) JSON Relational JSON

Slide 31

Slide 31 text

Use cases for document data User profiles { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"} } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"} } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"}, ExploidingSnails: { hi_score: 3185400, global_rank: 5139, bonus_levels: true }, promotions: ["new user","5%","snail lover"] } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"}, ExploidingSnails: { hi_score: 3185400, global_rank: 5139, bonus_levels: true } }

Slide 32

Slide 32 text

Use cases for document data Mobile Retail and marketing User profiles Catalog Content management Personalization

Slide 33

Slide 33 text

Challenges of existing document databases Hard to manage Hard to scale Hard to secure Hard to back up

Slide 34

Slide 34 text

Amazon DocumentDB Fast, scalable, and fully managed MongoDB-compatible database service Fast Scalable Fully managed MongoDB compatible Millions of RPS with ms latency 2x throughput of MongoDB Separation of compute and storage Up to 15 read replicas Auto patching and quick setup Automatic backups Compatible with MongoDB 3.6 Same SDKs and tools

Slide 35

Slide 35 text

In-Memory stores

Slide 36

Slide 36 text

In-memory • No persistence, in-memory • Microsecond performance • Simple commands for manipulating in memory data structures • Strings, hashes, lists, sets, and sorted sets Database Memory (buffer pool) Disk Query processor Get/Put APIs Memory Milliseconds to microseconds (10x faster) Storage engine

Slide 37

Slide 37 text

In-memory ops set a "hello" // Set key "a" with a string value and no expiration OK get a // Get value for key "a" "hello" get b // Get value for key "b" results in miss (nil) set b "Good-bye" EX 5 // Set key "b" with a string value and a 5 second expiration "Good-bye" get b // Get value for key "b" "Good-bye" // wait >= 5 seconds get b (nil) // key has expired, nothing returned

Slide 38

Slide 38 text

Amazon ElastiCache Redis and Memcached compatible, in-memory data store and cache Redis and Memcached compatible Extreme performance Secure and reliable Easily scalable Fully compatible with open source tools Redis and Memcached In-memory data store and cache Microsecond response time Network isolation, encryption, HIPAA, PIC Multi-AZ, automatic failover Scales reads and write Both sharding and replicas

Slide 39

Slide 39 text

Searching data

Slide 40

Slide 40 text

Search: Full text search

Slide 41

Slide 41 text

Search The bright blue butterfly hangs on the breeze It’s best to forget the great sky and to retire from every wind. Under blue sky, in bright sunlight, one need search around. Document 1 Document 2 Document 3 a and around every for from in is it not on one the to under ID Term Document 1 best 2 2 blue 1, 3 3 bright 1,3 4 breeze 1 5 butterfly 1 6 forget 2 7 great 2 8 hangs 1 9 need 3 10 retire 2 11 search 3 12 sky 2, 3 13 wind 2 Inverted index Stopword list

Slide 42

Slide 42 text

Search _search?q=house "hits": { "total": 85, "max_score": 6.6137657, "hits": [{ "_index": "movies", "_type": "movie", "_id": "tt0077975", "_score": 6.6137657, "_source": { "directors": [ "John Landis" ], "release_date": "1978-07-27T00:00:00Z", "rating": 7.5, "genres": [ "Comedy", "Romance" ], "image_url": "http://ia.jpg "plot": "At a 1962 College, Dean Vernon Wormer…", "title": "Animal House", "rank": 527, "running_time_secs": 6540, "actors": [ "John Belushi","Karen Allen","Tom Hulce" ], "year": 1978, "id": "tt0077975" } },

Slide 43

Slide 43 text

Amazon Elasticsearch Service Fully managed, reliable, and scalable Elasticsearch service Easy to use Scalable Highly available Secure Deploy a production- ready Elasticsearch cluster in minutes Resize your cluster with a few clicks or a single API call Replicate across AZs, with monitoring and automated self-healing Deploy into VPC and restrict access using security groups and IAM policies

Slide 44

Slide 44 text

Graph data

Slide 45

Slide 45 text

Graph data • Relationships are first-class objects • Data is modeled and queried as a graph • Vertices connected by Edges • Creating and navigating relations between data easily and quickly Purchased Purchased Follows Purchased Knows Product Sport Follows

Slide 46

Slide 46 text

Different approaches for highly connected data Purpose-built for a business process Purpose-built to answer questions about relationships

Slide 47

Slide 47 text

Graph use cases Social networking Life sciences Network & IT operations Fraud detection Recommendation engines Knowledge graphs

Slide 48

Slide 48 text

Product recommendation to a user PURCHASED PURCHASED PURCHASED PURCHASED PURCHASED KNOWS BOOK #1 BOOK #2 PURCHASED BOOK #3 Graph example

Slide 49

Slide 49 text

Amazon Neptune Fully managed graph database Fast Reliable Easy Open Query billions of relationships with millisecond latency 6 replicas of your data across 3 AZs with fully backup and restore Build powerful queries with Gremlin and SPARQL Supports Apache TinkerPop & W3C RDF

Slide 50

Slide 50 text

Open Source Apache TinkerPop™ Gremlin Traversal Language W3C Standard SPARQL Query Language R E S O U R C E D E S C R I P T I O N F R A M E W O R K ( R D F ) P R O P E R T Y G R A P H LEADING GRAPH MODELS AND FRAMEWORKS

Slide 51

Slide 51 text

Amazon Neptune high level architecture Bulk load from S3 Database Mgmt.

Slide 52

Slide 52 text

Gremlin Traversal Language g.addV('person').property(id, 1).property('name', ‘alex') g.V('1').property(single, 'age’, 30) g.addV('person').property(id, 2).property('name', ‘julien') g.addE('knows').from(g.V('1')).to(g.V('2')).property('weight', 1.0) g.V().hasLabel('person') g.V().has('name', ‘alex').out('knows').valueMap() tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps

Slide 53

Slide 53 text

SPARQL and RDF w3.org/TR/sparql11-query/

Slide 54

Slide 54 text

Demo

Slide 55

Slide 55 text

Retail demo application Demo application: 1. Available today 2. On GitHub: /aws-samples/aws- bookstore-demo-app 3. One-click AWS CloudFormation deployment Search Indexing and searching semistructured logs and data Product search Amazon Neptune Amazon Elasticsearch Service Key-value High throughput, Low- latency reads and writes, endless scale Product catalog, shopping cart Graph Quickly and easily create and navigate relationships between data Product recommendation In-memory Query by key with microsecond latency Product leaderboard DynamoDB ElastiCache

Slide 56

Slide 56 text

Time series

Slide 57

Slide 57 text

Time series data What is time series data? What is special about a time series database? A sequence of data points recorded over a time interval Time is the single primary axis of the data model t

Slide 58

Slide 58 text

Time series use cases Application events IoT sensor readings DevOps data Humidity % Water vapor 91.0 94.0 86.0 93.0

Slide 59

Slide 59 text

Existing time-series databases Relational databases Difficult to maintain high availability Difficult to scale Limited data lifecycle management Inefficient time series data processing Unnatural for time series data Rigid schema inflexible for fast moving time series data Building with time series data is challenging

Slide 60

Slide 60 text

Amazon Timestream (sign up for the preview) Fast, scalable, fully managed time-series database 1,000x faster and 1/10 the cost of relational databases Collect data at the rate of millions of inserts per second (10M/second) Trillions of daily events Adaptive query processing engine maintains steady, predictable performance Time-series analytics Built-in functions for interpolation, smoothing, and approximation Serverless Automated setup, configuration, server provisioning, software patching

Slide 61

Slide 61 text

Ledgers & journals

Slide 62

Slide 62 text

Common customer use cases Ledgers with centralized control Healthcare Verify and track hospital equipment inventory Manufacturers Track distribution of a recalled product HR & payroll Track changes to an individual’s profile Government Track vehicle title history

Slide 63

Slide 63 text

Challenges with building ledgers Adds unnecessary complexity Blockchain RDBMS – audit tables Difficult to maintain Hard to use and slow Hard to build Custom audit functionality using triggers or stored procedures Impossible to verify No way to verify changes made to data by sys admins

Slide 64

Slide 64 text

Slide 65

Slide 65 text

ID Manufacturer Model Year VIN Owner ID Version Start End Manufacturer Model Year VIN Owner How it works ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Traci Russell INSERT INTO cars << { 'Manufacturer': 'Tesla', 'Model': 'Model S', 'Year': '2012', 'VIN': '123456789', 'Owner': 'Traci Russel' } >> FROM cars WHERE VIN = '123456789' UPDATE owner = 'Ronnie Nash' FROM cars WHERE VIN = '123456789' UPDATE owner = 'Elmer Hubbard' J ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 NULL Tesla Model S 2012 123456789 Traci Russell current.cars C history.cars H ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 08/03/2013 Tesla Model S 2012 123456789 Traci Russell 1 2 08/03/2013 NULL Tesla Model S 2012 123456789 Ronnie Nash ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 08/03/2013 Tesla Model S 2012 123456789 Traci Russell 1 2 08/03/2013 09/02/2016 Tesla Model S 2012 123456789 Ronnie Nash 1 3 09/02/2016 NULL Tesla Model S 2012 123456789 Elmer Hubbard ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Ronnie Nash ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Elmer Hubbard INSERT cars ID:1 Manufacturer: Tesla Model: Model S Year: 2012 VIN: 123456789 Owner: Traci Russell Metadata: { Date:07/16/2012 } H (x) UPDATE cars ID:1 Owner: Ronnie Nash Metadata: { Date:08/03/2013 } H (x) UPDATE cars ID:1 Owner: Elmer Hubbard Metadata: { Date: 09/02/2016 } H (x)

Slide 66

Slide 66 text

Amazon Quantum Ledger Database (QLDB) Fully managed ledger database Track and verify history of all changes made to your application’s data Immutable Maintains a sequenced record of all changes to your data, which cannot be deleted or modified; you can to query and analyze the full history Cryptographically verifiable Uses cryptography to generate a secure output file of your data’s history Easy to use Easy to use, letting you use familiar database capabilities like SQL APIs for querying the data Highly scalable Executes 2–3x as many transactions than ledgers in common blockchain frameworks

Slide 67

Slide 67 text

Purpose-built

Slide 68

Slide 68 text

Benefits of purpose-built databases Better performance Better scale More functionality Easier to debug Independence between teams

Slide 69

Slide 69 text

Slide 70

Slide 70 text

More information

Slide 71

Slide 71 text

Additional resources Andy Jassy’s re:Invent 2017/2018 keynotes youtu.be/1IxDLeFQKPk?t=37m47s youtu.be/ZOIkOnW640A?t=3238 Werner Vogel’s blog: A one size fits all database doesn't fit anyone allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html aws.amazon.com/products/databases/ aws.amazon.com/blogs/database/

Slide 72

Slide 72 text

re:Invent videos youtu.be/hwnNbLXN4vA youtu.be/-pb-DkD6cWg AWS re:Invent 2018: Databases on AWS: The Right Tool for the Right Job (DAT205-R1) AWS re:Invent 2018: Building with AWS Databases: Match Your Workload to the Right Database (DAT301)