Choosing The Right Database For The Right Applications in 2019 [AWS Community Day @ Sofia]

Alex Casalboni Technical Evangelist, AWS @alex_casalboni Choosing The Right Database
For The Right Applications in 2019 @ 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved

About me • Software Engineer & Web Developer • Worked
in a startup for 4.5 years • ServerlessDays Organizer • AWS Customer since 2013

1970 1980 1990 2000 Oracle DB2 SQL Server MySQL PostgreSQL
DynamoDB Redis MongoDB Elasticsearch Neptune Cassandra Access Aurora 2010 Timestream QLDB Amazon DocumentDB

Data categories and common use cases Relational Key value Document
In-memory Graph Search Time series Ledger Referential integrity, ACID transactions, schema- on-write Low-latency, key lookups with high throughput and fast ingestion of data Indexing and storing documents with support for query on any attribute Microseconds latency, key- based queries, and specialized data structures Creating and navigating data relations easily and quickly Lift and shift, EMR, CRM, finance Real-time bidding, shopping cart, social Content management, personalization, mobile Leaderboards, real-time analytics, caching Fraud detection, social networks, recommendatio n engines Indexing and searching semistructured logs and data Product catalog, help, and FAQs, full text Collect, store, and process data sequenced by time IoT applications, event tracking Complete, immutable, and verifiable history of all changes to application data Systems of record, supply chain, healthcare, registrations, financial

Purpose built The right tool for the right job allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html

AWS: Purpose-built databases Relational Key value Document In-memory Graph Search
Amazon DynamoDB Amazon Neptune Amazon RDS Aurora Commercial Community Amazon ElastiCache Amazon Elasticsearch Service Amazon DocumentDB Time series Ledger Amazon Timestream Amazon Quantum Ledger Database Memcached Redis

400,000+ customers using AWS databases and analytics

Two fundamental areas of focus “Lift and shift” existing apps
to the cloud Quickly build new apps in the cloud

AWS Database Migration Service (AWS DMS) M i g r
a t i n g d a t a b a s e s t o A W S Migrate between on-premises and AWS Migrate between databases Automated schema conversion Data replication for migration with zero downtime 100,000+ databases migrated

Modern apps create new requirements Users: 1 million+ Data volume:
TB–PB–EB Locality: Global Performance: Milliseconds–microseconds Request rate: Millions Access: Web, mobile, IoT, devices Scale: Up-down, Out-in Economics: Pay for what you use Developer access: No assembly required Social media Ride hailing Media streaming Dating

User search history: Amazon DynamoDB • Massive data volume •
Need quick lookups for personalized search Session state: Amazon ElastiCache • In-memory store for sub-millisecond fetch Relational data: Amazon RDS • Referential integrity • Primary transactional database

• DynamoDB: 31B items tracking language exercises • Aurora: Primary
transactional db (user data) • ElastiCache: Instant access to common words and phrases 300M total users 7B exercises per month

Relational data

Traditional SQL • TCP based wire protocol • Well Known,
lots of uses • Common drivers (JDBC) • Frequently used with ORMs • Scale UP individual instances • Scale OUT with read replicas • Sharding at application level • Lots of flavors but very similar language • Joins ! INSERT INTO users (id, first_name, last_name) VALUES (1, ‘Alex’, ‘Casalboni’); SELECT col1, col2, col3 FROM table1 WHERE col4 = 1 AND col5 = 2 GROUP BY col1 HAVING count(*) > 1 ORDER BY col2

Relational model Data model • Data is stored in rows
and tables • Data is normalized • Strict schema • Relationships established via keys enforced by the system • Data accuracy and consistency • Complex queries to extract and reshape data on-demand Patient * Patient ID First Name Last Name Gender DOB * Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating

Patient * Patient ID First Name Last Name Gender DOB
* Doctor ID Visit * Visit ID * Patient ID * Hospital ID Date * Treatment ID Medical Treatment * Treatment ID Procedure How Performed Adverse Outcome Contraindication Doctor * Doctor ID First Name Last Name Medical Specialty * Hospital Affiliation Hospital * Hospital ID Name Address Rating Query model: SQL SELECT d.first_name, d.last_name, count(*) FROM visit as v, hospital as h, doctor as d WHERE v.hospital_id = h.hospital_id AND h.hospital_id = d.hospital AND v.t_date > date_trunc('week’, CURRENT_TIMESTAMP - interval '1 week') GROUP BY d.first_name, d.last_name; Relational model

Amazon Relational Database Service (RDS) Managed relational database service with
a choice of six popular database engines Easy to administer Available and durable Highly scalable Fast and secure No need for infrastructure provisioning, installing, and maintaining DB software Automatic Multi-AZ data replication; automated backup, snapshots, failover Scale database compute and storage with a few clicks with no app downtime SSD storage and guaranteed provisioned I/O; data encryption at rest and in transit

Amazon Aurora ascendant: How we designed a cloud-native relational database
allthingsdistributed.com

Amazon Aurora MySQL and PostgreSQL-compatible relational database built for the
cloud Performance and availability of commercial-grade databases at 1/10th the cost Performance and scalability Availability and durability Highly secure Fully managed 5x perf of standard MySQL 3x perf of standard PostgreSQL Up to 15 read replicas Self-healing storage 6 copies of data across 3 AZ Continuous backup to S3 Network isolation, encryption at rest/transit No hardware provisioning, software patching, setup, configuration, or backups

NoSQL: Key-Value and Documents

SQL vs NoSQL Optimized for storage Optimized for compute Normalized/relational
Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Good for OLAP Built for OLTP at scale SQL NoSQL

Key-value data • Simple key-value pairs • Partitioned by keys
• Resilient to failure • High throughput, low- latency reads and writes • Consistent performance at scale Table 1 … … Partitions … Highly partitionable data

Gamers Primary Key Attributes Gamer Tag Type Hammer57 Rank Level
Points Tier 87 4050 Elite Status Health Progress 90 30 Weapon Class Damage Range Taser 87% 50 FluffyDuffy Rank Level Points Tier 5 1072 Trainee Status Health Progress 37 8 // Status of Hammer57 GET { TableName:"Gamers", Key: { "GamerTag":"Hammer57", "Type":"Status” } } // Return all Hammer57 Gamers GamerTag = :a :a Hammer57 Key-value data

Amazon DynamoDB Fast and flexible key value database service for
any scale Performance at scale Serverless Comprehensive security Global database for global apps Single-digit millisecond response times at any scale; Virtually unlimited throughput No server provisioning, software patching/upgrades; automatic scaling and continuous data backups Encrypts all data by default; fully integrated with AWS IAM for robust security Build global apps by easily replicating tables across multiple regions

Amazon DynamoDB Data Structure Table Items Attributes Partition Key Sort
Key Mandatory Key-value access pattern Determines data distribution Optional Model 1:N relationships Enables rich query capabilities All items for key ==, <, >, >=, <= “begins with” “between” “contains” “in” sorted results counts top/bottom N values

DynamoDB Schema and Queries • Connects over HTTP • Global
Secondary Indexes and Local Secondary Indexes • Speed up queries with DAX • Global tables (multi-region-multi- master) • Transactions across multiple tables • Change Streams • Rich query language with expressions • Provision read and write capacity units separately • Also supports pay-per-request model import boto3 votes = boto3.resource("dynamodb").Table("votes") resp = votes.update_item( Key={"name": editor}, UpdateExpression="ADD votes :incr", ExpressionAttributeValues={":incr": 1}, ReturnValues="ALL_NEW" )

DynamoDB Advancements over the last 24 months VPC endpoints April
2017 Auto scaling June 2017 DynamoDB Accelerator (DAX) April 2017 Time To Live (TTL) February 2017 Global tables On-demand backup Encryption at rest November 2017 November 2017 November 2017 Point-in-time recovery March 2018 SLA June 2018 99.999% SLA August 2018 Adaptive capacity ACID November 2018 Transactions November 2018 On-demand

Document databases • Data is stored in JSON-like documents •
Documents map naturally to how humans model data • Flexible schema and indexing • Expressive query language built for documents (ad hoc queries and aggregations) JSON documents are first-class objects of the database { id: 1, name: "sue", age: 26, email: "[email protected]", promotions: ["new user", "5%", "dog lover"], memberDate: 2018-2-22, shoppingCart: [ {product:"abc", quantity:2, cost:19.99}, {product:"edf", quantity:3, cost: 2.99} ] }

!= == Evolution of document databases JSON became the de
facto data interchange format Friction when converting JSON to the relational model Object-relational mappings (ORMs) were created to help with this friction Document databases solved the problem (Client) (App) (Database) JSON Relational JSON

Use cases for document data User profiles { id: 181276,
username: "sue1942", name: {first: "Susan", last: "Benoit"} } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"} } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"}, ExploidingSnails: { hi_score: 3185400, global_rank: 5139, bonus_levels: true }, promotions: ["new user","5%","snail lover"] } { id: 181276, username: "sue1942", name: {first: "Susan", last: "Benoit"}, ExploidingSnails: { hi_score: 3185400, global_rank: 5139, bonus_levels: true } }

Use cases for document data Mobile Retail and marketing User
profiles Catalog Content management Personalization

Challenges of existing document databases Hard to manage Hard to
scale Hard to secure Hard to back up

Amazon DocumentDB Fast, scalable, and fully managed MongoDB-compatible database service
Fast Scalable Fully managed MongoDB compatible Millions of RPS with ms latency 2x throughput of MongoDB Separation of compute and storage Up to 15 read replicas Auto patching and quick setup Automatic backups Compatible with MongoDB 3.6 Same SDKs and tools

In-Memory stores

In-memory • No persistence, in-memory • Microsecond performance • Simple
commands for manipulating in memory data structures • Strings, hashes, lists, sets, and sorted sets Database Memory (buffer pool) Disk Query processor Get/Put APIs Memory Milliseconds to microseconds (10x faster) Storage engine

In-memory ops set a "hello" // Set key "a" with
a string value and no expiration OK get a // Get value for key "a" "hello" get b // Get value for key "b" results in miss (nil) set b "Good-bye" EX 5 // Set key "b" with a string value and a 5 second expiration "Good-bye" get b // Get value for key "b" "Good-bye" // wait >= 5 seconds get b (nil) // key has expired, nothing returned

Amazon ElastiCache Redis and Memcached compatible, in-memory data store and
cache Redis and Memcached compatible Extreme performance Secure and reliable Easily scalable Fully compatible with open source tools Redis and Memcached In-memory data store and cache Microsecond response time Network isolation, encryption, HIPAA, PIC Multi-AZ, automatic failover Scales reads and write Both sharding and replicas

Searching data

Search: Full text search

Search The bright blue butterfly hangs on the breeze It’s
best to forget the great sky and to retire from every wind. Under blue sky, in bright sunlight, one need search around. Document 1 Document 2 Document 3 a and around every for from in is it not on one the to under ID Term Document 1 best 2 2 blue 1, 3 3 bright 1,3 4 breeze 1 5 butterfly 1 6 forget 2 7 great 2 8 hangs 1 9 need 3 10 retire 2 11 search 3 12 sky 2, 3 13 wind 2 Inverted index Stopword list

Search _search?q=house "hits": { "total": 85, "max_score": 6.6137657, "hits": [{
"_index": "movies", "_type": "movie", "_id": "tt0077975", "_score": 6.6137657, "_source": { "directors": [ "John Landis" ], "release_date": "1978-07-27T00:00:00Z", "rating": 7.5, "genres": [ "Comedy", "Romance" ], "image_url": "http://ia.jpg "plot": "At a 1962 College, Dean Vernon Wormer…", "title": "Animal House", "rank": 527, "running_time_secs": 6540, "actors": [ "John Belushi","Karen Allen","Tom Hulce" ], "year": 1978, "id": "tt0077975" } },

Amazon Elasticsearch Service Fully managed, reliable, and scalable Elasticsearch service
Easy to use Scalable Highly available Secure Deploy a production- ready Elasticsearch cluster in minutes Resize your cluster with a few clicks or a single API call Replicate across AZs, with monitoring and automated self-healing Deploy into VPC and restrict access using security groups and IAM policies

Graph data

Graph data • Relationships are first-class objects • Data is
modeled and queried as a graph • Vertices connected by Edges • Creating and navigating relations between data easily and quickly Purchased Purchased Follows Purchased Knows Product Sport Follows

Different approaches for highly connected data Purpose-built for a business
process Purpose-built to answer questions about relationships

Graph use cases Social networking Life sciences Network & IT
operations Fraud detection Recommendation engines Knowledge graphs

Product recommendation to a user PURCHASED PURCHASED PURCHASED PURCHASED PURCHASED
KNOWS BOOK #1 BOOK #2 PURCHASED BOOK #3 Graph example

Amazon Neptune Fully managed graph database Fast Reliable Easy Open
Query billions of relationships with millisecond latency 6 replicas of your data across 3 AZs with fully backup and restore Build powerful queries with Gremlin and SPARQL Supports Apache TinkerPop & W3C RDF

Open Source Apache TinkerPop™ Gremlin Traversal Language W3C Standard SPARQL
Query Language R E S O U R C E D E S C R I P T I O N F R A M E W O R K ( R D F ) P R O P E R T Y G R A P H LEADING GRAPH MODELS AND FRAMEWORKS

Amazon Neptune high level architecture Bulk load from S3 Database
Mgmt.

Gremlin Traversal Language g.addV('person').property(id, 1).property('name', ‘alex') g.V('1').property(single, 'age’, 30) g.addV('person').property(id,
2).property('name', ‘julien') g.addE('knows').from(g.V('1')).to(g.V('2')).property('weight', 1.0) g.V().hasLabel('person') g.V().has('name', ‘alex').out('knows').valueMap() tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps

SPARQL and RDF w3.org/TR/sparql11-query/

Retail demo application Demo application: 1. Available today 2. On
GitHub: /aws-samples/aws- bookstore-demo-app 3. One-click AWS CloudFormation deployment Search Indexing and searching semistructured logs and data Product search Amazon Neptune Amazon Elasticsearch Service Key-value High throughput, Low- latency reads and writes, endless scale Product catalog, shopping cart Graph Quickly and easily create and navigate relationships between data Product recommendation In-memory Query by key with microsecond latency Product leaderboard DynamoDB ElastiCache

Time series

Time series data What is time series data? What is
special about a time series database? A sequence of data points recorded over a time interval Time is the single primary axis of the data model t

Time series use cases Application events IoT sensor readings DevOps
data Humidity % Water vapor 91.0 94.0 86.0 93.0

Existing time-series databases Relational databases Difficult to maintain high availability
Difficult to scale Limited data lifecycle management Inefficient time series data processing Unnatural for time series data Rigid schema inflexible for fast moving time series data Building with time series data is challenging

Amazon Timestream (sign up for the preview) Fast, scalable, fully
managed time-series database 1,000x faster and 1/10 the cost of relational databases Collect data at the rate of millions of inserts per second (10M/second) Trillions of daily events Adaptive query processing engine maintains steady, predictable performance Time-series analytics Built-in functions for interpolation, smoothing, and approximation Serverless Automated setup, configuration, server provisioning, software patching

Ledgers & journals

Common customer use cases Ledgers with centralized control Healthcare Verify
and track hospital equipment inventory Manufacturers Track distribution of a recalled product HR & payroll Track changes to an individual’s profile Government Track vehicle title history

Challenges with building ledgers Adds unnecessary complexity Blockchain RDBMS –
audit tables Difficult to maintain Hard to use and slow Hard to build Custom audit functionality using triggers or stored procedures Impossible to verify No way to verify changes made to data by sys admins

ID Manufacturer Model Year VIN Owner ID Version Start End
Manufacturer Model Year VIN Owner How it works ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Traci Russell INSERT INTO cars << { 'Manufacturer': 'Tesla', 'Model': 'Model S', 'Year': '2012', 'VIN': '123456789', 'Owner': 'Traci Russel' } >> FROM cars WHERE VIN = '123456789' UPDATE owner = 'Ronnie Nash' FROM cars WHERE VIN = '123456789' UPDATE owner = 'Elmer Hubbard' J ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 NULL Tesla Model S 2012 123456789 Traci Russell current.cars C history.cars H ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 08/03/2013 Tesla Model S 2012 123456789 Traci Russell 1 2 08/03/2013 NULL Tesla Model S 2012 123456789 Ronnie Nash ID Version Start End Manufacturer Model Year VIN Owner 1 1 07/16/2012 08/03/2013 Tesla Model S 2012 123456789 Traci Russell 1 2 08/03/2013 09/02/2016 Tesla Model S 2012 123456789 Ronnie Nash 1 3 09/02/2016 NULL Tesla Model S 2012 123456789 Elmer Hubbard ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Ronnie Nash ID Manufacturer Model Year VIN Owner 1 Tesla Model S 2012 12345678 9 Elmer Hubbard INSERT cars ID:1 Manufacturer: Tesla Model: Model S Year: 2012 VIN: 123456789 Owner: Traci Russell Metadata: { Date:07/16/2012 } H (x) UPDATE cars ID:1 Owner: Ronnie Nash Metadata: { Date:08/03/2013 } H (x) UPDATE cars ID:1 Owner: Elmer Hubbard Metadata: { Date: 09/02/2016 } H (x)

Amazon Quantum Ledger Database (QLDB) Fully managed ledger database Track
and verify history of all changes made to your application’s data Immutable Maintains a sequenced record of all changes to your data, which cannot be deleted or modified; you can to query and analyze the full history Cryptographically verifiable Uses cryptography to generate a secure output file of your data’s history Easy to use Easy to use, letting you use familiar database capabilities like SQL APIs for querying the data Highly scalable Executes 2–3x as many transactions than ledgers in common blockchain frameworks

Purpose-built

Benefits of purpose-built databases Better performance Better scale More functionality
Easier to debug Independence between teams

AWS: Purpose-built databases Relational Key value Document In-memory Graph Search
Amazon DynamoDB Amazon Neptune Amazon RDS Aurora Commercial Community Amazon ElastiCache Amazon Elasticsearch Service Amazon DocumentDB Time series Ledger Amazon Timestream Amazon Quantum Ledger Database Memcached Redis

More information

Additional resources Andy Jassy’s re:Invent 2017/2018 keynotes youtu.be/1IxDLeFQKPk?t=37m47s youtu.be/ZOIkOnW640A?t=3238 Werner
Vogel’s blog: A one size fits all database doesn't fit anyone allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html aws.amazon.com/products/databases/ aws.amazon.com/blogs/database/

re:Invent videos youtu.be/hwnNbLXN4vA youtu.be/-pb-DkD6cWg AWS re:Invent 2018: Databases on AWS:
The Right Tool for the Right Job (DAT205-R1) AWS re:Invent 2018: Building with AWS Databases: Match Your Workload to the Right Database (DAT301)

Choosing The Right Database For The Right Appli...

Choosing The Right Database For The Right Applications in 2019 [AWS Community Day @ Sofia]

More Decks by Alex Casalboni

Other Decks in Programming

Featured

Transcript