Slide 1

Slide 1 text

Comparison of Database Types
Mark Taylor | @willCodeForAle

Slide 2

Slide 2 text

A brief history of databases
● 1725 – first punch card system, used to control a loom
● Flat file – fine for small amounts of data, e.g. XML, CSV, JSON
● 1960s – Hierarchical – tree structure, e.g. a filesystem, DNS
● 1970 – Relational – structured, table-based data
● 2000s – NoSQL:
  – Key value – around since the 1970s, rose in popularity from 2000
  – Document – 2009
  – Graph – 2000s
  – Column family – 2000s
  – Time series – 2010s

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Relational Databases

Slide 7

Slide 7 text

Relational Databases

Slide 8

Slide 8 text

Relational Databases - Pros
● Tried and tested, reliable
● Strong consistency – ACID (Atomicity, Consistency, Isolation, Durability) transactions
● SQL is standardised and easy to get started with
● Most developers are already familiar with it
● Excellent range of tools and libraries available
● Excellent driver support for all major languages
● Costs and risks are generally well understood
● Basically, they’re a safe bet!
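To make the ACID point concrete, here is a minimal sketch of atomicity (the A in ACID) using Python's built-in `sqlite3` module, with an in-memory SQLite database standing in for any relational database. The `accounts` table and `transfer` function are hypothetical, for illustration only: either both balance updates are committed, or neither is.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst as one atomic transaction (hypothetical helper)."""
    # `with conn:` commits if the block succeeds and rolls back if it raises
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes the debit above: all or nothing
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)       # succeeds
try:
    transfer(conn, "alice", "bob", 500)  # fails part-way, so nothing is applied
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 30}
```

The failed transfer leaves no trace: Alice is never debited 500, even though the debit ran before the check failed.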

Slide 9

Slide 9 text

Relational Databases - Cons
● Speed – the overhead of joins and of enforcing relational constraints limits query speed; queries with many joins or over very large tables perform worse
● Scaling – typically scales vertically, so is limited to the resources of a single server; partitioning data across servers is hard while keeping the data model strongly consistent, so managing very large amounts of data is difficult
● Cost – CPU is expensive, storage is cheap; beyond a certain point, costs rise steeply as you hit the limits of a single server
● Schema – a predefined schema is required, and changing it is more cumbersome
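The schema point can be seen with the same `sqlite3` module: a field has to exist in the schema before data can be written to it. This is a minimal sketch with a hypothetical `users` table; on large production tables a migration may also be slow or take locks, depending on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema has to be declared before any data can be stored
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

rejected = False
try:
    # Writing a field the schema doesn't define is an error
    conn.execute("INSERT INTO users (name, email) VALUES ('bob', 'bob@example.com')")
except sqlite3.OperationalError as err:
    rejected = True
    print("rejected:", err)

# A schema change (migration) is needed first; existing rows get NULL for the new column
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("INSERT INTO users (name, email) VALUES ('bob', 'bob@example.com')")

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)  # [('alice', None), ('bob', 'bob@example.com')]
```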

Slide 10

Slide 10 text

Featured Relational DB - MySQL
● Ranked 2nd, below Oracle, on DB-Engines (at the time of writing)
● Affordable!
● Great community
● Excellent library and driver support
● Open source

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

NoSQL Databases
● High-performance, non-relational databases
● Flexible – data is generally stored without a rigid, predefined schema
● Scalable – scale horizontally by partitioning data across many servers
● Resilient – high availability is a core design goal
● Fast – because of the way data is stored and queried
● Rose to prominence in the 2000s

Slide 13

Slide 13 text

Types of NoSQL Database
● Document – data is stored hierarchically in documents, e.g. JSON
● Key value – data is stored as key-value pairs
● Graph – data is stored as a graph of nodes, edges and properties
● Wide column – related data is stored as a set of nested key-value pairs within a single row

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

CAP Theorem
● A distributed data system can't guarantee consistency, availability and partition tolerance all at once; when a network partition happens, it has to trade consistency against availability
● Consistency – every node in the cluster responds with the most up-to-date data (or an error)
● Availability – every node returns a non-error response, even if it’s not the most recent data
● Partition tolerance – the system continues to operate even if network messages between nodes are lost or delayed (a network partition)
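The consistency vs availability trade-off during a partition can be shown with a deliberately simplified toy model. `ToyCluster`, its two replicas and the `"CP"`/`"AP"` switch are all hypothetical and don't reflect how any particular database implements replication: a CP cluster refuses requests it can't answer consistently, while an AP cluster keeps answering with possibly stale data.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a consistency-first (CP) cluster refuses a request."""


@dataclass
class Replica:
    data: dict = field(default_factory=dict)


@dataclass
class ToyCluster:
    """Two replicas, a and b; writes go to a and are copied to b when reachable."""
    mode: str  # "CP" (prefer consistency) or "AP" (prefer availability)
    a: Replica = field(default_factory=Replica)
    b: Replica = field(default_factory=Replica)
    partitioned: bool = False  # True while a and b cannot talk to each other

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # Can't replicate, so refuse rather than let the replicas diverge
            raise Unavailable("cannot replicate write during partition")
        self.a.data[key] = value
        if not self.partitioned:
            self.b.data[key] = value
        # Otherwise (AP mode during a partition) b is not updated and goes stale

    def read_from_b(self, key):
        if self.partitioned and self.mode == "CP":
            # b can't confirm it has the latest value, so it returns an error
            raise Unavailable("cannot confirm latest value during partition")
        return self.b.data.get(key)


results = {}
for mode in ("CP", "AP"):
    cluster = ToyCluster(mode)
    cluster.write("x", 1)
    cluster.partitioned = True  # simulate a network partition
    try:
        cluster.write("x", 2)
        results[mode] = cluster.read_from_b("x")
    except Unavailable:
        results[mode] = "unavailable"

print(results)  # {'CP': 'unavailable', 'AP': 1}
```

The CP cluster gives up availability (it returns an error), while the AP cluster gives up consistency (replica b answers with the old value 1).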

Slide 16

Slide 16 text

NoSQL – Key Value Stores

Slide 17

Slide 17 text

NoSQL – Key Value Databases
● Blazingly fast storage and retrieval
● Extremely scalable
● Each record is made up of two linked data items: a unique key and its value
● Stored values are opaque to the database, so there is no structured querying
● Typically used for caching, session stores and shopping carts
● No defined schema
● Basic querying – values are looked up by key
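As a rough illustration of the key-value model for a session-store use case, here is a toy in-memory store in plain Python. The `KeyValueStore` class is hypothetical: values are opaque bytes that can only be fetched by key, and each key can carry an optional expiry (TTL). Redis supports a similar idea with key expiry, e.g. `SET key value EX seconds`.

```python
import time
from typing import Optional


class KeyValueStore:
    """A toy in-memory key-value store with optional per-key expiry (TTL)."""

    def __init__(self, clock=time.monotonic):
        self._data: dict[str, tuple[bytes, Optional[float]]] = {}
        self._clock = clock  # injectable so expiry can be tested deterministically

    def set(self, key: str, value: bytes, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value  # the store never looks inside the value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


now = [0.0]  # fake clock so the expiry is deterministic
store = KeyValueStore(clock=lambda: now[0])
store.set("session:42", b'{"user": "alice"}', ttl=30)
before = store.get("session:42")  # b'{"user": "alice"}'
now[0] = 31.0
after = store.get("session:42")   # None: the entry has expired
```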

Slide 18

Slide 18 text

Featured Key Value Store - Redis
● Open source (BSD licence up to Redis 7.2; later versions use different licences)
● Has many useful data structures, such as hashes, lists and sets
● Supports master-replica data replication with failover
● Supports transactions via a command queue (MULTI/EXEC)
● Can persist data to disk
● Offers high availability via Redis Sentinel, and automatic partitioning with Redis Cluster
● Great client library support

Slide 19

Slide 19 text

NoSQL - Document Databases

Slide 20

Slide 20 text

NoSQL - Document Databases
● Store documents as JSON, XML, YAML or BSON (binary JSON)
● Very flexible structure – fields can be optional
● Can use indexes for faster performance
● A sub-class of key-value store, where the value is a structured document
● Documents can be queried, unlike opaque key-value data
● Allow partial updates of documents
● Some implementations offer basic joins
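A plain-Python sketch of what "queryable documents" and "partial updates" mean, using a list of dicts as a stand-in collection. The `find` and `set_field` helpers are hypothetical; `set_field` is similar in spirit to a MongoDB `$set` update on a dotted path, but this is not any database's actual API.

```python
from typing import Any

# Documents in one collection can have different, optional fields
orders = [
    {"_id": 1, "customer": {"name": "alice"}, "items": ["book"], "status": "shipped"},
    {"_id": 2, "customer": {"name": "bob"}, "items": ["pen", "ink"]},
    {"_id": 3, "customer": {"name": "alice"}, "items": ["lamp"], "status": "pending"},
]


def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path such as 'customer.name'; None if any part is missing."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def find(collection: list[dict], path: str, value: Any) -> list[dict]:
    """Query inside documents: unlike a key-value store, the structure is visible."""
    return [d for d in collection if get_path(d, path) == value]


def set_field(doc: dict, path: str, value: Any) -> None:
    """Partial update: change one (possibly nested) field, leave the rest intact."""
    *parents, leaf = path.split(".")
    for part in parents:
        doc = doc.setdefault(part, {})
    doc[leaf] = value


alice_orders = find(orders, "customer.name", "alice")
print([d["_id"] for d in alice_orders])  # [1, 3]

set_field(orders[1], "status", "pending")
set_field(orders[1], "customer.email", "bob@example.com")
```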

Slide 21

Slide 21 text

Featured Document DB - DynamoDB
● Fully managed SaaS
● Awesome for serverless applications
● High performance – powers Amazon, Netflix, Lyft, Medium
● Single-digit millisecond latency
● Multi-purpose – key-value and document data models
● ACID transactions
● Flexible modelling
● Flexible billing – on-demand or provisioned capacity
● Real-time processing with DynamoDB Streams

Slide 22

Slide 22 text

Learning DynamoDB
● Check out Alex DeBrie! - https://twitter.com/alexbdebrie
● https://www.dynamodbguide.com/
● https://www.dynamodbbook.com/
● AWS re:Invent – DynamoDB Deep Dive https://www.youtube.com/watch?v=HaEPXoXVf2k

Slide 23

Slide 23 text

NoSQL - Graph Databases

Slide 24

Slide 24 text

Graph Databases
● Kind of like document databases, but with relationships!
● Nodes represent entities, edges represent relationships
● For connected data, models are simpler and more expressive than relational ones
● Flexible properties
● Very flexible query languages, e.g. Cypher, Gremlin
● Relationships are first-class citizens
● Joins are very expensive! Graph traversal is fast because relationships are stored directly
● Useful for highly connected data, e.g. Facebook friends
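Why traversal is cheap when relationships are stored directly: in this plain-Python sketch, each person's friends are kept on the node itself (an adjacency list), so finding friends-of-friends is a breadth-first walk along edges rather than a self-join on a friendships table. The `friends` data and `within_hops` function are hypothetical; a graph database does the equivalent at scale through its query language.

```python
from collections import deque

# Nodes are people; each edge is a FRIEND relationship stored on the node itself
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first traversal: return each reachable person and their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # don't expand beyond the hop limit
        for friend in graph[person]:  # follow stored edges directly, no join needed
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


reach = within_hops(friends, "alice", 2)
friends_of_friends = sorted(p for p, d in reach.items() if d == 2)
print(friends_of_friends)  # ['dave', 'erin']
```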

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Featured Graph Database - Neo4j
● Awesome Cypher query language, very easy to get started
● Great community and learning resources – easy to learn
● Offers an open-source Community Edition
● Offers an Enterprise Edition with clustering and high availability (HA)
● ACID compliant
● Strong driver support
● Useful query browser

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Column Family Databases

Slide 29

Slide 29 text

Column Family Databases
● A column “family” is like a table in a relational database
● Very high performance and highly scalable
● Efficient at data compression and partitioning
● Often used for Big Data and IoT because of fast insert and query speeds
● Used by Spotify to store user profile attributes, artists and songs

Slide 30

Slide 30 text

Column Family Databases
● Each row typically starts with a row key, which uniquely identifies that row
● Each following column has a column key, which uniquely identifies that column within the row, so different rows can hold different columns
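The row key / column key layout can be sketched as nested dictionaries: row key, then column family, then column key, then value. This is a hypothetical in-memory model with made-up data, not a real client API. Real stores such as Cassandra or HBase add on-disk storage, distribute rows across nodes by key, and replicate them. Note that rows are sparse: a row only stores the columns it actually has.

```python
# Hypothetical in-memory layout: table[row_key][column_family][column_key] = value
Table = dict[str, dict[str, dict[str, str]]]
table: Table = {}


def put(row_key: str, family: str, column_key: str, value: str) -> None:
    """Write one cell; rows and column families are created on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[column_key] = value


def read_row(row_key: str, family: str) -> list[tuple[str, str]]:
    """Return all columns of one family in a row, sorted by column key."""
    return sorted(table.get(row_key, {}).get(family, {}).items())


put("user:1", "profile", "name", "alice")
put("user:1", "profile", "country", "UK")
put("user:1", "plays", "artist:queen", "42")  # play count per artist
put("user:2", "profile", "name", "bob")       # no "country" column: rows are sparse
put("user:2", "plays", "artist:abba", "7")

print(read_row("user:1", "profile"))  # [('country', 'UK'), ('name', 'alice')]
print(read_row("user:2", "profile"))  # [('name', 'bob')]
```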

Slide 31

Slide 31 text

NoSQL – Time Series Databases

Slide 32

Slide 32 text

NoSQL – Time Series Databases
● Optimised for time-stamped (time series) data, stored as associated pairs of time(s) and value(s)
● Useful for high-velocity metrics and logging, e.g. sensors, monitoring, clicks, stock trading
● Optimised for measuring change over time and for querying through aggregations
● You’ve probably used one if you’ve used a product like New Relic, or Grafana with Prometheus
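A typical time series aggregation is downsampling: grouping (time, value) pairs into fixed time buckets and summarising each bucket. This plain-Python sketch uses made-up CPU samples and a hypothetical `downsample` function. Prometheus offers similar aggregations through PromQL functions such as `avg_over_time`.

```python
from collections import defaultdict
from statistics import mean

# Made-up CPU % samples as (seconds since start, value) pairs
samples = [(0, 20.0), (15, 30.0), (59, 40.0), (60, 90.0), (110, 70.0)]


def downsample(points, bucket_seconds=60):
    """Group (time, value) points into fixed time buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % bucket_seconds  # align to the start of its bucket
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


print(downsample(samples))  # {0: 30.0, 60: 80.0}
```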

Slide 33

Slide 33 text

Mark Taylor | @willCodeForAle