Comparison of Database Types

Mark Taylor
May 20, 2020

Developers often reach for a commonly used relational database because it's what they're already comfortable with. However, a relational database is often not the best tool for the job. Learn instead how to choose the best database type for your project's requirements. Candidates for discussion are relational, NoSQL, graph and time series databases.

Transcript

  1. A brief history of databases
    • 1725 – first punch card system to control a loom
    • Flat file – fine for small amounts of data, e.g. XML, CSV, JSON
    • 1960s – Hierarchical – tree structure, e.g. filesystem, DNS
    • 1970 – Relational – structured, table-based data
    • 2000 – NoSQL
      – 1970+ – Key value (rose in popularity 2000+)
      – 2009 – Document
      – 2000s – Graph
      – 2000s – Column family
      – 2010s – Time series
  2. Relational Databases - Pros
    • Tried and tested, reliable
    • Strong consistency - ACID (Atomicity, Consistency, Isolation, Durability)
    • SQL is standardised and easy to get started with
    • Developers tend to be already familiar
    • Excellent range of tools and libraries available
    • Excellent driver support for all languages
    • Costs and risks generally understood
    • Basically, they’re a safe bet!
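The atomicity part of ACID can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the sample balances are hypothetical and are not taken from the talk; the point is only that a failed statement undoes the whole transaction.

```python
import sqlite3

# In-memory database; table and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 500)` credits bob first, then fails on the debit, and the rollback leaves both balances untouched.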
  3. Relational Databases - Cons
    • Speed – the overhead of the relational model limits query speed. Queries with many joins or very large tables perform worse.
    • Scaling – typically limited to the resources of a single server; partitioning across servers is hard because the data model has to stay consistent. Managing large amounts of data is difficult.
    • Cost – CPU is expensive, storage is cheap. Costs rise steeply once a single server's limits are reached.
    • Requires a predefined schema, which is more cumbersome to update.
  4. Featured Relational DB - MySQL
    • Ranked 2nd, below Oracle, on DB-Engines
    • Affordable!
    • Great community
    • Excellent library and driver support
    • Open source
  5. NoSQL Databases
    • High performance, non-relational databases
    • Flexible – data is generally stored in a flexible structure
    • Scalable – partitioned horizontal scaling
    • Resilient – high availability is a core design factor
    • Fast – due to the way data is stored and queried
    • Around since the 2000s
  6. Types of NoSQL Database
    • Document – data is stored hierarchically in JSON documents
    • Key value – data is stored in key-value pairs
    • Graph – data is stored as a graph with nodes, edges and properties
    • Wide Column – related data is stored as a set of nested key-value pairs in a single column
  7. CAP Theorem
    • Distributed data systems always offer a trade-off between consistency, availability and partition tolerance.
    • Consistency – each node in the cluster responds with the most up-to-date data
    • Availability – each node returns an immediate response, even if it’s not the most recent data
    • Partition Tolerance – the system continues to operate even if a node in the cluster fails or cannot be reached over the network
  8. NoSQL – Key Value Databases
    • Blazingly fast storage and retrieval
    • Extremely scalable
    • Each record is made up of two linked data items: a key and a value
    • The stored value is opaque to the database, so there is no structured querying
    • Typically used for caching, session stores, shopping carts
    • No defined schema
    • Basic querying
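To make the key-value model concrete, here is a toy in-memory sketch in Python. The `KeyValueStore` class, its `ttl` parameter and the injectable `clock` are hypothetical names chosen for illustration, not the API of any real product. Values are stored as opaque blobs and can only be fetched by key, which is what makes the model a good fit for caches and session stores.

```python
import time


class KeyValueStore:
    """Toy in-memory key-value store with optional expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl=None):
        """Store value under key; ttl is a lifetime in seconds, or None for no expiry."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value for key, or default if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value
```

The store never inspects the value, so a question such as "which sessions belong to user X" cannot be answered without knowing the keys in advance.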
  9. Featured Key Value Store - Redis
    • Open source (BSD licence)
    • Has many useful data structures like hashes, lists, sets
    • Supports master-slave data replication with failover
    • Supports transactions via a command queue
    • Can persist data on disk
    • Has a high availability offering via Redis Sentinel and automatic partitioning with Redis Cluster
    • Great client library support
  10. NoSQL - Document Databases
    • Store documents using JSON, XML, YAML or BSON (binary JSON)
    • Very flexible structure – optional fields
    • Can use indexes for faster performance
    • Sub-class of key-value store
    • Documents can be queried, unlike key/value
    • Allows partial updates of documents
    • Some implementations offer basic joins
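Two of the points above, querying by document content and partial updates, can be sketched with plain Python dictionaries. The `find` and `partial_update` helpers and the `users` collection are hypothetical; real document databases expose their own query and update APIs.

```python
def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def partial_update(doc, changes):
    """Merge changes into doc in place, recursing into nested documents."""
    for field, value in changes.items():
        if isinstance(value, dict) and isinstance(doc.get(field), dict):
            partial_update(doc[field], value)
        else:
            doc[field] = value
    return doc


# Documents in one collection need not share the same fields.
users = [
    {"id": 1, "name": "Ada", "address": {"city": "London", "postcode": "N1"}},
    {"id": 2, "name": "Lin", "tags": ["admin"]},
]
```

For example, `partial_update(users[0], {"address": {"city": "Leeds"}})` changes only the city and leaves the postcode in place, whereas a pure key-value store would have to replace the whole value.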
  11. Featured Document DB - DynamoDB
    • Fully managed SaaS
    • Awesome for serverless applications
    • High performance – powers Amazon, Netflix, Lyft, Medium
    • Single-digit millisecond latency
    • Multi-purpose – key value and document data models
    • ACID transactions
    • Flexible modelling
    • Flexible billing – on demand vs provisioned
    • Real-time processing with DynamoDB Streams
  12. Learning DynamoDB
    • Check out Alex DeBrie! - https://twitter.com/alexbdebrie
    • https://www.dynamodbguide.com/
    • https://www.dynamodbbook.com/
    • AWS re:Invent – DynamoDB Deep Dive https://www.youtube.com/watch?v=HaEPXoXVf2k
  13. Graph Databases
    • Kind of like document databases, but with relationships!
    • Nodes represent entities, edges represent relationships
    • Models are simpler and more expressive than relational
    • Flexible properties
    • Very flexible query languages, e.g. Cypher, Gremlin
    • Relationships should be first-class citizens
    • Relational joins are very expensive! Graph traversal is fast because the relationships are stored with the data rather than computed at query time.
    • Useful for highly connected data, e.g. Facebook friends.
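The traversal idea can be sketched in Python with an adjacency map and a breadth-first search. The `friends` data and the `within_hops` helper are hypothetical; a graph database would express the same question in a query language such as Cypher or Gremlin.

```python
from collections import deque

# Each node maps directly to its neighbours, so following a relationship
# is a lookup rather than a join. The names are made up for illustration.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Return {node: hops} for every node reachable in at most max_hops edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]  # the start node is not its own friend
    return distances
```

Here `within_hops(friends, "alice", 2)` finds friends and friends of friends: bob and carol at one hop, dave at two, and erin is left out because she is three hops away.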
  14. Featured Graph Database - Neo4j
    • Awesome Cypher query language, very easy to get started
    • Great community and learning resources – easy to learn
    • Offers an open source community version
    • Offers an Enterprise version with clustering and HA
    • ACID compliance
    • Strong driver support
    • Useful query browser
  15. Column Family Databases
    • A column “family” is like a table in a relational database
    • Very high performance and highly scalable
    • Efficient at data compression and partitioning
    • Often used for Big Data and IoT due to fast insert and query speeds
    • Used by Spotify to store user profile attributes, artists, songs
  16. Column Family Databases
    • Typically contains a row key as the first column, which uniquely identifies that row. The following columns then contain a column key, which uniquely identifies that column within the row
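The row key and column key layout described above can be approximated in Python as a map of maps. The `ColumnFamily` class and its methods are hypothetical and ignore storage, compression and partitioning; the sketch only shows how a value is addressed by row key and then column key.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column key -> value} (illustrative only)."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column_key, value):
        """Write one column of one row; rows may have different columns."""
        self._rows[row_key][column_key] = value

    def get(self, row_key, column_key, default=None):
        """Read one column of one row, or default if either key is absent."""
        return self._rows.get(row_key, {}).get(column_key, default)

    def row(self, row_key):
        """Return a copy of every column stored for the row."""
        return dict(self._rows.get(row_key, {}))
```

Rows in the same family do not need the same set of columns, so a sparse row simply stores fewer entries.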
  17. NoSQL – Time Series Databases
    • Optimised for time-stamped or time series data through associated pairs of time(s) and value(s)
    • Useful for high velocity logging and metrics, e.g. sensors, monitoring, clicks, stock trading
    • Optimised for measuring change over time and querying through aggregations
    • You’ve probably used one if you’ve used a product like New Relic or Grafana with Prometheus
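Querying through aggregations usually means grouping time-stamped values into fixed windows. The sketch below shows that idea in plain Python; the `average_per_bucket` helper and the sample `readings` are hypothetical, and a real time series database would do this work in its storage and query engine.

```python
from collections import defaultdict


def average_per_bucket(points, bucket_seconds):
    """Group (timestamp, value) pairs into fixed windows and average each one."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Round the timestamp down to the start of its window.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Made-up sensor readings: (seconds since start, temperature).
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 23.0), (150, 30.0)]
```

With one-minute windows, `average_per_bucket(readings, 60)` collapses the five readings into three averages keyed by window start time.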