Upgrade to Pro — share decks privately, control downloads, hide ads and more …

NoSQL Introduction

NoSQL Introduction

Understand what NoSQL is and what it is not. Why would you want to use NoSQL within your project and which NoSQL database would you utilize. Explore the relationships between NoSQL and RDBMS. Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database(MongoDB), Key-Value Store, Graph Database, and Columnar databases or combinations of the above.

John Kerley-Weeks

May 07, 2014
Tweet

More Decks by John Kerley-Weeks

Other Decks in Technology

Transcript

  1. NoSQL Introduction • Understand what NoSQL is and what it

    is not. • Why would you want to use NoSQL within your project and which NoSQL database would you utilize? • Explore the relationships between NoSQL and RDBMS. • Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database (MongoDB), Key- Value Store, Graph Database, and Columnar databases or combinations of the above. Thursday May 8th 2014, 3:00pm-3:50pm SB 139 Slides and Feedback at: http://joind.in/11012 2
  2. NoSQL • History • Popular NoSQL Databases • NoSQL Database

    Comparisons • Terminology • Consistency, Replication, Performance • NoSQL Implementation CRUD Operations 3 Slides and Feedback at: http://joind.in/11012
  3. NoSQL Introduction • NoSQL is a commonly adopted misnomer •

    Typically does not use ANSI SQL – SQL = Structured Query Language – Structure exists but is more Flexible – Queries are performed – Language is closer to Programming Languages 4 Slides and Feedback at: http://joind.in/11012
  4. NoSQL History • 1998 Carlo Strozzi Command Line Database •

    June 11, 2009 Meetup – Open Source, Distributed, Non-Relational DB – Eric Evans (Rackspace) – Johan Oskarsson (Last.fm) 6
  5. NoSQL History • Bad name, but it stuck! • Not

    a definitive term • Generally, Newer databases solving new and different problems • Not Only SQL http://blog.sym- link.com/2009/10/30/nosql_whats_in_a_name.html 8 Slides and Feedback at: http://joind.in/11012
  6. NoSQL Origination • Problems not solved by RDBMs • Limitations

    of RDBMs, not SQL 9 Slides and Feedback at: http://joind.in/11012
  7. Most Popular NoSQL • MongoDB - Document Store • Cassandra

    – Wide Column Store • Solr – Search Engine • Redis – Key-value store • Hbase – Wide Column Store • Memcached – Key-value Store • CouchDB – Document Store • Neo4j – Graph Database • Riak – Key-value Store • SimpleDB – Key-value Store within Amazon Cloud 11 Slides and Feedback at: http://joind.in/11012
  8. NoSQL “Bleeding Edge” • Several solutions are mature and stable

    enough to run large scale production environments • Not all permutations have been considered • Several (but not all) optimization strategies have been published • Crucial elements such as Security may be a secondary add-on in favor of performance. 14
  9. NoSQL “Bleeding Edge” Sun Microsystems csh man page: “Although robust

    enough for general use, adventures into the esoteric periphery of the C shell may reveal unexpected quirks.” 15 Slides and Feedback at: http://joind.in/11012
  10. NoSQL Comparison 16 Take note of patterns: Recent Release, Open

    Source, Utilized at High-Volume sites Variety of Formats: Key-Value, Wide-Column, Document, Graph http://db-engines.com/en/ranking
  11. NoSQL Database Types 17 • Key-Value • Column Oriented Databases

    (Columnar) • Graph • Document • Search Database - Solr • Key-Value Web Optimization - Memcached
  12. Key-Value Stores 18 Key Value code:java 17.316% Lowest rank on

    Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Code bucket drink bucket
  13. Document Oriented Database 21 { "_id" : 1, "name" :

    { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }
  14. Document Oriented Database 22 { "facutly" : [ { "_id"

    : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }, { "_id" : 2, "name" : { "first" : "David", "last" : "Williams" }, "contribs" : [ "C#", "Java", "PHP" ], "awards" : [ { "award" : "Sherman Peabody Award II", "year" : 2095, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IX", "year" : 2090, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IV", "year" : 2093, "location" : "Paris", "by" : "Intergalactic Continuum" } ] } ] }
  15. NoSQL Comparison 25 No ANSI SQL Standards, No Predefined Schemas,

    Replication, Eventual Consistency, Rarely Foreign Keys, Data Types not required Newer Concepts: Sharding, REST API, JSON, MapReduce
  16. NoSQL Characteristics 26 No Predefined Schemas • May insert data

    without creating a table • Schema Versions (v1.5, v1.6, v1.7,…) Rarely Foreign Keys • No JOIN operations • Relationships are not automatically maintained Eventual Consistency • Old copies being replaced by new records • Inconsistent data until all replacements are complete
  17. Map Reduce 30 Divides work across distributed systems Parallel processing

    of large data sets Divide – Conquer – Consolidate Often Implement by defining Map and Reduce classes or functions 1+2+3+6+7+8+9=? Google’s MapReduce Programming Model – Revisited Ralf Lammel, Microsoft, 2008 http://www.sciencedirect.com/science/article/pii/S0167642307001281 2 6 8 1 7 3 9 16 20 36
  18. JSON 31 Subset of JavaScript Object Notation Similarities to XML

    method for representing data Syntax Name : Value pairs “salary” : “125000” Values are: number, string, Boolean, array, object, or NULL Objects can store Objects, Arrays can store Arrays Separate pairs by commas “salary” : “125000”, “gender” : “male” Curly braces denote objects { “salary” : “125000”, “gender” : “male” } Square brackets denote arrays “phone” : [”555-1212”, ”555-3344”] “phone” : [ {“office” : ”555-1212”}, {“mobile” : ”555-3344”} ]
  19. JSON Example 32 { "_id" : 1, "name" : {

    "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] } http://www.mongodb.com/json-and-bson
  20. REST API 33 CRUD (Create, Read, Update, Delete) operations through

    the web HTTP Methods GET (List/Read) POST(Update) PUT(Create) DELETE(Delete) EXAMPLE API http://www.blinksale.com/api/ List/Read Data via HTTP GET to http://www.blinksale.com/invoices http://www.blinksale.com/invoices/invoice_id/payments http://www.blinksale.com/invoices/?start=2006&end=2008 Returns XML results
  21. REST API 34 Update data via HTTP POST to http://www.blinksale.com/invoices/invoice_id/payments

    <?xml version="1.0" encoding="UTF-8"?> <payment xmlns="http://www.blinksale.com/api"> <amount>1000.00</amount> <date>2006-09-27</date> </payment> REST = REpresentational State Transfer Twitter Example: https://dev.twitter.com/docs/api/1.1 (GET and POST only)
  22. Database SELECT Statements 35 Oracle SELECT * FROM relationships MongoDB

    db.relationships.find() Cassandra (CQL) SELECT * FROM relationships Slides and Feedback at: http://joind.in/11012
  23. Database SELECT Statements 36 Redis – Key-Value Store SMEMBERS relationships

    Riak – Key-Value Store with REST API (+ proprietary drivers) http://localhost:8091/riak/relationships/likes Neo4j (Cypher) MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Slides and Feedback at: http://joind.in/11012
  24. JOINS without Foreign Keys 37 original_id = ObjectId() db.employer.insert({ "_id":

    original_id, "name": "Broadway Tech", "url": "bc.example.net" }) db.people.insert({ "name": "Erin", “employer_id": original_id, "url": "bc.example.net/Erin" }) “Erin” works at “Broadway Tech” One of the employees at “Broadway Tech” is “Erin” http://docs.mongodb.org/manual/reference/database-references/#document-references
  25. ACID, BASE, CAP, CPR 1979 Gray, 1983 Reuter & Härder

    - ACID Atomic, Consistent, Isolated, Durable Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops 1997 Brewer - BASE Basically Available, Soft-state, Eventually consistent 2000 Brewer – CAP (Pick Two) Consistency, Availability, Partition Tolerance CPR (Pick Two) Consistency, Performance, Replication/Redundancy Contrived - Stretch Definitions 39
  26. CPR 41 Consistency Performance A D C B Spread data

    across storage or computer Redundancy
  27. SQL CRUD 45 Create INSERT INTO table (column1, column2) VALUES

    (9, 'string'); Read SELECT column1, column2 FROM table; Update UPDATE table SET column2 = 'text' WHERE column1= 9 Delete DELETE FROM table WHERE column2='text'
  28. Key-Value Stores 46 Key Value code:java 17.316% Lowest rank on

    Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html code bucket drink bucket
  29. Redis CRUD 47 http://redis.io/commands http://redis.io/topics/data-types-intro http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/ Redis is an in-memory

    Key-Value Store which stores: Strings, Hashes, Lists, Sets, or Ordered sets Strings: values of strings are concrete and can not be altered SET user:jim {lastname: ‘Mathews’, salary: 125000} GET user:jim Hashes: allows modification and retrieval of individual values HSET user:jim salary 125000 HSET user:jim lastname Mathews HGET jim salary
  30. Redis CRUD 48 Lists: One-dimensional array with insert, append, pop,

    and push Redis.lpush(‘users:employees’, ‘user:jim’) redis.mget(redis.lrange(‘users:employess’,0,5)) Sets: lists with no duplicate values (SADD = Set Add) SADD users:employees jim SADD users:employees krishna SMEMBERS employees Sorted Sets: are sets with an added sorting value ZADD users:employees 125000 jim ZADD users:employees 157000 Krishna ZRANGEBYSCORE users:employees 100000 180000
  31. Riak CRUD 49 Easy to install and configure test cluster

    REST Queries Create/PUT a “course:CIS2120” row curl –v –X PUT http://localhost:8091/riak/course/CIS2120 \ -H “Content-Type: application/json” \ -d ‘{“name”:”Database Coding”, “days”:”MWF”}’ Read/GET the value for “course:CIS2120” curl –X GET http://localhost:8091/riak/course/CIS2120 curl http://localhost:8091/riak/course/CIS2120 Key Value course:CIS2120 {“name”:”Database Coding”, “days”:”MWF”}
  32. Riak Links 50 Riak can link on value to key:value

    to another with a relationship curl –v –X PUT http://localhost:8091/riak/student/sorensen \ -H “Content-Type: application/json” \ -H “Link: </riak/course/CIS2120>; riaktag=\”enrolled\”” \ -d ‘{“firstname”:”Conner”}’ This does not automatically create a link from “sorensen“ to “CIS2120”
  33. Neo4j – Graph Database 52 http://www.neo4j.org/learn/try http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN

    n,r,m Matches a person “n” that likes person “m” https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125
  34. Neo4j CRUD 53 Must try dragging nodes at: http://www.neo4j.org/learn/try MATCH

    (user {name:“Bill"})-[:KNOWS]->(colleague) WHERE colleague.employer=“LinkedIn” RETURN user,colleague ORDER BY colleague.name LIMIT 10 http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” MATCH (n)-[r]->(m) RETURN n,r,m Matches any relationship between “n” and “m” http://www.neo4j.org/learn/cypher
  35. Neo4j (LUKE {name:"Luke Skywalker"}), (HAN {name:"Han Solo"}), (LEIA {name:"Princess Leia

    Organa"}), (OBI_WAN {name:"Obi Wan Kenobi"}), (YODA {name : "Yoda"}), (VADER {name:"Darth Vader"}), (C3PO {name:"C3PO", droid:true}), (R2D2 {name:"R2D2", droid:true}), (CHEWBACCA {name:"Chewbacca"}), (TATOOINE {name:"Tatooine", distance:13184}), (DAGOBAH {name:"Dagobah", distance:15407}), (JEDI {name:"Jedi"}), (SITH {name:"Sith"}), (REBELLION {name:"Rebellion"}), (EMPIRE {name:"Empire"}), (DARK_SIDE {name:"Dark Side"}), (LIGHT_SIDE {name:"Light Side"}), … (LUKE)-[:FRIENDS_WITH]->(HAN), (LUKE)-[:FRIENDS_WITH]->(LEIA), (HAN)-[:FRIENDS_WITH]->(CHEWBACCA), (YODA)-[:TEACHES]->(OBI_WAN), (YODA)-[:TEACHES]->(LUKE), (OBI_WAN)-[:TEACHES]->(LUKE), (OBI_WAN)-[:KNOWS]->(VADER), (LUKE)-[:KNOWS]->(R2D2), (R2D2)-[:KNOWS]->(C3PO), (LUKE)-[:LIVED_ON]->(TATOOINE), (HAN)-[:LIVED_ON]->(CORELLIA), (LEIA)-[:LIVED_ON]->(ALDERAAN), (YODA)-[:LIVED_ON]->(DAGOBAH), (LUKE)-[:DEVOTED_TO]->(JEDI), (LUKE)-[:DEVOTED_TO]->(REBELLION), (LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE), (VADER)-[:DEVOTED_TO]->(SITH), (VADER)-[:DEVOTED_TO]->(EMPIRE), (VADER)-[:DEVOTED_TO]->(DARK_SIDE), (LEIA)-[:DEVOTED_TO]->(REBELLION), (HAN)-[:DEVOTED_TO]->(REBELLION) … https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125 MATCH y-[r]-other WHERE y.name='Yoda' return y.name, type(r), other.name 54
  36. Google BigTable • White Paper published in 2006 • Many

    databases based upon BigTable • 13 pages, readable for many non-techies • Insightful into the early days of NoSQL http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf 55
  37. Hbase 56 Large-Scale, Column-oriented database Consistency, Performance, Fault-Tolerant, ACID via

    Locking Tables are created before initial data is added Tables have row keys are indexed row identifier strings column families – contain one or more columns timestamp for version control
  38. Hbase 57 Row key is a unifier for column families.

    If row does insert values in a column family no disk space is utilized within the column family. Keys are identified by column_family:column_name text: revision:author revision:comment Write-Ahead Logging (WAL) similar to file system journaling
  39. Hbase CRUD 58 create ‘wiki_table’, ‘text_column_family’, ‘revision_column_family’ create ‘wiki’, ‘text’,

    ‘revision’ put ‘wiki’, ‘first page’, ‘text:’, ‘…’ put ‘wiki’, ‘first page’, ‘revision:author’, ‘…’ get ‘wiki’, ‘first page’, [‘revision:author’, ‘revision:comment’] delete ‘wiki’, ‘first page’, ‘revision:author’ scan ‘wiki’ = SELECT * FROM wiki Seven Databases in Seven Weeks, Redmond & Wilson 2012
  40. MongoDB Document Store 59 db.courses.insert({ name: “CIS2120”, description: “Database Coding”,

    instructor: { name: “David Williams”, email: “[email protected]” } instructor2: { name: “John Kerley-Weeks”, office: “JQL230” } subjects: [“Python”, “MongoDB”, “3NF”, “ETL”, “Star Schema”] })
  41. MongoDB vs SQL 60 http://docs.mongodb.org/manual/reference/sql-comparison/ MongoDB -> SQL Terminology Collection

    -> Table Document -> Row Column -> Field db.courses.find() = SELECT * FROM courses db.courses.count() = SELECT COUNT(*) FROM courses db.courses.find({name: “CIS2120”})
  42. MongoDB Simple Database 61 http://media.mongodb.org/zips.json {"city": "ACMAR", "loc": [-86.51557, 33.584132],

    "pop": 6055, "state": "AL", "_id": "35004"} {"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"} {"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"} {"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"} {"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"} {"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"} {"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"} {"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"} {"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"} {"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"} {"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"} {"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"} {"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"} {"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"} {“city”: “Logan, UT”, “additionally”: [“Nibley, UT”, “River Heights, UT”], “state”: “UT”, “version”: “2.1”, “_id”: “84321”} {“city”: “Olivehurst, CA”, “additionally”: [“Arboga, CA”, “Plumas Lake, CA”, “West Linda, CA”], “state”: “CA”, “version”: “2.1”, “_id”: “95961”}
  43. Cassandra Characteristics 62 Scalable, High-availability Wide-columnar datastore Peer-to-peer rather than

    master-slave clusters Tunable consistency can read/write to a single node, quorum of nodes or all nodes Recommends static and dynamic column families Static column families have contain pre-defined columns Contact Info: phone, address, email, web Dynamic families have variable numbers of similar columns Students enrolled in a course
  44. Cassandra CRUD 63 http://www.datastax.com/docs/0.8/references/cql http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt CREATE TABLE course ( name

    text PRIMARY KEY, instructor text, maxstudents int ) INSERT INTO course (name, instructor, maxstudents) VALUES (‘CIS2120’, ‘Williams’, 28) UPDATE course SET maxstudents=26 WHERE name=‘CIS2120’ SELECT name, instructor FROM course WHERE maxstudents > 20
  45. Cassandra CRUD 64 No JOIN operations or FOREIGN KEYS CREATE

    TABLE people ( name text PRIMARY KEY, email text, phones map<text, text> ) INSERT INTO people (name, email, phones) VALUES (‘John Weeks’, ‘[email protected]’, {‘mobile’ : ‘555-1212’, ‘office’ : ‘797-7133’, ‘fax’ : ‘555-1212’}) UPDATE people SET phones[‘office’] = ‘555-1212’ WHERE email = ‘[email protected]