Slide 1

Slide 1 text

NoSQL Introduction Slides and Feedback at: http://joind.in/11012

Slide 2

Slide 2 text

NoSQL Introduction • Understand what NoSQL is and what it is not. • Why would you want to use NoSQL within your project and which NoSQL database would you utilize? • Explore the relationships between NoSQL and RDBMS. • Understand how to select between an RDBMs (MySQL and PostgreSQL), Document Database (MongoDB), Key- Value Store, Graph Database, and Columnar databases or combinations of the above. Thursday May 8th 2014, 3:00pm-3:50pm SB 139 Slides and Feedback at: http://joind.in/11012 2

Slide 3

Slide 3 text

NoSQL • History • Popular NoSQL Databases • NoSQL Database Comparisons • Terminology • Consistency, Replication, Performance • NoSQL Implementation CRUD Operations 3 Slides and Feedback at: http://joind.in/11012

Slide 4

Slide 4 text

NoSQL Introduction • NoSQL is a commonly adopted misnomer • Typically does not use ANSI SQL – SQL = Structured Query Language – Structure exists but is more Flexible – Queries are performed – Language is closer to Programming Languages 4 Slides and Feedback at: http://joind.in/11012

Slide 5

Slide 5 text

NoSQL History 5 http://www.w3resource.com/mongodb/nosql.php

Slide 6

Slide 6 text

NoSQL History • 1998 Carlo Strozzi Command Line Database • June 11, 2009 Meetup – Open Source, Distributed, Non-Relational DB – Eric Evans (Rackspace) – Johan Oskarsson (Last.fm) 6

Slide 7

Slide 7 text

NoSQL History 7

Slide 8

Slide 8 text

NoSQL History • Bad name, but it stuck! • Not a definitive term • Generally, Newer databases solving new and different problems • Not Only SQL http://blog.sym- link.com/2009/10/30/nosql_whats_in_a_name.html 8 Slides and Feedback at: http://joind.in/11012

Slide 9

Slide 9 text

NoSQL Origination • Problems not solved by RDBMs • Limitations of RDBMs, not SQL 9 Slides and Feedback at: http://joind.in/11012

Slide 10

Slide 10 text

Most Popular Databases http://db-engines.com/en/ranking Ranking by: Web Content, Web Searches, Technical Discussion, Jobs, Resumes 10

Slide 11

Slide 11 text

Most Popular NoSQL • MongoDB - Document Store • Cassandra – Wide Column Store • Solr – Search Engine • Redis – Key-value store • Hbase – Wide Column Store • Memcached – Key-value Store • CouchDB – Document Store • Neo4j – Graph Database • Riak – Key-value Store • SimpleDB – Key-value Store within Amazon Cloud 11 Slides and Feedback at: http://joind.in/11012

Slide 12

Slide 12 text

12 NoSQL vs RDBMs Image Reference: http://blogs.the451group.com/information_management/2012/11/02/updated-database-landscape-graphic/

Slide 13

Slide 13 text

Reading Recommendations Great Overview of NoSQL: Seven Databases in Seven Weeks Eric Redmond and Jim Wilson 13

Slide 14

Slide 14 text

NoSQL “Bleeding Edge” • Several solutions are mature and stable enough to run large scale production environments • Not all permutations have been considered • Several (but not all) optimization strategies have been published • Crucial elements such as Security may be a secondary add-on in favor of performance. 14

Slide 15

Slide 15 text

NoSQL “Bleeding Edge” Sun Microsystems csh man page: “Although robust enough for general use, adventures into the esoteric periphery of the C shell may reveal unexpected quirks.” 15 Slides and Feedback at: http://joind.in/11012

Slide 16

Slide 16 text

NoSQL Comparison 16 Take note of patterns: Recent Release, Open Source, Utilized at High-Volume sites Variety of Formats: Key-Value, Wide-Column, Document, Graph http://db-engines.com/en/ranking

Slide 17

Slide 17 text

NoSQL Database Types 17 • Key-Value • Column Oriented Databases (Columnar) • Graph • Document • Search Database - Solr • Key-Value Web Optimization - Memcached

Slide 18

Slide 18 text

Key-Value Stores 18 Key Value code:java 17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Code bucket drink bucket

Slide 19

Slide 19 text

Column Oriented Database 19

Slide 20

Slide 20 text

Neo4j 20

Slide 21

Slide 21 text

Document Oriented Database 21 { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }

Slide 22

Slide 22 text

Document Oriented Database 22 { "facutly" : [ { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] }, { "_id" : 2, "name" : { "first" : "David", "last" : "Williams" }, "contribs" : [ "C#", "Java", "PHP" ], "awards" : [ { "award" : "Sherman Peabody Award II", "year" : 2095, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IX", "year" : 2090, "location" : "Paris", "by" : "Intergalactic Continuum" }, { "award" : "Sherman Peabody Award IV", "year" : 2093, "location" : "Paris", "by" : "Intergalactic Continuum" } ] } ] }

Slide 23

Slide 23 text

Document Oriented Database 23 http://chris.photobooks.com/json/ Slides and Feedback at: http://joind.in/11012

Slide 24

Slide 24 text

24 http://visualizer.json2html.com/

Slide 25

Slide 25 text

NoSQL Comparison 25 No ANSI SQL Standards, No Predefined Schemas, Replication, Eventual Consistency, Rarely Foreign Keys, Data Types not required Newer Concepts: Sharding, REST API, JSON, MapReduce

Slide 26

Slide 26 text

NoSQL Characteristics 26 No Predefined Schemas • May insert data without creating a table • Schema Versions (v1.5, v1.6, v1.7,…) Rarely Foreign Keys • No JOIN operations • Relationships are not automatically maintained Eventual Consistency • Old copies being replaced by new records • Inconsistent data until all replacements are complete

Slide 27

Slide 27 text

Download NoSQL v95141.3 27 Released 4/1/2014 http://www.nosql.org/downloads/ymbkm.zip

Slide 28

Slide 28 text

NoSQL Terminology and Concepts 28 Slides and Feedback at: http://joind.in/11012

Slide 29

Slide 29 text

Sharding 29 Partitions – Data distributed across disks Sharding – Data distributed across servers

Slide 30

Slide 30 text

Map Reduce 30 Divides work across distributed systems Parallel processing of large data sets Divide – Conquer – Consolidate Often Implement by defining Map and Reduce classes or functions 1+2+3+6+7+8+9=? Google’s MapReduce Programming Model – Revisited Ralf Lammel, Microsoft, 2008 http://www.sciencedirect.com/science/article/pii/S0167642307001281 2 6 8 1 7 3 9 16 20 36

Slide 31

Slide 31 text

JSON 31 Subset of JavaScript Object Notation Similarities to XML method for representing data Syntax Name : Value pairs “salary” : “125000” Values are: number, string, Boolean, array, object, or NULL Objects can store Objects, Arrays can store Arrays Separate pairs by commas “salary” : “125000”, “gender” : “male” Curly braces denote objects { “salary” : “125000”, “gender” : “male” } Square brackets denote arrays “phone” : [”555-1212”, ”555-3344”] “phone” : [ {“office” : ”555-1212”}, {“mobile” : ”555-3344”} ]

Slide 32

Slide 32 text

JSON Example 32 { "_id" : 1, "name" : { "first" : "John", "last" : "Backus" }, "contribs" : [ "Fortran", "ALGOL", "FP" ], "awards" : [ { "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" }, { "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" } ] } http://www.mongodb.com/json-and-bson

Slide 33

Slide 33 text

REST API 33 CRUD (Create, Read, Update, Delete) operations through the web HTTP Methods GET (List/Read) POST(Update) PUT(Create) DELETE(Delete) EXAMPLE API http://www.blinksale.com/api/ List/Read Data via HTTP GET to http://www.blinksale.com/invoices http://www.blinksale.com/invoices/invoice_id/payments http://www.blinksale.com/invoices/?start=2006&end=2008 Returns XML results

Slide 34

Slide 34 text

REST API 34 Update data via HTTP POST to http://www.blinksale.com/invoices/invoice_id/payments 1000.00 2006-09-27 REST = REpresentational State Transfer Twitter Example: https://dev.twitter.com/docs/api/1.1 (GET and POST only)

Slide 35

Slide 35 text

Database SELECT Statements 35 Oracle SELECT * FROM relationships MongoDB db.relationships.find() Cassandra (CQL) SELECT * FROM relationships Slides and Feedback at: http://joind.in/11012

Slide 36

Slide 36 text

Database SELECT Statements 36 Redis – Key-Value Store SMEMBERS relationships Riak – Key-Value Store with REST API (+ proprietary drivers) http://localhost:8091/riak/relationships/likes Neo4j (Cypher) MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Slides and Feedback at: http://joind.in/11012

Slide 37

Slide 37 text

JOINS without Foreign Keys 37 original_id = ObjectId() db.employer.insert({ "_id": original_id, "name": "Broadway Tech", "url": "bc.example.net" }) db.people.insert({ "name": "Erin", “employer_id": original_id, "url": "bc.example.net/Erin" }) “Erin” works at “Broadway Tech” One of the employees at “Broadway Tech” is “Erin” http://docs.mongodb.org/manual/reference/database-references/#document-references

Slide 38

Slide 38 text

Replication Challenge is Write Consistency 38

Slide 39

Slide 39 text

ACID, BASE, CAP, CPR 1979 Gray, 1983 Reuter & Härder - ACID Atomic, Consistent, Isolated, Durable Rollback: All or Nothing, Follows Rules, Simultaneous, No Drops 1997 Brewer - BASE Basically Available, Soft-state, Eventually consistent 2000 Brewer – CAP (Pick Two) Consistency, Availability, Partition Tolerance CPR (Pick Two) Consistency, Performance, Replication/Redundancy Contrived - Stretch Definitions 39

Slide 40

Slide 40 text

CPR 40 Consistency Redundancy Performance Pick Two

Slide 41

Slide 41 text

CPR 41 Consistency Performance A D C B Spread data across storage or computer Redundancy

Slide 42

Slide 42 text

42 Redundancy Performance Consistency ABCE ABCD ABCD ABCE Updates may be inconsistent across devices

Slide 43

Slide 43 text

43 Consistency Redundancy Performance ABCD ABCD ABCD ABCD One Update Locks all Nodes

Slide 44

Slide 44 text

CRUD 44 Create Read Update Delete Slides and Feedback at: http://joind.in/11012

Slide 45

Slide 45 text

SQL CRUD 45 Create INSERT INTO table (column1, column2) VALUES (9, 'string'); Read SELECT column1, column2 FROM table; Update UPDATE table SET column2 = 'text' WHERE column1= 9 Delete DELETE FROM table WHERE column2='text'

Slide 46

Slide 46 text

Key-Value Stores 46 Key Value code:java 17.316% Lowest rank on Feb 2014 code:C 18.334% Lowest rank on August 2013 code:Objective-C Lowest rank on Dec 2007 11.341% code:C++ {“score”:”6.892%”, “low rank”: “Feb 2008”} Key Value drink:java coffee drink:punch Sprite + pineapple juice drink:pop Carbonated Soda http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html code bucket drink bucket

Slide 47

Slide 47 text

Redis CRUD 47 http://redis.io/commands http://redis.io/topics/data-types-intro http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/ Redis is an in-memory Key-Value Store which stores: Strings, Hashes, Lists, Sets, or Ordered sets Strings: values of strings are concrete and can not be altered SET user:jim {lastname: ‘Mathews’, salary: 125000} GET user:jim Hashes: allows modification and retrieval of individual values HSET user:jim salary 125000 HSET user:jim lastname Mathews HGET jim salary

Slide 48

Slide 48 text

Redis CRUD 48 Lists: One-dimensional array with insert, append, pop, and push Redis.lpush(‘users:employees’, ‘user:jim’) redis.mget(redis.lrange(‘users:employess’,0,5)) Sets: lists with no duplicate values (SADD = Set Add) SADD users:employees jim SADD users:employees krishna SMEMBERS employees Sorted Sets: are sets with an added sorting value ZADD users:employees 125000 jim ZADD users:employees 157000 Krishna ZRANGEBYSCORE users:employees 100000 180000

Slide 49

Slide 49 text

Riak CRUD 49 Easy to install and configure test cluster REST Queries Create/PUT a “course:CIS2120” row curl –v –X PUT http://localhost:8091/riak/course/CIS2120 \ -H “Content-Type: application/json” \ -d ‘{“name”:”Database Coding”, “days”:”MWF”}’ Read/GET the value for “course:CIS2120” curl –X GET http://localhost:8091/riak/course/CIS2120 curl http://localhost:8091/riak/course/CIS2120 Key Value course:CIS2120 {“name”:”Database Coding”, “days”:”MWF”}

Slide 50

Slide 50 text

Riak Links 50 Riak can link on value to key:value to another with a relationship curl –v –X PUT http://localhost:8091/riak/student/sorensen \ -H “Content-Type: application/json” \ -H “Link: ; riaktag=\”enrolled\”” \ -d ‘{“firstname”:”Conner”}’ This does not automatically create a link from “sorensen“ to “CIS2120”

Slide 51

Slide 51 text

Neo4j 51

Slide 52

Slide 52 text

Neo4j – Graph Database 52 http://www.neo4j.org/learn/try http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125

Slide 53

Slide 53 text

Neo4j CRUD 53 Must try dragging nodes at: http://www.neo4j.org/learn/try MATCH (user {name:“Bill"})-[:KNOWS]->(colleague) WHERE colleague.employer=“LinkedIn” RETURN user,colleague ORDER BY colleague.name LIMIT 10 http://docs.neo4j.org/refcard/2.0/ MATCH (n)-[r:LIKES]->(m) RETURN n,r,m Matches a person “n” that likes person “m” MATCH (n)-[r]->(m) RETURN n,r,m Matches any relationship between “n” and “m” http://www.neo4j.org/learn/cypher

Slide 54

Slide 54 text

Neo4j (LUKE {name:"Luke Skywalker"}), (HAN {name:"Han Solo"}), (LEIA {name:"Princess Leia Organa"}), (OBI_WAN {name:"Obi Wan Kenobi"}), (YODA {name : "Yoda"}), (VADER {name:"Darth Vader"}), (C3PO {name:"C3PO", droid:true}), (R2D2 {name:"R2D2", droid:true}), (CHEWBACCA {name:"Chewbacca"}), (TATOOINE {name:"Tatooine", distance:13184}), (DAGOBAH {name:"Dagobah", distance:15407}), (JEDI {name:"Jedi"}), (SITH {name:"Sith"}), (REBELLION {name:"Rebellion"}), (EMPIRE {name:"Empire"}), (DARK_SIDE {name:"Dark Side"}), (LIGHT_SIDE {name:"Light Side"}), … (LUKE)-[:FRIENDS_WITH]->(HAN), (LUKE)-[:FRIENDS_WITH]->(LEIA), (HAN)-[:FRIENDS_WITH]->(CHEWBACCA), (YODA)-[:TEACHES]->(OBI_WAN), (YODA)-[:TEACHES]->(LUKE), (OBI_WAN)-[:TEACHES]->(LUKE), (OBI_WAN)-[:KNOWS]->(VADER), (LUKE)-[:KNOWS]->(R2D2), (R2D2)-[:KNOWS]->(C3PO), (LUKE)-[:LIVED_ON]->(TATOOINE), (HAN)-[:LIVED_ON]->(CORELLIA), (LEIA)-[:LIVED_ON]->(ALDERAAN), (YODA)-[:LIVED_ON]->(DAGOBAH), (LUKE)-[:DEVOTED_TO]->(JEDI), (LUKE)-[:DEVOTED_TO]->(REBELLION), (LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE), (VADER)-[:DEVOTED_TO]->(SITH), (VADER)-[:DEVOTED_TO]->(EMPIRE), (VADER)-[:DEVOTED_TO]->(DARK_SIDE), (LEIA)-[:DEVOTED_TO]->(REBELLION), (HAN)-[:DEVOTED_TO]->(REBELLION) … https://gist.github.com/peterneubauer/6019125 http://gist.neo4j.org/?6019125 MATCH y-[r]-other WHERE y.name='Yoda' return y.name, type(r), other.name 54

Slide 55

Slide 55 text

Google BigTable • White Paper published in 2006 • Many databases based upon BigTable • 13 pages, readable for many non-techies • Insightful into the early days of NoSQL http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf 55

Slide 56

Slide 56 text

Hbase 56 Large-Scale, Column-oriented database Consistency, Performance, Fault-Tolerant, ACID via Locking Tables are created before initial data is added Tables have row keys are indexed row identifier strings column families – contain one or more columns timestamp for version control

Slide 57

Slide 57 text

Hbase 57 Row key is a unifier for column families. If row does insert values in a column family no disk space is utilized within the column family. Keys are identified by column_family:column_name text: revision:author revision:comment Write-Ahead Logging (WAL) similar to file system journaling

Slide 58

Slide 58 text

Hbase CRUD 58 create ‘wiki_table’, ‘text_column_family’, ‘revision_column_family’ create ‘wiki’, ‘text’, ‘revision’ put ‘wiki’, ‘first page’, ‘text:’, ‘…’ put ‘wiki’, ‘first page’, ‘revision:author’, ‘…’ get ‘wiki’, ‘first page’, [‘revision:author’, ‘revision:comment’] delete ‘wiki’, ‘first page’, ‘revision:author’ scan ‘wiki’ = SELECT * FROM wiki Seven Databases in Seven Weeks, Redmond & Wilson 2012

Slide 59

Slide 59 text

MongoDB Document Store 59 db.courses.insert({ name: “CIS2120”, description: “Database Coding”, instructor: { name: “David Williams”, email: “[email protected]” } instructor2: { name: “John Kerley-Weeks”, office: “JQL230” } subjects: [“Python”, “MongoDB”, “3NF”, “ETL”, “Star Schema”] })

Slide 60

Slide 60 text

MongoDB vs SQL 60 http://docs.mongodb.org/manual/reference/sql-comparison/ MongoDB -> SQL Terminology Collection -> Table Document -> Row Column -> Field db.courses.find() = SELECT * FROM courses db.courses.count() = SELECT COUNT(*) FROM courses db.courses.find({name: “CIS2120”})

Slide 61

Slide 61 text

MongoDB Simple Database 61 http://media.mongodb.org/zips.json {"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"} {"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"} {"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"} {"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"} {"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"} {"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"} {"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"} {"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"} {"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"} {"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"} {"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"} {"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"} {"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"} {"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"} {“city”: “Logan, UT”, “additionally”: [“Nibley, UT”, “River Heights, UT”], “state”: “UT”, “version”: “2.1”, “_id”: “84321”} {“city”: “Olivehurst, CA”, “additionally”: [“Arboga, CA”, “Plumas Lake, CA”, “West Linda, CA”], “state”: “CA”, “version”: “2.1”, “_id”: “95961”}

Slide 62

Slide 62 text

Cassandra Characteristics 62 Scalable, High-availability Wide-columnar datastore Peer-to-peer rather than master-slave clusters Tunable consistency can read/write to a single node, quorum of nodes or all nodes Recommends static and dynamic column families Static column families have contain pre-defined columns Contact Info: phone, address, email, web Dynamic families have variable numbers of similar columns Students enrolled in a course

Slide 63

Slide 63 text

Cassandra CRUD 63 http://www.datastax.com/docs/0.8/references/cql http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt CREATE TABLE course ( name text PRIMARY KEY, instructor text, maxstudents int ) INSERT INTO course (name, instructor, maxstudents) VALUES (‘CIS2120’, ‘Williams’, 28) UPDATE course SET maxstudents=26 WHERE name=‘CIS2120’ SELECT name, instructor FROM course WHERE maxstudents > 20

Slide 64

Slide 64 text

Cassandra CRUD 64 No JOIN operations or FOREIGN KEYS CREATE TABLE people ( name text PRIMARY KEY, email text, phones map ) INSERT INTO people (name, email, phones) VALUES (‘John Weeks’, ‘[email protected]’, {‘mobile’ : ‘555-1212’, ‘office’ : ‘797-7133’, ‘fax’ : ‘555-1212’}) UPDATE people SET phones[‘office’] = ‘555-1212’ WHERE email = ‘[email protected]

Slide 65

Slide 65 text

Questions ??? 65 Slides and Feedback at: http://joind.in/11012