Slide 1

Slide 1 text

NoSQL: mongoDB vs. CouchDB
Bc. Tomáš Jukin @Inza
Ing. Michal Valenta, Ph.D.
ČVUT MI-PDB

Slide 2

Slide 2 text

Roadmap
• Recap
• Motivation
• A bit of theory
  • Features, architecture, where to use it, where to install it from
  • mongoDB
  • CouchDB
• Map-Reduce in practice
  • mongoDB
  • CouchDB
• Summary

Slide 3

Slide 3 text

Recap
• NoSQL = Not Only SQL

Slide 4

Slide 4 text

Recap
• ACID can sometimes be too constraining
  • see the CAP theorem
  • see BASE
  • via http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
• The GASŠ problem: "Garages, Cars, Parts, and Drawers" (Garáže, Auta, Součástky a Šuplíky)

Slide 5

Slide 5 text

Motivation
The situation: "We have instances in the application, datasets come out of the DB, and we send strings into it...???"

Slide 6

Slide 6 text

Motivation
Overkill?

Slide 7

Slide 7 text

Motivation
ORM?

Slide 8

Slide 8 text

Motivation
ORM?

Slide 9

Slide 9 text

Motivation
NoSQL!

Slide 10

Slide 10 text

Motivation - NoSQL
• A car in the application, a car in the garage
• Throughput
• KISS
• Caution:
  • BUT ONLY FOR THE RIGHT USE CASES!
  • NoSQL is not a cure-all!
  • ...and Map/Reduce is not SQL...

Slide 11

Slide 11 text

Motivation - NoSQL
• OK, NoSQL then, but which one?
  • there are many, see http://nosql-database.org/
  • http://blog.nahurst.com/visual-guide-to-nosql-systems
• we will look at two document-oriented ones...

Slide 12

Slide 12 text

Motivation
NoSQL!

Slide 13

Slide 13 text

Motivation
The problem with SQL DBs
• Scaling an SQL DB = buying more CPUs
• The CPU is the most expensive component of a computer!
  • so scaling an SQL DB means buying the most expensive PC components...

Slide 14

Slide 14 text

Motivation
What can we do about it? We shall see...

Slide 15

Slide 15 text

Motivation
• What if we could scale with RAM instead of with CPUs?

Slide 16

Slide 16 text

Motivation
• It is possible!

Slide 17

Slide 17 text

A bit of theory
• “Agile and scalable”
• “MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database.”
• Written in C++
• http://www.mongodb.org

Slide 18

Slide 18 text

A bit of theory
• Document-oriented storage
• Schema-less
• Full index support
• Replication
• Horizontal scaling of the data layer
• Document-based queries (JS / JSON)
• Map / Reduce
• /data/db

Slide 19

Slide 19 text

A bit of theory
• No relations, no joins
• Embed vs. link
  • Embed = "prejoin"
  • Link = must be resolved on the client
    • follow-up query
• Indexes
• Collections
  • a.k.a. tables in SQL
  • can be heterogeneous
  • in practice often homogeneous
“There are no joins in MongoDB – distributed joins would be difficult on a 1,000 server cluster.”
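The embed vs. link trade-off above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the collections are ordinary in-memory arrays rather than a real mongoDB server, and the documents, field names (`comments`, `authorId`) and the helper `resolveAuthor` are hypothetical.

```javascript
// Hypothetical in-memory "collection" standing in for a mongoDB collection.
const authors = [{ _id: "a1", name: "Alice" }];

// Embed: the comments live inside the post, so one read returns
// everything at once (the "prejoin").
const embeddedPost = {
  _id: "p1",
  title: "Hello",
  comments: [{ text: "Nice post" }, { text: "Thanks" }],
};

// Link: the post stores only the author's _id.
const linkedPost = { _id: "p2", title: "World", authorId: "a1" };

// Resolving a link is the client's job: a second, follow-up lookup.
function resolveAuthor(post, authorCollection) {
  return authorCollection.find((a) => a._id === post.authorId) || null;
}

const commentCount = embeddedPost.comments.length;
const author = resolveAuthor(linkedPost, authors);
console.log(commentCount, author.name);
```

Embedding costs nothing extra at read time, while each link costs one more lookup that the application has to issue itself.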

Slide 20

Slide 20 text

A bit of theory
• Document-oriented storage
  • BSON
  • lives in RAM
    • VRAM
    • dedicated machine
    • does not cause swapping (handles it itself)
• Journaling (outside the storage)
  • on / off
• Language: "interactive" JS
  • JSON
  • BSON (over the network)
  • REST (not native!)
    • mongod --rest

Slide 21

Slide 21 text

A bit of theory

Slide 22

Slide 22 text

A bit of theory

Slide 23

Slide 23 text

Map / Reduce
• http://labs.google.com/papers/mapreduce.html
• map() & reduce()
• run over a collection

Slide 24

Slide 24 text

Map / Reduce
• map() - give me only what matters, or transform it
• reduce() - reduce it!

Slide 25

Slide 25 text

Map / Reduce
• map() - give me only what matters, or transform it
• reduce() - reduce it!
  • and again!

Slide 26

Slide 26 text

Map / Reduce
• map() - give me only what matters, or transform it
• reduce() - reduce it!
  • and again! and again!

Slide 27

Slide 27 text

Map / Reduce
• map() - give me only what matters, or transform it
• reduce() - reduce it!
  • and again! and again! and again! ...
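The "and again!" point can be illustrated with a small sketch: a reduce function may be applied to its own partial results, so it has to give the same answer whether it runs once or in several rounds. The example below uses plain JavaScript arrays, not a database, and the `orders` data is hypothetical.

```javascript
// Hypothetical documents.
const orders = [
  { item: "tea", price: 3 },
  { item: "coffee", price: 4 },
  { item: "cake", price: 5 },
  { item: "juice", price: 2 },
];

// map(): keep only what matters from each document (here: the price).
const mapped = orders.map((doc) => doc.price);

// reduce(): fold a list of values into one value of the same shape.
const sum = (values) => values.reduce((acc, v) => acc + v, 0);

// One pass over all values...
const direct = sum(mapped);

// ...or reduce two chunks first, then reduce the partial results again.
const partials = [sum(mapped.slice(0, 2)), sum(mapped.slice(2))];
const rereduced = sum(partials);

console.log(direct, rereduced); // both are 14
```

A sum works here because it is associative; an average computed directly in reduce would not survive a second round, which is why the larger mongoDB example later in the deck postpones it to a finalize step.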

Slide 28

Slide 28 text

Map / Reduce

    (1..3).map { |number| number * 2 }
    # => [2, 4, 6]
    (1..3).reduce(0) { |sum, num| sum + num }
    # => 6

Slide 29

Slide 29 text

Map / Reduce

    // Map: emit one (tag, 1) pair for every tag of the document
    function(doc) {
      for (var tag in doc.tags) {
        emit(doc.tags[tag], 1);
      }
    }

    // Reduce: built-in function that counts the emitted rows
    _count
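To see what the tag-counting view above produces, here is a minimal simulation in plain JavaScript. It is a sketch, not CouchDB's view engine: `runMap` and `countByKey` are hypothetical stand-ins, `emit` is passed in explicitly instead of being a global, and the sample documents are invented. `countByKey` only approximates what the built-in `_count` returns when results are grouped by key.

```javascript
// Hypothetical documents; in CouchDB these would live in a database.
const docs = [
  { _id: "1", tags: ["nosql", "couchdb"] },
  { _id: "2", tags: ["nosql", "mongodb"] },
  { _id: "3", tags: ["nosql"] },
];

// Stand-in for the view engine: collect every emitted key/value pair.
function runMap(mapFn, documents) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  documents.forEach((doc) => mapFn(doc, emit));
  return rows;
}

// The map function from the slide, with emit passed in explicitly.
const mapTags = (doc, emit) => {
  for (const tag of doc.tags) emit(tag, 1);
};

// Rough equivalent of a grouped _count: number of rows per key.
function countByKey(rows) {
  const counts = {};
  for (const { key } of rows) counts[key] = (counts[key] || 0) + 1;
  return counts;
}

const tagCounts = countByKey(runMap(mapTags, docs));
console.log(tagCounts);
```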

Slide 30

Slide 30 text

Map / Reduce
• Complex queries? AND?
• What can we do about it?

Slide 31

Slide 31 text

Map / Reduce
• CouchDB-Lucene
  • http://github.com/rnewson/couchdb-lucene
• mongoDB API
  • http://www.mongodb.org/display/DOCS/Advanced+Queries

Slide 32

Slide 32 text

Map / Reduce
• Still not enough?
• Too bad!
• -> SQL :-(...

Slide 33

Slide 33 text

MongoDB

    db.runCommand({
      mapreduce: "DenormAggCollection",
      query: {
        filter1: { '$in': [ 'A', 'B' ] },
        filter2: 'C',
        filter3: { '$gt': 123 }
      },
      map: function() {
        emit(
          { d1: this.Dim1, d2: this.Dim2 },
          { msum: this.measure1, recs: 1, mmin: this.measure1,
            mmax: this.measure2 < 100 ? this.measure2 : 0 }
        );
      },
      reduce: function(key, vals) {
        // mmin starts at Infinity; starting at 0 would never pick up a positive minimum
        var ret = { msum: 0, recs: 0, mmin: Infinity, mmax: 0 };
        for (var i = 0; i < vals.length; i++) {
          ret.msum += vals[i].msum;
          ret.recs += vals[i].recs;
          if (vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
          if ((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax))
            ret.mmax = vals[i].mmax;
        }
        return ret;
      },
      finalize: function(key, val) {
        val.mavg = val.msum / val.recs;
        return val;
      },
      out: 'result1',
      verbose: true
    });
    db.result1.
      find({ mmin: { '$gt': 0 } }).
      sort({ recs: -1 }).
      skip(4).
      limit(8);

mySQL

    SELECT
      Dim1, Dim2,
      SUM(Measure1) AS MSum,
      COUNT(*) AS RecordCount,
      AVG(Measure2) AS MAvg,
      MIN(Measure1) AS MMin,
      MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax
    FROM DenormAggTable
    WHERE (Filter1 IN ('A','B'))
      AND (Filter2 = 'C')
      AND (Filter3 > 123)
    GROUP BY Dim1, Dim2
    HAVING (MMin > 0)
    ORDER BY RecordCount DESC
    LIMIT 4, 8

Notes
1. Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.
2. Measures must be manually aggregated.
3. Aggregates depending on record counts must wait until finalization.
4. Measures can use procedural logic.
5. Filters have an ORM/ActiveRecord-looking style.
6. Aggregate filtering must be applied to the result set, not in the map/reduce.
7. Ascending: 1; Descending: -1

Revision 4, Created 2010-03-06, Rick Osborne, rickosborne.org

Slide 34

Slide 34 text

A bit of theory
• Cloud
  • https://mongohq.com/home
  • Heroku, nodejitsu, AWS, EngineYard
• Who uses it
  • SAP, MTV, SourceForge, Disney, FourSquare

Slide 35

Slide 35 text

A bit of theory
• Sources
  • http://www.mongodb.org/display/DOCS/Philosophy
  • http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
  • http://rickosborne.org/blog/2010/02/infographic-migrating-from-sql-to-mapreduce-with-mongodb/
  • http://www.mongodb.org/display/DOCS/Production+Deployments

Slide 36

Slide 36 text

Installation
• Windows
  • mongoDB - http://goo.gl/wgmpE
• Linux
  • custom apt pkgs - http://goo.gl/Bngud
  • mongoDB - http://goo.gl/wgmpE
• MacOS
  • brew install mongodb
  • mongoDB - http://goo.gl/wgmpE

Slide 37

Slide 37 text

Motivation
The problem with SQL DBs
• Scaling an SQL DB = buying more CPUs
• The CPU is the most expensive component of a computer!
  • so scaling an SQL DB means buying the most expensive PC components...

Slide 38

Slide 38 text

Motivation
• What if we could scale with HDDs instead of with CPUs?

Slide 39

Slide 39 text

Motivation
• It is possible!

Slide 40

Slide 40 text

A bit of theory
• “Time to relax!”
• “Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.”
• Written in Erlang
• http://couchdb.apache.org/

Slide 41

Slide 41 text

A bit of theory
“Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP.”
Jacob Kaplan-Moss, Of the Web (2007)

Slide 42

Slide 42 text

A bit of theory
• “A Database for the Web”
• RESTful over HTTP
  • Server = HTTP server
  • Client = Website / cURL / Futon
• ETag HTTP caching

Slide 43

Slide 43 text

A bit of theory
• Fault tolerant and concurrent
  • Erlang
• Append-only B-trees
  • Append ONLY!
• Distributed
  • Without conflicts!

Slide 44

Slide 44 text

A bit of theory

    HOST=http://localhost:5984

    curl -X GET $HOST
    # {"couchdb":"Welcome","version":"1.0.2"}

    curl -X GET $HOST/my-db
    # {"error":"not_found","reason":"no_db_file"}

    curl -X PUT $HOST/my-db
    # {"ok":true}

    curl -X PUT $HOST/my-db/foo -d '{"moo":"bar"}'
    # {"ok":true,"id":"foo","rev":"1-4c6114c65e295552ab1019e2b046b10e"}

    curl -X GET $HOST/my-db/foo
    # {"_id":"foo","_rev":"1-4c6114c65e295552ab1019e2b046b10e","moo":"bar"}

    # DELETE must name the document's current revision; the response carries the new one
    curl -X DELETE "$HOST/my-db/foo?rev=1-4c6114c65e295552ab1019e2b046b10e"
    # {"ok":true,"id":"foo","rev":"2-d179f665eb01834faf192153dc72dcb3"}
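The role of the `rev` parameter in the session above can be sketched with a toy in-memory store. This is an assumption-laden model, not how CouchDB is implemented: the class `ToyStore`, its method names and the `N-toy` revision strings are all hypothetical (real revisions carry a hash, and real storage is an append-only B-tree on disk). It only shows the idea that every write appends a new entry and must name the current revision.

```javascript
// Toy model only: entries are appended to a log and never modified.
class ToyStore {
  constructor() {
    this.log = [];
  }

  // The latest entry for an id is the current one.
  current(id) {
    for (let i = this.log.length - 1; i >= 0; i--) {
      if (this.log[i].id === id) return this.log[i];
    }
    return null;
  }

  // Append a new entry if `rev` matches the current revision
  // (or is omitted for a document that does not exist yet).
  write(id, body, rev, deleted) {
    const cur = this.current(id);
    const curRev = cur && !cur.deleted ? cur.rev : undefined;
    if (curRev !== rev) return { error: "conflict" };
    const gen = cur ? cur.gen + 1 : 1;
    const entry = { id, gen, rev: `${gen}-toy`, body, deleted };
    this.log.push(entry);
    return { ok: true, id, rev: entry.rev };
  }

  put(id, body, rev) {
    return this.write(id, body, rev, false);
  }

  remove(id, rev) {
    // Deleting without a revision is never allowed.
    if (rev === undefined) return { error: "conflict" };
    return this.write(id, null, rev, true);
  }
}

const store = new ToyStore();
const created = store.put("foo", { moo: "bar" });
const stale = store.remove("foo", "2-toy"); // wrong revision
const removed = store.remove("foo", created.rev); // current revision
console.log(created, stale, removed, store.log.length);
```

Even the delete is just another appended entry, which is why the log holds two entries at the end rather than zero.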

Slide 45

Slide 45 text

A bit of theory
• The future?
  • CouchBase
    • uses CouchDB
• Who uses it
  • BBC, EngineYard, WikiLeaks, ... (http://goo.gl/UZFBj)
  • http://www.jobs.cz/vysoke-skoly/

Slide 46

Slide 46 text

A bit of theory
• Sources
  • http://couchdb.apache.org/
  • http://wiki.apache.org/couchdb
  • http://www.couchbase.com/couchdb
  • http://jacobian.org/writing/of-the-web/
  • http://webexpo.cz/praha2010/prednaska/couchdb-databaze-pro-web/
  • http://wiki.apache.org/couchdb/CouchDB_in_the_wild

Slide 47

Slide 47 text

Installation
• Windows
  • CouchDB - http://goo.gl/SJUkp
• Linux
  • CouchDB - http://goo.gl/SJUkp
• MacOS
  • CouchDBX - http://goo.gl/crdYL

Slide 48

Slide 48 text

Summary
• mongoDB
  • scaling = buy more RAM
  • update only (journaling OFF by default)
  • replication is a MUST have! (by default)
• CouchDB
  • scaling = buy more HDDs
  • append only (revisions)
  • replication is NICE to have

Slide 49

Slide 49 text

One last thing

Slide 50

Slide 50 text

Live demo of real-time replication!

Slide 51

Slide 51 text

Replication
• POST _replicate
• push vs. pull
• One-way data replication (of a specific DB)
  • the opposite direction can be set up as well
  • only the latest revision
• single / continuous
• supports authentication (HTTP)
• 1.1.0+ - _replicator DB
• replication settings do not persist across a restart

Slide 52

Slide 52 text

Replication
2. Two clicks in the Futon GUI, or
3. a single POST request ;)
4. DONE

Slide 53

Slide 53 text

Replication

    curl -i -X POST "http://localhost:5984/_replicate" \
      -H "Content-Type: application/json" \
      -d '{"source":"http://example.net:5984/test",
           "target":"test","create_target":true }'

• http://wiki.apache.org/couchdb/Replication

Slide 54

Slide 54 text

Replication

    curl -i -X POST "http://localhost:5984/_replicate" \
      -H "Content-Type: application/json" \
      -d '{"source":"http://example.net:5984/test",
           "target":"test","create_target":true,
           "continuous":true }'

• http://wiki.apache.org/couchdb/Replication
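The only difference between the two requests above is the `continuous` flag. A small helper that assembles the JSON body makes that explicit; this is a sketch, and the function name `buildReplicationBody` and its option names are hypothetical. It uses only the four fields that appear on the slides and does not send anything over the network.

```javascript
// Builds the JSON body for POST /_replicate (hypothetical helper).
function buildReplicationBody({ source, target, createTarget = false, continuous = false }) {
  if (!source || !target) throw new Error("source and target are required");
  const body = { source, target };
  if (createTarget) body.create_target = true;
  if (continuous) body.continuous = true; // keep replicating as changes arrive
  return body;
}

// One-off pull replication from a remote database into a local one.
const oneShot = buildReplicationBody({
  source: "http://example.net:5984/test",
  target: "test",
  createTarget: true,
});

// The same replication, kept running.
const live = buildReplicationBody({
  source: "http://example.net:5984/test",
  target: "test",
  createTarget: true,
  continuous: true,
});

console.log(JSON.stringify(oneShot));
console.log(JSON.stringify(live));
```

Swapping `source` and `target` gives the opposite direction, so two such requests make the replication two-way.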

Slide 55

Slide 55 text

Replication
• master-slave
• Replica Set
  • voting
• incremental data replication
  • on write
• write to all
• read from one

Slide 56

Slide 56 text

Replication
1. start the nodes in Replica Set mode
2. on one of the nodes, run rs.initiate();
  • optionally pass a config JSON: rs.initiate(config);
  • this node will become the master
3. more nodes can be added: rs.add("node")
4. check the Replica Set status: rs.status();
5. DONE

Slide 57

Slide 57 text

Replication

    mongod --replSet foo

    config = {_id: 'foo', members: [
      {_id: 0, host: 'node1ip:27017'},
      {_id: 1, host: 'node2ip:27017'}]}
    rs.initiate(config);

• http://www.mongodb.org/display/DOCS/Replica+Sets+-+Basics

[Diagram labels: Node B (PRIMARY), Node A (SECONDARY); mongod --replSet foo]
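The config object passed to rs.initiate() has a regular shape: the set name as `_id` and one numbered member per host. A minimal sketch of generating it from a host list follows; the helper name `buildReplicaSetConfig` is hypothetical, and the code only builds the object without talking to any mongod.

```javascript
// Hypothetical helper: builds a config object of the shape shown on the slide.
function buildReplicaSetConfig(setName, hosts) {
  if (hosts.length === 0) throw new Error("at least one host is required");
  return {
    _id: setName, // must match the name given to mongod --replSet
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const config = buildReplicaSetConfig("foo", ["node1ip:27017", "node2ip:27017"]);
console.log(JSON.stringify(config));
```

In the mongo shell the resulting object would then be handed to rs.initiate(config), as in the slide.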

Slide 58

Slide 58 text

Thank you for your attention
Questions?
Bc. Tomáš Jukin @Inza
Ing. Michal Valenta, Ph.D.
Download the lecture at: link