Upgrade to PRO for Only $50/Yearโ€”Limited-Time Offer! ๐Ÿ”ฅ

ToroDB Internals: How to Create a NoSQL Databas...

Avatar for 8Kdata 8Kdata
November 18, 2015

ToroDB Internals: How to Create a NoSQL Database on Top ofย SQL

ToroDB. New NoSQL database. Open source. Not reinventing the wheel, runs on top of PostgreSQL. MongoDB-compatible. With transactions. Does not suffer dirty-reads. With replication & sharding. This talk will show to the audience that SQL databases are not dead --definitely not. They shine. Even competing face to face with NoSQL. Now it's possible to build a NoSQL database on top of a SQL database. And probably, a better one. Discover how PostgreSQL can become a NoSQL database: how to map NoSQL documents to relational (no jsonb needed!); how to emulate MongoDB replication with PostgreSQL's awesome logical decoding; how to map MongoDB queries to SQL; how to benchmark and compare ToroDB & MongoDB. And how to turn a MongoDB-compatible database into a big data and analytics animal, in pure SQL.

Avatar for 8Kdata

8Kdata

November 18, 2015
Tweet

More Decks by 8Kdata

Other Decks in Programming

Transcript

  1. ToroDB internals How to create a NoSQL db on top

    of SQL @NoSQLonSQL รlvaro Hernรกndez <[email protected]>
  2. ToroDB @NoSQLonSQL About *8Kdata* โ€ข Research & Development in databases

    โ€ข Consulting, Training and Support in PostgreSQL โ€ข Founders of PostgreSQL Espaรฑa, 5th largest PUG in the world (>500 members as of today) โ€ข About myself: CTO at 8Kdata: @ahachete http://linkd.in/1jhvzQ3 www.8kdata.com
  3. ToroDB @NoSQLonSQL ToroDB in one slide โ€ข Document-oriented, JSON, NoSQL

    db โ€ข Open source (AGPL) โ€ข MongoDB compatibility (wire protocol level) โ€ข Uses PostgreSQL as a storage backend
  4. ToroDB @NoSQLonSQL How to make a NoSQL database โ€ข Varying

    schema (aka โ€œschema-lessโ€) โ€ข Document-oriented API โ€ข Replication โ€ข Horizontal scalability
  5. ToroDB @NoSQLonSQL ToroDB's Document 2 Relational transformation โ€ข Data is

    stored in tables. No blobs โ€ข JSON documents are split by hierarchy levels into โ€œsubdocumentsโ€, which contain no nested structures. Each subdocument level is stored separately โ€ข Subdocuments are classified by โ€œtypeโ€. Each โ€œtypeโ€ maps to a different table
  6. ToroDB @NoSQLonSQL โ€ข A โ€œstructureโ€ table keeps the subdocument โ€œschemaโ€

    โ€ข Keys in JSON are mapped to attributes, which retain the original name โ€ข Tables are created dinamically and transparently to match the exact types of the documents ToroDB's Document 2 Relational transformation
  7. ToroDB @NoSQLonSQL ToroDB storage internals { "name": "ToroDB", "data": {

    "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } } }
  8. ToroDB @NoSQLonSQL ToroDB storage internals The document is split into

    the following subdocuments: { "name": "ToroDB", "data": {}, "nested": {} } { "a": 42, "b": "hello world!"} { "j": 42, "deeper": {}} { "a": 21, "b": "hello"}
  9. ToroDB @NoSQLonSQL ToroDB storage internals select * from demo.t_3 โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”

    โ”‚ did โ”‚ index โ”‚ _id โ”‚ name โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค โ”‚ 0 โ”‚ ยค โ”‚ \x5451a07de7032d23a908576d โ”‚ ToroDB โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ select * from demo.t_1 โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ did โ”‚ index โ”‚ a โ”‚ b โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค โ”‚ 0 โ”‚ ยค โ”‚ 42 โ”‚ hello world! โ”‚ โ”‚ 0 โ”‚ 1 โ”‚ 21 โ”‚ hello โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ select * from demo.t_2 โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ” โ”‚ did โ”‚ index โ”‚ j โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค โ”‚ 0 โ”‚ ยค โ”‚ 42 โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”˜
  10. ToroDB @NoSQLonSQL ToroDB storage internals select * from demo.structures โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”

    โ”‚ sid โ”‚ _structure โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค โ”‚ 0 โ”‚ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ select * from demo.root; โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ” โ”‚ did โ”‚ sid โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ค โ”‚ 0 โ”‚ 0 โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”˜
  11. ToroDB @NoSQLonSQL Why not jsonb? โ€ข jsonb is really cool

    โ€ข But keys (metadata) are repeated most of the time (storage, I/O savings) โ€ข Docuements are not classified (normalized) โ€ข ToroDB does all that
  12. ToroDB @NoSQLonSQL Metadata repetition { โ€œaโ€: 1, โ€œbโ€: 2 }

    { โ€œaโ€: 3 } { โ€œaโ€: 4, โ€œcโ€: 5 } { โ€œaโ€: 6, โ€œbโ€: 7 } { โ€œbโ€: 8 } { โ€œaโ€: 9, โ€œbโ€: 10 } { โ€œaโ€: 11, โ€œbโ€: 12, โ€œjโ€: 13 } { โ€œaโ€: 14, โ€œcโ€: 15 } Counting โ€œdocument typesโ€ in collections of millions: at most, 1000s of different types
  13. ToroDB @NoSQLonSQL ToroDB storage and I/O savings 29% - 68%

    storage required, compared to Mongo 2.6
  14. ToroDB @NoSQLonSQL ToroDB: query โ€œby structureโ€ โ€ข ToroDB is effectively

    partitioning by type โ€ข Structures (schemas, partitioning types) are cached in ToroDB memory โ€ข Queries only scan a subset of the data โ€ข Negative queries are served directly from memory
  15. ToroDB @NoSQLonSQL MongoDB WP and API โ€ข ToroDB speaks MongoDB

    protocol natively, and implements Mongo API โ€ข All MongoDB tools, drivers, GUIs, whould work as is on ToroDB โ€ข Offer a widely used API on top of your current infrastructure
  16. ToroDB @NoSQLonSQL โ€ข NoSQL is trying to get back to

    SQL โ€ข ToroDB is SQL native! โ€ข Insert with Mongo, query with SQL! โ€ข Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc ToroDB: native SQL
  17. ToroDB @NoSQLonSQL db.toroviews.insert({ id:5, person: { name: "Alvaro", surname: "Hernandez",

    contact: { email: "[email protected]", verified: true } } }) db.toroviews.insert({ id:6, person: { name: "Name", surname: "Surname", age: 31, contact: { email: "[email protected]" } } }) ToroDB VIEWs
  18. ToroDB @NoSQLonSQL torodb$ select * from toroviews.person ; โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ” โ”‚

    did โ”‚ surname โ”‚ name โ”‚ age โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”ค โ”‚ 0 โ”‚ Hernandez โ”‚ Alvaro โ”‚ ยค โ”‚ โ”‚ 1 โ”‚ Surname โ”‚ Name โ”‚ 31 โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”˜ (2 rows) torodb$ select * from toroviews."person.contact"; โ”Œโ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ did โ”‚ verified โ”‚ email โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค โ”‚ 0 โ”‚ t โ”‚ [email protected] โ”‚ โ”‚ 1 โ”‚ ยค โ”‚ [email protected] โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ (2 rows) ToroDB VIEWs
  19. ToroDB @NoSQLonSQL Replication protocol choice โ€ข ToroDB is based on

    PostgreSQL โ€ข PostgreSQL has either binary streaming replication (async or sync) or logical replication โ€ข MongoDB has logical replication โ€ข ToroDB uses MongoDB's protocol
  20. ToroDB @NoSQLonSQL MongoDB's replication protocol โ€ข Every change is recorded

    in JSON documents, idempotent format (collection: local.oplog.rs) โ€ข Slaves pull these documents from master (or other slaves) asynchronously โ€ข Changes are applied and feedback is sent upstream
  21. ToroDB @NoSQLonSQL ToroDB v0.4 โ€ข ToroDB works as a secondary

    slave of a MongoDB master (or slave, chained rep) โ€ข Implements the full replication protocol (not as an oplog tailable query) โ€ข Replicates from Mongo to a PostgreSQL โ€ข Open source github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)
  22. ToroDB @NoSQLonSQL MongoDB's oplog & tailable cursors Oplog is a

    system, capped collection which contains idempotent DML commands Tailable cursors are โ€œopen queriesโ€ where data is streamed as it appears on the database Both are used a lot (replication, Meteor...)
  23. ToroDB @NoSQLonSQL How to do oplog & tailable with PG?

    Thanks to 9.4's logical decoding, it's easy Replication (master) is just plain logical decoding, transformed to MongoDB's json format Tailable cursors would be logical decoding plus filtering (to make sure streamed data matches query filters)
  24. ToroDB @NoSQLonSQL Write scalability (sharding) โ€ข MongoDB's sharding API not

    implemented yet (roadmap: ToroDB 0.8) โ€ข Will use MongoDB's mongos without modification, as well as config servers โ€ข That might change in the future (pg_shard?)
  25. ToroDB @NoSQLonSQL Horizontal scalability (storage level) โ€ข Another non-exclusive option

    is to have ToroDB store data in a distributed database โ€ข Requires a distributed database like GreenPlum, CitusDb or RedShift โ€ข Paired with replication as a slave: DW in NoSQL enabler
  26. ToroDB @NoSQLonSQL โ€ข GreenPlum was open sourced Oct/15 โ€ข ToroDB

    already runs on GP! CitusDB 5.0 port coming soon! โ€ข Goal: ToroDB v0.4 replicates from a MongoDB master onto Toro-Greenplum and runs the reporting queries using distributed SQL ToroDB on GP: SQL DW for MongoDB
  27. ToroDB @NoSQLonSQL โ€ข Amazon reviews dataset Image-based recommendations on styles

    and substitutes J. McAuley, C. Targett, J. Shi, A. van den Hengel SIGIR, 2015 โ€ข AWS c4.xlarge (4vCPU, 8GB RAM) 4KIOPS SSD EBS โ€ข 4x shards, 3x config; 4x segments GP โ€ข 83M records, 65GB plain json Benchmark
  28. ToroDB @NoSQLonSQL Disk usage Mongo 3.0, WT, Snappy GP columnar,

    zlib level 9 table size index size total size 0 10000000000 20000000000 30000000000 40000000000 50000000000 60000000000 70000000000 80000000000 Storage requirements MongoDB vs ToroDB on Greenplum Mongo ToroDB on GP bytes
  29. ToroDB @NoSQLonSQL SELECT count( distinct( "reviewerID" ) ) FROM reviews;

    Queries: which one is easier? db.reviews.aggregate([ { $group: { _id: "reviewerID"} }, { $group: {_id: 1, count: { $sum: 1}} } ])
  30. ToroDB @NoSQLonSQL SELECT "reviewerName", count(*) as reviews FROM reviews GROUP

    BY "reviewerName" ORDER BY reviews DESC LIMIT 10; Queries: which one is easier? db.reviews.aggregate( [ { $group : { _id : '$reviewerName', r : { $sum : 1 } } }, { $sort : { r : -1 } }, { $limit : 10 } ], {allowDiskUse: true} )
  31. ToroDB @NoSQLonSQL Query times 3 different queries Q3 on MongoDB:

    aggregate fails 27,95 74,87 0 0 200 400 600 800 1000 1200 969 1007 0 35 13 31 Query duration (s) MongoDB vs ToroDB on Greenplum MongoDB ToroDB on GP speedup seconds
  32. ToroDB @NoSQLonSQL Atomic operations โ€ข There is no support for

    atomic bulk insert/update/delete operations โ€ข Not even with $isolated: โ€œPrevents a write operation that affects multiple documents from yielding to other reads or writes [โ€ฆ] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide โ€œall-or-nothingโ€ atomicity for write operations.โ€ http://docs.mongodb.org/manual/reference/operator/update/isolated/
  33. ToroDB @NoSQLonSQL Cheap single-node durability โ€ข Without journaling, MongoDB is

    not durable nor crash-safe โ€ข MongoDB requires โ€œj: trueโ€ for true single-node durability. But who guarantees its consistent usage? Who uses it by default? j:true creates I/O storms equivalent to SQL CHECKPOINTs
  34. ToroDB @NoSQLonSQL โ€œCleanโ€ reads http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior โ€œMongoDB will allow clients to

    read the results of a write operation before the write operation returns.โ€ โ€œIf the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.โ€ Thus, MongoDB suffers from dirty reads. But let's call them just โ€œtainted readsโ€.
  35. ToroDB @NoSQLonSQL โ€œCleanโ€ reads What about $snapshot? Nope: โ€œThe snapshot()

    does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.โ€ http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors Cursors in ToroDB run in repeatable read, read-only mode: globalCursorDataSource.setTransactionIsolation("TRANSACTIO N_REPEATABLE_READ"); globalCursorDataSource.setReadOnly(true);
  36. ToroDB @NoSQLonSQL MongoDB's dirty and stale reads (repl) Dirty reads

    A primary in minority accepts a write that other clients see, but it later steps down, write is rolled back (fixed in 3.2?) Stale reads A primary in minority serves a value that ought to be current, but a newer value was written to the other primary in minority
  37. ToroDB @NoSQLonSQL Data discoverability, SQL connectors โ€ข They are two

    of the major announcements for MongoDB 3.2 โ€ข To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count) โ€ข SQL connectors: native, no emulation