8Kdata
April 21, 2016

ToroDB: Supercharge your RDBMS with MongoDB Superpowers

Are you a DBA or devops engineer tired of being asked to also support MongoDB? Wouldn't it be cool if your current RDBMS could support MongoDB... without having a MongoDB server? Would you like to enable your RDBMS to accept truly unstructured data through MongoDB's famous API, while still offering ACID-like capabilities? ToroDB is an open source project that turns your RDBMS into a MongoDB-compatible server. It supports the MongoDB query API and MongoDB's replication, but stores your data in a reliable and trusted PostgreSQL database, or in analytics databases such as Greenplum or CitusDB, to support both OLTP and OLAP/DW workloads. ToroDB natively implements the MongoDB protocol, so you can use it with MongoDB tools and drivers, and features a document-to-relational mapping algorithm that transforms JSON documents into relational tables. ToroDB also offers transactions, a native SQL API, and automatic data normalization and partitioning based on the JSON documents' implicit schema. If you want to have an RDBMS and MongoDB on the same system, you can't miss this talk!

Transcript

  1. ToroDB @NoSQLonSQL
    Say you were…
    • A happy DBA, managing your RDBMS
    • BOFHing your users when required
    • Just having to fight devs who don't know who Mr. Bobby Tables is
  2. … and then NoSQL came
    And you started receiving questions like:
    • I want NoSQL!
    • Install MongoDB!
    • My app is web scale!
  3. ToroDB in one slide
    • Document-oriented, JSON, NoSQL db
    • Open source (AGPL)
    • MongoDB compatibility (wire protocol level)
  4. ToroDB storage internals
    {
      "name": "ToroDB",
      "data": { "a": 42, "b": "hello world!" },
      "nested": {
        "j": 42,
        "deeper": { "a": 21, "b": "hello" }
      }
    }
  5. ToroDB storage internals
    The document is split into the following subdocuments:
    { "name": "ToroDB", "data": {}, "nested": {} }
    { "a": 42, "b": "hello world!" }
    { "j": 42, "deeper": {} }
    { "a": 21, "b": "hello" }
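The split shown on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not ToroDB's actual implementation: the function name `split_document` is hypothetical, and the sketch assumes documents contain only scalars and nested objects (arrays are not handled).

```python
import json


def split_document(doc):
    """Return the subdocuments of doc, parent first, in depth-first order.

    Every nested object is replaced by an empty {} placeholder in its
    parent and emitted as a subdocument of its own.
    (Hypothetical helper; arrays are deliberately not handled.)
    """
    shell = {}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            shell[key] = {}          # placeholder left in the parent
            children.append(value)   # the nested object is split later
        else:
            shell[key] = value       # scalars stay where they are
    parts = [shell]
    for child in children:
        parts.extend(split_document(child))
    return parts


# The example document from the slides.
doc = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world!"},
    "nested": {"j": 42, "deeper": {"a": 21, "b": "hello"}},
}

subdocuments = split_document(doc)
for part in subdocuments:
    print(json.dumps(part))
```

Run on the example document, the sketch yields the same four subdocuments listed on the slide, in the same order.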
  6. ToroDB storage internals
    ┌─────┬───────┬────────────────────────────┬────────┐
    │ did │ index │ _id                        │ name   │
    ├─────┼───────┼────────────────────────────┼────────┤
    │ 0   │ ¤     │ \x5451a07de7032d23a908576d │ ToroDB │
    └─────┴───────┴────────────────────────────┴────────┘
    ┌─────┬───────┬────┬──────────────┐
    │ did │ index │ a  │ b            │
    ├─────┼───────┼────┼──────────────┤
    │ 0   │ ¤     │ 42 │ hello world! │
    │ 0   │ 1     │ 21 │ hello        │
    └─────┴───────┴────┴──────────────┘
    ┌─────┬───────┬────┐
    │ did │ index │ j  │
    ├─────┼───────┼────┤
    │ 0   │ ¤     │ 42 │
    └─────┴───────┴────┘
  7. ToroDB storage internals
    select * from demo.structures;
    ┌─────┬────────────────────────────────────────────────────────────────────────────┐
    │ sid │ _structure                                                                 │
    ├─────┼────────────────────────────────────────────────────────────────────────────┤
    │ 0   │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
    └─────┴────────────────────────────────────────────────────────────────────────────┘
    select * from demo.root;
    ┌─────┬─────┐
    │ did │ sid │
    ├─────┼─────┤
    │ 0   │ 0   │
    └─────┴─────┘
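One way to read the tables and the `_structure` record on these slides: subdocuments with the same set of attributes share a table, `"t"` names that table, and `"i"` is the row index within the table for this document (absent for the first row, shown as ¤). The sketch below follows that reading. It is an assumption drawn from the slides, not ToroDB's real algorithm: `build_structure` is a hypothetical name, attribute types are ignored, arrays are not handled, and documents whose own keys are named `t` or `i` would clash with the bookkeeping keys.

```python
def build_structure(doc, tables, rows):
    """Store doc's subdocuments and return its structure record.

    tables maps an attribute signature to a table number; rows maps a
    table number to the rows stored for this document. Children are
    stored before their parent (post-order), which is what reproduces
    the table numbers shown on the slide.
    """
    structure = {}
    scalars = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            structure[key] = build_structure(value, tables, rows)
        else:
            scalars[key] = value
    # Assumption: a table is identified by its sorted attribute names only.
    signature = tuple(sorted(scalars))
    table_id = tables.setdefault(signature, len(tables) + 1)
    table_rows = rows.setdefault(table_id, [])
    if table_rows:
        # Not the first row of this document in the table: record its index.
        structure["i"] = len(table_rows)
    table_rows.append(scalars)
    structure["t"] = table_id
    return structure


doc = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world!"},
    "nested": {"j": 42, "deeper": {"a": 21, "b": "hello"}},
}

tables, rows = {}, {}
structure = build_structure(doc, tables, rows)
print(structure)
print(rows)
```

For the example document this yields a structure equal to the `_structure` value on the slide, with both `{a, b}` subdocuments landing in table 1.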
  8. ToroDB VIEWs
    torodb$ select * from toroviews.person ;
    ┌─────┬───────────┬────────┬─────┐
    │ did │ surname   │ name   │ age │
    ├─────┼───────────┼────────┼─────┤
    │ 0   │ Hernandez │ Alvaro │ ¤   │
    │ 1   │ Surname   │ Name   │ 31  │
    └─────┴───────────┴────────┴─────┘
    (2 rows)
    torodb$ select * from toroviews."person.contact";
    ┌─────┬──────────┬────────────────────────┐
    │ did │ verified │ email                  │
    ├─────┼──────────┼────────────────────────┤
    │ 0   │ t        │ [email protected]      │
    │ 1   │ ¤        │ [email protected]      │
    └─────┴──────────┴────────────────────────┘
    (2 rows)
  9. Mix-and-match relational & NoSQL
    • Use the same database for both your relational data and ToroDB
    • Just use separate schemas (if you will)
    • Don't write to ToroDB data or metadata tables
    • Query with SQL, do joins, whatever!
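To make the mix-and-match idea concrete, here is a small runnable sketch of a join between ToroDB-style views and an ordinary relational table. Everything about the setup is an assumption for illustration: an in-memory SQLite database stands in for PostgreSQL, an attached database named `toroviews` stands in for the schema, the `accounts` table and its `plan` column are hypothetical, and the e-mail addresses are made up because the ones in the transcript are redacted.

```python
import sqlite3

# SQLite stands in for PostgreSQL; an attached in-memory database plays
# the role of the "toroviews" schema from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS toroviews")
conn.executescript("""
CREATE TABLE toroviews.person (did INTEGER, surname TEXT, name TEXT, age INTEGER);
CREATE TABLE toroviews."person.contact" (did INTEGER, verified INTEGER, email TEXT);

-- Hypothetical relational table living next to the document data.
CREATE TABLE accounts (email TEXT PRIMARY KEY, plan TEXT);

INSERT INTO toroviews.person VALUES (0, 'Hernandez', 'Alvaro', NULL);
INSERT INTO toroviews.person VALUES (1, 'Surname', 'Name', 31);
INSERT INTO toroviews."person.contact" VALUES (0, 1, 'alvaro@example.com');
INSERT INTO toroviews."person.contact" VALUES (1, NULL, 'name@example.com');
INSERT INTO accounts VALUES ('alvaro@example.com', 'pro');
INSERT INTO accounts VALUES ('name@example.com', 'free');
""")

# Read-only join: document data (two views) plus relational data.
joined = conn.execute("""
    SELECT p.name, p.surname, a.plan
    FROM toroviews.person AS p
    JOIN toroviews."person.contact" AS c ON c.did = p.did
    JOIN accounts AS a ON a.email = c.email
    ORDER BY p.did
""").fetchall()

for row in joined:
    print(row)
```

The query only reads from the ToroDB-style tables, in line with the slide's advice not to write to ToroDB data or metadata tables.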
  10. And much more!
    • Atomic batch operations
    • Clean reads
    • Within node… transactions! (coming soon)
  11. Data discoverability, SQL connectors
    • They are two of the major announcements for MongoDB 3.2
    • To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)
    • SQL connectors: native, no emulation
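The "join with root if you want a count" remark can be illustrated with the `demo.structures` and `demo.root` tables from the storage slides. The sketch below is an assumption-laden stand-in: SQLite replaces PostgreSQL, an attached in-memory database named `demo` replaces the schema, and the second structure and the extra `root` rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS demo")
conn.executescript("""
CREATE TABLE demo.structures (sid INTEGER PRIMARY KEY, _structure TEXT);
CREATE TABLE demo.root (did INTEGER PRIMARY KEY, sid INTEGER);
""")

# sid 0 is the structure from the slides; sid 1 is a made-up second shape.
conn.executemany(
    "INSERT INTO demo.structures VALUES (?, ?)",
    [
        (0, '{"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}}'),
        (1, '{"t": 3}'),
    ],
)
# Three documents: two with structure 0, one with structure 1 (sample data).
conn.executemany("INSERT INTO demo.root VALUES (?, ?)", [(0, 0), (1, 0), (2, 1)])

# How many documents have each structure?
counts = conn.execute("""
    SELECT s.sid, s._structure, count(r.did) AS documents
    FROM demo.structures AS s
    JOIN demo.root AS r ON r.sid = s.sid
    GROUP BY s.sid, s._structure
    ORDER BY s.sid
""").fetchall()

for sid, structure, documents in counts:
    print(sid, documents, structure)
```

No sampling is involved: every document points at exactly one structure row, so the counts are exact.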
  12. ToroDB v0.4
    • ToroDB works as a secondary (slave) of a MongoDB master (or of another slave, with chained replication)
    • Implements the full replication protocol (not as an oplog tailable query)
    • Open source: github.com/torodb/torodb (devel branch, version 0.4-SNAPSHOT)
  13. Write scalability (sharding)
    • MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)
    • Will use MongoDB's mongos without modification, as well as config servers
    • That might change in the future (pg_shard?)
  14. Horizontal scalability (storage level)
    • Another non-exclusive option is to have ToroDB store data in a distributed database
    • Requires a distributed database like Greenplum, CitusDB or Redshift
    • Paired with replication as a slave: DW in NoSQL enabler
  15. Benchmark
    • Amazon reviews dataset: "Image-based recommendations on styles and substitutes", J. McAuley, C. Targett, J. Shi, A. van den Hengel, SIGIR, 2015
    • AWS c4.xlarge (4vCPU, 8GB RAM), 4KIOPS SSD
    • 4x shards, 3x config; 4x segments GP
    • 83M records, 65GB plain JSON
  16. Disk usage
    [Bar chart: Storage requirements, MongoDB vs ToroDB on Greenplum. Bars: table size, index size, total size, in bytes (axis from 0 to 80000000000). Configurations: Mongo 3.0, WT, Snappy; GP columnar, zlib level 9.]
  17. Queries: which one is easier?
    SELECT count(DISTINCT "reviewerID") FROM reviews;

    db.reviews.aggregate([
      { $group: { _id: "$reviewerID" } },
      { $group: { _id: 1, count: { $sum: 1 } } }
    ])
  18. Queries: which one is easier?
    SELECT "reviewerName", count(*) AS reviews
    FROM reviews
    GROUP BY "reviewerName"
    ORDER BY reviews DESC
    LIMIT 10;

    db.reviews.aggregate(
      [
        { $group: { _id: '$reviewerName', r: { $sum: 1 } } },
        { $sort: { r: -1 } },
        { $limit: 10 }
      ],
      { allowDiskUse: true }
    )
  19. Query times
    3 different queries. Q3 on MongoDB: aggregate fails.
    [Bar chart: Query duration (s), MongoDB vs ToroDB on Greenplum. Series: MongoDB, ToroDB on GP (seconds, axis from 0 to 1200) and speedup. Data labels: 27.95, 74.87, 0; 969, 1007, 0, 35, 13, 31.]