ToroDB: Supercharge your RDBMS with MongoDB Superpowers

8Kdata
April 21, 2016

Are you a DBA or devops engineer tired of being asked to also support MongoDB? Wouldn't it be cool if your current RDBMS could support MongoDB... without having a MongoDB server? Would you like to enable your RDBMS to accept truly unstructured data through MongoDB's famous API, while still offering ACID-like capabilities? ToroDB is an open source project that turns your RDBMS into a MongoDB-compatible server, supporting the MongoDB query API and MongoDB's replication, but storing your data in a reliable and trusted PostgreSQL database, or in analytics databases such as Greenplum or CitusDB, to support both OLTP and OLAP/DW workloads! ToroDB natively implements the MongoDB protocol, so you can use it with MongoDB tools and drivers, and features a document-to-relational mapping algorithm that transforms JSON documents into relational tables. ToroDB also offers transactions, a native SQL API, and automatic data normalization and partitioning based on the JSON documents' implicit schema. If you want to have an RDBMS and MongoDB on the same system, you can't miss this talk!

Transcript

  1. ToroDB @NoSQLonSQL

    Say you were…
    • A happy DBA, managing your RDBMS
    • Bofhing your users when required
    • Just having to fight devs who don't know who Mr. Bobby Tables is
  2. … and then NoSQL came

    And you started receiving questions like:
    • I want NoSQL!
    • Install MongoDB!
    • My app is web scale!
  3. ToroDB in one slide

    • Document-oriented, JSON, NoSQL db
    • Open source (AGPL)
    • MongoDB compatibility (wire protocol level)
  4. ToroDB storage internals

    {
      "name": "ToroDB",
      "data": {
        "a": 42,
        "b": "hello world!"
      },
      "nested": {
        "j": 42,
        "deeper": {
          "a": 21,
          "b": "hello"
        }
      }
    }
  5. ToroDB storage internals

    The document is split into the following subdocuments:

    { "name": "ToroDB", "data": {}, "nested": {} }
    { "a": 42, "b": "hello world!" }
    { "j": 42, "deeper": {} }
    { "a": 21, "b": "hello" }
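The split shown on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not ToroDB's actual implementation: the function name `split_subdocuments` is hypothetical, and the sketch assumes documents contain only scalars and nested objects (arrays are not handled).

```python
def split_subdocuments(doc):
    """Return the subdocuments of `doc` in depth-first order.

    Each nested object is replaced by an empty placeholder {} in its parent
    and emitted as its own subdocument. Hypothetical helper, not ToroDB code.
    """
    shallow = {}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            shallow[key] = {}       # placeholder left in the parent
            children.append(value)  # nested object handled separately
        else:
            shallow[key] = value
    parts = [shallow]
    for child in children:
        parts.extend(split_subdocuments(child))
    return parts


# The example document from the previous slide.
document = {
    "name": "ToroDB",
    "data": {"a": 42, "b": "hello world!"},
    "nested": {"j": 42, "deeper": {"a": 21, "b": "hello"}},
}

subdocuments = split_subdocuments(document)
for sub in subdocuments:
    print(sub)
```

Run on the slide's document, the sketch yields the same four subdocuments in the same order as listed above.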
  6. ToroDB storage internals

    ┌─────┬───────┬────────────────────────────┬────────┐
    │ did │ index │ _id                        │ name   │
    ├─────┼───────┼────────────────────────────┼────────┤
    │   0 │     ¤ │ \x5451a07de7032d23a908576d │ ToroDB │
    └─────┴───────┴────────────────────────────┴────────┘

    ┌─────┬───────┬────┬──────────────┐
    │ did │ index │ a  │ b            │
    ├─────┼───────┼────┼──────────────┤
    │   0 │     ¤ │ 42 │ hello world! │
    │   0 │     1 │ 21 │ hello        │
    └─────┴───────┴────┴──────────────┘

    ┌─────┬───────┬────┐
    │ did │ index │ j  │
    ├─────┼───────┼────┤
    │   0 │     ¤ │ 42 │
    └─────┴───────┴────┘
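One way to read the tables on this slide is that subdocuments sharing the same set of scalar fields end up as rows of the same table. The sketch below illustrates that grouping only; the function name `group_by_shape` is hypothetical, and the `did`, `index` and `_id` columns that ToroDB adds are deliberately left out because the slides do not spell out how they are computed.

```python
from collections import defaultdict


def group_by_shape(subdocs):
    """Group subdocuments by their sorted tuple of scalar field names.

    Illustrative only: ToroDB's real mapping also tracks document ids,
    indexes and types, which this sketch ignores.
    """
    tables = defaultdict(list)
    for sub in subdocs:
        scalar_keys = tuple(
            sorted(k for k, v in sub.items() if not isinstance(v, dict))
        )
        tables[scalar_keys].append({k: sub[k] for k in scalar_keys})
    return dict(tables)


# The four subdocuments from the previous slide.
subdocs = [
    {"name": "ToroDB", "data": {}, "nested": {}},
    {"a": 42, "b": "hello world!"},
    {"j": 42, "deeper": {}},
    {"a": 21, "b": "hello"},
]

tables = group_by_shape(subdocs)
for columns, rows in tables.items():
    print(columns, rows)
```

The two subdocuments with fields `a` and `b` land in one group, which matches the two-row table on the slide.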
  7. ToroDB storage internals

    select * from demo.structures;
    ┌─────┬────────────────────────────────────────────────────────────────────────────┐
    │ sid │ _structure                                                                 │
    ├─────┼────────────────────────────────────────────────────────────────────────────┤
    │   0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
    └─────┴────────────────────────────────────────────────────────────────────────────┘

    select * from demo.root;
    ┌─────┬─────┐
    │ did │ sid │
    ├─────┼─────┤
    │   0 │   0 │
    └─────┴─────┘
  8. ToroDB VIEWs

    torodb$ select * from toroviews.person ;
    ┌─────┬───────────┬────────┬─────┐
    │ did │ surname   │ name   │ age │
    ├─────┼───────────┼────────┼─────┤
    │   0 │ Hernandez │ Alvaro │   ¤ │
    │   1 │ Surname   │ Name   │  31 │
    └─────┴───────────┴────────┴─────┘
    (2 rows)

    torodb$ select * from toroviews."person.contact";
    ┌─────┬──────────┬────────────────────────┐
    │ did │ verified │ email                  │
    ├─────┼──────────┼────────────────────────┤
    │   0 │ t        │ [email protected]      │
    │   1 │ ¤        │ [email protected]      │
    └─────┴──────────┴────────────────────────┘
    (2 rows)
  9. Mix-and-match relational & NoSQL

    • Use the same database for both your relational data and ToroDB
    • Just use separate schemas (if you will)
    • Don't write to ToroDB data or metadata tables
    • Query with SQL, do joins, whatever!
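To make the "do joins" point concrete, here is a minimal sketch that joins a document-derived table with an ordinary relational table. It uses an in-memory SQLite database purely so the example is self-contained; the talk targets PostgreSQL, where the two sides would live in separate schemas. The `person` table stands in for a ToroDB-generated view, the `invoices` table and its `person_did` column are hypothetical, and using `did` as the join key is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for a ToroDB-generated view (columns follow the "ToroDB VIEWs" slide).
# In a real deployment this side is read-only: never write to ToroDB tables.
conn.execute(
    "CREATE TABLE person (did INTEGER, surname TEXT, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(0, "Hernandez", "Alvaro", None), (1, "Surname", "Name", 31)],
)

# Hypothetical plain relational table owned by the application.
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, person_did INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices (person_did, amount) VALUES (?, ?)",
    [(0, 10.0), (0, 5.5), (1, 20.0)],
)

# Join document-derived data with relational data in one SQL query.
totals = conn.execute(
    "SELECT p.name, sum(i.amount) "
    "FROM person p JOIN invoices i ON i.person_did = p.did "
    "GROUP BY p.did, p.name ORDER BY p.did"
).fetchall()

print(totals)
```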
  10. And much more!

    • Atomic batch operations
    • Clean reads
    • Within node… transactions! (coming soon)
  11. Data discoverability, SQL connectors

    • They are two of the major announcements for MongoDB 3.2
    • To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)
    • SQL connectors: native, no emulation
  12. ToroDB v0.4

    • ToroDB works as a secondary slave of a MongoDB master (or slave, chained rep)
    • Implements the full replication protocol (not as an oplog tailable query)
    • Open source: github.com/torodb/torodb (devel branch, version 0.4-SNAPSHOT)
  13. Write scalability (sharding)

    • MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)
    • Will use MongoDB's mongos without modification, as well as config servers
    • That might change in the future (pg_shard?)
  14. Horizontal scalability (storage level)

    • Another non-exclusive option is to have ToroDB store data in a distributed database
    • Requires a distributed database like Greenplum, CitusDB or Redshift
    • Paired with replication as a slave: DW in NoSQL enabler
  15. Benchmark

    • Amazon reviews dataset: "Image-based recommendations on styles and substitutes", J. McAuley, C. Targett, J. Shi, A. van den Hengel, SIGIR, 2015
    • AWS c4.xlarge (4vCPU, 8GB RAM), 4KIOPS SSD
    • 4x shards, 3x config; 4x segments GP
    • 83M records, 65GB plain json
  16. Disk usage

    Mongo 3.0, WT, Snappy
    GP columnar, zlib level 9

    [Chart: Storage requirements, MongoDB vs ToroDB on Greenplum. Bars for table size, index size and total size; axis in bytes, 0 to 80000000000.]
  17. Queries: which one is easier?

    SELECT count( distinct( "reviewerID" ) ) FROM reviews;

    db.reviews.aggregate([
      { $group: { _id: "$reviewerID" } },
      { $group: { _id: 1, count: { $sum: 1 } } }
    ])
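The distinct count on this slide can be tried out on a handful of rows. The sketch below is an illustration under stated assumptions: it uses an in-memory SQLite database instead of PostgreSQL or Greenplum, the sample rows are invented, and the SQL drops the redundant parentheses around the column name. The two commented Python steps mirror the two `$group` stages: first one bucket per reviewer, then a single group counting the buckets.

```python
import sqlite3

# Invented sample rows; the benchmark table in the talk holds 83M records.
rows = [("A1", "Alice"), ("A1", "Alice"), ("B2", "Bob"), ("C3", "Carol")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE reviews ("reviewerID" TEXT, "reviewerName" TEXT)')
conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)

# Same query as on the slide, minus the redundant parentheses.
(sql_count,) = conn.execute(
    'SELECT count(DISTINCT "reviewerID") FROM reviews'
).fetchone()

# Stage 1: group by reviewerID, giving one bucket per distinct value.
buckets = {reviewer_id for reviewer_id, _name in rows}
# Stage 2: collapse all buckets into a single group and count them.
pipeline_count = sum(1 for _ in buckets)

print(sql_count, pipeline_count)
```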
  18. Queries: which one is easier?

    SELECT "reviewerName", count(*) as reviews
    FROM reviews
    GROUP BY "reviewerName"
    ORDER BY reviews DESC
    LIMIT 10;

    db.reviews.aggregate(
      [
        { $group : { _id : '$reviewerName', r : { $sum : 1 } } },
        { $sort : { r : -1 } },
        { $limit : 10 }
      ],
      { allowDiskUse: true }
    )
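The "top reviewers" query above can also be run on a small sample to see what it returns. This is a sketch only: it uses an in-memory SQLite database as a stand-in for the PostgreSQL-family backends in the talk, and the reviewer names are invented. For comparison, the same ranking is computed with `collections.Counter`, which plays the role of the group, sort and limit stages.

```python
import sqlite3
from collections import Counter

# Invented sample data, one entry per review.
names = ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE reviews ("reviewerName" TEXT)')
conn.executemany("INSERT INTO reviews VALUES (?)", [(n,) for n in names])

# The SQL from the slide: group, order by count, keep the top 10.
top_sql = conn.execute(
    'SELECT "reviewerName", count(*) AS reviews FROM reviews '
    'GROUP BY "reviewerName" ORDER BY reviews DESC LIMIT 10'
).fetchall()

# Same ranking in plain Python: count per name, then take the 10 most common.
top_py = Counter(names).most_common(10)

print(top_sql)
print(top_py)
```

The sample has no ties, so both approaches return the same ordering; with tied counts the order among equals is not guaranteed to match.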
  19. Query times

    3 different queries
    Q3 on MongoDB: aggregate fails

    [Chart: Query duration (s), MongoDB vs ToroDB on Greenplum. Series: MongoDB, ToroDB on GP, speedup; axis in seconds, 0 to 1200. Data labels in extraction order: 27.95, 74.87, 0; 969, 1007, 0; 35, 13, 31.]