ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain 2015

NoSQL databases have emerged as a response to some perceived problems in the RDBMSs: agile/dynamic schemas; and transparent, horizontal scaling of the database. The former has been promptly targeted with the introduction of unstructured data types, but scaling a relational databases is still a very hard problem.

As a consequence, all NoSQL databases have been built from scratch: their storage engines, replication techniques, journaling, ACID support (if any). They haven't leveraged the previously existing state-of-the-art of RDBMSs, effectively re-inventing the wheel. Isn't this sub-optimal? Wouldn't it be possible to construct a NoSQL database by layering it on top of a relational database?

Session presented at Big Data Spain 2015 Conference
16th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Abstract: http://www.bigdataspain.org/program/fri/slot-37.html

Cb6e6da05b5b943d2691ceefa3381cad?s=128

Big Data Spain

October 21, 2015
Tweet

Transcript

  1. None
  2. ToroDB SCALING PostgreSQL LIKE MongoDB @NoSQLonSQL Álvaro Hernández <aht@torodb.com>

  3. ToroDB @NoSQLonSQL About *8Kdata* • Research & Development in databases

    • Consulting, Training and Support in PostgreSQL • Founders of PostgreSQL España, 5th largest PUG in the world (>500 members as of today) • About myself: CTO at 8Kdata: @ahachete http://linkd.in/1jhvzQ3 www.8kdata.com
  4. ToroDB @NoSQLonSQL ToroDB in brief

  5. ToroDB @NoSQLonSQL ToroDB in one slide • Document-oriented, JSON, NoSQL

    db • Open source (AGPL) • MongoDB compatibility (wire protocol level) • Uses PostgreSQL as a storage backend
  6. ToroDB @NoSQLonSQL ToroDB storage • Data is stored in tables.

    No blobs • JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately • Subdocuments are classified by “type”. Each “type” maps to a different table
  7. ToroDB @NoSQLonSQL ToroDB storage (II) • A “structure” table keeps

    the subdocument “schema” • Keys in JSON are mapped to attributes, which retain the original name • Tables are created dinamically and transparently to match the exact types of the documents
  8. ToroDB @NoSQLonSQL ToroDB storage internals { "name": "ToroDB", "data": {

    "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } } }
  9. ToroDB @NoSQLonSQL ToroDB storage internals The document is split into

    the following subdocuments: { "name": "ToroDB", "data": {}, "nested": {} } { "a": 42, "b": "hello world!"} { "j": 42, "deeper": {}} { "a": 21, "b": "hello"}
  10. ToroDB @NoSQLonSQL ToroDB storage internals select * from demo.t_3 ┌─────┬───────┬────────────────────────────┬────────┐

    │ did │ index │ _id │ name │ ├─────┼───────┼────────────────────────────┼────────┤ │ 0 │ ¤ │ \x5451a07de7032d23a908576d │ ToroDB │ └─────┴───────┴────────────────────────────┴────────┘ select * from demo.t_1 ┌─────┬───────┬────┬──────────────┐ │ did │ index │ a │ b │ ├─────┼───────┼────┼──────────────┤ │ 0 │ ¤ │ 42 │ hello world! │ │ 0 │ 1 │ 21 │ hello │ └─────┴───────┴────┴──────────────┘ select * from demo.t_2 ┌─────┬───────┬────┐ │ did │ index │ j │ ├─────┼───────┼────┤ │ 0 │ ¤ │ 42 │ └─────┴───────┴────┘
  11. ToroDB @NoSQLonSQL ToroDB storage internals select * from demo.structures ┌─────┬────────────────────────────────────────────────────────────────────────────┐

    │ sid │ _structure │ ├─────┼────────────────────────────────────────────────────────────────────────────┤ │ 0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │ └─────┴────────────────────────────────────────────────────────────────────────────┘ select * from demo.root; ┌─────┬─────┐ │ did │ sid │ ├─────┼─────┤ │ 0 │ 0 │ └─────┴─────┘
  12. ToroDB @NoSQLonSQL ToroDB storage and I/O savings 29% - 68%

    storage required, compared to Mongo 2.6
  13. ToroDB @NoSQLonSQL ToroDB: query “by structure” • ToroDB is effectively

    partitioning by type • Structures (schemas, partitioning types) are cached in ToroDB memory • Queries only scan a subset of the data • Negative queries are served directly from memory
  14. ToroDB @NoSQLonSQL Scaling ToroDB like MongoDB

  15. ToroDB @NoSQLonSQL Big Data: NoSQL vs SQL vs http://www.networkworld.com/article/2226514/tech-debates/what-s-better-for-your-big-data-application--sql-or-nosql-.html

  16. ToroDB @NoSQLonSQL Scalability?

  17. ToroDB @NoSQLonSQL Ways to scale • Vertical scalability ➔ Concurrency

    scalability ➔ Hardware scalability ➔ Query scalability • Read scalability (replication) • Write scalability (horizontal, sharding)
  18. ToroDB @NoSQLonSQL Vertical scalability Concurrency scalability • SQL is usually

    better (e.g. PostgreSQL): ➔ Finer locking ➔ MVCC ➔ better caching • NoSQL often needs sharding within the same host to scale
  19. ToroDB @NoSQLonSQL Vertical scalability Hardware scalability • Scaling with the

    number of cores? • Process/threading model? Query scalability • Use of indexes? Use of more than one? • Table/collection partitioning? • ToroDB “by-type” partitioning
  20. ToroDB @NoSQLonSQL Read scalability: replication • Replicate data to slave

    nodes, available read-only: scale-out reads • Both NoSQL and SQL support it • Binary replication usually faster (e.g. PostgreSQL's Streaming Replication) • Not free from undesirable phenomena
  21. ToroDB @NoSQLonSQL https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads Dirty and stale reads (“call me maybe”)

  22. ToroDB @NoSQLonSQL MongoDB write acknowledge https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

  23. ToroDB @NoSQLonSQL MongoDB's dirty and stale reads Dirty reads A

    primary in minority accepts a write that other clients see, but it later steps down, write is rolled back (fixed in 3.2?) Stale reads A primary in minority serves a value that ought to be current, but a newer value was written to the other primary in minority
  24. ToroDB @NoSQLonSQL Write scalability (sharding) • NoSQL better prepared than

    SQL • But many compromises in data modeling (schema design): no FKs • There are also solutions for SQL: ➔ Shared-disk, limited scalability (RAC) ➔ Sharding (like pg_shard) ➔ PostgreSQL's FDWs
  25. ToroDB @NoSQLonSQL Read scalability (replication) in MongoDB and ToroDB

  26. ToroDB @NoSQLonSQL Replication protocol choice • ToroDB is based on

    PostgreSQL • PostgreSQL has either binary streaming replication (async or sync) or logical replication • MongoDB has logical replication • ToroDB uses MongoDB's protocol
  27. ToroDB @NoSQLonSQL MongoDB's replication protocol • Every change is recorded

    in JSON documents, idempotent format (collection: local.oplog.rs) • Slaves pull these documents from master (or other slaves) asynchronously • Changes are applied and feedback is sent upstream
  28. ToroDB @NoSQLonSQL MongoDB slave's states • Secondary: slave is more

    or less up to date and pulling “diffs” from other nodes • InitialSync: copy * from all databases, all collections. Used to init slaves or when sync is lost (rollback didn't find common root; resync is requested) • Rollback: there is data to DELETE
  29. ToroDB @NoSQLonSQL MongoDB replication implementation https://github.com/stripe/mosql

  30. ToroDB @NoSQLonSQL Announcing ToroDB v0.4 (snap) Supporting MongoDB replication

  31. ToroDB @NoSQLonSQL ToroDB v0.4 • ToroDB works as a secondary

    slave of a MongoDB master (or slave) • Implements the full replication protocol (not an oplog tailable query) • Replicates from Mongo to a PostgreSQL • Open source github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)
  32. ToroDB @NoSQLonSQL Advantages of ToroDB w/ replication • Native SQL

    • Query “by type” • Better SQL scaling • Less concurrency contention • Better hardware utilization • No need for ETL from Mongo to PG!
  33. ToroDB @NoSQLonSQL • NoSQL is trying to get back to

    SQL • ToroDB is SQL native! • Insert with Mongo, query with SQL! • Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc ToroDB: native SQL
  34. ToroDB @NoSQLonSQL ToroDB: native SQL

  35. ToroDB @NoSQLonSQL db.bds15.insert({ id:5, person: { name: "Alvaro", surname: "Hernandez",

    contact: { email: "aht@torodb.com", verified: true } } }) db.bds15.insert({ id:6, person: { name: "Name", surname: "Surname", age: 31, contact: { email: "name.surname@gmail.com" } } }) Introducing ToroDB VIEWs
  36. ToroDB @NoSQLonSQL torodb$ select * from bds15.person ; ┌─────┬───────────┬────────┬─────┐ │

    did │ surname │ name │ age │ ├─────┼───────────┼────────┼─────┤ │ 0 │ Hernandez │ Alvaro │ ¤ │ │ 1 │ Surname │ Name │ 31 │ └─────┴───────────┴────────┴─────┘ (2 rows) torodb$ select * from bds15."person.contact"; ┌─────┬──────────┬────────────────────────┐ │ did │ verified │ email │ ├─────┼──────────┼────────────────────────┤ │ 0 │ t │ aht@torodb.com │ │ 1 │ ¤ │ name.surname@gmail.com │ └─────┴──────────┴────────────────────────┘ (2 rows) Introducing ToroDB VIEWs
  37. None