doc” Even “every row is now a doc” is a rough approximation. RDBMS Normal Forms do not strictly apply to CouchDB or NoSQL. I had a big analysis here of Normal Forms and CouchDB but it doesn’t fit into a 20 minute slot… 4
collection of relations from undesirable insertion, update and deletion dependencies; 2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs; 3. To make the relational model more informative to users; 4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by. 5 Source: Codd, E.F. “Further Normalization of the Data Base Relational Model,” ACM Trans. on DB Systems, 1971.
need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs; Just create a new document. type is a near-universal document field. 7
Record everything as it comes in, as a new doc 2. Timestamp everything* 3. Highly relational data may be a bad fit for CouchDB. 10 *Do not trust client-side time stamps to give you absolute event ordering!
for replication. – Replication of other documents may get held up by a big one • This can be catered for via filtered replication – Large files can rapidly eat available disk space – >1GB attachments are not a first-order design scenario. Attachments are also not available to view servers. You wouldn’t store video files as BLOBs in Oracle, would you? 12
replicating CouchDB servers: Multiple Couches may have different views of the DB. Replication will reconcile differences, but will leave behind traces of the disagreement. These traces are your document conflicts. Look at them. 15
Consider creating a new document for every change. Views will then help you find the latest info. For more info: http://guide.couchdb.org/v1/conflicts.html http://docs.couchdb.org/en/latest/intro/consistency.html 16
"MVCC tokens are not a revision control system" CouchDB does not keep all document history forever. Compaction will remove all but the latest + conflicts. Replication will not replicate all historical versions. – In fact, it transfers every leaf and its body, and just the path of revisions that lead to that leaf, but only up to _revs_limit revisions per leaf. 18
variations on this theme: “I’ll just treat CouchDB as a key/value store and do all my own queries client-side.” “I have highly relational data and views aren’t powerful enough.” “I need keyword / full text search.” 21
browser, or use a full server-side framework. SHOW/LIST is a last resort for legacy clients that expect CSV, XML, etc. It is not for rendering images, HTML, etc. To me, SHOW/LIST is ugly and should probably be deprecated. 27
30 Replication makes no guarantee of delivery, timeliness, or that all updates occur in the source’s _changes feed order. It’s more of a mailbox with weak ordering. Strong ordering is hard, if not impossible. “…it is impossible to order events with respect to time in a distributed system, this means they must be ordered causally.” (Riak docs) Partial ordering is possible, via causal relations. Read up on Lamport timestamps, vector clocks, etc. for more.
own.” 32 It’s not easy. To get all the edge cases right, and to be interoperable, it’ll take you months of effort. This is worse if you attempt it with flat files. Just ask how long it took PouchDB to get it right!
No, you want the bigcouch merge and haproxy/nginx: 1. Automatic database and view sharding 2. Optimized internal replication 3. Tunable DynamoDB-like parameters. 4. Lots more I don’t have time to talk about