Slide 1

Slide 1 text

Look Ma! No more blobs
Aparna Chaudhary
NoSQL matters, Cologne, Germany, 2013

Slide 2

Slide 2 text

EMBRACE POLYGLOT PERSISTENCE! STOP RDBMS ABUSE! KNOW YOUR USE CASE

Slide 3

Slide 3 text

Use Case
We don't do rocket science...
Read XML → Parse → Extract → Store
RA:
‣ Runtime support for document types
‣ Metadata definition provided at runtime
‣ Document type names: max 50 char
‣ Look up content based on metadata

Slide 4

Slide 4 text

Challenges
RA:
‣ Storage of up to one million documents of 10KB to 2GB per document type per year
‣ Write 1MB < x msec
‣ Retrieve 1MB < y msec
‣ ...and details
But… the numbers make it interesting...

Slide 5

Slide 5 text

How?
File System, MongoDB, RDBMS, JCR, Document Management

Slide 6

Slide 6 text

File System
If you want to store files, it's logical to use the file system, ain't it?
✓ Ease of Use
✓ No special skill-set
✓ Backup and Recovery
✓ It’s free!

Slide 7

Slide 7 text

How do I name them?
Support for metadata storage?
Performance with too many small files?
Query - Administration?
High Availability?
Limitation on total number of files?

Slide 8

Slide 8 text

RDBMS (relational database)
You name it, we have it!
Integrity, Consistency, Durability, Atomicity, Joins, Backups, High Availability, Aggregations

Slide 9

Slide 9 text

RDBMS Developer’s Perspective

Slide 10

Slide 10 text

Challenge #1
RA: We need runtime support for document types.

Slide 11

Slide 11 text

Challenge #1
Dynamic DDL Generation: DOC_1, DOC_2, DOC_3, DOC_4, DOC_5, DOC_6

Slide 12

Slide 12 text

Challenge #1
DEV: String concatenations are ugly…

Slide 13

Slide 13 text

Challenge #1
DEV: Let's build a utility.

Slide 14

Slide 14 text

Challenge #1
More Work

Slide 15

Slide 15 text

Challenge #2
RA: Document type names are up to 50 characters long.

Slide 16

Slide 16 text

Challenge #2
TABLE NAME LIMITS
Wait… SQL-92 says 128 char?
We rule. Let's support only 30 char.

Slide 17

Slide 17 text

Challenge #2
DEV: Let's create a mapping table: DOC_TYPE_MAPPING
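
The workaround built up over Challenges #1 and #2 (short table aliases plus generated DDL) might look roughly like the sketch below. It is not the code from the talk: the `DOC_n` alias scheme follows the slides, while the in-memory dictionary standing in for DOC_TYPE_MAPPING, the column names and types, and both function names are assumptions made for illustration.

```python
import re

MAX_TABLE_NAME = 30  # table-name limit quoted on the slide
MAX_DOC_TYPE = 50    # document type name limit from the use case


class DocTypeMapping:
    """In-memory stand-in for the DOC_TYPE_MAPPING table (hypothetical)."""

    def __init__(self):
        self._alias_by_type = {}

    def alias_for(self, doc_type: str) -> str:
        """Return the short table alias for a document type, creating one if needed."""
        if not doc_type or len(doc_type) > MAX_DOC_TYPE:
            raise ValueError("document type must be 1 to 50 characters")
        if doc_type not in self._alias_by_type:
            alias = f"DOC_{len(self._alias_by_type) + 1}"
            if len(alias) > MAX_TABLE_NAME:
                raise ValueError("alias exceeds the table-name limit")
            self._alias_by_type[doc_type] = alias
        return self._alias_by_type[doc_type]


def create_table_ddl(alias: str, metadata_fields: list[str]) -> str:
    """Build CREATE TABLE DDL by string concatenation (the "ugly" part)."""
    for name in [alias, *metadata_fields]:
        # Identifiers cannot be bound as query parameters, so validate them strictly.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Assumed columns: a key, the content as a blob, one text column per metadata field.
    columns = ["id BIGINT PRIMARY KEY", "content BYTEA"]
    columns += [f"{field} VARCHAR(255)" for field in metadata_fields]
    return f"CREATE TABLE {alias} ({', '.join(columns)})"
```

Even in this reduced form, every read and write has to resolve the alias first, which is the extra moving part the talk objects to.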

Slide 18

Slide 18 text

Challenge #2
Ugly unreadable table names!

Slide 19

Slide 19 text

So... finally...
Read XML
DocumentType defined?
  No: Dynamic DDL generation, Document Type Alias
  Yes: Extract Metadata, Store Metadata, Store Content
Simple use case becomes complex...

Slide 20

Slide 20 text

Remember... Our Challenge
QA: Let's see if we are in spec for response time.
DEV: Aah... what about performance now?

Slide 21

Slide 21 text

MongoDB
Document Based, GridFS, B-Tree, Dynamic Schema, JSON, BSON, Query, Scalable
Not supported: Joins, Complex Transaction
http://www.10gen.com/presentations/storage-engine-internals

Slide 22

Slide 22 text

Concepts
Table = Collection
Column = Field
Row = Document
Database = Database
(Diagram: databases containing collections; documents ID1 to ID5 with differing sets of fields F1 to F9, Fx.)

Slide 23

Slide 23 text

GridFS
MongoDB divides large content into chunks and stores the metadata and the chunks separately.
http://docs.mongodb.org/manual/core/gridfs/
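
To make the split concrete, here is a minimal in-memory sketch of the idea in Python. It does not talk to MongoDB and is not a driver API: `split_for_gridfs` and `reassemble` are hypothetical helper names, the dictionaries only imitate the fields of the files and chunks documents, and the 262144-byte default is the `chunkSize` value that appears in the sample document on slide 24.

```python
import hashlib

DEFAULT_CHUNK_SIZE = 256 * 1024  # 262144 bytes, the chunkSize shown on slide 24


def split_for_gridfs(file_id, filename, content: bytes, metadata=None,
                     chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Return (files_doc, chunk_docs) modelled on the GridFS two-collection layout."""
    # One metadata document per file...
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(content),
        "chunkSize": chunk_size,
        "md5": hashlib.md5(content).hexdigest(),
        "metadata": metadata or {},
    }
    # ...and one document per chunk, linked back through files_id and ordered by n.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": content[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(content), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs) -> bytes:
    """Rebuild the content by concatenating the chunks in order of n."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)
```

Because the metadata lives in its own document, a lookup by metadata never has to touch the chunk data.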

Slide 24

Slide 24 text

> mybucket.files
{
  "_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
  "chunkSize" : NumberLong(262144),
  "length" : NumberLong(103015),
  "md5" : "34d29a163276accc7304bd69c5520e55",
  "filename" : "health_record_2.xml",
  "contentType" : "application/xml",
  "uploadDate" : ISODate("2013-03-23T07:41:44.907Z"),
  "aliases" : null,
  "metadata" : { "fname" : "Aparna", "lname" : "Chaudhary", "country" : "Netherlands" }
}

ObjectId - 12 Byte BSON:
‣ 4 Byte - Seconds since Epoch
‣ 3 Byte - Machine Id
‣ 2 Byte - Process Id
‣ 3 Byte - Counter
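
The byte layout listed for ObjectId can be checked against the `_id` in the sample document. The Python sketch below simply slices the 12 bytes as the slide describes; `decode_object_id` is a hypothetical helper, not a driver function, and reading every field as big-endian is an assumption.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id: str) -> dict:
    """Split a 24-character ObjectId hex string into the four fields from the slide."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # 4 bytes: seconds since the epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine_id": raw[4:7].hex(),                    # 3 bytes: machine id
        "process_id": int.from_bytes(raw[7:9], "big"),   # 2 bytes: process id
        "counter": int.from_bytes(raw[9:12], "big"),     # 3 bytes: counter
    }


parts = decode_object_id("514d5cb8c2e6ea4329646a5c")
print(parts["timestamp"].isoformat())  # 2013-03-23T07:41:44+00:00
```

The decoded timestamp matches the `uploadDate` of the sample document to the second, so the creation time of a file can be recovered from its `_id` alone.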

Slide 25

Slide 25 text

> mybucket.chunks
{
  "_id" : ObjectId("514d5cb8c2e6ea4329646a5d"),
  "files_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),
  "n" : 0,
  "data" : BinData(0,...)
}

Slide 26

Slide 26 text

Q: I'm storing a 10KB file, but would it use 256KB on disk?
Last chunk = FileSize % 256 + metadata overhead (x)
1128KB file: 256 + 256 + 256 + 256 + (104 + x)
10KB file: (10 + x)
A chunk is only as big as it needs to be...
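
The arithmetic on this slide can be reproduced with a few lines of Python. This is a sketch of the payload sizes only: `chunk_payload_sizes` is a hypothetical helper, sizes are in KB to match the slide, and the per-chunk metadata overhead (the "x" above) is left out because the slide does not quantify it.

```python
def chunk_payload_sizes(file_size: int, chunk_size: int = 256) -> list[int]:
    """Return the payload size of each chunk for a file of the given size."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    full_chunks, remainder = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        # The last chunk holds only what is left over, not a full chunk.
        sizes.append(remainder)
    return sizes


print(chunk_payload_sizes(1128))  # [256, 256, 256, 256, 104]
print(chunk_payload_sizes(10))    # [10]
```

When the file size is an exact multiple of the chunk size, the remainder is zero and the last chunk is simply a full one.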

Slide 27

Slide 27 text

Challenge #1
RA: We need runtime support for document types.
DEV: MongoDB supports Dynamic Schema. You can use a collection per docType, and the collections are created dynamically.

Slide 28

Slide 28 text

Challenge #2
RA: Document type is 50 char long
DEV: MongoDB namespace can be up to 123 char.
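
A quick way to see why a 50-character document type fits is to build the namespaces a GridFS bucket would occupy and measure them. The Python sketch below assumes one bucket per document type and takes the 123 figure directly from the slide; the exact limit and how it is counted depend on the MongoDB version, and both function names are hypothetical.

```python
NAMESPACE_LIMIT = 123  # figure quoted on the slide; check it for your MongoDB version


def gridfs_namespaces(database: str, bucket: str) -> list[str]:
    """Namespaces (database.collection) used by a GridFS bucket."""
    return [f"{database}.{bucket}.files", f"{database}.{bucket}.chunks"]


def fits_namespace_limit(database: str, bucket: str,
                         limit: int = NAMESPACE_LIMIT) -> bool:
    """True if both bucket namespaces stay within the limit, measured in bytes."""
    return all(len(ns.encode("utf-8")) <= limit
               for ns in gridfs_namespaces(database, bucket))
```

With a short database name, a 50-character bucket name leaves plenty of headroom, so no alias table is needed.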

Slide 29

Slide 29 text

So... finally...
Read XML → Extract Metadata → Store Metadata & Content
Simple use case remains simple... well, becomes simpler...

Slide 30

Slide 30 text

Remember... Our Challenge
QA: Let's see if we are in spec for response time.
DEV: Performance test is part of our definition of 'DONE'

Slide 31

Slide 31 text

Because seeing is believing!
Demo
‣ GridFS 2.4.0
‣ PostgreSQL 9.2
‣ Spring Data
‣ JMeter 2.7
‣ Mac OS X 10.8.3, 2.3GHz Quad-Core Intel Core i7, 16GB RAM
https://github.com/aparnachaudhary/nosql-matters-demo

Slide 32

Slide 32 text

EMBRACE POLYGLOT PERSISTENCE! STOP RDBMS ABUSE! KNOW YOUR USE CASE @aparnachaudhary

Slide 33

Slide 33 text

Java Developer, Data Lover
Eindhoven, Netherlands
http://blog.aparnachaudhary.com/
@aparnachaudhary
Thank You!