
Look Ma! No more blobs

Binary storage using GridFS.

Aparna Chaudhary

April 27, 2013

Transcript

  1. Look Ma! No more blobs. Aparna Chaudhary. NoSQL matters @Cologne, Germany 2013
  2. EMBRACE POLYGLOT PERSISTENCE! STOP RDBMS ABUSE! KNOW YOUR USE CASE

  3. We don't do rocket science... Read XML → Parse → Extract → Store.
    Use Case (RA): Runtime support for document types. Metadata definition provided at runtime. Document type names: max 50 char. Look up content based on metadata.
  4. Challenges: Storage of up to one million documents of 10KB to 2GB per document type per year. Write 1MB < x msec. Retrieve 1MB < y msec. ...and details. RA: But… the numbers make it interesting...
  5. How? File System MongoDB RDBMS JCR Document Management

  6. If you want to store files, it's logical to use a file system, ain't it? File System: ✓ Ease of Use ✓ No special skill-set ✓ Backup and Recovery ✓ It’s free!
  7. How do I name them? Support for metadata storage? Performance with too many small files? Query - Administration? High Availability? Limitation on total number of files?
  8. RDBMS (relational database): Integrity, Consistency, Durability, Atomicity, Joins, Aggregations, Backups, High Availability. You name it, we have it!
  9. RDBMS Developer’s Perspective

  10. Challenge #1. RA: We need runtime support for document types.
  11. Challenge #1. Dynamic DDL generation: DOC_1, DOC_2, DOC_3, DOC_4, DOC_5, DOC_6.
  12. Challenge #1. DEV: String concatenations are ugly…
  13. Challenge #1. DEV: Let's build a utility.
  14. Challenge #1. More work.

  15. Challenge #2. RA: Document type is 50 char long.
  16. Challenge #2. Table name limits. Wait… SQL-92 says 128 char? We rule. Let's support only 30 char.
  17. Challenge #2. DEV: Let's create a mapping table: DOC_TYPE_MAPPING.
  18. Challenge #2. Ugly unreadable table names!

  19. So...finally... Simple use case becomes complex...
    Read XML → DocumentType defined? If no: dynamic DDL generation and document type alias. Then: extract metadata → store metadata → store content.
  20. Remember... our challenge. QA: Let's see if we are in spec for response time. DEV: Aah.. what about performance now?
  21. MongoDB: Document Based, GridFS, B-Tree, Dynamic Schema, JSON, BSON, Query, Scalable. No Joins, no Complex Transactions. http://www.10gen.com/presentations/storage-engine-internals
  22. Concepts. [Figure: databases containing collections; collections containing documents ID1 to ID5 with varying fields F1 to F9, Fx]
    Table = Collection. Column = Field. Row = Document. Database = Database.
  23. GridFS: MongoDB divides the large content into chunks and stores metadata and chunks separately. http://docs.mongodb.org/manual/core/gridfs/
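To make the metadata/chunks split concrete, here is a minimal sketch in plain Python. It is not the driver's implementation: it only builds dictionaries shaped like the `files` and `chunks` documents shown in the following slides, and the function names `split_into_gridfs_docs` and `reassemble` are hypothetical.

```python
import hashlib

# 256 KB, the chunkSize (262144) that appears in the transcript's files document.
CHUNK_SIZE = 256 * 1024


def split_into_gridfs_docs(file_id, filename, content, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like the two GridFS collections.

    Illustration only: a real driver also sets uploadDate, contentType, etc.
    """
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(content),
        "chunkSize": chunk_size,
        "md5": hashlib.md5(content).hexdigest(),
    }
    # One chunk document per slice; "n" is the chunk's position in the file.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": content[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(content), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the original bytes by concatenating chunks in order of "n"."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)
```

Reading a file back is the reverse: look up the `files` document by metadata, then fetch its chunks by `files_id` and concatenate them in order of `n`.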
  24. > mybucket.files
    { "_id" : ObjectId("514d5cb8c2e6ea4329646a5c"), "chunkSize" : NumberLong(262144), "length" : NumberLong(103015), "md5" : "34d29a163276accc7304bd69c5520e55", "filename" : "health_record_2.xml", "contentType" : "application/xml", "uploadDate" : ISODate("2013-03-23T07:41:44.907Z"), "aliases" : null, "metadata" : { "fname" : "Aparna", "lname" : "Chaudhary", "country" : "Netherlands" } }
    ObjectId - 12 Byte BSON: 4 Byte - Seconds since Epoch, 3 Byte - Machine Id, 2 Byte - Process Id, 3 Byte - Counter
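The ObjectId layout on this slide can be checked against the sample `_id`. The sketch below splits a 24-character hex ObjectId into the four fields the slide lists. It follows the layout as stated in the talk; the function name `parse_legacy_object_id` is hypothetical, and reading the process id and counter as big-endian integers is an assumption made for display.

```python
from datetime import datetime, timezone


def parse_legacy_object_id(hex_id):
    """Split an ObjectId into the 4/3/2/3-byte fields described on the slide."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # seconds since the Unix epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine_id": raw[4:7].hex(),
        # Assumption: byte order of these two fields is shown big-endian here.
        "process_id": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = parse_legacy_object_id("514d5cb8c2e6ea4329646a5c")
print(parts["timestamp"].isoformat())  # 2013-03-23T07:41:44+00:00
```

The decoded timestamp agrees, to the second, with the `uploadDate` of the same document.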
  25. > mybucket.chunks
    { "_id" : ObjectId("514d5cb8c2e6ea4329646a5d"), "files_id" : ObjectId("514d5cb8c2e6ea4329646a5c"), "n" : 0, "data" : BinData(0,...) }
  26. ? I'm storing a 10KB file, but would it use 256KB on disk? Last Chunk = FileSize % 256 + Metadata overhead (x).
    1128KB file: 256 + 256 + 256 + 256 + (104 + x). 10KB file: 10 + x. Chunk is as big as it needs to be...
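The arithmetic on this slide can be reproduced with a few lines of Python. This sketch computes only the data portion of each chunk in KB; the per-chunk metadata overhead (the slide's `x`) is left out because the talk does not quantify it. The function name `chunk_sizes_kb` is hypothetical.

```python
def chunk_sizes_kb(file_size_kb, chunk_size_kb=256):
    """Return the data size of each chunk in KB, excluding metadata overhead."""
    full_chunks, last_chunk = divmod(file_size_kb, chunk_size_kb)
    sizes = [chunk_size_kb] * full_chunks
    if last_chunk:
        # The last chunk only holds the remainder: FileSize % 256.
        sizes.append(last_chunk)
    return sizes


print(chunk_sizes_kb(1128))  # [256, 256, 256, 256, 104]
print(chunk_sizes_kb(10))    # [10]
```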
  27. Challenge #1. RA: We need runtime support for document type. DEV: MongoDB supports Dynamic Schema. You can use a collection per docType, and they are created dynamically.
  28. Challenge #2. RA: Document type is 50 char long. DEV: MongoDB namespace can be up to 123 char.
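A namespace is the database name and collection name joined by a dot, so the limit applies to the combined string. The sketch below assumes one GridFS bucket per document type (which yields a `.files` and a `.chunks` collection, as in the `mybucket` example) and uses the 123 figure quoted in the talk; the exact limit depends on the server version. The database name `docs` and both function names are hypothetical.

```python
# Figure quoted in the talk; assumption: verify against your server version.
MAX_NAMESPACE_BYTES = 123


def gridfs_namespaces(db_name, bucket):
    """Namespaces of the two collections a GridFS bucket creates."""
    return [f"{db_name}.{bucket}.files", f"{db_name}.{bucket}.chunks"]


def fits_namespace_limit(db_name, bucket, limit=MAX_NAMESPACE_BYTES):
    """True if both bucket namespaces fit within the limit, measured in bytes."""
    return all(
        len(namespace.encode("utf-8")) <= limit
        for namespace in gridfs_namespaces(db_name, bucket)
    )


# A 50-character document type used as the bucket name fits comfortably.
print(fits_namespace_limit("docs", "d" * 50))  # True
```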
  29. So...finally... Simple use case remains simple... well, becomes simpler... Read XML → Extract Metadata → Store Metadata & Content
  30. Remember... our challenge. QA: Let's see if we are in spec for response time. DEV: Performance test is part of our definition of 'DONE'.
  31. Because seeing is believing! Demo: ‣ GridFS 2.4.0 ‣ PostgreSQL 9.2 ‣ Spring Data ‣ JMeter 2.7 ‣ Mac OS X 10.8.3, 2.3GHz Quad-Core Intel Core i7, 16GB RAM. https://github.com/aparnachaudhary/nosql-matters-demo
  32. EMBRACE POLYGLOT PERSISTENCE! STOP RDBMS ABUSE! KNOW YOUR USE CASE @aparnachaudhary
  33. Thank You! Java Developer, Data Lover. Eindhoven, Netherlands. http://blog.aparnachaudhary.com/ @aparnachaudhary