From SQL to Documents

rozza
October 17, 2013

Since the late 1970s, relational SQL databases have been king and their dominance of the industry complete. In the last few years there has been an eruption of new-style databases, and the term "NoSQL" was coined. Of this new generation, "Document Databases" have quickly become the most popular and widely used. Using MongoDB as the example, we'll look at why Document Databases are a good fit for the modern world and how to move from the relational thinking of the SQL world to a document-oriented approach.

Transcript

  1. Storage is cheap
     [Chart: storage cost per GB ($), axis from 0.01 to 1,000,000.00. Source: http://www.mkomo.com/cost-per-gigabyte]
  2. How developers like to model data

         class Car(models.Model):
             name = models.CharField(max_length=50)
             price = models.DecimalField()
             manufacturer = models.ForeignKey(Manufacturer)

         class Manufacturer(models.Model):
             name = models.CharField("Name", max_length=30)
             cars = models.ManyToManyField("MCars", blank=True)

  3. So we use Object Relational Mappers
     [Diagram labels: Relational Database, Object Relational Mapping, Application Code, XML Config, DB Schema]
  4. ORM Cracks
     •  Polymorphism is complicated
        –  Table per class, per concrete class, per family?
     •  Duplicate/partial schema
        –  Schema held in DB
        –  Schema held in the ORM/code
        –  Refactorings affect one or both
        –  Migrations needed
        –  Pressure to make the model static
     •  Ownership of schema is split
        –  Schema not always under control of the developer
        –  Turf wars
  5. ORM Cracks
     •  How to abstract away SQL
        –  Query by Example
        –  Query by API
        –  Query by Language: OQL/HQL (subsets of SQL)
     •  Partial Objects Problem
        –  Load whole object? All relationships at the same time?
        –  Load lazily?
        –  It's not clear what queries you are doing
        –  Not obvious how much pain your SQL database is experiencing
        –  Custom SQL breaks the OO model
  6. Databases run on expensive hardware
     "Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes."
     IBM Press Release, 28 Aug 2012
  7. This was a problem for Google
     2010 search index size: 100,000,000 GB
     New data added per day: 100,000+ GB
     Databases they could use: 0
     250,000+ MBPs == 4.1 miles
     Source: http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html
  8. And for Facebook
     2010: 13,000,000 queries per second
     TPC Top Results
     •  TPC #1 DB: 504,161 tps
     •  Top 10 combined: 1,370,368 tps
  9. Documents are core
     Relational vs. MongoDB

         {
           first_name: "Paul",
           surname: "Miller",
           city: "London",
           location: [45.123,47.232],
           cars: [
             { model: "Bentley",
               year: 1973,
               value: 100000, … },
             { model: "Rolls Royce",
               year: 1965,
               value: 330000, … }
           ]
         }

  10. MongoDB is fully featured

          {
            first_name: "Paul",
            surname: "Miller",
            city: "London",
            location: [45.123,47.232],
            cars: [
              { model: "Bentley",
                year: 1973,
                value: 100000, … },
              { model: "Rolls Royce",
                year: 1965,
                value: 330000, … }
            ]
          }

      Rich Queries
      •  Find Paul's cars
      •  Find everybody who owns a car built in the 1970s
      Geospatial
      •  Find all of the car owners in London
      Text Search
      •  Find all the cars described as having leather seats
      Aggregation
      •  What's the average value of Paul's car collection?
      Map Reduce
      •  For each make and model of car, how many exist in the world?
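
The kinds of question listed on slide 10 can be approximated in plain Python against the example document, which may help show what each one asks of the data. This is only a sketch and not MongoDB's query API: the `owners` list and all function names are hypothetical, and only the fields visible on the slide are used (the elided `…` fields are left out).

```python
from statistics import mean

# The example owner document from the slide, as a Python dict.
# Fields elided on the slide are simply omitted here.
owners = [
    {
        "first_name": "Paul",
        "surname": "Miller",
        "city": "London",
        "location": [45.123, 47.232],
        "cars": [
            {"model": "Bentley", "year": 1973, "value": 100000},
            {"model": "Rolls Royce", "year": 1965, "value": 330000},
        ],
    }
]


def cars_of(first_name):
    """Rich query: every car embedded in the matching owners' documents."""
    return [car for o in owners if o["first_name"] == first_name for car in o["cars"]]


def owners_with_car_from(decade_start):
    """Rich query: owners with at least one car built in the given decade."""
    return [
        o
        for o in owners
        if any(decade_start <= car["year"] < decade_start + 10 for car in o["cars"])
    ]


def average_collection_value(first_name):
    """Aggregation: mean value of one owner's embedded cars."""
    return mean(car["value"] for car in cars_of(first_name))
```

Because the cars are embedded in the owner document, each question is answered from a single document without a join.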
  11. Embedded

          {
            "id": 1
          , "title": "My awesome blog entry"
          , "description": "More awesome award winning words"
          , "comments": [
              {
                "username": "nero"
              , "comment": "This will for sure light up Rome"
              }
            ]
          }

  12. Linked

          {
            "id": 1
          , "title": "My awesome blog entry"
          , "description": "More awesome award winning words"
          , "comments": [1]
          }

          {
            "id": 1
          , "blog": 1
          , "username": "nero"
          , "comment": "This will for sure light up Rome"
          }

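
The practical difference between the embedded and linked layouts is how many lookups a read needs. The sketch below models both with plain Python dicts; the `comments_by_id` mapping and the `load_comments` helper are hypothetical stand-ins for a second collection and a second query.

```python
# Embedded layout: the post carries its comments, so one read returns everything.
embedded_post = {
    "id": 1,
    "title": "My awesome blog entry",
    "description": "More awesome award winning words",
    "comments": [
        {"username": "nero", "comment": "This will for sure light up Rome"},
    ],
}

# Linked layout: the post only stores comment ids ...
linked_post = {
    "id": 1,
    "title": "My awesome blog entry",
    "description": "More awesome award winning words",
    "comments": [1],
}

# ... and the comments live in a separate (hypothetical) collection keyed by id.
comments_by_id = {
    1: {"id": 1, "blog": 1, "username": "nero",
        "comment": "This will for sure light up Rome"},
}


def load_comments(post, collection):
    """Resolve the ids on a linked post: a second lookup done in application code."""
    return [collection[comment_id] for comment_id in post["comments"]]


embedded_usernames = [c["username"] for c in embedded_post["comments"]]
linked_usernames = [c["username"] for c in load_comments(linked_post, comments_by_id)]
```

Both layouts yield the same comments; the linked one trades the extra lookup for smaller post documents.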
  13. Many to Many

          {
            "id": 1
          , "product_code": "GFX443"
          , "description": "Awesome new graphics card"
          , "categories": [1, 2]
          }

          {
            "id": 1
          , "category": "Graphics Cards"
          , "products": [1, 2, 3, 4, 5, 6, 7, 8, 9 … 10000000]
          }

          {
            "id": 2
          , "category": "PC Components"
          , "products": [1]
          }

  14. One to Many

          {
            "id": 1
          , "product_code": "GFX443"
          , "description": "Awesome new graphics card"
          , "categories": [1, 2]
          }

          {
            "id": 1
          , "category": "Graphics Cards"
          }

          {
            "id": 2
          , "category": "PC Components"
          }

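
With the `products` arrays removed from the category documents, the relationship is stored only on the product side, and both directions can still be answered from the short `categories` list. A minimal sketch in plain Python follows; the list names and helper functions are hypothetical and stand in for two collections and their queries.

```python
# Products keep a short, bounded list of category ids.
products = [
    {"id": 1, "product_code": "GFX443",
     "description": "Awesome new graphics card", "categories": [1, 2]},
]

# Categories no longer hold an ever-growing array of product ids.
categories = [
    {"id": 1, "category": "Graphics Cards"},
    {"id": 2, "category": "PC Components"},
]


def products_in(category_id):
    """All products that list the given category id."""
    return [p for p in products if category_id in p["categories"]]


def categories_of(product):
    """All category documents referenced by a product."""
    return [c for c in categories if c["id"] in product["categories"]]
```

The category documents stay a fixed size no matter how many products reference them.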
  15. As a table

          type       radius  length  width  height  inner_radius
          Circle     5       -       -      -       -
          Square     -       5       -      -       -
          Rectangle  -       -       5      3       -
          Wheel      5       -       -      -       3

  16. As Documents

          {
            "_type": "circle"
          , "radius": 4
          }

          {
            "_type": "wheel"
          , "radius": 4
          , "inner_radius": 2
          }

          {
            "_type": "square"
          , "length": 10
          }

          {
            "_type": "rectangle"
          , "width": 30
          , "length": 20
          }

  17. Add a new type
      [Diagram: Shape hierarchy with fields: Rectangle (width), Square (length), Circle (radius), Wheel (inner_radius), Box (height)]
  18. As Documents

          {
            "_type": "circle"
          , "radius": 4
          }

          {
            "_type": "wheel"
          , "radius": 4
          , "inner_radius": 2
          }

          {
            "_type": "square"
          , "length": 10
          }

          {
            "_type": "rectangle"
          , "width": 30
          , "length": 20
          }

          {
            "_type": "box"
          , "length": 30
          , "width": 20
          , "height": 10
          }

  19. Why Documents?
      •  Documents are self-contained pieces of data
         –  Closer map to the use case (structure and data)
      •  Flexible schema
         –  Schema defined in application code
         –  Add/remove/replace fields in a document without DB schema changes
      •  It's easy to prototype
      •  Reactive to change
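
The "schema defined in application code" point can be sketched with the shape documents from the earlier slides: each document names its own `_type`, and the application decides how to interpret the fields. The `AREA` table and `area` function below are hypothetical, and treating a box's "area" as its surface area is an assumption made only for illustration.

```python
import math

# Shape documents as shown on the slides, each carrying only the fields it needs.
shapes = [
    {"_type": "circle", "radius": 4},
    {"_type": "wheel", "radius": 4, "inner_radius": 2},
    {"_type": "square", "length": 10},
    {"_type": "rectangle", "width": 30, "length": 20},
]

# The "schema" lives here, in application code: one handler per _type.
AREA = {
    "circle": lambda d: math.pi * d["radius"] ** 2,
    "wheel": lambda d: math.pi * (d["radius"] ** 2 - d["inner_radius"] ** 2),
    "square": lambda d: d["length"] ** 2,
    "rectangle": lambda d: d["width"] * d["length"],
}


def area(doc):
    """Dispatch on the document's _type field."""
    return AREA[doc["_type"]](doc)


# Adding a new type: store a document with a new field and register a handler.
# No existing document changes and no table is altered.
shapes.append({"_type": "box", "length": 30, "width": 20, "height": 10})
AREA["box"] = lambda d: 2 * (
    d["length"] * d["width"] + d["length"] * d["height"] + d["width"] * d["height"]
)
```

The existing circle, wheel, square and rectangle documents are untouched when the box type arrives, which is the contrast with the sparse table on slide 15.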
  20. Things to avoid
      •  A Document DB is not a silver bullet
      •  Using a Document DB as a Relational DB
      •  Normalized data model with lots of in-app joins
      •  Massive documents
         –  Continuously growing arrays
      •  Wrong level of granularity (too low/too high)
  21. Hints and tips
      •  De-normalize first
      •  Map your document to your OO model
      •  Normalize as needed
         –  Break out sub-documents into a separate collection
      •  Using an ODM is not an excuse not to understand Document DBs
      •  Use the right tool for the right job
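
"Normalize as needed" can be illustrated by breaking the embedded comments from the blog example out into their own collection. The `break_out` helper below is hypothetical; it assumes the parent has an `id` field and assigns child ids sequentially from 1, which is only unique per parent and would need a real id scheme in practice.

```python
def break_out(doc, field, parent_key):
    """Split the embedded sub-documents under `field` out of `doc`.

    Returns (parent, children): the parent keeps a list of child ids under
    `field`, and each child gets an `id` plus a back-reference in `parent_key`.
    The input document is not modified.
    """
    children = []
    for child_id, sub in enumerate(doc[field], start=1):
        child = dict(sub)              # copy so the original stays intact
        child["id"] = child_id         # assumption: sequential ids per parent
        child[parent_key] = doc["id"]  # back-reference to the parent document
        children.append(child)

    parent = {key: value for key, value in doc.items() if key != field}
    parent[field] = [child["id"] for child in children]
    return parent, children


post = {
    "id": 1,
    "title": "My awesome blog entry",
    "description": "More awesome award winning words",
    "comments": [
        {"username": "nero", "comment": "This will for sure light up Rome"},
    ],
}

linked_post, comment_docs = break_out(post, "comments", "blog")
```

Here `linked_post` and `comment_docs` have the same shape as the linked layout on slide 12, so starting de-normalized does not rule out normalizing later.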