Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MongoDB, E-commerce and Transactions

Steve Francia
September 26, 2011

MongoDB, E-commerce and Transactions

How (and why) to use MongoDB for e-commerce, when it makes sense to use a RDBMS and how to blend the two.

Steve Francia

September 26, 2011
Tweet

Other Decks in Technology

Transcript

  1. • 15+ years building e-commerce • Long time open source

    contributor • Hacker, father, husband, skate punk • Chief Solutions Architect @ 10gen
  2. OpenSky was the first e-commerce site built on MongoDB ...

    also the first e-commerce site built on Symfony2
  3. Data dilemma of e-commerce • Stick to one vertical (Sane

    schema) • Flexibility (Insane schema) Pick One
  4. Sane schema • Works ... for a while • Fine

    for a few types of products • Not possible when more product types introduced
  5. Book Product Schema General Product attributes Book Specific attributes Product

    { id: sku: product dimensions: shipping weight: MSRP: price: description: ... author: Orson Scott Card title: Enders Game binding: Hardcover publication date: July 15, 1994 publisher name: Tor Science Fiction number of pages: 352 ISBN: 0812550706 language: English ...
  6. Album Product Schema General Product attributes stay the same Album

    Specific attributes are different Product { id: sku: product dimensions: shipping weight: MSRP: price: description: ... artist: MxPx title: Panic release date: June 7, 2005 label: Side One Dummy track listing: [ The Darkest ... language: English format: CD ...
  7. Jeans Product Schema General Product attributes stay the same Jeans

    specific attributes are totally different ... and not consistent across brands & make Product { id: sku: product dimensions: shipping weight: MSRP: price: description: ... brand: Lucky gender: Mens make: Vintage style: Straight Cut length: 34 width: 34 color: Hipster material: Cotten Blend ...
  8. as popularized by Magento “For purposes of flexibility, the Magneto

    database heavily utilizes an Entity-Attribute-Value (EAV) data model. As is often the case, the cost of flexibility is complexity - Magento is no exception. The process of manipulating data in Magento is often more “involved” than that typically experienced using traditional relational tables.” - Varien EAV
  9. EAV • Crazy SQL queries • Hundreds of joins in

    a query... or • Hundreds of queries joined in the application • No database enforced integrity
  10. Did I say crazy SQL (this is a single query)

    You may have trouble reading this in the back
  11. Single Table Inheritance (insanely wide tables) • No data integrity

    enforcement • Only can use FK for common elements • Very wasteful (but disk is cheap!) • Can’t effectively index
  12. Generic Columns • No data integrity enforcement • No data

    type enforcement • Only can use FK for common elements • Wasteful (but disk is cheap!) • Can’t index
  13. Serialized in Blob • Not searchable • No integrity •

    All the disadvantages of a document store, but none of the advantages • Never should be used • One exception is Oracle XML which operates similar to a document store
  14. Concrete Table Inheritance (a table for each product attribute set)

    • Allows for data integrity • Querying across attribute sets quite hard to do (lots of joins, OR statements and full table scanning) • New table needs to be created for each new attribute set
  15. Class table inheritance (single product table, each attribute set in

    own table) • Likely best solution within the constraint of SQL • Supports data type enforcement • No data integrity enforcement • Easy querying across categories (for browse pages) since common data in single table • Every set needs a new table • Requires a ton of forsight, as changes are very complicated
  16. MongoDB to the Rescue • Flexible (and sane) Schema •

    Easily searchable • Easily accessible
  17. MongoDB to the Rescue • Flexible (and sane) Schema •

    Easily searchable • Easily accessible • Fast
  18. { sku: "00e8da9c", type: "Audio Album", title: "Hoss", description: "by

    Lagwagon", asin: "B0000007QG", shipping: { weight: 6, dimensions: { width: 10, height: 10, depth: 1 }, }, pricing: { list: 1000, retail: 800, savings: 200, pct_savings: 20 }, details: { title: "Hoss", { sku: "00e8da9d", type: "Film", title: "The Matrix", description: "Set in the 22nd century, Th asin: "B000P0J0AQ", shipping: { weight: 6, dimensions: { width: 10, height: 10, depth: 1 }, }, pricing: { list: 1200, retail: 1100, savings: 100, pct_savings: 8.5 }, details: { title: "The Matrix",
  19. pct_savings: 20 }, details: { title: "Hoss", artist: "Lagwagon", genre:

    [ "Punk", "Hardcore", "Indie Rock" ], label: "Fat Wreck Chords", number_of_discs: 1, issue_date: "November 21, 1995", format: "CD", alternate_formats: [ 'Vinyl', 'MP3' ], tracks: [ "Kids Don't Like To Share", "Violins", "Name Dropping", "Bombs Away", "Move The Car", "Sleep", "Sick", "Rifle", "Weak", "Black Eye", "Bro Dependent", "Razor Burn", "Shaving Your Head", "Ride The Snake", ], pct_savings: 8.5 }, details: { title: "The Matrix", director: [ "Andy Wachowski", "Larry Wa writer: [ "Andy Wachowski", "Larry Wach actor: [ "Keanu Reeves" , "Lawrence Fis genre: [ "Science Fiction", "Action" ], number_of_discs: 1, issue_date: "May 15 2007", original_release_date: "1999", disc_format: "DVD", rating: "R", alternate_formats: [ 'VHS', 'Bluray' ], run_time: "136", studio: "Warner Bros", language: "English", format: [ "AC-3", "Closed-captioned", " aspect_ratio: "1.66:1" }, }
  20. { "_id": ObjectId("4d8ad78b46b731a22943d3d3"), "sku": "00e8da9d", "type": "Film", "name": "The Matrix",

    "description": "Set in the 22nd century, The Matrix...", "asin": "B000P0J0AQ", "shipping": { "weight": 6, "dimensions": { "width": 10, "height": 10, "depth": 1 } }, "pricing": { db.products.find( { 'name': "The Matrix" } );
  21. }, "pricing": { "list": 1000, "retail": 800, "savings": 200, "pct_savings":

    20 }, "details": { "title": "A Night at the Opera", "director": "Sam Wood", "actor": ["Groucho Marx", "Chico Marx", "Harpo Marx"], "genre": "Comedy", "number_of_discs": 1, "issue_date": "May 4 2004", "original_release_date": "1935", "disc_format": "DVD", db.products.find( { 'details.actor': "Groucho Marx" } );
  22. "list": 1200, "retail": 1100, "savings": 100, "pct_savings": 8 }, "details":

    { "title": "A Love Supreme [Original Recording Reissued]", "artist": "John Coltrane", "genre": ["Jazz", "General"], "format": "CD", "label": "Impulse Records", "number_of_discs": 1, "issue_date": "December 9, 1964", "alternate_formats": ["Vinyl", "MP3"], "tracks": [ "A Love Supreme Part I: Acknowledgement", db.products.find( { 'details.genre': "Jazz", 'details.format': "CD" } );
  23. }, "details": { "title": "It's a Wonderful Life", "director": "Frank

    Capra", "actor": ["James Stewart", "Donna Reed", "Lionel Barrymore"], "writer": [ "Frank Capra", "Albert Hackett", "Frances Goodrich", "Jo Swerling", "Michael Wilson" ], "genre": "Drama", "number_of_discs": 1, "issue_date": "Oct 31 2006", "original_release_date": "1947", db.products.find( { 'details.actor': { $all: ['James Stewart', 'Donna Reed'] } } );
  24. Embedded documents are great for orders • Ordered items need

    to be fixed at the time of purchase • Embed them right in the order db.order.find( { 'items.sku': '00e8da9f' } ); db.order.find( { 'items.details.actor': 'James Stewart' } ).count();
  25. Stricter data requirements for $$ • For financial systems any

    data inconsistency is unacceptable • Perhaps you’ve heard of ACID?
  26. Atomicity • MongoDB does atomic writes ... for single document

    changesets • $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
  27. Consistency • MongoDB can enforce unique keys ... but only

    on keys shared by every document in the collection
  28. Consistency • MongoDB can enforce unique keys ... but only

    on keys shared by every document in the collection • MongoDB can't enforce referential integrity
  29. Isolation • // Pseudo-isolated updates db.foo.update( { x : 1

    } , { $inc : { y : 1 } } , false , true );
  30. Isolation • // Pseudo-isolated updates db.foo.update( { x : 1

    } , { $inc : { y : 1 } } , false , true ); • // Isolated updates db.foo.update( { x : 1 , $atomic : 1 } , { $inc : { y : 1 } } , false , true );
  31. Isolation • // Pseudo-isolated updates db.foo.update( { x : 1

    } , { $inc : { y : 1 } } , false , true ); • // Isolated updates db.foo.update( { x : 1 , $atomic : 1 } , { $inc : { y : 1 } } , false , true ); • But there are caveats...
  32. Isolation • // Pseudo-isolated updates db.foo.update( { x : 1

    } , { $inc : { y : 1 } } , false , true ); • // Isolated updates db.foo.update( { x : 1 , $atomic : 1 } , { $inc : { y : 1 } } , false , true ); • But there are caveats... • Despite the $atomic keyword, this is not an atomic update, since atomicity implies “all or nothing”
  33. Isolation • // Pseudo-isolated updates db.foo.update( { x : 1

    } , { $inc : { y : 1 } } , false , true ); • // Isolated updates db.foo.update( { x : 1 , $atomic : 1 } , { $inc : { y : 1 } } , false , true ); • But there are caveats... • Despite the $atomic keyword, this is not an atomic update, since atomicity implies “all or nothing” • $atomic here means update is done without an interference from any other operation (isolated)
  34. Isolation • // Pseudo-isolated updates db.foo.update( { x : 1

    } , { $inc : { y : 1 } } , false , true ); • // Isolated updates db.foo.update( { x : 1 , $atomic : 1 } , { $inc : { y : 1 } } , false , true ); • But there are caveats... • Despite the $atomic keyword, this is not an atomic update, since atomicity implies “all or nothing” • $atomic here means update is done without an interference from any other operation (isolated) • An isolated update can only act on a single collection. Multi- collection updates are not transactional, thus not isolatable.
  35. • Atomic single document writes • If you need atomic

    writes across multi-document transactions don't use Mongo • Many if not most e-commerce transactions could be accomplished within a single document write
  36. • Atomic single document writes • If you need atomic

    writes across multi-document transactions don't use Mongo • Many if not most e-commerce transactions could be accomplished within a single document write • Unique indexes • This only works on keys used by the entire collection
  37. • Atomic single document writes • If you need atomic

    writes across multi-document transactions don't use Mongo • Many if not most e-commerce transactions could be accomplished within a single document write • Unique indexes • This only works on keys used by the entire collection • Isolated (not atomic) single collection updates. • Mongo does not support locking • There are ways to work around this
  38. • Atomic single document writes • If you need atomic

    writes across multi-document transactions don't use Mongo • Many if not most e-commerce transactions could be accomplished within a single document write • Unique indexes • This only works on keys used by the entire collection • Isolated (not atomic) single collection updates. • Mongo does not support locking • There are ways to work around this • It’s durable
  39. There are ways to guarantee ACID properties in MongoDB Here

    are 3 good approaches useful for E-commerce transactions
  40. Optimistic Concurrency • Read the current state of a product

    • Make your changes with the assertion that your product has the same state as it did when you last read it
  41. Optimistic concurrency in MongoDB We’ll use an update-if-current strategy. This

    example is straight from the documentation: > t = db.inventory > p = t.findOne({sku:'abc'}) > t.update({_id:p._id, qty:p.qty}, {'$inc': {qty: -1}}); > db.$cmd.findOne({getlasterror:1}); {"err" : , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked
  42. Optimistic concurrency in MongoDB We’ll use an update-if-current strategy. This

    example is straight from the documentation: > t = db.inventory > p = t.findOne({sku:'abc'}) > t.update({_id:p._id, qty:p.qty}, {'$inc': {qty: -1}}); > db.$cmd.findOne({getlasterror:1}); {"err" : , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked ... If that didn't work, try again until it does.
  43. Optimistic concurrency • Read the current state of a product.

    • Make your changes with the assertion that your product has the same state as it did when you last read it.
  44. Optimistic concurrency • Read the current state of a product.

    • Make your changes with the assertion that your product has the same state as it did when you last read it. • It is also possible to use OCC to bootstrap pessimistic concurrency and fake row level locking
  45. OCC works great for companies like Amazon • Amazon has

    a long-tail catalog • A long tail catalog lends itself well to optimistic concurrency, because it has low data contention
  46. Flash sales and auctions are defined by high data contention

    • The model doesn't work otherwise • They can't afford to be optimistic
  47. Flash sales and auctions are defined by high data contention

    • The model doesn't work otherwise • They can't afford to be optimistic • Order really matters
  48. 1. I go to Barneys and see a pair of

    shoes I just have to buy. 2. I call “dibs” (by grabbing them off the shelf). 3. I take them up to the cash register and purchase them: • Store inventory has been manually decremented. • I pay for them with my trusty AmEx. 4. If all goes according to plan, I walk out of the store. 5. If my card was declined, the shoes are “rolled back” ... out onto the shelves and sold to the next customer who wants them.
  49. 1. Select a product. 2. Update the document to hold

    inventory. • Store inventory has been decremented.
  50. 1. Select a product. 2. Update the document to hold

    inventory. • Store inventory has been decremented. 3. Purchase the product(s)
  51. 1. Select a product. 2. Update the document to hold

    inventory. • Store inventory has been decremented. 3. Purchase the product(s) • Process payment
  52. 1. Select a product. 2. Update the document to hold

    inventory. • Store inventory has been decremented. 3. Purchase the product(s) • Process payment 4. Roll back if anything went wrong.
  53. MongoDB e-commerce transactions • Each Item (not SKU) has it’s

    own document • Document consists of... • a reference to the SKU (product) • a state ( available / sold / ... ) • potentially other data (timestamp, order ref)
  54. Transactions in MongoDB We’ll use a simple update statement here.

    > t = db.inventory > sku = sku.findOne({sku:'abc'}) > t.update({ref_id:sku._id, state: 'available'}, {'$set': {state: 'ordered'}}); > db.$cmd.findOne({getlasterror:1}); {"err" : , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked
  55. Transactions in MongoDB We’ll use a simple update statement here.

    > t = db.inventory > sku = sku.findOne({sku:'abc'}) > t.update({ref_id:sku._id, state: 'available'}, {'$set': {state: 'ordered'}}); > db.$cmd.findOne({getlasterror:1}); {"err" : , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked ... If that didn't work, no inventory available
  56. Cart in Cart Action An added benefit, it can easily

    provide inventory hold in cart. > t = db.inventory > sku = sku.findOne({sku:'abc'}) > t.update({ref_id:sku._id, state: 'available'}, {'$set': {state: 'in cart'}}); > db.$cmd.findOne({getlasterror:1}); {"err" : , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked
  57. Cart in Cart Action An added benefit, it can easily

    provide inventory hold in cart. > t = db.inventory > sku = sku.findOne({sku:'abc'}) > t.update({ref_id:sku._id, state: 'available'}, {'$set': {state: 'in cart'}}); > db.$cmd.findOne({getlasterror:1}); {"err" : , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked just like reality, each item is either available, in a cart, or purchased
  58. Data to store in MongoDB • User • Product •

    Product/Sellable • Address • Cart • CreditCard • Event • TaxRate • ... and then I got tired of typing them in • Just imagine this list has 40 more classes • ...
  59. CREATE TABLE `product_inventory` ( `product_id` char(32) NOT NULL, `inventory` int(11)

    NOT NULL DEFAULT '0', PRIMARY KEY (`product_id`) ); CREATE TABLE `sellable_inventory` ( `sellable_id` char(32) NOT NULL, `inventory` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`sellable_id`) ); CREATE TABLE `orders` ( `id` int(11) NOT NULL AUTO_INCREMENT, `userId` char(32) NOT NULL, `shippingName` varchar(255) DEFAULT NULL, `shippingAddress1` varchar(255) DEFAULT NULL, `shippingAddress2` varchar(255) DEFAULT NULL, `shippingCity` varchar(255) DEFAULT NULL, `shippingState` varchar(2) DEFAULT NULL, `shippingZip` varchar(255) DEFAULT NULL, `billingName` varchar(255) DEFAULT NULL, `billingAddress1` varchar(255) DEFAULT NULL, `billingAddress2` varchar(255) DEFAULT NULL, `billingCity` varchar(255) DEFAULT NULL,
  60. Wait. How does inventory live in SQL? Isn’t that a

    property in one of your Mongo collections?
  61. CREATE TABLE `product_inventory` ( `product_id` char(32) NOT NULL, `inventory` int(11)

    NOT NULL DEFAULT '0', PRIMARY KEY (`product_id`) ); CREATE TABLE `sellable_inventory` ( `sellable_id` char(32) NOT NULL, `inventory` int(11) NOT NULL DEFAULT '0', PRIMARY KEY (`sellable_id`) ); CREATE TABLE `orders` ( `id` int(11) NOT NULL AUTO_INCREMENT, `userId` char(32) NOT NULL, `shippingName` varchar(255) DEFAULT NULL, `shippingAddress1` varchar(255) DEFAULT NULL, `shippingAddress2` varchar(255) DEFAULT NULL, `shippingCity` varchar(255) DEFAULT NULL, `shippingState` varchar(2) DEFAULT NULL, `shippingZip` varchar(255) DEFAULT NULL, `billingName` varchar(255) DEFAULT NULL, `billingAddress1` varchar(255) DEFAULT NULL, `billingAddress2` varchar(255) DEFAULT NULL, `billingCity` varchar(255) DEFAULT NULL,
  62. Inventory is transient • Product::$inventory is effectively a transient property

    • Note how I said “effectively”? ... we cheat and persist our transient property to MongoDB as well • We can do this because we never really trust the value stored in Mongo
  63. Accuracy is only important when there’s contention • For display,

    sorting and alerts, we can use the value stashed in MongoDB • It’s faster • It’s accurate enough
  64. Accuracy is only important when there’s contention • For display,

    sorting and alerts, we can use the value stashed in MongoDB • It’s faster • It’s accurate enough • For financial transactions, we want the security and comfort of our RDBMS.
  65. Inventory kept in sync with listeners • Every time a

    new product is created, its inventory is inserted in SQL
  66. Inventory kept in sync with listeners • Every time a

    new product is created, its inventory is inserted in SQL • Every time an order is placed, inventory is verified and decremented
  67. Inventory kept in sync with listeners • Every time a

    new product is created, its inventory is inserted in SQL • Every time an order is placed, inventory is verified and decremented • Whenever the SQL inventory changes, it is saved to MongoDB as well
  68. Be careful what you lock 1. Acquire inventory row lock

    and begin transaction 2. Check current product inventory 3. Decrement product inventory 4. Write the Order to SQL 5. Update affected MongoDB documents 6. Commit the transaction 7. Release product inventory lock
  69. /** @mongodb:Document(collection="products") */ class Product { /** @mongodb:Id */ private

    $id; /** @mongodb:String */ private $title; public function getId() { return $this->id; } public function getTitle() { return $this->title; } public function setTitle($title) { $this->title = $title; } }
  70. /** * @orm:Entity * @orm:Table(name="orders") * @orm:HasLifecycleCallbacks */ class Order

    { /** * @orm:Id @orm:Column(type="integer") * @orm:GeneratedValue(strategy="AUTO") */ private $id; /** * @orm:Column(type="string") */ private $productId; /** * @var Documents\Product */ private $product; // ... }
  71. Setting the Product class Order { // ... public function

    setProduct(Product $product) { $this->productId = $product->getId(); $this->product = $product; } }
  72. • $productId is mapped and persisted • $product which stores

    the Product instance is not a persistent entity property
  73. OrderPostLoadListener use Doctrine\ORM\Event\LifecycleEventArgs; class OrderPostLoadListener { public function postLoad(LifecycleEventArgs $eventArgs)

    { // get the order entity $order = $eventArgs->getEntity(); // get odm reference to order.product_id $productId = $order->getProductId(); $product = $this->dm->getReference('MyBundle:Document\Product', $productId); // set the product on the order $em = $eventArgs->getEntityManager(); $productReflProp = $em->getClassMetadata('MyBundle:Entity\Order') ->reflClass->getProperty('product'); $productReflProp->setAccessible(true); $productReflProp->setValue($order, $product); } }
  74. All Together Now // Create a new product and order

    $product = new Product(); $product->setTitle('Test Product'); $dm->persist($product); $dm->flush(); $order = new Order(); $order->setProduct($product); $em->persist($order); $em->flush(); // Find the order later $order = $em->find('Order', $order->getId()); // Instance of an uninitialized product proxy $product = $order->getProduct(); // Initializes proxy and queries the monogodb database echo "Order Title: " . $product->getTitle(); print_r($order);
  75. Read more about this technique Jon Wage, one of OpenSky’s

    engineers, first wrote about this technique on his personal blog: http://jwage.com You can read the full article here: http://jwage.com/2010/08/25/blending-the- doctrine-orm-and-mongodb-odm/