Mission Critical MongoDB - Kevin Calcagno, Lulu

mongodb
May 07, 2012

Lulu is the leading self-publishing site on the web, with over 1.2 million books, eBooks, and calendars, each with its own user-generated metadata. This data is Lulu's entire business, and we learned a number of lessons migrating this mission critical data from PostgreSQL to MongoDB. This talk builds upon a September 2011 talk at Mongo Raleigh where we discussed why we chose a non-relational database. We'll also cover what we learned from backing MongoDB with NFS and from using Spring Data to integrate MongoDB into our code.


Transcript

  1. What is Lulu?

    Lulu is a company that makes the remarkable possible. Lulu enables anyone to publish, distribute, and sell their work for free. Our open publishing model empowers more creators to sell more content to more readers more profitably than ever before.
  2. Lulu At-A-Glance

    •  Headquarters: Raleigh, N.C.
    •  Founded: 2002
    •  Founder/CEO: Bob Young
    •  Operations: We operate Lulu.com in the U.S. and international versions of the site for customers in the U.K., France, Germany, Italy, Spain, and the Netherlands.
    •  Products: Hardcover and paperback books, eBooks, photo books, calendars, cookbooks, travel books
    •  Creators: 1.1 million
    •  Creator revenue: Creators keep 80% of the profit they set on their products.
    •  Growth: Approximately 20,000 titles published per month.
  3. Evolution of a System

    •  Before
      –  Single, tightly-coupled application
      –  Single, all-purpose database
    •  After
      –  Multiple, loosely-coupled applications
      –  API-based service layer
      –  Multiple, purpose-built databases
  4. Publication Problems

    •  4.8 million projects
    •  1.3 million active products
    •  Lots of variety in product types
    •  Legacy data heavily normalized
    •  Evolving industry and data
  5. Why MongoDB?

    •  Fit for our data
      –  Project: What the book is
        •  Owner, Title, Contributors, Copyright info
        •  Reference to files and active listing
      –  Listing: How the book is sold
        •  Pricing, Keywords, Categories, Distribution
        •  Fossilized copy of its project
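
The Project/Listing split on slide 5 can be sketched in a few lines. This is only an illustration, assuming plain Python dictionaries stand in for MongoDB documents. The `publish_listing` helper, the listing field names (`price`, `keywords`, `project`), and the sample price, keyword, and extra contributor are hypothetical and are not taken from Lulu's schema. The point it tries to show is the "fossilized copy": the listing embeds a snapshot of the project, so later edits to the project do not change what is on sale.

```python
import copy

def publish_listing(project, price, keywords):
    """Build a listing that embeds a frozen ("fossilized") copy of its project."""
    return {
        "price": price,              # hypothetical field name
        "keywords": list(keywords),  # hypothetical field name
        # Deep copy, so later edits to the project do not alter the listing.
        "project": copy.deepcopy(project),
    }

project = {
    "title": "Memories of the Future - Volume 1",
    "contributors": [{"name": "Wil Wheaton", "role": "AUTHOR"}],
}
listing = publish_listing(project, price="9.99", keywords=["example-keyword"])

# The creator keeps editing the project after publishing.
project["title"] = "Working Title"
project["contributors"].append({"name": "Example Editor", "role": "EDITOR"})

# The listing still carries the project as it was when it went on sale.
print(listing["project"]["title"])
```

Embedding a copy trades some duplicated storage for a listing that can be read in one query and never changes underneath a buyer.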
  6. Why MongoDB?

    •  Fit for our data

    {
      "_id" : ObjectId("4f1a552f2a6d862190cba99b"),
      "ownerId" : "3731770",
      "listingId" : ObjectId("4f1a552f2a6d862190cba99c"),
      "createDate" : ISODate("2009-10-04T17:59:50.797Z"),
      "updateDate" : ISODate("2011-09-29T00:38:37.286Z"),
      "legacy" : { ... },
      "productType" : "PERFECT_USTRADE_BW_STANDARD",
      "tagsInternal" : [ "Buy_Bestseller_Shelf", "spotlight" ],
      "title" : "Memories of the Future - Volume 1",
      "contributors" : [ { "personaId" : 4071082, "name" : "Wil Wheaton", "role" : "AUTHOR" } ],
      "edition" : "Second Printing",
      "copyright" : { "year" : 2009, "owner" : "Wil Wheaton" },
      "bookBlock" : "document:57662280",
      "cover" : "document:57662629",
      "language" : "ENG",
      "publisher" : "Monolith Press",
      "fileSize" : NumberLong(945626),
      "pages" : 138,
      "version" : 1
    }
  7. Why MongoDB?

    •  Flexible schema
    •  Performance & Scalability
    •  Fit for our API model
      –  JSON → Objects → SQL → Objects → JSON
      –  JSON → Objects → JSON
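
The shorter "JSON → Objects → JSON" path on slide 7 can be illustrated as follows. This is a minimal sketch using only the Python standard library rather than Spring Data or a MongoDB driver. The `Project` class and the `from_json` and `to_json` helpers are hypothetical names, and only three fields from the sample document are modeled. It is meant to show that when the stored shape matches the API shape, there is no relational mapping step in between.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Project:
    title: str
    language: str
    pages: int

def from_json(text):
    """JSON -> object: the request body maps directly onto the domain object."""
    return Project(**json.loads(text))

def to_json(project):
    """Object -> JSON: the same shape can be stored or returned to the client."""
    return json.dumps(asdict(project), sort_keys=True)

request_body = '{"language": "ENG", "pages": 138, "title": "Memories of the Future - Volume 1"}'
project = from_json(request_body)
response_body = to_json(project)
```

In the relational path, the same object would additionally have to be split into rows on the way in and reassembled from joins on the way out.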
  8. Why MongoDB?

    •  Adoption
    •  Frameworks like Spring Data
    •  Commercial Support & Training
    •  It’s Cool
  9. Spring Data

    •  Spring’s programming model + Mongo
      –  Connection & Driver Config, MongoTemplate helper
      –  Automatic implementation of Repositories
      –  Annotation based POJO mapping
    •  Challenges
      –  Not fully mature
      –  Hard things sometimes harder
  10. Mass Data Migration

    •  Rule #1: You have bad data
      –  Good news: You’ll discover most of it…
      –  Bad news: … when it blows up your migration
  11. Mass Data Migration

    •  Rule #1: You have bad data
      –  Good news: You’ll discover most of it…
      –  Bad news: … when it blows up your migration
    •  Rule #2: Make it fast
      –  Write a custom app or script
      –  Add indexes after migrating
  12. Mass Data Migration

    •  Rule #3: Use production data
      –  See Rule #1 & #2
      –  Compare results
      –  Monitor if you migrate over an extended time
  13. Mass Data Migration

    •  Rule #3: Use production data
      –  See Rule #1 & #2
      –  Compare results
      –  Monitor if you migrate over an extended time
    •  Rule #4: Plan for future migrations
      –  NoSQL is flexible, but don’t abuse it
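
Rules #1 and #3 can be combined in a small migration skeleton. The sketch below assumes legacy rows arrive as Python dictionaries and that the target is a list of documents; it does not talk to PostgreSQL or MongoDB. The `to_document` and `migrate` functions, the `legacyId` field, and the sample rows are all hypothetical. It shows one way to keep bad data from aborting the whole run (quarantine the row and keep going) and a basic "compare results" check at the end.

```python
def to_document(row):
    """Convert one legacy row to a document; raise ValueError on bad data."""
    if not row.get("title"):
        raise ValueError("missing title")
    return {
        "legacyId": row["id"],
        "title": row["title"].strip(),
        "pages": int(row["pages"]),  # int() raises ValueError on junk like "n/a"
    }

def migrate(rows):
    """Migrate every row, quarantining failures instead of stopping the run."""
    migrated, rejected = [], []
    for row in rows:
        try:
            migrated.append(to_document(row))
        except (ValueError, KeyError, TypeError) as exc:
            rejected.append((row.get("id"), str(exc)))
    return migrated, rejected

legacy_rows = [
    {"id": 1, "title": "First Book", "pages": "138"},
    {"id": 2, "title": "", "pages": "40"},             # bad: empty title
    {"id": 3, "title": "Third Book", "pages": "n/a"},  # bad: non-numeric pages
]
migrated, rejected = migrate(legacy_rows)

# Compare results: every source row is either migrated or explicitly rejected.
assert len(migrated) + len(rejected) == len(legacy_rows)
```

Running a script like this against a copy of production data surfaces the rejected rows before the real cutover rather than during it.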
  14. Mission Critical Data

    •  Set the Write Concern
      –  Default is UNSAFE
    •  Monitor your data
      –  Nagios plugin: https://github.com/mzupan/nagios-plugin-mongodb
      –  http://blog.serverdensity.com/mongodb-monitoring/
      –  10gen’s MongoDB Monitoring Service
  15. Mission Critical Data

    •  You need more RAM than you think
    •  You need more indexes than you think
    •  Complexity vs. Flexibility
      –  NoSQL is “schema-less,” but your application isn’t
      –  Don’t allow your data to fall behind your data model
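
One common way to keep stored data from falling behind the data model is to version each document and upgrade old ones when they are read. The sketch below is an assumption about how that could look, not a description of Lulu's code. It reuses the idea of the `version` field from the sample document on slide 6, but the `author` field, the v1-to-v2 change, and the `load` and `upgrade_v1_to_v2` helpers are hypothetical.

```python
CURRENT_VERSION = 2

def upgrade_v1_to_v2(doc):
    """Hypothetical change: v2 stores a contributors list instead of one author."""
    upgraded = dict(doc)  # shallow copy, so the caller's document is left untouched
    upgraded["contributors"] = [{"name": upgraded.pop("author"), "role": "AUTHOR"}]
    upgraded["version"] = 2
    return upgraded

# Maps a stored version to the function that lifts it one version higher.
UPGRADES = {1: upgrade_v1_to_v2}

def load(doc):
    """Bring a stored document up to the version the application expects."""
    while doc.get("version", 1) < CURRENT_VERSION:
        doc = UPGRADES[doc.get("version", 1)](doc)
    return doc

old = {"title": "Example", "author": "A. Writer", "version": 1}
new = load(old)
```

Upgrading on read keeps the application code working against a single shape; a background job can apply the same functions to rewrite old documents so they do not linger indefinitely.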
  16. MongoDB on NFS

    •  Lulu’s environment is 99% virtualized
      –  VMware, NetApp storage, 10Gb Ethernet
      –  NFS had support from NetApp, and KISS
    •  Advantages
      –  Easy to manage, can grow/shrink file system
      –  Export production data to dev/test environments
      –  Excellent de-duplication rates
      –  Flexible backup schedule with snapshots
  17. MongoDB on NFS

    •  Keys to a successful deployment
      –  Don’t overload your network interfaces
      –  Dedicated network interface for storage traffic
      –  Jumbo frames for storage reduce CPU load
    •  Challenges
      –  Traditional backup levels risk 24 hours of data
      –  Scripted backups aren’t as accurate as traditional RDBMS log shipping