MongoDB at Groupon

How the Merchant Data team at Groupon uses Mongo to manage its place data. Presented by Peter Bakkum.

Peter Bakkum

May 10, 2013

Transcript

  1. Merchant Data, CRM, Merchant Pages, Self-Service, Others
  2. Arnold: Declarative Crowd-Machine Data Integration.
    Shawn Jeffery, Liwen Sun, Matt DeLand, Nick Pendar, Rick Barber, Andrew Galdi.
    CIDR 2013. cidrdb.org/cidr2013/Papers/CIDR13_Paper22.pdf
  3. Concordance
    { name: "Joe's Pizza", location: { address: "1000 Market St.", postal_code: "94100-1001" }, source: 1 }
    { name: "Joes Pizza", location: { address: "1000 Market Street", postal_code: "94100" }, source: 2 }
    { name: "Joes", location: { address: "1000 Market", postal_code: "94100" }, source: 3 }
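The three records on slide 3 describe the same place with different spellings. The deck does not say how Groupon matches them, so the following is only a minimal sketch of one way such records could be reduced to a shared key. The helpers `normalizeAddress`, `normalizePostalCode`, `concordanceKey`, and `groupByKey` are hypothetical names, and the suffix list is an illustrative assumption.

```javascript
// Hypothetical helpers for illustration; not Groupon's actual matching logic.
const STREET_SUFFIXES = new Set(["st", "street", "ave", "avenue", "rd", "road"]);

// Lowercase, strip punctuation, and drop common street suffixes.
function normalizeAddress(address) {
  return address
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, "")
    .split(/\s+/)
    .filter((token) => token.length > 0 && !STREET_SUFFIXES.has(token))
    .join(" ");
}

// Keep only the part before the ZIP+4 extension, if any.
function normalizePostalCode(postalCode) {
  return postalCode.split("-")[0];
}

// Assumption: address plus postal code is enough to identify a place here.
function concordanceKey(place) {
  return (
    normalizeAddress(place.location.address) +
    "|" +
    normalizePostalCode(place.location.postal_code)
  );
}

// Group source ids by their normalized key.
function groupByKey(places) {
  const groups = new Map();
  for (const place of places) {
    const key = concordanceKey(place);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(place.source);
  }
  return groups;
}

const records = [
  { name: "Joe's Pizza", location: { address: "1000 Market St.", postal_code: "94100-1001" }, source: 1 },
  { name: "Joes Pizza", location: { address: "1000 Market Street", postal_code: "94100" }, source: 2 },
  { name: "Joes", location: { address: "1000 Market", postal_code: "94100" }, source: 3 },
];

const groups = groupByKey(records);
console.log(groups); // all three sources share the key "1000 market|94100"
```

Names are deliberately left out of the key because "Joes" and "Joes Pizza" would need fuzzy comparison, which is beyond this sketch.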
  4. Content Input: Data Sets, Input Feeds, Normalization, Crowd Sourcing, Web Crawling
  5. places.find({ _id: "013e4e2afc26" })
    placeCollection.find({ "location.postcode": "94100", "location.country": "US" })
    places.findAndModify({ _id: "013e4e2afc26", persisted_at: "2013-02-01T00:00:00Z" }, { place model })
    Concordance, Persistence
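The `findAndModify` call on slide 5 matches on both `_id` and `persisted_at`. One plausible reading is an optimistic write: the update succeeds only if the document has not been persisted by someone else since it was read. The sketch below simulates that idea against an in-memory `Map` so it runs without a database. The `findAndModify` function here is a hypothetical stand-in, not the MongoDB driver API, and the place fields are made up for illustration.

```javascript
// In-memory stand-in for a collection; a real deployment would use the MongoDB driver.
const places = new Map();
places.set("013e4e2afc26", {
  _id: "013e4e2afc26",
  name: "Joes Pizza",
  persisted_at: "2013-02-01T00:00:00Z",
});

// Hypothetical helper: replace the document only if every field in `query`
// still matches the stored version. Returns the previous document on success
// and null when nothing matched.
function findAndModify(collection, query, replacement) {
  const current = collection.get(query._id);
  if (current === undefined) return null;
  const matches = Object.keys(query).every(
    (field) => current[field] === query[field]
  );
  if (!matches) return null;
  collection.set(query._id, replacement);
  return current;
}

// Two writers read the same version of the place.
const seen = places.get("013e4e2afc26").persisted_at;

// The first write matches the version it read and goes through.
const first = findAndModify(
  places,
  { _id: "013e4e2afc26", persisted_at: seen },
  { _id: "013e4e2afc26", name: "Joe's Pizza", persisted_at: "2013-02-02T00:00:00Z" }
);

// The second write still carries the stale timestamp, so it matches nothing.
const second = findAndModify(
  places,
  { _id: "013e4e2afc26", persisted_at: seen },
  { _id: "013e4e2afc26", name: "Joes", persisted_at: "2013-02-03T00:00:00Z" }
);

console.log(first !== null, second === null);
```

A writer that gets `null` back would typically re-read the place, reapply its change, and try again.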
  6. config cluster, 4 arbiters, 4 shards of 2 nodes, replica set failover, 64 GB dedicated hardware, storm workers, mongos routers
  7. UUID vB: wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz
    w: controllable counter
    x: process id
    b: literal 'b'
    y: fragment of MAC address
    z: milliseconds since epoch (UTC)
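The layout on slide 7 can be illustrated with a small generator. This is a sketch under assumptions the slide does not state: every field is encoded as zero-padded lowercase hex, values too large for a field are truncated to their low-order digits, and the MAC fragment is the last seven hex digits of the address. The names `hexField` and `uuidVB` are hypothetical.

```javascript
// Render a non-negative integer as exactly `width` hex digits,
// keeping the low-order digits if the value is too large (an assumption).
function hexField(value, width) {
  return value.toString(16).padStart(width, "0").slice(-width);
}

// Build an id shaped like wwwwwwww-xxxx-byyy-yyyy-zzzzzzzzzzzz.
function uuidVB({ counter, processId, macHex, millis }) {
  const w = hexField(counter, 8); // controllable counter
  const x = hexField(processId, 4); // process id
  const y = macHex
    .toLowerCase()
    .replace(/[^0-9a-f]/g, "") // drop ":" or "-" separators
    .slice(-7) // assumed: last seven hex digits of the MAC
    .padStart(7, "0");
  const z = hexField(millis, 12); // milliseconds since epoch (UTC)
  return `${w}-${x}-b${y.slice(0, 3)}-${y.slice(3)}-${z}`;
}

const example = uuidVB({
  counter: 1,
  processId: 0x1234,
  macHex: "00:1b:63:84:45:e6",
  millis: 255,
});
console.log(example); // 00000001-1234-b384-45e6-0000000000ff
```

In real use, `millis` would come from `Date.now()` and the counter would be advanced by the caller. Note that with the timestamp in the last group rather than the first, ids produced this way do not share a time-ordered prefix.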
  8. places.ns, places.0, places.1, places.2, …
    places.0, places.1
    extent, extent, extent
    MapReduce Input Split, MapReduce Input Split, MapReduce Input Split
  9. public void map(Text key, WritableBSONObject value, Context context) {
        String id = (String) value.get("_id");
        ...
    }
  10. Mongo Cluster, Hadoop Cluster, MapReduce Job
    Backs up Mongo data to Hadoop
    Much faster data export
    Exploits our Hadoop cluster