mongodb
January 26, 2012

MongoDB: Scaling Citygrid's Places Platform - Prashanth Ramdas

MongoDB Los Angeles 2012

Transcript

  1. • 2008 – 15 Million Calls
    • 2010 – 100+ Million Calls
    • 2011 – 200+ Million Calls
    • Today – ~300 Million Calls
    • 2012 – 1+ Billion Calls
  2. The largest content and ad network for local
    • We aggregate advertising and content from local businesses, and distribute this data across our network of websites and mobile applications
    • Citygrid Places Store has 19+ MM places and 1+ MM advertisers
    • Owned-and-operated (O&O) sites and applications are Citysearch.com, Urbanspoon.com and InsiderPages.com
  3. Developer APIs
    • Places Search API
      – Request Parameters: What, Where, Latitude / Longitude, Radius
      – XML, JSON, Protocol Buffer Responses
      – SOLR Index
    • Places Reviews / Offers API
      – Request Parameters: What, Where, Latitude / Longitude, Listing ID
      – XML, JSON, Protocol Buffer Responses
      – SOLR Index
    • Places Detail
      – Request Parameters: Listing ID, Info USA ID, Public ID
      – XML, JSON, Protocol Buffer Responses
      – MongoDB
    • Ad APIs
    • Additional information is at http://developer.citygridmedia.com
  4. Places Detail API
    • Photos and Video
    • Offers and Deals
    • Categories
    • Link to Website
    • Owner’s Message
    • Reservations
    • Business Hours
    • Social Media Links
    • Merchant Review Response
    • View Menus
    • Metered Phone #
    • Tagline
    • Twitter Feed
  5. Places Detail API Architecture – Version 1
    – API was built on Oracle
    – Data exists in Oracle as normalized tables
    – Handling a request involved multiple database reads and expensive joins. Oracle is also used by multiple other applications
    – As we continued to ingest more Places and Content, database query performance degraded
    – Result: low QPS and unpredictable latency on the serving system
  6. Requirements for Store
    • Scalability
    • Built-in Partitioning
    • Built-in Replication
    • No Schema
    • De-normalized, Fast Document Reads
    • Good Documentation / Support
    • MongoDB satisfied all our requirements!
  7. Places Detail API Architecture Version 2 – Design Decisions
    • Each Mongo Cluster is a two-node replica set for redundancy
    • Each Mongo Instance has two collections: the Listing Collection and the Content Collection
      – The Listing Collection has basic data about a Place
      – The Content Collection stores content received from multiple providers, which gets matched against the listing
    • Two Mongo Clusters: load one cluster cold, then swap it into serving
      – Data pipelines do not produce deltas, hence we have full data loads
      – Updating / inserting during serving triggers data syncs between replicas, which degraded performance for the serving API
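The "load a cluster cold and swap it into serving" decision on this slide can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the deck's actual implementation: `Cluster` and `ClusterSwitch` are hypothetical names, and an in-memory map stands in for a real MongoDB replica set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for one cluster; a real one would wrap a driver connection. */
class Cluster {
    final String name;
    final Map<String, String> listings = new ConcurrentHashMap<>();

    Cluster(String name) {
        this.name = name;
    }
}

/** Routes reads to the serving cluster while full loads go to the standby one. */
class ClusterSwitch {
    private final AtomicReference<Cluster> serving;
    private Cluster standby;

    ClusterSwitch(Cluster serving, Cluster standby) {
        this.serving = new AtomicReference<>(serving);
        this.standby = standby;
    }

    /** Reads only ever touch the cluster that is currently serving. */
    String lookup(String publicId) {
        return serving.get().listings.get(publicId);
    }

    /** Replaces the standby's data with a full load, then swaps it into serving. */
    synchronized void loadAndSwap(Map<String, String> fullLoad) {
        standby.listings.clear();
        standby.listings.putAll(fullLoad);
        // The old serving cluster becomes the standby for the next full load.
        standby = serving.getAndSet(standby);
    }

    String servingName() {
        return serving.get().name;
    }
}
```

Because writes only hit the standby cluster, the serving cluster sees no replication traffic from the load; the cost is the operational overhead of running and swapping two clusters.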
  8. The Listing Collection

    PRIMARY> db.listing.findOne({"public_id":"pinks-los-angeles"})
    {
        "_id" : ObjectId("4f0c0e974e8ab89b6982d39e"),
        "public_id" : "pinks-los-angeles",
        "phone" : "2133878525",
        "cs_rating" : "8",
        "business_operation_status" : "1",
        "id_alternates" : ["cg:45457592","iusa:615760956"],
        "address" : {
            "street" : "326 S Western Ave",
            "city" : "Los Angeles",
            "postal_code" : "90020",
            "cross_street" : "",
            "latitude" : 34.0684,
            "longitude" : -118.3089,
            "state" : "CA"},
        "name" : "Pink's"
    }
    PRIMARY>
  9. The Content Collection

    PRIMARY> db.content.findOne({public_id:"pi-on-sunset-los-angeles",cap_provider_id: {$in:["0","1"]}})
    {
        "_id" : "pi-on-sunset-los-angeles_0_70507571_image",
        "width" : "216",
        "public_id" : "pi-on-sunset-los-angeles",
        "url" : "http://images.citysearch.net/assets/imgdb/auth_ws/2010/4/20/0/ZtOIaiiG0.jpeg",
        "attribution_text" : "Citysearch",
        "content_id" : "70507571",
        "height" : "216",
        "attribution_logo_path" : "http://images.citysearch.net/assets/imgdb/custom/ue-357/CS_logo88x31.jpg",
        "content_provider_name" : "CITYSEARCH",
        "image_type" : "generic_image",
        "listing_id" : "45228161",
        "content_type" : "image",
        "content_provider_id" : "5",
        "cap_provider_id" : "0"
    }
    PRIMARY>
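The `_id` in the sample content document looks like `public_id`, `cap_provider_id`, `content_id` and `content_type` joined with underscores. That pattern is inferred from this single document, not stated on the slide, so the sketch below is an assumption; `ContentIds.compose` is a hypothetical helper name.

```java
/** Hypothetical helper: builds a content _id in the pattern the sample document appears to follow. */
final class ContentIds {
    private ContentIds() {
    }

    /** Joins the four fields with underscores, in the order seen in the sample _id. */
    static String compose(String publicId, String capProviderId,
                          String contentId, String contentType) {
        return String.join("_", publicId, capProviderId, contentId, contentType);
    }
}
```

If the ids are indeed built this way, the same piece of content always maps to the same `_id`, so a repeated full load cannot create a second copy of it.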
  10. Listing Collection Indexes

    PRIMARY> db.listing.getIndexes()
    [{
        "v" : 1,
        "key" : {"id_alternates" : 1},
        "ns" : "cgmdb.listing",
        "name" : "id_alternates_1"
    },{
        "v" : 1,
        "key" : {"phone" : 1},
        "ns" : "cgmdb.listing",
        "name" : "phone_1"
    },{
        "v" : 1,
        "key" : {"public_id" : 1},
        "ns" : "cgmdb.listing",
        "name" : "public_id_1"
    }
    ]
    PRIMARY>
  11. Content Collection Indexes

    PRIMARY> db.content.getIndexes()
    [{
        "v" : 1,
        "key" : {
            "public_id" : 1
        },
        "ns" : "cgmdb.content",
        "name" : "public_id_1"
    }
    ]
    PRIMARY>
  12. MongoDB Performance
    • EC2 M2.2XLARGE boxes (34GB RAM, 4 virtual cores)
    • Response times are under 20 ms
  13. Issues with Version 2 Architecture
    • Freshness of data: data pipeline schedules plus loads introduce delays
    • Data audit scripts running on the same instance create temporary tables via Map Reduce jobs. As data size has increased, these temporary tables have caused performance issues.
    • Cold loading MongoDB and swapping clusters adds operational overhead
  14. Places Detail API – Version 3 – Design Decisions
    • Message queues to throttle incoming data
    • Re-use Mongo as storage to reuse Data Access Logic of Serving Systems
    • Mongo performance in Version 2 satisfied business requirements
    • Delete old content when fresh data is available from Long Term Mongo store
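The slide does not name a queueing product, so the idea of throttling incoming data with a queue is sketched here with a bounded in-process queue from the Java standard library. `ThrottledIngest` is a hypothetical name, and a plain list stands in for the MongoDB content store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: a bounded queue between data pipelines and the store. */
class ThrottledIngest {
    private final BlockingQueue<String> queue;
    private final List<String> store = new ArrayList<>(); // stand-in for the content store

    ThrottledIngest(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: returns false when the queue is full instead of flooding the store. */
    boolean offer(String message) {
        return queue.offer(message);
    }

    /** Consumer side: writes at most maxBatch queued messages to the store. */
    int drainBatch(int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        store.addAll(batch);
        return batch.size();
    }

    int stored() {
        return store.size();
    }
}
```

The consumer decides how many writes reach the store per cycle, so bursts from the pipelines wait in the queue rather than competing with serving reads.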
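  15. Code samples: Loading the content store

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Method of a class that holds a connected Mongo instance in the field "mongo".
    public void writeContent(List<Map<String, Object>> in) throws Exception {
        DB db = mongo.getDB("cgmdb");
        DBCollection content = db.getCollection("content");
        // Wrap each incoming map as a document.
        List<DBObject> out = new ArrayList<DBObject>();
        for (Map<String, Object> m : in) {
            out.add(new BasicDBObject(m));
        }
        // Insert all documents in a single batch.
        content.insert(out.toArray(new DBObject[out.size()]));
    }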
  16. Code samples: Creating Morphia instance

    import com.google.code.morphia.Morphia;

    public class MorphiaFactory {
        public static Morphia create() {
            Morphia morphia = new Morphia();
            // Register the annotated DTO classes in both packages.
            morphia.mapPackage("citygrid.api.listing.dto");
            morphia.mapPackage("citygrid.api.content.dto");
            return morphia;
        }
    }
  17. Code samples: Morphia library makes mapping objects simple

    import com.google.code.morphia.annotations.Property;
    import com.google.code.morphia.annotations.Entity;
    import com.google.code.morphia.annotations.Id;

    @Entity
    public class Content {
        @Id
        private String id;

        @Property("listing_id")
        private String listingId;

        @Property("content_provider_id")
        private Integer contentProviderId;

        // etc.
    }
  18. Code samples: Fetch content from mongo and map to objects

    // lid and cpId come from the enclosing method; typeOf() is a helper in the superclass.
    QueryBuilder qb = QueryBuilder.start("listing_id").is(lid)
        .and("content_provider_id").in(Arrays.asList(0, cpId));
    List<Content> results = new ArrayList<Content>();
    DBCollection dbc = mongo.getDB("cgmdb").getCollection("content");
    DBCursor cursor = dbc.find(qb.get());
    while (cursor.hasNext()) {
        DBObject dbo = cursor.next();
        if (dbo != null) {
            // Pick the concrete class for this document, then map it with Morphia.
            Class clazz = super.typeOf(dbo);
            results.add((Content) morphia.fromDBObject(clazz, dbo));
        }
    }
  19. Data Monitoring
    • Graph and monitor all pieces of data throughout the system
    • Content Counts by Type
  20. Data Monitoring Design and Issues
    • MongoDB queries are good for data lookups, but map reduce is the suggested approach for generating statistics and aggregations across large data sets.
    • MongoDB supports Map Reduce commands for aggregations
      – Creates a collection to hold the output
      – This collection is subject to replication, which caused production degradation last week when our content and listing sizes increased.
      – The output collection was replicated between replicas
      – Solution: Local DB, which is not subject to replication
      – Currently broken for map reduce on secondary: https://jira.mongodb.org/browse/SERVER-4264
  21. Code samples: Map Reduce Script
    • Count Listings by Content Provider Id

    Map:
    function() {
        if (this.content_provider_id != null) {
            emit(this.content_provider_id, 1);
        }
    }

    Reduce:
    function(k, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
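To make the data flow of this job concrete, here is a plain-Java emulation of what the map and reduce steps compute: one count per `content_provider_id`, skipping documents that lack the field. It is a sketch for reasoning about the expected output, not how MongoDB runs the job; `ProviderCount` is a hypothetical name, and documents are modelled as simple string maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical emulation of the count-by-provider map reduce job. */
class ProviderCount {
    /** Reduce step: sums the values emitted for one key. */
    static int reduce(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Runs the map step over all documents, groups by key, then reduces each group. */
    static Map<String, Integer> countByProvider(List<Map<String, String>> docs) {
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String providerId = doc.get("content_provider_id");
            if (providerId != null) {
                // Map step: emit (providerId, 1).
                emitted.computeIfAbsent(providerId, k -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : emitted.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }
}
```

Summation gives the same total however the emitted values are grouped, which matters because MongoDB may call the reduce function more than once for the same key.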
  22. Future Steps with Mongo
    • Generate deltas rather than full data loads
    • Shard Listing and Content data
    • Integrate Search, Ads and other end points with MongoDB
      – Prefix and fuzzy match perform better with fewer stored fields in SOLR
      – Reduce SOLR memory footprint by putting stored fields in Mongo
    • Spring Data integration