
MongoDB @ eBay

mongodb
May 07, 2012


I will cover several eBay use cases where MongoDB is used, and will touch on the most interesting technical problems we faced and the decisions we made.


Transcript

  1. MongoDB @ eBay: Overview of strategy and use cases
     Yuri Finkelstein, eBay Platform Architect, [email protected], May 2012
  2. DB Scalability @ eBay
     § eBay is one of the first and largest BASE environments, built on Oracle DB:
       • Basic Availability
       • Soft-state
       • Eventual consistency
     § Every database we use is shared and partitioned:
       • N logical host names are defined for each use case ahead of time
       • These logical hosts are mapped to physical hosts via static mapping tables controlled by DBAs
       • A common ORM framework called DAL provides powerful and consistent patterns for data scalability
     § If the client provides a hint along with every DB query:
       • DAL maps the hint to a logical host using one of N mapping schemes (e.g., modulus, lookup table, range; see the sketch after this slide)
       • The logical host is then mapped to a physical host using the logical-to-physical map
       • The query is sent to just one shard
     § If the client does not have a hint, the query is sent to all shards and the results are joined on the client with the help of the DAL framework
     § Side effects:
       • The hint is not part of the query; the client has to manage it
       • The logical-to-physical mapping scheme becomes an extra piece of client configuration
       • Shard rebalancing is "DBA magic"
     [Diagram: applications (App1, App2) with business logic pass a hint (shard key) to the DAL framework, which applies F1(Hint)/F2(Hint) from config to pick a logical DB host (shard); logical hosts map to physical master and standby DB hosts]
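
A minimal sketch of the hint-to-shard routing described on this slide, assuming the modulus mapping scheme; the class and names (LogicalHostMapper, logicalToPhysical) are hypothetical illustrations, not eBay's actual DAL API:

      import java.util.List;

      // Hypothetical sketch of DAL-style routing: hint -> logical host -> physical host.
      public class LogicalHostMapper {
          private final int numLogicalHosts;            // N logical hosts, fixed per use case
          private final List<String> logicalToPhysical; // DBA-controlled static mapping table

          public LogicalHostMapper(int numLogicalHosts, List<String> logicalToPhysical) {
              this.numLogicalHosts = numLogicalHosts;
              this.logicalToPhysical = logicalToPhysical;
          }

          // One of the N mapping schemes: modulus over the hint (shard key).
          public String route(long hint) {
              int logicalHost = (int) (Math.abs(hint) % numLogicalHosts);
              return logicalToPhysical.get(logicalHost); // logical-to-physical lookup
          }
      }

Without a hint, the client instead fans the query out to all N logical hosts and merges the results, which is exactly the scatter-gather that MongoS later performs automatically.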
  3. Key desired improvements
     § All eBay site-facing applications use the scheme outlined above
     § It's proven to scale to tens of thousands of developers, petabytes of data, and hundreds of millions of SQL queries per day
     § But there is always room for improvement and new ideas:
       • ORM is not the fastest way to develop; how do we achieve faster development cycles and reduce schema-mapping friction?
       • How do we add new attributes to tables faster and without DBA involvement? A schema-free approach sounds interesting.
       • Can we make the hint transparent, e.g., auto-extract it from queries?
       • Can we rebalance the data seamlessly and automatically?
       • Can we add shards faster in order to scale out on demand, transparently to applications?
       • How do we deploy new DBs to the cloud on demand?
     § And what about performance? Can we use RAM more aggressively and seamlessly to speed up queries?
  4. Enter MongoDB
     § We have been playing with MongoDB since 2010. Why?
     § Its scalability scheme is very similar to how we shard RDBMSs:
       • Single master for writes, eventually consistent slaves for reads
       • Horizontal partitioning of data sets is the norm at eBay
       • MongoS performs the familiar scatter-gather and client-side merge-sorts
     § We have not used distributed transactions since day 1; the transactional updates of multiple tables that we do use can be simulated by atomic updates of a single Mongo document (see the sketch below)
     § MongoDB offers a number of features that help address the goals mentioned earlier:
       • Developers love the document model and schema-free persistence
       • Hints are embedded in the queries
       • MongoDB has automatic shard rebalancing
       • Shards can be added on demand without an application restart, and data is auto-rebalanced
       • We can easily bring it up in the cloud, since cloud machines have storage
     [Diagram: business logic uses Morphia/Mongo Driver to send a document to MongoS, which applies F(Shard Key) from a dynamic config to route it to one of the shards, each backed by replicas]
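
A minimal sketch of simulating a multi-table transactional update with one atomic single-document update, using the 2012-era MongoDB Java driver; the collection and field names (items, price, quantity, version) are illustrative assumptions:

      import com.mongodb.BasicDBObject;
      import com.mongodb.DBCollection;
      import com.mongodb.Mongo;

      public class AtomicUpdateExample {
          public static void main(String[] args) throws Exception {
              Mongo mongo = new Mongo("localhost", 27017);
              DBCollection items = mongo.getDB("shop").getCollection("items");

              // What would span multiple tables in the RDBMS becomes one atomic
              // update of a single document: all fields change together or not at all.
              items.update(
                  new BasicDBObject("_id", 12345L),
                  new BasicDBObject("$set", new BasicDBObject("price", 19.99)
                                                .append("quantity", 42))
                      .append("$inc", new BasicDBObject("version", 1)));
          }
      }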
  5. Case study #1: eBay Search Suggestions
     § A search suggestion list is a MongoDB document indexed by word prefix as well as by some metadata: product category, search domain, etc. (see the sketch below)
     § Must have < 60-70 msec round trip end to end
     § MongoDB query < 1.4 msec
     § Data set fits in RAM; hundreds of millions of documents
     § Data is bulk loaded once a day from Hadoop, but can be tweaked on demand during sale promotions, etc.
     § Single replica set, no shards in this case
     § MongoDB benefits:
       • Multiple indexes allow flexible lookups
       • In-memory data placement ensures lookup speed
       • The large data set is durable and replicated
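
A minimal sketch of a prefix-indexed suggestion lookup; the schema (prefix, category, suggestions fields) is an assumption for illustration, since the slide does not show the actual document shape:

      import com.mongodb.*;

      public class SuggestExample {
          public static void main(String[] args) throws Exception {
              Mongo mongo = new Mongo("localhost", 27017);
              DBCollection sugg = mongo.getDB("search").getCollection("suggestions");

              // One of several indexes: word prefix plus metadata for flexible lookups.
              sugg.ensureIndex(new BasicDBObject("prefix", 1).append("category", 1));

              // An exact match on a stored prefix is a single index lookup, which is
              // how the ~1.4 msec query budget is met with the data set in RAM.
              DBObject doc = sugg.findOne(new BasicDBObject("prefix", "ipho")
                                              .append("category", "Cell Phones"));
              if (doc != null) {
                  System.out.println(doc.get("suggestions"));
              }
          }
      }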
  6. Case Study #2: Cloud Manager "State Hub"
     § State Hub powers the eBay cloud
     § Every resource provisioned by the cloud is represented by a single Mongo document
     § Documents contain highly structured metadata reflecting roles and grouping of the resources
     § Lookup by both primary and secondary indexes
     § Several-GB data sets, easily fit in RAM
     § Documents are not uniform
     § All resources have a "State" field which is updated periodically to reflect the health state of the underlying resource
     § Mixed workload: lots of in-place writes, but also lots of read queries (see the sketch below)
     [Diagram: State Hub provisions resources, queries resources and topology, and updates resource state in Mongo]
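
A minimal sketch of both halves of that mixed workload: a periodic in-place state update plus a secondary-index read. The collection and field names (resources, role, state, lastHeartbeat) are hypothetical:

      import java.util.Date;
      import com.mongodb.*;

      public class StateHubExample {
          public static void main(String[] args) throws Exception {
              Mongo mongo = new Mongo("localhost", 27017);
              DBCollection resources = mongo.getDB("cloud").getCollection("resources");

              // In-place write: periodically refresh the resource's State field.
              resources.update(
                  new BasicDBObject("_id", "host-1234"),
                  new BasicDBObject("$set", new BasicDBObject("state", "HEALTHY")
                                                .append("lastHeartbeat", new Date())));

              // Secondary-index read: find all resources of a role that are not healthy.
              resources.ensureIndex(new BasicDBObject("role", 1).append("state", 1));
              DBCursor unhealthy = resources.find(new BasicDBObject("role", "webserver")
                      .append("state", new BasicDBObject("$ne", "HEALTHY")));
              while (unhealthy.hasNext()) {
                  System.out.println(unhealthy.next().get("_id"));
              }
          }
      }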
  7. Case Study #3: eBay Merchandizing Info Cache
     § The merchandizing backend powers eBay product/item classification and categorization
     § Each MongoDB document represents a cluster of similar products
     § Numerous relationships between clusters are modeled as document attributes
     § Relationship hierarchy traversal is achieved by issuing a number of queries on "edge" attributes (see the sketch below)
     § Each instance of such a hierarchy is called a model; there are lots of models
     § Again, the data set fits in RAM; single replica set
     § Replica set members are located in 3 different data centers (3+2+2), with all members in a single data center having higher weight to avoid moving the master away
     § MongoDB benefits:
       • Schema-free design and declarative indexes are perfect for this use case, where new attributes and new queries are constantly being added
       • Async replication across multiple data centers
       • The MongoDB Java Driver automatically detects the proximity of clients to replica set members; reads with slaveOK=true are served from local data center nodes, which ensures low response latency
     [Diagram: Cluster1, Cluster2, Cluster3 connected by relationship edges R1, R2, R3]
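
A minimal sketch of one hop of the edge-attribute traversal; the collection name clusters and the edge attribute relatedClusters are illustrative assumptions:

      import java.util.List;
      import com.mongodb.*;

      public class TraversalExample {
          public static void main(String[] args) throws Exception {
              Mongo mongo = new Mongo("localhost", 27017);
              mongo.slaveOk(); // slaveOK=true: serve reads from local-DC secondaries
              DBCollection clusters = mongo.getDB("merch").getCollection("clusters");

              // Load the starting cluster (assumed to exist), then follow one hop
              // of its edges with a single $in query; repeat per hierarchy level.
              DBObject root = clusters.findOne(new BasicDBObject("_id", 1001L));
              List<?> edges = (List<?>) root.get("relatedClusters");
              DBCursor neighbors = clusters.find(
                      new BasicDBObject("_id", new BasicDBObject("$in", edges)));
              while (neighbors.hasNext()) {
                  System.out.println(neighbors.next().get("_id"));
              }
          }
      }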
  8. Case Study #4: Zoom – Media Metadata Store
     § This is a new mega-project and a work in progress
     § MongoDB is being evaluated as the storage backend for all media-related metadata on the site (for example, picture IDs with lots of attributes)
     § Requirements:
       • Tens-of-TBs data set, millions of documents: the data set must be partitioned; this is our first use case where MongoDB sharding is used
       • System of record for picture info; data cannot be lost!
       • Replication/DR across 2 data centers; local-DC reads are required
       • Queries come from site-facing flows; <10 msec response-time SLA
       • Mixed workload: both inserts and reads are happening concurrently all the time
     § Can MongoDB do it??
  9. Zoom: Data Model
     § 2 main collections: Item and Image
       • An Item references multiple Images
     § Item represents an eBay item:
       • _id in Item is the external ID of the item in the eBay site DB
       • These IDs are already sharded and balanced across N logical DB hosts using ID ranges
       • We use MongoDB pre-split points to initially map our N site DB shards to M MongoDB shards (see the sketch below)
       • This ensures good balance between the shards
     § Image represents a picture attached to an Item:
       • _id in Image is the md5 of the image content
       • This ensures good distribution across any number of shards
       • The md5 is also used to find duplicate images
     § Our choice of document IDs in both collections ensures good balance across Mongo shards
     § We never query both collections in a single service request, to ensure data consistency and to have only one index lookup
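
A minimal sketch of both ID schemes: pre-splitting the Item collection's chunks at known ID-range boundaries via the split admin command, and deriving an Image _id from the md5 of the image bytes. The namespace zoom.item and the boundary values are illustrative assumptions:

      import java.security.MessageDigest;
      import com.mongodb.*;

      public class ZoomIdsExample {
          public static void main(String[] args) throws Exception {
              Mongo mongo = new Mongo("localhost", 27017); // assumed to be a MongoS
              DB admin = mongo.getDB("admin");

              // Pre-split at the site-DB ID range boundaries so the N site shards
              // map onto the M Mongo shards without relying on the auto-balancer.
              long[] boundaries = {1000000L, 2000000L, 3000000L};
              for (long boundary : boundaries) {
                  admin.command(new BasicDBObject("split", "zoom.item")
                          .append("middle", new BasicDBObject("_id", boundary)));
              }

              // Image _id = md5 of the content: uniform shard distribution, and
              // identical images naturally collide, which is how duplicates are found.
              byte[] imageBytes = new byte[0]; // placeholder for real image content
              MessageDigest md5 = MessageDigest.getInstance("MD5");
              StringBuilder hex = new StringBuilder();
              for (byte b : md5.digest(imageBytes)) {
                  hex.append(String.format("%02x", b));
              }
              System.out.println("Image _id: " + hex);
          }
      }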
  10. Zoom: Service Topology and Configuration
      § MongoS is deployed on the app servers
        • Ensures network IO on MongoS won't become a bottleneck
        • This is a very familiar pattern at eBay, as explained at the beginning of this presentation
      § M shards; each replica set has 6 members
        • 3 + 3 across 2 data centers
        • The master can be in only one DC during automatic failover; manual failover may activate the other DC
        • One slave in the secondary DC is invisible for reads and is dedicated to periodic backups/snapshots (more on this later)
      § For reads, the client first sets SlaveOK=true; if the required document is not found, it flips to SlaveOK=false to read from the master (see the sketch below)
      § A home-grown MongoDB configuration and monitoring agent runs on every node
        • Fetches the MongoD configuration from a central configuration store and saves it to a local config file
        • Manages the lifecycle of MongoD
        • Monitors state and metrics
      [Diagram: grid of replica set members across M shards, with masters (M) in DC1 (primary) and backup members (B) in DC2 (secondary)]
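
A minimal sketch of that read-fallback pattern using the slaveOk query option of the 2012-era Java driver; the collection and document key are illustrative:

      import com.mongodb.*;

      public class ReadFallbackExample {
          // First try a local-DC secondary (SlaveOK=true); if the document is not
          // there, e.g. due to replication lag, re-read from the master.
          static DBObject findWithFallback(DBCollection coll, DBObject query) {
              DBCursor c = coll.find(query).addOption(Bytes.QUERYOPTION_SLAVEOK);
              if (c.hasNext()) {
                  return c.next();
              }
              return coll.findOne(query); // default read goes to the master
          }

          public static void main(String[] args) throws Exception {
              Mongo mongo = new Mongo("localhost", 27017);
              DBCollection images = mongo.getDB("zoom").getCollection("image");
              System.out.println(findWithFallback(images,
                      new BasicDBObject("_id", "9e107d9d372bb6826bd81d3542a419d6")));
          }
      }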
  11. Zoom: Data Backup and Restore Strategy
      § Goals:
        • Take periodic backups of the entire data set
        • Be able to recover from backup
        • Do not lose any writes that happened after the last snapshot
        • Brief service unavailability during recovery is better than data loss
      § Dual writes on the client (see the sketch below):
        • Regular write to the main cluster
        • Second write to another Mongo cluster: a single replica set with a capped collection; the data written is similar to a REDO log record
      § The hidden slave in each shard has a volume mounted on a remote storage appliance capable of instant file-system snapshots; this captures both DB files and journal files
      § If DB recovery is activated:
        • All MongoD processes in the primary cluster are shut down
        • The NFS slave is remounted to the snapshot volume
        • MongoD on this machine is started as a master
        • MongoD on the other replica set members are started cold
        • Full sync-up from the master
        • The master is switched back to a regular member
        • Writes that occurred after the backup was taken are replayed from the REDO-log capped collection in the secondary cluster
      [Diagram: the application dual-writes to the main cluster (master M, backup slave B on an instant-snapshot-capable device) and to a capped collection C in a second cluster, which a recovery agent replays]
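
A minimal sketch of the client-side dual write; the host names, the redo database/collection names, and the capped-collection size are all placeholder assumptions:

      import java.util.Date;
      import com.mongodb.*;

      public class DualWriteExample {
          public static void main(String[] args) throws Exception {
              Mongo main = new Mongo("main-cluster-host", 27017);
              Mongo log = new Mongo("log-cluster-host", 27017);
              DBCollection items = main.getDB("zoom").getCollection("item");
              DB logDb = log.getDB("redo");

              // Capped collection acting as the REDO log (size is a placeholder).
              if (!logDb.collectionExists("redolog")) {
                  logDb.createCollection("redolog", new BasicDBObject("capped", true)
                          .append("size", 1024L * 1024 * 1024));
              }
              DBCollection redolog = logDb.getCollection("redolog");

              // Regular write to the main cluster ...
              DBObject doc = new BasicDBObject("_id", 42L).append("title", "example");
              items.insert(doc);

              // ... plus a REDO-style record in the second cluster, so writes made
              // after the last snapshot can be replayed during recovery.
              redolog.insert(new BasicDBObject("ts", new Date())
                      .append("op", "insert")
                      .append("doc", doc));
          }
      }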
  12. Key Learnings
      § MongoDB can be a very powerful tool, but use it wisely
      § Deletes can be slow; the automatic balancer is dangerous, so use it only when you must (for example, be careful when adding new shards)
      § Use explain for every query; disable full scans to discover inefficiencies early
      § The query profiler is great
      § Retry every failed query at least once (see the sketch below); a long tail in response times is possible when the data set > RAM size
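
A minimal sketch of the retry-once advice; the helper name and the single-retry policy are illustrative, not an eBay utility:

      import com.mongodb.*;

      public class RetryExample {
          // Retry a failed query once: a transient failure or a long-tail slow read
          // (data set > RAM) often succeeds on the second attempt.
          static DBObject findOneWithRetry(DBCollection coll, DBObject query) {
              try {
                  return coll.findOne(query);
              } catch (MongoException first) {
                  return coll.findOne(query); // one retry, then let the error surface
              }
          }

          public static void main(String[] args) throws Exception {
              Mongo mongo = new Mongo("localhost", 27017);
              DBCollection items = mongo.getDB("zoom").getCollection("item");
              System.out.println(findOneWithRetry(items, new BasicDBObject("_id", 42L)));
          }
      }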