Dynamic Indexing at Parse

How automatic indexing of Mongo collections works at Parse.

Abhishek Kona

July 08, 2015

Transcript

  1. MongoDB and automatic indexes at Parse

  2. Parse Data
    • Users can create different classes.
    • Super flexible: users can add columns to a class at any time.
    • Users can query on any member of a class.
  3. Mongo the Good Parts
    • The Mongo object model fits perfectly with the Parse data model.
    • Each class is backed by a Mongo collection.
    • Adding a new column is a no-op.
    • Also, failover and replication goodness.
  4. Mongo and Compound Indexes
    • The downside of a flexible query model is that the entire table might have to be scanned.
    • Typically, if a query looks at O(thousand) rows, we consider it slow.
    • But Mongo supports compound indexes:
      db.mongoMeetupAttendees.ensureIndex({ age: 1, isFacebooker: 1, traveledFrom: 1 })
  5. Parse Data + Compound Indexes
    • In most use cases, you look at the slow queries, figure out a useful index, and create it.
    • For Parse, this does not work, as the "user has all the power".
    • A user can add a column at any time.
    • A user can start querying with a condition on a new field.
  6. Some Numbers
    • Parse has 150,000 apps.
    • Over 700K collections.
    • ~20,000 active query types (daily).
  7. Our Solution(s)
    • Version 1: Create all possible compound indexes.
    • Version 2: Measure the entropy of each column in a collection, and use that information to create indices.
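
The slides do not say how the entropy in Version 2 was measured or used. As a rough illustration only, the sketch below assumes Shannon entropy over the values observed in a column; `column_entropy` and the sample rows are hypothetical. The intuition is that a column whose values vary a lot narrows a query down more than a column that is nearly constant.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (in bits) of the values observed in one column."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each distinct value contributes -p * log2(p), where p is its share.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample rows from one collection.
rows = [
    {"isFacebooker": True, "traveledFrom": "SF"},
    {"isFacebooker": True, "traveledFrom": "NYC"},
    {"isFacebooker": True, "traveledFrom": "Austin"},
    {"isFacebooker": True, "traveledFrom": "Seattle"},
]

entropy_by_column = {
    col: column_entropy(row.get(col) for row in rows)
    for col in ("isFacebooker", "traveledFrom")
}
```

In this sample `isFacebooker` is constant and so carries no information, while `traveledFrom` takes a different value in every row, which would make it the more selective column of the two.
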
  8. Our Current Solution
    Automate the index creation. Simple:
    • Log the queries and strip out user data.
    • Aggregate them by query type.
    • Analyze the aggregated data and find out which compound indexes are useful.
    • Create them.
  9. The Query Type
    • For each query our users run, we strip out the data and store a key that represents the query. That key is known as the query type.
    • For example,
      {"ORDER": ["_created_at"], "catalog": "", "frequency": "", "type": ""}
      is the key for
      {"_created_at": "some-time", "catalog": "Gucci", "frequency": "daily", "type": "print_article"}
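
A minimal sketch of deriving a query type, assuming a query is given as a dict of conditions plus an optional list of sort fields. The function name `query_type`, its signature, and the use of sorted JSON as the canonical key are assumptions made for illustration; the slides only show what the resulting key looks like.

```python
import json


def query_type(where, order=()):
    """Return a canonical key for a query with all user data stripped.

    Assumption: only top-level field names matter; every value is replaced
    by an empty string, and sort fields are kept under "ORDER".
    """
    key = {field: "" for field in where}
    if order:
        key["ORDER"] = list(order)
    # sort_keys makes the key independent of the order the fields arrived in.
    return json.dumps(key, sort_keys=True)


# Two queries with different user data but the same shape.
q1 = query_type(
    {"catalog": "Gucci", "frequency": "daily", "type": "print_article"},
    order=["_created_at"],
)
q2 = query_type(
    {"type": "video", "frequency": "weekly", "catalog": "Prada"},
    order=["_created_at"],
)
```

Both calls produce the same string, so queries that differ only in their values are counted under one query type.
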
  10. Contd.
    • Record each query as its query type.
    • Store counts for each query type. We use Redis for this.
    • Create compound indexes by reading the query-type key.
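
The counting and key-reading steps might look roughly like the sketch below. It uses an in-memory `Counter` as a stand-in for the Redis counters, and the field ordering in `index_fields` (condition fields first, sort fields last) is an assumption; the slides do not say how the fields of the compound index are ordered. All names here are hypothetical.

```python
import json
from collections import Counter

# In-memory stand-in for the Redis counters mentioned on the slide.
query_type_counts = Counter()


def record_query(collection, query_type_key):
    """Count one occurrence of a query type for a collection."""
    query_type_counts[(collection, query_type_key)] += 1


def index_fields(query_type_key):
    """Read a query-type key and return compound-index fields as
    (field, direction) pairs.

    Assumption: condition fields come first (sorted by name), followed by
    any ORDER fields that are not already present.
    """
    parsed = json.loads(query_type_key)
    order = parsed.pop("ORDER", [])
    fields = sorted(parsed)
    fields += [name for name in order if name not in fields]
    return [(name, 1) for name in fields]


key = '{"ORDER": ["_created_at"], "catalog": "", "frequency": "", "type": ""}'
for _ in range(3):
    record_query("Articles", key)

spec = index_fields(key)
```

The list of pairs in `spec` has the same shape as the key specification in the `ensureIndex` example earlier in the deck, with one entry per field.
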
  11. Parameters
    • We cannot create indices for all our queries; we might end up with >10K indices.
    • Indices consume memory and disk, and increase write times.
    • We create indices for users if more than n% of their queries would be helped by an index.
    • Another factor: the number of rows in a collection (small collections don't need an index).
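
The two parameters above can be combined into a simple decision rule. In the sketch below, the threshold values `min_share` and `min_rows` are placeholders, since the slides give neither the value of n nor a row-count cutoff, and `should_create_index` is a hypothetical name.

```python
def should_create_index(type_count, total_count, row_count,
                        min_share=0.05, min_rows=1000):
    """Decide whether a query type justifies a new compound index.

    type_count  -- how often this query type was seen
    total_count -- all queries seen for the same app
    row_count   -- number of rows in the collection
    min_share   -- placeholder for the "n%" threshold (assumption)
    min_rows    -- placeholder for the small-collection cutoff (assumption)
    """
    if row_count < min_rows:
        return False  # small collections don't need an index
    if total_count == 0:
        return False  # nothing observed yet
    return type_count / total_count > min_share


decisions = [
    should_create_index(50, 100, 50_000),  # frequent query, large collection
    should_create_index(50, 100, 200),     # frequent query, small collection
    should_create_index(1, 100, 50_000),   # rare query, large collection
]
```

Only the first case passes both checks, which keeps the number of indices, and with it the memory, disk, and write cost, bounded.
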
  12. Some Index Creation Gotchas
    • The system creates one index at a time per Mongo node (physical node). We are being extra careful here (theoretically we should be able to create one index at a time per database) to keep the load on the system low.
    • Always create indices in the background: pass {background : true} in all ensureIndex calls.
    • Index creation on the secondaries happens in a blocking way.
  13. Code
    A play version of the algorithm in Python is available at http://tiny.cc/parse_query_key
  14. Thanks
    I am Abhishek Kona, @sheki.
    Parse has some pretty kickass people who deal with Mongo. Thanks, @charity and Shyam (bjornick).