Dynamic Indexing at Parse

How automatic indexing of Mongo collections works at Parse.

Abhishek Kona

July 08, 2015

Transcript

  1. Parse Data
    • Allows users to create different classes.
    • Super flexible: users can add columns to a class at any time.
    • Users can query on any member of a class.
  2. Mongo: the Good Parts
    • The Mongo object model fits perfectly with the Parse data model.
    • Each class is backed by a Mongo collection.
    • Adding a new column is a no-op.
    • Also, failover and replication goodness.
  3. Mongo and Compound Indexes
    • The downside of a flexible query model is that the entire table might have to be scanned.
    • Typically, if a query looks at O(thousand) rows, we consider it slow.
    • But Mongo supports compound indexes:
      db.mongoMeetupAttendees.ensureIndex({ age: 1, isFacebooker: 1, traveledFrom: 1 })
  4. Parse Data + Compound Indexes
    • In most use cases, you look at the slow queries, figure out a useful index and create it.
    • For Parse, this does not work, as the "user has all the power":
    • They can add a column at any time.
    • They can start querying with a condition on a new field.
  5. Some Numbers
    • Parse has 150,000 apps.
    • Over 700K collections
    • ~20,000 active query types (daily)
  6. Our Solution(s)
    • Version 1: Create all possible compound indexes.
    • Version 2: Measure the entropy of each column in a collection and use that information to create indices.
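The per-column entropy that Version 2 mentions can be illustrated with a short sketch. This is not Parse's code: it assumes Shannon entropy over the values observed in a column, and the name `column_entropy` and the sample rows are hypothetical. A column whose values are spread over many distinct entries scores high and would likely make a more selective index key; a column that always holds the same value scores zero.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (in bits) of the values observed in one column."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical sample rows from one collection.
rows = [
    {"age": 25, "isFacebooker": True},
    {"age": 31, "isFacebooker": True},
    {"age": 42, "isFacebooker": True},
    {"age": 58, "isFacebooker": True},
]

# Entropy per column: "age" has four distinct values, "isFacebooker" only one.
entropy_by_column = {
    column: column_entropy([row[column] for row in rows])
    for column in ("age", "isFacebooker")
}
```

In this sample, `age` carries 2 bits of entropy while `isFacebooker` carries none, so `age` would be the more promising leading field.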
  7. Our Current Solution
    Automate the index creation. Simple:
    • Log the queries and strip out user data.
    • Aggregate them based on query type.
    • Analyze the aggregated data and find out which compound indexes are useful.
    • Create them.
  8. The Query Type
    • For each query our users run, we strip out the data and store a key that represents the query. That key is known as the query type.
      { "ORDER": ["_created_at"], "catalog": "", "frequency": "", "type": "" }
      is the key for
      { "_created_at": "some-time", "catalog": "Gucci", "frequency": "daily", "type": "print_article" }
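A minimal sketch of deriving such a key, under stated assumptions: the query is taken to arrive as a dict of field conditions plus an optional sort order, which is a guess since the slides do not show the real input format. The function name `query_type` is hypothetical. Values are blanked and keys are sorted so that two queries with the same shape produce the same key.

```python
import json


def query_type(conditions, order=None):
    """Return a key describing the shape of a query, with user data removed."""
    # Keep the field names, drop the user-supplied values.
    shape = {field: "" for field in conditions}
    if order:
        shape["ORDER"] = list(order)
    # Sorting keys makes the key independent of the order fields were given in.
    return json.dumps(shape, sort_keys=True)


first = query_type(
    {"catalog": "Gucci", "frequency": "daily", "type": "print_article"},
    order=["_created_at"],
)
second = query_type(
    {"type": "blog_post", "frequency": "weekly", "catalog": "Prada"},
    order=["_created_at"],
)
```

Both calls produce the same key, because only the set of fields and the sort order differ between query types, not the values being matched.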
  9. Contd.
    • Record each query as its query type.
    • Store counts for each query type; we use Redis for this.
    • Create compound indexes by reading the query-type key.
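The counting and key-reading steps might look roughly like the sketch below. A `collections.Counter` stands in for the Redis counters, and the rule for ordering index fields (matched fields first in alphabetical order, then sort fields, all ascending) is an assumption made for illustration, not something the slides specify. The names `record_query`, `index_spec` and the collection name `Articles` are hypothetical.

```python
import json
from collections import Counter

# In-memory stand-in for the Redis counters mentioned above.
counts = Counter()


def record_query(collection, query_type_key):
    """Count one occurrence of a query type against a collection."""
    counts[(collection, query_type_key)] += 1


def index_spec(query_type_key):
    """Turn a query-type key into an ordered list of (field, direction) pairs."""
    shape = json.loads(query_type_key)
    order_fields = shape.pop("ORDER", [])
    # Assumed rule: matched fields first, then sort fields, all ascending.
    spec = [(field, 1) for field in sorted(shape)]
    spec += [(field, 1) for field in order_fields if field not in shape]
    return spec


key = '{"ORDER": ["_created_at"], "catalog": "", "frequency": "", "type": ""}'
for _ in range(3):
    record_query("Articles", key)

# Pick the most frequent query type and derive a candidate index from it.
(collection, top_key), hits = counts.most_common(1)[0]
spec = index_spec(top_key)
```

The resulting list of `(field, direction)` pairs is the shape of argument that index-creation calls in Mongo drivers such as PyMongo generally accept, which is why it is used here.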
  10. Parameters
    • We cannot create indices for all our queries; we might end up with >10K indices.
    • Indices consume memory and disk, and increase write times.
    • We create indices for users if more than n% of their queries would be helped by an index.
    • Another factor: the number of rows in a collection (small collections don't need an index).
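The two parameters above can be combined into a single decision, sketched here. The slides give no concrete numbers, so the default values `min_share=0.05` (the "n%") and `min_rows=1000` are placeholders, and the function name `should_create_index` is hypothetical.

```python
def should_create_index(type_count, total_count, row_count,
                        min_share=0.05, min_rows=1000):
    """Decide whether a query type justifies an index.

    type_count  -- how often this query type was seen
    total_count -- how many queries were seen in total for the same app
    row_count   -- number of rows in the collection being queried
    """
    # Small collections are cheap to scan, so skip them.
    if row_count < min_rows or total_count == 0:
        return False
    # Only index when the query type makes up a large enough share of traffic.
    return type_count / total_count > min_share


busy_and_large = should_create_index(300, 1000, row_count=50_000)
busy_but_small = should_create_index(300, 1000, row_count=200)
rare_and_large = should_create_index(10, 1000, row_count=50_000)
```

Only the first case returns `True`: the query type is both frequent enough and aimed at a collection large enough for an index to matter.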
  11. Some Index Creation Gotchas
    • The system creates one index at a time per Mongo node (physical node). We are extra careful (theoretically, we should be able to create one index at a given time per database). This is to keep the load on the system low.
    • Always create indices in the background: {background: true} in all ensureIndex calls.
    • Index creation on the secondaries happens in a blocking way.
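The "one index build at a time per node" rule can be pictured as a per-node queue. The sketch below is an illustration only, not the Parse system: the class `IndexBuildScheduler`, its methods, and the node and job names are all hypothetical, and actually starting a build (with the background option) is left out.

```python
from collections import defaultdict, deque


class IndexBuildScheduler:
    """Allow at most one running index build per physical node."""

    def __init__(self):
        self.pending = defaultdict(deque)  # node -> queued jobs
        self.running = {}                  # node -> job in progress

    def submit(self, node, job):
        self.pending[node].append(job)

    def start_next(self):
        """Start one queued job on every node that is currently idle."""
        started = []
        for node, queue in self.pending.items():
            if node not in self.running and queue:
                job = queue.popleft()
                self.running[node] = job
                started.append((node, job))
        return started

    def finish(self, node):
        self.running.pop(node, None)


scheduler = IndexBuildScheduler()
scheduler.submit("node-a", "Articles: catalog, type")
scheduler.submit("node-a", "Users: age")
scheduler.submit("node-b", "Events: name")

first_round = scheduler.start_next()   # one job per node
second_round = scheduler.start_next()  # nothing: both nodes are busy
scheduler.finish("node-a")
third_round = scheduler.start_next()   # node-a picks up its next job
```

Keying the queue on the physical node rather than the database is the conservative choice the slide describes: several databases on one machine still share its disk and memory.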
  12. Code
    A play version of the algorithm in Python is available at http://tiny.cc/parse_query_key
  13. Thanks
    I am Abhishek Kona, @sheki.
    Parse has some pretty kickass people who deal with Mongo. Thanks @charity, Shyam (bjornick).