Full Table Scan of NoSQL Indexing

This is the deck I used for a presentation on different aspects of indexing in the NoSQL spectrum of offerings.

Will LaForest

January 20, 2012

Transcript

  1. A Full Table Scan of Indexing in NoSQL
     Will LaForest, Director of 10gen Federal
     will@10gen   @WLaForest
  2. What I’m Not Doing
     • Trying to discuss all factors used to pick a database. Phew!
     • Related to indexing
     • Not discussing data partitioning/sharding
     • Data locality
     • Caching and memory mapping techniques
     • Not brainwashing you?
  3. What Indexes Can Help Us Do
     • Find the “location” of data
        • Based upon a value
        • Based upon a range
        • Geospatial
     • Fast checks for existence
     • Uniqueness enforcement
     • Sorting
     • Aggregation
        • Usually covering indexes
  4. Factors to think about
     • Insert/update speed
     • Search speed
     • Delete speed
     • Space
     • Range query speed
     • Ordering
     • Covering index
  5. Inverted Index
     • Inverted index maps a value to locations or IDs
     • Often associated with search
     • Implied hash table index
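To make the "value to locations or IDs" mapping concrete, here is a minimal sketch in Python. It is not taken from the deck: the function name `build_inverted_index` and the sample documents are made up for illustration, and a plain dict plays the role of the implied hash table.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    # Freeze the posting sets into sorted lists for stable lookups.
    return {word: sorted(ids) for word, ids in postings.items()}


# Hypothetical sample data.
docs = {
    1: "full table scan",
    2: "index scan",
    3: "hash index",
}
index = build_inverted_index(docs)
print(index["scan"])   # [1, 2]
print(index["index"])  # [2, 3]
```

A lookup is a single dict access, which is why this structure is so often associated with search.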
  6. B-Tree Index

     Operation   Average    Worst case
     Insert      O(log n)   O(log n)
     Search      O(log n)   O(log n)
     Delete      O(log n)   O(log n)
     Space       O(n)       O(n)

     • Most common index in relational databases
     • Allows equality and range
     • Misconceptions about scalability
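The reason an ordered index serves both equality and range lookups can be sketched with a sorted list and binary search. This is a flat stand-in for the key order a B-tree maintains, not a B-tree: insertion into a Python list is O(n), so only the search side reflects the costs in the table above. The class name `SortedIndex` and the record ids are hypothetical.

```python
import bisect


class SortedIndex:
    """Sorted (key, record_id) pairs; a flat stand-in for B-tree key order."""

    def __init__(self):
        self._entries = []

    def insert(self, key, record_id):
        # Keeps the list sorted; O(n) here, unlike a real B-tree.
        bisect.insort(self._entries, (key, record_id))

    def range(self, low, high):
        """Record ids with low <= key <= high, in key order."""
        # (low,) sorts before every (low, record_id), so this finds the
        # first entry whose key is >= low in O(log n).
        start = bisect.bisect_left(self._entries, (low,))
        out = []
        for key, record_id in self._entries[start:]:
            if key > high:
                break
            out.append(record_id)
        return out

    def find(self, key):
        """Equality is just a range whose bounds coincide."""
        return self.range(key, key)


idx = SortedIndex()
for key, rid in [(30, "r3"), (10, "r1"), (20, "r2"), (20, "r4")]:
    idx.insert(key, rid)

print(idx.find(20))       # ['r2', 'r4']
print(idx.range(10, 20))  # ['r1', 'r2', 'r4']
```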
  7. Hash Index
     • Very fast for searching
     • Really only good for existence (=, !=)
     • Can’t be used for range queries or order by
     • Allows graceful caching (memory mapping, bloom filters)

     Operation   Average   Worst case
     Insert      O(1)      O(1)
     Search      O(1)      O(n)
     Delete      O(1)      O(n)
     Space       O(n)      O(n)
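The bloom filter mentioned above can be sketched in a few lines. The idea is a cheap in-memory check that answers "definitely absent" or "possibly present" before the real hash index is consulted. Everything here is illustrative: the class name, the default sizes, and the use of SHA-256 to derive bit positions are assumptions, not how any particular database does it.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means the key was never added; True means "maybe".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


bloom = BloomFilter()
bloom.add("alice")
print(bloom.might_contain("alice"))  # True
```

Note that the filter, like the hash index it fronts, only answers existence questions; it has no notion of order or range.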
  8. Other Index Types
     • Covering indices
        • Stores not just a pointer but some other info like
           • Frequencies
           • Associated data
     • Geospatial indices
        • Lots of different implementations
           • Geohashing
           • R-tree
           • Quad-tree
     • Bitmap indices
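As one example of the geospatial approaches listed, geohashing interleaves longitude and latitude bits and encodes them as a base-32 string, so a two-dimensional position becomes a one-dimensional key that an ordinary index can store. The sketch below follows the commonly published geohash scheme; the function name and default precision are choices made here, not something from the deck.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, precision=4)
fine = geohash_encode(57.64911, 10.40744, precision=8)
print(coarse, fine)
```

A shorter hash of a point is always a prefix of a longer hash of the same point, so a prefix scan over the key retrieves everything in the enclosing cell.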
  9. Other Index Concepts
     • Compound indices
        • One index on an ordered set of fields
        • Speeds up queries and sorts on a set of fields
     • Sparse indices
        • Don’t have entries for all records, just matching ones
     • Clustered indices
        • Data organized on disk in the same order as the index
     • Uniqueness
     • Roll your own index!
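Compound keys, sparseness, and uniqueness can be shown together in one small sketch. The class below is hypothetical (its name, methods, and sample records are invented here) and uses a sorted list rather than a tree, so it illustrates the behavior rather than the performance.

```python
import bisect


class CompoundIndex:
    """Index over an ordered tuple of fields; sparse and optionally unique."""

    def __init__(self, fields, unique=False):
        self.fields = tuple(fields)
        self.unique = unique
        self._entries = []  # sorted list of (key_tuple, record_id)

    def add(self, record_id, record):
        # Sparse: records lacking an indexed field get no entry at all.
        if any(f not in record for f in self.fields):
            return False
        key = tuple(record[f] for f in self.fields)
        pos = bisect.bisect_left(self._entries, (key,))
        if self.unique and pos < len(self._entries) and self._entries[pos][0] == key:
            raise ValueError(f"duplicate key {key!r}")
        bisect.insort(self._entries, (key, record_id))
        return True

    def prefix_scan(self, *prefix):
        """Record ids whose key starts with `prefix`, in index order."""
        start = bisect.bisect_left(self._entries, (prefix,))
        out = []
        for key, record_id in self._entries[start:]:
            if key[:len(prefix)] != prefix:
                break
            out.append(record_id)
        return out


people = CompoundIndex(["last", "first"])
people.add(1, {"last": "smith", "first": "bob"})
people.add(2, {"last": "smith", "first": "alice"})
people.add(3, {"last": "jones", "first": "carol"})
skipped = people.add(4, {"first": "dave"})  # no "last" field: not indexed

print(people.prefix_scan("smith"))  # [2, 1]
print(people.prefix_scan())         # [3, 2, 1]
```

Because entries are ordered by the whole key tuple, the index answers queries on a leading prefix of the fields (here `last` alone) and returns results already sorted by the remaining fields.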
  10. • Document oriented
      • B-tree
      • Creation done by specifying field (fields)
      • Many rules that apply in MySQL are similar here
      • Indexes to any depth of the document
      • Indexes on string, int, double, boolean, date, bytearray, object, array, others
      • Supports secondary indices, compound keys, sparse indices, unique constraints, geospatial indices (geohash)
      • Index and data updates atomic
      • Background index creation supported
  11. • Key value
      • Hash indexes (optionally sorted) (Memtable + SSTable)
      • Indexes on strings
      • Row key can be considered to be “primary” key
      • Row key must be unique
      • Supports secondary indices (hash), sparse indices
      • Ranges and orders supported by sorting hash index
      • Index and data updates atomic
      • Bloom filter for index caching
  12. • Columnar (multi-dimensional sorted map)
      • Sorted hash indexes
      • Indexes on strings*
      • Row key can be considered to be “primary” key
      • Row key must be unique
      • Supports sparse indices
      • Alphanumeric ranges supported on row key (sorted)*
      • Index and data updates atomic (WAL)
  13. • Document oriented
      • B+tree (append only for MVCC)
      • Indexes are actually views created by a map function
      • Views created lazily
      • Data stored at the leaf level (hence B+tree)
      • Index types: string, number, boolean, array, object
      • Supports secondary indices (multiple views), covering indices, sparse indices, geospatial indices (external module GeoCouch)
      • Index and data updates atomic (MVCC)
      • Background index creation supported
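The idea of an index that is really a view produced by a map function, and built lazily, can be sketched as follows. This is a conceptual illustration only: `LazyView`, `by_tag`, and the sample documents are invented here, and the rows live in a sorted Python list rather than an append-only B+tree.

```python
class LazyView:
    """An index defined by a map function and built on the first query."""

    def __init__(self, docs, map_fn):
        self.docs = docs
        self.map_fn = map_fn
        self._rows = None  # nothing is built until someone queries

    def _build(self):
        rows = []
        for doc_id, doc in self.docs.items():
            # The map function emits zero or more (key, value) pairs per
            # document; documents that emit nothing are simply not indexed.
            for key, value in self.map_fn(doc):
                rows.append((key, doc_id, value))
        rows.sort(key=lambda row: (row[0], row[1]))
        self._rows = rows

    def query(self, key):
        if self._rows is None:
            self._build()
        return [(doc_id, value) for k, doc_id, value in self._rows if k == key]


def by_tag(doc):
    """Hypothetical map function: emit one row per tag, carrying the title."""
    for tag in doc.get("tags", []):
        yield tag, doc["title"]


docs = {
    "d1": {"title": "B-trees", "tags": ["index", "tree"]},
    "d2": {"title": "Hashing", "tags": ["index"]},
    "d3": {"title": "Untagged"},
}
view = LazyView(docs, by_tag)
built_before = view._rows is not None
print(view.query("index"))  # [('d1', 'B-trees'), ('d2', 'Hashing')]
```

Since the emitted value travels with the key, a query can be answered from the view alone, which is what makes such a view behave like a covering index; the document without tags shows the sparse behavior.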
  14. • Key value
      • Hash index
      • Keys are binary safe strings
      • Intrinsically sparse
      • Has a data type called a sorted set
      • Hash table + skip lists for O(log n) insertion
      • Index and data updates atomic
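The sorted set pairs two structures: a hash table for O(1) member lookup and an ordered structure for range and rank queries. The sketch below keeps that pairing but swaps the skip list for a sorted Python list with binary search, so inserts here are O(n) rather than O(log n). The class and method names are hypothetical.

```python
import bisect


class SortedSet:
    """Members with scores: dict for lookup, sorted list for ordering."""

    def __init__(self):
        self._scores = {}   # member -> score (the hash table half)
        self._ordered = []  # sorted (score, member) pairs (skip-list stand-in)

    def add(self, member, score):
        old = self._scores.get(member)
        if old is not None:
            # Re-scoring a member: drop its old position first.
            self._ordered.remove((old, member))
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def score(self, member):
        return self._scores.get(member)

    def range_by_score(self, low, high):
        """Members with low <= score <= high, in score order."""
        start = bisect.bisect_left(self._ordered, (low,))
        out = []
        for score, member in self._ordered[start:]:
            if score > high:
                break
            out.append(member)
        return out


board = SortedSet()
board.add("ann", 50)
board.add("bob", 70)
board.add("cat", 60)
board.add("ann", 80)  # update moves ann to the end of the ordering

print(board.score("cat"))            # 60
print(board.range_by_score(60, 80))  # ['cat', 'bob', 'ann']
```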
  15. • XML document oriented
      • Hash index on terms (everything is a term)
      • All elements, element values, words indexed by default
      • Supports secondary indices, range indices (for range and ordering), full text, geospatial indices, type compliance
      • Index and data updates atomic
      • Background index creation supported