Full Table Scan of NoSQL Indexing

A Full Table Scan of Indexing in NoSQL
Will LaForest, Director of 10gen Federal
will@10gen @WLaForest

Introduction

What I’m Not Doing
• Trying to discuss all factors used to pick a database. Phew!
    • Only those related to indexing
• Not discussing data partitioning/sharding
    • Data locality
    • Caching and memory-mapping techniques
• Not brainwashing you?

What Indexes Can Help Us Do
• Find the “location” of data
    • Based upon a value
    • Based upon a range
    • Geospatial
• Fast checks for existence
• Uniqueness enforcement
• Sorting
• Aggregation
    • Usually covering indexes

Requisite Book Analogy

Factors to think about
• Insert/update speed
• Search speed
• Delete speed
• Space
• Range query speed
• Ordering
• Covering index

Inverted Index
• An inverted index maps a value to locations or IDs
• Often associated with search
• Implies a hash table index (value → locations)
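A minimal sketch of an inverted index in Python, assuming plain whitespace tokenization and lower-casing; the function name and sample documents are hypothetical. The `defaultdict` plays the role of the implied hash table: each term hashes straight to the set of IDs that contain it.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the set of document IDs containing it."""
    index = defaultdict(set)  # the hash table: term -> {doc_id, ...}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {1: "NoSQL indexing basics", 2: "B-tree indexing", 3: "Hash tables"}
index = build_inverted_index(docs)
print(sorted(index["indexing"]))  # [1, 2]
```

Test membership with `in` (or use `.get()`) when looking up terms that may be absent, so the `defaultdict` does not insert empty sets.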

B-Tree Index

Operation   Average     Worst Case
Insert      O(log n)    O(log n)
Search      O(log n)    O(log n)
Delete      O(log n)    O(log n)
Space       O(n)        O(n)

• Most common index in relational databases
• Allows equality and range queries
• Misconceptions about scalability

O(log n)
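Python's standard library has no B-tree, so this sketch swaps in a sorted list searched with `bisect`. It shares the property that matters here, keys kept in order, so equality and range lookups both take O(log n) comparisons to find their starting point. Unlike a real B-tree, inserting into a Python list shifts O(n) elements. The class name is hypothetical.

```python
import bisect


class SortedIndex:
    """Keeps (key, row_id) pairs in key order, like a B-tree's leaf level."""

    def __init__(self):
        self._entries = []  # sorted list of (key, row_id) tuples

    def insert(self, key, row_id):
        # O(log n) to find the slot; the list insertion itself shifts O(n) items.
        bisect.insort(self._entries, (key, row_id))

    def range(self, low, high):
        """Row IDs with low <= key <= high, returned in key order."""
        # (low,) sorts before every (low, row_id), so this lands on the first key >= low.
        i = bisect.bisect_left(self._entries, (low,))
        result = []
        while i < len(self._entries) and self._entries[i][0] <= high:
            result.append(self._entries[i][1])
            i += 1
        return result

    def find(self, key):
        """Equality lookup is just a range with identical bounds."""
        return self.range(key, key)


idx = SortedIndex()
for row_id, age in enumerate([34, 21, 45, 21, 30]):
    idx.insert(age, row_id)
print(idx.find(21))       # [1, 3]
print(idx.range(25, 40))  # [4, 0]
```

Because results come back in key order, the same structure also serves ORDER BY without a separate sort.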

Hash Index
• Very fast for searching
• Really only good for existence and equality checks (=, !=)
• Can’t be used for range queries or ORDER BY
• Allows graceful caching (memory mapping, Bloom filters)

Operation   Average   Worst Case
Insert      O(1)      O(1)
Search      O(1)      O(n)
Delete      O(1)      O(n)
Space       O(n)      O(n)
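A sketch of the same lookups on a hash index, using a Python dict (a hash table) as the index; the class name is hypothetical. Equality and existence checks are O(1) on average, but hashing throws away key order, so a range query has to visit every key.

```python
from collections import defaultdict


class HashIndex:
    """Equality index on a Python dict: key -> list of row IDs."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def insert(self, key, row_id):
        self._buckets[key].append(row_id)  # O(1) average

    def find(self, key):
        # .get() avoids creating an empty bucket for a missing key.
        return list(self._buckets.get(key, []))

    def exists(self, key):
        return key in self._buckets  # O(1) average

    def range_scan(self, low, high):
        # No key order is kept, so this visits every key: O(n).
        return sorted(
            row_id
            for key, rows in self._buckets.items()
            if low <= key <= high
            for row_id in rows
        )


idx = HashIndex()
for row_id, age in enumerate([34, 21, 45, 21, 30]):
    idx.insert(age, row_id)
print(idx.find(21))            # [1, 3]
print(idx.exists(99))          # False
print(idx.range_scan(25, 40))  # [0, 4]
```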

Other Index types
• Covering indices
    • Store not just a pointer but some other information, such as
        • Frequencies
        • Associated data
• Geospatial indices
    • Lots of different implementations
        • Geohashing
        • R-tree
        • Quad-tree
• Bitmap indices
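A sketch of geohash encoding, one of the geospatial implementations listed. It alternately bisects the longitude and latitude ranges (longitude first), records each choice as a bit, and packs every 5 bits into a base-32 character. Nearby points usually share a prefix, so an ordinary ordered string index can answer coarse proximity queries; points on opposite sides of a cell boundary are the exception. The function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=11):
    """Encode a latitude/longitude pair as a geohash string of `precision` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:   # 5 bits -> one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
```

Each extra character narrows the cell, so a prefix search on a shorter hash returns everything in the enclosing cell.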

Other Index Concepts
• Compound indices
    • One index on an ordered set of fields
    • Speeds up queries and sorts on a set of fields
• Sparse indices
    • Don’t have entries for all records, just the matching ones
• Clustered indices
    • Data organized on disk in the same order as the index
• Uniqueness
• Roll your own index!
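A sketch of a compound index that is also sparse, assuming documents are Python dicts; the function names and sample data are hypothetical. Entries are tuples of the indexed field values in field order, so any leading prefix of those fields can be searched, and results for a prefix come back already sorted by the remaining fields. Documents missing any indexed field get no entry.

```python
import bisect


def build_compound_index(docs, fields):
    """Sorted (value_1, ..., value_n, doc_id) tuples; docs missing a field are skipped."""
    entries = []
    for doc_id, doc in docs.items():
        if all(f in doc for f in fields):  # sparse: only docs that have every field
            entries.append(tuple(doc[f] for f in fields) + (doc_id,))
    entries.sort()
    return entries


def prefix_lookup(entries, prefix):
    """Doc IDs whose leading indexed fields equal `prefix`, in index order."""
    n = len(prefix)
    i = bisect.bisect_left(entries, prefix)  # a shorter tuple sorts before its extensions
    out = []
    while i < len(entries) and entries[i][:n] == prefix:
        out.append(entries[i][-1])
        i += 1
    return out


docs = {
    1: {"last": "Smith", "first": "Ann"},
    2: {"last": "Jones", "first": "Bob"},
    3: {"last": "Smith", "first": "Al"},
    4: {"first": "Cy"},  # no "last" field, so no index entry
}
idx = build_compound_index(docs, ["last", "first"])
print(prefix_lookup(idx, ("Smith",)))  # [3, 1], already sorted by first name
```

The field order matters: this index helps queries on `last` or on `last` and `first`, but not on `first` alone.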

How did I choose?

• Document oriented
• B-tree
• Creation done by specifying a field (or fields)
• Many indexing rules that apply in MySQL apply here too
• Can index fields at any depth of the document
• Indexes on string, int, double, boolean, date, byte array, object, array, and others
• Supports secondary indices, compound keys, sparse indices, unique constraints, geospatial indices (geohash)
• Index and data updates atomic
• Background index creation supported
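A sketch of what "any depth of the document" means in practice: the indexed field is named by a dotted path (MongoDB uses dot notation such as "address.city"), and the index key is whatever value sits at the end of that path. The function names and data are hypothetical. Two simplifications: real MongoDB indexes array elements individually (multikey indexes), and unless the index is sparse it records documents lacking the field under a null key; this sketch skips such documents, as a sparse index would.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'address.city' into nested dicts; None if absent."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def index_nested_field(docs, dotted):
    """Map each value found at `dotted` to the IDs of the documents holding it."""
    index = {}
    for doc_id, doc in docs.items():
        value = get_path(doc, dotted)
        if value is not None:  # skip missing fields, as a sparse index would
            index.setdefault(value, []).append(doc_id)
    return index


docs = {
    1: {"name": "Ann", "address": {"city": "Reston"}},
    2: {"name": "Bob", "address": {"city": "Boston"}},
    3: {"name": "Cy"},
}
print(index_nested_field(docs, "address.city"))  # {'Reston': [1], 'Boston': [2]}
```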

• Key value
• Hash indexes, optionally sorted (Memtable + SSTable)
• Indexes on strings
• Row key can be considered to be the “primary” key
• Row key must be unique
• Supports secondary indices (hash), sparse indices
• Ranges and orders supported by sorting the hash index
• Index and data updates atomic
• Bloom filter for index caching
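A sketch of a Bloom filter of the kind used for index caching. In a Memtable + SSTable design, each on-disk SSTable can keep a small in-memory filter of its row keys, so a read can skip any file whose filter answers "definitely absent". The class name, sizes, and salted SHA-256 hashing are arbitrary choices for illustration, not what any particular store uses.

```python
import hashlib


class BloomFilter:
    """Bit array + k hash positions: answers 'maybe present' or 'definitely absent'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # k bit positions derived from salted SHA-256 digests (illustrative choice).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


sstable_filter = BloomFilter()
for i in range(100):
    sstable_filter.add(f"row{i}")
print(sstable_filter.might_contain("row7"))  # True
```

The filter never gives false negatives, which is what makes skipping a file safe; the false-positive rate grows as more keys share the same bit array.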

• Columnar (multi-dimensional sorted map)
• Sorted hash indexes
• Indexes on strings*
• Row key can be considered to be the “primary” key
• Row key must be unique
• Supports sparse indices
• Alphanumeric ranges supported on row key (sorted)*
• Index and data updates atomic (WAL)
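A sketch of why row-key ranges are "alphanumeric": row keys are kept in string (byte) order, so numeric parts sort lexicographically unless they are zero-padded. The scan below uses a half-open [start, stop) range, a common convention for row-key scans; the function name and keys are hypothetical.

```python
import bisect


def scan(sorted_keys, start, stop):
    """Row keys in [start, stop), as a range scan over a sorted map would return them."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Keys compare as strings, so "user10" sorts before "user9".
raw = sorted(["user9", "user10", "user2"])
print(raw)  # ['user10', 'user2', 'user9']

# Zero-padding the numeric part makes string order match numeric order.
padded = sorted(f"user{n:04d}" for n in (9, 10, 2))
print(padded)                                 # ['user0002', 'user0009', 'user0010']
print(scan(padded, "user0005", "user0011"))   # ['user0009', 'user0010']
```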

• Document oriented
• B+tree (append-only for MVCC)
• Indexes are actually views created by a map function
• Views created lazily
• Data stored at the leaf level (hence B+tree)
• Index types: string, number, boolean, array, object
• Supports secondary indices (multiple views), covering indices, sparse indices, geospatial indices (external module GeoCouch)
• Index and data updates atomic (MVCC)
• Background index creation supported
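A sketch of a map-function view that is built lazily: writes only mark the view stale, and the index work happens on the next query. The map function yields (key, value) pairs, standing in for an `emit(key, value)` call; all names here are hypothetical. For brevity the sketch rebuilds the whole view on a stale query and keeps a sorted list rather than an append-only B+tree, whereas an incremental design would process only the documents changed since the last query.

```python
import bisect


class LazyView:
    """A view built from a map function, indexed only when queried after writes."""

    def __init__(self, map_fn):
        self.map_fn = map_fn   # doc -> iterable of (key, value) pairs, like emit()
        self.docs = {}
        self.rows = []         # (key, doc_id, value), sorted by key then doc_id
        self.keys = []         # the row keys alone, for binary search
        self.stale = True

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.stale = True      # writes only mark the view stale; no index work yet

    def _rebuild(self):
        # Simplification: rebuild everything instead of applying only the changes.
        rows = [(k, doc_id, v)
                for doc_id, doc in self.docs.items()
                for k, v in self.map_fn(doc)]
        rows.sort(key=lambda r: (r[0], r[1]))
        self.rows = rows
        self.keys = [r[0] for r in rows]
        self.stale = False

    def query(self, key=None):
        if self.stale:
            self._rebuild()    # the index work happens here, lazily
        if key is None:
            return list(self.rows)
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.rows[lo:hi]


def by_author(doc):
    # Emits nothing for docs without an author, so the view is naturally sparse;
    # storing the title as the value lets title lookups skip the document itself.
    if "author" in doc:
        yield doc["author"], doc["title"]


view = LazyView(by_author)
view.put("a", {"author": "will", "title": "NoSQL indexing"})
view.put("b", {"author": "ann", "title": "B-trees"})
view.put("c", {"title": "Untitled"})
print(view.query("will"))  # [('will', 'a', 'NoSQL indexing')]
```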

• Key value
• Hash index
• Keys are binary-safe strings
• Intrinsically sparse
• Has a data type called a sorted set
    • Hash table + skip list for O(log n) insertion
• Index and data updates atomic
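A sketch of the sorted-set idea: a hash table gives O(1) member-to-score lookup, and a skip list ordered by (score, member) gives expected O(log n) insertion and score-range queries. The class names, the 0.25 promotion probability, and the level cap of 16 are choices for this sketch, not a description of any product's internals.

```python
import random

MAX_LEVEL = 16  # cap on tower height (illustrative choice)
P = 0.25        # chance of promoting a node one more level (illustrative choice)


class _Node:
    __slots__ = ("score", "member", "forward")

    def __init__(self, score, member, level):
        self.score = score
        self.member = member
        self.forward = [None] * level  # next node at each level


class SortedSet:
    """Hash table (member -> score) plus a skip list ordered by (score, member)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, MAX_LEVEL)  # sentinel; never compared
        self._level = 1
        self._scores = {}

    def _random_level(self):
        level = 1
        while level < MAX_LEVEL and self._rng.random() < P:
            level += 1
        return level

    def _unlink(self, member):
        key = (self._scores.pop(member), member)
        node = self._head
        for lvl in range(self._level - 1, -1, -1):
            while (nxt := node.forward[lvl]) is not None and (nxt.score, nxt.member) < key:
                node = nxt
            if nxt is not None and nxt.member == member:
                node.forward[lvl] = nxt.forward[lvl]  # splice the node out at this level

    def add(self, member, score):
        """Insert or update a member; expected O(log n)."""
        if member in self._scores:
            self._unlink(member)
        key = (score, member)
        update = [self._head] * MAX_LEVEL  # last node before the new one, per level
        node = self._head
        for lvl in range(self._level - 1, -1, -1):
            while (nxt := node.forward[lvl]) is not None and (nxt.score, nxt.member) < key:
                node = nxt
            update[lvl] = node
        level = self._random_level()
        self._level = max(self._level, level)
        new = _Node(score, member, level)
        for lvl in range(level):
            new.forward[lvl] = update[lvl].forward[lvl]
            update[lvl].forward[lvl] = new
        self._scores[member] = score

    def score(self, member):
        return self._scores.get(member)  # O(1) via the hash table

    def range_by_score(self, low, high):
        """Members with low <= score <= high, in (score, member) order."""
        node = self._head
        for lvl in range(self._level - 1, -1, -1):
            while (nxt := node.forward[lvl]) is not None and nxt.score < low:
                node = nxt
        node = node.forward[0]
        out = []
        while node is not None and node.score <= high:
            out.append(node.member)
            node = node.forward[0]
        return out


zs = SortedSet(seed=1)
for member, score in [("alice", 30), ("bob", 10), ("carol", 20), ("dave", 20)]:
    zs.add(member, score)
print(zs.range_by_score(15, 30))  # ['carol', 'dave', 'alice']
zs.add("bob", 25)                 # an update re-positions bob
print(zs.range_by_score(0, 100))  # ['carol', 'dave', 'bob', 'alice']
```

The O(log n) bound is expected rather than worst case, since it depends on the random tower heights.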

• XML document oriented
• Hash index on terms (everything is a term)
• All elements, element values, and words indexed by default
• Supports secondary indices, range indices (for range and ordering), full text, geospatial indices, type compliance
• Index and data updates atomic
• Background index creation supported
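A sketch of "everything is a term": every element name, element value, and word in an XML document becomes a key in one hash index pointing at document IDs. The tuple-shaped term keys and function name are hypothetical, the sketch reads only each element's leading text (not text after child elements), and it does not model the separate range indices.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_xml(docs):
    """Hash index from terms to doc IDs; every element, value, and word is a term."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter():  # every element, at any depth
            index[("element", elem.tag)].add(doc_id)
            text = (elem.text or "").strip()
            if text:
                index[("value", elem.tag, text)].add(doc_id)  # whole element value
                for word in text.lower().split():
                    index[("word", word)].add(doc_id)          # individual words
    return index


docs = {
    1: "<book><title>NoSQL Indexing</title><pages>42</pages></book>",
    2: "<article><title>Hash Indexes</title></article>",
}
index = index_xml(docs)
print(sorted(index[("element", "title")]))  # [1, 2]
print(index[("value", "pages", "42")])      # {1}
```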

In Conclusion
