Search and Analyze your Data

Honza Král @honzakral Search and Analyze your Data

Elasticsearch

Distributed Search Engine
Open Source
Document-based
Based on Lucene
JSON over HTTP

Document based
JSON
Dynamic Schema
Some Relationships
  Nested
  Parent/Child

{
  "id": 7635,
  "accepted_answer_id": 7641,
  "answer_count": 9,
  "title": "Are you able to close your eyes and focus/think just on your code?",
  "body": "How do I ......?",
  "comment_count": 2,
  "comments": [{
    "creation_date": "2010-09-27T19:31:27.200",
    "id": 9372,
    "owner": { "display_name": "sange", "id": 3092 },
    "post_id": 7635,
    "text": "I sometimes close my eyes or stare at something ....."
  }, {......}],
  "favorite_count": 2,
  "last_activity_date": "2010-09-28T00:28:08.393",
  "owner": { "display_name": "flow", "id": 3761 },
  "rating": 6,
  "tags": [ "focus", "concentration" ],
  "view_count": 368,
  "creation_date": "2010-09-27T19:16:57.757",
  "closed_date": "2011-11-13T12:12:05.937"
}
StackOverflow Question

Search

Full Text (unstructured)
in or across fields
phrase, fuzzy, ...
scan api for data extraction
relies on analysis

Filtering (structured)
exact matches, ranges, geo, ...
fast
cacheable as bitsets
core filters are cached, not compound filters (bool/and/or)
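The "cacheable as bitsets" point can be sketched in a few lines of Python. This is only an illustration of the idea, not Elasticsearch internals: the names `FILTER_CACHE`, `cached_filter`, and the sample documents are all assumptions made for the example. Each core filter is evaluated once into a bitset (bit i set when document i matches), and a compound condition is then just a bitwise operation over the cached bitsets.

```python
# Sketch: filters as cached bitsets. Bit i is set when document i matches.
# FILTER_CACHE, cached_filter and the sample docs are illustrative
# assumptions, not part of any real Elasticsearch API.

FILTER_CACHE = {}

def to_bitset(doc_ids):
    """Pack an iterable of document numbers into one integer bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits

def from_bitset(bits):
    """Unpack a bitset back into a sorted list of document numbers."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

def cached_filter(key, predicate, docs):
    """Evaluate a core filter once and reuse its bitset afterwards."""
    if key not in FILTER_CACHE:
        FILTER_CACHE[key] = to_bitset(
            i for i, doc in enumerate(docs) if predicate(doc)
        )
    return FILTER_CACHE[key]

docs = [
    {"rating": 6, "tags": ["focus", "concentration"]},
    {"rating": 2, "tags": ["focus", "python"]},
    {"rating": 9, "tags": ["python"]},
]

has_focus = cached_filter("tags:focus", lambda d: "focus" in d["tags"], docs)
well_rated = cached_filter("rating>=5", lambda d: d["rating"] >= 5, docs)

# The compound (AND) condition is not cached; it is a cheap bitwise AND.
both = from_bitset(has_focus & well_rated)
```

Here `both` holds the document numbers that pass both filters; only the two core bitsets live in the cache.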

Suggesters
terms, phrase
  "Did you mean?"
completion
  FAST!
  custom score

Under the Hood

Bible concordance
A simple form lists Biblical words alphabetically, with indications to enable the inquirer to find the passages of the Bible where the words occur. The first concordance, completed in 1230, was undertaken under the guidance of Hugo de Saint-Cher (Hugo de Sancto Charo), assisted by fellow Dominicans.

Inverted Index

Building an inverted index
"Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design."
django high level python web framework encourag rapid develop clean pragmat design fast
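A minimal sketch of this step in Python, assuming a very simple analyzer: lowercase, split on non-alphanumeric characters, drop a tiny stopword list. The stemming visible on the slide (encourag, develop, pragmat) is deliberately left out, and the function names, the stopword list, and the two sample documents are illustrative assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; real analyzers ship much larger ones.
STOPWORDS = {"a", "and", "is", "that", "the"}

def analyze(text):
    """Lowercase, tokenize on non-alphanumerics, drop stopwords (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_index(docs):
    """Map each term to the sorted list of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Hypothetical documents, only for demonstration.
docs = {
    "file_1.txt": "Python is a programming language",
    "file_2.txt": "Python web development with Django",
}
index = build_index(docs)
tokens = analyze("Django is a high-level Python Web framework")
```

Looking up `index["python"]` returns both file names, while `index["django"]` returns only `file_2.txt`. Keeping each posting list sorted is what makes the merge-based search on the following slides cheap.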

Inverted index
python file_1.txt file_2.txt file_3.txt web file_2.txt file_3.txt file_2.txt file_4.txt django file_3.txt flask jazz file_4.txt

search(python AND django)
python file_1.txt file_2.txt file_3.txt file_2.txt file_4.txt django file_3.txt flask jazz file_4.txt web file_2.txt file_3.txt

Phrase search
python: file_1.txt (4), file_2.txt (1, 3), file_3.txt (11, 42)
web: file_2.txt (2), file_3.txt (10)

search("python web")
python: file_1.txt (4), file_2.txt (1, 13), file_3.txt (11, 42)
web: file_2.txt (2), file_3.txt (10)
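The positional postings on this slide are enough to work through the phrase query by hand. The sketch below, with an assumed helper name `phrase_match`, reports a document when some position of the first term is immediately followed by a position of the second term. It handles two-term phrases only and is meant as an illustration, not as how Lucene implements phrase queries.

```python
# Positional postings copied from the slide: term -> {document: [positions]}.
positions = {
    "python": {
        "file_1.txt": [4],
        "file_2.txt": [1, 13],
        "file_3.txt": [11, 42],
    },
    "web": {
        "file_2.txt": [2],
        "file_3.txt": [10],
    },
}

def phrase_match(first, second):
    """Return documents where `second` occurs directly after `first`."""
    hits = []
    # Only documents containing both terms can match the phrase.
    for doc in sorted(first.keys() & second.keys()):
        second_positions = set(second[doc])
        if any(pos + 1 in second_positions for pos in first[doc]):
            hits.append(doc)
    return hits

result = phrase_match(positions["python"], positions["web"])
```

Only `file_2.txt` matches: "python" at position 1 is followed by "web" at position 2. In `file_3.txt` both terms occur, but "web" (10) comes before "python" (11), so the phrase does not match.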

Merging sorted lists.
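As a sketch of what "merging sorted lists" means for an AND query: two sorted posting lists are walked in lockstep, and only the entries present in both are kept. The function name `intersect` is an assumption; the posting lists for python and web are the ones shown on the earlier slides.

```python
def intersect(left, right):
    """Intersect two sorted posting lists in a single linear pass."""
    i = j = 0
    result = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result

python_docs = ["file_1.txt", "file_2.txt", "file_3.txt"]
web_docs = ["file_2.txt", "file_3.txt"]

both = intersect(python_docs, web_docs)
```

The cost is linear in the combined length of the two lists, and no list needs to be fully materialized in memory, which fits the "easily distributable" point on the next slide.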

Flexible Easily distributable

Aggregations

Metrics in Buckets
Buckets
  split documents into groups
  can be nested
Metrics
  calculated over documents in given bucket

Buckets
terms
  bucket per field value - "category"
significant terms
  terms specific for this bucket
range
  per range - "age"
geo_range/geohash_grid
  distance ranges
date_histogram
  buckets per time interval - "daily"
...

Metrics
count/sum/avg/min/max/...
(extended) stats
  including std deviation, sum of squares etc
top_hits
cardinality
percentiles
...

Mix and Match

Example
"aggs" : {
  "states" : {
    "terms" : { "field" : "state" },
    "aggs" : {
      "age_groups" : {
        "histogram" : { "field" : "age", "interval" : 5 },
        "aggs" : {
          "grades" : { "stats" : { "field" : "grade" } },
          "gender" : {
            "terms" : { "field" : "male", "script" : "_value == 'T' ? 'M' : 'F'" },
            "aggs" : {
              "grades" : { "stats" : { "field" : "grade" } }
            }...
Analyze the grades per state
Analyze per age_group
Stats per state & age_group
Stats per state, age_group & gender

in near real-time Calculated in one pass
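To make "metrics in buckets, calculated in one pass" concrete, here is a rough pure-Python sketch of the state / age_group / grade-stats idea. It is not how Elasticsearch executes aggregations; the function names and the sample documents are assumptions. Each document is visited exactly once, dropped into its nested bucket (state, then a 5-year age histogram bucket), and the running stats of that bucket are updated.

```python
from collections import defaultdict

def new_stats():
    """Empty running stats for one bucket."""
    return {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}

def add_value(stats, value):
    """Fold one value into the running stats."""
    stats["count"] += 1
    stats["sum"] += value
    stats["min"] = min(stats["min"], value)
    stats["max"] = max(stats["max"], value)

def aggregate(docs, interval=5):
    """Bucket by state, then by age histogram bucket; stats over grade."""
    buckets = defaultdict(lambda: defaultdict(new_stats))
    for doc in docs:  # single pass over the documents
        age_group = doc["age"] // interval * interval
        add_value(buckets[doc["state"]][age_group], doc["grade"])
    return buckets

# Hypothetical documents, only for demonstration.
docs = [
    {"state": "CA", "age": 21, "grade": 80},
    {"state": "CA", "age": 23, "grade": 90},
    {"state": "NY", "age": 37, "grade": 70},
]
result = aggregate(docs)
```

Both CA documents land in the age bucket starting at 20, so `result["CA"][20]` carries a count of 2; an average can be derived from `sum / count` when the bucket is read.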

Under the Hood

(uninverted inverted index) Field Data

Two Modes
On-Demand
  in Memory
  default
  more memory, faster
  needs to be preloaded
Doc Values
  on Disk
  built at index time
  a bit slower
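The "uninverted inverted index" can be pictured as flipping the term-to-documents mapping back into a document-to-values mapping, which is the shape aggregations need. The sketch below only illustrates that flip; the function name `uninvert` and the sample postings are assumptions, and it says nothing about the actual in-memory or on-disk layout.

```python
def uninvert(inverted):
    """Turn term -> [doc ids] into doc id -> [terms] (a column of values)."""
    field_data = {}
    for term, doc_ids in inverted.items():
        for doc_id in doc_ids:
            field_data.setdefault(doc_id, []).append(term)
    return field_data

# Hypothetical postings for a "tags" field.
tags_index = {
    "focus": [1, 2],
    "python": [2, 3],
}
tags_column = uninvert(tags_index)
```

With `tags_column`, a terms aggregation over the documents matching a query becomes a lookup per document rather than a scan over every term.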

Column store anyone?

Putting it all together Examples

Faceted Navigation

Facets & Filtering

Log Analysis

Kibana (+logstash data)

Thank You!
Honza Král
twitter: @honzakral
email: [email protected]
Support: http://elasticsearch.com/support
Training: http://training.elasticsearch.com/
We are hiring: http://elasticsearch.com/about/jobs/
