Search and Analyze your Data

Presentation from Øredev in Malmö by Honza Král, talking about different features of Elasticsearch, including their underlying implementation and use cases.

Elasticsearch Inc

November 07, 2014

Transcript

  1. StackOverflow Question

    {
      "id": 7635,
      "accepted_answer_id": 7641,
      "answer_count": 9,
      "title": "Are you able to close your eyes and focus/think just on your code?",
      "body": "How do I ......?",
      "comment_count": 2,
      "comments": [{ "creation_date": "2010-09-27T19:31:27.200", "id": 9372, "owner": { "display_name": "sange", "id": 3092 }, "post_id": 7635, "text": "I sometimes close my eyes or stare at something ....." }, {......}],
      "favorite_count": 2,
      "last_activity_date": "2010-09-28T00:28:08.393",
      "owner": { "display_name": "flow", "id": 3761 },
      "rating": 6,
      "tags": [ "focus", "concentration" ],
      "view_count": 368,
      "creation_date": "2010-09-27T19:16:57.757",
      "closed_date": "2011-11-13T12:12:05.937"
    }
  2. Full Text (unstructured)

    in or across fields; phrase, fuzzy, ...; scan api for data extraction; relies on analysis
  3. Filtering (structured)

    exact matches, ranges, geo, ...; fast; cacheable as bitsets; core filters are cached, not compound filters (bool/and/or)
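
The "cacheable as bitsets" point can be illustrated with a minimal sketch. This is not Elasticsearch's implementation: it assumes a tiny in-memory list of hypothetical documents and uses a Python integer as the bitset, with bit `i` set when document `i` matches. The helper names `build_bitset` and `matching_ids` and the cache keys are invented for illustration.

```python
def build_bitset(docs, predicate):
    """Return an int whose bit i is set when docs[i] satisfies predicate."""
    bits = 0
    for doc_id, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << doc_id
    return bits


def matching_ids(bits):
    """Expand a bitset back into a sorted list of document ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Hypothetical documents, loosely modelled on the question JSON in the deck.
docs = [
    {"tags": ["focus"], "rating": 6},
    {"tags": ["python"], "rating": 7},
    {"tags": ["focus", "python"], "rating": 9},
]

# Each simple ("core") filter is evaluated once and its bitset is cached.
cache = {
    "tags:focus": build_bitset(docs, lambda d: "focus" in d["tags"]),
    "rating>=5": build_bitset(docs, lambda d: d["rating"] >= 5),
}

# Compound filters are cheap bitwise operations on the cached bitsets,
# so there is little to gain from caching their results separately.
both = cache["tags:focus"] & cache["rating>=5"]
either = cache["tags:focus"] | cache["rating>=5"]

print(matching_ids(both))    # [0, 2]
print(matching_ids(either))  # [0, 1, 2]
```

Combining two cached bitsets is a single AND or OR, which is one plausible reading of why the slide distinguishes cached core filters from compound ones.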
  4. Bible concordance

    A simple form lists Biblical words alphabetically, with indications to enable the inquirer to find the passages of the Bible where the words occur. The first concordance, completed in 1230, was undertaken under the guidance of Hugo de Saint-Cher (Hugo de Sancto Charo), assisted by fellow Dominicans.
  5. Building an inverted index

    "Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design."

    Terms: django, high, level, python, web, framework, encourag, rapid, develop, clean, pragmat, design, fast
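
As a rough illustration of how such an index could be built, here is a minimal sketch. It assumes a toy analyzer that only lowercases, splits on non-alphanumeric characters, and drops a small hand-picked stopword set; it does not stem, so it yields "encourages" where the slide shows "encourag". The second document and the names `analyze`, `build_inverted_index`, and `STOPWORDS` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny hand-picked stopword list, just enough for this example.
STOPWORDS = {"is", "a", "that", "and"}


def analyze(text):
    """Lowercase, split into alphanumeric tokens, drop stopwords (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {
    1: "Django is a high-level Python Web framework that encourages "
       "rapid development and clean, pragmatic design.",
    2: "Flask is a lightweight Python web framework.",  # hypothetical document
}

index = build_inverted_index(docs)
print(sorted(index["python"]))  # [1, 2]
print(sorted(index["django"]))  # [1]
```

A lookup is then a dictionary access on the analyzed query term, which is why the query text has to go through the same analysis as the indexed text.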
  6. Metrics in Buckets

    Buckets: split documents into groups; can be nested. Metrics: calculated over documents in given bucket.
  7. Buckets

    terms: bucket per field value - "category"; significant terms: terms specific for this bucket; range: per range - "age"; geo_range/geohash_grid: distance ranges; date_histogram: buckets per time interval - "daily"; ...
  8. Example

    "aggs" : {
      "states" : {
        "terms" : { "field" : "state" },
        "aggs" : {
          "age_groups" : {
            "histogram" : { "field" : "age", "interval" : 5 },
            "aggs" : {
              "grades" : { "stats" : { "field" : "grade" } },
              "gender" : {
                "terms" : { "field" : "male", "script" : "_value == 'T' ? 'M' : 'F'" },
                "aggs" : {
                  "grades" : { "stats" : { "field" : "grade" } }
                }...

    Annotations: Analyze the grades per state; Analyze per age_group; Stats per state & age_group; Stats per state, age_group & gender
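
To make the nesting concrete, the following sketch mimics the first two levels of that request (terms on `state`, then a histogram on `age` with interval 5, then stats on `grade`) in plain Python. It is an in-memory approximation, not the Elasticsearch API; the sample records and the helper names `terms_buckets`, `histogram_buckets`, and `stats` are assumptions made for illustration.

```python
from collections import defaultdict


def terms_buckets(docs, field):
    """One bucket per distinct value of the field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def histogram_buckets(docs, field, interval):
    """One bucket per interval; the key is the lower bound of the interval."""
    buckets = defaultdict(list)
    for doc in docs:
        key = (doc[field] // interval) * interval
        buckets[key].append(doc)
    return dict(buckets)


def stats(docs, field):
    """Metric computed over the documents of a single bucket."""
    values = [doc[field] for doc in docs]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


# Hypothetical records with the fields used in the request.
students = [
    {"state": "NY", "age": 17, "grade": 80},
    {"state": "NY", "age": 18, "grade": 90},
    {"state": "CA", "age": 21, "grade": 70},
]

# Buckets nest: each state bucket is split again by age group,
# and the metric is calculated inside the innermost bucket.
result = {}
for state, in_state in terms_buckets(students, "state").items():
    result[state] = {
        age_group: stats(group, "grade")
        for age_group, group in histogram_buckets(in_state, "age", 5).items()
    }

print(result["NY"][15]["avg"])    # 85.0
print(result["CA"][20]["count"])  # 1
```

Adding the gender level from the request would be one more `terms_buckets` call inside the inner loop, with `stats` applied to each resulting group.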
  9. Two Modes

    On-Demand in Memory: default; more memory; faster; needs to be preloaded. Doc Values on Disk: built at index time; a bit slower.
  10. Thank You!

    Honza Král; twitter: @honzakral; email: [email protected]; Support: http://elasticsearch.com/support; Training: http://training.elasticsearch.com/; We are hiring: http://elasticsearch.com/about/jobs/