Search and Analyze your Data

Presentation from Øredev in Malmö by Honza Král, talking about different features of Elasticsearch, including their underlying implementation and use cases.

Elasticsearch Inc

November 07, 2014

Transcript

  1. Honza Král @honzakral Search and Analyze your Data

  2. Elasticsearch

  3. Distributed Search Engine: Open Source; Document-based; Based on Lucene; JSON over HTTP
  4. Document based: JSON; Dynamic Schema; Some Relationships (Nested, Parent/Child)

  5. StackOverflow Question:

    {
      "id": 7635,
      "accepted_answer_id": 7641,
      "answer_count": 9,
      "title": "Are you able to close your eyes and focus/think just on your code?",
      "body": "How do I ......?",
      "comment_count": 2,
      "comments": [
        {
          "creation_date": "2010-09-27T19:31:27.200",
          "id": 9372,
          "owner": { "display_name": "sange", "id": 3092 },
          "post_id": 7635,
          "text": "I sometimes close my eyes or stare at something ....."
        },
        {......}
      ],
      "favorite_count": 2,
      "last_activity_date": "2010-09-28T00:28:08.393",
      "owner": { "display_name": "flow", "id": 3761 },
      "rating": 6,
      "tags": [ "focus", "concentration" ],
      "view_count": 368,
      "creation_date": "2010-09-27T19:16:57.757",
      "closed_date": "2011-11-13T12:12:05.937"
    }
  6. Search

  7. Full Text (unstructured): in or across fields; phrase, fuzzy, ...; scan API for data extraction; relies on analysis
  8. Filtering (structured): exact matches, ranges, geo, ...; fast; cacheable as bitsets; core filters are cached, not compound filters (bool/and/or)
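
The "cacheable as bitsets" point on slide 8 can be illustrated with a small Python sketch. This is a toy under stated assumptions, not Elasticsearch's implementation: each filter result is stored as an integer whose bit i is set when document i matches, so combining cached filters becomes a bitwise operation. The documents, the cache keys and the helper names are all hypothetical.

```python
# Toy illustration: a filter result as a bitset (a Python int), one bit per doc.
# The documents and helper names are hypothetical, not Elasticsearch APIs.
docs = [
    {"tags": ["focus"], "rating": 3},
    {"tags": ["python"], "rating": 8},
    {"tags": ["focus", "python"], "rating": 9},
]


def build_bitset(predicate):
    """Evaluate a filter once and remember the matching doc ids as bits."""
    bits = 0
    for doc_id, doc in enumerate(docs):
        if predicate(doc):
            bits |= 1 << doc_id
    return bits


def doc_ids(bits):
    """Decode a bitset back into a sorted list of matching doc ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Simple filters are computed once and cached by name.
cache = {
    "tag:focus": build_bitset(lambda d: "focus" in d["tags"]),
    "rating>=5": build_bitset(lambda d: d["rating"] >= 5),
}

# Compound filters reuse the cached bitsets instead of rescanning documents.
both = cache["tag:focus"] & cache["rating>=5"]    # AND
either = cache["tag:focus"] | cache["rating>=5"]  # OR
```

Here `doc_ids(both)` lists the documents that pass both filters, without touching the documents again.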
  9. Suggesters: terms, phrase; "Did you mean?"; completion; FAST!; custom score
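
To make the two suggester flavours on slide 9 concrete, here is a rough Python sketch. It only mimics the behaviour: Python's `difflib` similarity stands in for whatever distance measure a real term suggester uses, and the prefix scan stands in for a proper completion data structure. The vocabulary and function names are hypothetical.

```python
import difflib

# Hypothetical vocabulary of indexed terms.
vocabulary = ["python", "django", "flask", "framework"]


def did_you_mean(word, vocabulary, limit=1):
    """Suggest the closest known terms for a possibly misspelled word."""
    return difflib.get_close_matches(word, vocabulary, n=limit, cutoff=0.6)


def complete(prefix, vocabulary):
    """Return all known terms starting with the typed prefix."""
    return sorted(term for term in vocabulary if term.startswith(prefix))
```

For example, `did_you_mean("pyhton", vocabulary)` suggests `python`, and `complete("f", vocabulary)` offers `flask` and `framework`.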

  10. Under the Hood

  11. Bible concordance: A simple form lists Biblical words alphabetically, with indications to enable the inquirer to find the passages of the Bible where the words occur. The first concordance, completed in 1230, was undertaken under the guidance of Hugo de Saint-Cher (Hugo de Sancto Charo), assisted by fellow Dominicans.
  12. Inverted Index

  13. Building an inverted index: "Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design." → django high level python web framework encourag rapid develop clean pragmat design fast
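
The analysis step on slide 13 (lowercasing, splitting, dropping common words, stemming) can be sketched in Python as follows. The stop-word list and the suffix rules are assumptions picked so that this one sentence yields the terms from "django" through "design"; a real analyzer would use a proper stemming algorithm rather than this crude suffix stripping.

```python
import re

# Assumed stop words and suffixes, chosen only for this example sentence.
STOPWORDS = {"is", "a", "that", "and"}
SUFFIXES = ("ment", "es", "ic")


def crude_stem(token):
    """Strip one known suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split on non-alphanumerics, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


text = ("Django is a high-level Python Web framework that encourages "
        "rapid development and clean, pragmatic design.")
terms = analyze(text)
```

The same `analyze` function would have to be applied to queries as well, which is why full-text search "relies on analysis".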
  14. Inverted index: python file_1.txt file_2.txt file_3.txt web file_2.txt file_3.txt file_2.txt file_4.txt django file_3.txt flask jazz file_4.txt
  15. search(python AND django): python file_1.txt file_2.txt file_3.txt file_2.txt file_4.txt django file_3.txt flask jazz file_4.txt web file_2.txt file_3.txt
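
A minimal Python sketch of the idea behind `search(python AND django)`: build a term-to-files mapping, then intersect the posting sets of the query terms. The file contents below are hypothetical and are not a reconstruction of the slide's table.

```python
from collections import defaultdict

# Hypothetical, already-analyzed documents (file name -> terms).
files = {
    "file_1.txt": ["python", "jazz"],
    "file_2.txt": ["python", "web", "django"],
    "file_3.txt": ["python", "web", "flask"],
    "file_4.txt": ["django", "jazz"],
}


def build_index(files):
    """Map every term to the set of files that contain it."""
    index = defaultdict(set)
    for name, terms in files.items():
        for term in terms:
            index[term].add(name)
    return index


def search_and(index, *terms):
    """Return the files containing all of the given terms."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


index = build_index(files)
hits = search_and(index, "python", "django")
```

With this data only `file_2.txt` contains both terms.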
  16. Phrase search: python → file_1.txt (4), file_2.txt (1, 3), file_3.txt (11, 42); web → file_2.txt (2), file_3.txt (10)
  17. search("python web"): python → file_1.txt (4), file_2.txt (1, 13), file_3.txt (11, 42); web → file_2.txt (2), file_3.txt (10)
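
Phrase search needs the term positions shown in parentheses on slides 16 and 17. The following Python sketch uses the positions from slide 17 and checks that each term of the phrase appears exactly one position after the previous one. The function name and the dictionary layout are illustrative assumptions.

```python
# Positional index with the values from slide 17: term -> file -> positions.
positions = {
    "python": {"file_1.txt": [4], "file_2.txt": [1, 13], "file_3.txt": [11, 42]},
    "web": {"file_2.txt": [2], "file_3.txt": [10]},
}


def phrase_search(index, phrase):
    """Return files where the terms of the phrase occur consecutively."""
    terms = phrase.split()
    if not terms or any(term not in index for term in terms):
        return []
    first, rest = terms[0], terms[1:]
    hits = []
    for doc, starts in index[first].items():
        for start in starts:
            # Every following term must sit at start + 1, start + 2, ...
            if all(start + offset in index[term].get(doc, [])
                   for offset, term in enumerate(rest, 1)):
                hits.append(doc)
                break
    return sorted(hits)
```

In `file_2.txt`, "python" at position 1 is directly followed by "web" at position 2, so the phrase matches there. In `file_3.txt` "web" (10) precedes "python" (11), so only the reversed phrase would match.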
  18. Merging sorted lists.
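
Since posting lists are kept sorted, an AND query boils down to merging sorted lists. A minimal two-pointer sketch in Python, with documents represented by hypothetical integer ids:

```python
def intersect_sorted(a, b):
    """Intersect two sorted lists of doc ids in a single linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a is behind, advance it
        else:
            j += 1  # b is behind, advance it
    return out


python_docs = [1, 2, 3]
web_docs = [2, 3]
both = intersect_sorted(python_docs, web_docs)
```

Each list is walked only once, so the cost grows with the combined length of the two lists.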

  19. Flexible Easily distributable

  20. Aggregations

  21. Metrics in Buckets: Buckets split documents into groups and can be nested; Metrics are calculated over the documents in a given bucket
  22. Buckets: terms: bucket per field value - "category"; significant terms: terms specific for this bucket; range: per range - "age"; geo_range/geohash_grid: distance ranges; date_histogram: buckets per time interval - "daily"; ...
  23. Metrics: count/sum/avg/min/max/...; (extended) stats including std deviation, sum of squares etc; top_hits; cardinality; percentiles; ...
  24. Mix and Match

  25. Example:

    "aggs" : {
      "states" : {
        "terms" : { "field" : "state" },
        "aggs" : {
          "age_groups" : {
            "histogram" : { "field" : "age", "interval" : 5 },
            "aggs" : {
              "grades" : { "stats" : { "field" : "grade" } },
              "gender" : {
                "terms" : { "field" : "male", "script" : "_value == 'T' ? 'M' : 'F'" },
                "aggs" : {
                  "grades" : { "stats" : { "field" : "grade" } }
                }...

    Analyze the grades per state; Analyze per age_group; Stats per state & age_group; Stats per state, age_group & gender
  26. Calculated in one pass, in near real-time
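
The "buckets plus metrics, calculated in one pass" idea can be sketched in plain Python. The sketch mirrors the first two levels of the example on slide 25 (a bucket per state, then a histogram on age with interval 5, then stats on grade) but runs over a hypothetical in-memory list of records instead of an index; the helper names are assumptions.

```python
from collections import defaultdict

# Hypothetical records standing in for indexed documents.
students = [
    {"state": "CA", "age": 21, "grade": 80.0},
    {"state": "CA", "age": 23, "grade": 90.0},
    {"state": "CA", "age": 27, "grade": 70.0},
    {"state": "NY", "age": 22, "grade": 60.0},
]


def new_stats():
    """Empty running statistics for one bucket."""
    return {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}


def update(stats, value):
    """Fold one value into the running statistics."""
    stats["count"] += 1
    stats["sum"] += value
    stats["min"] = min(stats["min"], value)
    stats["max"] = max(stats["max"], value)


def aggregate(records, interval=5):
    """Bucket by state, then by age histogram; update metrics in one pass."""
    buckets = defaultdict(lambda: defaultdict(new_stats))
    for rec in records:
        age_bucket = rec["age"] // interval * interval
        update(buckets[rec["state"]][age_bucket], rec["grade"])
    return buckets


result = aggregate(students)
```

Every record is visited once: it is routed to its nested bucket and the metrics there are updated in place. An average can be derived afterwards as `sum / count`.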

  27. Under the Hood

  28. Field Data (uninverted inverted index)

  29. Two Modes: On-Demand in Memory (default; more memory, faster; needs to be preloaded); Doc Values on Disk (built at index time; a bit slower)
  30. Column store anyone?
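
What "uninverting" an inverted index means can be shown with a short Python sketch. It is a simplification with hypothetical data: the inverted index maps a term to doc ids, while the field data maps a doc id to its terms, which is the column-like layout an aggregation needs.

```python
from collections import Counter

# Hypothetical inverted index for one field: term -> sorted doc ids.
inverted = {
    "focus": [0, 2],
    "python": [1, 2],
}


def uninvert(inverted, num_docs):
    """Turn term -> docs into a per-document column of terms."""
    columns = [[] for _ in range(num_docs)]
    for term in sorted(inverted):
        for doc_id in inverted[term]:
            columns[doc_id].append(term)
    return columns


def terms_counts(field_data, matching_docs):
    """Count field values over the documents matched by a query."""
    counts = Counter()
    for doc_id in matching_docs:
        counts.update(field_data[doc_id])
    return counts


field_data = uninvert(inverted, num_docs=3)
counts = terms_counts(field_data, matching_docs=[1, 2])
```

The search side asks "which documents contain this term?", the aggregation side asks "which terms does this document contain?", and the second question is what the uninverted structure answers quickly.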

  31. Putting it all together Examples

  32. Faceted Navigation

  33. Facets & Filtering

  34. Log Analysis

  35. Kibana (+logstash data)

  36. Thank You! Honza Král; twitter: @honzakral; email: [email protected]; Support: http://elasticsearch.com/support; Training: http://training.elasticsearch.com/; We are hiring: http://elasticsearch.com/about/jobs/