Beyond the basics with Elasticsearch

Elasticsearch Inc

June 08, 2015

Transcript

  1. Beyond the basics
    with Elasticsearch
    Honza Král
    @honzakral

  2. Search

  3. Bible concordance
    A simple form lists Biblical words alphabetically, with indications
    to enable the inquirer to find the passages of the Bible where
    the words occur.
    The first concordance, completed in 1230, was undertaken
    under the guidance of Hugo de Saint-Cher (Hugo de Sancto
    Charo), assisted by fellow Dominicans.

  4. Inverted index
    python   file_1.txt file_2.txt file_3.txt
    web      file_2.txt file_3.txt
    django   file_2.txt file_4.txt
    flask    file_3.txt
    jazz     file_4.txt
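
The table above can be rebuilt with a few lines of Python. This is a minimal sketch, not how Lucene actually stores its index: the file contents are hypothetical (chosen so the resulting index matches the slide), and tokenization is just lowercasing plus splitting on non-alphanumeric characters.

```python
import re
from collections import defaultdict

# Hypothetical file contents, chosen so that the index matches the slide.
DOCS = {
    "file_1.txt": "Python",
    "file_2.txt": "Django: Python web",
    "file_3.txt": "Flask: Python web",
    "file_4.txt": "Django jazz",
}


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_inverted_index(docs):
    """Map every term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(DOCS)
for term in sorted(index):
    print(term, " ".join(index[term]))
```

Looking up a term is now a single dictionary access instead of a scan over every file, which is the same trade a printed concordance makes.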

  5. "Python" and "Django"
    python   file_1.txt file_2.txt file_3.txt
    django   file_2.txt file_4.txt
    flask    file_3.txt
    jazz     file_4.txt
    web      file_2.txt file_3.txt
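
Answering "Python" AND "Django" means intersecting two posting lists. A minimal sketch, assuming the posting lists from the table are held in memory and kept sorted (a hypothetical layout, not Lucene's on-disk format), walks both lists with two pointers so the work is linear in their combined length:

```python
# Posting lists copied from the slide, each sorted by document id.
POSTINGS = {
    "python": ["file_1.txt", "file_2.txt", "file_3.txt"],
    "django": ["file_2.txt", "file_4.txt"],
    "flask": ["file_3.txt"],
    "jazz": ["file_4.txt"],
    "web": ["file_2.txt", "file_3.txt"],
}


def intersect(left, right):
    """Return the ids present in both sorted lists, using a two-pointer merge."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return result


def search_all(terms):
    """AND query: documents containing every term (empty if a term is unknown)."""
    lists = [POSTINGS.get(term, []) for term in terms]
    # Start from the shortest list so intermediate results stay small.
    lists.sort(key=len)
    result = lists[0] if lists else []
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


print(search_all(["python", "django"]))  # ['file_2.txt']
```

Only file_2.txt survives: file_4.txt contains "django" but not "python".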

  6. Reversed search
    Index queries, run documents against them
    Alerting: threshold reached
    Classification: language, geo
    In Elasticsearch: the Percolator
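
To make the inversion concrete, here is a toy percolator in Python: queries are stored first, and each new document is asked which of them it matches. It only illustrates the idea; the class, the rule names and the metric fields are all hypothetical, and real percolator queries are JSON query DSL, not Python callables.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
Query = Callable[[Doc], bool]


class Percolator:
    """Toy reversed search: store queries, then run documents against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        """Store ("index") a query under an id."""
        self._queries[query_id] = query

    def percolate(self, doc: Doc) -> List[str]:
        """Return the ids of all stored queries that match the document."""
        return sorted(qid for qid, query in self._queries.items() if query(doc))


percolator = Percolator()
# Hypothetical alerting rules: each fires when a metric crosses a threshold.
percolator.register("cpu-high", lambda doc: doc.get("cpu_percent", 0) > 90)
percolator.register("disk-low", lambda doc: doc.get("disk_free_gb", float("inf")) < 5)

print(percolator.percolate({"host": "web-1", "cpu_percent": 97, "disk_free_gb": 40}))
# ['cpu-high']
```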

  7. Geo Classification
    curl -XPUT 'localhost:9200/events/.percolator/city-warsaw' -d '{
      "query": {
        "filtered": {
          "filter": {
            "geo_shape": {
              "location": {
                "indexed_shape": {
                  "index": "shapes",
                  "type": "city",
                  "id": "warsaw",
                  "path": "area"
                }
              }
            }
          }
        }
      }
    }'
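
Conceptually, the geo_shape filter asks whether an event's location falls inside the stored city shape. A minimal planar sketch of that test is the classic ray-casting point-in-polygon algorithm. The rectangle standing in for Warsaw's `area` is a crude hypothetical bounding box, not real boundary data, and Elasticsearch's own shape matching is implemented differently.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat)

# Crude hypothetical bounding box around Warsaw, as (lon, lat) vertices.
WARSAW_AREA: List[Point] = [
    (20.85, 52.10),
    (21.27, 52.10),
    (21.27, 52.37),
    (20.85, 52.37),
]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going east from the point."""
    lon, lat = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


print(point_in_polygon((21.01, 52.23), WARSAW_AREA))  # True: central Warsaw
print(point_in_polygon((19.94, 50.06), WARSAW_AREA))  # False: Kraków
```

An odd number of crossings means the point is inside. Treating longitude and latitude as flat x/y coordinates is only a reasonable approximation for small shapes.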

  8. Language Detection
    curl -XPUT 'localhost:9200/events/.percolator/lang-polish' -d '{
      "query": {
        "match": {
          "description": {
            "query": "cześć chrząszcz Żubrówka Wyborowa kur...",
            "type": "boolean",
            "minimum_should_match": 4
          }
        }
      }
    }'
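
The stored query works as an OR over many typically Polish words that only matches when at least four of them occur. A plain Python sketch of that `minimum_should_match` logic follows; it is not Lucene's analysis chain. The slide's word list is truncated, so the list below reuses only its first four words and pads them with a few common Polish words; that list is hypothetical.

```python
import re

# Hypothetical term list: the slide's first four words plus common Polish words.
POLISH_TERMS = ["cześć", "chrząszcz", "żubrówka", "wyborowa", "dla", "wszystkich", "w", "z"]


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    # In Python 3, \w also matches Unicode letters such as ą, ś and ż.
    return set(re.findall(r"\w+", text.lower()))


def matches(text, terms, minimum_should_match=4):
    """OR over the query terms, requiring at least `minimum_should_match` of them."""
    tokens = tokenize(text)
    hits = sum(1 for term in set(terms) if term in tokens)
    return hits >= minimum_should_match


polish = "konferencja zorganizowana dla wszystkich zainteresowanych w technologii związanej z Pythona"
english = "a conference for everyone interested in Python technology"
print(matches(polish, POLISH_TERMS))   # True: dla, wszystkich, w, z
print(matches(english, POLISH_TERMS))  # False: no Polish terms
```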

  9. Classification
    curl -XGET 'localhost:9200/events/conf/_percolate' -d '{
      "doc": {
        "title": "pywaw summit",
        "description": "konferencja zorganizowana dla wszystkich zainteresowanych w technologii związanej z Pythona...",
        "location": "40.73,-74.1"
      }
    }'
    Response:
    {
      ...
      "matches": [
        {"_index": "events", "_id": "city-warsaw"},
        {"_index": "events", "_id": "lang-polish"},
        {"_index": "events", "_id": "topic-python"}
      ]
    }

  10. Relevancy

  11. Standard is TF-IDF
    Good for text
    Lucene adds other factors on top, such as text length
    Sometimes not good enough:
    user-contributed "quality"
    time/space decay
    ...
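
A rough Python sketch of "TF-IDF plus text length", assuming the classic formulas of Lucene's `DefaultSimilarity`: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1 / sqrt(number of terms). It deliberately leaves out coord, query normalization, boosts and Lucene's lossy one-byte norm encoding, and the three-document corpus is hypothetical.

```python
import math


def tf(freq):
    """Term frequency component: square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    """Length normalization: the same match counts more in a shorter text."""
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc, corpus):
    """Simplified classic score: sum of tf * idf^2 * norm over matching query terms."""
    total = 0.0
    for term in query_terms:
        freq = doc.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for other in corpus if term in other)
        total += tf(freq) * idf(doc_freq, len(corpus)) ** 2 * length_norm(len(doc))
    return total


# Hypothetical corpus, already tokenized.
short_doc = ["python", "web"]
long_doc = ["python", "is", "a", "language", "for", "the", "web", "and", "more"]
other_doc = ["jazz", "guitar"]
corpus = [short_doc, long_doc, other_doc]

for doc in corpus:
    print(round(score(["python"], doc, corpus), 3))
```

Both of the first two documents contain "python" exactly once, yet the two-word document outscores the nine-word one purely because of the length norm.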
    [Figure: "Relevancy" plot of score (0 to 1) against age (20 to 60) for the "gauss", "exp" and "lin" decay functions, with reference, offset, scale and decay marked]
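
The three curves are Elasticsearch's decay functions. The sketch below assumes the formulas given in the Elasticsearch `function_score` documentation, which calls the reference point `origin` and derives each curve's width from `scale` and `decay` so that a value exactly `offset + scale` away from the origin scores `decay`. Plain floats stand in for dates or geo distances; the example numbers are hypothetical.

```python
import math


def _distance(value, origin, offset):
    """Distance from the origin, ignoring the first `offset` units."""
    return max(0.0, abs(value - origin) - offset)


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Bell curve: flat near the origin, then a smooth drop."""
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2.0 * sigma_sq))


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Exponential: steepest drop right after the offset, long tail."""
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Straight line that reaches 0 at distance scale / (1 - decay)."""
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


# Hypothetical example: score by age with origin 20, scale 10, offset 5.
for age in (20, 25, 35, 45, 60):
    scores = [round(f(age, origin=20, scale=10, offset=5), 3)
              for f in (gauss_decay, exp_decay, linear_decay)]
    print(age, scores)
```

All three return 1.0 inside the offset and exactly `decay` (0.5 here) at age 35; they differ only in how quickly they fall off beyond that point.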

  12. Hotel Search
    "function_score": {
      "query": {"match": {"name": "grand hotel"}},
      "functions": [
        {
          "filter": {"terms": {"facilities": ["balcony"]}},
          "boost_factor": 2
        },
        {
          "gauss": {
            "location": {
              "origin": [51, 0],
              "scale": "1km"
            }
          }
        },
        {
          "field_value_factor": {"field": "popularity"}
        },
        {
          "random_score": {}
        }
      ]
    }
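
How do these four functions combine? Assuming the Elasticsearch 1.x defaults for `function_score` (`score_mode` and `boost_mode` both `multiply`), a function whose filter does not match is skipped, and the product of the remaining factors multiplies the query score. The sketch below mimics that for a single hotel; the precomputed `distance_km` field, the sample hotel and the use of Python's `random` module in place of `random_score` are hypothetical stand-ins.

```python
import math
import random


def gauss_decay(distance_km, scale_km=1.0, decay=0.5):
    """Gauss decay on distance from the origin (offset 0, default decay 0.5)."""
    sigma_sq = -scale_km ** 2 / (2.0 * math.log(decay))
    return math.exp(-distance_km ** 2 / (2.0 * sigma_sq))


def hotel_score(query_score, hotel, rng):
    """Combine the slide's four functions with score_mode and boost_mode 'multiply'."""
    factors = []
    if "balcony" in hotel["facilities"]:
        factors.append(2.0)  # filtered function with boost_factor 2
    factors.append(gauss_decay(hotel["distance_km"]))  # gauss on location, scale 1km
    factors.append(float(hotel["popularity"]))  # field_value_factor, no modifier
    factors.append(rng.random())  # random_score, in [0, 1)
    return query_score * math.prod(factors)  # math.prod needs Python 3.8+


hotel = {"name": "Grand Hotel", "facilities": ["balcony", "wifi"],
         "distance_km": 1.0, "popularity": 3}
print(hotel_score(1.2, hotel, random.Random(42)))
```

Because everything is multiplied, any single factor can dominate: a hotel with a popularity of 0 ends up with a score of 0 no matter how well its name matches.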

  13. Aggregations

  14. (no text)

  15. Recommendations

  16. Example: recommendations
    [Diagram: the purchase histories of user A and user B, linked to Products A, B, C and D]
    Users represented as documents
    Products represented as terms

  17. Dumb approach to recommendation (no relevance)
    Popular != Relevant
    {
      "query": {
        "terms": {"artists": user_likes}
      },
      "aggs": {
        "popular": {
          "terms": {
            "field": "artists",
            "exclude": user_likes
          }
        }
      }
    }
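
In plain Python, the "dumb" approach amounts to: find every user who shares at least one like (the `terms` query), then count the other artists those users like (the `terms` aggregation with `exclude`). The listening data below is hypothetical, chosen so that one artist is liked by everybody; breaking ties alphabetically is a choice made here for stable output.

```python
from collections import Counter

# Hypothetical data: each user is a document, each liked artist is a term.
USERS = [
    {"radiohead", "portishead", "massive attack", "beatles"},
    {"radiohead", "portishead", "beatles"},
    {"radiohead", "massive attack", "beatles"},
    {"beatles", "rolling stones"},
    {"beatles", "rolling stones"},
    {"beatles", "elvis"},
    {"beatles", "elvis"},
    {"beatles", "rolling stones", "elvis"},
]


def popular_recommendations(users, user_likes, size=3):
    """Terms query (share any like) plus a terms aggregation excluding user_likes."""
    matching = [user for user in users if user & user_likes]
    counts = Counter(artist for user in matching for artist in user
                     if artist not in user_likes)
    # Highest count first; ties broken alphabetically so the output is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:size]


print(popular_recommendations(USERS, {"radiohead"}))
# [('beatles', 3), ('massive attack', 2), ('portishead', 2)]
```

The Beatles come out on top only because every user likes them, which is exactly the "Popular != Relevant" problem.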

  18. Strong recommendations based on relevance
    Use the score!
    Compare to background
    {
      "query": {
        "terms": {"artists": user_likes}
      },
      "aggs": {
        "significant": {
          "significant_terms": {
            "field": "artists",
            "exclude": user_likes
          }
        }
      }
    }
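
`significant_terms` instead compares how often each artist appears among the matching users (the foreground) with how often it appears across all users (the background). The sketch below assumes the JLH heuristic, Elasticsearch's default significance score: the absolute change in frequency multiplied by the relative change, scoring 0 when an artist is no more common in the foreground than in the background. The listening data is hypothetical.

```python
from collections import Counter

# Hypothetical data: each user is a document, each liked artist is a term.
USERS = [
    {"radiohead", "portishead", "massive attack", "beatles"},
    {"radiohead", "portishead", "beatles"},
    {"radiohead", "massive attack", "beatles"},
    {"beatles", "rolling stones"},
    {"beatles", "rolling stones"},
    {"beatles", "elvis"},
    {"beatles", "elvis"},
    {"beatles", "rolling stones", "elvis"},
]


def significant_recommendations(users, user_likes, size=3):
    """Rank artists by JLH: (fg% - bg%) * (fg% / bg%) over users sharing a like."""
    foreground = [user for user in users if user & user_likes]
    fg_counts = Counter(artist for user in foreground for artist in user)
    bg_counts = Counter(artist for user in users for artist in user)  # includes foreground
    scores = {}
    for artist, fg_count in fg_counts.items():
        if artist in user_likes:
            continue  # the "exclude" from the slide
        fg_pct = fg_count / len(foreground)
        bg_pct = bg_counts[artist] / len(users)
        change = fg_pct - bg_pct
        scores[artist] = change * (fg_pct / bg_pct) if change > 0 else 0.0
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:size]


for artist, score in significant_recommendations(USERS, {"radiohead"}):
    print(f"{artist}: {score:.3f}")
# massive attack: 1.111
# portishead: 1.111
# beatles: 0.000
```

The Beatles drop to the bottom: they are exactly as common among all users as among Radiohead fans, so they say nothing about this user's taste.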

  19. www.elastic.co
    Super-connected nodes in graphs
    We just figured out a way to follow only the
    meaningful connections in a graph!
    [Diagram: Concepts A, B and C, with some connections marked useful and others useless]

  20. Meaningful
    Connections

  21. Example: Wikipedia
    [Diagram: the John Lennon and Ringo Starr articles, linked to terms such as Yoko Ono, John Lennon, United States and Beatles]
    Wikipedia articles are documents
    Links and page titles are terms

  22. Aggregation + Relevancy
    is how we look at the world

  23. Thanks!
    Honza Král
    @honzakral
