Using the percolator for simple classification

Presented at the February Search Meetup Munich

This presentation gives a very quick introduction to Elasticsearch's percolator and showcases the potential of performing document enrichment before indexing a document.

Elasticsearch Inc

February 05, 2015

Transcript

  1. Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
    Alexander Reelsen
    @spinscale
    [email protected]
    Using the percolator for simple classification

  2. About...
    • ... Elasticsearch
      Founded 2012
      Offices in Mountain View, Amsterdam, London, Berlin, Phoenix
      VC by Benchmark, Index Ventures & NEA
      Trainings, Development/Production Support
      Products: Elasticsearch, Logstash, Kibana, Marvel, Shield
    • ... me
      joined early 2013
      interested in scalability/concurrency
      Core/Shield developer, blogger, trainer, supporter, speaker

  3. Perco... what?
    • Search, but reversed
    • Normal: indexing documents & executing queries
    • How about: indexing queries and firing documents against them?
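
The reversal on this slide can be illustrated with a toy sketch in plain Python. It does not talk to Elasticsearch; the substring check stands in for a real query, and all ids and texts (`doc-1`, `query-audi`, ...) are hypothetical.

```python
from typing import Dict, List

# Normal search: documents are stored, a query is fired against them.
stored_docs: Dict[str, str] = {
    "doc-1": "Audi 90 quattro",
    "doc-2": "VW Golf",
}

def search(term: str) -> List[str]:
    """Return ids of stored documents whose text contains the term."""
    return [doc_id for doc_id, text in stored_docs.items() if term in text.lower()]

# Reversed: queries (here just terms) are stored, a document is fired against them.
stored_queries: Dict[str, str] = {
    "query-audi": "audi",
    "query-golf": "golf",
}

def percolate(text: str) -> List[str]:
    """Return ids of stored queries that match the incoming document text."""
    return [query_id for query_id, term in stored_queries.items() if term in text.lower()]

print(search("audi"))                # ids of matching documents
print(percolate("Audi 90 quattro"))  # ids of matching queries
```

In the first call the result is a list of document ids; in the second it is a list of query ids, which is the shape of a percolate response.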

  4. Example: Alerting
    PUT /stocks/.percolator/sell-immediately
    {
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                { "term": { "name": "GOOG" } },
                { "range": { "value": { "gte": 1000 } } }
              ]
            }
          }
        }
      }
    }

  5. Example: Alerting
    GET /stocks/stock/_percolate
    {
      "doc": {
        "name": "GOOG",
        "value": 1100.52
      }
    }

  6. Example: Alerting
    GET /stocks/stock/_percolate
    {
      "doc": {
        "name": "GOOG",
        "value": 1100.52
      }
    }

    {
      ...
      "total": 1,
      "matches": [
        { "_index": "stocks", "_id": "sell-immediately" }
      ]
    }
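
To see why the percolated document matches `sell-immediately`, here is a minimal sketch that evaluates the registered filter in plain Python. The `matches` function is a hypothetical helper covering only `term`, `range` and `bool`/`must`; it ignores analysis and mappings and is not how Elasticsearch evaluates filters internally.

```python
import operator
from typing import Any, Dict

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

def matches(clause: Dict[str, Any], doc: Dict[str, Any]) -> bool:
    """Evaluate a small subset of the filter DSL (term, range, bool/must)."""
    kind, body = next(iter(clause.items()))
    if kind == "term":
        field, expected = next(iter(body.items()))
        return doc.get(field) == expected
    if kind == "range":
        field, bounds = next(iter(body.items()))
        value = doc.get(field)
        return value is not None and all(
            RANGE_OPS[op](value, bound) for op, bound in bounds.items()
        )
    if kind == "bool":
        return all(matches(sub, doc) for sub in body.get("must", []))
    raise ValueError(f"unsupported clause: {kind}")

# The filter registered as "sell-immediately" in the slides.
sell_immediately = {"bool": {"must": [
    {"term": {"name": "GOOG"}},
    {"range": {"value": {"gte": 1000}}},
]}}

print(matches(sell_immediately, {"name": "GOOG", "value": 1100.52}))
print(matches(sell_immediately, {"name": "GOOG", "value": 999.0}))
```

Both clauses sit under `must`, so the alert fires only when the name matches and the value has reached the threshold.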

  7. Example: Back-in-stock notify
    PUT /products/.percolator/alr-XYZ
    {
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                { "term": { "id": "XYZ" } },
                { "range": { "stock": { "gt": 1 } } }
              ]
            }
          }
        }
      }
    }

  8. Example: Back-in-stock notify
    GET /products/product/_percolate
    {
      "doc": {
        "id": "XYZ",
        "stock": 200
      }
    }

  9. Example: Back-in-stock notify
    GET /products/product/_percolate
    {
      "doc": {
        "id": "XYZ",
        "stock": 200
      }
    }

    {
      ...
      "total": 1,
      "matches": [
        { "_index": "products", "_id": "alr-XYZ" }
      ]
    }
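
A back-in-stock alert needs one registered query per watched product, so the request is usually generated. The sketch below only builds the path and body used in the slides; `back_in_stock_query` is a hypothetical helper name, and sending the request is left to whatever HTTP client is in use.

```python
import json
from typing import Any, Dict, Tuple

def back_in_stock_query(product_id: str) -> Tuple[str, Dict[str, Any]]:
    """Build the PUT path and body for one product's back-in-stock alert."""
    path = f"/products/.percolator/alr-{product_id}"
    body = {"query": {"filtered": {"filter": {"bool": {"must": [
        {"term": {"id": product_id}},
        {"range": {"stock": {"gt": 1}}},  # same threshold as in the slide
    ]}}}}}
    return path, body

path, body = back_in_stock_query("XYZ")
print("PUT", path)
print(json.dumps(body, indent=2))
```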

  10. Country tagging by lat/lon
    • Index all countries as geoshapes in percolator queries
    • Percolate a document with latitude/longitude
    • Get back the country
    • Index your document, including the country, in your index
    • Aggregate/search by country!

  11. Enrich before index
    • Document was classified (by lat/lon) before indexing!
    • So, what could we use this for...?
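
The enrich-before-index flow can be sketched without a cluster. As a loud simplification, the sketch uses crude bounding boxes instead of real geoshapes; the box coordinates are rough, hypothetical values, and `percolate_countries` and `enrich` are illustrative names, not Elasticsearch APIs.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, deliberately crude boxes: (min_lat, max_lat, min_lon, max_lon).
# Real country shapes are polygons; boxes like these overlap near borders.
COUNTRY_BOXES: Dict[str, Tuple[float, float, float, float]] = {
    "DE": (47.3, 55.1, 5.9, 15.0),
    "NL": (50.7, 53.6, 3.3, 7.2),
}

def percolate_countries(lat: float, lon: float) -> List[str]:
    """Return the codes of all registered 'country queries' containing the point."""
    return [
        code
        for code, (min_lat, max_lat, min_lon, max_lon) in COUNTRY_BOXES.items()
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    ]

def enrich(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Classify the document by location, then return a copy ready for indexing."""
    location = doc["location"]
    enriched = dict(doc)
    enriched["country"] = percolate_countries(location["lat"], location["lon"])
    return enriched

print(enrich({"title": "Search Meetup", "location": {"lat": 52.52, "lon": 13.40}}))
```

The enriched copy, not the original, is what would be indexed, so the `country` field becomes available for aggregations and searches.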

  12. How does it work?
    • All registered queries are loaded into memory
    • Each doc is indexed into an in-memory index
    • All queries are executed against the in-memory index
    • The in-memory index gets removed
    • Matched queries are returned in the response
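
The five steps above can be mimicked in a few lines of plain Python. This is a toy model under stated assumptions: each registered query is reduced to a hypothetical (field, term) pair, and a dict of token sets stands in for the in-memory index.

```python
import re
from typing import Dict, List, Set, Tuple

# Step 1: registered queries are held in memory; each is a (field, term) pair here.
REGISTERED: Dict[str, Tuple[str, str]] = {
    "manufacturer-audi": ("message", "audi"),
    "manufacturer-bmw": ("message", "bmw"),
}

def build_memory_index(doc: Dict[str, str]) -> Dict[str, Set[str]]:
    """Step 2: index the single document into a throwaway in-memory structure."""
    return {field: set(re.findall(r"\w+", text.lower())) for field, text in doc.items()}

def percolate(doc: Dict[str, str]) -> List[str]:
    index = build_memory_index(doc)
    # Step 3: run every registered query against the in-memory index.
    matched = [
        query_id
        for query_id, (field, term) in REGISTERED.items()
        if term in index.get(field, set())
    ]
    # Step 4: discard the in-memory index.
    del index
    # Step 5: return the ids of the matching queries.
    return matched

print(percolate({"message": "Audi 90 quattro, 170PS"}))
```

Note that in this model every registered query runs for every percolated document, so the work per document grows with the number of registered queries.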

  13. A blast from the past...
    Audi 90 quattro, 170PS, ABS, eFH,
    el.SP, WFS, 4WD, WR, ZV, FFB, EZ
    12/90, MFA, 140000 km, HU 12/15,
    5000 VB

  14. As JSON document
    {
      "brand" : "audi",
      "type" : "90",
      "engine" : {
        "hp" : "170",
        "cylinders" : "5",
        "capacity" : 2309
      },
      "price" : {
        "value" : 5000,
        "type" : "negotiable"
      },
      "registration" : "1990-12-01",
      "mileage" : {
        "value" : 140000,
        "unit" : "km"
      },
      "inspection" : "2015-12-01",
      "features" : [
        "anti-lockbraking-system",
        "power-windows",
        "remote-door-lock",
        "multi-functional-display",
        "power-mirrors",
        "anti-theft-protection",
        "4wd",
        "winter-tires",
        "central-locking"
      ]
    }

  16. As JSON document
      "inspection" : "2015-12-01",
      "features" : [
        "anti-lockbraking-system",
        "power-windows",
        "remote-door-lock",
        "multi-functional-display",
        "power-mirrors",
        "anti-theft-protection",
        "4wd",
        "winter-tires",
        "central-locking"
      ]
    }
    ABS
    eFH
    FFB
    MFA
    el. SP
    WFS
    4WD
    WR
    ZV
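
The abbreviations on this slide appear in the same order as the entries of the `features` array, which suggests a one-to-one mapping. The sketch below writes that pairing down as a lookup table; the pairing itself is an assumption read off the slide order, and `extract_features` is a hypothetical helper that uses a plain dictionary rather than percolator queries.

```python
from typing import Dict, List

# Assumed pairing, taken from the order of the two lists on the slide.
ABBREVIATIONS: Dict[str, str] = {
    "abs": "anti-lockbraking-system",
    "efh": "power-windows",
    "ffb": "remote-door-lock",
    "mfa": "multi-functional-display",
    "el.sp": "power-mirrors",
    "wfs": "anti-theft-protection",
    "4wd": "4wd",
    "wr": "winter-tires",
    "zv": "central-locking",
}

def extract_features(ad_text: str) -> List[str]:
    """Map each comma-separated abbreviation in the ad to a feature name."""
    features = []
    for part in ad_text.split(","):
        key = "".join(part.split()).lower()  # drop whitespace, so "el. SP" == "el.SP"
        if key in ABBREVIATIONS:
            features.append(ABBREVIATIONS[key])
    return features

AD = ("Audi 90 quattro, 170PS, ABS, eFH, el.SP, WFS, 4WD, WR, ZV, FFB, "
      "EZ 12/90, MFA, 140000 km, HU 12/15, 5000 VB")
print(extract_features(AD))
```

With percolation, each table entry would instead become its own registered query, which is what the following slides build up to.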

  17. Extracting manufacturer
    PUT /ads/.percolator/manufacturer-audi
    {
      "query": {
        "match": {
          "message": "audi"
        }
      }
    }

  18. Extracting manufacturer
    GET /ads/ad/_percolate
    {
      "doc": {
        "message": "Audi 90 quattro, 170PS, ABS, eFH, el.SP, WFS, 4WD, WR, ZV, FFB, EZ 12/90, MFA, 140000 km, HU 12/15, 5000 VB"
      }
    }

    {
      ...
      "total": 1,
      "matches": [
        { "_index": "ads", "_id": "manufacturer-audi" }
      ]
    }
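
Once the percolate response is back, the matched query ids still have to be turned into document fields before indexing. A minimal sketch, assuming the `manufacturer-…` and `feature-…` id naming used in the slides and the `brand`/`features` fields of the JSON document shown earlier; `enrich_from_matches` is a hypothetical helper, and the sample response combines two matches for illustration.

```python
from typing import Any, Dict

def enrich_from_matches(doc: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the document and add fields derived from the matched query ids."""
    enriched = dict(doc)
    features = list(doc.get("features", []))
    for match in response.get("matches", []):
        kind, _, value = match["_id"].partition("-")
        if kind == "manufacturer":
            enriched["brand"] = value
        elif kind == "feature":
            features.append(value)
    if features:
        enriched["features"] = features
    return enriched

# Hypothetical response containing one manufacturer and one feature match.
response = {"total": 2, "matches": [
    {"_index": "ads", "_id": "manufacturer-audi"},
    {"_index": "ads", "_id": "feature-4wd"},
]}
ad = {"message": "Audi 90 quattro, 170PS, ABS"}
print(enrich_from_matches(ad, response))
```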

  19. Extracting features
    PUT /ads/.percolator/feature-4wd
    {
      "query": {
        "match": {
          "message": "4wd quattro 4matic awd"
        }
      }
    }
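
A `match` query with several terms uses OR semantics by default, so any one of the synonyms is enough to tag the ad as four-wheel drive. The sketch below imitates that behaviour in plain Python; `analyze` is a crude, hypothetical stand-in for the analyzer (lowercasing and splitting on non-alphanumerics), not the real analysis chain.

```python
import re
from typing import List

def analyze(text: str) -> List[str]:
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_any(query_text: str, message: str) -> bool:
    """True if at least one query term occurs in the message (OR semantics)."""
    return bool(set(analyze(query_text)) & set(analyze(message)))

FEATURE_4WD = "4wd quattro 4matic awd"
print(match_any(FEATURE_4WD, "Audi 90 quattro, 170PS, ABS"))
print(match_any(FEATURE_4WD, "Mercedes E 320 4MATIC"))
print(match_any(FEATURE_4WD, "VW Golf, ZV, eFH"))
```

Listing brand-specific names in a single query keeps one feature id per concept, regardless of how the seller spelled it out.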

  20. Features with typos
    PUT /ads/.percolator/feature-4wd
    {
      "query": {
        "bool": {
          "should": [
            {
              "match": { "message": "4wd quattro 4matic awd" }
            },
            {
              "match": {
                "message": { "query": "quattro", "fuzziness": 1 }
              }
            }
          ]
        }
      }
    }
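
The second `should` clause tolerates one edit, so a misspelling such as "quatro" still matches. A rough sketch of the idea using plain Levenshtein distance follows; this is a simplification, since Elasticsearch's fuzzy matching can also treat a transposition of two adjacent characters as a single edit, which this code does not. `fuzzy_contains` is a hypothetical helper name.

```python
import re

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]

def fuzzy_contains(term: str, message: str, max_edits: int = 1) -> bool:
    """True if any token of the message is within max_edits of the term."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return any(edit_distance(term, token) <= max_edits for token in tokens)

print(fuzzy_contains("quattro", "Audi 90 quatro, ABS"))
print(fuzzy_contains("quattro", "Audi 90, ABS"))
```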

  22. Use-cases
    • Alerting
      Price monitoring
      News alerts
      Stock alerts
      Logs
    • Enrich before indexing
      Targeted advertisement
      Classification/Extraction
      Wizards/Helpers

  23. Thanks for listening!
    Alexander Reelsen
    @spinscale
    [email protected]
    We’re hiring!
    http://elasticsearch.com/jobs
    We’re helping!
    http://elasticsearch.com/support
    http://elasticsearch.com/training

  24. Resources
    http://www.elasticsearch.org/blog/using-percolator-geo-tagging/
    http://www.elasticsearch.org/blog/percolator-redesign-blog-post/
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html