Using the percolator for simple classification

Presented at the February Search Meetup Munich

This presentation gives a very quick introduction to Elasticsearch's percolator and shows how it can be used to enrich a document before it is indexed.

Elasticsearch Inc

February 05, 2015

Transcript

  1. Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited.

    Using the percolator for simple classification
    Alexander Reelsen, @spinscale
  2. About...
    • ... Elasticsearch
      Founded 2012
      Offices in Mountain View, Amsterdam, London, Berlin, Phoenix
      VC by Benchmark, Index Ventures & NEA
      Trainings, development/production support
      Products: Elasticsearch, Logstash, Kibana, Marvel, Shield
    • ... me
      Joined early 2013
      Interested in scalability/concurrency
      Core/Shield developer, blogger, trainer, supporter, speaker
  3. Perco... what?
    • Search, but reversed
    • Normal: indexing documents & executing queries
    • How about: indexing queries and firing documents against them?
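To make the reversal concrete, here is a toy model in Python. Plain predicates stand in for Elasticsearch queries, and the class and method names are hypothetical (they are not part of any Elasticsearch client): queries are registered once, and every incoming document is checked against all of them.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Query = Callable[[Doc], bool]


class ToyPercolator:
    """Reverse search: queries are stored up front, documents are fired at them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        # The "indexing" step, except that what gets stored is a query.
        self._queries[query_id] = query

    def percolate(self, doc: Doc) -> List[str]:
        # The "search" step, except that the input is a document and the
        # output is the list of stored queries it matches.
        return [qid for qid, query in self._queries.items() if query(doc)]


percolator = ToyPercolator()
percolator.register("cheap", lambda d: d.get("price", 0) < 100)
percolator.register("expensive", lambda d: d.get("price", 0) >= 100)
print(percolator.percolate({"price": 42}))  # ['cheap']
```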
  4. Example: Alerting
    PUT /stocks/.percolator/sell-immediately
    {
      "query": { "filtered": { "filter": { "bool": { "must": [
        { "term": { "name": "GOOG" } },
        { "range": { "value": { "gte": 1000 } } }
      ] } } } }
    }
  5. Example: Alerting
    GET /stocks/stock/_percolate
    { "doc": { "name": "GOOG", "value": 1100.52 } }
  6. Example: Alerting
    GET /stocks/stock/_percolate
    { "doc": { "name": "GOOG", "value": 1100.52 } }

    Response:
    { ... "total": 1, "matches": [ { "_index": "stocks", "_id": "sell-immediately" } ] }
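Why does the GOOG document match `sell-immediately`? A minimal Python sketch that evaluates only the subset of the query DSL used on the alerting slides (`filtered`, `bool`/`must`, `term`, `range`) against a single document. It is a toy evaluator, not how Elasticsearch executes queries; in particular the `term` check compares values exactly, which assumes `name` is a not-analyzed field.

```python
import operator
from typing import Any, Dict

# Comparison operators used by the "range" clause.
RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def matches(clause: Dict[str, Any], doc: Dict[str, Any]) -> bool:
    """Evaluate a tiny subset of the query DSL against one document."""
    if "filtered" in clause:
        return matches(clause["filtered"]["filter"], doc)
    if "bool" in clause:
        # Every "must" clause has to match.
        return all(matches(c, doc) for c in clause["bool"].get("must", []))
    if "term" in clause:
        ((field, value),) = clause["term"].items()
        # Exact comparison, i.e. assumes the field is not analyzed.
        return doc.get(field) == value
    if "range" in clause:
        ((field, bounds),) = clause["range"].items()
        return field in doc and all(RANGE_OPS[op](doc[field], b) for op, b in bounds.items())
    raise ValueError(f"unsupported clause: {sorted(clause)}")


# The body registered as "sell-immediately" on the slide.
SELL_IMMEDIATELY = {
    "query": {"filtered": {"filter": {"bool": {"must": [
        {"term": {"name": "GOOG"}},
        {"range": {"value": {"gte": 1000}}},
    ]}}}}
}

print(matches(SELL_IMMEDIATELY["query"], {"name": "GOOG", "value": 1100.52}))  # True
print(matches(SELL_IMMEDIATELY["query"], {"name": "GOOG", "value": 998.10}))   # False
```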
  7. Example: Back-in-stock notify
    PUT /products/.percolator/alr-XYZ
    {
      "query": { "filtered": { "filter": { "bool": { "must": [
        { "term": { "id": "XYZ" } },
        { "range": { "stock": { "gt": 1 } } }
      ] } } } }
    }
  8. Example: Back-in-stock notify
    GET /products/product/_percolate
    { "doc": { "id": "XYZ", "stock": 200 } }
  9. Example: Back-in-stock notify
    GET /products/product/_percolate
    { "doc": { "id": "XYZ", "stock": 200 } }

    Response:
    { ... "total": 1, "matches": [ { "_index": "products", "_id": "alr-XYZ" } ] }
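On the application side, the percolate response only says which queries matched; turning that into a notification is up to the caller. A minimal sketch of that step, assuming the alert queries are named `alr-<product id>` as in the example above. The response string contains only the fields shown on the slide (the ones elided with "..." are left out), and the notification itself is a hypothetical placeholder.

```python
import json
from typing import List

# Response body as shown on the slide, without the fields elided by "...".
RESPONSE = '{"total": 1, "matches": [{"_index": "products", "_id": "alr-XYZ"}]}'

# Assumed naming convention for alert queries: alr-<product id>.
ALERT_PREFIX = "alr-"


def products_back_in_stock(response_body: str) -> List[str]:
    """Return the product IDs whose back-in-stock alert query matched."""
    body = json.loads(response_body)
    ids = [match["_id"] for match in body.get("matches", [])]
    return [i[len(ALERT_PREFIX):] for i in ids if i.startswith(ALERT_PREFIX)]


for product_id in products_back_in_stock(RESPONSE):
    # Hypothetical hook: look up the subscribers of this product and notify them.
    print(f"notify subscribers of product {product_id}")
```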
  10. Country tagging by lat/lon
    • Index the shape of every country as a percolator query (geo shapes)
    • Percolate a document with latitude/longitude
    • Get back the country
    • Index your document, including the country, into your index
    • Aggregate/search by country!
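The classification step in the list above boils down to "which registered shape contains this point?". A toy Python sketch of that question using ray casting: the two "countries" are made-up rectangles, coordinates are treated as planar (ignoring the Earth's curvature), and Elasticsearch itself would evaluate geo shape queries rather than anything like this code.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar coordinates

# Hypothetical, made-up shapes standing in for registered country queries.
COUNTRY_SHAPES: Dict[str, List[Point]] = {
    "country-a": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "country-b": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: inside if a ray to the right crosses an odd number of edges."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_country(lon: float, lat: float) -> Optional[str]:
    """Return the first country whose shape contains the point, or None."""
    for country, shape in COUNTRY_SHAPES.items():
        if point_in_polygon((lon, lat), shape):
            return country
    return None


print(tag_country(5.0, 5.0))   # country-a
print(tag_country(15.0, 5.0))  # country-b
```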
  11. Enrich before index
    • The document was classified (by lat/lon) before indexing!
    • So, what could we use this for...?
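The enrich-before-index pattern is independent of what the classifier is: percolate, copy the matched query IDs into the document, then index the enriched document. A minimal sketch, where `toy_percolate` is a hypothetical stand-in for a real percolate request and a plain list stands in for the index.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def enrich(doc: Doc, percolate: Callable[[Doc], List[str]], field: str = "tags") -> Doc:
    """Return a copy of doc with the IDs of the matching queries stored under `field`."""
    enriched = dict(doc)
    enriched[field] = percolate(doc)
    return enriched


def toy_percolate(doc: Doc) -> List[str]:
    # Hypothetical stand-in for a percolate request: substring checks
    # instead of real queries.
    text = doc["message"].lower()
    rules = [("manufacturer-audi", "audi"), ("manufacturer-bmw", "bmw")]
    return [query_id for query_id, term in rules if term in text]


# Stand-in for the real index: enriched documents are simply collected here.
index: List[Doc] = []
for ad in [{"message": "Audi 90 quattro"}, {"message": "BMW 325i"}, {"message": "Audi 80"}]:
    index.append(enrich(ad, toy_percolate))

# The labels are now ordinary stored fields, so they can be counted like any other field.
tag_counts = Counter(tag for stored in index for tag in stored["tags"])
print(tag_counts)
```

Copying the document instead of mutating it keeps the raw input untouched, which makes it easy to re-run the classification later with a different set of registered queries.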
  12. How does it work?
    • All registered queries are loaded into memory
    • Each document is indexed into an in-memory index
    • All queries are executed against that in-memory index
    • The in-memory index is removed
    • The matching queries are returned in the response
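The five steps above can be mimicked in a few lines of Python. This is a drastically simplified model: Elasticsearch does this with Lucene, real queries are far richer than the term sets used here, and `analyze` is only a rough stand-in for an analyzer. The query IDs are hypothetical news alerts.

```python
import re
from typing import Dict, List, Set

# Step 1: all registered queries are held in memory. Each toy query is a set
# of terms and matches if any of them occurs in the document (OR semantics).
REGISTERED: Dict[str, Set[str]] = {
    "news-elasticsearch": {"elasticsearch", "lucene"},
    "news-munich": {"munich"},
}


def analyze(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def percolate(text: str) -> List[str]:
    # Step 2: index the single document into a throwaway inverted index
    # (term -> positions).
    memory_index: Dict[str, List[int]] = {}
    for pos, term in enumerate(analyze(text)):
        memory_index.setdefault(term, []).append(pos)
    # Step 3: execute every registered query against that tiny index.
    matched = [qid for qid, terms in REGISTERED.items()
               if any(term in memory_index for term in terms)]
    # Step 4: memory_index is dropped when the function returns.
    # Step 5: only the IDs of the matching queries are returned.
    return matched


print(percolate("Search Meetup Munich: Elasticsearch percolator"))
```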
  13. A blast from the past...
    Audi 90 quattro, 170PS, ABS, eFH, el.SP, WFS, 4WD, WR, ZV, FFB, EZ 12/90, MFA, 140000 km, HU 12/15, 5000 VB
    (German used-car ad shorthand: PS = metric horsepower, EZ = first registration, HU = next roadworthiness inspection due, VB = price negotiable)
  14. As JSON document
    {
      "brand" : "audi",
      "type" : "90",
      "engine" : { "hp" : "170", "cylinders" : "5", "capacity" : 2309 },
      "price" : { "value" : 5000, "type" : "negotiable" },
      "registration" : "1990-12-01",
      "mileage" : { "value" : 140000, "unit" : "km" },
      "inspection" : "2015-12-01",
      "features" : [ "anti-lock-braking-system", "power-windows", "remote-door-lock",
                     "multi-functional-display", "power-mirrors", "anti-theft-protection",
                     "4wd", "winter-tires", "central-locking" ]
    }
  16. As JSON document
    ...
      "inspection" : "2015-12-01",
      "features" : [ "anti-lock-braking-system", "power-windows", "remote-door-lock",
                     "multi-functional-display", "power-mirrors", "anti-theft-protection",
                     "4wd", "winter-tires", "central-locking" ]
    }
    Abbreviations in the ad and the features they stand for:
    • ABS = anti-lock-braking-system
    • eFH = power-windows
    • FFB = remote-door-lock
    • MFA = multi-functional-display
    • el. SP = power-mirrors
    • WFS = anti-theft-protection
    • 4WD = 4wd
    • WR = winter-tires
    • ZV = central-locking
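Done by hand, that mapping is just a lookup table. A naive Python sketch of the lookup (the table mirrors the pairing above; the function name is hypothetical) shows both how far it gets and why it is brittle: it only works on exact tokens, and even the slides themselves write "el. SP" in one place and "el.SP" in the ad.

```python
from typing import Dict, List

# Abbreviation -> feature name, following the pairing on the slide.
ABBREVIATIONS: Dict[str, str] = {
    "ABS": "anti-lock-braking-system",
    "eFH": "power-windows",
    "FFB": "remote-door-lock",
    "MFA": "multi-functional-display",
    "el.SP": "power-mirrors",
    "WFS": "anti-theft-protection",
    "4WD": "4wd",
    "WR": "winter-tires",
    "ZV": "central-locking",
}

AD = ("Audi 90 quattro, 170PS, ABS, eFH, el.SP, WFS, 4WD, WR, ZV, FFB, "
      "EZ 12/90, MFA, 140000 km, HU 12/15, 5000 VB")


def extract_features(ad: str) -> List[str]:
    """Split the ad on commas and translate every token that is a known abbreviation."""
    tokens = [t.strip() for t in ad.split(",")]
    return [ABBREVIATIONS[t] for t in tokens if t in ABBREVIATIONS]


print(extract_features(AD))
```

Note that "quattro" is not picked up as 4WD here, because it is part of the token "Audi 90 quattro"; handling variants and synonyms like that is what the percolator queries on the following slides are for.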
  17. Extracting manufacturer
    PUT /ads/.percolator/manufacturer-audi
    { "query": { "match": { "message": "audi" } } }
  18. Extracting manufacturer
    GET /ads/ad/_percolate
    { "doc": { "message": "Audi 90 quattro, 170PS, ABS, eFH, el.SP, WFS, 4WD, WR, ZV, FFB, EZ 12/90, MFA, 140000 km, HU 12/15, 5000 VB" } }

    Response:
    { ... "total": 1, "matches": [ { "_index": "ads", "_id": "manufacturer-audi" } ] }
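With one registered query per manufacturer, the query bodies can be generated and the match IDs mapped back to a brand. A minimal sketch assuming the `manufacturer-<brand>` naming used above; the brand list is hypothetical, and the code only builds the JSON bodies (each would be sent as `PUT /ads/.percolator/<id>`) rather than calling any client library.

```python
import json
from typing import Dict, List, Optional

PREFIX = "manufacturer-"


def manufacturer_queries(brands: List[str]) -> Dict[str, dict]:
    """One percolator query body per brand, keyed by the ID it would be registered under."""
    return {PREFIX + b.lower(): {"query": {"match": {"message": b.lower()}}} for b in brands}


def manufacturer_from_matches(match_ids: List[str]) -> Optional[str]:
    """Map the first manufacturer-* match ID back to a brand name."""
    for match_id in match_ids:
        if match_id.startswith(PREFIX):
            return match_id[len(PREFIX):]
    return None


# Hypothetical brand list.
for query_id, body in manufacturer_queries(["Audi", "BMW", "Opel"]).items():
    print(query_id, json.dumps(body))

print(manufacturer_from_matches(["manufacturer-audi"]))  # audi
```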
  19. Extracting features
    PUT /ads/.percolator/feature-4wd
    { "query": { "match": { "message": "4wd quattro 4matic awd" } } }
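This works because a match query combines its terms with OR by default, so the document needs to contain only one of "4wd", "quattro", "4matic" or "awd". A toy Python model of that behaviour; `analyze` is a rough stand-in for an analyzer, and the ad texts are hypothetical.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def match_any(query_text: str, field_text: str) -> bool:
    """Default match behaviour: true if any analyzed query term occurs in the field."""
    return bool(set(analyze(query_text)) & set(analyze(field_text)))


FEATURE_4WD = "4wd quattro 4matic awd"

ads = ["Audi 90 quattro, 170PS", "Mercedes E 320 4MATIC", "Opel Corsa, 3 doors"]
print([ad for ad in ads if match_any(FEATURE_4WD, ad)])
```

A misspelled ad such as "Audi 90 quatro" does not match, since exact terms are compared; that gap is what the fuzzy variant on the next slide addresses.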
  20. Features with typos
    PUT /ads/.percolator/feature-4wd
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "message": "4wd quattro 4matic awd" } },
            { "match": { "message": { "query": "quattro", "fuzziness": 1 } } }
          ]
        }
      }
    }
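`"fuzziness": 1` allows one edit between the query term and a term in the document. A minimal Python sketch of that idea using plain Levenshtein distance (insert, delete, substitute). This is an approximation: Elasticsearch's fuzzy matching is implemented in Lucene, and details such as whether swapping two adjacent letters counts as one edit may differ from this version, which counts it as two.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, tokens: List[str], fuzziness: int = 1) -> bool:
    """True if any token is within `fuzziness` edits of the query term."""
    return any(levenshtein(term, tok) <= fuzziness for tok in tokens)


print(levenshtein("quattro", "quatro"))                  # 1
print(fuzzy_match("quattro", ["audi", "90", "quatro"]))  # True
print(fuzzy_match("quattro", ["audi", "90", "qatro"]))   # False
```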
  22. Use-cases
    • Alerting: price monitoring, news alerts, stock alerts, logs
    • Enrich before indexing: targeted advertisement, classification/extraction, wizards/helpers
  23. Thanks for listening!
    Alexander Reelsen, @spinscale
    We’re hiring! http://elasticsearch.com/jobs
    We’re helping! http://elasticsearch.com/support http://elasticsearch.com/training
  24. Resources
    • http://www.elasticsearch.org/blog/using-percolator-geo-tagging/
    • http://www.elasticsearch.org/blog/percolator-redesign-blog-post/
    • http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html