Tokyo - ES study session III - Geohashes

chilling
February 07, 2014

My talk about Geohashes in Elasticsearch at the 3rd ES study session in Tokyo, Japan (February 7th, 2014)

Transcript

  1. [email protected] @flo_schilling chilling Install Marvel

    elasticsearch:0.90.11 schilling$ bin/plugin -i elasticsearch/marvel/latest
    -> Installing elasticsearch/marvel/latest...
    Trying http://download.elasticsearch.org/elasticsearch/marvel/marvel-latest.zip...
    Downloading ..........DONE
    Installed elasticsearch/marvel/latest into /Downloads/elasticsearch/0.90.11/plugins/marvel
    elasticsearch:0.90.11 schilling$ bin/elasticsearch -f
  2. Behind the scenes

    • An Elasticsearch cluster contains a collection of nodes
    • A node is a running instance of Elasticsearch
    • The cluster handles a set of indices
    • Each index is split into shards
    • Shards may have replicas on several nodes

    [Figure: a cluster with two nodes]
      Node-1: Index1 (Shard1, Replica2), Index2 (Shard3, Replica1)
      Node-2: Index1 (Shard2, Replica4), Index2 (Shard4, Replica3)
  3. Behind the scenes

    [Figure: a table linking terms (T1 to T4) to documents (D1 to D4)]

    • For each index, a reverse (inverted) index is stored
    • The reverse index links each term to the documents that contain it
    • Hence finding all documents containing a term is simple
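The lookup described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch or Lucene store their index; the function name `build_inverted_index` and the sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical sample documents.
docs = {
    "D1": "check out elasticsearch",
    "D2": "elasticsearch geohashes",
    "D3": "geohashes in tokyo",
}
index = build_inverted_index(docs)

# Finding all documents for a term is a single dictionary lookup.
print(sorted(index["geohashes"]))
```

Building the index costs one pass over all documents, but afterwards each term query is a plain key lookup, which is the property the later geohash slides rely on.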
  4. REST API

    • Elasticsearch provides a REST API
    • This API lets you communicate with the cluster via JSON
    • Documents in Elasticsearch are hash maps written in JSON
  5. REST API – Create an Index

    curl -XPOST localhost:9200/tweets -d '{
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 2
      },
      "mappings": {
        "tweet": {
          "_source": { "enabled": false },
          "properties": {
            "user": { "type": "string", "index": "not_analyzed" },
            "message": { "type": "string", "index": "not_analyzed" },
            "location": {
              "type": "geo_point",
              "geohash": true,
              "geohash_prefix": true,
              "geohash_precision": "3m",
              "neighbors": true
            }
          }
        }
      }
    }'
  6. REST API - Documents

    • Create a document

      curl -XPOST 127.0.0.1:9200/tweets/tweet/1 -d '{
        "user": "chilling",
        "message": "Check out ElasticSearch 0.90.11!",
        "location": "52.48677, 13.39164"
      }'

    • Get a document

      curl -XGET 127.0.0.1:9200/tweets/tweet/1

    • Update a document

      curl -XPOST 127.0.0.1:9200/tweets/tweet/1 -d '{
        "user": "chilling",
        "message": "Check out Elasticsearch 1.0!",
        "location": "52.48677, 13.39164"
      }'

    • Delete a document

      curl -XDELETE 127.0.0.1:9200/tweets/tweet/1
  7. REST API - Search

    curl -XGET localhost:9200/tweets/tweet/_search -d '{
      "query": {
        "term": { "user": "chilling" }
      }
    }'

    {
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "hits": [
          {
            "_index": "tweets",
            "_type": "tweet",
            "_id": "1",
            "_source": {
              "user": "chilling",
              "location": "52.48677, 13.39164",
              "message": "Check out Elasticsearch 1.0!"
            }
          }
        ]
      }
    }
  8. REST API - Search

    How to query the location?

    • Bounding Box
    • Geo-Distance
    • Polygon
    • Geohashes
  9. Geohashes - Encoding

    • Geohashes basically form a grid, like quadtrees do
    • Unlike quadtrees, which split each cell into 4, geohashes use 32 cells

      B C F G U V Y Z
      8 9 D E S T W X
      2 3 6 7 K M Q R
      0 1 4 5 H J N P
  10. Geohashes - Encoding

      B C F G U V Y Z
      8 9 D E S T W X
      2 3 6 7 K M Q R
      0 1 4 5 H J N P

    • Geohashes basically form a grid, like quadtrees do
    • Unlike quadtrees, which split each cell into 4, geohashes use 32 cells
    • The cell numbering follows a z-curve
    • Each cell can be subdivided into 32 cells
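The z-curve numbering can be made concrete with a small sketch. It assumes the standard geohash base-32 alphabet and the usual bit layout, in which the five bits of the first character alternate between longitude and latitude, starting with longitude. The function name `first_level_grid` is made up for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def first_level_grid():
    """Place each of the 32 characters into the 8 x 4 world grid."""
    grid = [[None] * 8 for _ in range(4)]
    for value, ch in enumerate(BASE32):
        # Bits b4 b3 b2 b1 b0: b4, b2, b0 select the column (longitude),
        # b3, b1 select the row (latitude).
        col = (((value >> 4) & 1) << 2) | (((value >> 2) & 1) << 1) | (value & 1)
        row = (((value >> 3) & 1) << 1) | ((value >> 1) & 1)
        grid[row][col] = ch
    return grid[::-1]  # northernmost row first, as on the slide


for row in first_level_grid():
    print(" ".join(row).upper())
```

Because consecutive values differ in the low bits first, walking through the alphabet in order traces the zig-zag pattern through the grid that gives the z-curve its name.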
  11. Geohashes - Hierarchy

    Level 1 (world):

      b c f g u v y z
      8 9 d e s t w x
      2 3 6 7 k m q r
      0 1 4 5 h j n p

    Level 2 (cell u):

      up ur ux uz
      un uq uw uy
      uj um ut uv
      uh uk us uu
      u5 u7 ue ug
      u4 u6 ud uf
      u1 u3 u9 uc
      u0 u2 u8 ub

    Level 3 (cell u3):

      u3b u3c u3f u3g u3u u3v u3y u3z
      u38 u39 u3d u3e u3s u3t u3w u3x
      u32 u33 u36 u37 u3k u3m u3q u3r
      u30 u31 u34 u35 u3h u3j u3n u3p
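To illustrate the hierarchy, the following sketch decodes a geohash into its bounding box by halving the longitude and latitude ranges bit by bit. It assumes the standard geohash alphabet and bit order; `bounding_box` is a name chosen for this example and not an Elasticsearch function.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def bounding_box(geohash):
    """Return (lat_min, lat_max, lon_min, lon_max) of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lat_lo, lat_hi, lon_lo, lon_hi


# Every longer prefix describes a smaller cell inside the previous one.
for cell in ("u", "u3", "u33"):
    print(cell, bounding_box(cell))
```

Each additional character shrinks the box by a factor of 32, with the split alternating between 8 columns by 4 rows and 4 columns by 8 rows, which matches the alternating grid shapes on the slide.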
  12. How to calculate a Geohash

    Binary search and encoding according to the character map.

    [Figure: the world grid and the subdivided U cell. The bit sequence 1 1 0 1 0 encodes to U, the next sequence 0 0 0 1 1 encodes to 3, and the remaining characters of the geohash are spelled out one by one.]
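The binary search from this slide can be sketched as follows. It is a minimal version of the commonly described geohash encoding, assuming the standard alphabet and longitude-first bit order; the function name `encode` and its default length are choices made for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the character map


def encode(lat, lon, length=12):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0       # bits collected for the current character
    bit_count = 0
    is_lon = True   # bits alternate, starting with longitude
    while len(chars) < length:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value = value << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value = value << 1
                lat_hi = mid
        is_lon = not is_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


# The tweet location used in the earlier slides.
print(encode(52.48677, 13.39164, length=5))  # prints "u33d8"
```

The first five decisions for this location are 1 1 0 1 0 (character u) and the next five are 0 0 0 1 1 (character 3), matching the bit sequences shown on the slide.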
  13. Why Geohashes

    • A Geo-distance filter needs a distance function to be applied to all documents within the result set
    • A bounding-box filter needs at least 4 comparisons per document
    • A polygon filter should be used for special cases only

    [Figure: the 32-cell world grid]
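To make the cost difference tangible, here is a rough sketch of the per-document work behind the first two filters. It uses the haversine formula as one possible distance function and a plain four-comparison box test; neither is claimed to be what Elasticsearch runs internally, and the function names and the box coordinates are assumptions for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, top, left, bottom, right):
    """Four comparisons per document."""
    return bottom <= lat <= top and left <= lon <= right


# Tweet location from the earlier slides, tested against a hypothetical
# centre point and a hypothetical box.
tweet = (52.48677, 13.39164)
print(haversine_km(52.5186, 13.4080, *tweet))
print(in_bounding_box(*tweet, top=53.0, left=13.0, bottom=52.0, right=14.0))
```

Both checks have to be evaluated for every candidate document, whereas a geohash lookup reduces to the term lookup discussed on the next slide.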
  14. Why Geohashes

    [Figure: a table linking geohash terms to documents (D1 to D4)]

    • A Geohash can be interpreted either as a cell with a certain size or as an imprecise point
    • After encoding a location as a geohash, it's a simple term
    • We store all prefixes of a geohash: u, u3, u33, ..., u33d8dyvcbt, u33d8dyvcbtt
    • Remember, a term lookup is a simple operation
    • Hence finding all documents within a geohash cell is simple
    • We can bind documents to a certain level of detail by controlling the length of geohashes
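The prefix idea can be sketched by combining it with a simple term index. The geohash for D1 is the one from the slide; the geohashes for D2 and D3 and the helper names `prefixes` and `index_location` are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def prefixes(geohash):
    """All prefixes of a geohash, from the coarsest cell to the finest."""
    return [geohash[:n] for n in range(1, len(geohash) + 1)]


index = defaultdict(set)  # geohash term -> document ids


def index_location(doc_id, geohash):
    """Store the document under every prefix of its geohash."""
    for term in prefixes(geohash):
        index[term].add(doc_id)


index_location("D1", "u33d8dyvcbtt")  # geohash from the slide
index_location("D2", "u33dc0")        # hypothetical nearby location
index_location("D3", "u0qjd")         # hypothetical distant location

# All documents inside cell u33d are found with one term lookup.
print(sorted(index["u33d"]))  # prints ['D1', 'D2']
```

The trade-off is index size: every location contributes one term per prefix length, so limiting the stored precision also limits the number of terms.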
  15. REST API – Filter by Geohash

    curl -XPOST localhost:9200/tweets/tweet/_search -d '{
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "geohash_cell": {
              "location": { "lat": 52.5186, "lon": 13.4080 },
              "precision": "3km",
              "neighbors": true
            }
          }
        }
      }
    }'
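The "neighbors" option presumably extends the match from a single cell to the cells surrounding it, which avoids missing points that sit just across a cell border. A sketch of how neighbouring cells can be derived is shown below; it works on integer column and row indices and is an illustration under the standard geohash bit layout, not Elasticsearch's implementation. The names `to_cell`, `from_cell` and `neighbors` are made up for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def to_cell(geohash):
    """De-interleave the bits into integer column (lon) and row (lat)."""
    col = row = 0
    i = 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if i % 2 == 0:
                col = (col << 1) | bit
            else:
                row = (row << 1) | bit
            i += 1
    return col, row


def from_cell(col, row, length):
    """Interleave column and row bits back into a geohash string."""
    total_bits = 5 * length
    col_shift = (total_bits + 1) // 2 - 1  # highest longitude bit
    row_shift = total_bits // 2 - 1        # highest latitude bit
    chars, value = [], 0
    for i in range(total_bits):
        if i % 2 == 0:
            bit = (col >> col_shift) & 1
            col_shift -= 1
        else:
            bit = (row >> row_shift) & 1
            row_shift -= 1
        value = (value << 1) | bit
        if i % 5 == 4:
            chars.append(BASE32[value])
            value = 0
    return "".join(chars)


def neighbors(geohash):
    """Return the up to 8 cells surrounding the given cell."""
    length = len(geohash)
    col_bits = (5 * length + 1) // 2
    row_bits = (5 * length) // 2
    col, row = to_cell(geohash)
    result = []
    for drow in (1, 0, -1):
        for dcol in (-1, 0, 1):
            if drow == 0 and dcol == 0:
                continue
            r = row + drow
            if not 0 <= r < (1 << row_bits):
                continue  # there is no cell beyond the poles
            c = (col + dcol) % (1 << col_bits)  # wrap at the antimeridian
            result.append(from_cell(c, r, length))
    return result


print(sorted(neighbors("u33")))
```

For the cell u33 this yields the eight surrounding cells of the level 3 grid; cells in the top or bottom row of the world grid have fewer neighbours because nothing lies beyond the poles.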