Elasticsearch in production at EyeEm

A quick experience report of our usage of Elasticsearch.

Lars Fronius

April 29, 2014

Transcript

  1. ABOUT ME

    • Happy Grumpy Cat of EyeEm
    • Started in ops at a scientific data center
    • Now dev
    • Developers hate me sometimes
  2. API STACK

    • PHP
    • MySQL (~10k commands per second)
    • Memcached (~50k commands per second)
    • Redis (~3k commands per second)
    • S3 (~1k commands per second, 40 million photos stored)
    • Elasticsearch (~250 commands per second, via elasticsearch-php)
    • All writes are async
    • Metrics everywhere
  3. CURRENT CLUSTER SPECS

    • 3 x m3.xlarge (4 cores, 15 GiB memory, 2 x 40 GB SSD)
    • cloud-aws plugin to interconnect the nodes
    • OpenJDK 1.6
    • 60% heap size (9 GiB)
    • 4 indexes with 5 shards each, from 1 GB to 15 GB
  4. CURRENT PRODUCTION USE-CASES
  5. CURRENT PRODUCTION USE-CASES: ALBUM SEARCH
  6. CURRENT PRODUCTION USE-CASES: PEOPLE SEARCH
  7. CURRENT PRODUCTION USE-CASES

    • City-Search
    • Live Nearby Discover
  8. CURRENT PRODUCTION USE-CASES: LIVE NEARBY
  9. CURRENT BETA USE-CASES
  10. CURRENT BETA USE-CASES
  11. LONG STORY

    • MyISAM full-text search
    • Album Search on one Elasticsearch node
    • People Search added
    • Scale-out to 3 instances for Photo Search (+ Live Nearby)
  12. ELASTICSEARCH INTERNALS

    • Index
      • What your application sees.
      • A logical namespace inside Elasticsearch.
      • Consists of a fixed number of primary shards, set when the index is created.
      • “To index” means to put your data into Elasticsearch so that it is persisted and available for search.
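Why the shard count is fixed becomes clearer once you look at how a document finds its shard: the shard number is derived from a hash of the routing key (by default the document id) modulo the number of primary shards. The following is a minimal sketch of that idea only. `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen for illustration, not the hash function Elasticsearch uses internally.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a shard number."""
    # Assumption: crc32 stands in for Elasticsearch's own hash function.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = ["167480", "167481", "167482"]  # example ids, made up for the sketch

# The same id always lands on the same shard as long as the shard count is stable.
with_five = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

# With a different shard count the modulo changes, so existing documents
# would no longer be found where they were written. Hence: reindex.
with_six = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}

print(with_five)
print(with_six)
```

Because the modulo depends on the shard count, changing it means writing all documents into a new index.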
  13. ELASTICSEARCH INTERNALS

    • Inverted index / mapping
      • The mapping tells Lucene how to create the inverted index in order to make data searchable.
      • E.g. “EyeEm” as an nGram{2,3} gets “indexed” as ["Ey", "ye", "eE", "Em", "Eye", "yeE", "eEm"]; “yeah” would be ["ye", "ea", "ah", "yea", "eah"].
  14. ELASTICSEARCH INTERNALS

    • Inverted index / mapping by example (document 1 = “EyeEm”, document 2 = “yeah”)

      Term   Documents
      Ey     1
      ye     1, 2
      eE     1
      Em     1
      Eye    1
      yeE    1
      eEm    1
      ea     2
      ah     2
      yea    2
      eah    2
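The n-gram example can be reproduced with a few lines of Python. This is a toy sketch of the idea, not how Lucene implements it: `ngrams` and `build_inverted_index` are hypothetical helpers, and the token order (all 2-grams first, then all 3-grams) simply follows the listing on the slide.

```python
from collections import defaultdict


def ngrams(text, min_gram=2, max_gram=3):
    """Return all character n-grams, shortest first, left to right."""
    return [
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    ]


def build_inverted_index(docs):
    """Map each n-gram to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text):
            index[gram].add(doc_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


index = build_inverted_index({1: "EyeEm", 2: "yeah"})

print(ngrams("EyeEm"))  # ['Ey', 'ye', 'eE', 'Em', 'Eye', 'yeE', 'eEm']
print(index["ye"])      # [1, 2]
```

A search for "ye" then becomes a single dictionary lookup that returns both documents, which is what makes the inverted index fast at query time.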
  15. SCHEMA-LESS OR WHAT?

    • Yes and no.
  16. SCHEMA-LESS OR WHAT?

    • Yes: you can put anything that can be formatted as JSON into your index, and you get a readable document back.
  17. SCHEMA-LESS OR WHAT?

    • No: you have to think first, because changing your mapping is expensive, since you have to reindex.
  18. ELASTICSEARCH INTERNALS

    • Shard
      • An instance of Lucene
      • Consists of multiple Lucene segments
      • Manages segments (merging, fsync, deletion, etc.)
  19. ELASTICSEARCH INTERNALS

    Segments API: http://example.es:9200/yourindex/_segments

    indices: {
      eyephoto6: {
        shards: {
          0: [
            {
              routing: {
                state: "STARTED",
                primary: true,
                node: "PiVDZW-VRYmeaVOy7afoWQ"
              },
              num_committed_segments: 2,
              num_search_segments: 3,
              segments: {
                _l: {
                  generation: 21,
                  num_docs: 13,
                  deleted_docs: 0,
                  size_in_bytes: 30810,
                  memory_in_bytes: 589,
                  committed: true,
                  search: true,
                  version: "4.7",
                  compound: true
                },
                _m: {
                  generation: 22,
                  num_docs: 371,
                  deleted_docs: 16,
                  size_in_bytes: 408548,
                  memory_in_bytes: 7365,
                  committed: false,
                  search: true,
                  version: "4.7",
                  compound: false
                },
                _n: {
                  generation: 23,
                  num_docs: 16,
                  deleted_docs: 0,
                  size_in_bytes: 38514,
                  memory_in_bytes: 615,
                  committed: false,
                  search: true,
                  version: "4.7",
                  compound: true
                }
              }
            }
          ],
          1: [
  20. ELASTICSEARCH INTERNALS

    • Segments
      • Managed by Elasticsearch
      • The storage for the inverted index
  21. ELASTICSEARCH INTERNALS

    • Basically, Elasticsearch is a Lucene cluster manager and API.
  22. LESSONS LEARNED: SHARDS / SEGMENTS

    • Deleting a document only marks it as deleted; it is not removed immediately.
    • Updating a document creates a new one and marks the old one as deleted.
    • The actual cleanup happens in the background and can result in unpleasant performance surprises.
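The delete and update behaviour described above can be modelled with a toy append-only segment. This is a deliberately simplified sketch: `Segment` and `merge` are hypothetical names, and real Lucene segments and merge policies are far more involved.

```python
class Segment:
    """Toy append-only segment: documents are never modified in place."""

    def __init__(self):
        self.docs = []        # (doc_id, source) tuples in insertion order
        self.deleted = set()  # positions marked as deleted (tombstones)

    def add(self, doc_id, source):
        self.docs.append((doc_id, source))

    def delete(self, doc_id):
        # Only a marker is set; the document still occupies space.
        for pos, (existing_id, _) in enumerate(self.docs):
            if existing_id == doc_id:
                self.deleted.add(pos)

    def update(self, doc_id, source):
        # An update is a delete of the old version plus an add of the new one.
        self.delete(doc_id)
        self.add(doc_id, source)

    def live_docs(self):
        return [d for pos, d in enumerate(self.docs) if pos not in self.deleted]


def merge(segments):
    """Copy only live documents into a fresh segment; space is reclaimed here."""
    merged = Segment()
    for segment in segments:
        for doc_id, source in segment.live_docs():
            merged.add(doc_id, source)
    return merged


seg = Segment()
seg.add("1", {"title": "Coffee"})
seg.update("1", {"title": "Coffee in Berlin"})
seg.add("2", {"title": "Beer"})
seg.delete("2")

print(len(seg.docs), len(seg.deleted))  # 3 2
compacted = merge([seg])
print(len(compacted.docs))              # 1
```

Three stored entries for one live document is the kind of overhead that only disappears when the merge runs, and the merge itself costs I/O, which is where the surprises come from.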
  23. LESSONS LEARNED: SHARDS / SEGMENTS

    • Nested documents live in the same Lucene segment as their parent.
    • They can bloat memory usage a lot.
    • They are treated like every other document.
    • If you don't always have to search in them, go for parent-child.
  24. LESSONS LEARNED: ELASTICSEARCH

    • Start with more than one instance; it is simple enough to set up.
    • Major upgrades are a pain (0.90 -> 1.1).
    • PHP client libraries mostly do not handle connection pools properly:
      • use elasticsearch-php with
        'connectionPoolClass' => '\Elasticsearch\ConnectionPool\StaticConnectionPool'
      • or let an intermediate web server handle it.
  25. LESSONS LEARNED: ELASTICSEARCH

    • You will index more than once. Promise. Be prepared.
    • Rebalancing is smooth, don't worry.
    • Have your metrics ready.
    • “You can have a good time with Elasticsearch if you don't ignore the complexity and internals of this distributed database.”
  26. LESSONS LEARNED: ELASTICSEARCH
  27. LESSONS LEARNED: ELASTICSEARCH
  28. LESSONS LEARNED: INDEX / MAPPING

    • Different analysers should go into separate fields.
      • Score them individually; this makes iterative optimisation possible.
      • Keep a raw field.
    • Use dynamic_templates once you have found the holy grail of field analysis.
    • Filter first! Querying and scoring are expensive.
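"Filter first" can be sketched as a request body in which cheap, non-scoring filters narrow down the candidate set before the scoring query runs. The sketch below builds such a body with the `filtered` query of the Elasticsearch 1.x era that this deck also uses elsewhere (later versions replaced it with a `bool` query and a `filter` clause). `filtered_search` is a hypothetical helper, and the `user_id` field and its value are assumptions made up for the example.

```python
import json


def filtered_search(query, filters):
    """Wrap a scoring query in an Elasticsearch 1.x `filtered` query."""
    return {
        "query": {
            "filtered": {
                # Only documents that pass the filter are scored by the query.
                "query": query,
                "filter": {"bool": {"must": filters}},
            }
        }
    }


body = filtered_search(
    query={"multi_match": {"query": "coff", "fields": ["topics"]}},
    filters=[{"term": {"user_id": 42}}],  # hypothetical field and value
)

print(json.dumps(body, indent=2))
```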
  29. LESSONS LEARNED: INDEX / MAPPING
  30. LESSONS LEARNED: INDEX / MAPPING

    • Different analysers should go into separate fields

    GET /eyephoto/_mapping

    {
      "eyephoto6": {
        "mappings": {
          "photo": {
            "dynamic_templates": [
              {
                "string": {
                  "mapping": {
                    "type": "string",
                    "index_analyzer": "photo_names",
                    "search_analyzer": "photo_standard",
                    "fields": {
                      "raw": {
                        "type": "string",
                        "index": "not_analyzed"
                      },
                      "split": {
                        "type": "string",
                        "analyzer": "standard"
                      }
                    }
                  },
                  "match": "*",
                  "match_mapping_type": "string"
                }
              }
            ]
  31. LESSONS LEARNED: INDEX / MAPPING

    • Different analysers should go into separate fields

    POST /eyephoto/photo/_search

    {
      "size": 1,
      "fields": [
        "id"
      ],
      "query": {
        "multi_match": {
          "query": "coff",
          "fields": [
            "topics"
          ]
        }
      },
      "facets": {
        "topic": {
          "terms": {
            "field": "topics.raw",
            "size": 1
          }
        }
      }
    }

    Response:

    {
      "took": 18,
      "timed_out": false,
      "_shards": {
        ##########
      },
      "hits": {
        "total": 125,
        "max_score": 6.44889,
        "hits": [
          {
            #####
            "_id": "167480",
            #####
          }
        ]
      },
      "facets": {
        "topic": {
          "_type": "terms",
          "missing": 0,
          "total": 138,
          "other": 57,
          "terms": [
            {
              "term": "Coffee",
              "count": 81
            }
          ]
        }
      }
    }
  32. LESSONS LEARNED: INDEX / MAPPING

    • Different analysers should go into separate fields

    POST /eyephoto/photo/_search

    {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "lars",
                "operator": "and",
                "fields": [
                  "name.raw^3",
                  "name.split^2",
                  "name"
                ]
              }
            },
            {
              "multi_match": {
                "query": "lars",
                "fields": [
                  "name.raw^3",
                  "name.split^2",
                  "name"
                ]
              }
            }
          ]
        }
      }
    }
  33. LESSONS LEARNED: INDEX / MAPPING

    • Read from and write to index aliases only.

      Index name   Index alias
      eyephoto5    “eyephotoread”
      eyephoto6    “eyephotowrite”
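Once a reindex into the new index has finished, the read alias can be moved over in a single request to the `_aliases` endpoint, so the application never has to know the real index name. The snippet below is a minimal sketch that only builds the request body; `alias_swap_actions` is a hypothetical helper, and sending the body (for example through elasticsearch-php) is left out.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints `alias` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Index and alias names taken from the table on the slide.
body = alias_swap_actions("eyephotoread", "eyephoto5", "eyephoto6")

print(json.dumps(body, indent=2))
```

Keeping the remove and the add in the same `actions` list is the point of the exercise: readers should never see a moment in which the alias points at nothing.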
  34. LESSONS LEARNED: INDEX / MAPPING

    • If you have a string or integer field, you can put an array into it as well.

      Term   Documents
      Ey     1
      ye     1, 2
      eE     1
      Em     1
      Eye    1
      yeE    1
      eEm    1
      ea     2
      ah     2
      yea    2
      eah    2
  35. LESSONS LEARNED: INDEX / MAPPING

    • Use geohash wherever you query on lat/lng.

    POST /eyephoto/photo/_search

    {
      "query": {
        "function_score": {
          "query": {
            "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "geohash_cell": {
                  "location": {
                    "lat": 52.5311,
                    "lon": 13.404
                  },
                  "precision": 4,
                  "neighbors": true
                }
              }
            }
          },
          "functions": [
            {
              "gauss": {
                "location": {
                  "origin": "52.5311,13.404",
                  "scale": "10km"
                }
              }
            },
            {
              "exp": {
                "uploaded": {
                  "origin": "now",
                  "scale": "2d"
                }
              }
            }
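To see what the `precision` of 4 in the query above refers to, it helps to compute a geohash by hand. The sketch below is a plain implementation of the public geohash encoding (interleaved longitude and latitude bits, five bits per base32 character). `geohash_encode` is a hypothetical helper written for this example and is not part of Elasticsearch or elasticsearch-php.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# The coordinate used in the query on the slide.
print(geohash_encode(52.5311, 13.404, 4))  # u33d
```

Every photo whose location hashes to the same four-character prefix falls into the same cell, so the filter reduces to a cheap prefix match, and `neighbors: true` adds the surrounding cells to cover points near a cell border.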
  36. LESSONS LEARNED: AGGREGATIONS

    • Aggregations give you recursive facets; handle with care.

    "aggregations": {
      "user_fullname": {
        "filter": {
          "query": {
            "match": {
              "topics": {
                "query": "lars beer",
                "operator": "or"
              }
            }
          }
        },
        "aggs": {
          "user_fullname": {
            "terms": {
              "field": "user_fullname.raw",
              "size": 3
            },
            "aggs": {
              "topics": {
                "filter": {
                  "query": {
                    "match": {
                      "topics": {
                        "query": "lars beer",
                        "operator": "or"
                      }
                    }
                  }
                },
                "aggs": {
                  "topics": {
                    "terms": {
                      "field": "topics.raw",
                      "size": 3
                    }
                  }
                }
              },
  37. LESSONS LEARNED: AGGREGATIONS

    • Aggregations give you recursive facets; handle with care.

    "user_fullname": {
      "doc_count": 678,
      "user_fullname": {
        "buckets": [
          {
            "key": "Lars ",
            "doc_count": 678,
            "topics": {
              "doc_count": 5,
              "topics": {
                "buckets": [
                  {
                    "key": "Beer",
                    "doc_count": 1
                  },
                  {
                    "key": "BeerOps",
                    "doc_count": 1
                  },
                  {
                    "key": "Birthday beer in the snow",
                    "doc_count": 1
                  }
                ]
              }
            }
          }
        ]
      }
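Nested aggregation responses get deep quickly, which is part of the "handle with care". As an illustration, the following sketch walks the response shape shown above and flattens it into (user, topic, count) rows. `flatten` is a hypothetical helper, and the dictionary is the response fragment from the slide with its missing closing brace added so that it parses.

```python
# Response fragment from the slide, closed so that it is a valid structure.
response_aggs = {
    "user_fullname": {
        "doc_count": 678,
        "user_fullname": {
            "buckets": [
                {
                    "key": "Lars ",
                    "doc_count": 678,
                    "topics": {
                        "doc_count": 5,
                        "topics": {
                            "buckets": [
                                {"key": "Beer", "doc_count": 1},
                                {"key": "BeerOps", "doc_count": 1},
                                {"key": "Birthday beer in the snow", "doc_count": 1},
                            ]
                        },
                    },
                }
            ]
        },
    }
}


def flatten(aggs):
    """Turn the two-level bucket structure into (user, topic, count) rows."""
    rows = []
    # Each level is wrapped twice: once by the filter aggregation and once by
    # the terms aggregation of the same name.
    for user in aggs["user_fullname"]["user_fullname"]["buckets"]:
        for topic in user["topics"]["topics"]["buckets"]:
            rows.append((user["key"], topic["key"], topic["doc_count"]))
    return rows


rows = flatten(response_aggs)
for row in rows:
    print(row)
```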
  38. OUTLOOK

    • 1-liner search
    • Public release
    • Localisation (snowball / stopwords)
    • Keep indexed documents (e.g. albums) updated
  39. NEXT ITERATION (EVALUATING)

    • Elasticsearch 1.1
    • Oracle Java 1.8 (GC)
    • More indexes and even more shards
    • Restore API