Elasticsearch at Ticketbis

Jose
November 19, 2016

The talk (in Spanish) can be found here: https://www.youtube.com/watch?v=vrtDe0sb09s

Transcript

  1. ~ 400 employees
    ~ 50 countries
    ~ 90 M€ in 2015
    ~ May 2016: acquired by StubHub (eBay)
  2. Elasticsearch Intro
    ~ A distributed RESTful search engine
    ~ Lucene: search engine library behind ES
    ~ Basic concepts
      - Index (~ database)
      - Type (~ table)
      - Document (~ row)
  3. Elasticsearch concepts
    [Diagram: a cluster of 3 nodes holding primary shards P0, P1, P2 and replica shards R0, R1, R2]
    3 nodes, 3 shards per index and 1 replica per shard (up to 6 nodes)
  4. Elasticsearch concepts
    [Diagram: a cluster of 3 nodes holding primary shards P0, P1, P2 and two replicas of each shard]
    3 nodes, 3 shards per index and 2 replicas per shard (up to 9 nodes)
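
The "up to 6 nodes" and "up to 9 nodes" figures on the two slides above follow from counting shard copies. A minimal sketch of that arithmetic; the function names are illustrative and not part of any Elasticsearch API:

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary shard plus each of its replica copies."""
    return primaries * (1 + replicas)


def min_nodes_for_all_copies(replicas: int) -> int:
    """Copies of the same shard are kept on different nodes, so holding
    one primary and all of its replicas needs at least this many nodes."""
    return 1 + replicas


for replicas in (1, 2):
    copies = total_shard_copies(primaries=3, replicas=replicas)
    print(f"3 primaries, {replicas} replica(s) each: {copies} shard copies, "
          f"so at most {copies} nodes can hold one copy each")
```

Beyond that node count, an extra node would have no shard of this index to host.
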
  5. Capacity planning: Overallocation (#shards > #nodes)
    [Diagram: a cluster of 2 nodes holding primary shards P0, P1, P2 and replica shards R0, R1, R2]
    3 shards for a 2-node cluster (1 replica)
  6. Capacity planning: Avoid the "kagillion shards" problem
    ~ A shard is a Lucene index (+ resources)
    ~ A search request touches all shards
    ~ Poor document relevance (TF / IDF)
  7. “Es el vecino el que elige al alcalde y es el alcalde el que quiere que sean los vecinos el alcalde”
    (English: "It is the neighbour who elects the mayor, and it is the mayor who wants the neighbours to be the mayor")
  8. “Es el vecino el que elige al alcalde y es el alcalde el que quiere que sean los vecinos el alcalde”
    tf(freq=3.0) = 1.73
    . . .
    idf(docFreq=1, maxDocs=1) = 0.30
    idf(docFreq=2, maxDocs=10) = 2.20
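
The scores on the slide above are consistent with the classic Lucene TF/IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The sketch below reproduces them under that assumption, and also assumes the scored term is "alcalde" (the slide does not name it):

```python
import math


def tf(freq: float) -> float:
    """Classic Lucene term frequency: square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


quote = ("Es el vecino el que elige al alcalde y es el alcalde "
         "el que quiere que sean los vecinos el alcalde")
freq = quote.lower().split().count("alcalde")  # assumed term

print(f"freq={freq} tf={tf(freq):.3f}")
print(f"idf(docFreq=1, maxDocs=1)  = {idf(1, 1):.3f}")
print(f"idf(docFreq=2, maxDocs=10) = {idf(2, 10):.3f}")
```

The idf value depends on maxDocs, which is counted per shard. This is why the same term can score differently on a nearly empty shard than on a well-populated one.
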
  9. Capacity planning: Replicas
    ~ Improve performance and throughput
    ~ High availability and much better performance for read-intensive search
    ~ The number of replicas can be changed at any time
  10. Ticketbis use case: Size & number of documents
    ~ 1.2KB per document
    ~ 30K docs in 63 languages ≈ 2M docs / year
    ~ 2.3GB / year
    ~ Rule of thumb: < 32GB / shard
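
The sizing figures above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1GB = 1,000,000KB) and ignores index overhead; the helper names are illustrative:

```python
def yearly_docs(events: int, languages: int) -> int:
    """One document per event and language."""
    return events * languages


def yearly_size_gb(docs: int, doc_size_kb: float = 1.2) -> float:
    """Raw document volume per year in GB (decimal units)."""
    return docs * doc_size_kb / 1_000_000


docs = yearly_docs(events=30_000, languages=63)
size = yearly_size_gb(docs)
print(f"{docs:,} docs/year, {size:.2f} GB/year")
print(f"years for a single shard to reach 32 GB: {32 / size:.1f}")
```

At this volume a single shard stays under the 32GB rule of thumb for many years, which is the point of doing the estimate before choosing a shard count.
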
  11. Event document
    {
      "Id": 18,
      "Date": "2016-12-04T20:00:00Z",
      "Timezone": "Europe/Madrid",
      "shipment_deadline": "2016-12-01T23:00:00Z",
      "translations": {
        "es_es": {
          "name": "Barcelona - Real Madrid",
          "title": "Entradas Barcelona - Real Madrid"
        },
        "en_gb": {
          "name": "Barcelona v Real Madrid",
          "title": "Barcelona v Real Madrid Tickets - La Liga"
        }
      }
    }
  12. I18n approaches
    ~ Parent / Child
    ~ Nested data
    ~ Shard per language
    ~ Type per language
    ~ Index per language
    ~ Field per language
  13. Parent / Child
    PUT /catalog
    {
      "mappings": {
        "event": {...},   // non i18n fields (date, ...)
        "translations": {
          "_parent": { "type": "event" },
          "properties": {
            "locale": { "type": "string" },
            "name", ....   // rest of i18n fields
          }
        }
      }
    }
  14. Parent / Child
    ~ Parent and child live in the same shard
    ~ Can update a single translation
    ~ Always hits all shards
    ~ Doesn’t support different analysers per locale
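
"Can update a single translation" follows from each translation being its own child document. A hedged sketch of how such a request could be assembled, using the `parent` query parameter from the Elasticsearch versions of that era; the child id scheme `<event>_<locale>` is a hypothetical choice, and nothing is sent over the network:

```python
import json


def child_index_request(event_id: int, locale: str, fields: dict):
    """Build (method, path, body) for indexing one translation as a child."""
    doc_id = f"{event_id}_{locale}"  # hypothetical id scheme
    path = f"/catalog/translations/{doc_id}?parent={event_id}"
    body = {"locale": locale, **fields}
    return "PUT", path, json.dumps(body, ensure_ascii=False)


method, path, body = child_index_request(
    18, "es_es",
    {"name": "Barcelona - Real Madrid",
     "title": "Entradas Barcelona - Real Madrid"},
)
print(method, path)
print(body)
```

Re-sending this request with new field values replaces only the Spanish translation; the parent event and the other translations are left alone.
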
  16. Nested data
    PUT /catalog
    {
      "mappings": {
        "event": {
          "properties": {
            …   // non i18n fields (date, ...)
            "translations": {
              "type": "nested",
              "properties": {
                "locale": { "type": "string" },
                "name", …   // rest of i18n fields
              }
            }
          }
        }
      }
    }
  17. Nested data
    ~ Nested data lives in the same segment
    ~ Queries are 10 times faster than parent/child
    ~ Always hits all shards
    ~ Can’t update a single translation
    ~ Doesn’t support different analysers per locale
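
With the nested mapping, a search for one language has to match the locale and the text inside the same nested translation. A minimal sketch of such a request body, built as a Python dict; the field names follow the mapping in the deck, while the helper name and the search text are illustrative:

```python
import json


def nested_translation_query(locale: str, text: str) -> dict:
    """Body for a search that matches `text` in one locale's name."""
    return {
        "query": {
            "nested": {
                "path": "translations",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"translations.locale": locale}},
                            {"match": {"translations.name": text}},
                        ]
                    }
                },
            }
        }
    }


print(json.dumps(nested_translation_query("es_es", "Barcelona"), indent=2))
```

The body would be sent to a search endpoint such as `/catalog/event/_search`. Both clauses sit inside the `nested` query so they are evaluated against the same translation object.
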
  19. Shard per language (routing)
    PUT /catalog/event/18?routing=es_es
    {
      "Date": "2016-12-04T20:00:00Z",
      "Timezone": "Europe/Madrid",
      "shipment_deadline": "2016-12-01T23:00:00Z",
      "name": "Barcelona - Real Madrid",
      "title": "Entradas Barcelona - Real Madrid"
    }
    GET /catalog/event/18?routing=es_es
  20. Shard per language (routing)
    ~ Only hits one shard
    ~ Better query performance
    ~ Doesn’t support different analysers per locale
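
Routing "only hits one shard" because Elasticsearch picks the shard as hash(routing) modulo the number of primary shards. The sketch below illustrates the idea only: `zlib.crc32` is a stand-in hash chosen for this example and is not the hash Elasticsearch uses, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def shard_for(routing: str, primary_shards: int) -> int:
    """Pick a shard from a routing value (stand-in hash, illustration only)."""
    return zlib.crc32(routing.encode("utf-8")) % primary_shards


for locale in ("es_es", "en_gb", "fr_fr"):
    print(locale, "-> shard", shard_for(locale, 3))
```

The same locale always maps to the same shard, so a query routed by locale can skip the others. With more locales than shards, several locales end up sharing a shard.
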
  22. Type per language
    PUT /catalog/event_es_es/18
    {
      "Date": "2016-12-04T20:00:00Z",
      "Timezone": "Europe/Madrid",
      "shipment_deadline": "2016-12-01T23:00:00Z",
      "name": "Barcelona - Real Madrid",
      "title": "Entradas Barcelona - Real Madrid"
    }
    GET /catalog/event_es_es/18
  23. Type per language
    ~ Supports different analysers per locale
    ~ Same fields in different types within the same index share the same inverted index
  24. locale es_es: “Una relajante taza de café con leche en Plaza Mayor”
    Taza: idf(docFreq=1, maxDocs=1) = 0.30
  25. locale es_es: “Una relajante taza de café con leche en Plaza Mayor”
    locale en_gb: “A relaxing cup of café con leche in Plaza Mayor”
    Taza: idf(docFreq=1, maxDocs=2) = 1.00
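
The jump from 0.30 to 1.00 comes from the English document being counted in maxDocs once both locales share an inverted index. A toy simulation, assuming whitespace tokenisation and the classic Lucene formula idf = 1 + ln(maxDocs / (docFreq + 1)); a real analyser would tokenise differently:

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            postings[token].add(doc_id)
    return postings


def idf(term, docs):
    doc_freq = len(build_index(docs).get(term, ()))
    return 1.0 + math.log(len(docs) / (doc_freq + 1))


es = "Una relajante taza de café con leche en Plaza Mayor"
en = "A relaxing cup of café con leche in Plaza Mayor"

print(f"Spanish only:   idf(taza) = {idf('taza', [es]):.3f}")
print(f"Both languages: idf(taza) = {idf('taza', [es, en]):.3f}")
```

"taza" appears in one document in both cases; only the document count changes, yet its weight more than triples. Mixing languages in one index skews relevance in exactly this way.
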
  26. Index per language
    PUT /catalog_es_es/event/18
    {
      "Date": "2016-12-04T20:00:00Z",
      "Timezone": "Europe/Madrid",
      "shipment_deadline": "2016-12-01T23:00:00Z",
      "name": "Barcelona - Real Madrid",
      "title": "Entradas Barcelona - Real Madrid"
    }
    GET /catalog_es_es/event/18
  27. Field per language
    PUT /catalog/event/18
    {
      "Date": "2016-12-04T20:00:00Z",
      "Timezone": "Europe/Madrid",
      "shipment_deadline": "2016-12-01T23:00:00Z",
      "name_es_es": "Barcelona - Real Madrid",
      "title_es_es": "Entradas Barcelona - Real Madrid",
      "name_en_gb": "Barcelona v Real Madrid",
      "title_en_gb": "Barcelona v Real Madrid Tickets"
    }
    GET /catalog/event/18
  28. Index / Field per language
    ~ Supports different analysers per locale
    ~ Avoids TF/IDF and stemming problems
    ~ Hits all shards
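
For the field-per-language variant, "different analysers per locale" means each suffixed field gets its own analyser in the mapping. A sketch that generates such properties; `spanish` and `english` are built-in Elasticsearch language analysers, while the locale-to-analyser table and the helper name are assumptions made for this example:

```python
import json

# Hypothetical lookup from locale to analyser name.
ANALYZERS = {"es_es": "spanish", "en_gb": "english"}


def i18n_properties(fields, locales):
    """One mapping entry per (field, locale) pair, e.g. name_es_es."""
    props = {}
    for locale in locales:
        for field in fields:
            props[f"{field}_{locale}"] = {
                "type": "string",  # field type used in the deck's mappings
                "analyzer": ANALYZERS.get(locale, "standard"),
            }
    return props


print(json.dumps(i18n_properties(["name", "title"], ["es_es", "en_gb"]), indent=2))
```

For index per language, the same idea would apply with unsuffixed field names and one analyser per index.
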
  29. Index compared to Field
    ~ More flexible
    ~ Doesn’t need to index all languages at once
    ~ N indices (#languages) from the very beginning
  30. Benchmarking: Index per language
    ~ Documents indexed successfully with up to 25 shards in 50+ indices
    ~ Happy with query performance for 63 indices, 3 shards per index and 3 nodes
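
To put the benchmarked layout in numbers, the sketch below counts shards for 63 indices with 3 primaries each on 3 nodes. The replica count is not given on the slide, so `replicas=1` is an assumption, and the helper names are illustrative:

```python
def index_name(base: str, locale: str) -> str:
    """Index-per-language naming as used in the deck, e.g. catalog_es_es."""
    return f"{base}_{locale}"


def shard_totals(indices: int, primaries_per_index: int,
                 replicas: int, nodes: int):
    """Return (primary shards, all shard copies, copies per node)."""
    primaries = indices * primaries_per_index
    copies = primaries * (1 + replicas)
    return primaries, copies, copies / nodes


primaries, copies, per_node = shard_totals(
    indices=63, primaries_per_index=3, replicas=1, nodes=3)  # replicas assumed
print(index_name("catalog", "es_es"))
print(f"{primaries} primaries, {copies} copies, {per_node:.0f} per node")
```

Every shard is a Lucene index with its own resource cost, so this per-node count is the figure to watch when adding languages.
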
  31. Ticketbis use case: #documents in 2016 (after integration)
    ~ 130K docs in 63 languages ≈ 8.5M docs / year
    ~ 10GB / year (2.3GB / year before integration)
    ~ Rule of thumb: < 32GB / shard
    ~ Field per language could be a problem
  32. Conclusions
    ~ Capacity planning is not the first step
    ~ It always depends on your needs
    ~ A little overallocation can be good
    ~ Don’t mix languages and mind TF/IDF