How HotelTonight Finds the Best Hotels in the Moment

Elastic Co
February 17, 2016

Learn how HotelTonight uses hosted Elasticsearch to mine millions of documents and analyze diverse data in milliseconds – from hotel inventory systems to data about a user’s desired trip (date, duration, location, price, brand, ratings, etc.) – to help its users find a hotel at a moment’s notice.

Transcript

  1. Finding the Best Hotels in the Moment. Paul Sorensen (@paulnsorensen), Platform Engineer at HotelTonight. Feb 17th, 2016
  2. About Me: Paul Sorensen, Platform Engineer at HotelTonight
    • I work on our hotel ranking algorithm
    • I’m currently fascinated by scaling web apps
    • Twitter: @paulnsorensen
  3-4. How HotelTonight works: hotels compete for limited display, and we show you the best deals. Most API requests here. (Audience downloads app here.)
  5. Since introducing Elasticsearch…
    • 200x hotel rate records (documents)
    • 10x more API traffic
    • 2.5x reduction in response times
  6. Overview
    • Why Elasticsearch?
    • How we use Elasticsearch (is this unique?)
    • Some challenges we’ve faced (amidst success)
  7-10. Early 2014: our system was reaching its capacity
    • MySQL: repeated geolocation queries ("gimme nearby hotels")
    • O(n^2 log n) ranking over hundreds of ActiveRecord objects
  11. We wanted to expand our booking window from 1 to 7 days
    • Same-day
    • Advance booking: 6 more days, 7x data
  12-13. What were our choices?
    • More caching?
    • Use OpenGIS on MySQL (geospatial index extension)?
    • Switch to PostgreSQL and use PostGIS?
    • Find something from Hacker News?
    • Use Elasticsearch? The full-text indexing engine?
  14. Simple Example: Documents
    { "_id": 4492, "description": "The quick brown fox jumps over dogs" },
    { "_id": 4493, "description": "The slow red fox doesn’t say things" }
  15. Simple Example: Query
    { "query": { "match": { "description": "quick fox" } } }
    Returns documents matching "quick" and/or "fox"
  16-17. Simple Example: How is it stored? Inverted index:
    { "fox" => [4492, 4493], "brown" => [4492], "red" => [4493] }
    THIS MAKES IT FAST
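
A minimal sketch of the inverted index idea, using the two example documents from the slides. The whitespace tokenizer and the helper names (`tokenize`, `build_inverted_index`, `match`) are assumptions made for illustration; Elasticsearch's real analyzers and on-disk structures are considerably more involved.

```python
from collections import defaultdict

# The two example documents from the slides, keyed by _id.
DOCS = {
    4492: "The quick brown fox jumps over dogs",
    4493: "The slow red fox doesn't say things",
}

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def match(index, query):
    """Return ids of documents containing any query term (and/or semantics)."""
    hits = set()
    for term in tokenize(query):
        hits.update(index.get(term, []))
    return sorted(hits)

INDEX = build_inverted_index(DOCS)
print(INDEX["fox"])               # [4492, 4493]
print(match(INDEX, "quick fox"))  # [4492, 4493]
```

The lookup cost depends on the number of query terms rather than on the number of documents, which is the point of the "this makes it fast" slide.
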
  18. A search for "tomorrow night", "2 nights", "NYC". These become bitset filters:
    "check_in_date": "2016-02-18"
    "length_of_stay": 2
    "market_id": 14326
  19. A search for "tomorrow night", "2 nights", "NYC". Well, most of them become bitset filters:
    "check_in_date": "2016-02-18"
    "length_of_stay": 2
    "market_id": 14326
    The location 40.7127,-74.0059 becomes "geo_distance": { // etc. }
  20. Bitset filters: independent filter caching
    • queries cache individual filter matches*
    • very fast to check if a document matches
    • *but not geo, range or script filters
  21. Elasticsearch orders documents by relevance
    • Term frequency
    • Inverse document frequency
    • Field-length norm
    We must replace this with our own scoring
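
For readers unfamiliar with the three factors, here is a simplified sketch based on the classic Lucene TF/IDF formulas as commonly documented. The plain product in `relevance` is an assumption made for brevity: the full scoring function includes further terms (query normalization, coordination, boosts) that are omitted here.

```python
import math

# Pre-tokenized versions of the two example documents from the slides.
DOCS = [
    "the quick brown fox jumps over dogs".split(),
    "the slow red fox doesn't say things".split(),
]

def term_frequency(term, tokens):
    """More occurrences of the term in the field means a higher score."""
    return math.sqrt(tokens.count(term))

def inverse_document_frequency(term, all_docs):
    """Terms that appear in fewer documents carry more weight."""
    doc_freq = sum(1 for tokens in all_docs if term in tokens)
    return 1.0 + math.log(len(all_docs) / (doc_freq + 1))

def field_length_norm(tokens):
    """A match in a short field counts for more than one in a long field."""
    return 1.0 / math.sqrt(len(tokens))

def relevance(term, tokens, all_docs):
    """Simplified per-term score: the product of the three factors."""
    return (term_frequency(term, tokens)
            * inverse_document_frequency(term, all_docs)
            * field_length_norm(tokens))

# "quick" is rarer than "fox", so it contributes more to the first document.
print(relevance("quick", DOCS[0], DOCS) > relevance("fox", DOCS[0], DOCS))  # True
```

None of these text-oriented factors say anything about price or review quality, which is why a hotel ranking needs its own scoring.
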
  22. How WE define relevance:
    • Pre-score document with sub-scores at index time
    • Use field_value_factor to apply weight to sub-scores at query time
  23-24. What were our problems again?
    • MySQL: repeated geolocation queries ("gimme nearby hotels"): SOLVED
    • O(n^2 log n) ranking over hundreds of ActiveRecord objects: SOLVED
  25-27. { "index": {
      "_id": "3752442343255",
      "_index": "rates_2016-02-19",
      "_type": "rate",
      "data": {
        "category_id": 3,
        "check_in_date": "2016-02-19",
        "price_in_usd_avg": "100.0",
        "hotel_id": 791323345,
        "location": "44.97,-93.35",
        "market_id": 9352,
        "price_score": "0.87",
        "review_score": 0.64,
        "sold_out": false,
        "stay_length": 1
      }
    } }
  28. Indices are based on check-in date: 2016-02-15, 2016-02-16, 2016-02-17, 2016-02-18, 2016-02-19, 2016-02-20, 2016-02-21 (diagram labels: "current day", "deleted")
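
A small sketch of the date-based index scheme. The `rates_` prefix comes from the sample document's `"_index": "rates_2016-02-19"`, and the seven-day window from the booking-window slide; the function names and the exact expiry rule (delete anything older than today) are assumptions for illustration.

```python
from datetime import date, timedelta

INDEX_PREFIX = "rates_"   # naming taken from the "rates_2016-02-19" example
BOOKING_WINDOW_DAYS = 7   # same-day plus six days of advance booking

def index_name(check_in):
    """Name of the index that holds every rate for one check-in date."""
    return INDEX_PREFIX + check_in.isoformat()

def live_indices(today):
    """Indices that should exist: one per bookable check-in date."""
    return [index_name(today + timedelta(days=offset))
            for offset in range(BOOKING_WINDOW_DAYS)]

def expired_indices(existing, today):
    """Existing indices whose check-in date is already in the past."""
    cutoff = index_name(today)
    # ISO dates sort lexicographically, so plain string comparison works.
    return sorted(name for name in existing if name < cutoff)

today = date(2016, 2, 17)
print(live_indices(today)[0])   # rates_2016-02-17
print(expired_indices(["rates_2016-02-15", "rates_2016-02-16", "rates_2016-02-17"], today))
# ['rates_2016-02-15', 'rates_2016-02-16']
```

One likely benefit of this layout is that stale rates disappear by dropping a whole index rather than deleting documents one at a time.
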
  29-33. { "filter": { "and": [
      { "bool": { // most filters here } },
      { "geo_bounding_box": { "pin.location": { "top_left": [-74.1, 40.7], "bottom_right": [-73.6, 40.0] } } }
    ] } }
    Inside the bool filter:
    "must": [
      { "term": { "start_date": "2016-02-18" } },
      { "term": { "length_of_stay": 2 } },
      { "term": { "on_sale": true } }
    ]
  34-35. { "functions": [
      { "field_value_factor": { "factor": 0.9, "field": "price_score" } },
      { "field_value_factor": { "factor": 0.6, "field": "review_score" } }
    ] }
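
A sketch of what this function list computes for a single document. With no modifier, each field_value_factor evaluates to factor × field value. The slide does not show a score_mode, so the sketch takes it as a parameter (multiply is the Elasticsearch default for function_score); how the result is then combined with the query score (boost_mode) is left out. The function names are hypothetical.

```python
import math

# Function list from the slide: one factor per pre-computed sub-score field.
FUNCTIONS = [
    {"field": "price_score", "factor": 0.9},
    {"field": "review_score", "factor": 0.6},
]

def field_value_factor(doc, field, factor):
    """factor * field value, i.e. field_value_factor with no modifier."""
    # float() tolerates sub-scores that were indexed as strings, like "0.87".
    return factor * float(doc[field])

def function_score(doc, functions=FUNCTIONS, score_mode="multiply"):
    """Combine the per-function values the way score_mode asks for."""
    values = [field_value_factor(doc, f["field"], f["factor"]) for f in functions]
    if score_mode == "multiply":
        return math.prod(values)
    if score_mode == "sum":
        return sum(values)
    raise ValueError("unsupported score_mode: " + score_mode)

# Sub-scores taken from the sample rate document earlier in the deck.
rate = {"price_score": "0.87", "review_score": 0.64}
print(function_score(rate))
print(function_score(rate, score_mode="sum"))
```

Because the sub-scores are computed at index time, tuning the ranking only means changing the factors in the query, with no reindexing.
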
  36-43. { "aggs": { "top_rates": {
      "terms": { "field": "hotel_id", "size": 50, "order": { "top_hit": "desc" } },
      "aggs": {
        "top_rate": { "top_hits": { "size": 1 } },
        "top_hit": { "max": { "script": "_score" } }
      }
    } } }
    Give me one best-scoring rate per hotel_id; order the hotel_id buckets by each top-scoring rate’s max score.
    AKA: give me the best rate per hotel, and order the rates by score.
    Feature request: make this an agg? (or at least not a script)
  44. We also have a post-processing framework
    • Some operations require seeing a first pass at the results
    • Example: variety scoring
    • Example: “cutting edges off” of bounding box
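
The "first pass" that post-processing receives is the aggregated result from the preceding slides: one best-scoring rate per hotel, ordered by score. A minimal in-memory sketch of that grouping follows; the sample hits, the `score` key (standing in for Elasticsearch's `_score`) and the function name are hypothetical, and the real aggregation runs inside Elasticsearch rather than in application code.

```python
def best_rate_per_hotel(scored_rates, size=50):
    """Keep the top-scoring rate for each hotel_id, best hotels first."""
    best = {}
    for rate in scored_rates:
        current = best.get(rate["hotel_id"])
        if current is None or rate["score"] > current["score"]:
            best[rate["hotel_id"]] = rate
    ranked = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    return ranked[:size]

# Hypothetical scored hits; "score" stands in for Elasticsearch's _score.
HITS = [
    {"hotel_id": 1, "stay_length": 1, "score": 0.30},
    {"hotel_id": 1, "stay_length": 2, "score": 0.45},
    {"hotel_id": 2, "stay_length": 1, "score": 0.40},
    {"hotel_id": 3, "stay_length": 1, "score": 0.10},
]
print([(r["hotel_id"], r["score"]) for r in best_rate_per_hotel(HITS, size=2)])
# [(1, 0.45), (2, 0.4)]
```
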
  45. Leveling up our architecture
    • Minimize consistency delays (sync lag), but defend against it when it does happen
    • Mapping changes / Elasticsearch upgrades: zero-downtime mapping changes, blue/green deployments
  46. 429

  47. Scaling is big numbers…
    • 200x hotel rate records (documents)
    • 10x more API traffic
    • 2.5x reduction in response times
  48. But most of all, scaling is fun. Thank you.
    • Twitter: @paulnsorensen
    • Email: [email protected]
    • Promo code ($25 off first booking): PAUL