How HotelTonight uses Elasticsearch to power its hotel search algorithm

Paul Burt
October 23, 2015

Presentation by Paul Sorensen and Jatinder Singh, from the October 23, 2015 SFRails event:
http://www.meetup.com/SFRails/events/225588522/

Transcript

  1. Simplicity
    1. Few taps and a swipe.
    2. Just a small list of hotels.
    3. Just the very best hotels for you.
    4. Fast.
  2. Ad

  3. Finding the Best Hotels in the Moment: How HotelTonight uses Elasticsearch to power its hotel search algorithm
  4. Hi

  5. About Me (Paul Sorensen)
    • Platform Engineer at HotelTonight
    • Work on our hotel ranking algorithm using Elasticsearch
    • Currently fascinated with scaling web apps
    • Twitter: @paulnsorensen
  6. About HotelTonight
    • Hotels compete for display
    • We show you the best deals; this is where the ranking comes in
    • Book up to 7 days in advance for a stay of up to 5 nights
  7. What if I told you...
    • We increased our inventory records by 50x
    • Our system can handle 10x more traffic
    • We made our response times 150% quicker
    That's what we did.
  8. How did we do it?
    gem install elasticsearch
    rake scale:hotels
    That's it. THANKS FOR COMING!!! PROFIT
    JUST KIDDING
  9. What this is not
    • Not a technical deep dive
    • Not an objective comparison between tech
  10. Early 2014: Our system was reaching its capacity
    • MySQL
    • O(n^2 log n) ranking over hundreds of ActiveRecord objects
    • "Gimme nearby hotels": geolocation query
  11. Early 2014: Our system was reaching its capacity
    • MySQL
    • O(n^2 log n) ranking over hundreds of ActiveRecord objects
    • "Gimme nearby hotels": geolocation query, geolocation query
  12. Early 2014: Our system was reaching its capacity
    • MySQL
    • O(n^2 log n) ranking over hundreds of ActiveRecord objects
    • "Gimme nearby hotels": geolocation query, geolocation query, geolocation query
  14. Early 2014: Our system was reaching its capacity
    • MySQL
    • O(n^2 log n) ranking over hundreds of ActiveRecord objects
    • "Gimme nearby hotels": geolocation... NOPE. I'M OUT
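The talk does not show the pre-Elasticsearch code, but slides 10 to 16 describe its shape: each "nearby hotels" request ran a geolocation query against MySQL, and hundreds of ActiveRecord objects were then ranked in application memory. The Python sketch below illustrates why that pattern strains under load. It uses a hypothetical `Hotel` record, a hypothetical `nearby_ranked` helper, and a plain haversine distance. Every request computes a distance for every hotel and then sorts the survivors, so the per-request work grows with inventory. It illustrates the general pattern and is not HotelTonight's implementation.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Hotel:
    # Hypothetical record standing in for an ActiveRecord model.
    id: int
    lat: float
    lon: float
    score: float  # hypothetical precomputed ranking score


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoots above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearby_ranked(hotels, lat, lon, radius_km):
    """Naive pattern: scan every hotel, keep those in range, sort in memory."""
    # One distance computation per hotel, on every single request.
    nearby = [h for h in hotels if haversine_km(lat, lon, h.lat, h.lon) <= radius_km]
    # Then an in-memory sort of the survivors, highest score first.
    return sorted(nearby, key=lambda h: h.score, reverse=True)
```

Elasticsearch's geo filters and custom scoring move both steps into the search engine, so the application receives only the final, already-ranked results.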
  17. Later 2014: We wanted to expand our booking window from 1 to 7 days
    • Same-day + 6 more days of advance booking = 7x data
    • HOW ARE WE GOING TO RUN GEO QUERIES!?
  18. What were our choices?
    • More caching?
    • Use OpenGIS on MySQL (geospatial index extension)?
    • Switch to PostgreSQL and use PostGIS?
    • Find something from Hacker News?
    • Use Elasticsearch? The full-text indexing engine?
  19. Elasticsearch use cases
    • Full-text search
    • Analytics: Elastic's ELK stack (Elasticsearch, Logstash, Kibana)
    • Spell-checking, autocomplete
    • Ranking hotel rooms?
  20. Documents:
    { "_id": 4492, "description": "The quick brown fox jumps over lazy dogs" },
    { "_id": 4493, "description": "The slow red fox doesn't say anything" }
  21. Match query:
    { "query": { "match": { "description": "quick fox" } } }
    => returns documents matching "quick" and/or "fox"
  22. How is it stored? Inverted index:
    { "fox" => [4492, 4493], "brown" => [4492], "red" => [4493] }
    THIS MAKES IT FAST
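Slides 20 to 22 can be reproduced in a few lines of Python. The sketch below builds an inverted index (term to sorted list of document ids) for the two example documents. It then answers the match query from slide 21 by taking the union of the postings for each query term, which mirrors the match query's default `or` operator. The tokenizer is a deliberate simplification (lowercase, then keep runs of letters, digits, and apostrophes), not Elasticsearch's standard analyzer. Ordering hits by the number of matched terms is a crude stand-in for Elasticsearch's real relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplified analyzer (an assumption, not Elasticsearch's standard analyzer).
    return re.findall(r"[a-z0-9']+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(tokenize(docs[doc_id])):
            index[term].append(doc_id)  # ids arrive in ascending order
    return dict(index)


def match_query(index, query):
    """OR semantics: a document matches if it contains any query term."""
    hits = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, []):
            hits[doc_id] += 1  # count how many query terms each doc matched
    # More matched terms first; ties broken by id for a stable order.
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


docs = {
    4492: "The quick brown fox jumps over lazy dogs",
    4493: "The slow red fox doesn't say anything",
}
index = build_inverted_index(docs)
```

Each lookup touches only the postings lists for the query's terms. The cost therefore depends on how many documents contain those terms, not on the total number of documents, which is the point of "THIS MAKES IT FAST".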
  23. Elasticsearch supports many filters. A few examples we can use:
    • term: exact match
    • bool: combine filters
    • various geo filters
    • range
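As a hedged illustration of slide 23, the Python below builds a request body that combines `term`, `range`, and `geo_distance` clauses inside a `bool` query's `filter` section. It then evaluates that same body against in-memory documents, so the semantics can be checked without a cluster. The field names (`city_id`, `nightly_rate`, `location`) and both helper functions are hypothetical. The `bool`/`filter` form is the syntax used from Elasticsearch 2.0 onward; the 1.x releases current at the time of the talk expressed the same idea with a `filtered` query. The evaluator only understands the handful of clause shapes shown here.

```python
import math
import operator

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def hotel_filter_body(city_id, max_rate, lat, lon, radius_km):
    """Request body whose clauses are all filters (they affect matching, not scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"city_id": city_id}},                 # exact match
                    {"range": {"nightly_rate": {"lte": max_rate}}},  # numeric bound
                    {"geo_distance": {"distance": f"{radius_km}km",
                                      "location": {"lat": lat, "lon": lon}}},
                ]
            }
        }
    }


def _distance_km(p, q):
    # Haversine great-circle distance between two {"lat": ..., "lon": ...} dicts.
    phi1, phi2 = math.radians(p["lat"]), math.radians(q["lat"])
    dphi, dlam = phi2 - phi1, math.radians(q["lon"] - p["lon"])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def matches(doc, clause):
    """Evaluate one clause (of the small subset used above) against a plain dict."""
    (kind, body), = clause.items()
    if kind == "term":
        (field, value), = body.items()
        return doc.get(field) == value
    if kind == "range":
        (field, bounds), = body.items()
        value = doc.get(field)
        return value is not None and all(RANGE_OPS[op](value, b) for op, b in bounds.items())
    if kind == "geo_distance":
        radius_km = float(body["distance"].removesuffix("km"))
        (field, origin), = ((k, v) for k, v in body.items() if k != "distance")
        return field in doc and _distance_km(doc[field], origin) <= radius_km
    if kind == "bool":
        # Every clause in the filter section must match.
        return all(matches(doc, c) for c in body.get("filter", []))
    raise ValueError(f"unsupported clause: {kind}")
```

Keeping these clauses in filter context means they only decide which hotels qualify. Relevance scoring is left to a separate step.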
  24. Independent filter caching
    • Queries cache individual filter matches*
    • Very fast to check whether a document matches
    • *But not geo, range, or script filters
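Slide 24's caching idea can be sketched in plain Python. The result of each cacheable filter (here only `term` filters) is stored as a set of matching document ids, keyed by the filter itself. A repeated filter then costs a dictionary lookup plus a set intersection instead of a scan. Per the slide, geo, range, and script filters are not cached, so the sketch takes those as a predicate that is re-evaluated on every search. The `FilterCache` class, its method names, and the JSON cache-key scheme are hypothetical. Elasticsearch keeps cached filter results as per-segment bitsets rather than Python sets. The sketch also assumes the documents never change, so it needs no cache invalidation.

```python
import json


class FilterCache:
    """Caches the ids matched by each term filter (assumes documents never change)."""

    def __init__(self, docs):
        self.docs = docs   # doc id -> dict of field values
        self._cache = {}   # canonical filter JSON -> frozenset of matching ids
        self.misses = 0    # how many times a filter had to be computed by scanning

    def _term_ids(self, clause):
        key = json.dumps(clause, sort_keys=True)  # same filter => same key
        if key not in self._cache:
            self.misses += 1
            (field, value), = clause["term"].items()
            self._cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items() if doc.get(field) == value
            )
        return self._cache[key]

    def search(self, term_filters, uncached=None):
        """Intersect cached term-filter results, then apply an uncached predicate."""
        ids = frozenset(self.docs)
        for clause in term_filters:
            ids &= self._term_ids(clause)  # cheap set intersection per cached filter
        if uncached is not None:
            # Geo/range/script-style checks run against each candidate on every call.
            ids = frozenset(i for i in ids if uncached(self.docs[i]))
        return sorted(ids)
```

Because the expensive uncached predicate only runs on ids that survived the cached intersections, putting cheap, cacheable filters first keeps per-request work small.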
  25. Elasticsearch orders documents by relevance
    • Define your own scoring functions
    • Let Elasticsearch determine the most relevant documents
    • We don't have to load ActiveRecord objects into memory to rank them anymore
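Slide 25 leaves the scoring functions unspecified, so the following is only a sketch of what "define your own scoring functions" can look like, not HotelTonight's formula. It builds an Elasticsearch `function_score` body that multiplies a Gaussian distance decay by a per-hotel numeric field. It also re-implements the documented `gauss` decay, with sigma^2 = -scale^2 / (2 * ln(decay)), in Python so the arithmetic can be checked locally. The `deal_score` field, the 2 km scale, the `location` field, and the `rank_locally` helper are all assumptions.

```python
import math


def scoring_body(lat, lon):
    """function_score body: Gaussian distance decay multiplied by a hotel field."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {"gauss": {"location": {"origin": {"lat": lat, "lon": lon},
                                            "scale": "2km", "offset": "0km", "decay": 0.5}}},
                    {"field_value_factor": {"field": "deal_score"}},  # hypothetical field
                ],
                "score_mode": "multiply",  # multiply the function results together
                "boost_mode": "replace",   # ignore the match_all query score
            }
        }
    }


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay as documented for function_score: equals `decay` at offset + scale."""
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    return math.exp(-max(0.0, distance - offset) ** 2 / (2 * sigma_sq))


def rank_locally(hotels, scale_km=2.0):
    """Recompute the same score for hotels whose distance is already known."""
    def score(h):
        return gauss_decay(h["distance_km"], scale_km) * h["deal_score"]
    return [h["id"] for h in sorted(hotels, key=score, reverse=True)]
```

With these parameters the decay works out to 0.5^((d/2)^2) for a distance d in km: 1.0 at the origin, 0.5 at 2 km, and 0.0625 at 4 km. A nearby hotel can therefore outrank a farther one with a better deal.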
  26. We are conquering these challenges:
    • We have to minimize consistency delays
    • Defend against them when they do happen
    • Zero-downtime mapping changes
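The talk does not say how HotelTonight made mapping changes without downtime. A common Elasticsearch approach works in three steps. First, have the application query an alias rather than an index. Second, build a new index with the new mapping and reindex into it. Third, repoint the alias in a single `_aliases` request whose `remove` and `add` actions are applied atomically. The sketch below builds that request body and simulates the swap on a plain dict. The alias and index names (`hotels`, `hotels_v1`, `hotels_v2`) are hypothetical, and creating and populating the new index is out of scope.

```python
import copy


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases; Elasticsearch applies both actions atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def apply_alias_actions(aliases, body):
    """Simulate the swap: return a new alias -> set-of-indices mapping.

    All actions are applied to a copy, so callers never observe a state where
    the alias points at neither index (or at both).
    """
    updated = copy.deepcopy(aliases)
    for action in body["actions"]:
        (kind, spec), = action.items()
        indices = updated.setdefault(spec["alias"], set())
        if kind == "add":
            indices.add(spec["index"])
        elif kind == "remove":
            if spec["index"] not in indices:
                raise KeyError(f"{spec['alias']} does not point at {spec['index']}")
            indices.remove(spec["index"])
        else:
            raise ValueError(f"unsupported action: {kind}")
    return updated
```

Searches keep hitting the old index right up to the swap and the new one immediately after, so the mapping change never takes the search endpoint offline.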
  27. More is always sometimes better
    • 6 more days of booking
    • 50x inventory
    • 10x traffic
    • 150% quicker response times
    • PROFIT