How HotelTonight uses Elasticsearch to power its hotel search algorithm

Paul Burt
October 23, 2015

Presentation by Paul Sorensen and Jatinder Singh, from the October 23, 2015 SFRails event:
http://www.meetup.com/SFRails/events/225588522/

Transcript

  1. HotelTonight Jatinder Singh and Paul Sorensen

  2. None
  3. Jatinder Singh, Director of Engineering, Platform. Twitter: @rubymerchant. Email: jatinder@hoteltonight.com

  4. HotelTonight?

  5. World’s first mobile only last minute hotel booking app.

  6. Hotels: 40% of rooms unsold. Customers: 20% to 70% discounts. HotelTonight

  7. Make the world more spontaneous

  8. None
  9. None
  10. Spontaneity Brings Technical Challenges

  11. Simplicity: 1. Few taps and a swipe. 2. Just a small list of hotels. 3. Just the very best hotels for you. 4. Fast.

  12. Perishable Inventory: 1. Availability and pricing change all the time. 2. Real-time

  13. Ad

  14. Jatinder Singh, Director of Engineering, Platform. Twitter: @rubymerchant. Email: jatinder@hoteltonight.com

  15. Finding the Best Hotels in the Moment: How HotelTonight uses Elasticsearch to power its hotel search algorithm

  16. Hi

  17. About Me (Paul Sorensen) • Platform Engineer at HotelTonight • Work on our hotel ranking algorithm using Elasticsearch • Currently fascinated with scaling web apps • Twitter: @paulnsorensen

  18. About HotelTonight • Hotels compete for display • We show you the best deals (this is where the ranking comes in) • Book up to 7 days in advance for up to a 5 night stay

  19. What if I told you... We increased our inventory records by 50x. Our system can handle 10x more traffic. We cut our response times by 150%. That’s what we did.

  20. How did we do it? gem install elasticsearch; rake scale:hotels. That’s it. THANKS FOR COMING!!! PROFIT. JUST KIDDING.

  21. there is no silver bullet

  22. scaling is hard

  23. Scope

  24. Overview: Why Elasticsearch? What is Elasticsearch? The awesome challenges we get to work on.

  25. What this is not: Not a technical deep dive. Not an objective comparison between tech.

  26. Impetus

  27. we grew up

  28. from 3 cities to 2000 cities

  29. we’re never not booking rooms

  30. Early 2014: Our system was reaching its capacity. Diagram labels: MySQL; O(n^2 log n); Ranking over hundreds of ActiveRecord objects; "gimme nearby hotels"; geolocation query.

  31. Early 2014: Our system was reaching its capacity. Diagram labels: MySQL; O(n^2 log n); Ranking over hundreds of ActiveRecord objects; "gimme nearby hotels"; geolocation query (x2).

  32. Early 2014: Our system was reaching its capacity. Diagram labels: MySQL; O(n^2 log n); Ranking over hundreds of ActiveRecord objects; "gimme nearby hotels"; geolocation query (x3).

  34. Early 2014: Our system was reaching its capacity. Diagram labels: MySQL; O(n^2 log n); Ranking over hundreds of ActiveRecord objects; "gimme nearby hotels"; geolocation: NOPE. I’M OUT.

  37. we cannot have downtime

  38. meanwhile

  39. we still needed to grow

  40. Later 2014: We wanted to expand our booking window from 1 to 7 days. Same-day plus advance booking: 6 more days, 7x data. HOW ARE WE GOING TO RUN GEO QUERIES!?

  41. scaling is hard

  42. scaling is unique

  43. What were our choices? • More Caching? • Use OpenGIS on MySQL (geospatial index extension)? • Switch to PostgreSQL and use PostGIS? • Find something from Hacker News? • Use Elasticsearch? The full-text indexing engine?

  44. Elasticsearch use cases • Full-text search • Analytics: Elastic’s ELK (Elasticsearch, Logstash, Kibana) • Spell-checking, Autocomplete • Ranking hotel rooms?

  45. Elasticsearch!

  46. how does elasticsearch work?

  47. Documents: { "_id": 4492, "description": "The quick brown fox jumps over lazy dogs" }, { "_id": 4493, "description": "The slow red fox doesn’t say anything" }

  48. Match Query: { "query": { "match": { "description": "quick fox" } } } => returns documents matching "quick" and/or "fox"

  49. How is it stored? Inverted index: { "fox" => [4492, 4493], "brown" => [4492], "red" => [4493], } THIS MAKES IT FAST

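
The documents, match query and inverted index on slides 47 to 49 can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch or Lucene is implemented: the `tokenize` helper is a crude, hypothetical stand-in for a real analyzer, and the function names are invented for this example.

```python
import re
from collections import defaultdict

# The two example documents from the slides.
DOCS = {
    4492: "The quick brown fox jumps over lazy dogs",
    4493: "The slow red fox doesn't say anything",
}

def tokenize(text):
    """Lowercase the text and split it into words (hypothetical analyzer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def match(index, query):
    """Return ids of documents containing any of the query terms (and/or)."""
    hits = set()
    for term in tokenize(query):
        hits.update(index.get(term, []))
    return sorted(hits)

INDEX = build_inverted_index(DOCS)
print(INDEX["fox"])               # [4492, 4493]
print(match(INDEX, "quick fox"))  # [4492, 4493]
```

A lookup touches only the posting lists for the query terms instead of scanning every document, which is the reason the slide gives for the speed.
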
  50. Elasticsearch supports many filters. A few examples we can use: • term - exact match • bool - combine filters • various geo filters • range

  51. Independent filter caching • queries cache individual filter matches* • very fast to check if a document matches • *but not geo, range or script filters

  52. how can we use this?

  53. run cheap filters first THEN run geo
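
The "cheap filters first, then geo" ordering can be illustrated outside Elasticsearch with a small Python sketch. Everything here is an assumption made for the example: the hotel records, the `available` field and the coordinates are invented, and the haversine formula stands in for whatever distance computation a geo filter performs.

```python
import math

# Hypothetical hotel records; ids, fields and coordinates are made up.
HOTELS = [
    {"id": 1, "available": True,  "lat": 37.7879, "lon": -122.4075},
    {"id": 2, "available": False, "lat": 37.7880, "lon": -122.4080},
    {"id": 3, "available": True,  "lat": 40.7580, "lon": -73.9855},
    {"id": 4, "available": True,  "lat": 37.8044, "lon": -122.2712},
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearby_available(hotels, lat, lon, radius_km):
    """Apply the cheap exact-match filter first, then the costly geo check."""
    # Cheap filter: a simple boolean test shrinks the candidate set.
    candidates = [h for h in hotels if h["available"]]
    # Expensive filter: trigonometry runs only on the survivors.
    return [
        h["id"] for h in candidates
        if haversine_km(lat, lon, h["lat"], h["lon"]) <= radius_km
    ]

# Search around a point in San Francisco (coordinates chosen for the example).
print(nearby_available(HOTELS, 37.7749, -122.4194, 5.0))
```

The result is the same whichever filter runs first; the ordering only changes how many documents reach the expensive distance check.
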

  54. but wait, how do you rank documents?

  55. Elasticsearch orders documents by relevance • Define your own scoring functions • Let Elasticsearch determine the most relevant documents • Don’t have to load ActiveRecord objects into memory to rank them anymore

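
As a rough illustration of ranking by a user-defined scoring function, here is a plain Python sketch. The scoring formula, the field names and the documents are all invented for this example; the talk does not describe HotelTonight's actual ranking function, and in Elasticsearch the scoring would run inside the search engine rather than in application code.

```python
import heapq

# Hypothetical documents: field names and values are invented for illustration.
DOCS = [
    {"id": "a", "discount": 0.50, "distance_km": 4.0},
    {"id": "b", "discount": 0.30, "distance_km": 0.5},
    {"id": "c", "discount": 0.70, "distance_km": 9.0},
]

def score(doc):
    """Made-up relevance: bigger discounts and shorter distances score higher."""
    return doc["discount"] / (1.0 + doc["distance_km"])

def top_hotels(docs, k):
    """Return the ids of the k highest-scoring documents, best first."""
    return [d["id"] for d in heapq.nlargest(k, docs, key=score)]

print(top_hotels(DOCS, 2))
```

Only the top `k` results need to leave the ranking step, which matches the slide's point that full objects no longer have to be loaded into application memory just to be sorted.
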
  56. less memory == faster

  57. we wanna go fast

  58. Alright — Let’s use Elasticsearch

  59. ✓ prototype ✓ perf test ✓ provision it

  60. How it’s designed Docs MySQL price updates $$ Elasticsearch denormalization

  61. How it’s designed Elasticsearch MySQL generate response generate query

  62. Our biggest challenge

  63. Elasticsearch MySQL must be kept in sync

  64. Changing fields on a document type requires a new index. (Diagram labels: Elasticsearch, MySQL, Elasticsearch.)

  65. If Elasticsearch goes down, we go down. (Diagram labels: Elasticsearch, MySQL, generate response, generate query.)

  67. we cannot have downtime

  68. we cannot have inconsistency

  69. We are conquering these challenges • We have to minimize consistency delays • Defend against them when they do happen • Zero-downtime mapping changes

  70. Zero-downtime Mapping Changes. Diagram labels: Docs; MySQL; Elasticsearch; denormalization; Elasticsearch; track changes; load documents from database.

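
One way the pieces named on slide 70 (load documents from the database, track changes, a second index) could fit together is sketched below as an in-memory Python simulation. This is a guess at the general pattern, not the speakers' implementation: the `Search` class, the `rebuild` function, the `price_cents` field and the alias-style switch at the end are all assumptions made for the example.

```python
class Search:
    """Toy stand-in for a search cluster: named indexes plus one read alias."""

    def __init__(self):
        self.indexes = {}
        self.alias = None

    def read(self, doc_id):
        """Reads always go through the alias, so they never see a half-built index."""
        return self.indexes[self.alias].get(doc_id)

def denormalize_v2(row):
    """Hypothetical new mapping: stores the price as integer cents."""
    return {"name": row["name"], "price_cents": round(row["price"] * 100)}

def rebuild(search, database, new_name, changes_during_load):
    """Build a new index beside the live one, then switch reads over to it."""
    new_index = {}
    search.indexes[new_name] = new_index
    # Snapshot the rows to load; the live index keeps serving reads meanwhile.
    snapshot = {doc_id: dict(row) for doc_id, row in database.items()}
    # Simulate writes that land while the load is running, and track them.
    tracked = []
    for doc_id, row in changes_during_load:
        database[doc_id] = row
        tracked.append(doc_id)
    # 1. Load documents from the database snapshot into the new index.
    for doc_id, row in snapshot.items():
        new_index[doc_id] = denormalize_v2(row)
    # 2. Replay tracked changes so the new index catches up with the database.
    for doc_id in tracked:
        new_index[doc_id] = denormalize_v2(database[doc_id])
    # 3. Switch reads to the new index in a single step.
    search.alias = new_name

database = {
    1: {"name": "Hotel A", "price": 100.0},
    2: {"name": "Hotel B", "price": 120.0},
}
search = Search()
search.indexes["hotels_v1"] = {i: dict(r) for i, r in database.items()}
search.alias = "hotels_v1"

rebuild(search, database, "hotels_v2", [(1, {"name": "Hotel A", "price": 80.0})])
print(search.alias, search.read(1))
```

The old index stays in place until the switch, so a failed rebuild can be abandoned without affecting reads.
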
  71. scaling is hard

  72. scaling is unique

  73. More is always sometimes better: 6 more days of booking, 50x inventory, 10x traffic, 150% quicker response times. PROFIT

  74. scaling is awesome

  75. Thanks. Try Elasticsearch (with us? we’re hiring). Twitter: @paulnsorensen. Email: paul@hoteltonight.com. $25 Off First Booking: PAUL