How HotelTonight uses Elasticsearch to power its hotel search algorithm

Paul Burt
October 23, 2015

Presentation by Paul Sorensen and Jatinder Singh. From the October 23rd (2015) SFRails event:
http://www.meetup.com/SFRails/events/225588522/

Transcript

  1. HotelTonight
    Jatinder Singh and Paul Sorensen

  3. Jatinder Singh
    Director of Engineering,
    Platform
    Twitter
    @rubymerchant
    Email
    [email protected]

  4. HotelTonight?

  5. World’s first mobile-only, last-minute
    hotel booking app.

  6. HotelTonight
    Hotels: 40% of rooms unsold.
    Customers: 20% to 70% discounts.

  7. Make the world more spontaneous

  10. Spontaneity Brings Technical
    Challenges

  11. Simplicity
    1. Few taps and a swipe.
    2. Just a small list of hotels.
    3. Just the very best hotels for you.
    4. Fast.

  12. Perishable Inventory
    1. Availability and pricing change all the time.
    2. Real-time

  13. Ad

  14. Jatinder Singh
    Director of Engineering,
    Platform
    Twitter
    @rubymerchant
    Email
    [email protected]

  15. Finding the Best Hotels in
    the Moment
    How HotelTonight uses Elasticsearch to power its
    hotel search algorithm

  16. Hi

  17. About Me (Paul Sorensen)
    ● Platform Engineer at HotelTonight
    ● Work on our hotel ranking algorithm using Elasticsearch
    ● Currently fascinated with scaling web apps
    twitter: @paulnsorensen

  18. About HotelTonight
    ● Hotels compete for display
    ● We show you the best deals; this is where the ranking comes in
    ● Book up to 7 days in advance for a stay of up to 5 nights

  19. What if I told you...
    We increased our inventory records by 50x
    Our system can handle 10x more traffic
    We made our response times 150% quicker
    That’s what we did.

  20. How did we do it?
    gem install elasticsearch
    rake scale:hotels
    That’s it.
    THANKS FOR COMING!!!
    PROFIT
    JUST KIDDING

  21. there is no
    silver bullet

  22. scaling is
    hard

  23. Scope

  24. Overview
    Why Elasticsearch?
    What is Elasticsearch?
    The awesome challenges we get to work on

  25. What this is not
    Not a technical deep dive
    Not an objective comparison between tech

  26. Impetus

  27. we grew up

  28. from 3 cities to
    2000 cities

  29. we’re never not
    booking rooms

  30. Early 2014: Our system was reaching its capacity
    MySQL
    O(n^2 log n)
    Ranking over hundreds of
    ActiveRecord objects
    gimme nearby hotels
    geolocation query

  31. Early 2014: Our system was reaching its capacity
    MySQL
    O(n^2 log n)
    Ranking over hundreds of
    ActiveRecord objects
    gimme nearby hotels
    geolocation query
    geolocation query

  32. Early 2014: Our system was reaching its capacity
    MySQL
    O(n^2 log n)
    Ranking over hundreds of
    ActiveRecord objects
    gimme nearby hotels
    geolocation query
    geolocation query
    geolocation query

  34. Early 2014: Our system was reaching its capacity
    MySQL
    O(n^2 log n)
    Ranking over hundreds of
    ActiveRecord objects
    gimme nearby hotels
    geolocation
    NOPE.
    I’M OUT

  37. we cannot
    have
    downtime

  38. meanwhile

  39. we still
    needed to
    grow

  40. Later 2014: We wanted to expand our
    booking window from 1 to 7 days
    Same-day
    6 more days, 7x data
    Advance booking
    HOW ARE WE GOING TO RUN GEO QUERIES!?

  41. scaling is
    hard

  42. scaling is
    unique

  43. What were our choices?
    • More Caching?
    • Use OpenGIS on MySQL (geospatial index extension)?
    • Switch to PostgreSQL and use PostGIS?
    • Find something from Hacker News?
    • Use Elasticsearch? The full-text indexing engine?

  44. Elasticsearch Use cases
    ● Full-text search
    ● Analytics: Elastic’s ELK (Elasticsearch, Logstash, Kibana)
    ● Spell-checking, Autocomplete
    ● Ranking hotel rooms?

  45. Elasticsearch!

  46. how does
    elasticsearch work?

  47. Documents:
    {
      "_id": 4492,
      "description": "The quick brown fox jumps over lazy dogs"
    },
    {
      "_id": 4493,
      "description": "The slow red fox doesn't say anything"
    }

  48. Match Query:
    {
      "query": {
        "match": {
          "description": "quick fox"
        }
      }
    }
    => returns documents matching "quick" and/or "fox"

  49. How is it stored? Inverted index:
    {
      "fox" => [4492, 4493],
      "brown" => [4492],
      "red" => [4493]
    }
    THIS MAKES IT FAST
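
A minimal sketch of the idea on the last three slides, assuming a toy tokenizer and a simple "count the matching terms" ranking. It is not how Elasticsearch stores or scores data internally; the method names (`tokenize`, `build_inverted_index`, `match`) are made up for illustration.

```ruby
# Toy inverted index over the two example documents from the slides.
DOCS = {
  4492 => "The quick brown fox jumps over lazy dogs",
  4493 => "The slow red fox doesn't say anything"
}.freeze

# Lowercase and split into words; a stand-in for a real analyzer.
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Map each term to the sorted list of document ids that contain it.
def build_inverted_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |id, text|
    tokenize(text).uniq.each { |term| index[term] << id }
  end
  index.each_value(&:sort!)
  index.default_proc = nil # lookups of unknown terms must not add keys
  index
end

# OR-style match: any document containing at least one query term,
# ordered by how many query terms it contains.
def match(index, query)
  hits = Hash.new(0)
  tokenize(query).uniq.each do |term|
    index.fetch(term, []).each { |id| hits[id] += 1 }
  end
  hits.sort_by { |id, count| [-count, id] }.map(&:first)
end

INDEX = build_inverted_index(DOCS)
p INDEX["fox"]              # => [4492, 4493]
p match(INDEX, "quick fox") # => [4492, 4493]
```

Looking up a term is a single hash access, so the cost of a query depends on the number of query terms and matching documents rather than on the total size of the text.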

  50. Elasticsearch supports many filters
    A few examples we can use:
    ● term - exact match
    ● bool - combine filters
    ● various geo filters
    ● range
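
As a rough illustration of how these filter types combine, here is a sketch that builds a search body as a Ruby hash. It uses the `bool` query's `filter` clause as found in Elasticsearch 2.0 and later; the 1.x releases current at the time of the talk expressed the same idea with a `filtered` query. The field names (`available`, `price_cents`, `location`) and the helper `hotel_search_body` are assumptions, not HotelTonight's actual mapping.

```ruby
require "json"

# Build a search body that combines term, range, and geo filters under bool.
# All field names here are hypothetical.
def hotel_search_body(lat:, lon:, radius: "5km", max_price_cents: 20_000)
  {
    query: {
      bool: {
        filter: [
          { term: { available: true } },
          { range: { price_cents: { lte: max_price_cents } } },
          { geo_distance: { distance: radius, location: { lat: lat, lon: lon } } }
        ]
      }
    }
  }
end

puts JSON.pretty_generate(hotel_search_body(lat: 37.7879, lon: -122.4075))
```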

  51. Independent filter caching
    ● queries cache individual filter matches*
    ● very fast to check if a document matches
    ● *but not geo, range or script filters

  52. how can we use this?

  53. run cheap filters first
    THEN run geo
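
The ordering idea can be sketched in plain Ruby, assuming a tiny in-memory data set. Cheap exact-match filters are cached per filter and intersected first, and the distance calculation (not cached, as noted two slides earlier) runs only on the documents that survive. The hotel records, field names, and helper names are all hypothetical.

```ruby
# Hypothetical hotel documents; field names are illustrative only.
HOTELS = [
  { id: 1, city: "sf",  available: true,  lat: 37.7880, lon: -122.4075 },
  { id: 2, city: "sf",  available: false, lat: 37.7855, lon: -122.4010 },
  { id: 3, city: "nyc", available: true,  lat: 40.7580, lon: -73.9855 },
  { id: 4, city: "sf",  available: true,  lat: 37.8044, lon: -122.2712 }
].freeze

EARTH_RADIUS_KM = 6371.0

# Great-circle distance (haversine formula); the "expensive" check here.
def distance_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Cache of cheap filter results, keyed by [field, value].
FILTER_CACHE = {}

def term_filter(field, value)
  FILTER_CACHE[[field, value]] ||=
    HOTELS.select { |h| h[field] == value }.map { |h| h[:id] }
end

# Intersect the cached cheap filters, then run geo only on the survivors.
# Returns the matching ids and the number of distance calculations performed.
def nearby_available(lat, lon, radius_km, city:)
  candidate_ids = term_filter(:available, true) & term_filter(:city, city)
  geo_checks = 0
  hits = HOTELS.select do |h|
    next false unless candidate_ids.include?(h[:id])
    geo_checks += 1
    distance_km(lat, lon, h[:lat], h[:lon]) <= radius_km
  end
  [hits.map { |h| h[:id] }, geo_checks]
end

ids, checks = nearby_available(37.7879, -122.4075, 5.0, city: "sf")
puts "matches: #{ids.inspect}, distance calculations: #{checks}"
```

With four hotels the saving is trivial, but the count of distance calculations shows the point: only the two available hotels in the city are ever measured.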

  54. but wait, how do you
    rank documents?

  55. Elasticsearch orders documents by relevance
    ● Define your own scoring functions
    ● Let Elasticsearch determine the most relevant documents
    ● Don’t have to load ActiveRecord objects into memory to
    rank them anymore
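
To make "define your own scoring functions" concrete, here is a sketch of a weighted score over plain hashes. The talk does not disclose HotelTonight's ranking signals, so the fields (`discount`, `rating`, `distance_km`) and the weights are invented for illustration. In Elasticsearch itself this kind of logic would live in the query rather than in application code.

```ruby
# Hypothetical documents and weights; not the real ranking signals.
ROOMS = [
  { id: "a", discount: 0.50, rating: 4.1, distance_km: 2.0 },
  { id: "b", discount: 0.30, rating: 4.8, distance_km: 0.5 },
  { id: "c", discount: 0.70, rating: 3.2, distance_km: 6.0 }
].freeze

WEIGHTS = { discount: 2.0, rating: 0.5, distance: 0.2 }.freeze

# Higher discount and rating raise the score; distance lowers it.
def score(room, weights = WEIGHTS)
  weights[:discount] * room[:discount] +
    weights[:rating] * room[:rating] -
    weights[:distance] * room[:distance_km]
end

# Return only ids and scores, best first, without building heavyweight objects.
def rank(rooms, limit: 2)
  rooms.map { |r| [r[:id], score(r).round(2)] }
       .max_by(limit) { |_, s| s }
end

rank(ROOMS).each { |id, s| puts "#{id}: #{s}" }
```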

  56. less memory == faster

  57. we wanna go fast

  58. Alright — Let’s use
    Elasticsearch

  59. ✓ prototype
    ✓ perf test
    ✓ provision it

  60. How it’s designed
    (diagram) price updates $$ → MySQL → denormalization → Docs → Elasticsearch
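
A sketch of what the denormalization step might look like, assuming a hotel table and a separate rates table. The schema, the field names, and the `build_document` helper are hypothetical; the slides only say that MySQL rows are denormalized into documents for Elasticsearch.

```ruby
# Hypothetical normalized rows, as they might come out of two MySQL tables.
HOTEL_ROWS = [{ id: 7, name: "Hotel Example", lat: 37.78, lon: -122.41 }].freeze
RATE_ROWS = [
  { hotel_id: 7, date: "2015-10-23", price_cents: 12_900, rooms_left: 3 },
  { hotel_id: 7, date: "2015-10-24", price_cents: 14_900, rooms_left: 0 }
].freeze

# Flatten one hotel and its bookable rates into a single search document,
# so a query never needs a join.
def build_document(hotel, rates)
  bookable = rates.select { |r| r[:hotel_id] == hotel[:id] && r[:rooms_left] > 0 }
  {
    _id: hotel[:id],
    name: hotel[:name],
    location: { lat: hotel[:lat], lon: hotel[:lon] },
    rates: bookable.map { |r| { date: r[:date], price_cents: r[:price_cents] } }
  }
end

document = build_document(HOTEL_ROWS.first, RATE_ROWS)
puts document[:rates].length
```

The trade-off is that every price or availability change in MySQL has to be pushed into the matching document, which is the sync problem the following slides describe.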

  61. How it’s designed
    Elasticsearch
    MySQL
    generate
    response
    generate query

  62. Our biggest challenge

  63. Elasticsearch
    MySQL
    must be kept in sync

  64. Elasticsearch
    MySQL
    changing fields on a document type requires new index
    Elasticsearch

  65. If Elasticsearch goes down, we go down
    Elasticsearch
    MySQL
    generate
    response
    generate query

  67. we cannot
    have
    downtime

  68. we cannot
    have
    inconsistency

  69. We are conquering these challenges
    ● We have to minimize consistency delays
    ● Defend against them when they do happen
    ● Zero-downtime mapping changes
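
One possible way to "defend against" a stale index, sketched under the assumption that the database is consulted while generating the response: keep the ordering that search produced, but take availability and price from the authoritative rows. The data and the `reconcile` helper are hypothetical, and the slides do not say this is how HotelTonight does it.

```ruby
# Hypothetical search hits (possibly stale) and the authoritative database rows.
SEARCH_HITS = [
  { id: 1, price_cents: 12_900 },
  { id: 2, price_cents: 9_900 },
  { id: 3, price_cents: 15_000 }
].freeze

DB_ROWS = {
  1 => { price_cents: 12_900, rooms_left: 2 },
  2 => { price_cents: 9_900,  rooms_left: 0 }, # sold out since the last index update
  3 => { price_cents: 16_500, rooms_left: 1 }  # price changed since the last index update
}.freeze

# Keep the search ordering, but trust the database for availability and price.
def reconcile(hits, db_rows)
  hits.filter_map do |hit|
    row = db_rows[hit[:id]]
    next if row.nil? || row[:rooms_left] <= 0
    { id: hit[:id], price_cents: row[:price_cents] }
  end
end

reconcile(SEARCH_HITS, DB_ROWS).each { |r| puts "#{r[:id]}: #{r[:price_cents]}" }
```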

  70. Zero-downtime Mapping Changes
    Docs
    MySQL Elasticsearch
    denormalization
    Elasticsearch
    track changes
    load documents from
    database
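
The slide names two steps, loading documents from the database and tracking changes. A common way to finish the job is to build a second index and switch an alias to it, and the sketch below assumes that approach; the slides do not confirm it. `FakeCluster` is an in-memory stand-in, not an Elasticsearch client, and `reindex` with its parameters is invented for illustration.

```ruby
# In-memory stand-in for a search cluster: named indexes plus one alias.
class FakeCluster
  attr_reader :indexes
  attr_accessor :alias_target

  def initialize
    @indexes = {}
    @alias_target = nil
  end

  # Reads always go through the alias, so callers never see a half-built index.
  def search_all
    @indexes.fetch(@alias_target).values
  end
end

# Rebuild into a new index, replay writes that arrived meanwhile, then swap.
# The block converts a database row into a document under the new mapping.
def reindex(cluster, database, new_name, changes_during_load)
  new_index = {}
  snapshot = database.dup                # 1. load documents from the database
  snapshot.each { |id, row| new_index[id] = yield(row) }

  changes_during_load.each do |id, row|  # 2. replay the tracked changes
    database[id] = row
    new_index[id] = yield(row)
  end

  cluster.indexes[new_name] = new_index
  old_name = cluster.alias_target
  cluster.alias_target = new_name        # 3. switch the alias in one step
  cluster.indexes.delete(old_name)       # 4. drop the old index
  new_name
end

cluster = FakeCluster.new
cluster.indexes["hotels_v1"] = { 1 => { name: "A" } }
cluster.alias_target = "hotels_v1"
database = { 1 => { name: "A", price: 100 } }
changes = { 1 => { name: "A", price: 90 } }
reindex(cluster, database, "hotels_v2", changes) { |row| { name: row[:name], price: row[:price] } }
puts cluster.alias_target
```

Searches keep hitting the old index until the alias moves, which is what makes the mapping change invisible to users.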

  71. scaling is
    hard

  72. scaling is
    unique

  73. More is always sometimes better
    6 more days of booking
    50x inventory
    10x traffic
    150% quicker response times
    PROFIT

  74. scaling is
    awesome

  75. Thanks
    Try Elasticsearch
    (with us? we’re hiring)
    Twitter
    @paulnsorensen
    Email
    [email protected]
    $25 Off First Booking
    PAUL
