About Me (Paul Sorensen) ● Platform Engineer at HotelTonight ● Work on our hotel ranking algorithm using Elasticsearch ● Currently fascinated with scaling web apps twitter: @paulnsorensen
About HotelTonight ● Hotels compete for display ● We show you the best deals; this is where the ranking comes in ● Book up to 7 days in advance for up to a 5-night stay
What if I told you... ● We increased our inventory records by 50x ● Our system can handle 10x more traffic ● We cut our response times by 150% ● That’s what we did.
Early 2014: Our system was reaching its capacity ● MySQL: “gimme nearby hotels” geolocation query, issued over and over ● O(n^2 log n) ranking over hundreds of ActiveRecord objects ● NOPE. I’M OUT
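For readers who want the pattern on this slide made concrete, here is a minimal sketch of "fetch nearby hotels, then rank them in application memory". It is not HotelTonight's code: the `Hotel` fields, the `deal_score` attribute and the single-key sort are assumptions for illustration, and the real ranking was evidently more expensive than one sort.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Hotel:
    # Hypothetical record shape, standing in for an ActiveRecord object.
    name: str
    lat: float
    lon: float
    deal_score: float  # assumed ranking signal, higher is better


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_ranked(hotels, lat, lon, radius_km):
    """Scan every loaded hotel, keep those within the radius, rank in memory."""
    nearby = [h for h in hotels if haversine_km(lat, lon, h.lat, h.lon) <= radius_km]
    return sorted(nearby, key=lambda h: h.deal_score, reverse=True)
```

Every request pays for loading the objects and scanning all of them, which is the cost that grows as inventory grows.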
Later 2014: We wanted to expand our booking window from 1 to 7 days ● Same-day + 6 more days of advance booking = 7x data ● HOW ARE WE GOING TO RUN GEO QUERIES!?
What were our choices? • More Caching? • Use OpenGIS on MySQL (geospatial index extension)? • Switch to PostgreSQL and use PostGIS? • Find something from Hacker News? • Use Elasticsearch? The full-text indexing engine?
Documents: { "_id": 4492, "description": "The quick brown fox jumps over lazy dogs" }, { "_id": 4493, "description": "The slow red fox doesn't say anything" }
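Elasticsearch is built on Lucene, which answers full-text queries from an inverted index (term to document IDs). A toy sketch using the two documents above; the tokenizer and the AND-only `search` helper are simplifications assumed for illustration, not Elasticsearch's actual analysis chain.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    4492: "The quick brown fox jumps over lazy dogs",
    4493: "The slow red fox doesn't say anything",
}
index = build_inverted_index(docs)
```

Looking up a term is a dictionary access rather than a scan over every document, which is why matching stays fast as the corpus grows.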
Independent filter caching ● queries cache individual filter matches* ● very fast to check if a document matches ● *but not geo, range or script filters
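One way to picture independent filter caching: each filter's matches are stored once as a bitset, and later queries combine cached bitsets with a bitwise AND. The sketch below is a toy model under that assumption; `FilterCache`, its methods and the sample fields are hypothetical, not an Elasticsearch API.

```python
class FilterCache:
    """Toy per-filter bitset cache; bit i is set when document i matches."""

    def __init__(self, docs):
        self.docs = docs       # list of dicts; list position is the doc number
        self._bitsets = {}     # (field, value) -> int used as a bitset
        self.misses = 0        # how many filters had to be computed from scratch

    def bitset(self, field, value):
        key = (field, value)
        if key not in self._bitsets:
            self.misses += 1
            bits = 0
            for i, doc in enumerate(self.docs):
                if doc.get(field) == value:
                    bits |= 1 << i
            self._bitsets[key] = bits
        return self._bitsets[key]

    def match_all(self, *filters):
        """AND the cached bitsets together and return matching doc numbers."""
        bits = (1 << len(self.docs)) - 1
        for field, value in filters:
            bits &= self.bitset(field, value)
        return [i for i in range(len(self.docs)) if (bits >> i) & 1]
```

Because each filter is cached on its own, a new combination of already-seen filters costs only the AND, not a rescan.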
Elasticsearch orders documents by relevance ● Define your own scoring functions ● Let Elasticsearch determine the most relevant documents ● Don’t have to load ActiveRecord objects into memory to rank them anymore
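A custom scoring function can be expressed with a `function_score` query. The sketch below only builds the request body as a Python dict. The field names `location` and `discount_pct`, the scales and the choice of functions are assumptions, not the talk's actual ranking. The `bool`/`filter` wrapper follows later Elasticsearch versions; the 1.x releases current in 2014 used a `filtered` query instead.

```python
import json


def build_ranking_query(lat, lon, radius="10km"):
    """Build a function_score request body: filter by distance, score by custom functions."""
    origin = {"lat": lat, "lon": lon}
    return {
        "query": {
            "function_score": {
                # Only hotels within the radius are candidates.
                "query": {
                    "bool": {
                        "filter": {
                            "geo_distance": {"distance": radius, "location": origin}
                        }
                    }
                },
                "functions": [
                    # Closer hotels score higher (Gaussian decay from the user).
                    {"gauss": {"location": {"origin": origin, "scale": "2km"}}},
                    # Hypothetical numeric field boosting better deals.
                    {
                        "field_value_factor": {
                            "field": "discount_pct",
                            "modifier": "log1p",
                            "missing": 0,
                        }
                    },
                ],
                "score_mode": "multiply",  # combine the function scores
                "boost_mode": "replace",   # ignore the base query score
            }
        }
    }


body = json.dumps(build_ranking_query(37.7749, -122.4194))
```

The ranking runs inside the search cluster, and the application receives hits already sorted by score.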