Simplicity
1. Few taps and a swipe.
2. Just a small list of hotels.
3. Just the very best hotels for you.
4. Fast.
Slide 12
Perishable Inventory
1. Availability and pricing change all the time.
2. Real-time
Slide 13
Ad
Slide 14
Jatinder Singh
Director of Engineering, Platform
Twitter
@rubymerchant
Slide 15
Finding the Best Hotels in the Moment
How HotelTonight uses Elasticsearch to power its hotel search algorithm
Slide 16
Hi
Slide 17
About Me (Paul Sorensen)
● Platform Engineer at HotelTonight
● Work on our hotel ranking algorithm using Elasticsearch
● Currently fascinated with scaling web apps
Twitter: @paulnsorensen
Slide 18
About HotelTonight
● Hotels compete for display
● We show you the best deals (this is where the ranking comes in)
● Book up to 7 days in advance for a stay of up to 5 nights
Slide 19
What if I told you...
We increased our inventory records by 50x
Our system can handle 10x more traffic
Our response times are 150% quicker
That’s what we did.
Slide 20
How did we do it?
gem install elasticsearch
rake scale:hotels
That’s it.
THANKS FOR COMING!!!
PROFIT
JUST KIDDING
Slide 21
there is no silver bullet
Slide 22
scaling is hard
Slide 23
Scope
Slide 24
Overview
Why Elasticsearch?
What is Elasticsearch?
The awesome challenges we get to work on
Slide 25
What this is not
Not a technical deep dive
Not an objective comparison between technologies
Slide 26
Impetus
Slide 27
we grew up
Slide 28
from 3 cities to 2000 cities
Slide 29
we’re never not booking rooms
Slide 30
Early 2014: Our system was reaching its capacity
● "gimme nearby hotels" → geolocation query → MySQL
● Ranking over hundreds of ActiveRecord objects: O(n^2 log n)
Slide 34
Early 2014: Our system was reaching its capacity
● "gimme nearby hotels" → geolocation query → MySQL
● Ranking over hundreds of ActiveRecord objects: O(n^2 log n)
● MySQL: NOPE. I’M OUT
Slide 37
we cannot have downtime
Slide 38
meanwhile
Slide 39
we still needed to grow
Slide 40
Later 2014: We wanted to expand our booking window from 1 to 7 days
Same-day → Advance booking: 6 more days, 7x data
HOW ARE WE GOING TO RUN GEO QUERIES!?
Slide 41
scaling is hard
Slide 42
scaling is unique
Slide 43
What were our choices?
• More caching?
• Use OpenGIS on MySQL (geospatial index extension)?
• Switch to PostgreSQL and use PostGIS?
• Find something from Hacker News?
• Use Elasticsearch, the full-text search engine?
Slide 44
Elasticsearch use cases
● Full-text search
● Analytics: Elastic’s ELK stack (Elasticsearch, Logstash, Kibana)
● Spell-checking, autocomplete
● Ranking hotel rooms?
Slide 45
Elasticsearch!
Slide 46
how does Elasticsearch work?
Slide 47
Documents:
{
  "_id": 4492,
  "description": "The quick brown fox jumps over lazy dogs"
},
{
  "_id": 4493,
  "description": "The slow red fox doesn't say anything"
}
Inverted index:
{
  "fox" => [4492, 4493],
  "brown" => [4492],
  "red" => [4493],
}
How is it stored?
THIS MAKES IT FAST
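A toy version of the inverted index above, as a sketch in Python. It lowercases text and splits it with a regular expression (a hypothetical stand-in for a real Elasticsearch analyzer), maps each term to the sorted ids of the documents containing it, and answers multi-term queries by intersecting those posting lists instead of scanning every document.

```python
import re
from collections import defaultdict

docs = {
    4492: "The quick brown fox jumps over lazy dogs",
    4493: "The slow red fox doesn't say anything",
}


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted ids of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analyzer: lowercase, then keep runs of letters and apostrophes.
        for term in re.findall(r"[a-z']+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: dict[str, list[int]], *terms: str) -> list[int]:
    """Ids of documents containing every term: intersect posting lists, never scan documents."""
    if not terms:
        return []
    matches = set(index.get(terms[0], []))
    for term in terms[1:]:
        matches &= set(index.get(term, []))
    return sorted(matches)


index = build_inverted_index(docs)
print(index["fox"])                     # [4492, 4493]
print(search_all(index, "fox", "red"))  # [4493]
```

The lookup cost depends on the length of the posting lists for the query terms, not on the total number of documents.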
Slide 50
Elasticsearch supports many filters
A few examples we can use:
● term - exact match
● bool - combine filters
● various geo filters
● range
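A sketch of a search body that combines the filter types listed above, written as a Python dict. The field names (`city_id`, `price`, `location`) and the assumption that `location` is mapped as a `geo_point` are hypothetical. It uses the `filter` clause of the `bool` query from Elasticsearch 2.x and later; a 2014-era 1.x cluster would have expressed the same thing with a `filtered` query.

```python
import json


def hotel_search_query(city_id: int, max_price: float, lat: float, lon: float, radius_km: float) -> dict:
    """Hypothetical search body combining term, range and geo filters inside a bool query."""
    return {
        "query": {
            "bool": {
                # Clauses in "filter" only include or exclude documents; they add nothing to the score.
                "filter": [
                    {"term": {"city_id": city_id}},            # exact match
                    {"range": {"price": {"lte": max_price}}},  # numeric range
                    {"geo_distance": {                         # geo filter on a geo_point field
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }},
                ]
            }
        }
    }


print(json.dumps(hotel_search_query(42, 250.0, 37.77, -122.42, 5), indent=2))
```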
Slide 51
Independent filter caching
● queries cache individual filter matches*
● very fast to check if a document matches
● *but not geo, range or script filters
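To illustrate why cached filters are cheap to reuse, here is a toy model, not Elasticsearch's implementation: each term filter's matches are computed once into a bitset and memoized, and combining filters is a single bitwise AND. Elasticsearch caches per Lucene segment, which is safe because segments are immutable; this sketch ignores invalidation entirely, and its corpus and field names are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy corpus; each doc id doubles as its bit position in a bitset.
DOCS = {
    0: {"city": "sf", "available": True},
    1: {"city": "sf", "available": False},
    2: {"city": "nyc", "available": True},
    3: {"city": "sf", "available": True},
}


@lru_cache(maxsize=None)
def term_bitset(field: str, value) -> int:
    """Bitset of docs whose `field` equals `value`; computed once, then served from the cache."""
    bits = 0
    for doc_id, doc in DOCS.items():
        if doc.get(field) == value:
            bits |= 1 << doc_id
    return bits


def doc_ids(bits: int) -> list[int]:
    """Decode a bitset back into the matching doc ids."""
    return [doc_id for doc_id in DOCS if (bits >> doc_id) & 1]


# Combining cached filters is a single bitwise AND, with no per-document re-evaluation.
hits = term_bitset("city", "sf") & term_bitset("available", True)
print(doc_ids(hits))  # [0, 3]
```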
Slide 52
how can we use this?
Slide 53
run cheap filters first
THEN run geo
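The ordering idea in plain Python, as a sketch rather than a description of how Elasticsearch executes filters internally: a cheap exact-match check (here a hypothetical `available` flag) prunes the candidate list, and the comparatively expensive haversine distance is computed only for the survivors. The hotel data is made up.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def search(hotels: list[dict], lat: float, lon: float, radius_km: float) -> tuple[list[int], int]:
    """Return (ids within the radius, number of distance computations performed)."""
    # 1. Cheap exact-match filter first: drops hotels with nothing to book.
    candidates = [h for h in hotels if h["available"]]
    # 2. Expensive geo check runs only on the survivors.
    geo_checks = 0
    ids = []
    for h in candidates:
        geo_checks += 1
        if haversine_km(lat, lon, h["lat"], h["lon"]) <= radius_km:
            ids.append(h["id"])
    return ids, geo_checks


# Made-up hotels: two in San Francisco, one in New York, one in Oakland.
HOTELS = [
    {"id": 1, "available": True, "lat": 37.7749, "lon": -122.4194},
    {"id": 2, "available": False, "lat": 37.7849, "lon": -122.4094},
    {"id": 3, "available": True, "lat": 40.7128, "lon": -74.0060},
    {"id": 4, "available": True, "lat": 37.8044, "lon": -122.2712},
]
print(search(HOTELS, 37.7749, -122.4194, radius_km=20))  # ([1, 4], 3)
```

With one of the four hotels unavailable, only three distance computations run; the more a cheap filter prunes, the bigger the saving.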
Slide 54
but wait, how do you rank documents?
Slide 55
Elasticsearch orders documents by relevance
● Define your own scoring functions
● Let Elasticsearch determine the most relevant documents
● No more loading ActiveRecord objects into memory to rank them
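One way to "define your own scoring functions" is Elasticsearch's `function_score` query. The body below is hypothetical, not HotelTonight's actual ranking: a `gauss` decay on an assumed `location` geo_point field (a hotel 5 km from the origin scores 0.5 on that function) multiplied by a `field_value_factor` on an assumed `deal_score` field. `boost_mode: replace` discards the filter-only query's score so that only the functions rank the hits.

```python
import json


def ranked_hotel_query(lat: float, lon: float) -> dict:
    """Hypothetical function_score body: filter first, then rank with custom functions."""
    return {
        "query": {
            "function_score": {
                # Filter-only query: decides which hotels are eligible, not how they rank.
                "query": {"bool": {"filter": [{"term": {"available": True}}]}},
                "functions": [
                    # Distance decay: 1.0 at the origin, 0.5 at 5 km, smaller beyond.
                    {"gauss": {"location": {
                        "origin": {"lat": lat, "lon": lon},
                        "scale": "5km",
                        "decay": 0.5,
                    }}},
                    # Scale by a precomputed per-hotel value (hypothetical field; 1 if missing).
                    {"field_value_factor": {"field": "deal_score", "missing": 1}},
                ],
                "score_mode": "multiply",  # combine the function scores by multiplying them
                "boost_mode": "replace",   # ignore the query score; use only the functions
            }
        }
    }


print(json.dumps(ranked_hotel_query(37.77, -122.42), indent=2))
```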
Slide 56
less memory == faster
Slide 57
we wanna go fast
Slide 58
Alright, let’s use Elasticsearch
Slide 59
✓ prototype
✓ perf test
✓ provision it
Slide 60
How it’s designed
price updates $$ → MySQL → denormalization → Docs → Elasticsearch
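A minimal sketch of the denormalization step, assuming a hypothetical schema in which MySQL has a `hotels` table and a per-night `rates` table: each hotel row and its rates are folded into one self-contained document, so a search never needs a join at query time. The id is kept as an ordinary `hotel_id` field; the Elasticsearch `_id` would be supplied separately in the index request.

```python
from collections import defaultdict

# Hypothetical rows as they might come back from MySQL.
hotels = [{"id": 7, "name": "Hotel A", "lat": 37.78, "lon": -122.41}]
rates = [
    {"hotel_id": 7, "night": "2014-10-02", "price": 179, "rooms_left": 0},
    {"hotel_id": 7, "night": "2014-10-01", "price": 199, "rooms_left": 3},
]


def denormalize(hotels: list[dict], rates: list[dict]) -> list[dict]:
    """Fold each hotel row and its nightly rates into one search document."""
    rates_by_hotel = defaultdict(list)
    for r in rates:
        rates_by_hotel[r["hotel_id"]].append(
            {"night": r["night"], "price": r["price"], "rooms_left": r["rooms_left"]}
        )
    return [
        {
            "hotel_id": h["id"],
            "name": h["name"],
            "location": {"lat": h["lat"], "lon": h["lon"]},  # geo_point-shaped object
            "rates": sorted(rates_by_hotel[h["id"]], key=lambda r: r["night"]),
        }
        for h in hotels
    ]


docs = denormalize(hotels, rates)
print(docs[0]["rates"][0]["night"])  # 2014-10-01
```

The trade-off is that every price update now has to rewrite the whole hotel document, which is why keeping the two stores in sync becomes the hard part.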
Slide 61
How it’s designed
generate query → Elasticsearch → MySQL → generate response
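The search path as a sketch, with the two stores passed in as plain functions so it runs without a cluster or a database. `search_ids` and `load_hotels` are hypothetical stand-ins, and the query body is deliberately minimal. The point is the division of labour: Elasticsearch filters and ranks and returns only ids, MySQL supplies the records, and the response keeps Elasticsearch's order.

```python
from typing import Callable

SearchFn = Callable[[dict], list[int]]           # sends a query body to Elasticsearch, returns ranked ids
LoadFn = Callable[[list[int]], dict[int, dict]]  # loads rows by id from MySQL


def handle_search(city_id: int, search_ids: SearchFn, load_hotels: LoadFn) -> list[dict]:
    # 1. Generate the query (hypothetical, minimal body).
    query = {"query": {"bool": {"filter": [{"term": {"city_id": city_id}}]}}}
    # 2. Elasticsearch does the filtering and ranking and returns ids only.
    ranked_ids = search_ids(query)
    # 3. Load just those records from MySQL...
    rows = load_hotels(ranked_ids)
    # 4. ...and generate the response in rank order, skipping ids MySQL no longer has.
    return [rows[i] for i in ranked_ids if i in rows]


print(handle_search(42, lambda q: [3, 1, 2], lambda ids: {1: {"id": 1}, 3: {"id": 3}}))
# [{'id': 3}, {'id': 1}]
```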
Slide 62
Our biggest challenge
Slide 63
Elasticsearch and MySQL must be kept in sync
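One common way to keep the gap small is to collect the ids of rows that changed in MySQL and push them to Elasticsearch in one `_bulk` request. This is an assumption, not the method described in the talk. The sketch de-duplicates the ids, reloads each row through a caller-supplied `load_doc` function (hypothetical), and emits an NDJSON body in the format of Elasticsearch 7 and later (older versions also expected a `_type` in each action line). Rows that no longer exist become `delete` actions.

```python
import json
from typing import Callable, Optional


def bulk_body(index: str, changed_ids: list[int], load_doc: Callable[[int], Optional[dict]]) -> str:
    """Build an Elasticsearch _bulk body (NDJSON) for hotels whose MySQL rows changed."""
    lines = []
    for doc_id in dict.fromkeys(changed_ids):  # de-duplicate while keeping first-seen order
        doc = load_doc(doc_id)  # reload the current row, so each id is sent once with its latest state
        meta = {"_index": index, "_id": str(doc_id)}
        if doc is None:
            lines.append(json.dumps({"delete": meta}))  # row is gone: remove it from the index
        else:
            lines.append(json.dumps({"index": meta}))   # action line
            lines.append(json.dumps(doc))               # document source line
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


rows = {1: {"price": 99}, 2: {"price": 120}}
print(bulk_body("hotels", [1, 2, 1, 3], rows.get), end="")
```

Reloading from MySQL instead of forwarding the change payload means a burst of updates to one hotel collapses into a single write carrying its latest state.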
Slide 64
Changing existing fields on a document type requires a new Elasticsearch index, rebuilt from MySQL
Slide 65
If Elasticsearch goes down, we go down
generate query → Elasticsearch → MySQL → generate response
Slide 67
we cannot have downtime
Slide 68
we cannot have inconsistency
Slide 69
● We have to minimize consistency delays
● Defend against them when they do happen
● Zero-downtime mapping changes
We are conquering these challenges
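A hedged sketch of "defend against them when they do happen": treat MySQL as the source of truth when building the response, never show a hotel that MySQL says is gone or sold out, and queue a reindex whenever the Elasticsearch copy is older. The `version` and `rooms_left` fields and the reindex queue are hypothetical.

```python
def reconcile(es_hits: list[dict], db_rows: dict[int, dict], reindex_queue: list[int]) -> list[dict]:
    """Build the response from MySQL rows, in Elasticsearch's order, flagging stale documents."""
    results = []
    for hit in es_hits:
        row = db_rows.get(hit["_id"])
        if row is None or hit["version"] < row["version"]:
            # Elasticsearch lags MySQL for this hotel: schedule a reindex.
            reindex_queue.append(hit["_id"])
        if row is None or row["rooms_left"] == 0:
            # Deleted or sold out according to MySQL, the source of truth: never show it.
            continue
        results.append(row)
    return results


es_hits = [{"_id": 1, "version": 5}, {"_id": 2, "version": 3}, {"_id": 3, "version": 1}]
db_rows = {1: {"id": 1, "version": 5, "rooms_left": 2}, 2: {"id": 2, "version": 4, "rooms_left": 0}}
queue: list[int] = []
print([r["id"] for r in reconcile(es_hits, db_rows, queue)], queue)  # [1] [2, 3]
```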
Slide 70
Zero-downtime Mapping Changes
MySQL → denormalization → Docs → Elasticsearch
Track changes while loading documents from the database into a new Elasticsearch index
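A common way to get zero-downtime mapping changes is to build a new index while readers keep querying an alias, then repoint the alias. The talk doesn't say it used aliases, so treat this as an assumption. `FakeCluster` is an in-memory stand-in; only the action list passed to `update_aliases` mirrors a real API (the body of Elasticsearch's `_aliases` endpoint, whose remove and add actions are applied atomically). Index and alias names are made up.

```python
class FakeCluster:
    """In-memory stand-in for Elasticsearch: named indices plus aliases."""

    def __init__(self) -> None:
        self.indices: dict[str, dict[int, dict]] = {}
        self.aliases: dict[str, str] = {}

    def put(self, index: str, doc_id: int, doc: dict) -> None:
        self.indices.setdefault(index, {})[doc_id] = doc

    def update_aliases(self, body: dict) -> None:
        # Same action-list shape as the body of Elasticsearch's _aliases endpoint.
        for action in body["actions"]:
            if "remove" in action:
                self.aliases.pop(action["remove"]["alias"], None)
            if "add" in action:
                self.aliases[action["add"]["alias"]] = action["add"]["index"]


def rebuild_index(cluster: FakeCluster, alias: str, old_index: str, new_index: str,
                  db_rows: dict[int, dict], change_log: list[tuple[int, dict]]) -> None:
    """Hypothetical rebuild: load from the database, replay tracked changes, swap the alias."""
    # 1. Load documents from the database into the new index (new mapping).
    for doc_id, doc in db_rows.items():
        cluster.put(new_index, doc_id, doc)
    # 2. Replay changes tracked while step 1 ran, in the order they happened.
    for doc_id, doc in change_log:
        cluster.put(new_index, doc_id, doc)
    # 3. Readers query the alias, so repointing it switches them over in one step.
    cluster.update_aliases({"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]})


# Usage: readers always query "hotels", never a concrete index name.
cluster = FakeCluster()
cluster.put("hotels_v1", 1, {"price": 100})
cluster.update_aliases({"actions": [{"add": {"index": "hotels_v1", "alias": "hotels"}}]})
rebuild_index(cluster, "hotels", "hotels_v1", "hotels_v2",
              db_rows={1: {"price": 100, "rating": 4.5}},
              change_log=[(1, {"price": 90, "rating": 4.5})])
print(cluster.aliases["hotels"])  # hotels_v2
```

A real rebuild also has to handle writes that land between the replay and the swap, for example by writing to both indices during the transition.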
Slide 71
scaling is hard
Slide 72
scaling is unique
Slide 73
More is always sometimes better
6 more days of booking
50x inventory
10x traffic
150% quicker response times
PROFIT
Slide 74
scaling is awesome
Slide 75
Thanks
Try Elasticsearch
(with us? we’re hiring)
Twitter
@paulnsorensen
$25 Off First Booking
PAUL