Where we were before Elasticsearch
● Used Solr
● Deployed servers under an ELB
● Multi-datacenter, hot-hot
● Rebuilt indices and shipped them into production with zero downtime
● We were growing and needed to change…
Elasticsearch Features
Distributed Index
[Diagram: an index split into shards, one shard on each of three data nodes]
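Each document in a distributed index lives in exactly one primary shard, chosen from its routing value (by default the document `_id`). In its simplest form the rule is `shard = hash(routing) % number_of_primary_shards`, which is also why the primary shard count of a live index cannot simply be changed. The sketch below models that rule under one loud assumption: Elasticsearch hashes with Murmur3, while this example uses `zlib.crc32` as a stand-in, so its shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (default: the document _id) to a primary shard.

    Simplified model of shard = hash(routing) % num_primary_shards.
    ASSUMPTION: zlib.crc32 stands in for Elasticsearch's Murmur3 hash.
    """
    if num_primary_shards <= 0:
        raise ValueError("num_primary_shards must be positive")
    h = zlib.crc32(routing.encode("utf-8"))  # unsigned 32-bit hash
    return h % num_primary_shards


def distribute(doc_ids, num_primary_shards: int) -> list:
    """Count how many documents land on each primary shard."""
    counts = [0] * num_primary_shards
    for doc_id in doc_ids:
        counts[pick_shard(doc_id, num_primary_shards)] += 1
    return counts


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(9000)]
    print(distribute(ids, 3))  # roughly even counts across 3 shards
```

Because the mapping is deterministic, any node can compute where a document lives without a lookup table.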
Elasticsearch Features
Data sources
[Diagram: data flowing into Elasticsearch from a Java Client / Hadoop Connector, S3, Cassandra, SQS/SNS and Kinesis]
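The Java client and the Hadoop connector handle batching themselves, but a custom consumer reading from a queue or stream (SQS, Kinesis) typically writes through the `_bulk` API. Its body is newline-delimited JSON: an action line naming the target index and `_id`, then the document source, with a mandatory trailing newline. This is a minimal sketch; the index name `menu-items` and the `id` field are hypothetical.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[Mapping], id_field: str = "id") -> str:
    """Serialize documents into an Elasticsearch _bulk request body (NDJSON).

    Each document becomes two lines: an "index" action line, then the
    document source. The Bulk API requires a final newline.
    ASSUMPTION: every document carries its own id under `id_field`.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""  # nothing to send; callers should skip the request entirely
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [{"id": 1, "name": "Pad Thai"}, {"id": 2, "name": "Pho"}]
    print(build_bulk_body("menu-items", records), end="")
```

The resulting string would be sent as `POST /_bulk` with `Content-Type: application/x-ndjson`; batching many records per request is what makes bulk indexing cheaper than one request per document.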
Elasticsearch Features
Relational Data
● Spatial
● Temporal
● Analytics
● Food
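A single search request can combine these dimensions: a spatial filter, a temporal filter and an analytic aggregation over food data. The sketch below builds such a body with the standard `bool`, `geo_distance`, `range` and `terms` constructs; the mapping it assumes (`location` as a `geo_point`, `open_until` as a `date`, `cuisine` as a `keyword`) is hypothetical.

```python
import json


def nearby_open_with_cuisines(lat: float, lon: float, radius: str = "3km") -> dict:
    """Build a search body mixing spatial, temporal and analytic clauses.

    ASSUMED (hypothetical) mapping: "location" is a geo_point,
    "open_until" is a date, and "cuisine" is a keyword field.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Spatial: only documents within `radius` of the point.
                    {"geo_distance": {"distance": radius,
                                      "location": {"lat": lat, "lon": lon}}},
                    # Temporal: only documents still open now (date math).
                    {"range": {"open_until": {"gt": "now"}}},
                ]
            }
        },
        # Analytics: count matching documents per cuisine.
        "aggs": {"by_cuisine": {"terms": {"field": "cuisine", "size": 10}}},
    }


if __name__ == "__main__":
    print(json.dumps(nearby_open_with_cuisines(40.74, -73.99), indent=2))
```

Putting both conditions in `filter` rather than `must` skips scoring for them and lets Elasticsearch cache the filters.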
Deployment, Discovery & Upgrades
Elasticsearch Deployment
● AWS Cloud Plugin
● Netflix Eureka
● Metrics in Datadog
● Tags in Eureka
● Eureka-aware client
● Index & Node Discovery
● Shard allocation awareness
● Snapshots in S3
[Diagram: Elasticsearch master nodes, data nodes and Eureka instances inside an AWS Region and its Availability Zones, with App clients and S3 for snapshots]
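Shard allocation awareness is driven by the cluster setting `cluster.routing.allocation.awareness.attributes` plus a matching attribute on every node (`node.attr.<name>` in `elasticsearch.yml` in recent versions). Elasticsearch then tries to keep copies of the same shard in different zones. The sketch below shows the settings body and a toy check, not Elasticsearch's allocator, that flags shards whose copies all sit in one zone and would be lost with it. The attribute name `zone` and the node names are assumptions.

```python
# Body for PUT /_cluster/settings. ASSUMPTION: the node attribute is named
# "zone"; each node would set it in elasticsearch.yml, e.g.
#   node.attr.zone: us-east-1a
AWARENESS_SETTINGS = {
    "persistent": {"cluster.routing.allocation.awareness.attributes": "zone"}
}


def single_zone_shards(placement: dict, node_zone: dict) -> list:
    """Return shard ids whose copies (primary and replicas) share one zone.

    Toy check, not Elasticsearch's allocator.
    placement: shard id -> names of nodes holding a copy
    node_zone: node name -> availability zone
    """
    at_risk = []
    for shard, nodes in placement.items():
        zones = {node_zone[n] for n in nodes}
        if len(zones) < 2:  # losing this one zone loses every copy
            at_risk.append(shard)
    return sorted(at_risk)


if __name__ == "__main__":
    zones = {"es-a1": "us-east-1a", "es-a2": "us-east-1a", "es-b1": "us-east-1b"}
    print(single_zone_shards({0: ["es-a1", "es-b1"], 1: ["es-a1", "es-a2"]}, zones))
```

A placement map like this could be assembled from `GET _cat/shards`, which lists the node holding each shard copy.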
Elasticsearch Snapshots
● Snapshots for everyone
● Emulating a production dataset:
○ Snapshot of the market
○ Test out a sort
● Performance testing:
○ Index in production
○ Replay production load
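Emulating a production dataset with snapshots is a three-step round trip: register an S3 repository, snapshot the relevant indices in production, then restore them on a test cluster under new names. The sketch below lists the REST calls as `(method, path, body)` tuples rather than sending them. The repository name, bucket and index pattern are hypothetical, and the `s3` repository type needs the S3 repository plugin installed.

```python
from datetime import date


def snapshot_plan(repo: str, bucket: str, indices: str, day: date) -> list:
    """Return (method, path, body) tuples for an S3 snapshot round trip.

    ASSUMPTION: names are hypothetical. The first two calls run against
    production; the restore runs on a test cluster that has registered the
    same repository (ideally with "readonly": True so it cannot write to it).
    """
    snapshot = f"snap-{day.isoformat()}"  # snapshot names must be lowercase
    return [
        # Register an S3 repository (needs the S3 repository plugin).
        ("PUT", f"/_snapshot/{repo}",
         {"type": "s3", "settings": {"bucket": bucket}}),
        # Snapshot the selected indices and wait for completion.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": indices, "include_global_state": False}),
        # Restore under new names so nothing clashes with existing indices.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore",
         {"indices": indices, "rename_pattern": "(.+)",
          "rename_replacement": "restored_$1"}),
    ]


if __name__ == "__main__":
    for call in snapshot_plan("s3-backups", "my-es-snapshots", "market-*", date(2016, 5, 1)):
        print(call)
```

Snapshots are incremental, so taking one regularly only uploads segments that changed since the previous snapshot.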
Search and Relevance
Data Collection and Feedback
[Diagram: Search, Elasticsearch, an A/B test, Search Logs, Clickstream Events, Impressions, Cassandra, S3 and Data Science connected in a data collection and feedback loop]
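For the A/B test in this loop to produce comparable data, each user needs to land in the same variant on every request, and that variant needs to be logged alongside search logs, impressions and clicks. A common approach, sketched here as an assumption rather than the deck's method, is to hash the experiment name together with the user id. The experiment name `sort-v2` is hypothetical.

```python
import hashlib


def ab_bucket(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a user to an experiment variant.

    Hashing experiment name + user id gives a stable assignment per
    experiment, while different experiments split users roughly
    independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    print(ab_bucket("user-42", "sort-v2"))
```

No assignment table is needed: any service that sees the user id can recompute the variant.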
Improving Search Features
● Create consumable data
● What are your KPIs?
● What user attributes can you use to influence KPIs?
● Decouple moving parts to allow for independent testing
● Measure, make changes and improve
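The deck does not name its KPIs; click-through rate per variant is used below purely as an example of turning impression and click events into a measurable number. The event shape (a dict with `type` and `variant` keys) is an assumption.

```python
from collections import Counter


def ctr_by_variant(events) -> dict:
    """Click-through rate (clicks / impressions) per A/B variant.

    ASSUMPTION: each event is a dict whose "type" is "impression" or
    "click" and whose "variant" labels the A/B bucket; other event
    types are ignored.
    """
    impressions, clicks = Counter(), Counter()
    for event in events:
        if event["type"] == "impression":
            impressions[event["variant"]] += 1
        elif event["type"] == "click":
            clicks[event["variant"]] += 1
    # Only report variants that received impressions (no division by zero).
    return {v: clicks[v] / n for v, n in impressions.items() if n > 0}


if __name__ == "__main__":
    sample = [
        {"type": "impression", "variant": "A"},
        {"type": "impression", "variant": "A"},
        {"type": "click", "variant": "A"},
        {"type": "impression", "variant": "B"},
    ]
    print(ctr_by_variant(sample))
```

Comparing two variants on a metric like this still needs enough traffic, and a significance test, before a change is declared an improvement.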
Tuning for Production
Metrics that matter
● Index store size and memory
● Active searcher threads
● Balance of reads and writes
● Query distribution / Profiler
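Most of these metrics are exposed by `GET /_nodes/stats`, under `indices.store`, `jvm.mem`, `thread_pool.search`, `indices.search` and `indices.indexing`. The sketch below flattens an already-parsed response into one row per node; it does not call a cluster, and the sample values are made up.

```python
def summarize_node_stats(stats: dict) -> list:
    """Flatten a parsed GET /_nodes/stats response into one row per node.

    Counters such as query_total and index_total are cumulative since the
    node started, so reads_per_write is a lifetime figure; diff two samples
    taken some time apart to get current rates.
    """
    rows = []
    for node_id, node in stats["nodes"].items():
        indices = node["indices"]
        search_pool = node["thread_pool"]["search"]
        reads = indices["search"]["query_total"]
        writes = indices["indexing"]["index_total"]
        rows.append({
            "node": node.get("name", node_id),
            "store_bytes": indices["store"]["size_in_bytes"],  # index store size
            "heap_used_percent": node["jvm"]["mem"]["heap_used_percent"],
            "search_active": search_pool["active"],  # busy search threads
            "search_queue": search_pool["queue"],
            "search_rejected": search_pool["rejected"],
            # Queries per indexing operation; inf when nothing was indexed.
            "reads_per_write": reads / writes if writes else float("inf"),
        })
    return sorted(rows, key=lambda row: row["node"])
```

For the profiler bullet, adding `"profile": true` to a search request body returns per-component query timings.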
Balancing Shards
Location Based Sharding
● Balance data in each shard.
● Tuned for index size.
● Tuned for query throughput.
● Indices behind an alias to allow for greater flexibility.
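Putting each location's index behind an alias means clients always query the alias, so a rebuilt or resharded index can replace the old one with a single `POST /_aliases` call, whose actions Elasticsearch applies atomically. The naming scheme below (`listings-<location>-v<version>`) is hypothetical.

```python
def location_index(location: str, version: int) -> str:
    """Hypothetical naming scheme: one versioned index per location."""
    return f"listings-{location.lower()}-v{version}"


def alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that repoints `alias` in one atomic call.

    Clients keep querying the alias, so a rebuilt index replaces the old
    one without downtime and without client changes.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    old, new = location_index("NYC", 1), location_index("NYC", 2)
    print(alias_swap("listings-nyc", old, new))
```

The old index can be kept for a while after the swap, which makes rolling back a matter of swapping the alias back.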
Beyond Elasticsearch
● Impressions and click tracking
● Relevance tuning
● Data analytics
● Machine learning
● A/B testing