Building Real World High Performance Application using Elastic Stack

Building real world high performance application using Elasticsearch
Gaurav Bahrani, CTO, MeTripping
3rd Feb 2018

Introduction
• Gaurav Bahrani, CTO, MeTripping
  ◦ Building intelligent search engine for travel
  ◦ Expertise in building large scale distributed systems
    ▪ SQL, NoSQL, Big Data
    ▪ Database engines
    ▪ Fault-tolerant systems
  ◦ Ex-VPE Cloud Lending Solutions (Fin-tech startup), Ex-Yahoo, Ex-MS, Ex-HP

Agenda
1. MeTripping - Introduction
2. MeTripping - Challenges
3. New architecture - Elasticsearch way
4. Learnings
5. Best practices

MeTripping - Introduction (1)

MeTripping - Introduction (2)
[Figure labels: static data, dynamic data]

MeTripping - Challenges
• Tons of dynamic data
  ◦ 50MB of dynamic data per rank list page
• Response time
  ◦ Static data + dynamic data + ML scoring < 30 - 45 secs (current performance)
• Static data problem
  ◦ Multiple data sources and formats

MeTripping - Static data problem
• Sources, formats, APIs
[Diagram: src-1, src-2, src-3, src-4 (Postgres / Mongo / Couchbase / ES) → UI]

New Architecture
• Merge data using data pipelines
• Host data in Elasticsearch
• Elasticsearch usage
  ◦ NoSQL DB
  ◦ Geo queries
  ◦ Indexing
  ◦ Scoring
  ◦ Auto-complete, Search suggestions
[Diagram: src-1, src-2, src-3, src-4 → data pipeline → ES → UI]
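As an illustration of the "Geo queries" usage listed above, here is a minimal sketch of a geo_distance search body built as a plain Python dict. The field name `location`, the coordinates, and the helper name `geo_distance_query` are assumptions made for the example, not details from the talk; the field is assumed to be mapped as `geo_point`.

```python
import json


def geo_distance_query(lat, lon, distance="50km", field="location"):
    """Build a search body matching docs whose geo_point `field`
    lies within `distance` of the point (lat, lon).

    `field` is a hypothetical field name; it must be mapped as geo_point.
    """
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                # Filters are not scored, so Elasticsearch can cache them.
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


# Example: everything within 100 km of an arbitrary point.
body = geo_distance_query(12.97, 77.59, distance="100km")
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a `_search` request with whichever client is in use.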

New Architecture - Improvements seen
• APIs reduced from 25+ to ~10
• Avg. response time < 200ms

Elasticsearch Setup
• 2 node cluster
  ◦ t2.large (2 vCPUs, 8GB RAM, SSD)
• Standard ES docker (elasticsearch:5.6.7)
• Code: elasticsearch-dsl python package
• Development Tool: Sense Chrome plugin (need to move to Kibana)
• Monitoring
  ◦ Prometheus exporter for ES (justwatch/elasticsearch_exporter:1.0.2)
• Indexes
  ◦ Locations: 100K docs (100+ fields)
  ◦ Hotels: 2M docs (50+ fields)
  ◦ Routes: 10M docs (25+ fields)

Elasticsearch Learnings
• Excellent NoSQL DB
• Better suited for query performance
  ◦ 10s of inserts / second vs. 1000s of queries / second
• Ease of indexing
  ◦ No need to spend tons of effort on query optimization
• Custom scoring is extremely powerful (using Painless scripting language)
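To make the custom scoring point concrete, here is a minimal sketch of a `function_score` query whose score is computed by a Painless script. The fields `city` and `rating`, the weight parameter, and the formula itself are hypothetical and are not the scoring used by MeTripping. The script key is written as `source`, the form documented for 5.6; earlier 5.x releases used `inline`.

```python
def scored_hotel_query(city, rating_weight=2.0):
    """Build a search body that rescales the text-match score by a rating.

    All field names and the formula are illustrative assumptions.
    """
    # Painless script: boost the relevance score by a weighted rating.
    script = "_score * (1 + params.rating_weight * doc['rating'].value)"
    return {
        "query": {
            "function_score": {
                "query": {"match": {"city": city}},
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": script,
                        # Passing weights as params lets ES reuse the
                        # compiled script across requests.
                        "params": {"rating_weight": rating_weight},
                    }
                },
                # Use the script result as the final score.
                "boost_mode": "replace",
            }
        }
    }


query = scored_hotel_query("Goa", rating_weight=1.5)
```

Keeping the weights in `params` rather than hard-coding them in the script text means tuning the score does not force a recompile for every new value.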

Elasticsearch Best Practices
• Avoid use of ‘type’ field
• Disable dynamic schema discovery in production
• Index only required columns (default: true)
  ◦ Significantly improves insert performance
• Include ‘doc_values’ where needed (default: true)
• Understand ‘text’ vs. ‘keyword’ data type differences
  ◦ Use ‘text’ data type only where fuzzy match needed
• Use manageable size shards
• Use replicas (cluster) for redundancy, scalability, and performance
• Use aliases for easy index switchover (eases index refreshes / upgrades)
• System planning
  ◦ CPU: For typical use-cases (index and search) ES is extremely efficient, so low CPU needs
  ◦ Memory: For best performance, complete index should fit in system memory
  ◦ Hard disk: Use SSD. Plan spare capacity for Index upgrades.
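Several of the practices above can be sketched as request bodies. The example below is an assumption-laden illustration, not the deck's actual schema: the field names, the single mapping type name `doc`, and the index names `hotels_v1` / `hotels_v2` are all hypothetical. It shows a mapping with dynamic schema discovery turned off, `text` vs. `keyword` fields, a stored-only field with `index` and `doc_values` disabled, and an `_aliases` body that moves an alias from an old index to a new one in a single request.

```python
def hotels_index_body(type_name="doc"):
    """Index-creation body for a hypothetical hotels index (ES 5.x style,
    with one mapping type)."""
    return {
        "mappings": {
            type_name: {
                # Reject documents containing fields not declared below.
                "dynamic": "strict",
                "properties": {
                    # Analyzed field, for full-text / fuzzy matching.
                    "name": {"type": "text"},
                    # Exact-value field, for filters and aggregations.
                    "city": {"type": "keyword"},
                    "rating": {"type": "float"},
                    "location": {"type": "geo_point"},
                    # Only returned in _source: never searched, sorted,
                    # or aggregated, so skip the index and doc_values.
                    "thumbnail_url": {
                        "type": "keyword",
                        "index": False,
                        "doc_values": False,
                    },
                },
            }
        }
    }


def alias_switch_actions(alias, old_index, new_index):
    """Body for the _aliases API: repoint `alias` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


mapping = hotels_index_body()
switch = alias_switch_actions("hotels", "hotels_v1", "hotels_v2")
```

With this arrangement, clients always query the alias `hotels`; a rebuilt index can be loaded under a new name and swapped in without changing client code.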

Future tasks
• Create ES indexes in data pipeline using Spark
• Elasticsearch as GraphDB

Thank You! Gaurav ([email protected])


Aravind Putrevu

Q & A
