Solve your search & analytics problems with Elasticsearch

Search & Analytics with Elasticsearch
Duy Do (@duydo)
Barcamp Saigon 2016

Agenda
❏ Elasticsearch intro
❏ Use cases: GitHub, Sentifi and Uber
❏ Questions & Answers

About me
❏ A father, a husband and a software engineer
❏ Working with Elasticsearch since 2012
❏ Creator of the Vietnamese Elasticsearch community and analysis plugin
❏ Co-founder at Krom - a small, young startup
❏ Find @duydo on Twitter, GitHub, DuyDo.me or on the roads I run in the morning :-)

What is Elasticsearch?

In a sentence
Elasticsearch is a distributed search and analytics engine, designed for horizontal scalability and easy management.

In a nutshell
❏ Schema-less, JSON-based document store
❏ Distributed and horizontally scalable
❏ Open source under the Apache License 2.0
❏ Built on top of Lucene, written in Java
❏ Extensible through a plugin system
❏ Created by Shay Banon (@kimchy)

Okay, tell us more...

unstructured (full-text) search
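
A minimal sketch of what a full-text query might look like, assuming a hypothetical `description` field. It builds the JSON body for a `match` query from Elasticsearch's Query DSL; the helper name `full_text_query` and the sample text are illustrative only, and the body would be sent to an index's `_search` endpoint.

```python
import json


def full_text_query(field, text):
    """Build a request body for a full-text `match` query."""
    # `match` runs the search text through the field's analyzer
    # (tokenizing, lowercasing, ...) before looking it up in the index.
    return {"query": {"match": {field: text}}}


# Hypothetical field name and search text.
body = full_text_query("description", "red running shoes")
print(json.dumps(body, indent=2))
```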

structured search
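
Structured search works on exact values and ranges rather than analyzed text. The sketch below combines a `term` filter and a `range` filter inside a `bool` query; the field names `color` and `price` and the helper `structured_query` are assumptions made for illustration.

```python
import json


def structured_query(color, min_price, max_price):
    """Build a request body that filters on an exact value and a numeric range."""
    return {
        "query": {
            "bool": {
                # Filters match yes/no and do not contribute to relevance scoring.
                "filter": [
                    {"term": {"color": color}},
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                ]
            }
        }
    }


# Hypothetical values: red products priced between 10 and 50.
body = structured_query("red", 10, 50)
print(json.dumps(body, indent=2))
```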

Sorting

Pagination
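
Sorting and pagination are both expressed in the search request body: `sort` lists the fields to order by, while `from` and `size` select the slice of hits to return. A minimal sketch, assuming a hypothetical `price` field and a 1-based page number (the helper `paged_sorted_search` is not part of any library):

```python
import json


def paged_sorted_search(query, page, page_size, sort_field, order="asc"):
    """Add sorting and from/size pagination to an existing query clause."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "query": query,
        "sort": [{sort_field: {"order": order}}],
        # `from` is the zero-based offset of the first hit to return.
        "from": (page - 1) * page_size,
        "size": page_size,
    }


# Third page of 20 results, cheapest first.
body = paged_sorted_search({"match_all": {}}, page=3, page_size=20, sort_field="price")
print(json.dumps(body, indent=2))
```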

highlight
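
Highlighting asks Elasticsearch to return the matching fragments of a field alongside each hit; by default the matched terms are wrapped in `<em>` tags. A minimal sketch, again assuming a hypothetical `description` field and an illustrative helper name:

```python
import json


def highlighted_query(field, text):
    """Build a `match` query that also requests highlighted fragments."""
    return {
        "query": {"match": {field: text}},
        # An empty object means "use the default highlighter settings".
        "highlight": {"fields": {field: {}}},
    }


body = highlighted_query("description", "running shoes")
print(json.dumps(body, indent=2))
```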

AGGREGATIONS
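
Aggregations compute summaries over the matching documents instead of returning the documents themselves. The sketch below groups documents with a `terms` aggregation and nests an `avg` metric inside each bucket. The field names `brand` and `price`, the aggregation names, and the helper function are hypothetical.

```python
import json


def brand_price_aggregation(brand_field="brand", price_field="price", top_n=10):
    """Build a request body: top brands with the average price per brand."""
    return {
        # size 0: return only the aggregation results, no individual hits.
        "size": 0,
        "aggs": {
            "by_brand": {
                "terms": {"field": brand_field, "size": top_n},
                # Sub-aggregation evaluated inside every brand bucket.
                "aggs": {"avg_price": {"avg": {"field": price_field}}},
            }
        },
    }


print(json.dumps(brand_price_aggregation(), indent=2))
```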

Cool! How about scalability?

Run Elasticsearch on your laptop or on hundreds of servers with petabytes of data.

Wonderful! We're excited to see which problems Elasticsearch can solve

Product store
Sell your products online
❏ Store product catalog & inventory
❏ Search & autocomplete suggestions
❏ Explore product category, material, brand
❏ Filter products by price, color, seller

Log analytics
❏ Logstash: collect & parse your log or transaction data
❏ Mine for trends, statistics, summarizations, or anomalies

Alerting
Take action based on changes in your data
❏ Let users save searches on an e-commerce website
❏ Monitor the number of items purchased per minute and the number of items listed per minute

Analytics/BI
Investigate, analyze, visualize, run ad-hoc queries
❏ Use Kibana to create custom dashboards that visualize your data
❏ Use a wide range of aggregations to perform complex business intelligence queries

Sounds great! We're curious to know who uses Elasticsearch for their business

ELASTICSEARCH IS EVERYWHERE

Cool! Show us some use cases in detail

Elasticsearch at GitHub

What is GitHub?
GitHub is a web-based Git repository hosting service.
• Distributed version control and source code management
• Access control and several collaboration features: bug tracking, feature requests, task management and wikis

The challenge
How do you satisfy the search needs of GitHub's 4 million users while simultaneously providing tactical operational insights that help you iteratively improve customer service?

“Search is the core of GitHub”
Tim Pease, Operations Engineer at GitHub

WHY ELASTICSEARCH?

Enable Powerful Search For Users And Developers
❏ Scale out to meet the needs of a burgeoning user base by migrating away from Apache Solr to Elasticsearch
❏ Index and query almost any type of publicly exposed data
❏ Enable deep programmatic search for developer applications
❏ Provide near real-time indexing as soon as users upload new data

Leverage Analytics On Search Data
❏ Reveal rogue users by querying indexed logging data
❏ Find software bugs within the GitHub platform by indexing all alerts, events and logs, and tracking the rate of specific code exceptions
❏ Make queries that go beyond standard SQL

“You can do lots of queries on that data using Elasticsearch that a standard SQL database won’t support”
Tim Pease, Operations Engineer at GitHub

4M+ USERS

8M+ CODE REPOSITORIES

2B+ ISSUES, PULL REQUESTS, WIKIS, SOURCE CODE

300+ AVG SEARCH REQUESTS PER MINUTE

Elasticsearch at Sentifi

What is Sentifi?
Sentifi is building the largest online ecosystem of crowd-experts and influencers in global financial markets.

The challenge
How do you satisfy the search needs of users and analysts while simultaneously providing financial insights and market intelligence for your customers?

“Analytics is the core of Sentifi”
Duy Do, Former Software Engineer at Sentifi

WHY ELASTICSEARCH?

Enable Powerful Search For Users and Analysts
❏ Scale out to meet the needs of a burgeoning publisher base by migrating away from MongoDB to Elasticsearch
❏ Index and query almost all publisher data
❏ Detect similar articles and tweets
❏ Provide near real-time indexing

Leverage Analytics on Publishers Data
❏ Build complex analytics using advanced queries and aggregations
❏ Monitor incoming messages

Search Suggestions Aggregations

Aggregations, Similarity Detection, Structured Search

Aggregations

3.3M+ PUBLISHERS

150M+ ARTICLES, TWEETS PER MONTH

Elasticsearch at Uber

What is Uber?
A location-based app that makes hiring an on-demand private driver easy.
• For riders, Uber is a taxi service
• For drivers, Uber allows you to be your own boss & pick your own hours

The challenges for the storage system
❏ Data contains many dimensions, dozens of fields per event
❏ Granular data (hexagons, vehicle types, driver states, cities…)
❏ Unknown query patterns, any combination of dimensions
❏ Variety of aggregations (heatmap, top N, histogram, count(), avg(), sum(), percent(), geo)
❏ Large data volume (hundreds of thousands of events per second, or billions of events per day)

10k HEXAGONS IN THE CITY

7 VEHICLE TYPES

13 DRIVER STATES

300 CITIES

1440 MINUTES PER DAY

393B POSSIBLE COMBINATIONS
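
The 393B figure is the product of the five dimension sizes listed on the preceding slides. A quick check in Python, where the dictionary keys are just illustrative labels for those slides:

```python
from math import prod

# Dimension sizes as quoted on the slides.
dimensions = {
    "hexagons_in_the_city": 10_000,
    "vehicle_types": 7,
    "driver_states": 13,
    "cities": 300,
    "minutes_per_day": 1440,
}

# Every combination of one value per dimension.
combinations = prod(dimensions.values())
print(f"{combinations:,}")  # 393,120,000,000, i.e. roughly 393 billion
```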

Minimal requirements
❏ OLAP with geospatial and time series support
❏ Support for large amounts of data
❏ Sub-second response time, fast scanning
❏ Wide range of aggregations
❏ Querying of raw data

“It can’t be a KV store or a relational database”
Danny Yuan, Software Engineer at Uber

Questions & Answers

THANK YOU! See you at Elasticsearch VN meetup https://facebook.com/groups/elasticsearchvn
