Slide 1

Slide 1 text

Search & Analytics with Elasticsearch
Duy Do (@duydo)
Barcamp Saigon 2016

Slide 2

Slide 2 text

Agenda
❏ Elasticsearch intro
❏ Use cases: GitHub, Sentifi and Uber
❏ Questions & Answers

Slide 3

Slide 3 text

About me
❏ A father, a husband and a software engineer
❏ Working with Elasticsearch since 2012
❏ Creator of the Vietnamese Elasticsearch community and the Vietnamese analysis plugin
❏ Co-founder of Krom, a small, young startup
❏ Find @duydo on Twitter, GitHub, DuyDo.me or on the roads I run in the morning :-)

Slide 4

Slide 4 text

What is Elasticsearch?

Slide 5

Slide 5 text

In a sentence
Elasticsearch is a distributed search and analytics engine designed for horizontal scalability and easy management.

Slide 6

Slide 6 text

In a nutshell
❏ Schema-less, JSON-based document store
❏ Distributed and horizontally scalable
❏ Open source under the Apache License 2.0
❏ Built on top of Apache Lucene, written in Java
❏ Extensible through a plugin system
❏ Created by Shay Banon (@kimchy)

Slide 7

Slide 7 text

okay, tell us more...

Slide 8

Slide 8 text

unstructured (full-text) search
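The slide shows a screenshot rather than a query. As a minimal sketch, full-text search in the Elasticsearch Query DSL is usually a `match` query: the query text is run through the field's analyzer and documents are ranked by relevance against the resulting terms. The Python below only builds the JSON request body; the field name `title` and the sample text are illustrative assumptions, and sending the body (for example to `POST /<index>/_search`) is not shown.

```python
import json


def full_text_query(field: str, text: str) -> dict:
    """Build a _search body with a `match` (full-text) query.

    `match` analyzes `text` with the field's analyzer, so a query like
    "Quick Brown" can match a document containing "the quick brown fox".
    """
    return {"query": {"match": {field: text}}}


# "title" is a hypothetical field name; use one from your own mapping.
body = full_text_query("title", "distributed search engine")
print(json.dumps(body, indent=2))
```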

Slide 9

Slide 9 text

structured search
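Structured search asks yes/no questions about exact values (is the category X, is the price within a range?) instead of ranking by relevance. A minimal sketch, assuming hypothetical `category` (keyword) and `price` (numeric) fields: clauses placed in the `filter` section of a `bool` query are not scored and can be cached by Elasticsearch.

```python
import json


def structured_query(category: str, min_price: float, max_price: float) -> dict:
    """Build a _search body that filters on exact values, without scoring."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # `term` matches the exact, unanalyzed value, so it
                    # belongs on keyword (not-analyzed) fields.
                    {"term": {"category": category}},
                    # `range` accepts gt / gte / lt / lte bounds.
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                ]
            }
        }
    }


body = structured_query("laptop", 500, 1500)
print(json.dumps(body, indent=2))
```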

Slide 10

Slide 10 text

Sorting
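By default, hits come back ordered by relevance (`_score`). A sketch of overriding that with a `sort` clause, assuming a hypothetical `published_at` date field and `title` text field: newest documents first, with relevance as a tie-breaker.

```python
import json


def sorted_query(text: str) -> dict:
    """Full-text query whose hits are ordered by date, then by relevance."""
    return {
        "query": {"match": {"title": text}},
        "sort": [
            {"published_at": {"order": "desc"}},  # newest first
            "_score",  # ties broken by relevance (descending by default)
        ],
    }


body = sorted_query("elasticsearch")
print(json.dumps(body, indent=2))
```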

Slide 11

Slide 11 text

Pagination
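Basic pagination uses the `from` (hits to skip) and `size` (hits to return) parameters of a search request. A minimal sketch that maps 1-based page numbers onto them; the helper name and the `title` field are hypothetical. By default Elasticsearch rejects requests where `from + size` exceeds the `index.max_result_window` setting (10,000), so very deep paging calls for another approach such as the scroll API.

```python
import json


def page_query(text: str, page: int, per_page: int = 10) -> dict:
    """Build a _search body for a 1-based page of results."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "query": {"match": {"title": text}},
        "from": (page - 1) * per_page,  # number of hits to skip
        "size": per_page,  # number of hits to return
    }


body = page_query("elasticsearch", page=3)
print(json.dumps(body))
```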

Slide 12

Slide 12 text

highlight
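Highlighting returns, for each hit, the fragments of a field where the query terms matched, wrapped in markup. A sketch assuming a hypothetical `content` field: `pre_tags` and `post_tags` set the markup (`<em>` is also the default), and the fragments appear under a `highlight` key in each hit of the response.

```python
import json


def highlighted_query(text: str) -> dict:
    """Full-text query that also asks for highlighted snippets of `content`."""
    return {
        "query": {"match": {"content": text}},
        "highlight": {
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
            # An empty object means "use the default highlighter settings".
            "fields": {"content": {}},
        },
    }


body = highlighted_query("distributed search")
print(json.dumps(body, indent=2))
```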

Slide 13

Slide 13 text

AGGREGATIONS
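Aggregations compute analytics over the documents matching a query, and they can be nested. A minimal sketch, assuming hypothetical `brand` (keyword) and `price` (numeric) fields: a `terms` bucket aggregation finds the most frequent brands, and an `avg` metric sub-aggregation computes the average price inside each bucket. Setting `size` to 0 skips the hits and returns only aggregation results.

```python
import json


def brand_stats(top_n: int = 5) -> dict:
    """Top brands by document count, with the average price per brand."""
    return {
        "size": 0,  # only aggregation results, no hits
        "aggs": {
            "top_brands": {
                "terms": {"field": "brand", "size": top_n},  # bucket aggregation
                "aggs": {
                    "avg_price": {"avg": {"field": "price"}}  # metric per bucket
                },
            }
        },
    }


body = brand_stats(top_n=3)
print(json.dumps(body, indent=2))
```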

Slide 14

Slide 14 text

Cool! How about scalability?

Slide 15

Slide 15 text

Run Elasticsearch on your laptop or on hundreds of servers with petabytes of data.

Slide 16

Slide 16 text

Wonderful! We’re excited to see which problems Elasticsearch can solve.

Slide 17

Slide 17 text

Product store
❏ Sell your products online
❏ Store your product catalog & inventory
❏ Search with autocomplete suggestions
❏ Explore products by category, material, brand
❏ Filter products by price, color, seller
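For the autocomplete part of a product store, one option Elasticsearch offers is the completion suggester, which works on a field mapped with the `completion` type. A sketch under assumptions: the field name `name_suggest` and the helper are hypothetical, the mapping uses the typeless format of Elasticsearch 7.x and later, and the `prefix` request form is the 5.x+ syntax (older releases used a different request format).

```python
import json

# Index mapping (Elasticsearch 7.x+ typeless format): `name_suggest`
# is a hypothetical field holding the product names to complete.
MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "name_suggest": {"type": "completion"},
        }
    }
}


def autocomplete_body(prefix: str, max_suggestions: int = 5) -> dict:
    """_search body asking the completion suggester for matching names."""
    return {
        "suggest": {
            "product-suggest": {  # arbitrary name for this suggestion
                "prefix": prefix,
                "completion": {"field": "name_suggest", "size": max_suggestions},
            }
        }
    }


body = autocomplete_body("iph")
print(json.dumps(body, indent=2))
```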

Slide 18

Slide 18 text

Log analytics
❏ Collect & parse your log or transaction data with Logstash
❏ Mine it for trends, statistics, summarizations or anomalies
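Finding trends in logs usually means bucketing events over time. A minimal sketch: count error events per minute over the last hour with a `date_histogram` aggregation. `@timestamp` is the field Logstash writes by default; the `level` field and its `"error"` value are assumptions about how the logs were parsed. `fixed_interval` is the parameter name in Elasticsearch 7.2 and later; earlier versions call it `interval`.

```python
import json


def errors_per_minute(since: str = "now-1h", bucket: str = "1m") -> dict:
    """Count error log events in fixed time buckets."""
    return {
        "size": 0,  # aggregation results only
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "error"}},  # hypothetical parsed field
                    # Date math such as "now-1h" is resolved by Elasticsearch.
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "errors_over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": bucket}
            }
        },
    }


body = errors_per_minute()
print(json.dumps(body, indent=2))
```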

Slide 19

Slide 19 text

Alerting
❏ Take action based on changes in your data
❏ Let users save searches on an e-commerce website
❏ Monitor the number of items purchased per minute and the number of items listed per minute
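Saved searches fit Elasticsearch's percolator, which inverts the usual flow: queries are stored as documents, and a new document is matched against them to find which saved searches it satisfies. A sketch assuming Elasticsearch 6.x or later (5.x also required a `document_type`); the index layout, field names and helper functions are hypothetical, and the code only builds the request bodies.

```python
import json

# Mapping for a hypothetical "saved-searches" index: `query` holds the
# stored search, and the product fields it refers to must be mapped too.
MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "user_id": {"type": "keyword"},
            "name": {"type": "text"},
            "price": {"type": "float"},
        }
    }
}


def saved_search(user_id: str, text: str, max_price: float) -> dict:
    """A document that stores one user's search as a query."""
    return {
        "user_id": user_id,
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": [{"range": {"price": {"lte": max_price}}}],
            }
        },
    }


def percolate_body(new_product: dict) -> dict:
    """_search body: which stored queries match this new product?"""
    return {"query": {"percolate": {"field": "query", "document": new_product}}}


doc = saved_search("u42", "running shoes", 100)
body = percolate_body({"name": "Trail running shoes", "price": 89.0})
print(json.dumps(body, indent=2))
```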

Slide 20

Slide 20 text

Analytics/BI
❏ Investigate, analyze, visualize, run ad-hoc queries
❏ Use Kibana to create custom dashboards that visualize your data
❏ Use a wide range of aggregations to perform complex business intelligence queries

Slide 21

Slide 21 text

Sounds great! We’re curious to know who uses Elasticsearch for their business.

Slide 22

Slide 22 text

ELASTICSEARCH IS EVERYWHERE

Slide 23

Slide 23 text

Cool! Show us some use cases in detail.

Slide 24

Slide 24 text

Elasticsearch at GitHub

Slide 25

Slide 25 text

What is GitHub?
GitHub is a web-based Git repository hosting service.
● Distributed version control and source code management
● Access control and several collaboration features: bug tracking, feature requests, task management and wikis

Slide 26

Slide 26 text

The challenge
How do you satisfy the search needs of GitHub's 4 million users while simultaneously providing tactical operational insights that help you iteratively improve customer service?

Slide 27

Slide 27 text

“Search is the core of GitHub” Tim Pease, Operation Engineer at GitHub

Slide 28

Slide 28 text

WHY ELASTICSEARCH?

Slide 29

Slide 29 text

Enable Powerful Search For Users And Developers
❏ Scale out to meet the needs of a burgeoning user base by migrating away from Apache Solr to Elasticsearch
❏ Index and query almost any type of publicly exposed data
❏ Enable deep programmatic search for developer applications
❏ Provide near real-time indexing as soon as users upload new data

Slide 30

Slide 30 text

Leverage Analytics On Search Data
❏ Reveal rogue users by querying indexed logging data
❏ Find software bugs within the GitHub platform by indexing all alerts, events and logs, and tracking the rate of specific code exceptions
❏ Make queries that go beyond standard SQL

Slide 31

Slide 31 text

“You can do lots of queries on that data using Elasticsearch that a standard SQL database won’t support” Tim Pease, Operation Engineer at GitHub

Slide 32

Slide 32 text

4M+ USERS

Slide 33

Slide 33 text

8M+ CODE REPOSITORIES

Slide 34

Slide 34 text

2B+ ISSUES, PULL REQUESTS, WIKIS, SOURCE CODE

Slide 35

Slide 35 text

300+ AVG SEARCH REQUESTS PER MINUTE

Slide 36

Slide 36 text

Elasticsearch at Sentifi

Slide 37

Slide 37 text

What is Sentifi?
Sentifi is building the largest online ecosystem of crowd-experts and influencers in global financial markets.

Slide 38

Slide 38 text

The challenge
How do you satisfy the search needs of users and analysts while simultaneously providing financial insights and market intelligence to your customers?

Slide 39

Slide 39 text

“Analytics is the core of Sentifi” Duy Do, Former Software Engineer at Sentifi

Slide 40

Slide 40 text

WHY ELASTICSEARCH?

Slide 41

Slide 41 text

Enable Powerful Search For Users And Analysts
❏ Scale out to meet the needs of a burgeoning publisher base by migrating away from MongoDB to Elasticsearch
❏ Index and query almost all publisher data
❏ Detect similar articles and tweets
❏ Provide near real-time indexing
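Similar-content detection can be approached with the `more_like_this` query, which selects representative terms from an input text and searches for documents that share them. This is a hedged illustration of the technique, not necessarily how Sentifi implemented it; the `title` and `text` fields and the parameter values are assumptions to tune. Note that `min_doc_freq` defaults to 5, which can hide terms in a small test index.

```python
import json


def similar_to(text: str, fields=("title", "text")) -> dict:
    """_search body finding documents similar to `text`."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": text,  # can also reference an already indexed document
                # Short texts such as tweets rarely repeat a term, so
                # accept terms that occur only once in the input.
                "min_term_freq": 1,
                "max_query_terms": 25,  # cap on terms picked from the input
            }
        }
    }


body = similar_to("Central bank raises interest rates by 25 basis points")
print(json.dumps(body, indent=2))
```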

Slide 42

Slide 42 text

Leverage Analytics On Publisher Data
❏ Build complex analytics using advanced queries and aggregations
❏ Monitor incoming messages

Slide 43

Slide 43 text

Search Suggestions Aggregations

Slide 44

Slide 44 text

Aggregations, Similarity detection, Structured search

Slide 45

Slide 45 text

Aggregations

Slide 46

Slide 46 text

Aggregations

Slide 47

Slide 47 text

3.3M+ PUBLISHERS

Slide 48

Slide 48 text

150M+ ARTICLES, TWEETS PER MONTH

Slide 49

Slide 49 text

Elasticsearch at Uber

Slide 50

Slide 50 text

What is Uber?
A location-based app that makes hiring an on-demand private driver easy.
● For riders, Uber is a taxi service
● For drivers, Uber allows you to be your own boss & pick your own hours

Slide 51

Slide 51 text

The challenges for the storage system
❏ Data contains many dimensions, with dozens of fields per event
❏ Granular data (hexagons, vehicle types, driver states, cities…)
❏ Unknown query patterns: any combination of dimensions
❏ Variety of aggregations (heatmap, top N, histogram, count(), avg(), sum(), percent(), geo)
❏ Large data volume (hundreds of thousands of events per second, or billions of events per day)
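Because queries may combine any subset of dimensions, one way to express them in Elasticsearch is to build the `bool` filter list dynamically and then aggregate per hexagon for a heatmap. This is a sketch, not Uber's actual schema: the field names (`city`, `vehicle_type`, `driver_state`, `event_time`, `hexagon_id`) and the values are hypothetical, and only a `terms` aggregation (which doubles as a top-N list) is shown from the aggregation types listed above.

```python
import json


def heatmap_body(dimensions: dict, start: str, end: str, top_n: int = 100) -> dict:
    """Filter on any combination of dimensions, then count events per hexagon."""
    # One exact-match filter per requested dimension, in a stable order.
    filters = [{"term": {field: value}} for field, value in sorted(dimensions.items())]
    # Restrict to the time window (Elasticsearch resolves date math like "now-15m").
    filters.append({"range": {"event_time": {"gte": start, "lt": end}}})
    return {
        "size": 0,
        "query": {"bool": {"filter": filters}},
        "aggs": {
            # Busiest hexagons first (terms buckets default to doc_count order).
            "per_hexagon": {"terms": {"field": "hexagon_id", "size": top_n}}
        },
    }


body = heatmap_body(
    {"city": "saigon", "vehicle_type": "uberX", "driver_state": "open"},
    start="now-15m",
    end="now",
)
print(json.dumps(body, indent=2))
```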

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

10k HEXAGONS IN THE CITY

Slide 55

Slide 55 text

7 VEHICLE TYPES

Slide 56

Slide 56 text

13 DRIVER STATES

Slide 57

Slide 57 text

300 CITIES

Slide 58

Slide 58 text

1440 MINUTES PER DAY

Slide 59

Slide 59 text

393B POSSIBLE COMBINATIONS
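The 393B figure is the product of the dimension sizes on the five preceding slides: 10,000 × 7 × 13 × 300 × 1,440 = 393,120,000,000. A quick check in Python (`math.prod` needs Python 3.8 or later):

```python
from math import prod

# Dimension sizes taken from the preceding slides.
dimension_sizes = {
    "hexagons in the city": 10_000,
    "vehicle types": 7,
    "driver states": 13,
    "cities": 300,
    "minutes per day": 1_440,
}

combinations = prod(dimension_sizes.values())
print(f"{combinations:,}")  # 393,120,000,000
```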

Slide 60

Slide 60 text

Minimal requirements
❏ OLAP with geospatial and time-series support
❏ Support for large amounts of data
❏ Sub-second response time, fast scanning
❏ Wide range of aggregations
❏ Querying of raw data

Slide 61

Slide 61 text

“It can’t be a KV store or relational database”
Danny Yuan, Software Engineer at Uber

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

Questions & Answers

Slide 64

Slide 64 text

THANK YOU!
See you at the Elasticsearch VN meetup: https://facebook.com/groups/elasticsearchvn