Solve your search & analytics problems with Elasticsearch

Duy Do
November 30, 2016

These slides are from my talk about Elasticsearch at Barcamp Saigon 2016.

Transcript

  1. Search & Analytics
    with Elasticsearch
    Duy Do (@duydo)
    Barcamp Saigon 2016

  2. Agenda
    Elasticsearch intro
    Use cases: GitHub,
    Sentifi and Uber
    Questions & Answers

  3. About me
    ❏ A father, a husband and a software engineer
    ❏ Working with Elasticsearch since 2012
    ❏ Creator of the Vietnamese Elasticsearch community and
    analysis plugin
    ❏ Co-founder at Krom - a small, young startup
    ❏ Find @duydo on Twitter, GitHub, DuyDo.me or on the roads
    I run in the morning :-)

  4. What is Elasticsearch?

  5. In a sentence
    Elasticsearch is a distributed
    search and analytics engine,
    designed for horizontal scalability
    and easy management.

  6. in a nutshell
    ❏ Schema-less, JSON based document store
    ❏ Distributed and horizontally scalable
    ❏ Open source under the Apache License 2.0
    ❏ Built on top of Lucene, written in Java
    ❏ Extensible through a plugin system
    ❏ Created by Shay Banon (@kimchy)

  7. okay, tell us more...

  8. unstructured (full-text) search

  9. structured search

  10. AGGREGATIONS
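
Slides 8 to 10 name three query styles without showing a request. As a rough sketch, the bodies below express each one in Elasticsearch's JSON Query DSL, written as plain Python dictionaries. The field names (`title`, `brand`, `price`) and values are hypothetical and do not come from the talk.

```python
import json

# Full-text (unstructured) search: the query text is analyzed and
# matching documents are ranked by relevance.
full_text = {"query": {"match": {"title": "wireless headphones"}}}

# Structured search: exact-value and range conditions placed in filter
# context, so they select documents without affecting the score.
structured = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"brand": "acme"}},
                {"range": {"price": {"gte": 10, "lte": 50}}},
            ]
        }
    }
}

# Aggregation: bucket documents by brand; "size": 0 asks for no hits,
# only the aggregation results.
aggregation = {
    "size": 0,
    "aggs": {"products_per_brand": {"terms": {"field": "brand"}}},
}

for name, body in (
    ("full_text", full_text),
    ("structured", structured),
    ("aggregation", aggregation),
):
    print(name, json.dumps(body))
```

Each dictionary would be sent as the JSON body of a search request against an index.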

  11. COOL! HOW about Scalability?

  12. Run elasticsearch on your laptop
    or hundreds of servers
    with petabytes of data.

  13. WONDERFUL! WE’RE EXCITED TO SEE
    Which problems elasticsearch can solve

  14. product store
    Sell your products online
    Store product catalog &
    inventory
    Search & autocomplete
    suggestions
    Explore product category,
    material, brand
    Filter product by price,
    color, seller
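
The product-store slide combines text search, filters and category exploration. One way this might look as a single search body is sketched below, assuming hypothetical field names (`name`, `price`, `color`, `seller`, `category`, `material`, `brand`); the helper `build_product_search` is illustrative and not part of any library.

```python
def build_product_search(text, max_price=None, color=None, seller=None):
    """Build a search body: a full-text match on the product name,
    optional filters, and one terms aggregation per facet."""
    filters = []
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    if color is not None:
        filters.append({"term": {"color": color}})
    if seller is not None:
        filters.append({"term": {"seller": seller}})

    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": filters,
            }
        },
        # Facet counts for the "explore by category, material, brand" part.
        "aggs": {
            facet: {"terms": {"field": facet}}
            for facet in ("category", "material", "brand")
        },
    }


body = build_product_search("office chair", max_price=100, color="black")
print(body["query"]["bool"]["filter"])
```

Keeping price, color and seller in the `filter` clause means they narrow the result set without changing how the text match is scored.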

  15. log analytics
    Logstash
    Collect & parse your log
    or transaction data
    Mine for trends,
    statistics,
    summarizations, or
    anomalies

  16. alerting
    Take action based on changes in your
    data
    Let users save searches on
    an e-commerce website
    Monitor items purchased
    per minute and the number
    of items listed per minute
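
The per-minute monitoring idea can be illustrated without a cluster. The sketch below is plain Python, not an Elasticsearch API: it buckets purchase timestamps by minute and reports the minutes that fall below a threshold. The function names and the threshold are assumptions made for illustration; in a real deployment the bucketing would more likely be done by a date histogram aggregation on the server.

```python
from collections import Counter
from datetime import datetime


def purchases_per_minute(timestamps):
    """Count events per calendar minute by truncating seconds."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def minutes_below(counts, threshold):
    """Return, in time order, the minutes whose count is under the threshold.

    Minutes with no events at all never appear in the counter, so a
    real monitor would also need to check for missing minutes.
    """
    return sorted(minute for minute, n in counts.items() if n < threshold)


events = [
    datetime(2016, 11, 30, 9, 0, 5),
    datetime(2016, 11, 30, 9, 0, 40),
    datetime(2016, 11, 30, 9, 1, 10),
]
counts = purchases_per_minute(events)
alerts = minutes_below(counts, threshold=2)
print(alerts)
```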

  17. analytics/bi
    Investigate, Analyze, Visualize,
    Ad-hoc Queries
    Use Kibana to create
    custom dashboards to
    visualize your data
    Use a wide range of
    aggregations to perform
    complex business
    intelligence queries

  18. sounds great! We’re curious to know
    who uses elasticsearch for their business

  19. ELASTICSEARCH IS EVERYWHERE

  20. cool! show us some use cases in detail

  21. Elasticsearch at GitHub

  22. What is GitHub?
    GitHub is a web-based Git repository hosting
    service.
    ● Distributed version control and source code
    management
    ● Access control and several collaboration features:
    bug tracking, feature requests, task
    management and wikis

  23. The challenge
    How do you satisfy the search needs of
    GitHub's 4 million users while simultaneously
    providing tactical operational insights that
    help you iteratively improve customer
    service?

  24. “Search is the core of GitHub”
    Tim Pease, Operations Engineer at GitHub

  25. WHY ELASTICSEARCH?

  26. Enable Powerful Search For Users And Developers
    ❏ Scale out to meet the needs of a burgeoning user base by
    migrating away from Apache Solr to Elasticsearch
    ❏ Index and query almost any type of publicly exposed data
    ❏ Enable deep programmatic search for developer
    applications
    ❏ Provide near real-time indexing as soon as users upload
    new data

  27. Leverage Analytics On Search Data
    ❏ Reveal rogue users by querying indexed logging data
    ❏ Find software bugs within the GitHub platform by indexing
    all alerts, events, and logs and tracking the rate of specific
    code exceptions
    ❏ Make queries that go beyond standard SQL
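
Tracking the rate of a specific exception maps naturally onto a filtered date histogram. The sketch below builds such a request body as a Python dictionary. It is not GitHub's actual query: the field names `exception_class` and `@timestamp` are assumptions, and `fixed_interval` is the parameter name used by recent Elasticsearch releases (versions current at the time of the talk used `interval`).

```python
def exception_rate_body(exception_class, since="now-1h"):
    """Build a body that counts one exception type per minute."""
    return {
        "size": 0,  # only the aggregation is needed, not the log lines
        "query": {
            "bool": {
                "filter": [
                    {"term": {"exception_class": exception_class}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "per_minute": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": "1m",
                }
            }
        },
    }


body = exception_rate_body("TimeoutError")
print(body["aggs"]["per_minute"])
```

A sudden jump in the per-minute bucket counts is the kind of signal the slide describes.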

  28. “You can do lots of queries on that
    data using Elasticsearch that a
    standard SQL database won’t support”
    Tim Pease, Operations Engineer at GitHub

  29. 8M+
    CODE REPOSITORIES

  30. 2B+
    ISSUES, PULL REQUESTS, WIKIS, SOURCE CODE

  31. 300+
    AVG SEARCH REQUESTS PER MINUTE

  32. Elasticsearch at Sentifi

  33. What is Sentifi?
    Sentifi is building the largest online
    ecosystem of crowd-experts and influencers in
    global financial markets.

  34. The challenge
    How do you satisfy the search needs of users
    and the analysts while simultaneously
    providing financial insights, market
    intelligence for your customers?

  35. “Analytics is the core of Sentifi”
    Duy Do, Former Software Engineer at Sentifi

  36. WHY ELASTICSEARCH?

  37. Enable Powerful Search For Users and Analysts
    ❏ Scale out to meet the needs of a burgeoning publisher base
    by migrating away from MongoDB to Elasticsearch
    ❏ Index and query almost all publisher data
    ❏ Detect similar articles and tweets
    ❏ Provide near real-time indexing
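
The slides do not say how similar articles and tweets were detected. One built-in option in Elasticsearch is the `more_like_this` query, sketched below as a Python dictionary; the field names `title` and `body` and the helper `similar_documents_body` are hypothetical.

```python
def similar_documents_body(text, fields=("title", "body"), max_terms=25):
    """Build a more_like_this body that finds documents resembling `text`."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": text,
                # Short texts such as tweets rarely repeat a term, so
                # accept terms that occur only once in the input.
                "min_term_freq": 1,
                "max_query_terms": max_terms,
            }
        }
    }


body = similar_documents_body("Central bank raises interest rates")
print(body)
```

The query selects the most distinctive terms from the input text and searches for other documents that contain them.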

  38. Leverage Analytics on Publishers Data
    ❏ Build complex analytics using advanced queries and
    aggregations
    ❏ Monitor incoming messages

  39. Search
    Suggestions
    Aggregations

  40. Aggregations
    SIMILARITY DETECTION
    STRUCTURED SEARCH

  41. Aggregations

  42. Aggregations

  43. 3.3M+
    PUBLISHERS

  44. 150M+
    ARTICLES, TWEETS PER MONTH

  45. Elasticsearch at Uber

  46. What is Uber?
    A location-based app that makes hiring an
    on-demand private driver easy.
    ● For riders, Uber is a taxi service
    ● For drivers, Uber allows you to be your own
    boss & pick your own hours

  47. The challenges for the storage system
    ❏ Data contains many dimensions, dozens of fields per event
    ❏ Granular data (hexagons, vehicle types, driver states,
    cities…)
    ❏ Unknown query patterns, any combination of dimensions
    ❏ Variety of aggregations (heatmap, top N, histogram,
    count(), avg(), sum(), percent(), geo)
    ❏ Large data volume (hundreds of thousands of events per
    second, or billions of events per day)

  48. 10k
    HEXAGONS IN THE CITY

  49. 7
    VEHICLE TYPES

  50. 13
    DRIVER STATES

  51. 1440
    MINUTES PER DAY

  52. 393B
    POSSIBLE COMBINATIONS
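
A quick check of the arithmetic: the four figures on slides 48 to 51 multiply to about 1.31 billion, not 393 billion. The headline number is consistent with one more dimension of roughly 300 values, for example the cities listed among the dimensions on slide 47, but the slides do not state that count, so the factor below is an inference rather than a quoted figure.

```python
hexagons = 10_000
vehicle_types = 7
driver_states = 13
minutes_per_day = 1_440

# Product of the four dimensions that the slides give numbers for.
listed = hexagons * vehicle_types * driver_states * minutes_per_day
print(listed)  # 1310400000

# Size of the extra dimension needed to reach the 393B headline figure.
headline = 393_000_000_000
implied_factor = round(headline / listed)
print(implied_factor)  # 300
```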

  53. minimal requirements
    ❏ OLAP with geospatial and time series support
    ❏ Support for large amounts of data
    ❏ Sub-second response time, fast scanning
    ❏ Wide range of aggregations
    ❏ Query of raw data

  54. “it can’t be a kv store or relational database”
    Danny Yuan, Software Engineer at Uber

  55. Questions & Answers

  56. THANK YOU!
    See you at Elasticsearch VN meetup
    https://facebook.com/groups/elasticsearchvn
