Solve your search & analytics problems with Elasticsearch

Duy Do
November 30, 2016

These slides are from my talk about Elasticsearch at Barcamp Saigon 2016.

Transcript

  1. Search & Analytics with Elasticsearch, Duy Do (@duydo), Barcamp Saigon 2016
  2. Agenda: Elasticsearch intro; use cases (GitHub, Sentifi and Uber); Questions & Answers
  3. About me
    ❏ A father, a husband and a software engineer
    ❏ Working with Elasticsearch since 2012
    ❏ Creator of the Vietnamese Elasticsearch community and analysis plugin
    ❏ Co-founder at Krom, a small, young startup
    ❏ Find @duydo on Twitter, GitHub, DuyDo.me or on the roads I run in the morning :-)
  4. What is Elasticsearch?

  5. In a sentence: Elasticsearch is a distributed search and analytics engine, designed for horizontal scalability and easy management.
  6. In a nutshell
    ❏ Schema-less, JSON-based document store
    ❏ Distributed and horizontally scalable
    ❏ Open source under the Apache License 2.0
    ❏ Built on top of Lucene, written in Java
    ❏ Extensible with a plugin system
    ❏ Created by Shay Banon (@kimchy)
  7. Okay, tell us more...

  8. Unstructured (full-text) search

  9. Structured search

  10. Sorting

  11. Pagination

  12. Highlighting

  13. Aggregations
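Slides 8 to 13 name the core search features by title only. As an illustration (not taken from the talk), the sketch below builds one Elasticsearch Query DSL request body that exercises all of them: a `match` query for full-text search, `term` and `range` filters for structured search, `sort`, `from`/`size` pagination, `highlight`, and a `terms` aggregation. The field names (`name`, `color`, `price`, `brand`) and the helper `build_search_body` are hypothetical; `color` and `brand` are assumed to be mapped as `keyword` fields. The resulting body would be sent to an index's `_search` endpoint.

```python
import json


def build_search_body(text, color, max_price, page=1, page_size=10):
    """Build a hypothetical product-search request body (all field names are assumptions)."""
    return {
        "query": {
            "bool": {
                # Full-text (unstructured) search: analyzed and scored by relevance.
                "must": [{"match": {"name": text}}],
                # Structured search: exact and range conditions that filter without scoring.
                "filter": [
                    {"term": {"color": color}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        },
        # Sorting: cheapest first, ties broken by relevance score.
        "sort": [{"price": {"order": "asc"}}, "_score"],
        # Pagination: skip the earlier pages and return one page of hits.
        "from": (page - 1) * page_size,
        "size": page_size,
        # Highlighting: return the matched fragments of the "name" field.
        "highlight": {"fields": {"name": {}}},
        # Aggregations: count the matching products per brand (facets).
        "aggs": {"by_brand": {"terms": {"field": "brand"}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("running shoes", "red", 100, page=3), indent=2))
```

Putting the `term` and `range` clauses under `filter` rather than `must` keeps them out of scoring and lets Elasticsearch cache them, which is the usual way to combine full-text and structured search. Deep pagination with large `from` values gets expensive; by default Elasticsearch caps `from + size` at 10,000 (`index.max_result_window`).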

  14. Cool! How about scalability?

  15. Run Elasticsearch on your laptop or on hundreds of servers with petabytes of data.
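Horizontal scaling works because each index is split into primary shards (plus replicas) that Elasticsearch spreads across the nodes of a cluster. A document's shard is chosen from its routing value, which defaults to the document ID; the classic description is `shard = hash(_routing) % number_of_primary_shards`. The sketch below illustrates only that idea: Elasticsearch itself uses a Murmur3 hash (and newer versions add a routing factor), whereas this sketch substitutes Python's `zlib.crc32` as a stand-in, and `shard_for` is a hypothetical helper name.

```python
import zlib
from collections import Counter


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (simplified: crc32 stands in for Murmur3)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # Spread 10,000 hypothetical document IDs over 5 primary shards.
    counts = Counter(shard_for(f"doc-{i}", 5) for i in range(10_000))
    print(sorted(counts.items()))
```

Because the target shard depends on the number of primary shards, that number is fixed when an index is created (changing it means reindexing, or using the shrink/split APIs in newer versions), while replicas can be added at any time for read throughput and resilience.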
  16. Wonderful! We’re excited to see which problems Elasticsearch can solve

  17. Product store: sell your products online
    ❏ Store the product catalog & inventory
    ❏ Search & autocomplete suggestions
    ❏ Explore products by category, material, brand
    ❏ Filter products by price, color, seller
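For the "search & autocomplete suggestions" point, one option Elasticsearch provides is the completion suggester: a field of type `completion` in the mapping plus a `suggest` section in the search request. The sketch below only builds the two JSON bodies; the field name `name_suggest`, the suggestion name `product_suggest`, and the helper `build_autocomplete_body` are hypothetical, and exact mapping syntax varies between Elasticsearch versions (for example, 5.x mappings are nested under a document type).

```python
import json

# Mapping fragment: a dedicated "completion" field for prefix suggestions
# (hypothetical field names; this goes under the index's mappings).
PRODUCT_MAPPING = {
    "properties": {
        "name": {"type": "text"},
        "name_suggest": {"type": "completion"},
    }
}


def build_autocomplete_body(prefix: str, max_suggestions: int = 5) -> dict:
    """Build a completion-suggester request for what the user has typed so far."""
    return {
        "suggest": {
            "product_suggest": {
                "prefix": prefix,
                "completion": {"field": "name_suggest", "size": max_suggestions},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_autocomplete_body("runn"), indent=2))
```

The completion suggester uses in-memory data structures built for fast prefix lookups, which is why it needs its own field type rather than a regular `match` query.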
  18. Log analytics (Logstash)
    ❏ Collect & parse your log or transaction data
    ❏ Mine for trends, statistics, summarizations, or anomalies
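Mining logs for trends usually means bucketing events over time. A minimal sketch, assuming log events are indexed with an `@timestamp` date field (as Logstash events carry) and a hypothetical keyword field `level`: the request below counts error events per hour over a recent window with a `date_histogram` aggregation. `fixed_interval` is the parameter name in Elasticsearch 7.2 and later; older versions, including those current at the time of this talk, use `interval` instead. The helper name is hypothetical.

```python
import json


def build_error_trend_body(hours: int = 24) -> dict:
    """Count error-level log events per hour over the last `hours` hours."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the raw hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "error"}},
                    {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
                ]
            }
        },
        "aggs": {
            "errors_over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_error_trend_body(), indent=2))
```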
  19. Alerting: take action based on changes in your data
    ❏ Let users save searches on an e-commerce website
    ❏ Monitor items purchased per minute and the number of items listed per minute
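Saved searches invert the usual flow: instead of running one query over many documents, each new document is checked against many stored queries. Elasticsearch supports this natively with its percolator (a `percolator` field type and a `percolate` query). The toy, in-memory sketch below shows only the idea in plain Python; it is not how the percolator is implemented, and the `SavedSearch` fields and the `matching_searches` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SavedSearch:
    """A user's saved search: every keyword must appear and the price must not exceed the cap."""
    user: str
    keywords: tuple
    max_price: float


def matching_searches(item: dict, saved: list) -> list:
    """Return the users whose saved search matches a newly listed item (reverse search)."""
    words = set(item["title"].lower().split())
    return [
        s.user
        for s in saved
        if all(k.lower() in words for k in s.keywords) and item["price"] <= s.max_price
    ]


if __name__ == "__main__":
    saved = [
        SavedSearch("alice", ("red", "shoes"), 80.0),
        SavedSearch("bob", ("laptop",), 1000.0),
    ]
    new_item = {"title": "Red running shoes", "price": 59.0}
    print(matching_searches(new_item, saved))  # ['alice']
```

With the real percolator, the stored queries are ordinary Query DSL queries indexed into a `percolator` field, so anything users can search for can also be saved and alerted on.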
  20. Analytics/BI: investigate, analyze, visualize, ad-hoc queries
    ❏ Use Kibana to create custom dashboards that visualize your data
    ❏ Use a wide range of aggregations to perform complex business intelligence queries
  21. Sounds great! We’re curious to know who uses Elasticsearch for their business
  22. Elasticsearch is everywhere

  23. Cool! Show us some use cases in detail

  24. Elasticsearch at GitHub

  25. What is GitHub? GitHub is a web-based Git repository hosting service.
    • Distributed version control and source code management
    • Access control and several collaboration features: bug tracking, feature requests, task management and wikis
  26. The challenge: how do you satisfy the search needs of GitHub's 4 million users while simultaneously providing tactical operational insights that help you iteratively improve customer service?
  27. “Search is the core of GitHub.” Tim Pease, Operations Engineer at GitHub
  28. Why Elasticsearch?

  29. Enable powerful search for users and developers
    ❏ Scale out to meet the needs of a burgeoning user base by migrating from Apache Solr to Elasticsearch
    ❏ Index and query almost any type of publicly exposed data
    ❏ Enable deep programmatic search for developer applications
    ❏ Provide near real-time indexing as soon as users upload new data
  30. Leverage analytics on search data
    ❏ Reveal rogue users by querying indexed logging data
    ❏ Find software bugs within the GitHub platform by indexing all alerts, events and logs and tracking the rate of specific code exceptions
    ❏ Make queries that go beyond standard SQL
  31. “You can do lots of queries on that data using Elasticsearch that a standard SQL database won’t support.” Tim Pease, Operations Engineer at GitHub
  32. 4M+ users

  33. 8M+ code repositories

  34. 2B+ issues, pull requests, wikis, source code

  35. 300+ average search requests per minute

  36. Elasticsearch at Sentifi

  37. What is Sentifi? Sentifi is building the largest online ecosystem of crowd-experts and influencers in global financial markets.
  38. The challenge: how do you satisfy the search needs of users and analysts while simultaneously providing financial insights and market intelligence for your customers?
  39. “Analytics is the core of Sentifi.” Duy Do, former Software Engineer at Sentifi
  40. Why Elasticsearch?

  41. Enable powerful search for users and analysts
    ❏ Scale out to meet the needs of a burgeoning publisher base by migrating from MongoDB to Elasticsearch
    ❏ Index and query almost all publisher data
    ❏ Detect similar articles and tweets
    ❏ Provide near real-time indexing
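The talk does not say how Sentifi detected similar articles and tweets. One built-in option in Elasticsearch is the `more_like_this` query, which selects representative terms from the input text and finds documents that share them. A minimal sketch of such a request body, assuming hypothetical `title` and `body` text fields and a hypothetical helper name; the parameter values are illustrative, not tuned.

```python
import json


def build_similar_docs_body(text: str, max_results: int = 10) -> dict:
    """Find documents whose title/body share distinctive terms with `text`."""
    return {
        "size": max_results,
        "query": {
            "more_like_this": {
                "fields": ["title", "body"],
                "like": text,
                "min_term_freq": 1,     # a term appearing once in the input can still be selected
                "min_doc_freq": 2,      # ignore terms found in fewer than 2 indexed documents
                "max_query_terms": 25,  # keep at most 25 selected terms in the generated query
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_similar_docs_body("Fed raises interest rates"), indent=2))
```

`like` also accepts references to already-indexed documents, which suits "more articles like this one" features.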
  42. Leverage analytics on publisher data
    ❏ Build complex analytics using advanced queries and aggregations
    ❏ Monitor incoming messages
  43. Search, suggestions, aggregations

  44. Aggregations, similarity detection, structured search

  45. Aggregations

  46. Aggregations

  47. 3.3M+ publishers

  48. 150M+ articles and tweets per month

  49. Elasticsearch at Uber

  50. What is Uber? A location-based app that makes hiring an on-demand private driver easy.
    • For riders, Uber is a taxi service
    • For drivers, Uber allows you to be your own boss & pick your own hours
  51. The challenges for the storage system
    ❏ Data contains many dimensions, with dozens of fields per event
    ❏ Granular data (hexagons, vehicle types, driver states, cities…)
    ❏ Unknown query patterns: any combination of dimensions
    ❏ Variety of aggregations (heatmap, top N, histogram, count(), avg(), sum(), percent(), geo)
    ❏ Large data volume (hundreds of thousands of events per second, or billions of events per day)
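Slide 51 lists the kinds of aggregations needed. As a rough illustration (the talk does not show Uber's actual queries), the request body below combines several of them over a hypothetical index of ride events with fields `city`, `vehicle_type`, `location` (a `geo_point`), `eta_seconds` and `@timestamp`: a filter on one city and the last hour, a top-N `terms` aggregation with `avg` and `percentiles` sub-aggregations, a per-minute `date_histogram` (`fixed_interval` as in Elasticsearch 7.2+, `interval` in older versions), and a `geohash_grid` heatmap. Uber bucketed by hexagons; `geohash_grid` uses rectangular geohash cells instead and appears here only because it is built in. The helper name is hypothetical.

```python
import json


def build_city_dashboard_body(city: str, top_n: int = 5) -> dict:
    """Hypothetical multi-dimensional analytics request over ride events in one city."""
    return {
        "size": 0,  # aggregations only, no raw hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"city": city}},
                    {"range": {"@timestamp": {"gte": "now-1h"}}},
                ]
            }
        },
        "aggs": {
            # Top N: the most frequent vehicle types, each with ETA statistics.
            "top_vehicle_types": {
                "terms": {"field": "vehicle_type", "size": top_n},
                "aggs": {
                    "avg_eta": {"avg": {"field": "eta_seconds"}},
                    "eta_percentiles": {
                        "percentiles": {"field": "eta_seconds", "percents": [50, 95, 99]}
                    },
                },
            },
            # Time histogram: events per minute.
            "per_minute": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1m"}
            },
            # Heatmap: event counts per geohash cell (precision 6 is roughly 1.2 km x 0.6 km).
            "heatmap": {"geohash_grid": {"field": "location", "precision": 6}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_city_dashboard_body("saigon", top_n=3), indent=2))
```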
  54. 10k hexagons in the city

  55. 7 vehicle types

  56. 13 driver states

  57. 300 cities

  58. 1440 minutes per day

  59. 393B possible combinations
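The 393B figure on slide 59 is the product of the dimension sizes on slides 54 to 58, which illustrates why precomputing every combination ahead of time would be impractical. Reproducing the arithmetic:

```python
from math import prod

# Dimension sizes from slides 54 to 58.
dimensions = {
    "hexagons_per_city": 10_000,
    "vehicle_types": 7,
    "driver_states": 13,
    "cities": 300,
    "minutes_per_day": 1440,
}

combinations = prod(dimensions.values())
print(f"{combinations:,}")  # 393,120,000,000
```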

  60. Minimal requirements
    ❏ OLAP with geospatial and time series support
    ❏ Support for large amounts of data
    ❏ Sub-second response time, fast scanning
    ❏ Wide range of aggregations
    ❏ Querying of raw data
  61. “It can’t be a KV store or relational database.” Danny Yuan, Software Engineer at Uber
  63. Questions & Answers

  64. Thank you! See you at the Elasticsearch VN meetup: https://facebook.com/groups/elasticsearchvn