Solve your search & analytics problems with Elasticsearch

Duy Do
November 30, 2016

These slides are from my talk about Elasticsearch at Barcamp Saigon 2016.

Transcript

  1. About me
    ❏ A father, a husband and a software engineer
    ❏ Working with Elasticsearch since 2012
    ❏ Creator of the Vietnamese Elasticsearch community and analysis plugin
    ❏ Co-founder at Krom, a small, young startup
    ❏ Find @duydo on Twitter, GitHub, DuyDo.me or on the roads I run in the morning :-)
  2. In a sentence
    Elasticsearch is a distributed search and analytics engine, designed for horizontal scalability and easy management.
  3. In a nutshell
    ❏ Schema-less, JSON-based document store
    ❏ Distributed and horizontally scalable
    ❏ Open source under the Apache License 2.0
    ❏ Built on top of Lucene, written in Java
    ❏ Extensible through a plugin system
    ❏ Created by Shay Banon (@kimchy)
  4. Product store
    Sell your products online
    Store product catalog & inventory
    Search & autocomplete suggestions
    Explore product category, material, brand
    Filter products by price, color, seller
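The product-store use case (full-text search combined with filters on price and color) can be sketched as an Elasticsearch Query DSL request body. This is only a minimal illustration built as a plain Python dictionary: the field names `name`, `price` and `color` and the helper `build_product_query` are assumptions for the example, not part of the talk, and no cluster is contacted.

```python
import json


def build_product_query(text, max_price, color):
    """Build a Query DSL body: a scored full-text match plus exact, unscored filters."""
    return {
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score.
                "must": [{"match": {"name": text}}],  # "name" is a hypothetical field
                # "filter" clauses only include or exclude documents.
                "filter": [
                    {"range": {"price": {"lte": max_price}}},  # hypothetical field
                    {"term": {"color": color}},  # hypothetical field
                ],
            }
        }
    }


body = build_product_query("running shoes", 100, "red")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent to an index's `_search` endpoint. Keeping price and color in `filter` rather than `must` reflects that they narrow the result set without affecting ranking.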
  5. Log analytics
    Use Logstash to collect & parse your log or transaction data
    Mine for trends, statistics, summarizations, or anomalies
  6. Alerting
    Take action based on changes in your data
    Let users save searches on an e-commerce website
    Monitor items purchased per minute and the number of items listed per minute
  7. Analytics/BI
    Investigate, analyze, visualize, ad-hoc queries
    Use Kibana to create custom dashboards to visualize your data
    Use a wide range of aggregations to perform complex business intelligence queries
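As a rough sketch of what a business-intelligence style aggregation request might look like, the body below groups documents by brand, computes an average price per brand, and buckets prices into fixed-width ranges. The field names `brand` and `price`, the aggregation names, and the helper `build_sales_report` are assumptions made for illustration; the body is built as a plain Python dictionary and nothing is sent to a server.

```python
import json


def build_sales_report(top_brands=5, bucket_width=50):
    """Build an aggregation-only request body (no search hits returned)."""
    return {
        "size": 0,  # skip the hits, return only aggregation results
        "aggs": {
            "by_brand": {
                # Top brands by document count; "brand" is a hypothetical field.
                "terms": {"field": "brand", "size": top_brands},
                # Nested metric: average price within each brand bucket.
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            },
            "price_buckets": {
                # Numeric histogram with fixed-width buckets over a hypothetical field.
                "histogram": {"field": "price", "interval": bucket_width},
            },
        },
    }


report = build_sales_report()
print(json.dumps(report, indent=2))
```

Nesting a metric aggregation inside a bucket aggregation is what allows one request to answer questions such as "average price per brand".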
  8. What is GitHub?
    GitHub is a web-based Git repository hosting service.
    • Distributed version control and source code management
    • Access control and several collaboration features: bug tracking, feature requests, task management and wikis
  9. The challenge
    How do you satisfy the search needs of GitHub's 4 million users while simultaneously providing tactical operational insights that help you iteratively improve customer service?
  10. Enable Powerful Search For Users And Developers
    ❏ Scale out to meet the needs of a burgeoning user base by migrating away from Apache Solr to Elasticsearch
    ❏ Index and query almost any type of publicly exposed data
    ❏ Enable deep programmatic search for developer applications
    ❏ Provide near real-time indexing as soon as users upload new data
  11. Leverage Analytics On Search Data
    ❏ Reveal rogue users by querying indexed logging data
    ❏ Find software bugs within the GitHub platform by indexing all alerts, events and logs, and tracking the rate of specific code exceptions
    ❏ Make queries that go beyond standard SQL
  12. “You can do lots of queries on that data using Elasticsearch that a standard SQL database won’t support”
    Tim Pease, Operations Engineer at GitHub
  13. What is Sentifi?
    Sentifi is building the largest online ecosystem of crowd-experts and influencers in global financial markets.
  14. The challenge
    How do you satisfy the search needs of users and analysts while simultaneously providing financial insights and market intelligence for your customers?
  15. Enable Powerful Search For Users and Analysts
    ❏ Scale out to meet the needs of a burgeoning publisher base by migrating away from MongoDB to Elasticsearch
    ❏ Index and query almost all publisher data
    ❏ Detect similar articles and tweets
    ❏ Provide near real-time indexing
  16. Leverage Analytics on Publishers Data
    ❏ Build complex analytics using advanced queries and aggregations
    ❏ Monitor incoming messages
  17. What is Uber?
    A location-based app that makes hiring an on-demand private driver easy.
    • For riders, Uber is a taxi service
    • For drivers, Uber allows you to be your own boss & pick your own hours
  18. The challenges for the storage system
    ❏ Data contains many dimensions, dozens of fields per event
    ❏ Granular data (hexagons, vehicle types, driver states, cities…)
    ❏ Unknown query patterns, any combination of dimensions
    ❏ Variety of aggregations (heatmap, top N, histogram, count(), avg(), sum(), percent(), geo)
    ❏ Large data volume (hundreds of thousands of events per second, or billions of events per day)
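To make "any combination of dimensions" and "variety of aggregations" more concrete, here is a small in-memory sketch in plain Python. It is not Uber's system and does not use Elasticsearch; the sample events, their field names, and the helpers `top_n` and `avg_by` are all invented for illustration. It only shows the kind of grouping and metric computation a storage system would have to perform at a far larger scale.

```python
from collections import Counter, defaultdict

# Invented sample events; field names are assumptions for illustration only.
EVENTS = [
    {"city": "Saigon", "vehicle": "car", "state": "on_trip", "fare": 4.0},
    {"city": "Saigon", "vehicle": "bike", "state": "idle", "fare": 0.0},
    {"city": "Saigon", "vehicle": "car", "state": "on_trip", "fare": 6.0},
    {"city": "Hanoi", "vehicle": "car", "state": "on_trip", "fare": 5.0},
]


def top_n(events, dims, n):
    """Count events grouped by any combination of dimensions; keep the n largest groups."""
    counts = Counter(tuple(event[d] for d in dims) for event in events)
    return counts.most_common(n)


def avg_by(events, dim, metric):
    """Average a numeric field for each distinct value of one dimension."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [running sum, count]
    for event in events:
        totals[event[dim]][0] += event[metric]
        totals[event[dim]][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


print(top_n(EVENTS, ("city", "vehicle"), 2))
print(avg_by(EVENTS, "city", "fare"))
```

Because `dims` is just a tuple of field names, the same function serves any grouping the caller asks for, which is the property the "unknown query patterns" bullet describes.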
  19. Minimal requirements
    ❏ OLAP with geospatial and time series support
    ❏ Support for large amounts of data
    ❏ Sub-second response time, fast scanning
    ❏ Wide range of aggregations
    ❏ Query of raw data
  20. “It can’t be a KV store or relational database”
    Danny Yuan, Software Engineer at Uber