Elastic: From Search to Analytics

Thiago Souza

July 26, 2019

Transcript

  1. Elastic Stack: from Search to Analytics (or how the magic works)

    Thiago Souza, Sr. Product Marketing Engineer
  2. A little about me...

    • 15+ years of experience in IT
    • Former "Cortexiano" (where I started working with Elasticsearch)
    • Former Support Engineer @ Elastic
    • Product Marketing Engineer @ Elastic
  3. Elasticsearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene.

    http://www.elasticsearch.org - 2011
  4. Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java.

    https://lucene.apache.org/core
  5. Apache Lucene

    • Full-featured text search engine
    • Library written entirely in Java.
      ‒ An offline Java library, accessed locally.

    Inverted Index
    • Documents (Id: Text):
      ‒ 1: Amanhã vai chover no Rio de Janeiro ("It will rain tomorrow in Rio de Janeiro")
      ‒ 2: Rio de Janeiro tem muita praia! ("Rio de Janeiro has lots of beaches!")
    • Terms (Term: Ids):
      ‒ amanha: 1
      ‒ chov: 1
      ‒ praia: 2
      ‒ rio: 1, 2
      ‒ ...
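The inverted index on this slide can be sketched in a few lines of Python. This is only an illustration, not how Lucene stores its index: the `analyze` helper and its stop-word list are hypothetical, and no stemming is applied, so "chover" stays "chover" instead of the slide's stemmed term "chov".

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical toy analyzer: lowercases, strips accents, splits on non-letters
# and drops a tiny hand-picked stop-word list. It does not stem.
STOP_WORDS = {"no", "de", "vai", "tem", "muita"}


def analyze(text):
    folded = unicodedata.normalize("NFD", text.lower())
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    return [t for t in re.findall(r"[a-z]+", folded) if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "Amanhã vai chover no Rio de Janeiro",
    2: "Rio de Janeiro tem muita praia!",
}
inverted_index = build_inverted_index(docs)
for term in sorted(inverted_index):
    print(term, inverted_index[term])
```

Looking up a term is then a dictionary access that returns the ids of every document containing it, which is what makes term queries fast.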
  6. Apache Lucene

    Amanhã vai chover no Rio de Janeiro → Analyzer → Term (Token): amanha, vai, chov, rio, janeiro
  7. Apache Lucene: Lucene Analyzer Pipeline

    Amanhã vai chover no Rio de Janeiro → Analyzer → Term (Token): amanha, vai, chov, rio, janeiro
    • Char Filter (0 ... n)
    • Tokenizer (1)
    • Token Filter (0 ... n)
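The three analyzer stages can be sketched as plain functions chained together. Everything below is a toy stand-in chosen to reproduce the slide's output, not Lucene's actual components: the accent-stripping char filter, the letter tokenizer, the two-word stop list and the "strip a trailing er" stemmer are all assumptions made for illustration.

```python
import re
import unicodedata


def strip_accents(text):
    """Char filter: works on the raw character stream before tokenization."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def letter_tokenizer(text):
    """Tokenizer: exactly one per analyzer; splits the text into tokens."""
    return re.findall(r"[A-Za-z]+", text)


def lowercase(tokens):
    """Token filter: normalizes case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens, stop_words=frozenset({"no", "de"})):
    """Token filter: drops a (hypothetical) stop-word list."""
    return [t for t in tokens if t not in stop_words]


def toy_stem(tokens):
    """Token filter: crude stand-in for a real Portuguese stemmer."""
    return [t[:-2] if len(t) > 4 and t.endswith("er") else t for t in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    for char_filter in char_filters:      # 0 ... n char filters
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly 1 tokenizer
    for token_filter in token_filters:    # 0 ... n token filters
        tokens = token_filter(tokens)
    return tokens


terms = analyze(
    "Amanhã vai chover no Rio de Janeiro",
    char_filters=[strip_accents],
    tokenizer=letter_tokenizer,
    token_filters=[lowercase, remove_stop_words, toy_stem],
)
print(terms)  # ['amanha', 'vai', 'chov', 'rio', 'janeiro']
```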
  8. Apache Lucene: Lucene Index

    • Indexing: Documents → Analyzer → Lucene Index (term → ids), e.g. term1 → 1,2,5,9,... and term2 → 2,5,8,9,...
    • Searching: query "Amanhã chove no Rio" → Analyzer → terms amanha, chov, rio → Documents + Similarity Function
    • score(query, doc, corpus) >= 0, computed with TF-IDF, BM25, etc.
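As a rough illustration of a similarity function, here is a textbook BM25 scorer over already-analyzed documents. It is a sketch, not Lucene's implementation: the `bm25_score` name and the tiny corpus are made up for the example, `k1=1.2` and `b=0.75` are commonly used defaults, and the idf uses the `log(1 + ...)` variant so that every term contribution stays non-negative, in line with `score(query, doc, corpus) >= 0`.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one analyzed document against analyzed query terms."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        doc_freq = sum(1 for doc in corpus if term in doc)
        if doc_freq == 0 or tf[term] == 0:
            continue  # a term absent from the corpus or the document adds nothing
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        length_norm = 1 - b + b * len(doc_terms) / avg_len
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
    return score


# Hypothetical corpus: the two example sentences after analysis.
corpus = [
    ["amanha", "vai", "chov", "rio", "janeiro"],
    ["rio", "janeiro", "tem", "muita", "praia"],
]
query = ["amanha", "chov", "rio"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
print(scores)
```

The first document matches all three query terms, including the two rarer ones, so it scores higher than the second, which only matches "rio".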
  9. Elasticsearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene.

    http://www.elasticsearch.org - 2011
  10. Elasticsearch: HTTP + JSON

    $ curl -XPOST 'localhost:9200/twitter/tweet/' -H 'Content-Type: application/json' -d '{ "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }'
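The same HTTP + JSON call can be built from Python's standard library. This is a sketch under assumptions: `build_index_request` is a hypothetical helper, the node is assumed to listen on `http://localhost:9200`, and the `Content-Type: application/json` header is set explicitly because Elasticsearch 6.0 and later reject bodies without it. The request is only constructed here; sending it is left commented out since it needs a running cluster.

```python
import json
import urllib.request


def build_index_request(doc, base_url="http://localhost:9200", path="/twitter/tweet/"):
    """Build (but do not send) the POST request that indexes one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


tweet = {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}
request = build_index_request(tweet)
print(request.get_method(), request.full_url)

# To actually send it (requires a reachable Elasticsearch node):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```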
  11. Elasticsearch

    • Index Sharding
      ‒ Splits the data into N partitions.
      ‒ Write scalability.
    • Shard Replication
      ‒ Copies the data to N replica partitions.
      ‒ Read scalability.
    • Example: an index configured with 3 shards and 1 replica has 3 primary and 3 replica shards, 6 shards in total.
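The shard arithmetic and the idea of routing a document to a partition can be sketched as follows. This is a simplification: Elasticsearch hashes the routing value (the document id by default) and takes it modulo the number of primary shards, but it uses its own hash function; `zlib.crc32` below is only a stand-in, and both function names are hypothetical.

```python
import zlib


def total_shards(primaries, replicas):
    """Each primary shard has `replicas` copies, so the total is p * (1 + r)."""
    return primaries * (1 + replicas)


def route_to_shard(routing_key, num_primary_shards):
    """Pick a primary shard for a document; crc32 stands in for the real hash."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


print(total_shards(primaries=3, replicas=1))  # 6
for doc_id in ["tweet-1", "tweet-2", "tweet-3", "tweet-4"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 3))
```

Because the shard is derived from the key modulo the primary count, the number of primary shards cannot be changed freely after index creation without moving documents, while the number of replicas can.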
  12. Distributed Processing

    • Client → query → Coordinating Node
    • Coordinating Node → query → Data Nodes (parallel execution)
    • Data Nodes → partial results → Coordinating Node
    • Coordinating Node → full (merged) results → Client
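The scatter/gather flow can be sketched with a thread pool standing in for the data nodes. This is an in-process toy, not Elasticsearch's actual search phases: the shard contents, the term-count "score" and the `search_shard` and `coordinate` names are all hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard_docs, query_term, size):
    """Data node: return this shard's top hits as (score, doc_id) pairs."""
    hits = [(text.lower().split().count(query_term), doc_id)
            for doc_id, text in shard_docs.items()]
    return heapq.nlargest(size, [hit for hit in hits if hit[0] > 0])


def coordinate(shards, query_term, size=3):
    """Coordinating node: fan the query out, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, query_term, size), shards))
    merged = heapq.nlargest(size, (hit for partial in partials for hit in partial))
    return [doc_id for _score, doc_id in merged]


# Hypothetical data spread over three shards.
shards = [
    {1: "rio rio praia", 2: "chuva"},
    {3: "rio de janeiro"},
    {4: "rio rio rio", 5: "praia"},
]
top_hits = coordinate(shards, "rio", size=2)
print(top_hits)  # [4, 1]
```

Each shard only needs to return its own top `size` hits, because no document outside a shard's local top `size` can appear in the global top `size`.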
  13. Elasticsearch: v0.90.0 (2013)

    • Highly Available Distributed Search
    • RESTful API
    • Full Text Search
    • Multifaceted Search
    • Schemaless? (Schema-easy/lazy!)
  14. Elastic Stack

    • Solutions: Metrics, Logging, APM, Site Search, Application Search, Business Analytics, Enterprise Search, Security Analytics, Future Solutions
    • Elastic Stack:
      ‒ Kibana: Visualize & Manage
      ‒ Elasticsearch: Store, Search, & Analyze
      ‒ Beats, Logstash: Ingest
    • Deployment:
      ‒ SaaS: Elastic Cloud
      ‒ Self Managed: Elastic Cloud Enterprise, Standalone
  15. Elasticsearch Aggregations

    • Generalization of Facets into an aggregations framework
    • Facets => Terms Aggregations
    • Other aggregations: Minimum, Maximum, Histogram, Average, Standard Deviation, etc.
    • Lucene is also a column-oriented storage engine (i.e. a Column Store)
  16. Column-Oriented Storage

    The same table, stored row by row (Row-Oriented Storage) or column by column (Column-Oriented Storage):
    • id, Name, Age, Weight
      ‒ 1, João, 34, 81
      ‒ 2, Maria, 51, 65
      ‒ 3, José, 53, 76
    • Faster aggregations: AVG(Age) = 46, SUM(Weight) = 222
    • https://www.elastic.co/blog/elasticsearch-as-a-column-store
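The difference between the two layouts, and why aggregations favor the columnar one, can be sketched with plain Python structures. The field names `name`, `age` and `weight` are English stand-ins for the slide's columns, and `to_columns` is a hypothetical helper; real column stores add compression and on-disk encodings that this toy ignores.

```python
# Row-oriented: one record per document, all fields stored together.
rows = [
    {"id": 1, "name": "João", "age": 34, "weight": 81},
    {"id": 2, "name": "Maria", "age": 51, "weight": 65},
    {"id": 3, "name": "José", "age": 53, "weight": 76},
]


def to_columns(records):
    """Column-oriented: one contiguous list of values per field."""
    return {field: [record[field] for record in records] for field in records[0]}


columns = to_columns(rows)

# An aggregation only needs to scan the column it touches,
# instead of reading every field of every row.
avg_age = sum(columns["age"]) / len(columns["age"])
sum_weight = sum(columns["weight"])
print(avg_age, sum_weight)  # 46.0 222
```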
  17. Elasticsearch Aggregations

    • Bucket Aggregations: group documents into buckets according to some criterion. Examples:
      ‒ Histogram/Date Histogram
      ‒ Terms/Significant Terms
      ‒ Range
      ‒ GeoHash Grid
      ‒ etc.
    • Metrics Aggregations: compute metrics over the documents grouped in buckets. Examples:
      ‒ Minimum, Maximum, Average, etc.
      ‒ Percentiles
      ‒ Geo Bounds/Geo Centroids
      ‒ Scripted
      ‒ etc.
    • Pipeline Aggregations: aggregate the output values of other aggregations. Examples:
      ‒ Moving Average
      ‒ Minimum, Maximum, Average, etc.
      ‒ Percentiles
      ‒ Cumulative Sum
      ‒ etc.
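The three families compose naturally: buckets group documents, metrics summarize each bucket, and pipeline aggregations work on the outputs of the other two. The in-memory sketch below mimics that layering on a made-up access log; the function names and the data are hypothetical and unrelated to Elasticsearch's internals.

```python
from collections import defaultdict

# Hypothetical access-log documents.
logs = [
    {"browser": "firefox", "size": 100},
    {"browser": "firefox", "size": 300},
    {"browser": "chrome", "size": 200},
    {"browser": "safari", "size": 50},
    {"browser": "chrome", "size": 400},
    {"browser": "firefox", "size": 500},
]


def terms_buckets(docs, field):
    """Bucket aggregation: group documents by the value of one field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def avg_metric(docs, field):
    """Metrics aggregation: one number computed from a bucket's documents."""
    return sum(doc[field] for doc in docs) / len(docs)


def max_bucket(values_by_bucket):
    """Pipeline aggregation: operates on other aggregations' outputs, not on documents."""
    key = max(values_by_bucket, key=values_by_bucket.get)
    return key, values_by_bucket[key]


buckets = terms_buckets(logs, "browser")
doc_counts = {browser: len(docs) for browser, docs in buckets.items()}
avg_sizes = {browser: avg_metric(docs, "size") for browser, docs in buckets.items()}
busiest = max_bucket(doc_counts)
print(doc_counts, avg_sizes, busiest)
```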
  18. Elasticsearch Aggregations: Website Access Log

    ACCESS_LOG fields: timestamp: datetime, size: integer, browser: string, version: integer
    • All documents: COUNT(*) = 3526, AVG(size) = 432.47
    • TERMS(browser):
      ‒ FIREFOX: COUNT(*) = 1492, MAX(version) = 1881, DATE_HISTOGRAM(hour, timestamp) ...
      ‒ CHROME: COUNT(*) = 1232, MAX(version) = 310, DATE_HISTOGRAM(hour, timestamp) ...
      ‒ SAFARI: COUNT(*) = 802, MAX(version) = 9, DATE_HISTOGRAM(hour, timestamp) ...
  19. Pipeline Aggregations: Website Access Log

    ACCESS_LOG fields: timestamp: datetime, size: integer, browser: string, version: integer
    • All documents: COUNT(*) = 3526, AVG(size) = 432.47
    • TERMS(browser):
      ‒ FIREFOX: COUNT(*) = 1492, MAX(version) = 1881, DATE_HISTOGRAM(hour, timestamp) ...
      ‒ CHROME: COUNT(*) = 1232, MAX(version) = 310, DATE_HISTOGRAM(hour, timestamp) ...
      ‒ SAFARI: COUNT(*) = 802, MAX(version) = 9, DATE_HISTOGRAM(hour, timestamp) ...
    • Pipeline over the browser buckets: MAX(COUNT(*)) = 1492
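The aggregation tree on this slide could be expressed as an Elasticsearch search body roughly like the one below, written as a Python dict. Treat it as a sketch under assumptions: the aggregation names (`avg_size`, `by_browser`, `max_version`, `per_hour`, `busiest_browser`) are made up, `browser` is assumed to be mapped as a keyword field, and `calendar_interval` is the parameter name used from Elasticsearch 7.2 onward (older versions use `interval`).

```python
import json

search_body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        # Metrics aggregation over all documents: AVG(size)
        "avg_size": {"avg": {"field": "size"}},
        # Bucket aggregation: TERMS(browser); each bucket carries its doc count
        "by_browser": {
            "terms": {"field": "browser"},
            "aggs": {
                # Metrics aggregation per bucket: MAX(version)
                "max_version": {"max": {"field": "version"}},
                # Nested bucket aggregation: DATE_HISTOGRAM(hour, timestamp)
                "per_hour": {
                    "date_histogram": {"field": "timestamp", "calendar_interval": "hour"}
                },
            },
        },
        # Pipeline aggregation: MAX(COUNT(*)) across the browser buckets
        "busiest_browser": {"max_bucket": {"buckets_path": "by_browser>_count"}},
    },
}

print(json.dumps(search_body, indent=2))
```

The pipeline aggregation sits next to `by_browser` rather than inside it, because it reads the sibling aggregation's bucket counts through `buckets_path`.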
  20. Geo Point Indexing (Bkd-Tree)

    Term → postings (doc ids):
    ‒ 1 → 1, 2, 3, 4, 5
    ‒ 10 → 1, 2, 4
    ‒ 11 → 3, 5
    ‒ 100 → 1
    ‒ 101 → 2, 4
    ‒ 111 → 3, 5
    ‒ 1000 → 2
    ‒ 1010 → 4
    ‒ 1011 → 3
    ‒ 1110 → 3
    ‒ 1111 → 5
  21. 29

  22. 30

  23. 31

  24. Thank you!

    • Web: www.elastic.co
    • Products: https://www.elastic.co/products
    • Forums: https://discuss.elastic.co/
    • Community: https://www.elastic.co/community/meetups
    • Twitter: @elastic
    • ela.st/contributor-br