
Elastic Stack: from Search to Analytics


Thiago Souza

October 25, 2018


Transcript

  1. A little about me...
    • 15+ years of experience in IT
    • Working with Elasticsearch since 2010 (in production since 2013)
    • Sr. Support Engineer @ Elastic
  2. Elasticsearch is an Open Source (Apache 2), Distributed, RESTful Search Engine built on top of Lucene. http://www.elasticsearch.org - 2011
  3. Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. https://lucene.apache.org/core
  4. Apache Lucene
    • Full-featured text search engine
    • Library written entirely in Java: an offline Java library, accessed locally
    Documents (Id: Text):
    • 1: "Amanhã vai chover no Rio de Janeiro" ("Tomorrow it will rain in Rio de Janeiro")
    • 2: "Rio de Janeiro tem muita praia!" ("Rio de Janeiro has lots of beaches!")
    Inverted Index (Term: Ids):
    • amanha: 1
    • chov: 1
    • praia: 2
    • rio: 1,2
    • ...
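The inverted index on this slide can be sketched in a few lines of Python. This is only an illustration, not how Lucene stores its index: the `simple_terms` helper is hypothetical, it lowercases and strips accents but does not stem, so it yields `chover` where the slide shows the stemmed term `chov`.

```python
import re
import unicodedata
from collections import defaultdict

# The two documents from the slide, keyed by id.
docs = {
    1: "Amanhã vai chover no Rio de Janeiro",
    2: "Rio de Janeiro tem muita praia!",
}

def simple_terms(text):
    """Lowercase, strip accents, and split on non-alphanumerics (no stemming)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", folded)

def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in simple_terms(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

inverted = build_inverted_index(docs)
for term in ("amanha", "praia", "rio"):
    print(term, inverted[term])
```

Looking up a term is then a single dictionary access, which is why term queries stay fast regardless of how many documents do not contain the term.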
  5. Apache Lucene: Analyzer
    Text: "Amanhã vai chover no Rio de Janeiro" → Analyzer → Terms (Tokens): amanha, vai, chov, rio, janeiro
  6. Apache Lucene: Lucene Analyzer Pipeline
    Text: "Amanhã vai chover no Rio de Janeiro" → Analyzer → Terms (Tokens): amanha, vai, chov, rio, janeiro
    Analyzer = Char Filter (0 ... n) → Tokenizer (1) → Token Filter (0 ... n)
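The three pipeline stages can be sketched as plain Python functions. Everything below is a toy under stated assumptions: the stop-word list and the `toy_stem` suffix rule are hypothetical stand-ins chosen to reproduce the slide's output, not Lucene's Portuguese analyzer, and accent folding is done here as a char filter purely for illustration.

```python
import re
import unicodedata

def fold_accents(text):
    """Char filter: rewrites the raw character stream before tokenization."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def letter_tokenizer(text):
    """Tokenizer: exactly one per analyzer; splits the text into runs of letters."""
    return re.findall(r"[^\W\d_]+", text)

def lowercase(tokens):
    """Token filter: normalizes case."""
    return [token.lower() for token in tokens]

STOPWORDS = {"no", "de"}  # hypothetical two-word stop list

def remove_stopwords(tokens):
    """Token filter: drops stop words."""
    return [token for token in tokens if token not in STOPWORDS]

def toy_stem(tokens):
    """Token filter: hypothetical stemmer that only strips a trailing 'er'."""
    return [t[:-2] if t.endswith("er") and len(t) > 4 else t for t in tokens]

def analyze(text, char_filters, tokenizer, token_filters):
    """Run 0..n char filters, then 1 tokenizer, then 0..n token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

terms = analyze(
    "Amanhã vai chover no Rio de Janeiro",
    char_filters=[fold_accents],
    tokenizer=letter_tokenizer,
    token_filters=[lowercase, remove_stopwords, toy_stem],
)
print(terms)  # ['amanha', 'vai', 'chov', 'rio', 'janeiro']
```

The same analyzer has to run at index time and at query time, otherwise the query terms would not match the terms stored in the index.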
  7. Apache Lucene: Lucene Index
    Indexing: Documents → ANALYZER → Lucene Index (term: ids), e.g. term1: 1,2,5,9,...; term2: 2,5,8,9,...
    Searching: query "Amanhã chove no Rio" ("Tomorrow it rains in Rio") → ANALYZER → terms: amanha, chov, rio
    Result: Documents + Similarity Function, where score(query, doc) >= 0 (TF-IDF, BM25, etc.)
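As one concrete example of a similarity function, here is a minimal sketch of the textbook BM25 formula over already-analyzed documents. The two-document corpus reuses terms from the earlier slides; `k1 = 1.2` and `b = 0.75` are the commonly used defaults, and the exact constants and normalization inside a given Lucene version may differ from this sketch.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one analyzed document against analyzed query terms with BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    term_freqs = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = term_freqs[term]
        if tf == 0:
            continue  # a term absent from the document contributes nothing
        doc_freq = sum(1 for doc in corpus if term in doc)
        # This idf form is never negative, so the total score stays >= 0.
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score

corpus = [
    ["amanha", "vai", "chov", "rio", "janeiro"],
    ["rio", "janeiro", "tem", "muita", "praia"],
]
query = ["amanha", "chov", "rio"]
scores = [bm25_score(query, doc, corpus) for doc in corpus]
print(scores)
```

The first document matches all three query terms and therefore scores higher than the second, which matches only `rio`, a term that appears in every document and so carries little weight.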
  8. Elasticsearch is an Open Source (Apache 2), Distributed, RESTful Search Engine built on top of Lucene. http://www.elasticsearch.org - 2011
  9. Elasticsearch: HTTP + JSON
    $ curl -XPOST 'localhost:9200/twitter/tweet/' -H 'Content-Type: application/json' -d '{ "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" }'
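The same HTTP + JSON call can be sketched with Python's standard library. The `build_index_request` helper is hypothetical and only builds the request so the example runs without a cluster. The Content-Type header is set explicitly because Elasticsearch 6.0 and later reject request bodies without one.

```python
import json
import urllib.request

def build_index_request(base_url, index, doc_type, document):
    """Build (but do not send) the POST request that indexes one document."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/{doc_type}/",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_index_request(
    "http://localhost:9200",
    "twitter",
    "tweet",
    {
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "trying out Elasticsearch",
    },
)
print(request.get_method(), request.full_url)
# With a node listening on localhost:9200, send it with:
#     urllib.request.urlopen(request)
```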
  10. Elasticsearch
    • Index Sharding: splits the data into N partitions. Write scalability.
    • Shard Replication: copies the data into N partitions. Read scalability.
    Example: an index configured with 3 shards and 1 replica has 3 primary shards plus 3 replica shards, 6 shards in total.
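The shard arithmetic and the idea of routing a document to a primary shard can be sketched as follows. Elasticsearch documents its routing rule as a hash of the routing value modulo the number of primary shards. The CRC32 hash below is a stand-in chosen for this sketch, not the hash function Elasticsearch actually uses, and the function names are hypothetical.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 3
NUMBER_OF_REPLICAS = 1

def total_shards(primaries, replicas):
    """Each primary shard has `replicas` copies, so count primaries plus copies."""
    return primaries * (1 + replicas)

def shard_for(routing_key, primaries=NUMBER_OF_PRIMARY_SHARDS):
    """Pick a primary shard: hash(routing) % number_of_primary_shards.

    CRC32 is an illustrative stand-in for the real hash function.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % primaries

print(total_shards(NUMBER_OF_PRIMARY_SHARDS, NUMBER_OF_REPLICAS))  # 6
for doc_id in ("doc-1", "doc-2", "doc-3"):
    print(doc_id, "-> primary shard", shard_for(doc_id))
```

Because the shard number depends on the primary shard count, that count is fixed when the index is created, while the replica count can be changed later.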
  11. Distributed Processing
    Client → query → Coordinating Node → query → Data Nodes (parallel execution) → partial results → Coordinating Node → full (merged) results → Client
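The scatter-gather flow on this slide can be sketched with a thread pool standing in for the data nodes. The shard contents, document ids, and scores are made up for illustration, and `search_shard` and `coordinate` are hypothetical names, not Elasticsearch APIs.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-shard hits as (doc_id, score) pairs.
SHARDS = [
    [("doc-1", 2.1), ("doc-4", 0.7)],
    [("doc-2", 1.5)],
    [("doc-3", 3.0), ("doc-5", 0.2)],
]

def search_shard(shard_hits, size):
    """Data node: return this shard's local top `size` hits (a partial result)."""
    return heapq.nlargest(size, shard_hits, key=lambda hit: hit[1])

def coordinate(shards, size):
    """Coordinating node: query all shards in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, size), shards))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(size, merged, key=lambda hit: hit[1])

top_hits = coordinate(SHARDS, size=3)
print(top_hits)  # [('doc-3', 3.0), ('doc-1', 2.1), ('doc-2', 1.5)]
```

Each shard only needs to return its own top `size` hits, since no document outside a shard's local top `size` can appear in the global top `size`.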
  12. Elasticsearch: v0.90.0 (2013)
    • Highly Available Distributed Search
    • RESTful API
    • Full Text Search
    • Multifaceted Search
    • Schemaless? (Schema-easy/lazy!)
  13. "That's the end goal of Elasticsearch: we want to make data exploration, the ability to go ahead and ask questions on your data and get results in milliseconds, available to end users." Shay Banon, dotScale 2013
  14. Elasticsearch Aggregations
    • Generalization of Facets into an aggregations framework
    • Facets => Terms Aggregations
    • Other aggregations: Minimum, Maximum, Histogram, Average, Standard Deviation, etc.
    • Like searches, aggregations are executed in a distributed fashion (i.e. Map-Reduce like)
    • Lucene is also a column-oriented store (i.e. Column Store)
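The "Map-Reduce like" execution can be sketched for an average: each shard computes a partial (sum, count) pair and the coordinating node combines the pairs. The per-shard values are hypothetical, and the function names are chosen for this sketch only.

```python
# Hypothetical numeric field values held by three shards (one shard is empty).
shard_values = [
    [120, 300],
    [480],
    [],
]

def map_shard(values):
    """Map step, run on each shard: reduce local values to a (sum, count) pair."""
    return (sum(values), len(values))

def reduce_partials(partials):
    """Reduce step, run on the coordinating node: combine partial pairs into AVG."""
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None

partials = [map_shard(values) for values in shard_values]
average = reduce_partials(partials)
print(partials, average)  # [(420, 2), (480, 1), (0, 0)] 300.0
```

Shipping (sum, count) rather than a per-shard average matters: averaging the per-shard averages would give the wrong answer whenever shards hold different numbers of documents.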
  15. Column-Oriented Storage
    Row-Oriented Storage vs. Column-Oriented Storage, shown on the same table:
    id | Name  | Age | Weight
    1  | João  | 34  | 81
    2  | Maria | 51  | 65
    3  | José  | 53  | 76
    Faster aggregations with column-oriented storage: AVG(Age) = 46, SUM(Weight) = 222
    https://www.elastic.co/blog/elasticsearch-as-a-column-store
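The difference between the two layouts can be sketched with the slide's own table. The English field names (`age` for Idade, `weight` for Peso) are a choice made for this sketch. In the column layout, an aggregation reads one contiguous list instead of visiting every field of every row.

```python
# Row-oriented layout: one record per document.
rows = [
    {"id": 1, "name": "João", "age": 34, "weight": 81},
    {"id": 2, "name": "Maria", "age": 51, "weight": 65},
    {"id": 3, "name": "José", "age": 53, "weight": 76},
]

# Column-oriented layout: one list per field, aligned by position.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# Aggregations touch only the column they need.
avg_age = sum(columns["age"]) / len(columns["age"])
sum_weight = sum(columns["weight"])

print(columns["age"])  # [34, 51, 53]
print(avg_age)         # 46.0
print(sum_weight)      # 222
```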
  16. Elasticsearch Aggregations
    • Bucket Aggregations: group documents into buckets according to some criterion. Examples: Histogram/Date Histogram, Terms/Significant Terms, Range, GeoHash Grid, etc.
    • Metrics Aggregations: compute metrics over the documents grouped in buckets. Examples: Minimum, Maximum, Average, etc.; Percentiles; Geo Bounds/Geo Centroids; Scripted; etc.
    • Pipeline Aggregations: aggregate the output values of other aggregations. Examples: Moving Average; Minimum, Maximum, Average, etc.; Percentiles; Cumulative Sum; etc.
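The three aggregation families can be sketched in plain Python on a handful of made-up events: a histogram-style bucket step, per-bucket metrics, and a cumulative sum as the pipeline step. The event data and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: the hour they occurred in and a response size.
events = [
    {"hour": 0, "size": 100},
    {"hour": 0, "size": 300},
    {"hour": 1, "size": 200},
    {"hour": 2, "size": 400},
    {"hour": 2, "size": 600},
]

# Bucket aggregation: group documents by hour (a histogram with interval 1).
buckets = defaultdict(list)
for event in events:
    buckets[event["hour"]].append(event)

# Metrics aggregations: compute a count and an average inside each bucket.
per_hour = [
    {
        "hour": hour,
        "count": len(bucket),
        "avg_size": sum(doc["size"] for doc in bucket) / len(bucket),
    }
    for hour, bucket in sorted(buckets.items())
]

# Pipeline aggregation: cumulative sum over the buckets' count values.
running_total = 0
for bucket in per_hour:
    running_total += bucket["count"]
    bucket["cumulative_count"] = running_total

for bucket in per_hour:
    print(bucket)
```

The pipeline step never reads the original documents; it works only on the numbers that the bucket and metrics steps produced.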
  17. Elasticsearch Aggregations: Website Access Log
    ACCESS_LOG fields: timestamp (datetime), size (integer), browser (string), version (integer)
    All documents: COUNT(*) = 3526, AVG(size) = 432.47
    TERMS(browser), each bucket further split by DATE_HISTOGRAM(hour, timestamp) ...:
    • FIREFOX: COUNT(*) = 1492, MAX(version) = 1881
    • CHROME: COUNT(*) = 1232, MAX(version) = 310
    • SAFARI: COUNT(*) = 802, MAX(version) = 9
  18. Pipeline Aggregations: Website Access Log
    ACCESS_LOG fields: timestamp (datetime), size (integer), browser (string), version (integer)
    All documents: COUNT(*) = 3526, AVG(size) = 432.47
    TERMS(browser), each bucket further split by DATE_HISTOGRAM(hour, timestamp) ...:
    • FIREFOX: COUNT(*) = 1492, MAX(version) = 1881
    • CHROME: COUNT(*) = 1232, MAX(version) = 310
    • SAFARI: COUNT(*) = 802, MAX(version) = 9
    Pipeline over the browser buckets: MAX(COUNT(*)) = 1492
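The MAX(COUNT(*)) step on this slide can be sketched as a function over the output of the TERMS(browser) aggregation. The bucket counts are the slide's numbers. Pairing them with browser names in the order listed is an assumption of this sketch, and `max_bucket` is a hypothetical helper name.

```python
# Output of TERMS(browser) with a COUNT(*) per bucket, as shown on the slide.
# Assumption: counts are paired with browser names in the order they appear.
browser_buckets = [
    {"key": "FIREFOX", "doc_count": 1492},
    {"key": "CHROME", "doc_count": 1232},
    {"key": "SAFARI", "doc_count": 802},
]

def max_bucket(buckets, metric):
    """Pipeline step: find the bucket with the largest value of `metric`."""
    best = max(buckets, key=lambda bucket: bucket[metric])
    return {"value": best[metric], "keys": [best["key"]]}

result = max_bucket(browser_buckets, "doc_count")
total_docs = sum(bucket["doc_count"] for bucket in browser_buckets)

print(result)      # {'value': 1492, 'keys': ['FIREFOX']}
print(total_docs)  # 3526
```

The bucket counts add up to the slide's overall COUNT(*) of 3526, which is a quick sanity check that every document landed in exactly one browser bucket.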
  19. Elastic Stack
    • Store, Search, & Analyze: Elasticsearch
    • Visualize & Manage: Kibana
    • Ingest: Beats, Logstash
    • Metrics, Logging, APM, Site Search, Application Search, Business Analytics, Enterprise Search, Security Analytics, Future Solutions
    • Deployment: SaaS (Elastic Cloud); Self Managed (Elastic Cloud Enterprise, Standalone)
  20. Beats: Lightweight Data Shipper
    • Libbeat: Filebeat, Winlogbeat, Heartbeat, Metricbeat, Packetbeat, Auditbeat
    • Ships to: Logstash, Kafka, Elasticsearch, Redis
  21. Logstash: Data Processing Pipeline
    • INPUTS: Beats, Redis, Elasticsearch, File, Kafka (50+ Input Plugins)
    • FILTERS: Grok, Geoip, Heartbeat, Csv, Json (40+ Filter Plugins)
    • OUTPUTS: Elasticsearch, File, Amazon S3, Kafka, Redis (50+ Output Plugins)
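The input, filter, and output stages can be sketched as chained Python generators. This is only an analogy for the pipeline shape, not Logstash configuration: the log line format and the `LOG_PATTERN` regular expression are hypothetical, and the `_grokparsefailure` tag mirrors the tag Logstash's grok filter adds when a pattern does not match.

```python
import json
import re

# Hypothetical access-log format: client, HTTP method, path, response size.
LOG_PATTERN = re.compile(
    r"(?P<client>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<size>\d+)"
)

def input_lines(lines):
    """Input stage: wrap each raw line in an event."""
    for line in lines:
        yield {"message": line}

def grok_filter(events):
    """Filter stage: extract structured fields from the message, grok-style."""
    for event in events:
        match = LOG_PATTERN.match(event["message"])
        if match:
            event.update(match.groupdict())
            event["size"] = int(event["size"])
        else:
            event["tags"] = ["_grokparsefailure"]
        yield event

def json_output(events):
    """Output stage: serialize each event as one JSON document."""
    return [json.dumps(event, sort_keys=True) for event in events]

raw_lines = ["10.0.0.1 GET /index.html 432", "not a log line"]
emitted = json_output(grok_filter(input_lines(raw_lines)))
for document in emitted:
    print(document)
```

Events that fail to parse are tagged rather than dropped, so they still reach the output and can be inspected later.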
  22. Thank you very much! QUESTIONS?
    Thiago Souza [email protected]
    • Elastic Engineer Training, São Paulo, November 5 to 8: https://training.elastic.co/location/SaoPaulo
    • Elastic Certified Engineer: https://www.elastic.co/training/certification
    • Elastic Community: https://discuss.elastic.co