Slide 1

Slide 1 text

Elastic Stack: from Search to Analytics
Thiago Souza
Sr. Support Engineer
thiago@elastic.co

Slide 2

Slide 2 text

A little about me...
• 15+ years of experience in IT
• Working with Elasticsearch since 2010 (in production since 2013)
• Sr. Support Engineer @ Elastic

Slide 3

Slide 3 text

• 100,000+ Community Members
• 250M+ Product Downloads
• 5,000+ Subscription Customers
Statistics since 2012, founding of Elastic

Slide 4

Slide 4 text

Enterprise Customers in Every Industry
• Tech
• Finance
• Telco
• Consumer

Slide 5

Slide 5 text

Elasticsearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene.
http://www.elasticsearch.org - 2011

Slide 6

Slide 6 text

Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java.
https://lucene.apache.org/core

Slide 7

Slide 7 text

Apache Lucene
• Full-featured text search engine
• Library written entirely in Java.
  ‒ An offline Java library, accessed locally.
Documents
Id  | Text
1   | Amanhã vai chover no Rio de Janeiro ("It will rain tomorrow in Rio de Janeiro")
2   | Rio de Janeiro tem muita praia! ("Rio de Janeiro has lots of beaches!")
Inverted Index
Ids | Term
1   | amanha
1   | chov
2   | praia
1,2 | rio
... | ...
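The inverted index on this slide can be sketched in a few lines of Python. This is only an illustration of the data structure, not how Lucene stores it on disk. The documents are assumed to be already reduced to terms, and the term list for document 2 is an assumption, since the slide shows only part of the index.

```python
from collections import defaultdict

# The two documents from the slide, already reduced to terms.
# The exact term lists are assumed; the slide shows only part of the index.
analyzed_docs = {
    1: ["amanha", "vai", "chov", "rio", "janeiro"],
    2: ["rio", "janeiro", "praia"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(analyzed_docs)
print(index["rio"])    # [1, 2]
print(index["praia"])  # [2]
```

Looking up a term is then a dictionary access, which is why a search does not need to scan every document.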

Slide 8

Slide 8 text

Apache Lucene
Input text: Amanhã vai chover no Rio de Janeiro
Analyzer
Term (Token): amanha, vai, chov, rio, janeiro

Slide 9

Slide 9 text

Apache Lucene
Input text: Amanhã vai chover no Rio de Janeiro
Analyzer
Term (Token): amanha, vai, chov, rio, janeiro
Lucene Analyzer Pipeline
• Char Filter (0 ... n)
• Tokenizer (1)
• Token Filter (0 ... n)
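A toy version of the three pipeline stages, written in Python, may help make the flow concrete. It is a sketch under stated assumptions: the stopword list and the one-rule stemmer are hypothetical and were chosen only so that the slide's sentence yields the slide's terms. Real Lucene analyzers are far more complete.

```python
import re
import unicodedata

STOPWORDS = {"no", "de"}  # hypothetical stopword list for this example


def strip_accents(text):
    """Char filter: remove diacritics, e.g. 'ã' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def tokenize(text):
    """Tokenizer: split the text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens):
    """Token filter: drop tokens found in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(tokens):
    """Token filter: strip a trailing 'er' from tokens longer than four characters."""
    return [t[:-2] if len(t) > 4 and t.endswith("er") else t for t in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run 0..n char filters, exactly one tokenizer, then 0..n token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze(
    "Amanhã vai chover no Rio de Janeiro",
    char_filters=[strip_accents],
    tokenizer=tokenize,
    token_filters=[lowercase, remove_stopwords, toy_stem],
)
print(terms)  # ['amanha', 'vai', 'chov', 'rio', 'janeiro']
```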

Slide 10

Slide 10 text

Apache Lucene
query: Amanhã chove no Rio ("Tomorrow it rains in Rio")
ANALYZER
Query terms: amanha, chov, rio
Lucene Index
Ids          | Term
1,2,5,9,...  | term1
2,5,8,9,...  | term2
Result: Documents + Similarity Function
score(query, doc) >= 0
• TF-IDF
• BM25
• etc ...
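As an illustration of a similarity function, here is a minimal BM25 sketch in Python. It uses the textbook form of the formula with the commonly cited defaults k1 = 1.2 and b = 0.75. It is not Lucene's implementation, and the two-document corpus is assumed from the earlier slides.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Textbook BM25: sum over query terms of idf * saturated term frequency."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for doc in corpus if term in doc)  # document frequency
        if df == 0 or tf[term] == 0:
            continue  # the term contributes nothing to this document
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


# Analyzed documents and query, assumed from the earlier slides.
corpus = [
    ["amanha", "vai", "chov", "rio", "janeiro"],
    ["rio", "janeiro", "praia"],
]
query = ["amanha", "chov", "rio"]

scores = [bm25_score(query, doc, corpus) for doc in corpus]
print(scores)
```

The first document matches all three query terms and scores higher than the second, which matches only "rio". Both scores are non-negative, in line with score(query, doc) >= 0.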

Slide 11

Slide 11 text

Elasticsearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene.
http://www.elasticsearch.org - 2011

Slide 12

Slide 12 text

Elasticsearch
HTTP + JSON
    $ curl -XPOST 'localhost:9200/twitter/tweet/' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'
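The same HTTP + JSON call can be sketched from Python with the standard library alone. The index/type path and the document body are copied from the slide. The explicit Content-Type header is an addition that newer Elasticsearch releases are known to require, and the `send` helper is a hypothetical name. Nothing is sent unless `send` is called against a node listening on localhost:9200.

```python
import json
import urllib.request

# Document body copied from the slide.
document = {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}

# Build the POST request; the path is the one shown on the slide.
request = urllib.request.Request(
    url="http://localhost:9200/twitter/tweet/",
    data=json.dumps(document).encode("utf-8"),
    headers={"Content-Type": "application/json"},  # assumption: not on the slide
    method="POST",
)


def send(req):
    """Send the request and decode the JSON reply (needs a running node)."""
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read())


print(request.get_method(), request.full_url)
```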

Slide 13

Slide 13 text

Elasticsearch
• Index Sharding
  ‒ Splits the data into N partitions.
  ‒ Write scalability.
• Shard Replication
  ‒ Copies the data into N partitions.
  ‒ Read scalability.
Index configured with 3 shards and 1 replica (diagram legend: primary, replica). Total: 6 shards
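The shard arithmetic and the idea of routing a document to a partition can be sketched as follows. The routing function is a simplification: it uses CRC32 from the standard library as a stand-in for whatever hash the engine really uses, and the document ids are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 3
NUM_REPLICAS = 1  # replica copies kept for each primary shard


def route(doc_id, num_primaries=NUM_PRIMARY_SHARDS):
    """Pick a primary shard with a stable hash of the document id (CRC32 stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primaries


# Each primary plus its replicas: 3 * (1 + 1) = 6 shards in total.
total_shards = NUM_PRIMARY_SHARDS * (1 + NUM_REPLICAS)
print(total_shards)  # 6

# Hypothetical document ids, each mapped to one of the three primaries.
shard_of = {doc_id: route(doc_id) for doc_id in ["tweet-1", "tweet-2", "tweet-3", "tweet-4"]}
print(shard_of)
```

Because the shard is derived from the id modulo the number of primaries, changing the number of primaries would send existing ids to different shards.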

Slide 14

Slide 14 text

Elasticsearch
[Cluster diagram: node 1]

Slide 15

Slide 15 text

Elasticsearch
[Cluster diagram: node 1, node 2]

Slide 16

Slide 16 text

Elasticsearch
[Cluster diagram: node 1, node 2, node 3]

Slide 17

Slide 17 text

Elasticsearch
[Cluster diagram: node 1, node 2, node 3, node 4]

Slide 18

Slide 18 text

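Distributed Processing
• The Client sends the query to the Coordinating Node.
• The Coordinating Node sends the query to the Data Nodes (parallel execution).
• Each Data Node returns its partial results to the Coordinating Node.
• The Coordinating Node returns the full (merged) results to the Client.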
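The scatter-gather flow of this slide can be mimicked in plain Python. This is a sketch only: the node names, scores, and document ids are made up, and threads stand in for network calls to the data nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical hits held by each data node, as (score, doc_id) pairs.
DATA_NODES = {
    "data-node-1": [(0.9, "a"), (0.2, "b")],
    "data-node-2": [(0.7, "c"), (0.4, "d")],
    "data-node-3": [(0.8, "e"), (0.1, "f")],
}


def search_node(hits, size):
    """A data node returns only its own top `size` hits (partial results)."""
    return heapq.nlargest(size, hits)


def coordinate(size):
    """Coordinating node: query every data node in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda hits: search_node(hits, size), DATA_NODES.values()))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(size, merged)


top = coordinate(size=3)
print(top)  # [(0.9, 'a'), (0.8, 'e'), (0.7, 'c')]
```

Each node only ships its best `size` hits, so the coordinating node merges a small list rather than every matching document.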

Slide 19

Slide 19 text

Elasticsearch: v0.90.0 (2013)
• Highly Available Distributed Search
• RESTful API
• Full Text Search
• Multifaceted Search
• Schemaless? (Schema-easy/lazy!)

Slide 20

Slide 20 text

"That's the end goal of Elasticsearch: we want to make data exploration, the ability to go ahead and ask questions on your data and get results in milliseconds, available to end users."
Shay Banon, dotScale 2013

Slide 21

Slide 21 text

Elasticsearch Aggregations
• Generalization of Facets into an aggregations framework
• Facets => Terms Aggregations
• Other aggregations: Minimum, Maximum, Histogram, Average, Standard Deviation, etc...
• Like searches, aggregations are executed in a distributed fashion (i.e. Map-Reduce like)
• Lucene is also a column-oriented store (i.e. Column Store)

Slide 22

Slide 22 text

Column-Oriented Storage
Row-Oriented Storage vs. Column-Oriented Storage (the same table, stored row by row or column by column)
id | Name  | Age | Weight
1  | João  | 34  | 81
2  | Maria | 51  | 65
3  | José  | 53  | 76
Faster aggregations
AVG(Age) = 46
SUM(Weight) = 222
https://www.elastic.co/blog/elasticsearch-as-a-column-store
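The difference between the two layouts can be shown with the slide's own table. The sketch below pivots the rows into columns and computes the two aggregates. The English field names stand for the Nome, Idade, and Peso columns of the original slide, and the in-memory dictionaries are only a stand-in for an on-disk column store.

```python
# Row-oriented layout: one record per person (values from the slide).
rows = [
    {"id": 1, "name": "João", "age": 34, "weight": 81},
    {"id": 2, "name": "Maria", "age": 51, "weight": 65},
    {"id": 3, "name": "José", "age": 53, "weight": 76},
]

# Column-oriented layout: one list per field, values stored contiguously.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# An aggregation reads only the column it needs.
avg_age = sum(columns["age"]) / len(columns["age"])
sum_weight = sum(columns["weight"])

print(avg_age)     # 46.0
print(sum_weight)  # 222
```

With the column layout, AVG(age) touches three integers and never reads names or weights, which is the intuition behind the "faster aggregations" claim.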

Slide 23

Slide 23 text

Elasticsearch Aggregations
• Bucket Aggregations: Group documents into buckets according to some criterion. Examples:
  ‒ Histogram/Date Histogram
  ‒ Terms/Significant Terms
  ‒ Range
  ‒ GeoHash Grid
  ‒ etc...
• Metrics Aggregations: Compute metrics over the documents grouped into buckets. Examples:
  ‒ Minimum, Maximum, Average, etc...
  ‒ Percentiles
  ‒ Geo Bounds/Geo Centroids
  ‒ Scripted
  ‒ etc...
• Pipeline Aggregations: Aggregate the output values of other aggregations. Examples:
  ‒ Moving Average
  ‒ Minimum, Maximum, Average, etc...
  ‒ Percentiles
  ‒ Cumulative Sum
  ‒ etc...

Slide 24

Slide 24 text

Elasticsearch Aggregations
Website Access Log
ACCESS_LOG
  timestamp: datetime
  size: integer
  browser: string
  version: integer
COUNT(*) = 3526
AVG(size) = 432.47
TERMS(browser)
• FIREFOX: COUNT(*) = 1492, MAX(version) = 1881, DATE_HISTOGRAM(hour, timestamp) ...
• CHROME: COUNT(*) = 1232, MAX(version) = 310, DATE_HISTOGRAM(hour, timestamp) ...
• SAFARI: COUNT(*) = 802, MAX(version) = 9, DATE_HISTOGRAM(hour, timestamp) ...
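The nesting on this slide, a terms bucket per browser with metrics computed inside each bucket, can be imitated in Python. The four log entries are invented sample data, not the figures from the slide, and the date histogram level is left out to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical sample of ACCESS_LOG documents (not the slide's data).
access_log = [
    {"browser": "FIREFOX", "size": 500, "version": 60},
    {"browser": "CHROME", "size": 300, "version": 70},
    {"browser": "FIREFOX", "size": 400, "version": 61},
    {"browser": "SAFARI", "size": 200, "version": 9},
]


def terms_aggregation(docs, field):
    """Bucket aggregation: group documents by the value of `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def bucket_metrics(docs):
    """Metrics aggregations computed over the documents of one bucket."""
    return {"count": len(docs), "max_version": max(doc["version"] for doc in docs)}


# Top-level metrics over all documents.
overall = {
    "count": len(access_log),
    "avg_size": sum(doc["size"] for doc in access_log) / len(access_log),
}

# Metrics nested under each TERMS(browser) bucket.
per_browser = {
    browser: bucket_metrics(docs)
    for browser, docs in terms_aggregation(access_log, "browser").items()
}

print(overall)
print(per_browser["FIREFOX"])  # {'count': 2, 'max_version': 61}
```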

Slide 25

Slide 25 text

Pipeline Aggregations
Website Access Log
ACCESS_LOG
  timestamp: datetime
  size: integer
  browser: string
  version: integer
COUNT(*) = 3526
AVG(size) = 432.47
TERMS(browser)
• FIREFOX: COUNT(*) = 1492, MAX(version) = 1881, DATE_HISTOGRAM(hour, timestamp) ...
• CHROME: COUNT(*) = 1232, MAX(version) = 310, DATE_HISTOGRAM(hour, timestamp) ...
• SAFARI: COUNT(*) = 802, MAX(version) = 9, DATE_HISTOGRAM(hour, timestamp) ...
MAX(COUNT(*)) = 1492
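A pipeline aggregation works on the output of another aggregation rather than on documents. The sketch below takes the per-browser counts from the slide and computes MAX(COUNT(*)). The pairing of each count with a browser is assumed from the order in which they appear, and `max_bucket` is a hypothetical helper name.

```python
# Output of the TERMS(browser) aggregation: COUNT(*) per bucket.
# The browser-to-count pairing is assumed from the order on the slide.
bucket_counts = {"FIREFOX": 1492, "CHROME": 1232, "SAFARI": 802}


def max_bucket(values):
    """Pipeline step: find the largest bucket value and the bucket(s) holding it."""
    best = max(values.values())
    keys = [key for key, value in values.items() if value == best]
    return {"value": best, "keys": keys}


result = max_bucket(bucket_counts)
print(result)  # {'value': 1492, 'keys': ['FIREFOX']}
```

The three counts also add up to the overall COUNT(*) = 3526 shown on the slide.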

Slide 26

Slide 26 text

Elastic Stack
• Solutions: Metrics, Logging, APM, Site Search, Application Search, Business Analytics, Enterprise Search, Security Analytics, Future Solutions
• Kibana: Visualize & Manage
• Elasticsearch: Store, Search, & Analyze
• Beats and Logstash: Ingest
• SaaS: Elastic Cloud
• Self Managed: Elastic Cloud Enterprise, Standalone Deployment

Slide 27

Slide 27 text

Beats: Lightweight Data Shipper

Slide 28

Slide 28 text

Beats: Lightweight Data Shipper
• Beats (on top of Libbeat): Filebeat, Winlogbeat, Heartbeat, Metricbeat, Packetbeat, Auditbeat
• Destinations: Logstash, Kafka, Elasticsearch, Redis

Slide 29

Slide 29 text

Logstash: Data Processing Pipeline

Slide 30

Slide 30 text

Logstash: Data Processing Pipeline
• INPUTS (+50 Input Plugins): Beats, Redis, Elasticsearch, File, Kafka
• FILTERS (+40 Filter Plugins): Grok, Geoip, Heartbeat, Csv, Json
• OUTPUTS (+50 Output Plugins): Elasticsearch, File, Amazon S3, Kafka, Redis

Slide 31

Slide 31 text

Kibana: Data Exploration & Visualization

Slide 32

Slide 32 text

Kibana: Data Exploration & Visualization

Slide 33

Slide 33 text

Thank you very much! QUESTIONS?
Thiago Souza
thiago@elastic.co
Elastic Engineer Training - São Paulo - 05/11 to 08/11
https://training.elastic.co/location/SaoPaulo
Elastic Certified Engineer
https://www.elastic.co/training/certification
Elastic Community
https://discuss.elastic.co