Elasticsearch for Centralized Logging, Fulltext Search, and NoSQL

What is Elasticsearch?
• search server based on Lucene
• schemaless document store (JSON in, JSON out)
• RESTful HTTP interface on :9200
• full-text search engine
• geographical search of many kinds
• faceting engine
• NoSQL database
• developed in Java, Apache License 2.0

Starting with concepts and terms, modeling data***

Elasticsearch            Relational model
Index**                  Database
Document type            Schema for a table
Document                 Row in table, like a hash table
Field                    Column
Document ID / _id        Primary key
Filter, Query            SQL SELECT
Shard, Replica           Partitioned table, Replication?

** "Index" is also used as a verb: to index a document. This is equivalent to an INSERT OR UPDATE statement in an RDBMS.
*** Strings, numerics, geographical coordinates, attachments, arrays, subdocuments, nested docs

Recommended architecture for ES
• Reserve memory, about 50% of available RAM (not all as JVM heap, either)
• Pin pages in memory with mlockall to avoid swapping
• Create separate nodes for roles like master, data, query (HTTP load balancing)
• Tune settings for quorum & recovery (e.g. minimum master nodes)
• Pay attention to document locality (route to same shard)
• Monitor memory usage or face catastrophic failure
• Secure access to the HTTP interface (and port :9300, the transport client!)
• Scale by common patterns like per user or per time period vs
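The quorum setting mentioned above is usually derived from the number of master-eligible nodes. A minimal sketch, assuming the commonly recommended majority rule of (N / 2) + 1; the function name is hypothetical and is not part of any Elasticsearch API:

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return a majority quorum for the given number of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then add one: a strict majority of the nodes.
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "master-eligible nodes -> quorum of", minimum_master_nodes(nodes))
```

In the 1.x releases this value would go into the discovery.zen.minimum_master_nodes setting. Note that with only two master-eligible nodes the quorum is also two, so losing either node stops master election.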

Starting with full text search
• Analysis: Breaking apart data to make it more searchable
• Faceting: representing the data in more than one way (_source, exact, tokenized)
• Two major types of search (usually combined for speed):
  • Filters (Fast, cacheable) with boolean results - ["type": "string", "index": "not_analyzed"]
  • Queries (Slow, not cacheable) with fuzzy scoring

Example of the ‘snowball’ analyzer:
GET /_analyze?analyzer=snowball&text=gator%20linux%20users%20group%20is%20awesome
Output: gator, linux, user, group, awesom
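To make the analysis step concrete, here is a toy analyzer in Python. It is only a sketch: the stop-word list and the two suffix rules are stand-ins chosen for illustration, not the real Snowball stemming algorithm, and the function names are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption, not Elasticsearch's list).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def toy_stem(token: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]  # users -> user
    if len(token) > 4 and token.endswith("e"):
        token = token[:-1]  # awesome -> awesom
    return token


def toy_analyze(text: str) -> list:
    """Lowercase, split into tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(toy_analyze("gator linux users group is awesome"))
```

For this one sentence the toy produces the same terms as the slide's output. The point is the pipeline (tokenize, filter, stem), which is what lets a search for "user" match a document containing "users".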

Example: Creating an index
$ curl -XPUT 'http://localhost:9200/twitter/'

$ curl -XPUT 'http://localhost:9200/twitter/' -d '
index :
    number_of_shards : 3
    number_of_replicas : 2
'
(That’s right, YAML. This is also where mappings might go.)
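The same index can be created with a JSON body instead of YAML. A minimal sketch using only the Python standard library; the helper name is hypothetical, and the request is built but not sent, so it runs without a cluster:

```python
import json
import urllib.request


def create_index_request(base_url, index, shards, replicas):
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("http://localhost:9200", "twitter", 3, 2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a running node: urllib.request.urlopen(req)
```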

Example: Indexing a document (could be create or update)
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}'

Output:
{
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "created" : true
}
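The "create or update" behaviour can be modelled in a few lines. This is a toy in-memory stand-in written for illustration, not Elasticsearch code; the class name is hypothetical and it only mimics the response fields shown above.

```python
class ToyIndex:
    """In-memory model of index-as-upsert with per-document versions."""

    def __init__(self, name):
        self.name = name
        self._docs = {}  # (type, id) -> (version, source)

    def index(self, doc_type, doc_id, source):
        key = (doc_type, doc_id)
        previous = self._docs.get(key)
        # First write starts at version 1; every later write bumps it.
        version = 1 if previous is None else previous[0] + 1
        self._docs[key] = (version, dict(source))
        return {
            "_index": self.name,
            "_type": doc_type,
            "_id": doc_id,
            "_version": version,
            "created": previous is None,
        }


twitter = ToyIndex("twitter")
first = twitter.index("tweet", "1", {"user": "kimchy"})
second = twitter.index("tweet", "1", {"user": "kimchy", "message": "edited"})
print(first)
print(second)
```

Sending the same PUT twice is therefore not an error: the second call replaces the document and reports a higher version with "created" set to false.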

Example: Fetch a document, delete is almost identical (-XDELETE)
$ curl -XGET 'http://localhost:9200/twitter/tweet/1'
{
    "_index" : "twitter",
    "_type" : "tweet",
    "_id" : "1",
    "_version" : 1,
    "found": true,
    "_source" : {
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }
}
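On the client side, the interesting parts of that reply are the "found" flag and the "_source" object. A small sketch that parses a reply of this shape; the sample strings are hand-written for illustration and the helper name is hypothetical:

```python
import json

# Hand-written sample replies in the shape of a GET response.
FOUND_REPLY = """{
  "_index": "twitter", "_type": "tweet", "_id": "1", "_version": 1,
  "found": true,
  "_source": {"user": "kimchy", "message": "trying out Elasticsearch"}
}"""
MISSING_REPLY = '{"_index": "twitter", "_type": "tweet", "_id": "2", "found": false}'


def extract_source(raw_reply):
    """Return the stored document, or None when the ID was not found."""
    reply = json.loads(raw_reply)
    if not reply.get("found", False):
        return None
    return reply["_source"]


print(extract_source(FOUND_REPLY))
print(extract_source(MISSING_REPLY))
```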

Example: mapping
{
  "product": {
    "properties": {
      "ProductId": { "type": "string", "index": "not_analyzed" },
      "ProductEnabled": { "type": "boolean" },
      "PiecesIncluded": { "type": "long" },
      "LastModified": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS" },
      "AvailableInventory": { "type": "float" },
      "Price": { "type": "float" },
      "LongDescription": { "type": "string", "include_in_all" : true },
      "ProductName" : {
        "type" : "multi_field",
        "include_in_all" : true,
        "fields" : {
          "ProductName": { "type": "string", "index": "not_analyzed" },
          "lowercase": { "type": "string", "analyzer": "lowercase_analyzer" },
          "suggest" : { "type": "string", "analyzer": "suggest_analyzer" }
        }
      }
    }
  }
}

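The multi_field entry in the mapping means one incoming value is indexed several ways at once. A toy sketch of that idea, with loud assumptions: "lowercase_analyzer" is a custom analyzer the slides do not define, so it is assumed here to keep the whole value as one token and lowercase it; the "suggest" sub-field is left out because its analyzer is unknown; the function names are hypothetical.

```python
def multi_field_terms(field, value):
    """Return the terms each sub-field would hold for one incoming value."""
    return {
        # not_analyzed: the exact original string is the only term.
        field: [value],
        # Assumed behaviour of the undefined custom "lowercase_analyzer".
        f"{field}.lowercase": [value.lower()],
    }


def term_filter(indexed, field, term):
    """Exact term lookup, the kind of yes/no test a filter performs."""
    return term in indexed.get(field, [])


indexed = multi_field_terms("ProductName", "Red Widget")
print(term_filter(indexed, "ProductName", "red widget"))            # exact field is case-sensitive
print(term_filter(indexed, "ProductName.lowercase", "red widget"))  # lowercased copy matches
```

This is why the same field can serve exact filtering and case-insensitive lookup without reindexing the data twice by hand.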
Example: Search with Query as query string
For example, we can search on all documents across all types within the twitter index:
$ curl -XGET 'http://localhost:9200/twitter/_search?q=user:kimchy'

We can also search within specific types:
$ curl -XGET 'http://localhost:9200/twitter/tweet,user/_search?q=user:kimchy'

We can also search all tweets with a certain tag across several indices (for example, when each user has his own index):
$ curl -XGET 'http://localhost:9200/kimchy,elasticsearch/tweet/_search?q=tag:wow'

Or we can search all tweets across all available indices using _all placeholder:
$ curl -XGET 'http://localhost:9200/_all/tweet/_search?q=tag:wow'

Or even search across all indices and all types:
$ curl -XGET 'http://localhost:9200/_search?q=tag:wow'

www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html
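The URL patterns above follow one rule: optional comma-separated indices, optional comma-separated types, then _search. A small sketch that builds them; the function name and its defaults are assumptions made for illustration:

```python
from urllib.parse import urlencode


def search_url(query, indices=(), types=(), base_url="http://localhost:9200"):
    """Build a query-string search URL for the given indices and types."""
    parts = [base_url]
    if indices:
        parts.append(",".join(indices))
    elif types:
        # A type with no index needs the _all placeholder in the index slot.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    # Keep ':' readable so the result matches the curl examples.
    return "/".join(parts) + "?" + urlencode({"q": query}, safe=":")


print(search_url("user:kimchy", indices=["twitter"]))
print(search_url("user:kimchy", indices=["twitter"], types=["tweet", "user"]))
print(search_url("tag:wow", indices=["kimchy", "elasticsearch"], types=["tweet"]))
print(search_url("tag:wow", types=["tweet"]))
print(search_url("tag:wow"))
```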

What else does it get used for?
• NoSQL databases: if you can live without a few features
• Logs and Statistics: projects like Logstash and Kibana; GitHub uses it for exception tracking
• Fast Visualizations: really anything where you want realtime filtering
• make anything searchable…

The ‘unique holy triangle,’ in a single product: “data exploration capabilities, unstructured search, structured search, and aggregations or analytics.”
It can even accept a query first, and notify you of new search results once new documents are indexed.
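The "query first, documents later" idea (Elasticsearch calls it percolation) is search turned around: queries are stored, and each new document is checked against them. A toy in-memory sketch of the concept only; it does not use or imitate the percolator API, and the class is hypothetical:

```python
class ToyPercolator:
    """Store simple term queries, then match incoming documents against them."""

    def __init__(self):
        self._queries = {}  # query name -> (field, term)

    def register(self, name, field, term):
        self._queries[name] = (field, term)

    def percolate(self, document):
        """Return the names of all stored queries this document satisfies."""
        matches = []
        for name, (field, term) in self._queries.items():
            tokens = str(document.get(field, "")).lower().split()
            if term.lower() in tokens:
                matches.append(name)
        return sorted(matches)


alerts = ToyPercolator()
alerts.register("mentions-linux", "message", "linux")
alerts.register("mentions-java", "message", "java")
print(alerts.percolate({"message": "Gator Linux users group"}))
```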

Living in the trenches with ES
• Project maturity: there has been some criticism of the documentation’s ‘findability’, and new tools and libraries are still emerging every week
• Schemas still matter: eventually, customers will want to do custom ‘mappings’ and use custom routing to drive data to particular shards, use aliases for those shards
• Typical Java tuning: garbage collection, locking/threading, I/O, and algorithm issues
• Make good choices: 1 index with 50 shards should perform the same as 50 indices with 1 shard, but the common patterns are index per user or index per time unit.
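Custom routing works because the target shard is computed as hash(routing) modulo the number of primary shards, with the document ID used as the routing value by default. A sketch of that rule; zlib.crc32 is only a stand-in hash chosen so the example is deterministic, not the hash function Elasticsearch uses, and the function name is hypothetical:

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key to a shard number in [0, number_of_primary_shards)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash (assumption): any stable hash illustrates the idea.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# Routing every document of one user by the same key keeps them together.
for doc_id in ("tweet-1", "tweet-2", "tweet-3"):
    print(doc_id, "routed by 'kimchy' ->", pick_shard("kimchy", 5))
```

The modulo is also why the number of primary shards is fixed when the index is created: changing it would send existing keys to different shards.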

What is Logstash?
• collect, process, and forward events (logs)
• JSON-like configuration
• Inputs > Codecs > Filters > Outputs
• Inputs for files, sockets, syslog, irc, gelf, twitter, graphite, heroku, imap, jmx
• Codecs for json, collectd, graphite, plain
• Filters for dates, location, urldecode, mutate, grok, geoip
• Outputs for Elasticsearch, MQs, Nagios, Databases, and many, many more

Most basic logstash config
input { stdin { } }

input {
  tcp {
    host => "localhost"
    port => 1234
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch { host => "192.168.10.1" }
  stdout { codec => rubydebug }
}
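What the grok and date filters do to each event can be approximated in plain Python. This is a sketch under assumptions: the regular expression is a simplified stand-in for the COMBINEDAPACHELOG pattern (it reuses that pattern's usual field names), the sample log line is made up, and the function is hypothetical.

```python
import re
from datetime import datetime

# Simplified stand-in for %{COMBINEDAPACHELOG}; not the real grok pattern.
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one access-log line into fields and parse its timestamp."""
    match = COMBINED_LOG.match(line)
    if match is None:
        # Logstash tags unparseable events in a similar way.
        return {"message": line, "tags": ["_grokparsefailure"]}
    event = match.groupdict()
    event["message"] = line
    # Python equivalent of the date filter's "dd/MMM/yyyy:HH:mm:ss Z".
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event


SAMPLE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
event = parse_line(SAMPLE)
print(event["clientip"], event["verb"], event["request"], event["response"])
print(event["@timestamp"].isoformat())
```

The structured fields are what make the events filterable in Elasticsearch; the parsed timestamp is what places each event on the right point of a time series.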

Logstash magic
- Elasticsearch output, transport proto.
- Template for how to index, defaults to ‘logstash-<date>’
- Combine with curator to tend time series data, clean up by date or size
- Alternatives: logstash-forwarder with lumberjack
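The index-per-day naming is what makes cleanup by date cheap: old data is removed by dropping whole indices. A sketch of the idea, assuming the default daily name format logstash-YYYY.MM.dd; the helper functions are hypothetical and only compute names, they do not call curator or Elasticsearch:

```python
from datetime import date, datetime, timedelta

PREFIX = "logstash-"


def index_name(day: date) -> str:
    """Name of the daily index that events from `day` are written to."""
    return PREFIX + day.strftime("%Y.%m.%d")


def expired_indices(names, today: date, keep_days: int):
    """Return the daily indices older than `keep_days`, skipping other names."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(PREFIX):
            continue
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily logstash index
        if day < cutoff:
            expired.append(name)
    return expired


existing = ["logstash-2014.11.01", "logstash-2014.11.14", "twitter"]
print(index_name(date(2014, 11, 15)))
print(expired_indices(existing, today=date(2014, 11, 15), keep_days=7))
```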

What is Kibana?
• browser based analytics and search interface for Elasticsearch that was developed primarily to view Logstash event data
• Understands time series and compares time series
• Dashboards and charts with drill-down functionality
• Kibana 3 talks directly to ES, Kibana 4 (Beta) proxies

Examples here, here, and here

Live demo!!

How can you learn more?
• Try tutorials: see links below.
• Plug it in: with projects like Drupal, Magento, WordPress
• make anything searchable… libraries for Python, Ruby, Java, Scala, Node...

http://joelabrahamsson.com/elasticsearch-101/
http://people.mozilla.org/~wkahngreene/elastic/index.html
http://www.elasticsearch.org/guide/
https://www.found.no/foundation/elasticsearch-from-the-bottom-up/
https://www.youtube.com/watch?v=lWKEphKIG8U
http://www.slideshare.net/aszegedi/everything-i-ever-learned-about-jvm-performance-tuning-twitter

Questions?

Martin Smith
