$> whoami
Pere Urbon-Bayes, software engineer since forever. I have always worked with databases, data, and analytics. GraphDevRoom@FOSDEM. When not coding I enjoy time with my wife and kid; I'm also into movies and TV series, used to like running, and basically do everything I can to enjoy life.
What is ElasticSearch?
• Document-oriented (search/store) engine
• Near-real-time analytics
• Schema free
• Distributed
• Multitenant
• There is an API for nearly everything
Important terms
A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node, which is chosen automatically by the cluster and which can be replaced if the current master node fails.
A node is a running instance of Elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server. At startup, a node will use unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.
Important terms
Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle.
Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes: to improve failover and to increase performance.
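To make the shard arithmetic concrete, here is a minimal sketch in Python. It assumes a document is assigned to a primary shard by hashing a routing key modulo the number of primary shards. The CRC32 hash and both function names are stand-ins chosen for illustration, not what Elasticsearch uses internally.

```python
import zlib


def pick_primary_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (for example a document id) to one primary shard.

    Illustrative only: CRC32 is a stand-in hash, not Elasticsearch's own.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


def total_shard_copies(number_of_primary_shards: int, number_of_replicas: int) -> int:
    """Each primary shard exists once, plus once per configured replica."""
    return number_of_primary_shards * (1 + number_of_replicas)


# 5 primary shards is the default named on the slide; 1 replica is an assumption.
print(pick_primary_shard("tweet-1", 5))  # always the same shard for the same key
print(total_shard_copies(5, 1))  # 10
print(total_shard_copies(3, 2))  # 9
```

Under this assumption, the same key always lands on the same shard only while the number of primary shards stays the same, which is why it pays to choose that number with care when the index is created.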
What can you do?
• Unstructured Search
  • Get all the articles that contain the words Berlin and Beer.
• Structured Search
  • Get all the requests with status 404.
• Analytics
  • Get the average travel time.
• Combinations of the previous.
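As a rough sketch of what these requests might look like, the Python snippet below builds query DSL bodies for each case. The field names ("body", "status", "travel_time") and the aggregation name are hypothetical, and the exact syntax accepted depends on the Elasticsearch version in use.

```python
import json

# Unstructured search: articles containing both "Berlin" and "Beer".
# "body" is a hypothetical full-text field.
unstructured = {
    "query": {"match": {"body": {"query": "Berlin Beer", "operator": "and"}}}
}

# Structured search: requests with status 404 ("status" is hypothetical).
structured = {"query": {"term": {"status": 404}}}

# Analytics: average travel time, returning no individual hits.
analytics = {
    "size": 0,
    "aggs": {"avg_travel_time": {"avg": {"field": "travel_time"}}},
}

# Combination: both conditions must match, and the average is computed
# over the matching documents only.
combined = {
    "query": {"bool": {"must": [unstructured["query"], structured["query"]]}},
    "aggs": analytics["aggs"],
}

print(json.dumps(combined, indent=2))
```

Each dictionary would be sent as the JSON body of a search request against an index.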
Getting started and some tips
• Configuration lives in config/elasticsearch.yml
• IMPORTANT: adjust the cluster.name or disable multicast!
• To start the service: bin/elasticsearch [-d]
• Please don't give the JVM heap more than 32 GB per node!
  • http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html
• Give the heap no more than 50% of the available RAM
• Use SSDs; the cost is worth it
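The two memory tips can be combined into one small calculation. This is a sketch under the assumption that both limits apply to the JVM heap; the function name is made up for illustration, and the exact point at which the JVM stops using compressed object pointers varies by JVM, so treat 32 GB as an upper bound rather than a target.

```python
MAX_HEAP_GB = 32.0       # upper bound from the slide
MAX_RAM_FRACTION = 0.5   # "no more than 50% of available RAM"


def recommended_heap_gb(total_ram_gb: float) -> float:
    """Return the largest heap size (in GB) that respects both tips."""
    return min(total_ram_gb * MAX_RAM_FRACTION, MAX_HEAP_GB)


print(recommended_heap_gb(16))   # 8.0
print(recommended_heap_gb(128))  # 32.0
```

The remaining RAM is not wasted: the operating system uses it for the file system cache, which the search engine relies on heavily.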
The Index API
Used to manage indices and their settings, aliases, mappings, templates, and warmers.
PUT an index:
$ curl -XPUT 'http://localhost:9200/twitter/' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    }
}'
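The same request can be prepared from a program. The sketch below uses only the Python standard library and builds the PUT request without sending it; the helper name is hypothetical, and the Content-Type header is an addition that newer Elasticsearch versions expect.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, shards, replicas):
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("http://localhost:9200", "twitter", 3, 2)
print(request.get_method(), request.full_url)

# Sending it requires a running node:
# urllib.request.urlopen(request)
```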
The Mapping API
A mapping is like a schema definition in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings.
PUT a mapping:
$ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet' -d '{
    "tweet" : {
        "properties" : {
            "message" : { "type" : "string", "store" : true }
        }
    }
}'
The Percolator
Traditionally we know our data, then create a schema and use a database as storage so we can query it later on. The same applies to Elasticsearch. The percolator, however, works the other way around: first you store queries in an index, and then, via the API, you check whether a document matches any of these queries. This is very handy for alerting systems, for example, where you define a query for each alert and then check incoming events against them.
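The inversion is easier to see in a toy model. The Python sketch below is an in-memory illustration of the idea only, not the Elasticsearch percolator API: the class, the query ids, and the event fields are all invented for the example, and queries are plain Python predicates rather than query DSL.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Query = Callable[[Document], bool]


class ToyPercolator:
    """Stores queries first, then reports which ones a document matches."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        # Step 1: queries are stored ahead of time, one per alert.
        self._queries[query_id] = query

    def percolate(self, doc: Document) -> List[str]:
        # Step 2: an incoming document is checked against every stored query.
        return sorted(qid for qid, query in self._queries.items() if query(doc))


percolator = ToyPercolator()
percolator.register("not-found", lambda d: d.get("status") == 404)
percolator.register("slow-request", lambda d: d.get("took_ms", 0) > 1000)

event = {"status": 404, "took_ms": 1500}
matches = percolator.percolate(event)
print(matches)  # ['not-found', 'slow-request']
```

In an alerting setup, each returned id would identify an alert to fire for the event.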
We can also have conditionals!
output {
  if [action] == "alert" {
    pagerduty {}
  }
}
Including the classics:
• keywords: if, else if, else
• operators: and, or, nand, xor, and !
• variables…
Behind Kibana
Kibana is an open source (Apache licence) analytics and search dashboard for ElasticSearch that is a snap to set up and start using. It democratises access to your data, empowering more team members to make practical use of it. It integrates seamlessly with Logstash, Apache Flume, and Fluentd, among others.