ElasticSearch: The Missing Intro

Slide 1

Slide 1 text

elasticsearch: the missing tutorial
Erik Rose & Laura Thomson, Mozilla

Slide 3

Slide 3 text

housekeeping
• Make sure ES is installed. If you haven’t installed it yet and you’re on a Mac, just install 1.1.x.
• Exercise code: clone the git repo at (or just visit) https://github.com/erikrose/oscon-elasticsearch/
• Make faces.

Slide 4

Slide 4 text

what it’s good for
• Full-text search
• Big data
• Faceting
• Geographical queries

Slide 5

Slide 5 text

Shay Banon, Heavy Lifter

Slide 6

Slide 6 text

the rest of us?

Slide 7

Slide 7 text

characteristics

Slide 8

Slide 8 text

• Elasticsearch wraps Lucene.
• Read/write/admin via REST.
• Native format is JSON (vs. XML).
(lucene++ · JSON · HTTP on port 9200)
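Because every operation is just an HTTP request carrying JSON, a couple of shell helpers can save retyping the host and port during the exercises. This is a minimal sketch, assuming a local node on 127.0.0.1:9200; the function names `es_url` and `es` and the `ES_HOST`/`ES_PORT` variables are hypothetical conveniences, not part of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Elasticsearch) for talking to a node over REST.

# Build a full URL from a path such as /_cluster/health.
# ES_HOST and ES_PORT are assumed overrides; the defaults match the slides.
es_url() {
  printf 'http://%s:%s%s\n' "${ES_HOST:-127.0.0.1}" "${ES_PORT:-9200}" "$1"
}

# Send METHOD to PATH, with an optional JSON body as the third argument.
es() {
  if [ $# -ge 3 ]; then
    curl -s -X "$1" "$(es_url "$2")" -d "$3"
  else
    curl -s -X "$1" "$(es_url "$2")"
  fi
}
```

For example, `es GET '/_cluster/health?pretty'` sends the same health check used in the startup exercise, and `es DELETE /test/book/1` the same delete as the last exercise.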

Slide 9

Slide 9 text

CAP
• CAP: consistency, availability, partition tolerance
• “pick any two”
• “When it comes to CAP, in a very high level, elasticsearch gives up on partition tolerance” (2010)

Slide 10

Slide 10 text

CAP
• …it’s not that simple
• Consistency is mostly eventual.
• Availability is variable.
• Partition tolerant it’s not.
• Read http://aphyr.com/posts/317-call-me-maybe-elasticsearch (and despair).

Slide 11

Slide 11 text

what it’s good for, redux
• Generally not suitable as a primary data store.
• It’s a distributed search engine
• Easy to get started
• Easy to integrate with your existing web app
• Easy to configure it not-too-terribly
• Enables fast search with cool features

Slide 12

Slide 12 text

definitions

Slide 13

Slide 13 text

nodes and clusters
• node: a running instance of ES, usually one per machine in your cluster
• cluster: the set of nodes running ES
• master node: elected by the cluster. If the master fails, another node will take over.
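Because the master is elected, a network split can leave two halves that each elect their own master (“split brain”). Elasticsearch 1.x guards against this with the `discovery.zen.minimum_master_nodes` setting, which should be a strict majority of the master-eligible nodes: N/2 + 1, with integer division. A small sketch of that arithmetic; the `min_master_nodes` function is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the recommended discovery.zen.minimum_master_nodes
# value (a strict majority) for N master-eligible nodes.
min_master_nodes() {
  n=$1
  echo $(( n / 2 + 1 ))   # shell arithmetic truncates, so 3 -> 2 and 4 -> 3
}

# Emit the matching elasticsearch.yml line for a 3-node cluster.
echo "discovery.zen.minimum_master_nodes: $(min_master_nodes 3)"
```

With only two master-eligible nodes the majority is 2, so losing either one stops elections; that is one reason three is a common minimum.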

Slide 14

Slide 14 text

shards and replicas
• shard: a Lucene index. Each piece of data you store is written to a primary shard. Primary shards are distributed over the cluster.
• replica: each primary shard has a configurable number of replicas (copies). Data written to a primary shard is copied to its replicas on other nodes.
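Shard and replica counts multiply: an index with P primary shards and R replicas per primary holds P × (1 + R) shard copies in total. Elasticsearch never places a replica on the same node as its primary, so on a one-node cluster every replica stays unassigned, which is why a fresh single node usually reports `yellow` health. A sketch of that bookkeeping; the function names are hypothetical, and 5 primaries with 1 replica each are the Elasticsearch 1.x defaults.

```shell
#!/bin/sh
# Hypothetical helpers illustrating shard bookkeeping for one index.

# Total shard copies: each of P primaries plus R replicas of each.
total_copies() {
  echo $(( $1 * (1 + $2) ))
}

# Copies that cannot be assigned on a single-node cluster: every replica,
# because a replica is never allocated to the node holding its primary.
unassigned_on_one_node() {
  echo $(( $1 * $2 ))
}

# Elasticsearch 1.x defaults: 5 primaries, 1 replica each.
echo "total copies: $(total_copies 5 1)"
echo "unassigned on one node: $(unassigned_on_one_node 5 1)"
```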

Slide 15

Slide 15 text

self-defense

Slide 16

Slide 16 text

exercise: fix clustering and listening

# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
# discovery.zen.ping.multicast.enabled: false

# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

# Set the bind address specifically (IPv4 or IPv6):
#
# network.bind_host: 127.0.0.1
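One way to do this exercise is to append the two settings, uncommented, to the node’s config file instead of editing the comments in place. A minimal sketch, assuming the file is `config/elasticsearch.yml` inside the unpacked distribution; the `lock_down` function is hypothetical. The node reads this file at startup, so restart it afterwards.

```shell
#!/bin/sh
# Hypothetical helper: disable multicast discovery and bind to loopback only,
# by appending the two settings from the slide to an elasticsearch.yml file.
lock_down() {
  conf=$1
  # Skip if a previous run already added them, to avoid duplicate keys.
  if grep -q '^discovery.zen.ping.multicast.enabled:' "$conf" 2>/dev/null; then
    return 0
  fi
  cat >> "$conf" <<'EOF'
discovery.zen.ping.multicast.enabled: false
network.bind_host: 127.0.0.1
EOF
}

# Usage (path assumed from the slides' elasticsearch-1.2.2 directory):
#   lock_down elasticsearch-1.2.2/config/elasticsearch.yml
```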

Slide 17

Slide 17 text

exercise: start up and check

% cd elasticsearch-1.2.2
% bin/elasticsearch

# On the Mac:
% JAVA_HOME=$(/usr/libexec/java_home -v 1.7) bin/elasticsearch

% curl -s -XGET 'http://127.0.0.1:9200/_cluster/health?pretty'
{
  "cluster_name" : "grinchertoo",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 19,
  "active_shards" : 19,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 13
}
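The `status` field is the one to watch: `green` means every shard copy is assigned, `yellow` means all primaries are assigned but some replicas are not (as with the 13 unassigned shards above on a one-node cluster), and `red` means at least one primary is unassigned. A small sketch for pulling the status out in a script; the `health_status` function is hypothetical and uses `sed`, where a JSON tool such as `jq` would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: read a _cluster/health JSON response on stdin and
# print just the value of "status" (green, yellow or red).
health_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Against a live node (assumed at 127.0.0.1:9200):
#   curl -s 'http://127.0.0.1:9200/_cluster/health?pretty' | health_status
# Offline check against part of the response from the slide:
health_status <<'EOF'
{
  "cluster_name" : "grinchertoo",
  "status" : "yellow",
  "timed_out" : false
}
EOF
```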

Slide 18

Slide 18 text

exercise: tool up curl

Slide 19

Slide 19 text

exercise: tool up BBEdit’s shell worksheets: http://pine.barebones.com/files/BBEdit_10.5.11.dmg

Slide 20

Slide 20 text

exercise: tool up Marvel/Sense: http://www.elasticsearch.org/overview/marvel/download/

Slide 21

Slide 21 text

data structure basics

Slide 22

Slide 22 text

[diagram: an index containing a doctype and another doctype, each holding documents {…}]
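These levels map directly onto URL paths: /index/doctype/id addresses one document, which is the shape of the requests in the exercises that follow. A minimal sketch that builds such a path and rejects index names containing uppercase letters, which Elasticsearch does not allow; the `doc_path` function is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the /index/doctype/id path for one document.
# Elasticsearch requires lowercase index names, so refuse anything else.
doc_path() {
  index=$1 doctype=$2 id=$3
  case $index in
    *[[:upper:]]*)
      echo "index name must be lowercase: $index" >&2
      return 1
      ;;
  esac
  printf '/%s/%s/%s\n' "$index" "$doctype" "$id"
}

doc_path test book 1   # the book document used in the exercises
```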

Slide 23

Slide 23 text

exercise: make an index

curl -s -XPUT 'http://localhost:9200/test/'
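A PUT on a bare index name creates the index with default settings. You can also send settings in the request body; on a single-node practice cluster, for instance, asking for zero replicas avoids the unassignable replica copies that cause a `yellow` status. A sketch that builds such a body; the `index_body` function is hypothetical, while `number_of_shards` and `number_of_replicas` are Elasticsearch index settings.

```shell
#!/bin/sh
# Hypothetical helper: print a create-index request body with the given
# number of primary shards and replicas.
index_body() {
  printf '{"settings": {"index": {"number_of_shards": %d, "number_of_replicas": %d}}}\n' "$1" "$2"
}

# One shard, no replicas: suitable for a single-node practice cluster.
index_body 1 0

# To send it (assumes a local node and that "test" does not exist yet):
#   curl -s -XPUT 'http://localhost:9200/test/' -d "$(index_body 1 0)"
```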

Slide 24

Slide 24 text

IDs
6a8ca01c-7896-48e9-81cc-9f70661fcb32
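An ID can be any string you choose, such as a primary key from your own database or a UUID like the one above; if you instead POST a document to /index/doctype/ without an ID, Elasticsearch generates one for you. A sketch for minting a UUID on the client side; the `new_id` function is hypothetical and assumes either the `uuidgen` utility or Linux’s /proc/sys/kernel/random/uuid is available.

```shell
#!/bin/sh
# Hypothetical helper: print a new lowercase UUID to use as a document ID.
new_id() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen | tr '[:upper:]' '[:lower:]'   # macOS uuidgen prints uppercase
  else
    cat /proc/sys/kernel/random/uuid       # Linux kernel-provided UUID
  fi
}

id=$(new_id)
echo "$id"
# Then, for example (assumes a local node):
#   curl -s -XPUT "http://127.0.0.1:9200/test/book/$id" -d '{"title": "All About Fish"}'
```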

Slide 25

Slide 25 text

exercise: make a doc

# Make a doc:
curl -s -XPUT 'http://127.0.0.1:9200/test/book/1' -d '{
  "title": "All About Fish",
  "author": "Fishy McFishstein",
  "pages": 3015
}'

# Make sure it's there:
curl -s -XGET 'http://127.0.0.1:9200/test/book/1?pretty'
{
  "_index" : "test",
  "_type" : "book",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "title": "All About Fish",
    "author": "Fishy McFishstein",
    "pages": 3015
  }
}

Slide 26

Slide 26 text

exercise: make a doc

# Delete the doc:
curl -s -XDELETE 'http://localhost:9200/test/book/1'

Slide 27