Elasticsearch: The Missing Tutorial

Slide 1

Slide 1 text

Elasticsearch: the missing tutorial
Erik Rose (@ErikRose)
Laura Thomson (@lxt)
Mozilla

Slide 3

Slide 3 text

housekeeping

Slide 4

Slide 4 text

housekeeping
• Make sure ES is installed. If you haven’t installed it yet and you’re on a Mac, just install 1.1.x.

Slide 5

Slide 5 text

housekeeping
• Make sure ES is installed. If you haven’t installed it yet and you’re on a Mac, just install 1.1.x.
• Exercise code: clone the git repo at (or just visit) https://github.com/erikrose/oscon-elasticsearch/

Slide 6

Slide 6 text

Slide 7

Slide 7 text

what it’s good for

Slide 8

Slide 8 text

what it’s good for
• Full-text search

Slide 9

Slide 9 text

what it’s good for
• Full-text search
• Big data

Slide 10

Slide 10 text

what it’s good for
• Full-text search
• Big data
• Faceting

Slide 11

Slide 11 text

what it’s good for
• Full-text search
• Big data
• Faceting
• Geographical queries

Slide 12

Slide 12 text

Shay Banon, Heavy Lifter

Slide 13

Slide 13 text

the rest of us?

Slide 15

Slide 15 text

characteristics

Slide 16

Slide 16 text

• Elasticsearch wraps Lucene.
• Read/write/admin via REST.
• Native format is JSON (vs. XML).
lucene++, JSON, HTTP on port 9200

Slide 17

Slide 17 text

CAP
• CAP: consistency, availability, partition tolerance
• “pick any two”
• “When it comes to CAP, in a very high level, elasticsearch gives up on partition tolerance” (2010)

Slide 18

Slide 18 text

CAP
• …it’s not that simple
• Consistency is mostly eventual.
• Availability is variable.
• Partition tolerant it’s not.
• Read http://aphyr.com/posts/317-call-me-maybe-elasticsearch (and despair).

Slide 19

Slide 19 text

what it’s good for, redux
• Generally not suitable as a primary data store.
• It’s a distributed search engine.
• Easy to get started
• Easy to integrate with your existing web app
• Easy to configure it not-too-terribly
• Enables fast search with cool features

Slide 20

Slide 20 text

definitions

Slide 21

Slide 21 text

nodes and clusters
• node: a machine in your cluster
• cluster: the set of nodes running ES
• master node: elected by the cluster. If the master fails, another node will take over.

Slide 22

Slide 22 text

shards and replicas
• shard: a Lucene index. Each piece of data you store is written to a primary shard. Primary shards are distributed over the cluster.
• replica: each shard has a set of distributed replicas (copies). Data written to a primary shard is copied to replicas on different nodes.
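The relationship between primaries and replicas can be sketched in shell. This is only a minimal sketch: the helper names `index_settings` and `total_shard_copies` and the index name `test2` are made up for illustration and do not come from the slides. `number_of_shards` and `number_of_replicas` are the standard index settings. The curl call is left commented out so the block runs without a node.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON settings body for a new index.
# $1 = number of primary shards, $2 = number of replicas per primary.
index_settings() {
  printf '{"settings": {"number_of_shards": %d, "number_of_replicas": %d}}' "$1" "$2"
}

# Hypothetical helper: total shard copies the cluster has to place
# (each primary plus its replicas).
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

body=$(index_settings 5 1)
echo "$body"
echo "shard copies to allocate: $(total_shard_copies 5 1)"

# With a node running on localhost, the index could be created like this
# (the index name "test2" is an assumption for illustration):
# curl -s -XPUT 'http://localhost:9200/test2/' -d "$body"
```

With 5 primaries and 1 replica each, the cluster has 10 shard copies to place. A replica is never allocated to the same node as its primary, so on a single-node cluster the 5 replicas stay unassigned.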

Slide 23

Slide 23 text

self-defense

Slide 24

Slide 24 text

exercise: fix clustering and listening
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).
# Set the bind address specifically (IPv4 or IPv6):
#
network.bind_host: 127.0.0.1

Slide 25

Slide 25 text

exercise: start up and check
% cd elasticsearch-1.2.2
% bin/elasticsearch
# On the Mac:
% JAVA_HOME=$(/usr/libexec/java_home -v 1.7) bin/elasticsearch
% curl -s -XGET 'http://127.0.0.1:9200/_cluster/health?pretty'
{
  "cluster_name" : "grinchertoo",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 19,
  "active_shards" : 19,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 13
}
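One way to act on the health response is to pull out its `status` field. The following is a minimal sketch that parses a trimmed copy of the sample response with `sed` instead of calling a live node. The variable names and the one-line explanations of each status are illustrative, not taken from the slides.

```shell
#!/bin/sh
# Trimmed copy of the sample response from the slide. Against a live node
# this would come from:
#   curl -s -XGET 'http://127.0.0.1:9200/_cluster/health'
health='{ "cluster_name" : "grinchertoo", "status" : "yellow", "number_of_nodes" : 1, "unassigned_shards" : 13 }'

# Extract the value of the "status" field (good enough for this flat JSON).
status=$(printf '%s\n' "$health" | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p')

case "$status" in
  green)  meaning="all primary and replica shards are allocated" ;;
  yellow) meaning="all primaries are allocated, but some replicas are not" ;;
  red)    meaning="at least one primary shard is not allocated" ;;
  *)      meaning="unrecognized status" ;;
esac

echo "cluster is $status: $meaning"
```

A yellow status is what you would expect from a one-node cluster like the one in the sample: the replicas have no second node to live on, so they show up under `unassigned_shards`.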

Slide 26

Slide 26 text

exercise: tool up
curl

Slide 27

Slide 27 text

exercise: tool up
BBEdit’s shell worksheets: http://pine.barebones.com/files/BBEdit_10.5.11.dmg

Slide 28

Slide 28 text

exercise: tool up
Marvel/Sense: http://www.elasticsearch.org/overview/marvel/download/

Slide 30

Slide 30 text

data structure basics

Slide 31

Slide 31 text

index

Slide 32

Slide 32 text

index doctype

Slide 34

Slide 34 text

index doctype {… }

Slide 35

Slide 35 text

index doctype another doctype {… }

Slide 37

Slide 37 text

exercise: make an index
curl -s -XPUT 'http://localhost:9200/test/'

Slide 38

Slide 38 text

IDs

Slide 39

Slide 39 text

IDs
6a8ca01c-7896-48e9-81cc-9f70661fcb32

Slide 40

Slide 40 text

exercise: make a doc

Slide 41

Slide 41 text

exercise: make a doc
# Make a doc:
curl -s -XPUT 'http://127.0.0.1:9200/test/book/1' -d '{
  "title": "All About Fish",
  "author": "Fishy McFishstein",
  "pages": 3015
}'

Slide 42

Slide 42 text

Slide 43

Slide 43 text

exercise: make a doc
# Make a doc:
curl -s -XPUT 'http://127.0.0.1:9200/test/book/1' -d '{
  "title": "All About Fish",
  "author": "Fishy McFishstein",
  "pages": 3015
}'
# Make sure it's there:
curl -s -XGET 'http://127.0.0.1:9200/test/book/1?pretty'
{
  "_index" : "test",
  "_type" : "book",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "title": "All About Fish",
    "author": "Fishy McFishstein",
    "pages": 3015
  }
}
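The URLs in these exercises follow the index / doctype / ID layout from the data structure slides. Here is a minimal sketch of that addressing scheme, assuming a hypothetical helper named `doc_url` and a node at 127.0.0.1:9200. The curl calls are commented out so the block runs without a node.

```shell
#!/bin/sh
# Base URL of the node; matches the address used in the exercises.
ES='http://127.0.0.1:9200'

# Hypothetical helper: build the URL of a document from its
# index, doctype, and ID.
doc_url() {
  printf '%s/%s/%s/%s' "$ES" "$1" "$2" "$3"
}

url=$(doc_url test book 1)
echo "$url"

# With a node running, the same URL serves every verb:
# curl -s -XPUT    "$url" -d '{"title": "All About Fish"}'
# curl -s -XGET    "$url?pretty"
# curl -s -XDELETE "$url"
```

Elasticsearch increments a document's `_version` on each write to the same ID, so the `"_version" : 2` in the sample response is consistent with the PUT having been run twice.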

Slide 44

Slide 44 text

exercise: make a doc
# Delete the doc:
curl -s -XDELETE 'http://localhost:9200/test/book/1'

Slide 45