Slide 1

Slide 1 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Elasticsearch and MIT Sloan Data Analytics Hackathon Cambridge, MA - May 10, 2014 Elasticsearch Quick Introduction

Slide 2

Slide 2 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited About Me • Igor Motov • Developer at Elasticsearch Inc. • Github: imotov • Twitter: @imotov

Slide 3

Slide 3 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited About Elasticsearch Inc. • Founded in 2012 By the people behind the Elasticsearch and Apache Lucene http://www.elasticsearch.com Headquarters: Amsterdam and Los Altos, CA • We provide Training (public & onsite) Development support Production support subscription (SLA)

Slide 4

Slide 4 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited About Elasticsearch • Real time search and analytics engine JSON-oriented, Apache Lucene-based • Automatic Schema Detection Enables control of it when needed • Distributed Scales Up+Out, Highly Available • Multi-tenancy Dynamically create/delete indices • API centric Most functionality is exposed through an API

Slide 5

Slide 5 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Basic Concepts • Cluster a group of nodes sharing the same set of indices • Node a running Elasticsearch instance (typically JVM process) • Index a set of documents of possibly different types stored in one or more shards • Type a set of documents in an index that share the same schema • Shard a Lucene index, allocated on one of the nodes

Slide 6

Slide 6 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Basic Concepts - Document • JSON Object ! ! ! ! ! ! • Identified by index/type/id { "rank": 21, "city": "Boston", "state": "Massachusetts", "population2010": 617594, "land_area": 48.277, "density": 12793, "ansi": 619463, "location": { "lat": 42.332, "lon": 71.0202 }, "abbreviation": "MA" }

Slide 7

Slide 7 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Downloading elasticsearch • http://www.elasticsearch.org/download/ Windows Everything else

Slide 8

Slide 8 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited What’s in a distribution? . ├── LICENSE.txt ├── NOTICE.txt ├── README.textile ├── bin │ ├── elasticsearch │ ├── elasticsearch.in.sh │ └── plugin ├── config │ ├── elasticsearch.yml │ └── logging.yml ├── data │ └── elasticsearch ├── lib │ ├── elasticsearch-x.y.z.jar │ ├── ... │ └── └── logs ├── elasticsearch.log └── elasticsearch_index_search_slowlog.log executable scripts node config files data storage libs log files

Slide 9

Slide 9 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Configuration (multicast) • Configuration config/elasticsearch.yml cluster.name: "elasticsearch-imotov" unique name

Slide 10

Slide 10 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Configuration (stand-alone) • Configuration config/elasticsearch.yml cluster.name: "elasticsearch-imotov" network.host: "127.0.0.1" discovery.zen.ping.multicast.enabled: false discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301", “localhost:9302"] unique name listen only on localhost disable multicast search for other nodes on localhost

Slide 11

Slide 11 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Starting elasticsearch • Foreground ! ! • Background $ bin/elasticsearch $ bin/elasticsearch -d

Slide 12

Slide 12 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Is it running? { "status" : 200, "name" : "Kamal", "version" : { "number" : "1.1.1", "build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc", "build_timestamp" : "2014-04-16T14:27:12Z", "build_snapshot" : false, "lucene_version" : "4.7" }, "tagline" : "You Know, for Search" } $ curl -XGET "http://localhost:9200/?pretty"

Slide 13

Slide 13 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Communicating with Elasticsearch • REST API Curl Ruby Python PHP Perl JavaScript (community supported) • Binary Protocol Java

Slide 14

Slide 14 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Pick your client • Java included in distribution • Ruby, PHP, Perl, Python http://www.elasticsearch.org/blog/unleash-the-clients-ruby- python-php-perl/ • Everything Else http://www.elasticsearch.org/guide/clients/

Slide 15

Slide 15 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Indexing a document $ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{ "rank": 21, "city": "Boston", "state": "Massachusetts", "population2010": 617594, "land_area": 48.277, "density": 12793, "ansi": 619463, "location": { "lat": 42.332, "lon": 71.0202 }, "abbreviation": "MA" }' {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":1}

Slide 16

Slide 16 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Getting a document { "_index" : "test-data", "_type" : "cities", "_id" : "21", "_version" : 1, "exists" : true, "_source" : { "rank": 21, "city": "Boston", "state": "Massachusetts", "population2010": 617594, "land_area": 48.277, "density": 12793, "ansi": 619463, "location": { "lat": 42.332, "lon": 71.0202 }, "abbreviation": "MA" } } $ curl -XGET "http://localhost:9200/test-data/cities/21?pretty"

Slide 17

Slide 17 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Updating a document $ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{ "rank": 21, "city": "Boston", "state": "Massachusetts", "population2010": 617594, "population2012": 636479, "land_area": 48.277, "density": 12793, "ansi": 619463, "location": { "lat": 42.332, "lon": 71.0202 }, "abbreviation": "MA" }' {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":2}

Slide 18

Slide 18 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Searching $ curl -XGET 'http://localhost:9200/test-data/cities/_search?pretty' -d '{ "query": { "match": { "city": "Boston" } } }'

Slide 19

Slide 19 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Searching { "took" : 5, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 6.1357985, "hits" : [ { "_index" : "test-data", "_type" : "cities", "_id" : "21", "_score" : 6.1357985, "_source" : {"rank":"21","city":"Boston",...} } ] } }

Slide 20

Slide 20 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Range Queries $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{ "query": { "range": { "population2012": { "from": 500000, "to": 1000000 } } } }'

Slide 21

Slide 21 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Boolean Queries $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{ "query": { "bool": { "should": [{ "match": { "state": "Texas"} }, { "match": { "state": "California"} }], "must": { "range": { "population2012": { "from": 500000, "to": 1000000 } } }, "minimum_should_match": 1 } } }'

Slide 22

Slide 22 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited MatchAll Query $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{ "query": { "match_all": { } } }'

Slide 23

Slide 23 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Sorting and Paging $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{ "query": { "match_all": { } }, "sort": [ {"state": {"order": "asc"}}, {"population2010": {"order": "desc"}} ], "from": 0, "size": 20 }'

Slide 24

Slide 24 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Analysis • By default string are - Divided into words (tokens) - All tokens are converted to lower-case

Slide 25

Slide 25 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Analysis Example • “Elasticsearch is a powerful open source search and analytics engine.” 1. elasticsearch 2. is 3. a 4. powerful 5. open 6. source 7. search 8. and 9. analytics 10. engine

Slide 26

Slide 26 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Customizing the mapping curl -XPUT 'http://localhost:9200/my_index/' -d '{ "settings": { "index": { "number_of_shards": 1, "number_of_replicas": 0 } }, "mappings": { "my_type": { "properties": { "description": { "type": "string" }, "sku": { "type": "string", "index": "not_analyzed" }, "count": { "type": "integer" }, "price": { "type": "float" }, "location": { "type": "geo_point" } } } } }' exact match analyzed text geo location

Slide 27

Slide 27 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Elasticsearch Reference • http://www.elasticsearch.org/guide/

Slide 28

Slide 28 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Ideas for hackathon • Explore data wikipedia twitter enron emails • Play with Kibana • Build Elasticsearch plugins • Get prizes

Slide 29

Slide 29 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited Elasticsearch Meetup http://www.meetup.com/Elasticsearch-Boston/

Slide 30

Slide 30 text

Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited We are hiring http://www.elasticsearch.com/about/jobs/