Quick Introduction to Elasticsearch

Elasticsearch and MIT Sloan Data Analytics Hackathon
Cambridge, MA - May 10, 2014

Igor Motov

May 10, 2014

Transcript

  1. Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited
    Elasticsearch and MIT Sloan Data Analytics Hackathon
    Cambridge, MA - May 10, 2014
    Elasticsearch
    Quick Introduction

  2. About Me
    • Igor Motov
    • Developer at Elasticsearch Inc.
    • Github: imotov
    • Twitter: @imotov

  3. About Elasticsearch Inc.
    • Founded in 2012
    By the people behind Elasticsearch and Apache Lucene
    http://www.elasticsearch.com
    Headquarters: Amsterdam and Los Altos, CA
    • We provide
    Training (public & onsite)
    Development support
    Production support subscription (SLA)

  4. About Elasticsearch
    • Real-time search and analytics engine
    JSON-oriented, Apache Lucene-based
    • Automatic schema detection
    The schema can still be controlled explicitly when needed
    • Distributed
    Scales up and out, highly available
    • Multi-tenancy
    Dynamically create/delete indices
    • API centric
    Most functionality is exposed through an API

  5. Basic Concepts
    • Cluster
    a group of nodes sharing the same set of indices
    • Node
    a running Elasticsearch instance (typically a JVM process)
    • Index
    a set of documents of possibly different types
    stored in one or more shards
    • Type
    a set of documents in an index that share the same schema
    • Shard
    a Lucene index, allocated on one of the nodes

  6. Basic Concepts - Document
    • JSON object
    • Identified by index/type/id
    {
      "rank": 21,
      "city": "Boston",
      "state": "Massachusetts",
      "population2010": 617594,
      "land_area": 48.277,
      "density": 12793,
      "ansi": 619463,
      "location": {
        "lat": 42.332,
        "lon": -71.0202
      },
      "abbreviation": "MA"
    }
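A document's index/type/id address maps directly onto the REST URL used in the later slides. The following is a minimal Python sketch, assuming a node on localhost:9200; `doc_url` is a hypothetical helper, not part of any client library, and `boston` holds only a subset of the fields above.

```python
import json
from urllib.parse import quote

def doc_url(index, doc_type, doc_id, base="http://localhost:9200"):
    """Hypothetical helper: REST URL for a document identified by index/type/id."""
    # Quote each segment so ids containing '/' or spaces stay a single path segment.
    parts = (quote(str(p), safe="") for p in (index, doc_type, doc_id))
    return base + "/" + "/".join(parts)

# A subset of the example document, as a Python dict.
boston = {
    "rank": 21,
    "city": "Boston",
    "state": "Massachusetts",
    "location": {"lat": 42.332, "lon": -71.0202},
    "abbreviation": "MA",
}

print(doc_url("test-data", "cities", 21))
print(json.dumps(boston))
```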

  7. Downloading Elasticsearch
    • http://www.elasticsearch.org/download/
    Packages for Windows and for everything else

  8. What’s in a distribution?
    .
    ├── LICENSE.txt
    ├── NOTICE.txt
    ├── README.textile
    ├── bin                    # executable scripts
    │   ├── elasticsearch
    │   ├── elasticsearch.in.sh
    │   └── plugin
    ├── config                 # node config files
    │   ├── elasticsearch.yml
    │   └── logging.yml
    ├── data                   # data storage
    │   └── elasticsearch
    ├── lib                    # libs
    │   ├── elasticsearch-x.y.z.jar
    │   └── ...
    └── logs                   # log files
        ├── elasticsearch.log
        └── elasticsearch_index_search_slowlog.log

  9. Configuration (multicast)
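    • Configuration file: config/elasticsearch.yml
    cluster.name: "elasticsearch-imotov"    # unique name
    • With multicast discovery (the default), nodes on the same network that share a cluster.name join the same cluster, so the name should be unique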

  10. Configuration (stand-alone)
    • Configuration file: config/elasticsearch.yml
    cluster.name: "elasticsearch-imotov"                 # unique name
    network.host: "127.0.0.1"                            # listen only on localhost
    discovery.zen.ping.multicast.enabled: false          # disable multicast
    discovery.zen.ping.unicast.hosts: ["localhost:9300", "localhost:9301", "localhost:9302"]    # search for other nodes on localhost

  11. Starting Elasticsearch
    • Foreground
    $ bin/elasticsearch
    • Background
    $ bin/elasticsearch -d

  12. Is it running?
    $ curl -XGET "http://localhost:9200/?pretty"
    {
      "status" : 200,
      "name" : "Kamal",
      "version" : {
        "number" : "1.1.1",
        "build_hash" : "f1585f096d3f3985e73456debdc1a0745f512bbc",
        "build_timestamp" : "2014-04-16T14:27:12Z",
        "build_snapshot" : false,
        "lucene_version" : "4.7"
      },
      "tagline" : "You Know, for Search"
    }
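The same liveness check can be scripted. This is a minimal sketch using only the Python standard library; `node_info` and `version_of` are hypothetical helpers, and they assume a node on localhost:9200 whose root endpoint returns JSON shaped like the output above.

```python
import json
import urllib.request

def node_info(base_url="http://localhost:9200/", timeout=2.0):
    """Hypothetical helper: parsed root-endpoint JSON, or None if no node answers."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers a reply that is not valid JSON.
        return None

def version_of(info):
    """Hypothetical helper: the version number reported by the root endpoint."""
    return info["version"]["number"]

info = node_info()
print("running, version " + version_of(info) if info else "not reachable")
```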

  13. Communicating with Elasticsearch
    • REST API
    curl
    Ruby
    Python
    PHP
    Perl
    JavaScript (community supported)
    • Binary Protocol
    Java

  14. Pick your client
    • Java
    included in distribution
    • Ruby, PHP, Perl, Python
    http://www.elasticsearch.org/blog/unleash-the-clients-ruby-python-php-perl/
    • Everything Else
    http://www.elasticsearch.org/guide/clients/

  15. Indexing a document
    $ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{
      "rank": 21,
      "city": "Boston",
      "state": "Massachusetts",
      "population2010": 617594,
      "land_area": 48.277,
      "density": 12793,
      "ansi": 619463,
      "location": {
        "lat": 42.332,
        "lon": -71.0202
      },
      "abbreviation": "MA"
    }'
    {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":1}
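The same PUT can be prepared from Python without a client library. This sketch only builds the request (it does not send it), assuming a node on localhost:9200; `index_request` is a hypothetical helper.

```python
import json
import urllib.request

def index_request(index, doc_type, doc_id, doc, base="http://localhost:9200"):
    """Hypothetical helper: build (but do not send) a PUT that indexes doc under index/type/id."""
    url = "%s/%s/%s/%s" % (base, index, doc_type, doc_id)
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = index_request("test-data", "cities", 21, {"rank": 21, "city": "Boston"})
print(req.get_method(), req.full_url)
# Against a running node, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read().decode("utf-8")))
```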

  16. Getting a document
    $ curl -XGET "http://localhost:9200/test-data/cities/21?pretty"
    {
      "_index" : "test-data",
      "_type" : "cities",
      "_id" : "21",
      "_version" : 1,
      "exists" : true,
      "_source" : {
        "rank": 21,
        "city": "Boston",
        "state": "Massachusetts",
        "population2010": 617594,
        "land_area": 48.277,
        "density": 12793,
        "ansi": 619463,
        "location": {
          "lat": 42.332,
          "lon": -71.0202
        },
        "abbreviation": "MA"
      }
    }
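The original JSON comes back unchanged under `_source`. This minimal Python sketch pulls it out of a GET response; `source_of` is a hypothetical helper. The slide's response reports `exists`, while 1.x releases are understood to report `found` instead, so the sketch accepts either field (treat that as an assumption to check against your version).

```python
import json

def source_of(get_response_text):
    """Hypothetical helper: the stored document, or None if it was not found."""
    resp = json.loads(get_response_text)
    # Accept both the "found" field and the older "exists" field shown on the slide.
    found = resp.get("found", resp.get("exists", False))
    return resp["_source"] if found else None
```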

  17. Updating a document
    • Indexing to an existing index/type/id replaces the whole document and increments _version
    $ curl -XPUT "http://localhost:9200/test-data/cities/21" -d '{
      "rank": 21,
      "city": "Boston",
      "state": "Massachusetts",
      "population2010": 617594,
      "population2012": 636479,
      "land_area": 48.277,
      "density": 12793,
      "ansi": 619463,
      "location": {
        "lat": 42.332,
        "lon": -71.0202
      },
      "abbreviation": "MA"
    }'
    {"ok":true,"_index":"test-data","_type":"cities","_id":"21","_version":2}
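Because the PUT replaces the stored document rather than merging into it, the update has to resend every field. The toy in-memory model below (hypothetical `ToyIndex`, not Elasticsearch code) illustrates only that replace-and-bump-version behavior.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for index-by-id semantics; not Elasticsearch."""

    def __init__(self):
        self._docs = {}  # (type, id) -> (version, source)

    def put(self, doc_type, doc_id, source):
        key = (doc_type, str(doc_id))
        version = self._docs[key][0] + 1 if key in self._docs else 1
        # The new source replaces the old one entirely; nothing is merged.
        self._docs[key] = (version, dict(source))
        return {"_type": doc_type, "_id": str(doc_id), "_version": version}

    def get(self, doc_type, doc_id):
        version, source = self._docs[(doc_type, str(doc_id))]
        return {"_version": version, "_source": source}

idx = ToyIndex()
idx.put("cities", 21, {"city": "Boston", "population2010": 617594})
r = idx.put("cities", 21, {"city": "Boston", "population2012": 636479})
print(r["_version"], idx.get("cities", 21)["_source"])  # population2010 is gone
```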

  18. Searching
    $ curl -XGET 'http://localhost:9200/test-data/cities/_search?pretty' -d '{
      "query": {
        "match": {
          "city": "Boston"
        }
      }
    }'
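From code, the query body is plain JSON. Many HTTP libraries will not send a body with GET, and the _search endpoint also accepts POST, so this sketch builds a POST. `match_query` and `search_request` are hypothetical helpers, and the request is only constructed, not sent.

```python
import json
import urllib.request

def match_query(field, text):
    """Hypothetical helper: body for a full-text match query on one field."""
    return {"query": {"match": {field: text}}}

def search_request(index, doc_type, body, base="http://localhost:9200"):
    """Hypothetical helper: build (not send) a _search request using POST."""
    url = "%s/%s/%s/_search?pretty" % (base, index, doc_type)
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="POST",
        headers={"Content-Type": "application/json"},
    )

req = search_request("test-data", "cities", match_query("city", "Boston"))
print(req.get_method(), req.full_url)
```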

  19. Searching
    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 6.1357985,
        "hits" : [ {
          "_index" : "test-data",
          "_type" : "cities",
          "_id" : "21",
          "_score" : 6.1357985,
          "_source" : {"rank":21,"city":"Boston",...}
        } ]
      }
    }
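The matching documents are under `hits.hits`, while `hits.total` counts every match, not only the ones returned on this page. A minimal Python sketch for reading a response shaped like the one above; `hit_summary` and `total_hits` are hypothetical helpers.

```python
import json

def hit_summary(search_response_text):
    """Hypothetical helper: (id, score, source) for each returned hit."""
    resp = json.loads(search_response_text)
    return [(h["_id"], h["_score"], h["_source"]) for h in resp["hits"]["hits"]]

def total_hits(search_response_text):
    """Hypothetical helper: number of matching documents across all pages."""
    return json.loads(search_response_text)["hits"]["total"]
```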

  20. Range Queries
    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "range": {
          "population2012": {
            "from": 500000,
            "to": 1000000
          }
        }
      }
    }'
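The bounds given with `from` and `to` are inclusive by default (controlled by `include_lower` and `include_upper`), and a document without the field cannot match. This toy Python check models only that bounds logic; `in_range` is hypothetical, and the second document is made up to show a missing field.

```python
def in_range(doc, field, lo, hi, include_lower=True, include_upper=True):
    """Hypothetical toy model of a range query's bounds check on one numeric field."""
    value = doc.get(field)
    if value is None:
        return False  # documents without the field never match
    above = value >= lo if include_lower else value > lo
    below = value <= hi if include_upper else value < hi
    return above and below

docs = [
    {"city": "Boston", "population2010": 617594, "population2012": 636479},
    {"city": "Hypothetical City", "population2010": 700000},  # no population2012
]
matches = [d["city"] for d in docs if in_range(d, "population2012", 500000, 1000000)]
print(matches)
```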

  21. Boolean Queries
    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "bool": {
          "should": [{
            "match": { "state": "Texas"}
          }, {
            "match": { "state": "California"}
          }],
          "must": {
            "range": {
              "population2012": {
                "from": 500000,
                "to": 1000000
              }
            }
          },
          "minimum_should_match": 1
        }
      }
    }'
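The bool query keeps a document only if every `must` clause matches and at least `minimum_should_match` of the `should` clauses match. Alongside a `must` clause the `should` clauses would otherwise be optional, which is why the slide sets the minimum to 1. The sketch below models just that filtering logic with plain predicates standing in for the match and range clauses; all names and documents are hypothetical, and scoring is ignored.

```python
def bool_match(doc, must=(), should=(), minimum_should_match=1):
    """Hypothetical toy model: all `must` predicates hold and at least
    `minimum_should_match` of the `should` predicates hold. No scoring."""
    if not all(clause(doc) for clause in must):
        return False
    if not should:
        return True
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

def state_is(name):
    # Stands in for {"match": {"state": name}}, ignoring analysis.
    return lambda d: d.get("state") == name

def pop_between(lo, hi):
    # Stands in for the inclusive range clause on population2012.
    return lambda d: d.get("population2012") is not None and lo <= d["population2012"] <= hi

must = [pop_between(500000, 1000000)]
should = [state_is("Texas"), state_is("California")]
```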

  22. MatchAll Query
    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "match_all": { }
      }
    }'

  23. Sorting and Paging
    $ curl -XGET "http://localhost:9200/test-data/cities/_search?pretty" -d '{
      "query": {
        "match_all": { }
      },
      "sort": [
        {"state": {"order": "asc"}},
        {"population2010": {"order": "desc"}}
      ],
      "from": 0,
      "size": 20
    }'
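Sort keys apply in order, and paging is a slice: `from` skips that many hits and `size` caps how many are returned, so page n of size s starts at (n - 1) * s. This toy Python model assumes `state` compares as a whole string; on an analyzed string field Elasticsearch sorts on individual terms instead. `sort_and_page` and `page_offset` are hypothetical names, and `from_` is used because `from` is a Python keyword.

```python
def sort_and_page(docs, from_=0, size=10):
    """Hypothetical toy model: state ascending, then population2010 descending,
    followed by from/size paging."""
    ordered = sorted(docs, key=lambda d: (d["state"], -d["population2010"]))
    return ordered[from_:from_ + size]

def page_offset(page, size):
    """Hypothetical helper: the `from` value for a 1-based page number."""
    return (page - 1) * size
```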

  24. Analysis
    • By default, strings are:
    - Divided into words (tokens)
    - Converted to lowercase, token by token

  25. Analysis Example
    • “Elasticsearch is a powerful open source search
    and analytics engine.”
    1. elasticsearch
    2. is
    3. a
    4. powerful
    5. open
    6. source
    7. search
    8. and
    9. analytics
    10. engine
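The tokens above can be reproduced roughly with a regular expression and lowercasing. This is only an approximation of the default analysis: Lucene's standard tokenizer follows Unicode word-break rules and handles many cases this regex does not. `simple_analyze` is a hypothetical name.

```python
import re

def simple_analyze(text):
    """Hypothetical rough approximation of default analysis:
    split into word tokens, then lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]

tokens = simple_analyze("Elasticsearch is a powerful open source search and analytics engine.")
print(tokens)
```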

  26. Customizing the mapping
    $ curl -XPUT 'http://localhost:9200/my_index/' -d '{
      "settings": {
        "index": {
          "number_of_shards": 1,
          "number_of_replicas": 0
        }
      },
      "mappings": {
        "my_type": {
          "properties": {
            "description": { "type": "string" },
            "sku": { "type": "string", "index": "not_analyzed" },
            "count": { "type": "integer" },
            "price": { "type": "float" },
            "location": { "type": "geo_point" }
          }
        }
      }
    }'
    description: analyzed text
    sku: exact match
    location: geo location
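The `not_analyzed` setting is what makes `sku` an exact-match field: the whole value is indexed as one unchanged term, whereas the default analysis splits and lowercases it. The toy Python sketch below contrasts the two; the tokenizer is the same rough regex approximation as before, the SKU value is hypothetical, and `term_matches` models only an exact term lookup.

```python
import re

def analyzed(text):
    """Rough approximation of default analysis: word tokens, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def not_analyzed(text):
    """The whole value is indexed as a single, unchanged term."""
    return [text]

def term_matches(indexed_terms, term):
    """A term-level lookup succeeds only on an exact indexed term."""
    return term in indexed_terms

sku = "ABC-123"  # hypothetical SKU value
print(analyzed(sku), not_analyzed(sku))
```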

  27. Elasticsearch Reference
    • http://www.elasticsearch.org/guide/

  28. Ideas for hackathon
    • Explore data
    Wikipedia
    Twitter
    Enron emails
    • Play with Kibana
    • Build Elasticsearch plugins
    • Get prizes

  29. Elasticsearch Meetup
    http://www.meetup.com/Elasticsearch-Boston/

  30. We are hiring
    http://www.elasticsearch.com/about/jobs/
