Slide 1

Slide 1 text

ElasticSearch with Tire
@AbookYun, Polydice Inc.
Wednesday, February 6, 13

Slide 2

Slide 2 text

It’s all about Search
• How does search work?
• ElasticSearch
• Tire

Slide 3

Slide 3 text

How does search work?
A collection of articles
• Article.find(1).to_json
  { title: "One", content: "The ruby is a pink to blood-red colored gemstone." }
• Article.find(2).to_json
  { title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }
• Article.find(3).to_json
  { title: "Three", content: "Ruby is a song by an English rock band." }

Slide 4

Slide 4 text

How does search work?
How do you search?
Article.where("content like ?", "%ruby%")

Slide 5

Slide 5 text

How does search work?
The inverted index
T0 = "it is what it is"
T1 = "what is it"
T2 = "it is a banana"
"a":      {2}
"banana": {2}
"is":     {0, 1, 2}
"it":     {0, 1, 2}
"what":   {0, 1}
A term search for the terms "what", "is" and "it":
{0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}
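
The term search above can be sketched in a few lines of Ruby. This is a minimal illustration, not how Lucene stores its index: the helper names `build_index` and `search_all` are made up for this example, and tokenizing is a plain whitespace split.

```ruby
require 'set'

# Documents from the slide, keyed by their ids (T0, T1, T2).
DOCS = {
  0 => "it is what it is",
  1 => "what is it",
  2 => "it is a banana"
}

# Build the inverted index: term => set of ids of documents containing it.
def build_index(docs)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  docs.each do |id, text|
    text.split.each { |term| index[term] << id }
  end
  index
end

# AND-search: intersect the posting sets of every query term.
def search_all(index, terms)
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&) || Set.new
end

index = build_index(DOCS)
result = search_all(index, %w(what is it))
puts result.to_a.sort.inspect
# → [0, 1]
```

Using sets makes the intersection a single `&` per term, which mirrors the ∩ notation on the slide.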

Slide 6

Slide 6 text

How does search work?
The inverted index
TOKEN        ARTICLES
ruby         article_1  article_2  article_3
pink         article_1
gemstone     article_1
dynamic      article_2
reflective   article_2
programming  article_2
song         article_3
english      article_3
rock         article_3

Slide 7

Slide 7 text

How does search work?
The inverted index
Article.search("ruby")
TOKEN        ARTICLES
ruby         article_1  article_2  article_3
pink         article_1
gemstone     article_1
dynamic      article_2
reflective   article_2
programming  article_2
song         article_3
english      article_3
rock         article_3

Slide 8

Slide 8 text

How does search work?
The inverted index
Article.search("song")
TOKEN        ARTICLES
ruby         article_1  article_2  article_3
pink         article_1
gemstone     article_1
dynamic      article_2
reflective   article_2
programming  article_2
song         article_3
english      article_3
rock         article_3

Slide 9

Slide 9 text

module SimpleSearch
  # Analyze the content into tokens and store them for the document
  def index document, content
    tokens = analyze content
    store document, tokens
    puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
  end

  def analyze content
    # Split content by words into "tokens"
    content.split(/\W/).
      # Downcase every word
      map { |word| word.downcase }.
      # Reject stop words, digits and whitespace
      reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
  end

  # Add the document id to the list kept for each token
  def store document_id, tokens
    tokens.each do |token|
      ((INDEX[token] ||= []) << document_id).uniq!
    end
  end

  def search token
    puts "Results for token '#{token}':"
    # Print each match; a token that was never indexed has no entry
    (INDEX[token] || []).each { |document| puts " * #{document}" }
  end

  INDEX = {}
  STOPWORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

  extend self
end

Slide 10

Slide 10 text

How does search work?
Indexing documents
SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
SimpleSearch.index "article2", "Ruby is a song."
SimpleSearch.index "article3", "Ruby is a stone."
SimpleSearch.index "article4", "Java is a language."

Slide 11

Slide 11 text

How does search work?
Indexing documents
SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
SimpleSearch.index "article2", "Ruby is a song."
SimpleSearch.index "article3", "Ruby is a stone."
SimpleSearch.index "article4", "Java is a language."
Indexed document article1 with tokens: ["ruby", "language", "java", "also", "language"]
Indexed document article2 with tokens: ["ruby", "song"]
Indexed document article3 with tokens: ["ruby", "stone"]
Indexed document article4 with tokens: ["java", "language"]

Slide 12

Slide 12 text

How does search work?
Index
print SimpleSearch::INDEX
{
  "ruby"     => ["article1", "article2", "article3"],
  "language" => ["article1", "article4"],
  "java"     => ["article1", "article4"],
  "also"     => ["article1"],
  "stone"    => ["article3"],
  "song"     => ["article2"]
}

Slide 13

Slide 13 text

How does search work?
Search the index
SimpleSearch.search "ruby"
Results for token 'ruby':
 * article1
 * article2
 * article3

Slide 14

Slide 14 text

How does search work?
Search is ...
Inverted Index
{ "ruby": [1,2,3], "language": [1,4] }
+
Relevance Scoring
• How many matching terms does this document contain?
• How frequently does each term appear in all your documents?
• ... other complicated algorithms.
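
The first two scoring questions roughly correspond to term frequency and inverse document frequency. The sketch below is a simplified TF-IDF ranking over the tokens from the indexing slides. It is only an illustration of the idea: the method names `tf`, `idf` and `rank` are made up here, and Lucene's actual scoring takes more factors into account.

```ruby
# Tokenized documents, as produced on the indexing slides.
DOCUMENTS = {
  "article1" => %w(ruby language java also language),
  "article2" => %w(ruby song),
  "article3" => %w(ruby stone),
  "article4" => %w(java language)
}

# Term frequency: how often the term occurs in this document.
def tf(term, tokens)
  tokens.count(term)
end

# Inverse document frequency: terms found in fewer documents weigh more.
def idf(term, documents)
  containing = documents.values.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?
  Math.log(documents.size.to_f / containing)
end

# Score every document for a multi-term query, best match first.
def rank(query_terms, documents)
  scores = documents.map do |id, tokens|
    [id, query_terms.sum { |term| tf(term, tokens) * idf(term, documents) }]
  end
  scores.select { |_, score| score > 0 }.sort_by { |id, score| [-score, id] }
end

rank(%w(ruby language), DOCUMENTS).each do |id, score|
  puts format("%-8s %.3f", id, score)
end
```

Here "ruby" appears in three of four documents, so it contributes less to the score than "language", which appears in only two.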

Slide 15

Slide 15 text

ElasticSearch
ElasticSearch is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.
http://github.com/elasticsearch/elasticsearch

Slide 16

Slide 16 text

ElasticSearch
Terminology
Relational DB   ElasticSearch
Database        Index
Table           Type
Row             Document
Column          Field
Schema          Mapping
Index           *Everything
SQL             Query DSL

Slide 17

Slide 17 text

ElasticSearch
RESTful
# Add document
curl -XPUT 'http://localhost:9200/articles/article/1' -d '{ "title": "One" }'
# Delete document
curl -XDELETE 'http://localhost:9200/articles/article/1'
# Search
curl -XGET 'http://localhost:9200/articles/_search?q=One'
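
The same three calls can be expressed with Ruby's standard `Net::HTTP`. This is a sketch under assumptions: the helper names are invented for this example, the base URL is the local node from the curl commands, and the Content-Type header is added as a precaution rather than taken from the slides. The requests are only built here; `perform` would send them and needs a running node.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumed base URL: the local node used in the curl commands.
BASE_URL = "http://localhost:9200"

# Build (but do not send) a request that adds or replaces a document.
def index_request(index, type, id, document)
  request = Net::HTTP::Put.new(URI("#{BASE_URL}/#{index}/#{type}/#{id}"))
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(document)
  request
end

# Build a request that deletes a document by id.
def delete_request(index, type, id)
  Net::HTTP::Delete.new(URI("#{BASE_URL}/#{index}/#{type}/#{id}"))
end

# Build a URI search request (the ?q= form from the slide).
def search_request(index, query)
  uri = URI("#{BASE_URL}/#{index}/_search")
  uri.query = URI.encode_www_form(q: query)
  Net::HTTP::Get.new(uri)
end

# Send a request to the node; raises if no server is listening.
def perform(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

request = index_request("articles", "article", 1, { "title" => "One" })
puts "#{request.method} #{request.path} #{request.body}"
```

Each resource is addressed as /index/type/id, so the HTTP verb alone decides whether the document is stored, removed or searched.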

Slide 18

Slide 18 text

ElasticSearch
JSON in / JSON out
# Query
curl -XGET 'http://localhost:9200/articles/article/_search' -d '{
  "query": { "term": { "title": "One" } }
}'
# Results
{
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "hits": [{
      "_index": "articles",
      "_type": "article",
      "_id": "1",
      "_source": {
        "title": "One",
        "content": "Ruby is a pink to blood-red colored gemstone."
      }
    }]
  }
}

Slide 19

Slide 19 text

ElasticSearch
Distributed
Automatic Discovery Protocol
[Diagram: a cluster of four nodes (Node 1, Node 2, Node 3, Node 4), one of which is the master]
The discovery module is responsible for discovering nodes within a cluster, as well as electing a master node.
The responsibility of the master node is to maintain the global cluster state, and to act when nodes join or leave the cluster by reassigning shards.

Slide 20

Slide 20 text

ElasticSearch
Distributed
Index A
By default, every index is split into 5 shards, and each shard is duplicated in 1 replica.
Shards:   A1  A2  A3  A4  A5
Replicas: A1' A2' A3' A4' A5'
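
One way to picture how documents spread over the five shards is to hash the document id and take it modulo the number of shards. The sketch below illustrates that idea only: `shard_for` is a hypothetical helper, `Zlib.crc32` is a stand-in hash chosen for this example, and the document ids are made up.

```ruby
require 'zlib'

NUMBER_OF_SHARDS = 5 # the default mentioned on the slide

# Pick a shard for a document id. Zlib.crc32 is only a stand-in hash here.
def shard_for(document_id, number_of_shards = NUMBER_OF_SHARDS)
  Zlib.crc32(document_id.to_s) % number_of_shards
end

# Group some hypothetical document ids by the shard they land on.
placement = (1..20).group_by { |id| shard_for(id) }
placement.sort.each do |shard, ids|
  puts "A#{shard + 1}: #{ids.inspect}"
end
```

Because the shard is derived from the id alone, any node can work out where a document lives without consulting a lookup table.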

Slide 21

Slide 21 text

ElasticSearch
Query DSL
Queries
- query_string
- term
- wildcard
- boosting
- bool
- filtered
- fuzzy
- range
- geo_shape
- ...
Filters
- term
- query
- range
- bool
- and
- or
- not
- limit
- match_all
- ...

Slide 22

Slide 22 text

ElasticSearch
Query DSL
Queries (with relevance, without cache)
- query_string
- term
- wildcard
- boosting
- bool
- filtered
- fuzzy
- range
- geo_shape
- ...
Filters (with cache, without relevance)
- term
- query
- range
- bool
- and
- or
- not
- limit
- match_all
- ...
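
Queries and filters can be combined: the `filtered` query from the list wraps a scored query together with an unscored, cacheable filter. The sketch below builds such a request body as a Ruby hash. The helper name `filtered_search` and the sample values are made up, and the body uses the DSL names listed on this slide, which later Elasticsearch releases no longer accept in this form.

```ruby
require 'json'

# Combine a scored query with an unscored filter using the "filtered" query.
def filtered_search(text, tag)
  {
    "query" => {
      "filtered" => {
        # Scored part: contributes to relevance.
        "query"  => { "query_string" => { "query" => text } },
        # Unscored part: only narrows the result set.
        "filter" => { "term" => { "tags" => tag } }
      }
    }
  }
end

puts JSON.pretty_generate(filtered_search("ruby", "foo"))
```

A condition that only includes or excludes documents (a tag, a date range) usually belongs in the filter, leaving relevance scoring to the query part.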

Slide 23

Slide 23 text

ElasticSearch
Facets
curl -X DELETE "http://localhost:9200/articles"
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "query" : { "query_string" : {"query" : "T*"} },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Slide 24

Slide 24 text

"facets" : { "tags" : { "_type" : "terms", "missing" : 0, "total": 5, "other": 0, "terms" : [ { "term" : "foo", "count" : 2 }, { "term" : "bar", "count" : 2 }, { "term" : "baz", "count" : 1 } ] } ElasticSearch Facets 24 Wednesday, February 6, 13

Slide 25

Slide 25 text

ElasticSearch
Mapping
curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
{
  "article": {
    "properties": {
      "tags": { "type": "string", "analyzer": "keyword" },
      "title": { "type": "string", "analyzer": "snowball", "boost": 10.0 },
      "content": { "type": "string", "analyzer": "snowball" }
    }
  }
}'
curl -XGET 'http://localhost:9200/articles/article/_mapping'

Slide 26

Slide 26 text

ElasticSearch
Analyzer
curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
{
  "article": {
    "properties": {
      "title": { "type": "string", "analyzer": "trigrams" }
    }
  }
}'
curl -XPOST 'localhost:9200/articles/article' -d '{ "title": "cupertino" }'
[Diagram: "Cupertino" split into trigrams: "Cup", "upe", "per", ...]
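
What a trigram analyzer emits for a term can be sketched in plain Ruby. This only illustrates the splitting step; `ngrams` is a made-up helper, lowercasing is an assumption, and the real analyzer would be configured in the index settings.

```ruby
# Split a term into overlapping n-character grams (trigrams by default).
def ngrams(term, size = 3)
  term.downcase.chars.each_cons(size).map(&:join)
end

puts ngrams("cupertino").inspect
# → ["cup", "upe", "per", "ert", "rti", "tin", "ino"]
```

Because every three-letter window is indexed, a fragment such as "pert" breaks into "per" and "ert", both of which are present for "cupertino".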

Slide 27

Slide 27 text

Tire
A rich Ruby API and DSL for the ElasticSearch search engine.
http://github.com/karmi/tire/

Slide 28

Slide 28 text

Tire
ActiveRecord Integration
# New rails application
$ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb
# Callback
class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
end
# Create an article
Article.create :title => "I Love Elasticsearch",
               :content => "...",
               :author => "Captain Nemo",
               :published_on => Time.now
# Search
Article.search do
  query { string 'love' }
  facet('timeline') { date :published_on, :interval => 'month' }
  sort { by :published_on, 'desc' }
end
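
To connect the DSL back to the JSON seen earlier, the search block above should translate to a request body roughly like the one below. This is a hand-written approximation, not output captured from Tire: the method name is made up, and the exact keys Tire generates are assumed.

```ruby
require 'json'

# Hand-written approximation of the search request body (an assumption,
# not output captured from Tire).
def approximate_search_body(text)
  {
    "query"  => { "query_string" => { "query" => text } },
    "facets" => {
      "timeline" => {
        "date_histogram" => { "field" => "published_on", "interval" => "month" }
      }
    },
    "sort"   => [{ "published_on" => "desc" }]
  }
end

puts JSON.pretty_generate(approximate_search_body("love"))
```

Each block in the DSL (query, facet, sort) maps onto one top-level key of the request.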

Slide 29

Slide 29 text

Tire
ActiveRecord Integration
class Article < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
  # Setting
  settings :number_of_shards => 3,
           :number_of_replicas => 2,
           :analysis => {
             :analyzer => {
               :url_analyzer => {
                 'tokenizer' => 'lowercase',
                 'filter' => ['stop', 'url_ngram']
               }
             }
           }
  # Mapping
  mapping do
    indexes :title, :analyzer => :not_analyzer, :boost => 100
    indexes :content, :analyzer => 'snowball'
  end
end

Slide 30

Slide 30 text

Reference
# github
http://github.com/elasticsearch/elasticsearch
http://github.com/karmi/tire/
# Slides
https://speakerdeck.com/kimchy/the-road-to-a-distributed-search-engine
https://speakerdeck.com/karmi/elasticsearch-your-data-your-search-euruko-2011
https://speakerdeck.com/clintongormley/to-infinity-and-beyond

Slide 31

Slide 31 text

Thanks