ElasticSearch with Tire

David Yun
February 06, 2013

Brief Introduction of ElasticSearch and Tire.

Transcript

  1. ElasticSearch with Tire

    @AbookYun, Polydice Inc.
    Wednesday, February 6, 2013
  2. It’s all about Search

    • How does search work?
    • ElasticSearch
    • Tire
  3. How does search work? A collection of articles

    • Article.find(1).to_json
        { title: "One", content: "The ruby is a pink to blood-red colored gemstone." }
    • Article.find(2).to_json
        { title: "Two", content: "Ruby is a dynamic, reflective, general-purpose object-oriented programming language." }
    • Article.find(3).to_json
        { title: "Three", content: "Ruby is a song by an English rock band." }
  4. How does search work? How do you search?

    Article.where("content like ?", "%ruby%")
  5. How does search work? The inverted index

    T0 = "it is what it is"
    T1 = "what is it"
    T2 = "it is a banana"

    "a":      {2}
    "banana": {2}
    "is":     {0, 1, 2}
    "it":     {0, 1, 2}
    "what":   {0, 1}

    A term search for the terms "what", "is" and "it":
    {0, 1} ∩ {0, 1, 2} ∩ {0, 1, 2} = {0, 1}
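The index and the intersection on this slide can be reproduced in a few lines of Ruby. This is a minimal sketch: `build_index` and `term_search` are hypothetical helper names, and splitting on whitespace is an assumption that happens to be enough for these three texts.

```ruby
require 'set'

# The three example texts from the slide, keyed by document number.
TEXTS = {
  0 => "it is what it is",
  1 => "what is it",
  2 => "it is a banana"
}

# Map every word to the set of documents that contain it.
def build_index(texts)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  texts.each do |id, text|
    text.split.each { |word| index[word] << id }
  end
  index
end

# Documents containing all terms: intersect the per-term document sets.
def term_search(index, *terms)
  terms.map { |term| index.fetch(term, Set.new) }.reduce(:&)
end

index = build_index(TEXTS)
result = term_search(index, "what", "is", "it")
puts result.to_a.sort.inspect
# prints [0, 1]
```

A term that is missing from the index contributes an empty set, so the whole intersection becomes empty.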
  6. How does search work? The inverted index

    TOKEN        ARTICLES
    ruby         article_1  article_2  article_3
    pink         article_1
    gemstone     article_1
    dynamic      article_2
    reflective   article_2
    programming  article_2
    song         article_3
    english      article_3
    rock         article_3
  7. How does search work? The inverted index

    Article.search("ruby")

    TOKEN        ARTICLES
    ruby         article_1  article_2  article_3
    pink         article_1
    gemstone     article_1
    dynamic      article_2
    reflective   article_2
    programming  article_2
    song         article_3
    english      article_3
    rock         article_3
  8. How does search work? The inverted index

    Article.search("song")

    TOKEN        ARTICLES
    ruby         article_1  article_2  article_3
    pink         article_1
    gemstone     article_1
    dynamic      article_2
    reflective   article_2
    programming  article_2
    song         article_3
    english      article_3
    rock         article_3
  9. module SimpleSearch

      # Analyze the content, store its tokens, and report what was indexed
      def index document, content
        tokens = analyze content
        store document, tokens
        puts "Indexed document #{document} with tokens:", tokens.inspect, "\n"
      end

      # Split content by words into "tokens", downcase every word,
      # then reject stop words, digits and whitespace
      def analyze content
        content.split(/\W/)
               .map    { |word| word.downcase }
               .reject { |word| STOPWORDS.include?(word) || word =~ /^\d+/ || word == '' }
      end

      # Add the document to the list kept for each of its tokens
      def store document_id, tokens
        tokens.each do |token|
          ((INDEX[token] ||= []) << document_id).uniq!
        end
      end

      # Print every document stored for the token; an unknown token has no entry
      def search token
        puts "Results for token '#{token}':"
        (INDEX[token] || []).each { |document| puts "  * #{document}" }
      end

      INDEX = {}
      STOPWORDS = %w(a an and are as at but by for if in is it no not of on or that the then there)

      extend self
    end
  10. How does search work? Indexing documents

    SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
    SimpleSearch.index "article2", "Ruby is a song."
    SimpleSearch.index "article3", "Ruby is a stone."
    SimpleSearch.index "article4", "Java is a language."
  11. How does search work? Indexing documents

    SimpleSearch.index "article1", "Ruby is a language. Java is also a language."
    SimpleSearch.index "article2", "Ruby is a song."
    SimpleSearch.index "article3", "Ruby is a stone."
    SimpleSearch.index "article4", "Java is a language."

    Indexed document article1 with tokens:
    ["ruby", "language", "java", "also", "language"]

    Indexed document article2 with tokens:
    ["ruby", "song"]

    Indexed document article3 with tokens:
    ["ruby", "stone"]

    Indexed document article4 with tokens:
    ["java", "language"]
  12. How does search work? Index

    print SimpleSearch::INDEX

    {
      "ruby"     => ["article1", "article2", "article3"],
      "language" => ["article1", "article4"],
      "java"     => ["article1", "article4"],
      "also"     => ["article1"],
      "stone"    => ["article3"],
      "song"     => ["article2"]
    }
  13. How does search work? Search the index

    SimpleSearch.search "ruby"

    Results for token 'ruby':
      * article1
      * article2
      * article3
  14. How does search work? Search is ...

    Inverted Index
        { "ruby": [1, 2, 3], "language": [1, 4] }
    +
    Relevance Scoring
        • How many matching terms does this document contain?
        • How frequently does each term appear in all your documents?
        • ... other complicated algorithms.
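To make the two relevance questions concrete, here is a toy scoring sketch. It assumes a plain TF-IDF-style weighting (how often a term occurs in the document, multiplied by the log of how rare the term is across all documents). This is only an illustration, not the formula Lucene or ElasticSearch actually uses, and `DOCS`, `idf` and `score` are hypothetical names.

```ruby
# Tokenized documents, reusing the token lists from the SimpleSearch example.
DOCS = {
  "article1" => %w(ruby language java also language),
  "article2" => %w(ruby song),
  "article3" => %w(ruby stone),
  "article4" => %w(java language)
}

# Rarer terms get a higher weight: log(total docs / docs containing the term).
def idf(term)
  containing = DOCS.values.count { |tokens| tokens.include?(term) }
  return 0.0 if containing.zero?
  Math.log(DOCS.size.to_f / containing)
end

# Sum, over the query terms, of (occurrences in this document) * (term rarity).
def score(doc_id, query_terms)
  tokens = DOCS.fetch(doc_id)
  query_terms.sum { |term| tokens.count(term) * idf(term) }
end

# Rank all documents for the query "ruby language", best match first.
ranked = DOCS.keys.sort_by { |id| -score(id, %w(ruby language)) }
puts ranked.first
# prints article1
```

Here "article1" wins because it contains both query terms and mentions "language" twice, while "ruby" alone contributes little since three of the four documents contain it.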
  15. ElasticSearch

    ElasticSearch is an open source (Apache 2), distributed, RESTful search engine built on top of Apache Lucene.
    http://github.com/elasticsearch/elasticsearch
  16. ElasticSearch Terminology

    Relational DB    ElasticSearch
    Database         Index
    Table            Type
    Row              Document
    Column           Field
    Schema           Mapping
    Index            *Everything
    SQL              Query DSL
  17. ElasticSearch RESTful

    # Add document
    curl -XPUT 'http://localhost:9200/articles/article/1' -d '{ "title": "One" }'

    # Delete document
    curl -XDELETE 'http://localhost:9200/articles/article/1'

    # Search
    curl -XGET 'http://localhost:9200/articles/_search?q=One'
  18. ElasticSearch JSON in / JSON out

    # Query
    curl -XGET 'http://localhost:9200/articles/article/_search' -d '{
      "query": {
        "term": { "title": "One" }
      }
    }'

    # Results
    {
      "_shards": { "total": 5, "successful": 5, "failed": 0 },
      "hits": {
        "total": 1,
        "hits": [{
          "_index": "articles",
          "_type": "article",
          "_id": "1",
          "_source": {
            "title": "One",
            "content": "Ruby is a pink to blood-red colored gemstone."
          }
        }]
      }
    }
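Because the response is plain JSON, any client can consume it. The following is a minimal sketch using Ruby's standard `json` library on a trimmed copy of the result from this slide. No running ElasticSearch node is assumed; the `response_body` string is pasted in by hand.

```ruby
require 'json'

# Trimmed copy of the search response shown on the slide.
response_body = <<-JSON
{
  "hits": {
    "total": 1,
    "hits": [
      { "_index": "articles", "_type": "article", "_id": "1",
        "_source": { "title": "One",
                     "content": "Ruby is a pink to blood-red colored gemstone." } }
    ]
  }
}
JSON

response = JSON.parse(response_body)

# Each hit carries the original document under "_source".
titles = response["hits"]["hits"].map { |hit| hit["_source"]["title"] }
puts titles.inspect
# prints ["One"]
```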
  19. ElasticSearch Distributed

    Automatic Discovery Protocol
    [Diagram: Node 1, Node 2, Node 3 and Node 4, with one node marked as Master]

    The discovery module is responsible for discovering nodes within a cluster, as well as electing a master node.
    The responsibility of the master node is to maintain the global cluster state, and to act if nodes join or leave the cluster by reassigning shards.
  20. ElasticSearch Distributed

    By default, every index is split into 5 shards, and each shard is duplicated in 1 replica.

    Index A
    Shards:   A1  A2  A3  A4  A5
    Replicas: A1’ A2’ A3’ A4’ A5’
  21. ElasticSearch Query DSL

    Queries: query_string, term, wildcard, boosting, bool, filtered, fuzzy, range, geo_shape, ...
    Filters: term, query, range, bool, and, or, not, limit, match_all, ...
  22. ElasticSearch Query DSL

    Queries (with relevance, without cache): query_string, term, wildcard, boosting, bool, filtered, fuzzy, range, geo_shape, ...
    Filters (with cache, without relevance): term, query, range, bool, and, or, not, limit, match_all, ...
  23. ElasticSearch Facets

    curl -X DELETE "http://localhost:9200/articles"
    curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : ["foo"]}'
    curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : ["foo", "bar"]}'
    curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'

    curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
    {
      "query" : { "query_string" : {"query" : "T*"} },
      "facets" : {
        "tags" : { "terms" : {"field" : "tags"} }
      }
    }'
  24. ElasticSearch Facets

    "facets" : {
      "tags" : {
        "_type" : "terms",
        "missing" : 0,
        "total": 5,
        "other": 0,
        "terms" : [
          { "term" : "foo", "count" : 2 },
          { "term" : "bar", "count" : 2 },
          { "term" : "baz", "count" : 1 }
        ]
      }
    }
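The counts in this facet can be checked by hand. The sketch below redoes the aggregation in plain Ruby over the three documents from the facets request. `ARTICLES` is a hypothetical in-memory copy of those documents, `start_with?("T")` stands in for the `T*` query string, and keeping tied terms in first-seen order is a choice made here, not a claim about how ElasticSearch orders them.

```ruby
# In-memory copy of the three documents posted in the facets example.
ARTICLES = [
  { "title" => "One",   "tags" => ["foo"] },
  { "title" => "Two",   "tags" => ["foo", "bar"] },
  { "title" => "Three", "tags" => ["foo", "bar", "baz"] }
]

# Stand-in for the query_string "T*": keep titles starting with "T".
matching = ARTICLES.select { |article| article["title"].start_with?("T") }

# Count how often each tag occurs among the matching documents.
counts = Hash.new(0)
matching.each { |article| article["tags"].each { |tag| counts[tag] += 1 } }

# Most frequent terms first; ties keep the order in which tags were first seen.
terms = counts.each_with_index
              .sort_by { |(_tag, count), position| [-count, position] }
              .map { |(tag, count), _position| { "term" => tag, "count" => count } }

puts terms.inspect
puts counts.values.sum
# prints 5 on the last line, matching "total" in the facet
```

Only "Two" and "Three" match the query, which is why "foo" is counted twice rather than three times.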
  25. ElasticSearch Mapping

    curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
    {
      "article": {
        "properties": {
          "tags":    { "type": "string", "analyzer": "keyword" },
          "title":   { "type": "string", "analyzer": "snowball", "boost": 10.0 },
          "content": { "type": "string", "analyzer": "snowball" }
        }
      }
    }'

    curl -XGET 'http://localhost:9200/articles/article/_mapping'
  26. ElasticSearch Analyzer

    curl -XPUT 'http://localhost:9200/articles/article/_mapping' -d '
    {
      "article": {
        "properties": {
          "title": { "type": "string", "analyzer": "trigrams" }
        }
      }
    }'

    curl -XPOST 'localhost:9200/articles/article' -d '{ "title": "cupertino" }'

    [Diagram: "cupertino" broken into character n-grams such as "up", "upe", "per", ...]
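The slides do not show how the `trigrams` analyzer is defined. Assuming it emits every run of three consecutive characters, the tokenization can be sketched in Ruby as follows (`ngrams` is a hypothetical helper, not part of ElasticSearch or Tire).

```ruby
# Split a string into overlapping character n-grams (default: trigrams).
def ngrams(text, size = 3)
  text.downcase.chars.each_cons(size).map(&:join)
end

puts ngrams("cupertino").inspect
# prints ["cup", "upe", "per", "ert", "rti", "tin", "ino"]
```

Indexing fragments like these is what allows a partial word to match a title, since the fragments of the query can be looked up in the inverted index like any other token.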
  27. Tire

    A rich Ruby API and DSL for the ElasticSearch search engine.
    http://github.com/karmi/tire/
  28. Tire ActiveRecord Integration

    # New rails application
    $ rails new searchapp -m https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb

    # Callback
    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks
    end

    # Create an article
    Article.create :title        => "I Love Elasticsearch",
                   :content      => "...",
                   :author       => "Captain Nemo",
                   :published_on => Time.now

    # Search
    Article.search do
      query { string 'love' }
      facet('timeline') { date :published_on, :interval => 'month' }
      sort { by :published_on, 'desc' }
    end
  29. Tire ActiveRecord Integration

    class Article < ActiveRecord::Base
      include Tire::Model::Search
      include Tire::Model::Callbacks

      # Setting
      settings :number_of_shards => 3,
               :number_of_replicas => 2,
               :analysis => {
                 :analyzer => {
                   :url_analyzer => {
                     'tokenizer' => 'lowercase',
                     'filter' => ['stop', 'url_ngram']
                   }
                 }
               }

      # Mapping
      mapping do
        indexes :title, :analyzer => :not_analyzer, :boost => 100
        indexes :content, :analyzer => 'snowball'
      end
    end
  30. Reference

    # github
    http://github.com/elasticsearch/elasticsearch
    http://github.com/karmi/tire/

    # Slides
    https://speakerdeck.com/kimchy/the-road-to-a-distributed-search-engine
    https://speakerdeck.com/karmi/elasticsearch-your-data-your-search-euruko-2011
    https://speakerdeck.com/clintongormley/to-infinity-and-beyond
  31. Thanks