Getting Started With ElasticSearch

A simple overview of elasticsearch, the tire gem, and usage patterns I have discovered.

Ken Collins

March 15, 2013

Transcript

  1. Today’s Topics

    • About ElasticSearch
    • Ruby Libraries
    • How You Might Use ElasticSearch
    • Other Uses
  9. About ElasticSearch

    • Painless Setup & Use
    • Schema Free & Document Oriented
      – Index data using JSON over HTTP.
      – Schema mappings.
    • Built For The Cloud. Multi-Tenant.
      – Not coupled to a database.
      – Distributed Nodes & Shards
      – Gateway - Time Machine For Search.
  10. About ElasticSearch

    • Great Features
      – Facets
      – Highlighting
      – Geo Location
      – Custom Scripts
    • Open Source
      – Apache 2 License
      – Hosted On GitHub
  11. ElasticSearch Guides (Index)

    Index a document with an explicit ID. Leave the “1” off (and POST instead of PUT) for an auto-generated ID.

        $ curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{
            "user": "metaskills",
            "post_date": "2013-03-07T13:12:00",
            "message": "Trying out elasticsearch"
          }'

    Create an index with explicit settings and mappings.

        $ curl -XPOST http://localhost:9200/twitter/ -d '{
            "settings": {"number_of_shards": 10},
            "mappings": {
              "twitter_card": {
                "_source": {"enabled": false},
                "properties": {
                  "title": {"type": "string", "index": "not_analyzed"}
                }
              }
            }
          }'
  12. ElasticSearch Guides (Mappings)

    “Mapping is the process of defining how a document should be mapped to the Search Engine, including its searchable characteristics such as which fields are searchable and if or how they are tokenized.”

        $ curl -XPUT http://localhost:9200/twitter/tweet/_mapping -d '{
            "tweet": {
              "properties": {
                "message": {"type": "string", "store": "yes"}
              }
            }
          }'
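The two guide slides above (index a document, then define a mapping) can be sketched in plain Ruby with only the standard library. This is a minimal sketch, not part of the deck: the helper names `index_request` and `mapping_request` are hypothetical, the requests are built but not sent, and a node on localhost:9200 is assumed as in the curl examples.

```ruby
require 'json'
require 'net/http'

# Build (but do not send) a request that indexes one document.
# With an id we PUT to /index/type/id; without one we POST to
# /index/type so that ElasticSearch generates the id.
def index_request(index, type, doc, id = nil)
  path = ['', index, type, id].compact.join('/')
  req  = id ? Net::HTTP::Put.new(path) : Net::HTTP::Post.new(path)
  req.body = JSON.generate(doc)
  req
end

# Build a request that puts the mapping for one type.
def mapping_request(index, type, properties)
  req = Net::HTTP::Put.new("/#{index}/#{type}/_mapping")
  req.body = JSON.generate({ type => { 'properties' => properties } })
  req
end

tweet   = { 'user' => 'metaskills', 'message' => 'Trying out elasticsearch' }
with_id = index_request('twitter', 'tweet', tweet, 1)
auto_id = index_request('twitter', 'tweet', tweet)
mapping = mapping_request('twitter', 'tweet',
                          { 'message' => { 'type' => 'string', 'store' => 'yes' } })

puts "#{with_id.method} #{with_id.path}"
puts "#{auto_id.method} #{auto_id.path}"
puts "#{mapping.method} #{mapping.path}"

# To actually send one (assumes a running node):
#   Net::HTTP.start('localhost', 9200) { |http| http.request(with_id) }
```

Keeping request construction separate from sending makes the URL and verb choices easy to inspect without a running cluster.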
  13. ElasticSearch Guides (Analysis)

    “The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com”

    • keyword: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
    • whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com
    • simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com
    • standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com
    • snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
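To make the differences between the first three analyzers concrete, here is a rough approximation in plain Ruby. The real analyzers run inside ElasticSearch on top of Lucene, so these three methods (hypothetical names) are stand-ins that only mimic the behaviour shown on the slide; the standard and snowball analyzers involve stop words and stemming and are not attempted here.

```ruby
TEXT = "The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com"

# keyword: the whole input becomes a single token.
def keyword_analyze(text)
  [text]
end

# whitespace: split on whitespace only; case and punctuation survive.
def whitespace_analyze(text)
  text.split(/\s+/)
end

# simple: keep runs of letters and lowercase them; digits and
# punctuation act as separators and are dropped.
def simple_analyze(text)
  text.scan(/[[:alpha:]]+/).map(&:downcase)
end

puts keyword_analyze(TEXT).inspect
puts whitespace_analyze(TEXT).join(', ')
puts simple_analyze(TEXT).join(', ')
# The last line prints: the, brown, cow, s, part, no, a, bc, joe, bloggs, com
```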
  14. ElasticSearch Guides (Analysis)

    • Analyzers: Standard, Simple, Whitespace, Stop, Keyword, Pattern, Language, Snowball, Custom
    • Tokenizers: Edge NGram, Keyword, Letter, Lowercase, NGram, Standard, Whitespace, Pattern, UAX URL Email, Path Hierarchy
    • Token Filters: Standard, ASCII Folding, Length, Lowercase, NGram, Edge NGram, Porter Stem, Shingle, Stop, Word Delimiter, Stemmer, Stemmer Override, Keyword Marker, KStem, Snowball, Phonetic, Synonym, Compound Word, Reverse, Elision, Truncate, Unique, Pattern Replace, Trim
    • Char Filters: Mapping, HTML Strip
    • Plugins: ICU
  17. ElasticSearch Guides (Query DSL)

    • Queries: match, multi_match, bool, boosting, ids, custom_score, custom_boost_factor, constant_score, dis_max, field, filtered, flt, flt_field, fuzzy, has_child, has_parent, match_all, mlt, mlt_field, prefix, query_string, range, span_first, span_near, span_not, span_or, span_term, term, terms, top_children, wildcard, nested, custom_filters_score, indices, text, geo_shape
    • Filters: and, bool, exists, ids, limit, type, geo_bbox, geo_distance, geo_distance_range, geo_polygon, geo_shape, has_child, has_parent, match_all, missing, not, numeric_range, or, prefix, query, range, script, term, terms, nested
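As an illustration of how the queries and filters listed above compose, the sketch below builds a `filtered` query that wraps a `match` query with an `and` filter made of a `term` and a `range` filter. The field names (`title`, `tags`, `published_on`) and values are assumptions for illustration, and the shape follows the Query DSL of the ElasticSearch versions current when this talk was given; later releases changed it.

```ruby
require 'json'

# Hypothetical fields: title (analyzed text), tags, published_on (date).
search_body = {
  'query' => {
    'filtered' => {
      # The query part contributes to scoring.
      'query'  => { 'match' => { 'title' => 'elasticsearch' } },
      # The filter part only includes or excludes documents.
      'filter' => {
        'and' => [
          { 'term'  => { 'tags' => 'ruby' } },
          { 'range' => { 'published_on' => { 'gte' => '2013-01-01' } } }
        ]
      }
    }
  },
  'size' => 10
}

puts JSON.pretty_generate(search_body)
```

The resulting JSON would be sent as the body of a `_search` request, in the same way as the curl examples in the guide slides.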
  19. Ruby Libraries (Tire)

        require 'tire'

        Tire.index 'articles' do
          # Start from a clean index, then create it with explicit mappings.
          delete
          create :mappings => {
            :article => {
              :properties => {
                :id      => { :type => 'string', :index => 'not_analyzed' },
                :title   => { :type => 'string', :boost => 2.0 },
                :tags    => { :type => 'string', :analyzer => 'keyword' },
                :content => { :type => 'string', :analyzer => 'snowball' }
              }
            }
          }

          store :title => 'One',   :tags => ['ruby']
          store :title => 'Two',   :tags => ['ruby', 'python']
          store :title => 'Three', :tags => ['java']

          # Make the stored documents searchable right away.
          refresh
        end
  20. Ruby Libraries (Tire)

    • DSL To ElasticSearch
    • Declarative Block or Imperative Style
    • ActiveModel Integration
  21. Ruby Libraries (Tire)

        class Article < ActiveRecord::Base
          include Tire::Model::Search
          include Tire::Model::Callbacks

          mapping do
            indexes :id,           :index    => :not_analyzed
            indexes :title,        :analyzer => 'snowball', :boost => 100
            indexes :content,      :analyzer => 'snowball'
            indexes :content_size, :as       => 'content.size'
            indexes :author,       :analyzer => 'keyword'
            indexes :published_on, :type     => 'date', :include_in_all => false
          end

          # Only published articles are pushed to the index.
          after_save { update_index if state == 'published' }

          # Controls which attributes end up in the indexed document.
          def to_indexed_json
            attributes.slice(...).to_json
          end
        end
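The `to_indexed_json` idea from the model above can be tried without Rails or Tire. The class below is a hypothetical stand-in, not the author's model: the slide elides the arguments to `slice`, so the `INDEXED_FIELDS` whitelist here is an assumption chosen for illustration, as is the `internal_notes` attribute.

```ruby
require 'json'

# Hypothetical plain-Ruby stand-in for the ActiveRecord model.
class ArticleSketch
  # Assumed whitelist; the slide does not show the real field list.
  INDEXED_FIELDS = %w[id title content].freeze

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Mirrors the after_save guard: only published articles are indexed.
  def indexable?
    attributes['state'] == 'published'
  end

  # Whitelist attributes and add a derived content_size field,
  # similar in spirit to `indexes :content_size, :as => 'content.size'`.
  def to_indexed_json
    attributes.slice(*INDEXED_FIELDS)
              .merge('content_size' => attributes['content'].to_s.size)
              .to_json
  end
end

article = ArticleSketch.new(
  'id' => 1, 'title' => 'One', 'content' => 'Hello',
  'state' => 'published', 'internal_notes' => 'not for search'
)
puts article.to_indexed_json if article.indexable?
```

Whitelisting keeps private or bulky columns out of the search index, which matters once several applications read the same documents.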
  23. Ruby Libraries (Tire)

    • DSL To ElasticSearch
    • Declarative Block or Imperative Style
    • ActiveModel Integration
    • Contributed Components Gem
  25. Ruby Libraries (Tire)

    Declarative block style:

        Tire.search 'articles' do
          query  { string 'title:T*' }
          filter :terms, :tags => ['ruby']
          sort   { by :title, 'desc' }
          facet('global-tags', :global => true) { terms :tags }
          facet('current-tags') { terms :tags }
        end

    A raw JSON search body, for comparison:

        {
          "fields": ["name", "shortDescription", "longDescription"],
          "query": {
            "query_string": {
              "fields": ["name"],
              "query": "+camera +laptop",
              "use_dis_max": true
            }
          }
        }
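For readers who want to see what the `Tire.search 'articles'` block above roughly translates to, the sketch below writes the request body out by hand as a Ruby hash. It is an approximation built from the ElasticSearch search API of that era (query, top-level filter, sort, facets); the exact JSON Tire emits may differ in key order and default options.

```ruby
require 'json'

# Hand-written approximation of the request body for the block-style search.
body = {
  'query'  => { 'query_string' => { 'query' => 'title:T*' } },
  'filter' => { 'terms' => { 'tags' => ['ruby'] } },
  'sort'   => [{ 'title' => 'desc' }],
  'facets' => {
    # A global facet ignores the query and counts tags across all documents.
    'global-tags'  => { 'terms' => { 'field' => 'tags' }, 'global' => true },
    # A regular facet counts tags only among documents matching the query.
    'current-tags' => { 'terms' => { 'field' => 'tags' } }
  }
}

puts JSON.pretty_generate(body)
```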
  26. Ruby Libraries (Tire)

        Tire.search({
          fields: ["name", "shortDescription", "longDescription"],
          query: {
            query_string: {
              fields: ["name"],
              query: "+camera +laptop",
              use_dis_max: true
            }
          }
        })
  28. Ruby Libraries (Tire)

    • DSL To ElasticSearch
    • Declarative Block or Imperative Style
    • ActiveModel Integration
    • Contributed Components Gem
    • Tire::Search::Search#to_curl
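`Tire::Search::Search#to_curl` returns a search as a curl command, which is handy for debugging. The idea can be sketched without the gem; `to_curl_sketch` below is a hypothetical helper, not Tire's implementation, and it assumes a node on localhost:9200 and a payload that contains no single quotes (no shell escaping is done).

```ruby
require 'json'

# Hypothetical helper: render an index name and a search payload
# as a curl command that can be pasted into a terminal.
def to_curl_sketch(index, payload, host: 'http://localhost:9200')
  "curl -X GET '#{host}/#{index}/_search?pretty' -d '#{JSON.generate(payload)}'"
end

query = { 'query' => { 'query_string' => { 'query' => 'title:T*' } } }
puts to_curl_sketch('articles', query)
```

Printing the command makes it easy to replay a search outside the application and compare it with what the guides document.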
  31. Account Service Usage Concept (service-ext)

    [Diagram: one service exposing /search?query=..., /search/organization?query=..., /search/email?query=..., /search/address?query=..., /search/full_name?query=..., backed by “Index ? Index ? Index ?”]

    • Cluster
    • Nodes
    • Settings
    • Storage
    • Shards
    • Replicas
  33. Usage Concept (add-on)

    [Diagram labels: #1 App (/articles/search, /catalog/search); #2 App (/email/search); #1 Service, #2 Service, #3 Service; service endpoints /search/full_name, /search/address, /search/text, /search/subjects]
  34. Usage Concept (add-on)

    [Diagram labels: #2 App (/email/search); #1 Service and #3 Service, each with /search/full_name and /search/address]
  35. Usage Concept (lateral-biz-need)

    [Diagram labels: Account Search (/full_name, /email, /address); Service #1, Service #2; App #1, App #2; Standard Document Representation]
  36. Usage Concept (lateral-biz-need)

    [Diagram labels: #1 App, #2 App, #1 Service, #2 Service, #3 Service]

    Search several indexes with one request by joining their names with commas:

        $ curl -XGET 'http://localhost:9200/foo,bar/tweet/_search?q=tag:wow'
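The multi-index URL pattern shown above (comma-separated index names in the path) is simple to build in Ruby. `search_url` is a hypothetical helper and the host is an assumption; note that `URI.encode_www_form` percent-encodes the colon in the query value, which is equivalent to the literal form used in the curl example.

```ruby
require 'uri'

# Hypothetical helper: build a URI-search URL across one or more indexes.
# Pass nil as type to search every type in the given indexes.
def search_url(indices, type, q, host: 'http://localhost:9200')
  path = [Array(indices).join(','), type, '_search'].compact.join('/')
  "#{host}/#{path}?#{URI.encode_www_form(q: q)}"
end

puts search_url(%w[foo bar], 'tweet', 'tag:wow')
# prints http://localhost:9200/foo,bar/tweet/_search?q=tag%3Awow
puts search_url('foo', nil, 'tag:wow')
# prints http://localhost:9200/foo/_search?q=tag%3Awow
```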
  37. Usage Concept (lateral-biz-need)

    [Diagram labels: #1 App, #2 App, #1 Service, #2 Service, #3 Service]

    Run several searches in one round trip with the multi search API:

        $ curl -XGET http://localhost:9200/test/_msearch --data-binary @requests

    Contents of the requests file, alternating header and body lines (a blank line counts as an empty header):

        {"index" : "test"}
        {"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
        {"index" : "test", "search_type" : "count"}
        {"query" : {"match_all" : {}}}
        {}
        {"query" : {"match_all" : {}}}

        {"query" : {"match_all" : {}}}
        {"search_type" : "count"}
        {"query" : {"match_all" : {}}}
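The multi search body above is newline-delimited JSON: one header line, then one body line, per search. A minimal sketch of generating it in Ruby follows; `msearch_body` is a hypothetical helper, and the three searches mirror the first three pairs from the example.

```ruby
require 'json'

# Hypothetical helper: each search is a [header, body] pair; the payload
# is one JSON object per line and must end with a newline.
def msearch_body(searches)
  searches.flat_map { |header, body| [JSON.generate(header), JSON.generate(body)] }
          .join("\n") + "\n"
end

match_all = { 'query' => { 'match_all' => {} } }

payload = msearch_body([
  [{ 'index' => 'test' }, match_all.merge('from' => 0, 'size' => 10)],
  [{ 'index' => 'test', 'search_type' => 'count' }, match_all],
  [{}, match_all] # empty header: fall back to the index in the URL
])

puts payload
```

The string can then be posted to `/test/_msearch`, for example with `Net::HTTP` or written to a file for `curl --data-binary`.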
  38. Usage Concept (lateral-biz-need)

    [Diagram labels: #1 App, #2 App, #1 Service, #2 Service, #3 Service]

    • http://www.elasticsearch.org/guide/reference/api/search/indices-types.html
    • http://www.elasticsearch.org/guide/reference/api/multi-search.html
  39. ElasticSearch FTW!

    [Diagram labels: #1 App, #2 App, #1 Service, #2 Service, #3 Service, Search Service #1]