Using Elasticsearch with Rails Applications

Slide 1

Slide 1 text

Using Elasticsearch with Rails Applications Brian Gugliemetti

Slide 2

Slide 2 text

Goals
• Teach basic Elasticsearch terms & concepts
• How to use Elasticsearch with Rails
• Queries & Filters
• How to manage Elasticsearch

Slide 3

Slide 3 text

What is Elasticsearch?
• Open source distributed RESTful search engine built on top of Apache Lucene
• Provides clustering, failover, and auto-discovery
• Schema-less document store
• JSON document-based

Slide 4

Slide 4 text

How We Got Here
• Aggregating product catalogs from multiple partners
• Wanted auto-complete for product searches
• Filterable searches on product groups
• Replace database full-text search
• Bring site search in-house

Slide 5

Slide 5 text

Elasticsearch Terms
Node
A node is an instance of Elasticsearch that belongs to a cluster.
Shard (set at index creation only, equivalent to a Lucene index)
Data partition within a node. There are primary and replica shards. More shards == faster indexing.
Replica (can be modified at runtime)
Copy of a primary shard used to increase performance and handle failover. More replicas == faster searching.

Slide 6

Slide 6 text

More Elasticsearch Terms!
Index
The index is the top-level data partition. There can be many indices within a cluster. Each index is split across multiple shards.
Document Type
Each document in an index has a type, and an index can contain many different document types.
Document ID
Unique identifier for the document within the index/type namespace. Can be set explicitly or auto-generated.

Slide 7

Slide 7 text

Using Elasticsearch with the Browser

Slide 8

Slide 8 text

Using Tire gem with AR
1. Install and start Elasticsearch
2. Add tire to Gemfile
3. bundle install
4. Edit AR models to include:
   • Tire::Model::Search
   • Tire::Model::Callbacks
5. Import data into ES
   • rake environment tire:import CLASS='MODEL_NAME'
   • or Model.import

Slide 9

Slide 9 text

Using Tire with AR
Tire::Model::Search
Query DSL, index settings and mapping
Tire::Model::Callbacks
Uses AR callbacks (after_save/after_destroy) to update the index on create, update, and delete

Slide 10

Slide 10 text

Caveats with Callbacks
Index updates happen in the current thread (an HTTP request to ES).
Depending on volume and response-time requirements, implement a work queue for updates (resque, sidekiq, delayed_job, etc.).
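The work-queue idea can be sketched in plain Ruby. This is only an illustration: the `IndexUpdateQueue` class and the `INDEXER` lambda are hypothetical stand-ins, and a real application would more likely hand the job to resque, sidekiq, or delayed_job and make the HTTP call to Elasticsearch inside the worker.

```ruby
# Sketch: move index updates off the request thread with an in-process queue.
# INDEXED and INDEXER are stand-ins for a real HTTP call to Elasticsearch.
INDEXED = []
INDEXER = ->(job) { INDEXED << job }

class IndexUpdateQueue
  STOP = :stop

  def initialize(indexer)
    @queue = Queue.new
    @worker = Thread.new do
      # Block until a job arrives; exit when the stop sentinel is seen.
      while (job = @queue.pop) != STOP
        indexer.call(job)
      end
    end
  end

  # Called from a model callback; returns immediately.
  def enqueue(action, type, id)
    @queue << { action: action, type: type, id: id }
  end

  # Process whatever is still queued, then stop the worker.
  def shutdown
    @queue << STOP
    @worker.join
  end
end

UPDATES = IndexUpdateQueue.new(INDEXER)
UPDATES.enqueue(:index, "topic", 1)
UPDATES.enqueue(:delete, "topic", 2)
UPDATES.shutdown
puts INDEXED.inspect
```

The request thread only pays for pushing a small hash onto the queue; the slow part runs in the worker.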

Slide 11

Slide 11 text

Tire’s Default Behavior
Indexes all model attributes except id, though it uses id as the document ID inside Elasticsearch
Creates an index named after the database table
Document type defaults to model_name.to_s.underscore

    class Topic < ActiveRecord::Base
    end

Stored in index 'topics' with document type 'topic':
http://localhost:9200/topics/topic/id
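The naming convention can be sketched without Rails. The `underscore` helper and the "append an s" pluralization below are simplified, hypothetical stand-ins for what ActiveSupport and the table name provide in a real application.

```ruby
# Sketch of the default index/type naming described on this slide.
ES_URL = "http://localhost:9200"

# Simplified version of ActiveSupport's String#underscore.
def underscore(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

def document_url(model_name, id)
  type  = underscore(model_name)  # "Topic" -> "topic"
  index = "#{type}s"              # stand-in for the table name, "topics"
  "#{ES_URL}/#{index}/#{type}/#{id}"
end

puts document_url("Topic", 42)
```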

Slide 12

Slide 12 text

More Elasticsearch Terms!
Mapping
Defines fields and data types within an index/type
http://localhost:9200/INDEX_NAME/_mapping

    tire do
      index_name 'topics'
      mapping do
        indexes :id, :index => :not_analyzed
        indexes :created_at, :type => 'date'
        indexes :forum_id, :type => 'integer'
        indexes :subject, :type => 'string', :analyzer => 'keyword', :boost => 100
        indexes :last_post_author, :type => 'string'
        indexes :last_post_text, :type => 'string',
                :as => Proc.new{ posts.active.any? ? posts.active.last.text : "" }
        indexes :last_post_updated_at, :type => 'date'
        indexes :ranking, :type => 'integer'
        indexes :deleted, :type => 'integer'
        indexes :text, :type => 'string', :analyzer => 'keyword',
                :as => Proc.new{ posts.active.collect{ |post| post.text }.join(' ') }
      end
    end

Slide 13

Slide 13 text

More Elasticsearch Terms!
Analyzer
Determines how Elasticsearch indexes a given field. Composed of a tokenizer and zero or more filters.
Input:
I hate the Wetlands. They’re stupid and wet, and there are bugs everywhere. And I think I maced a crane, Michael.
Indexed terms:
hate wetlands they stupid wet bugs everywhere think maced crane michael
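A tokenizer followed by filters can be imitated in a few lines of Ruby. This is a toy sketch, not how Lucene implements analysis: the `STOP_WORDS` list is hypothetical and was chosen only so that the result matches the example on this slide.

```ruby
# Hypothetical stop-word list; not the list any built-in analyzer uses.
STOP_WORDS = %w[i the and there are a re].freeze

# Tokenizer: split on anything that is not a letter.
def tokenize(text)
  text.scan(/[A-Za-z]+/)
end

# Filters: lowercase each token, then drop stop words.
def analyze(text)
  tokenize(text).map(&:downcase).reject { |token| STOP_WORDS.include?(token) }
end

QUOTE = "I hate the Wetlands. They're stupid and wet, and there are bugs " \
        "everywhere. And I think I maced a crane, Michael."
puts analyze(QUOTE).join(" ")
```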

Slide 14

Slide 14 text

Autocomplete Demo

topic_controller.rb

    def autocomplete
      results = Topic.subject_matches(params[:term])
      render :json => results, :callback => params[:callback]
    end

topic.rb

    def self.subject_matches(term)
      tire.search do
        query { string "subject:#{term}" }
      end
    end
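It can help to see roughly what such a search sends over the wire. The sketch below builds the request body by hand, assuming the `string` query maps to Elasticsearch's `query_string` query; the `subject_query_body` helper is hypothetical and not part of Tire.

```ruby
require "json"

# Assumed shape of the search body for a query-string search on `subject`.
def subject_query_body(term)
  { query: { query_string: { query: "subject:#{term}" } } }
end

puts JSON.generate(subject_query_body("rails"))
```

Because the term is interpolated into query-string syntax, characters such as `:` or `*` typed by a user are interpreted as query operators.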

Slide 15

Slide 15 text

Autocomplete Demo

    $("#autocomplete").autocomplete({
      source: function(request, response) {
        jQuery.ajax({
          url: "http://localhost:4000/topic/autocomplete.json",
          dataType: "jsonp",
          data: { term: request.term },
          success: function(data) {
            var rows = [];
            for (var i = 0; i < data.length; i++) {
              rows.push({ label: data[i].subject, value: data[i].subject });
            }
            response(rows);
          }
        });
      }
    });

Slide 16

Slide 16 text

Other Query Examples

Paging

    tire.search :page => params[:page], :per_page => 25 do
      query { string "#{term}" }
    end

Filters (facets)

    tire.search do
      query { string "#{term}" }
      filter :terms, :last_post_author => ["spiceworks"]
      facet "last_author" do
        terms :last_post_author
      end
    end
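Elasticsearch itself pages with `from` and `size` offsets rather than page numbers. A minimal sketch of that translation, using a hypothetical `paging_options` helper:

```ruby
# Translate 1-based page parameters into Elasticsearch's from/size offsets.
def paging_options(page, per_page = 25)
  page = [page.to_i, 1].max  # treat nil, "" or 0 as the first page
  { from: (page - 1) * per_page, size: per_page }
end

puts paging_options("3").inspect
```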

Slide 17

Slide 17 text

Other Query Examples

Boolean with multiple sort

    tire.search :page => params[:page], :per_page => 25 do
      query do
        boolean do
          must     { string "subject:#{term}" }
          should   { string "category:#{category}" }
          must_not { string "inactive:true" }
        end
      end
      sort do
        by :subject
        by :created_at
      end
    end
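For reference, a boolean search with two sort fields corresponds roughly to the request body built below. This is a hand-written sketch; the helper names are hypothetical and the exact JSON that Tire generates may differ in detail.

```ruby
require "json"

def query_string(text)
  { query_string: { query: text } }
end

# bool query: must clauses are required, should clauses raise the score,
# must_not clauses exclude documents.
def topic_search_body(term, category)
  {
    query: {
      bool: {
        must:     [query_string("subject:#{term}")],
        should:   [query_string("category:#{category}")],
        must_not: [query_string("inactive:true")]
      }
    },
    sort: ["subject", "created_at"]
  }
end

puts JSON.pretty_generate(topic_search_body("rails", "ruby"))
```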

Slide 18

Slide 18 text

Raw Queries

    Tire.search "catalog", {
      "query" => {
        "custom_score" => {
          "query" => {
            "bool" => {
              "should" => [
                {"field" => {"keywords" => "mouse AND keyboard AND bluetooth"}},
                {"text" => {"category" => {"query" => "bluetooth", "analyzer" => "synonym"}}}
              ],
              "minimum_number_should_match" => 1
            }
          },
          "script" => "_score * doc['popularity'].value"
        }
      },
      "size" => 25,
      "from" => 0,
      "facets" => {
        "vendor" => {"terms" => {"field" => "vendor", "size" => 20}},
        "manufacturer" => {"terms" => {"field" => "manufacturer", "size" => 40}},
        "avg_price" => {"range" => {"field" => "avg_price", "ranges" => [{"from" => 0}]}},
        "avg_rating" => {"range" => {"field" => "avg_rating", "ranges" => [{"from" => 0}]}}
      }
    }
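The effect of the script `_score * doc['popularity'].value` on ranking can be illustrated in plain Ruby. The hits and their numbers below are made-up illustration data, not results from a real index.

```ruby
# Made-up hits: a relevance score from the query plus a popularity field.
HITS = [
  { name: "bluetooth mouse",    score: 1.2, popularity: 3 },
  { name: "bluetooth keyboard", score: 0.9, popularity: 10 },
  { name: "usb mouse",          score: 1.5, popularity: 1 }
].freeze

# Multiply each relevance score by popularity, then sort best-first.
def rescore(hits)
  hits
    .map { |hit| hit.merge(custom_score: hit[:score] * hit[:popularity]) }
    .sort_by { |hit| -hit[:custom_score] }
end

rescore(HITS).each { |hit| puts hit[:name] }
```

The keyboard has the lowest text relevance here but ends up first because its popularity dominates the product.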

Slide 19

Slide 19 text

Tire and Multi-node Clusters
1) Need some type of load balancer,
2) or an Elasticsearch instance configured with:
   node.client set to true
   node.data set to false
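As a rough illustration of what a load balancer does for option 1, the sketch below rotates through a list of nodes in application code. It is not something Tire provides: the `RoundRobin` class and the node URLs are hypothetical, and a dedicated load balancer or a client node would also handle failed nodes, which this does not.

```ruby
# Hypothetical node URLs; replace with the real cluster members.
NODE_URLS = %w[
  http://es1.example.com:9200
  http://es2.example.com:9200
  http://es3.example.com:9200
].freeze

class RoundRobin
  def initialize(urls)
    @urls  = urls
    @index = 0
    @lock  = Mutex.new  # requests may be served from several threads
  end

  # Return the next URL in rotation, wrapping around at the end.
  def next_url
    @lock.synchronize do
      url = @urls[@index % @urls.size]
      @index += 1
      url
    end
  end
end

BALANCER = RoundRobin.new(NODE_URLS)
4.times { puts BALANCER.next_url }
```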

Slide 20

Slide 20 text

Monitoring UIs
paramedic: written by Karel Minarik (Tire author)
bigdesk: written by Lukáš Vlček (JBoss/Red Hat)

Slide 21

Slide 21 text

Monitoring URLs
cluster state: http://localhost:9200/_cluster/state
cluster settings: http://localhost:9200/_cluster/settings
index statistics: http://localhost:9200/_stats
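These endpoints return JSON, so they are easy to poll from a script. A minimal sketch using only the Ruby standard library; the helper names are hypothetical, and `fetch_monitoring` needs a running node on localhost:9200, so it is defined here but not called.

```ruby
require "json"
require "net/http"
require "uri"

ES_HOST = "http://localhost:9200"

MONITORING_PATHS = {
  cluster_state:    "/_cluster/state",
  cluster_settings: "/_cluster/settings",
  index_stats:      "/_stats"
}.freeze

def monitoring_uri(name)
  URI.join(ES_HOST, MONITORING_PATHS.fetch(name))
end

# Performs a GET and parses the JSON response; needs a running node.
def fetch_monitoring(name)
  JSON.parse(Net::HTTP.get(monitoring_uri(name)))
end

MONITORING_PATHS.each_key { |name| puts monitoring_uri(name) }
```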

Slide 22

Slide 22 text

Scaling Elasticsearch
Max cluster size depends on index settings:
nbr_shards + (nbr_replicas * nbr_shards) = max_nodes
Defaults are 5 shards with 1 replica:
5 + (1 * 5) = 10 nodes
Number of shards is specified at index creation time and cannot be changed
Number of replicas can be changed anytime
Index and query routing
• Allows you to control where documents are stored and queried
• Without routing, all shards are queried and results are aggregated
• With routing, you specify the shard to be queried
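Both ideas on this slide reduce to small calculations. The sketch below computes the node limit from the formula and shows the principle behind routing: a routing value is hashed and mapped onto one shard, so the same value always lands on the same shard. CRC32 is only a stand-in here; Elasticsearch uses its own hash function, and `shard_for` is a hypothetical helper.

```ruby
require "zlib"

# Upper bound on useful nodes: one node per shard copy (primaries plus replicas).
def max_nodes(shards, replicas)
  shards + (replicas * shards)
end

# Routing sketch: hash the routing value and take it modulo the shard count.
# CRC32 is a stand-in for Elasticsearch's internal hash function.
def shard_for(routing_value, shards)
  Zlib.crc32(routing_value.to_s) % shards
end

puts max_nodes(5, 1)
puts shard_for("forum-42", 5) == shard_for("forum-42", 5)
```

With routing, a search for one forum's topics only has to visit the shard that `shard_for` points to instead of all five.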

Slide 23

Slide 23 text

BONUS!
Elasticsearch 0.90.0 released on Monday
Now using Lucene 4.
Webinar tomorrow at 9AM PDT for overview

Slide 24

Slide 24 text

[email protected]