Using Elasticsearch with Rails Applications

Using Elasticsearch with Rails Applications Brian Gugliemetti

Goals
• Teach basic Elasticsearch terms & concepts
• How to use Elasticsearch with Rails
• Queries & Filters
• How to manage Elasticsearch

What is Elasticsearch?
• Open source distributed RESTful search engine built on top of Apache Lucene
• Provides clustering, failover, and auto-discovery
• Schema-less document store
• JSON document-based

How We Got Here
• Aggregating product catalogs from multiple partners
• Wanted auto-complete for product searches
• Filterable searches on product groups
• Replace database full-text search
• Bring site search in-house

Elasticsearch Terms
Node
  A node is an instance of Elasticsearch which belongs to a cluster.
Shard (set at index creation only; equivalent to a Lucene index)
  Data partition within a node. There are primary and replica shards. More shards == faster indexing.
Replica (can be modified at runtime)
  Copy of a primary shard used to increase performance and handle failover. More replicas == faster searching.
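To make the "set at index creation only" versus "can be modified at runtime" distinction concrete, here is a minimal sketch of the two JSON request bodies involved. The index name `topics` in the comments and the counts are assumptions for illustration; the bodies are only built and printed, not sent to a cluster.

```ruby
require 'json'

# Body for creating an index (PUT /topics): the shard count is fixed from here on.
def create_index_body(shards, replicas)
  { 'settings' => { 'number_of_shards' => shards, 'number_of_replicas' => replicas } }
end

# Body for a later settings update (PUT /topics/_settings): only replicas change.
def update_replicas_body(replicas)
  { 'index' => { 'number_of_replicas' => replicas } }
end

puts JSON.generate(create_index_body(5, 1))
puts JSON.generate(update_replicas_body(2))
```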

More Elasticsearch Terms!
Index
  The index is the top-level data partition. There can be many indices within a cluster. An index is split across multiple shards.
Document Type
  Each document in an index has a type, and an index can contain many different document types.
Document ID
  Unique identifier for the document within the index/type namespace. Can be set explicitly or auto-generated.

Using Elasticsearch with the Browser

Using Tire gem with AR
1. Install and start Elasticsearch
2. Add tire to Gemfile
3. bundle install
4. Edit AR models to include:
   • Tire::Model::Search
   • Tire::Model::Callbacks
5. Import data into ES
   • rake environment tire:import CLASS='MODEL_NAME'
   • or Model.import

Using Tire with AR
Tire::Model::Search
  Query DSL, index settings and mapping
Tire::Model::Callbacks
  Uses AR callbacks on create/update/delete to update the index (after_save/after_destroy)

Caveats with Callbacks
• Index updates happen in the current thread (an HTTP request to ES)
• Depending on volume and response time requirements, implement a work queue for updates (resque, sidekiq, delayed_job, etc.)
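The idea of moving index updates off the request thread can be sketched in plain Ruby. This is only a stand-in for a real job system such as resque or sidekiq: the class name `IndexUpdateQueue` and the block that "indexes" a record are hypothetical, and a real worker would issue the HTTP request to Elasticsearch where the block is called.

```ruby
require 'thread'

# The request thread only enqueues a record id; a background worker thread
# performs the (slow) index update, so the web response is not held up by it.
class IndexUpdateQueue
  def initialize(&indexer)
    @jobs = Queue.new
    @indexer = indexer
    @worker = Thread.new do
      while (job = @jobs.pop) != :stop
        @indexer.call(job)
      end
    end
  end

  # Called from the request thread; returns immediately.
  def enqueue(record_id)
    @jobs << record_id
  end

  # Drains the remaining jobs, then stops the worker.
  def shutdown
    @jobs << :stop
    @worker.join
  end
end

indexed = []
queue = IndexUpdateQueue.new { |id| indexed << id } # hypothetical: would call ES here
queue.enqueue(1)
queue.enqueue(2)
queue.shutdown
p indexed
```

The trade-off is that the index briefly lags behind the database, which is usually acceptable for search.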

Tire’s Default Behavior
• Indexes all model attributes except id, though it uses id as the document id inside Elasticsearch
• Creates a named index equivalent to the database table name
• Document type defaults to model_name.to_s.underscore
    class Topic < ActiveRecord::Base
    end
Stored in index ‘topics’ with document type ‘topic’:
http://localhost:9200/topics/topic/id
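The naming convention above can be illustrated without Rails. In this sketch, `underscore` is a simplified re-implementation of the ActiveSupport method (an assumption: it only handles plain CamelCase names), the table name is passed in by hand, and `ForumPost` is a hypothetical second model.

```ruby
# Simplified version of ActiveSupport's String#underscore, enough for
# plain CamelCase class names (no namespaces or acronyms).
def underscore(class_name)
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# table_name is given explicitly here; Rails would derive it from the model.
def document_url(class_name, table_name, id, host = 'http://localhost:9200')
  "#{host}/#{table_name}/#{underscore(class_name)}/#{id}"
end

puts document_url('Topic', 'topics', 42)
puts document_url('ForumPost', 'forum_posts', 7)
```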

More Elasticsearch Terms!
Mapping
  http://localhost:9200/INDEX_NAME/_mapping
  Defines fields and data types within an index/type
    tire do
      index_name 'topics'
      mapping do
        indexes :id, :index => :not_analyzed
        indexes :created_at, :type => 'date'
        indexes :forum_id, :type => 'integer'
        indexes :subject, :type => 'string', :analyzer => 'keyword', :boost => 100
        indexes :last_post_author, :type => 'string'
        indexes :last_post_text, :type => 'string',
                :as => Proc.new { posts.active.any? ? posts.active.last.text : "" }
        indexes :last_post_updated_at, :type => 'date'
        indexes :ranking, :type => 'integer'
        indexes :deleted, :type => 'integer'
        indexes :text, :type => 'string', :analyzer => 'keyword',
                :as => Proc.new { posts.active.collect { |post| post.text }.join(' ') }
      end
    end
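For orientation, a mapping is itself a JSON document keyed by document type. The sketch below shows roughly what the properties for four of the fields above might look like; the exact output depends on the Tire and Elasticsearch versions, and the `string` type shown for `id` is an assumption, since the slide does not specify one.

```ruby
require 'json'

# Hand-written approximation of a mapping for the 'topic' document type.
mapping = {
  'topic' => {
    'properties' => {
      'id'         => { 'type' => 'string', 'index' => 'not_analyzed' }, # type assumed
      'created_at' => { 'type' => 'date' },
      'forum_id'   => { 'type' => 'integer' },
      'subject'    => { 'type' => 'string', 'analyzer' => 'keyword', 'boost' => 100 }
    }
  }
}

puts JSON.pretty_generate(mapping)
```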

More Elasticsearch Terms!
Analyzer
  Determines how Elasticsearch indexes a given field. Composed of a tokenizer and zero or more filters.
Input:
  I hate the Wetlands. They’re stupid and wet, and there are bugs everywhere. And I think I maced a crane, Michael.
Indexed terms:
  hate wetlands they stupid wet bugs everywhere think maced crane michael
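The tokenizer-plus-filters pipeline can be imitated in a few lines of plain Ruby. This is not what Elasticsearch runs internally: the stop-word list and the minimum token length are assumptions chosen so that the sketch reproduces the terms shown on the slide.

```ruby
STOP_WORDS = %w[the and there are].freeze # assumed list, chosen to match the slide

# Tokenizer: split the text into runs of letters.
def tokenize(text)
  text.scan(/[[:alpha:]]+/)
end

# Filters, applied in order: lowercase, minimum length, stop words.
def analyze(text)
  tokenize(text)
    .map(&:downcase)
    .select { |token| token.length >= 3 }
    .reject { |token| STOP_WORDS.include?(token) }
end

text = "I hate the Wetlands. They're stupid and wet, and there are bugs " \
       "everywhere. And I think I maced a crane, Michael."
puts analyze(text).join(' ')
# prints: hate wetlands they stupid wet bugs everywhere think maced crane michael
```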

Autocomplete Demo
topic_controller.rb
    def autocomplete
      results = Topic.subject_matches(params[:term])
      render :json => results, :callback => params[:callback]
    end
topic.rb
    def self.subject_matches(term)
      tire.search do
        query { string "subject:#{term}" }
      end
    end
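Because the model interpolates the user's input straight into a query string, characters such as `:` or `*` typed into the search box are interpreted as query syntax. One possible safeguard, sketched here with hypothetical helper names (`escape_query_term`, `subject_query`), is to backslash-escape the reserved characters of the Lucene query-string syntax before building the query.

```ruby
# Reserved characters and operators of the Lucene query-string syntax.
RESERVED = /(&&|\|\||[+\-!(){}\[\]^"~*?:\\\/])/

# Prefix every reserved character or operator with a backslash.
def escape_query_term(term)
  term.gsub(RESERVED) { "\\#{$1}" }
end

# Builds the same "subject:..." query string as the model, but escaped.
def subject_query(term)
  "subject:#{escape_query_term(term)}"
end

puts subject_query('rails')
puts subject_query('C++ (tips)')
```

The model method could then pass `subject_query(term)` to `string` instead of interpolating `term` directly.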

Autocomplete Demo
    $("#autocomplete").autocomplete({
      source: function(request, response) {
        jQuery.ajax({
          url: "http://localhost:4000/topic/autocomplete.json",
          dataType: "jsonp",
          data: { term: request.term },
          success: function(data) {
            var rows = [];
            for (var i = 0; i < data.length; i++) {
              rows.push({ label: data[i].subject, value: data[i].subject });
            }
            response(rows);
          }
        });
      }
    });

    <input name="subject" type="text" id="autocomplete">

Other Query Examples
Paging
    tire.search :page => params[:page], :per_page => 25 do
      query { string "#{term}" }
    end
Filters (facets)
    tire.search do
      query { string "#{term}" }
      filter :terms, :last_post_author => ["spiceworks"]
      facet "last_author" do
        terms :last_post_author
      end
    end
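It can help to see the kind of JSON request body these DSL calls correspond to. The sketch below combines the paging and filter/facet examples into one hand-built body; it is an approximation (Tire may add further default keys), and `search_body` is a hypothetical helper, not part of Tire.

```ruby
require 'json'

# :page and :per_page become from/size offsets in the request body.
def search_body(term, page, per_page)
  {
    'query'  => { 'query_string' => { 'query' => term } },
    'filter' => { 'terms' => { 'last_post_author' => ['spiceworks'] } },
    'facets' => { 'last_author' => { 'terms' => { 'field' => 'last_post_author' } } },
    'from'   => (page - 1) * per_page,
    'size'   => per_page
  }
end

body = search_body('printer', 3, 25)
puts JSON.pretty_generate(body)
```

Page 3 at 25 results per page therefore starts at offset 50.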

Other Query Examples
Boolean with multiple sort
    tire.search :page => params[:page], :per_page => 25 do
      query do
        boolean do
          must     { string "subject:#{term}" }
          should   { string "category:#{category}" }
          must_not { string "inactive:true" }
        end
      end
      sort do
        by :subject
        by :created_at
      end
    end
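As with the simpler queries, the boolean search maps onto a JSON body with a `bool` query and a `sort` list. This hand-built sketch is an approximation of that body, and `bool_search_body` is a hypothetical helper name.

```ruby
require 'json'

# must / should / must_not each hold a list of clauses; sorting on plain
# field names is ascending by default.
def bool_search_body(term, category)
  {
    'query' => {
      'bool' => {
        'must'     => [{ 'query_string' => { 'query' => "subject:#{term}" } }],
        'should'   => [{ 'query_string' => { 'query' => "category:#{category}" } }],
        'must_not' => [{ 'query_string' => { 'query' => 'inactive:true' } }]
      }
    },
    'sort' => %w[subject created_at]
  }
end

puts JSON.pretty_generate(bool_search_body('printer', 'hardware'))
```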

Raw Queries
    Tire.search "catalog", {
      "query" => {
        "custom_score" => {
          "query" => {
            "bool" => {
              "should" => [
                {"field" => {"keywords" => "mouse AND keyboard AND bluetooth"}},
                {"text" => {"category" => {"query" => "bluetooth", "analyzer" => "synonym"}}}
              ],
              "minimum_number_should_match" => 1
            }
          },
          "script" => "_score * doc['popularity'].value"
        }
      },
      "size" => 25,
      "from" => 0,
      "facets" => {
        "vendor" => {"terms" => {"field" => "vendor", "size" => 20}},
        "manufacturer" => {"terms" => {"field" => "manufacturer", "size" => 40}},
        "avg_price" => {"range" => {"field" => "avg_price", "ranges" => [{"from" => 0}]}},
        "avg_rating" => {"range" => {"field" => "avg_rating", "ranges" => [{"from" => 0}]}}
      }
    }

Tire and Multi-node Clusters
1) Need some type of load balancer,
2) or an Elasticsearch instance configured with:
   • node.client set to true
   • node.data set to false

Monitoring UIs
• paramedic: written by Karel Minarik (Tire author)
• bigdesk: written by Lukáš Vlček (JBoss/Red Hat)

Monitoring URLs
• cluster state: http://localhost:9200/_cluster/state
• cluster settings: http://localhost:9200/_cluster/settings
• index statistics: http://localhost:9200/_stats
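These endpoints return JSON, so they are easy to poll from a script. A minimal sketch using only the Ruby standard library follows; the helper names are hypothetical, and `fetch_monitoring` is defined but not called here because it needs a reachable node.

```ruby
require 'net/http'
require 'json'
require 'uri'

MONITORING_PATHS = {
  cluster_state:    '/_cluster/state',
  cluster_settings: '/_cluster/settings',
  index_stats:      '/_stats'
}.freeze

# Builds the full URI for one of the monitoring endpoints.
def monitoring_uri(name, host = 'http://localhost:9200')
  URI.join(host, MONITORING_PATHS.fetch(name))
end

# Performs the GET and parses the JSON response; requires a running node.
def fetch_monitoring(name, host = 'http://localhost:9200')
  JSON.parse(Net::HTTP.get(monitoring_uri(name, host)))
end

MONITORING_PATHS.each_key { |name| puts monitoring_uri(name) }
```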

Scaling Elasticsearch
Max cluster size is dependent on index settings:
    nbr_shards + (nbr_replicas * nbr_shards) = max_nodes
Defaults are 5 shards with 1 replica:
    5 + (1 * 5) = 10 nodes
• number of shards is specified at index creation time and cannot be changed
• number of replicas can be changed anytime

Index and query routing
• Allows you to control where documents are stored and queried
• Without routing, all shards are queried and results are aggregated
• With routing, you specify the shard to be queried
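Both ideas fit in a short sketch. `max_nodes` is the formula from the slide; `shard_for` illustrates routing as "hash of the routing value modulo the number of primary shards". CRC32 is only a stand-in for Elasticsearch's internal hash function (an assumption made so the example stays within the standard library), and the routing value `forum-12` is hypothetical.

```ruby
require 'zlib'

# One node per shard copy: the primaries plus every replica of every primary.
def max_nodes(shards, replicas)
  shards + (replicas * shards)
end

# Maps a routing value to a shard number in 0...shards.
# CRC32 is a stand-in hash, not the function Elasticsearch uses.
def shard_for(routing_value, shards)
  Zlib.crc32(routing_value.to_s) % shards
end

puts max_nodes(5, 1) # prints 10
puts shard_for('forum-12', 5)
```

Documents indexed with the same routing value land on the same shard, which is why a query that supplies that value only needs to visit one shard.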

BONUS!
• Elasticsearch 0.90.0 released on Monday
• Now using Lucene 4
• Webinar tomorrow at 9AM PDT for overview

