Using Elasticsearch with Rails Applications

From RailsConf 2013 Portland

Brian Gugliemetti

May 01, 2013

Transcript

  1. Goals

    •Teach basic Elasticsearch terms & concepts
    •How to use Elasticsearch with Rails
    •Queries & Filters
    •How to manage Elasticsearch
  2. What is Elasticsearch?

    •Open source distributed RESTful search engine built on top of Apache Lucene
    •Provides clustering, failover, and auto-discovery
    •Schema-less document store
    •JSON Document-based
  3. How We Got Here

    •Aggregating product catalogs from multiple partners
    •Wanted auto-complete for product searches
    •Filterable searches on product groups
    •Replace database full-text search
    •Bring site search in-house
  4. Elasticsearch Terms

    Node
      A node is an instance of elasticsearch which belongs to a cluster.
    Shard (set at index creation only, equivalent to a Lucene index)
      Data partition of an index, hosted on a node. There are primary and replica shards. More shards == faster indexing.
    Replica (can modify at runtime)
      Copy of a primary shard used to increase performance and handle failover. More replicas == faster searching.
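
As a rough illustration of the shard and replica settings described above, the sketch below builds the JSON bodies one might send to elasticsearch: one for index creation, where the shard count is fixed, and one for a later replica change. The helper names are hypothetical; the keys assume the standard `number_of_shards` and `number_of_replicas` index settings.

```ruby
require 'json'

# Body for index creation (e.g. PUT /topics); the shard count cannot be changed afterwards.
def create_index_body(shards, replicas)
  { 'settings' => { 'index' => { 'number_of_shards' => shards,
                                  'number_of_replicas' => replicas } } }
end

# Body for a later settings update (e.g. PUT /topics/_settings); only replicas change.
def update_replicas_body(replicas)
  { 'index' => { 'number_of_replicas' => replicas } }
end

puts JSON.generate(create_index_body(5, 1))
puts JSON.generate(update_replicas_body(2))
```
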
  5. More Elasticsearch Terms!

    Index
      The index is the top level data partition. There can be many indices within a cluster. Split across multiple shards.
    Document Type
      Each document in an index has a type and an index can contain many different document types.
    Document ID
      Unique identifier for the document within the index/type namespace. Can explicitly set or allow to auto-generate.
  6. Using Tire gem with AR

    1. Install and start elasticsearch
    2. Add tire to Gemfile
    3. bundle install
    4. Edit AR models to include:
      •Tire::Model::Search
      •Tire::Model::Callbacks
    5. Import data into ES
      •rake environment tire:import CLASS='MODEL_NAME'
      •or Model.import
  7. Using Tire with AR

    Tire::Model::Search
      Query DSL, index settings and mapping
    Tire::Model::Callbacks
      Uses AR callbacks (after_save/after_destroy) on create, update, and delete to update the index
  8. Caveats with Callbacks

    •Index updates happen in the current thread (an HTTP request to ES)
    •Depending on volume and response time requirements, implement a work queue for updates (resque, sidekiq, delayed_job, etc.)
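
To make the work-queue idea concrete, here is a minimal sketch that moves index updates off the request thread using only Ruby's built-in `Queue`. The `IndexQueue` class is hypothetical and the block passed to it stands in for the HTTP request to ES; a production setup would more likely use one of the queueing libraries named above.

```ruby
# Hypothetical in-process indexing queue; a stand-in for resque, sidekiq, etc.
class IndexQueue
  def initialize(&indexer)
    @queue = Queue.new
    @indexer = indexer            # block that performs the actual index update
    @worker = Thread.new { work }
  end

  # Called from the request thread (e.g. inside an after_save hook); returns immediately.
  def enqueue(id)
    @queue << id
  end

  # Signal the worker to stop and wait until pending updates are processed.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  # Worker loop: pop ids one at a time until the stop marker arrives.
  def work
    while (id = @queue.pop) != :stop
      @indexer.call(id)
    end
  end
end

indexed = []
queue = IndexQueue.new { |id| indexed << id }  # stand-in for the request to ES
[1, 2, 3].each { |id| queue.enqueue(id) }
queue.shutdown
p indexed
```

The request thread only pays for a queue push; the trade-off is that the index lags slightly behind the database.
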
  9. Tire’s Default Behavior

    •Indexes all model attributes except id, though uses id as the document id inside elasticsearch
    •Creates named index equivalent to database table name
    •Document type defaults to model_name.to_s.underscore

      class Topic < ActiveRecord::Base
      end

    Stored in index ‘topics’ with document type ‘topic’: http://localhost:9200/topics/topic/id
  10. More Elasticsearch Terms!

    Mapping
      Defines fields and data-types within an index/type
      http://localhost:9200/INDEX_NAME/_mapping

      tire do
        index_name 'topics'
        mapping do
          indexes :id, :index => :not_analyzed
          indexes :created_at, :type => 'date'
          indexes :forum_id, :type => 'integer'
          indexes :subject, :type => 'string', :analyzer => 'keyword', :boost => 100
          indexes :last_post_author, :type => 'string'
          indexes :last_post_text, :type => 'string',
                  :as => Proc.new{ posts.active.any? ? posts.active.last.text : "" }
          indexes :last_post_updated_at, :type => 'date'
          indexes :ranking, :type => 'integer'
          indexes :deleted, :type => 'integer'
          indexes :text, :type => 'string', :analyzer => 'keyword',
                  :as => Proc.new{ posts.active.collect{ |post| post.text }.join(' ') }
        end
      end
  11. More Elasticsearch Terms!

    Analyzer
      Determines how elasticsearch indexes a given field. Composed of a tokenizer and zero or more filters.

    Input text:
      I hate the Wetlands. They’re stupid and wet, and there are bugs everywhere. And I think I maced a crane, Michael.
    Tokens after analysis:
      hate wetlands they stupid wet bugs everywhere think maced crane michael
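
The tokenizer-plus-filters pipeline can be imitated in a few lines of plain Ruby. This is a toy sketch, not elasticsearch's actual analyzer: the stop-word list is hypothetical and was chosen only so that the result matches the example on this slide.

```ruby
# Hypothetical stop-word list, chosen only to reproduce the slide's output.
STOP_WORDS = %w[a and are i the there].freeze

# Tokenizer: split text into words, keeping an apostrophe suffix attached for now.
def tokenize(text)
  text.scan(/[[:alpha:]]+(?:['\u2019][[:alpha:]]+)?/)
end

# Token filters, applied in order: lowercase, strip apostrophe suffix, drop stop words.
def analyze(text)
  tokenize(text)
    .map(&:downcase)
    .map { |token| token.sub(/['\u2019].*\z/, '') }
    .reject { |token| STOP_WORDS.include?(token) }
end

text = "I hate the Wetlands. They're stupid and wet, and there are bugs everywhere. " \
       "And I think I maced a crane, Michael."
puts analyze(text).join(' ')
# prints "hate wetlands they stupid wet bugs everywhere think maced crane michael"
```
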
  12. Autocomplete Demo

    topic_controller.rb

      def autocomplete
        results = Topic.subject_matches(params[:term])
        render :json => results, :callback => params[:callback]
      end

    topic.rb

      def self.subject_matches(term)
        tire.search do
          query { string "subject:#{term}" }
        end
      end
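
Because the search term is interpolated straight into a query string, characters such as `:` or `+` in user input are read as Lucene query syntax. The sketch below shows one way to escape them, along with the request body the query would produce, assuming Tire's `string` translates to a `query_string` query. `escape_term` and `subject_query` are hypothetical helpers, not part of Tire.

```ruby
require 'json'

# Characters and operators with special meaning in Lucene query syntax.
LUCENE_SPECIAL = /([+\-!(){}\[\]^"~*?:\\\/]|&&|\|\|)/

# Backslash-escape user input before interpolating it into a query string.
def escape_term(term)
  term.gsub(LUCENE_SPECIAL) { |match| "\\#{match}" }
end

# Request body assumed equivalent to: query { string "subject:#{term}" }
def subject_query(term)
  { 'query' => { 'query_string' => { 'query' => "subject:#{escape_term(term)}" } } }
end

puts JSON.generate(subject_query('rails'))
puts escape_term('c++ (beta)')
```
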
  13. Autocomplete Demo

    $("#autocomplete").autocomplete({
      source: function(request, response) {
        jQuery.ajax({
          url: "http://localhost:4000/topic/autocomplete.json",
          dataType: "jsonp",
          data: { term: request.term },
          success: function(data) {
            var rows = [];
            for (var i = 0; i < data.length; i++) {
              rows.push({ label: data[i].subject, value: data[i].subject });
            }
            response(rows);
          }
        });
      }
    });

    <input name="subject" type="text" id="autocomplete">
  14. Other Query Examples

    Paging

      tire.search :page => params[:page], :per_page => 25 do
        query { string "#{term}" }
      end

    Filters (facets)

      tire.search do
        query { string "#{term}" }
        filter :terms, :last_post_author => ["spiceworks"]
        facet "last_author" do
          terms :last_post_author
        end
      end
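
For readers who want to see what such a search might look like on the wire, the sketch below builds a comparable request body by hand. It assumes the 2013-era API with a top-level `filter` and `facets` (later elasticsearch versions replaced these with `post_filter` and aggregations) and converts `page`/`per_page` into `from`/`size` offsets. `paging` and `search_body` are hypothetical helper names.

```ruby
require 'json'

# Translate 1-based page/per_page into elasticsearch's from/size offsets.
def paging(page, per_page)
  page = [page.to_i, 1].max        # treat nil or 0 as the first page
  { 'from' => (page - 1) * per_page, 'size' => per_page }
end

# Request body with a query string, a terms filter, and a terms facet.
def search_body(term, author, page, per_page = 25)
  {
    'query'  => { 'query_string' => { 'query' => term } },
    'filter' => { 'terms' => { 'last_post_author' => [author] } },
    'facets' => { 'last_author' => { 'terms' => { 'field' => 'last_post_author' } } }
  }.merge(paging(page, per_page))
end

puts JSON.pretty_generate(search_body('rails', 'spiceworks', 3))
```
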
  15. Other Query Examples

    Boolean with multiple sort

      tire.search :page => params[:page], :per_page => 25 do
        query do
          boolean do
            must     { string "subject:#{term}" }
            should   { string "category:#{category}" }
            must_not { string "inactive:true" }
          end
        end
        sort do
          by :subject
          by :created_at
        end
      end
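
The boolean search can also be written out as a plain request body, which makes the role of each clause easier to see. This is a sketch assuming each `string` clause becomes a `query_string` query and that the sorts default to ascending order; `bool_search` is a hypothetical helper.

```ruby
require 'json'

# Build a bool query: `must` clauses are required, `should` clauses raise the score
# of matching documents, and `must_not` clauses exclude documents.
def bool_search(term, category)
  {
    'query' => {
      'bool' => {
        'must'     => [{ 'query_string' => { 'query' => "subject:#{term}" } }],
        'should'   => [{ 'query_string' => { 'query' => "category:#{category}" } }],
        'must_not' => [{ 'query_string' => { 'query' => 'inactive:true' } }]
      }
    },
    # Sort by subject first, then by created_at to break ties.
    'sort' => [{ 'subject' => 'asc' }, { 'created_at' => 'asc' }]
  }
end

puts JSON.pretty_generate(bool_search('rails', 'programming'))
```
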
  16. Raw Queries

    Tire.search "catalog", {
      "query" => {
        "custom_score" => {
          "query" => {
            "bool" => {
              "should" => [
                {"field" => {"keywords" => "mouse AND keyboard AND bluetooth"}},
                {"text" => {"category" => {"query" => "bluetooth", "analyzer" => "synonym"}}}
              ],
              "minimum_number_should_match" => 1
            }
          },
          "script" => "_score * doc['popularity'].value"
        }
      },
      "size" => 25,
      "from" => 0,
      "facets" => {
        "vendor" => {"terms" => {"field" => "vendor", "size" => 20}},
        "manufacturer" => {"terms" => {"field" => "manufacturer", "size" => 40}},
        "avg_price" => {"range" => {"field" => "avg_price", "ranges" => [{"from" => 0}]}},
        "avg_rating" => {"range" => {"field" => "avg_rating", "ranges" => [{"from" => 0}]}}
      }
    }
  17. Tire and Multi-node Clusters

    1) Need some type of load balancer,
    2) or elasticsearch instance configured with:
      node.client set to true
      node.data set to false
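
As a very small illustration of the load-balancer option, the sketch below rotates requests across node URLs on the client side. `NodePool` and the host names are hypothetical and not part of Tire; a real deployment would more likely put a proxy or a client node in front of the cluster, as listed above.

```ruby
# Hypothetical client-side round-robin over node URLs; not part of Tire.
class NodePool
  def initialize(urls)
    @urls = urls
    @index = 0
    @lock = Mutex.new
  end

  # Return the next node URL, cycling through the list in order (thread-safe).
  def next_url
    @lock.synchronize do
      url = @urls[@index % @urls.size]
      @index += 1
      url
    end
  end
end

pool = NodePool.new(%w[http://es1:9200 http://es2:9200 http://es3:9200])
4.times { puts pool.next_url }
```
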
  18. Scaling Elasticsearch

    Max cluster size dependent on index settings:
      nbr_shards + (nbr_replicas * nbr_shards) = max_nodes
      5 + (1 * 5) = 10 nodes

    Defaults are 5 shards with 1 replica:
      number of shards is specified at index creation time and cannot be changed
      number of replicas can be changed anytime

    Index and query routing
      •Allow you to control where documents are stored and queried.
      •Without routing, all shards are queried and results are aggregated
      •With routing, you specify the shard to be queried
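
The sizing formula and the idea behind routing can both be sketched in a few lines of Ruby. `max_nodes` is the slide's formula as written. `shard_for` illustrates that a routing value is hashed and taken modulo the number of primary shards; `Zlib.crc32` is only a stand-in here, since elasticsearch uses its own internal hash function, and the routing value `'forum-42'` is made up.

```ruby
require 'zlib'

# Maximum useful node count: one node per shard copy (primaries plus replicas).
def max_nodes(shards, replicas)
  shards + (replicas * shards)
end

# Routing picks a shard as hash(routing_value) % number_of_primary_shards.
# Zlib.crc32 is a stand-in for elasticsearch's internal hash function.
def shard_for(routing_value, shards)
  Zlib.crc32(routing_value.to_s) % shards
end

puts max_nodes(5, 1)  # prints 10, matching the slide
# The same routing value always maps to the same shard, so a routed query
# only has to visit that one shard.
puts shard_for('forum-42', 5) == shard_for('forum-42', 5)  # prints true
```
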