
Finding the Needle (DjangoCon US 2013)

Ben Lopatin
September 03, 2013


The ability to find content on a site is important to users, and there are some great tools that make a sometimes tricky problem a lot simpler. This talk will address the search problem, introduce a few of the tools at your disposal, and provide a way to get started using Django Haystack.

The goal is for developers new to search to understand what's different about using a search engine as an additional service, to be aware of some of the "gotchas," and to know not just what's possible but how to get started.



Transcript

  1. (Title slide: OCR fragments of a botanical article on Persoonia levis used as background art, overlaid with the talk title "Finding the Needle" and the words "Search" and "Django")
     DjangoCon 2013 - Ben Lopatin
  2. ➜ ~ whoami
     Ben Lopatin (bennylope)
     ➜ ~ echo $HOME
     Richmond, VA / Washington, DC
     Principal and developer @ Wellfire Interactive
  3. 1. Understand the search problem
     2. Role of the search engine
     3. Nifty search features
     4. Adding search with Haystack
     5. Implementation strategies
     6. Limitations and options
  4. def make_point(point_string):
         # Returns a Point or None from a coords string
         ...

     def spatial_search_view(request):
         # Clean form, build initial searchqueryset
         bottom_left = make_point(request.GET.get('bl', ''))
         top_right = make_point(request.GET.get('tr', ''))
         if bottom_left and top_right:
             queryset = queryset.within('coords', bottom_left, top_right)
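The `make_point` helper on this slide is elided. A minimal sketch of what it might do, using plain `(x, y)` tuples instead of a real geometry `Point` class (both the parsing rules and the tuple representation are assumptions, not the speaker's implementation):

```python
def make_point(point_string):
    """Parse an "x,y" coordinate string into an (x, y) tuple, or return None.

    Stand-in for the slide's make_point; a real implementation would
    likely return a GEOS Point object rather than a tuple.
    """
    parts = point_string.split(',')
    if len(parts) != 2:
        return None
    try:
        x, y = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    return (x, y)
```

Returning `None` for anything unparseable is what lets the view's `if bottom_left and top_right:` guard skip the spatial filter when the query parameters are missing or malformed.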
  5. Field boosting
     Document boosting
     Term boosting

     def prepare(self, obj):
         data = super(ThisIndex, self).prepare(obj)
         data['_boost'] = 1.2
         return data
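The effect of a document boost like `data['_boost'] = 1.2` is to scale that document's relevance score relative to its neighbors. A toy illustration of the idea (this simplified "score times boost" model is my own; real engines fold boosts into a more involved scoring formula):

```python
def rank(results, boosts):
    """Re-rank (doc_id, base_score) pairs by applying per-document boosts.

    Toy model: final score = base score * boost (boost defaults to 1.0).
    """
    scored = [(doc, score * boosts.get(doc, 1.0)) for doc, score in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a boost of 1.2, a document scoring 0.9 (0.9 × 1.2 = 1.08) now outranks an unboosted document scoring 1.0.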
  6. ElasticSearch analysis settings

     {
         "index": {
             "analysis": {
                 "analyzer": {
                     "synonym": {
                         "tokenizer": "whitespace",
                         "filter": ["synonyms"]
                     }
                 },
                 "filter": {
                     "synonyms": {
                         "type": "synonym",
                         "synonyms_path": "analysis/synonym.txt"
                     }
                 }
             }
         }
     }
     (Slides 7-14 repeat the same analysis settings, highlighting different portions.)
  15. if current_mapping != self.existing_mapping:
          try:
              # Make sure the index is there first.
              self.conn.create_index(self.index_name, self.DEFAULT_SETTINGS)
              self.conn.put_mapping(self.index_name, 'modelresult', current_mapping)
              self.existing_mapping = current_mapping
          except Exception:
              if not self.silently_fail:
                  raise
  16. if field_class.field_type in ['date', 'datetime']:
          field_mapping['type'] = 'date'
      elif field_class.field_type == 'integer':
          field_mapping['type'] = 'long'
      elif field_class.field_type == 'float':
          field_mapping['type'] = 'float'
      elif field_class.field_type == 'boolean':
          field_mapping['type'] = 'boolean'
      elif field_class.field_type == 'ngram':
          field_mapping['analyzer'] = "ngram_analyzer"
      elif field_class.field_type == 'edge_ngram':
          field_mapping['analyzer'] = "edgengram_analyzer"
      elif field_class.field_type == 'location':
          field_mapping['type'] = 'geo_point'
      # ... code skipped here
      if field_mapping['type'] == 'string' and field_class.indexed:
          field_mapping["term_vector"] = "with_positions_offsets"
          if not hasattr(field_class, 'facet_for') and \
                  field_class.field_type not in ('ngram', 'edge_ngram'):
              field_mapping["analyzer"] = "snowball"
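The chain of `elif` branches above is essentially a lookup table from Haystack field types to ElasticSearch mapping fragments. The same translation can be sketched as a dict (a simplified stand-in covering only the types shown on the slide, not the backend's full `build_schema` logic):

```python
# Haystack field_type -> ElasticSearch mapping fragment (subset from the slide)
FIELD_MAPPINGS = {
    'date': {'type': 'date'},
    'datetime': {'type': 'date'},
    'integer': {'type': 'long'},
    'float': {'type': 'float'},
    'boolean': {'type': 'boolean'},
    'ngram': {'analyzer': 'ngram_analyzer'},
    'edge_ngram': {'analyzer': 'edgengram_analyzer'},
    'location': {'type': 'geo_point'},
}

def mapping_for(field_type):
    # Fall back to an analyzed string field, as the slide's code does implicitly
    return dict(FIELD_MAPPINGS.get(field_type, {'type': 'string'}))
```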
  17. def build_schema(self, fields):
          content_field_name, mapping = super(ConfigurableElasticBackend, self).build_schema(fields)
          for field_name, field_class in fields.items():
              field_mapping = mapping[field_class.index_fieldname]
              if field_mapping['type'] == 'string' and field_class.indexed:
                  if not hasattr(field_class, 'facet_for') and \
                          field_class.field_type not in ('ngram', 'edge_ngram'):
                      field_mapping['analyzer'] = self.DEFAULT_ANALYZER
              mapping.update({field_class.index_fieldname: field_mapping})
          return (content_field_name, mapping)
  18. def build_schema(self, fields):
          content_field_name, mapping = super(ConfigurableElasticBackend, self).build_schema(fields)
          for field_name, field_class in fields.items():
              field_mapping = mapping[field_class.index_fieldname]
              if field_mapping['type'] == 'string' and field_class.indexed:
                  if not hasattr(field_class, 'facet_for') and \
                          field_class.field_type not in ('ngram', 'edge_ngram'):
                      field_mapping['analyzer'] = getattr(field_class, 'analyzer', self.DEFAULT_ANALYZER)
              mapping.update({field_class.index_fieldname: field_mapping})
          return (content_field_name, mapping)
  19. from haystack.fields import CharField as BaseCharField

      class ConfigurableFieldMixin(object):
          def __init__(self, **kwargs):
              self.analyzer = kwargs.pop('analyzer', None)
              super(ConfigurableFieldMixin, self).__init__(**kwargs)

      class CharField(ConfigurableFieldMixin, BaseCharField):
          pass
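The mixin's trick is popping `analyzer` out of the kwargs before the base field's `__init__` sees an argument it doesn't accept. A self-contained illustration of that behavior, using a stub in place of `haystack.fields.CharField` (the stub and the `document=True` kwarg are stand-ins for this sketch):

```python
class StubBaseField:
    """Stand-in for haystack.fields.CharField: just records its kwargs."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

class ConfigurableFieldMixin:
    def __init__(self, **kwargs):
        # Pop 'analyzer' so the base field never receives an unknown kwarg
        self.analyzer = kwargs.pop('analyzer', None)
        super().__init__(**kwargs)

class CharField(ConfigurableFieldMixin, StubBaseField):
    pass

field = CharField(analyzer='synonym', document=True)
```

After construction, `field.analyzer` is `'synonym'` and the base class only ever saw `document=True`; the custom backend on the previous slide then reads that `analyzer` attribute via `getattr` when building the schema.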
  20. (Closing slide: the same Persoonia levis background text, now overlaid with "THE END")
      DjangoCon 2013 - Ben Lopatin
      ciafactbook.herokuapp.com
      tinyurl.com/finding-the-needle