Searching in Neos with Elasticsearch - InspiringCon 2015 in Kolbermoor

Content search is a core feature of almost every website, especially of bigger ones. This talk explains the general approach of proactively indexing content and highlights the supported indexing systems, such as ElasticSearch. It focuses on indexing custom data and shows how this can be integrated with data from external systems like Magento. Furthermore, we'll highlight some features that can be built on top of a flexible indexing system, such as tagging or categorization of content. You'll see that search combined with custom content types solves many use cases that previously required custom home-grown solutions.

Sebastian Kurfürst

March 28, 2015

Transcript

  1. Searching in Neos, Sebastian Kurfürst (@skurfuerst)
  2. Sebastian Kurfürst @skurfuerst

  3. None
  4. exply.io Enterprise Search meets Business Intelligence

  5. None
  6. None
  7. Tree of Nodes (diagram): neostypo3org (Page), features (Page), main (ContentCollection), … (Headline), … (Text), roadmap (Page), unsere-codesprints (Page); language variants per node: de, en
  8. TYPO3CR is great for Tree Traversal

  9. None
  10. Find all articles written by Sebastian.

    Display the first three locations tagged with ConferenceLocation.
    What are the newest pages in a certain category?
  11. (diagram) neostypo3org (Page), features (Page), main (ContentCollection), … (Headline), … (Text), roadmap (Page), unsere-codesprints (Page)
  12. (diagram) neostypo3org (Page), features (Page), main (ContentCollection), … (Headline), … (Text), roadmap (Page), unsere-codesprints (Page)
  13. All Documents & Content Currently Relevant Content

  14. Currently, TYPO3CR does not yet effectively provide this set-based view on nodes.
  15. to the rescue!

  16. Getting Started

  17. 1. Set up ElasticSearch

    # ElasticSearch 1.4.4 - config/elasticsearch.yml
    script.disable_dynamic: sandbox
    script.groovy.sandbox.class_whitelist: java.util.LinkedHashMap
    script.groovy.sandbox.receiver_whitelist: java.util.Iterator, java.lang.Object, java.util.Map, java.util.Map$Entry
    script.groovy.sandbox.enabled: true
    cluster.name: [PUT_YOUR_CUSTOM_NAME_HERE]
    network.host: 127.0.0.1
    index.number_of_shards: 1
    index.number_of_replicas: 0
  18. 2. Start ElasticSearch bin/elasticsearch

  19. 2. Require the CR Adaptor

    composer require --prefer-source typo3/typo3cr-search @dev
    composer require --prefer-source flowpack/elasticsearch-contentrepositoryadaptor @dev
    TODO: no @dev anymore!
  20. 3. Indexing

    ./flow nodeindex:build

  21. 4. Debugging Tools http://localhost:9200/_plugin/head/ http://localhost:9200/_plugin/sense/

  22. None
  23. None
  24. None
  25. composer require --prefer-source flowpack/searchplugin @dev

  26. None
  27. This is fulltext search.

  28. None
  29. Node Querying

  30. Article Category Tag contains tagged with

  31. 1. Node References

    # build up relation in NodeTypes.yaml
    'Sandstorm.News:Article':
      superTypes: ['TYPO3.Neos:Document']
      ...
      properties:
        tags:
          type: references
          ui:
            label: 'Tags'
            inspector:
              editorOptions:
                # allow only references to tags
                nodeTypes: ['Sandstorm.News:Tag']
  32. 2. Query in TypoScript

    # replace main content area by a custom TypoScript object
    prototype(PrimaryContent).newsTag {
      condition = ${q(node).is('[instanceof Sandstorm.News:Tag]')}
      type = 'Sandstorm.News:Tag'
    }
  33. 2. Query in TypoScript

    # inherits from Template by default
    prototype(Sandstorm.News:Tag) {
      latestArticlesTaggedWithTag = ${...}
    }
  34. 2. Query in TypoScript

    latestArticlesTaggedWithTag = ${Search.query(site)  # search underneath this site
      .nodeType('Sandstorm.News:Article')                # filter by node type
      .exactMatch('tags', node)                          # where tag == current tag
      .limit(3)                                          # first 3 results
      .sortDesc('publishDate')                           # and sort by publishing date desc
      .execute()}
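
To make the meaning of the chained query concrete, here is a minimal in-memory sketch in Python. It is only an analogy, not how the Search helper or ElasticSearch works internally; the `articles` list, its field names and the function `latest_articles_tagged_with` are hypothetical stand-ins for indexed nodes.

```python
from datetime import date

# Hypothetical stand-ins for indexed nodes; the field names are assumptions.
articles = [
    {"type": "Sandstorm.News:Article", "title": "A", "tags": ["neos"], "publishDate": date(2015, 3, 1)},
    {"type": "Sandstorm.News:Article", "title": "B", "tags": ["neos", "flow"], "publishDate": date(2015, 3, 20)},
    {"type": "Sandstorm.News:Article", "title": "C", "tags": ["flow"], "publishDate": date(2015, 3, 25)},
    {"type": "Sandstorm.News:Article", "title": "D", "tags": ["neos"], "publishDate": date(2015, 2, 10)},
    {"type": "Sandstorm.News:Article", "title": "E", "tags": ["neos"], "publishDate": date(2015, 3, 28)},
    {"type": "TYPO3.Neos.NodeTypes:Page", "title": "F", "tags": ["neos"], "publishDate": date(2015, 3, 30)},
]


def latest_articles_tagged_with(nodes, tag, limit=3):
    """Filter by node type and tag, sort by publishDate descending, keep the first `limit`."""
    matching = [
        n for n in nodes
        if n["type"] == "Sandstorm.News:Article" and tag in n["tags"]
    ]
    matching.sort(key=lambda n: n["publishDate"], reverse=True)
    return matching[:limit]


for article in latest_articles_tagged_with(articles, "neos"):
    print(article["title"], article["publishDate"])
```

Note that the sketch sorts before it truncates. On the slide `.limit(3)` is written before `.sortDesc(...)`, but the intent is "the three newest", so the order of the two calls in the chain presumably does not matter.
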
  35. 3. Use in Template

    <f:for each="{latestArticlesTaggedWithTag}" as="singleArticle">
      <neos:link.node node="{singleArticle}">
        <!-- Render single article as you would like -->
      </neos:link.node>
    </f:for>
  36. ${Search.query(site)
      .fulltext('Alice')
      .execute()}

  37. Node References together with Search Queries

  38. None
  39. ElasticSearch Core Concepts

  40. Normalized Data in a relational DB (diagram: A, T1, T2, T3)

    Denormalized Data in an index (diagram: A T1, A T2, A T3)
  41. GET Index/Type/Document-ID

  42. GET Index/Type/_mapping

  43. GET Index/Type/_mapping _all

  44. Index Aliases allow index rebuilds! (alias: typo3cr; indices: typo3cr-1426882860, typo3cr-1426885219)
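
As a rough illustration of why aliases make rebuilds painless: a new index is filled in the background, and then the alias is moved in one request to ElasticSearch's `_aliases` endpoint. The Python sketch below only builds that request body; the helper name `alias_swap_actions` is made up here, and whether the CR Adaptor issues exactly this request is an assumption.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build a body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Index names taken from the slide; the lower timestamp is assumed to be the older index.
body = alias_swap_actions("typo3cr", "typo3cr-1426882860", "typo3cr-1426885219")
print(json.dumps(body, indent=2))
```

Queries keep addressing `typo3cr` the whole time, so readers never see a half-built index.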

  45. Indexing Pipeline: Hey, InspiringCon 2015 -> Tokenization -> Hey | InspiringCon | 2015 -> Token Filtering -> hey | inspiringcon | 2015

    Search Pipeline: InspiringCon 2015 -> InspiringCon | 2015 -> inspiringcon | 2015
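
The two pipelines can be imitated in a few lines of Python. This is a toy model under simple assumptions (split on non-word characters, then lowercase), not ElasticSearch's actual analyzer; the function names are invented for the sketch.

```python
import re


def tokenize(text):
    """Split text into runs of word characters (a rough stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: normalize every token to lower case."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run tokenization followed by token filtering."""
    return lowercase_filter(tokenize(text))


index_terms = analyze("Hey, InspiringCon 2015")   # indexing pipeline
search_terms = analyze("InspiringCon 2015")       # search pipeline
print(index_terms)
print(search_terms)
print(all(term in index_terms for term in search_terms))
```

The point of the slide is that the same analysis runs on both sides, so the search terms can match the indexed terms regardless of case or punctuation.
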
  46. None
  47. ${Search.query(site)
      .fulltext('Alice')
      .log()
      .execute()}

    Data/Logs/ElasticSearch.log:
    15-03-23 07:20:50 1820 DEBUG Query Log (): {"query":{"filtered":{"query":{"bool":{"must":[{"match_all":[]},{"query_string":{"query":"Alice"}}]}},"filter":{"bool":{"must":[{"term":{"__parentPath":"\/sites\/neosdemotypo3org"}},{"terms":{"__workspace":["live"]}}],"should":[],"must_not":[{"term":{"_hidden":true}},{"range":{"_hiddenBeforeDateTime":{"gt":"now"}}},{"range":{"_hiddenAfterDateTime":{"lt":"now"}}}]}}}},"fields":["__path"],"highlight":{"fields":{"__fulltext*":{"fragment_size":150,"no_match_size":150,"number_of_fragments":2}}}} -- execution time: 10.998010635376 ms -- Total Results: 28
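
The logged request is easier to read when rebuilt as a nested structure. The following Python sketch reproduces its shape (minus the `highlight` part, left out for brevity); the function `build_fulltext_query` is hypothetical and is not the adaptor's code, and it writes `match_all` as an empty object where the log shows `[]`.

```python
import json


def build_fulltext_query(search_word, site_path, workspaces=("live",)):
    """Rebuild the structure of the logged ElasticSearch 1.x 'filtered' query."""
    return {
        "query": {
            "filtered": {
                # Scoring part: what the user searched for.
                "query": {
                    "bool": {
                        "must": [
                            {"match_all": {}},
                            {"query_string": {"query": search_word}},
                        ]
                    }
                },
                # Non-scoring part: site, workspace and visibility restrictions.
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"__parentPath": site_path}},
                            {"terms": {"__workspace": list(workspaces)}},
                        ],
                        "should": [],
                        "must_not": [
                            {"term": {"_hidden": True}},
                            {"range": {"_hiddenBeforeDateTime": {"gt": "now"}}},
                            {"range": {"_hiddenAfterDateTime": {"lt": "now"}}},
                        ],
                    }
                },
            }
        },
        "fields": ["__path"],
    }


request = build_fulltext_query("Alice", "/sites/neosdemotypo3org")
print(json.dumps(request, indent=2))
```

Read this way, `Search.query(site)` shows up as the `__parentPath` term filter, and the `must_not` clauses exclude hidden nodes and nodes outside their visibility window.
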
  48. None
  49. None
  50. Aggregations calculate statistical information about the current result. TODO Kibana Screenshot
  51. Aggregations calculate statistical information about the current result.
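
A small Python sketch of the idea, under stated assumptions: `count_by` does in memory what a terms aggregation does over the current result set, and `terms_aggregation` builds the corresponding ElasticSearch request fragment. The `results` data, the `category` field and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical current search result; "category" is an assumed field name.
results = [
    {"title": "Neos 2.0 released", "category": "news"},
    {"title": "Code sprint recap", "category": "blog"},
    {"title": "InspiringCon 2015", "category": "news"},
]


def count_by(documents, field):
    """Count how many documents fall into each distinct value of `field`."""
    return Counter(doc[field] for doc in documents)


def terms_aggregation(name, field):
    """Build the 'aggs' fragment that asks ElasticSearch for the same counts."""
    return {"aggs": {name: {"terms": {"field": field}}}}


print(count_by(results, "category"))
print(terms_aggregation("categories", "category"))
```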

  52. Fine-Tuning ElasticSearch+Neos

  53. 1. ElasticSearch Schema

  54. 1. ElasticSearch Schema

    # Settings.yaml (defaults)
    TYPO3:
      TYPO3CR:
        Search:
          defaultConfigurationPerType:
            string:
              elasticSearchMapping:
                type: string
                include_in_all: false
            boolean:
              elasticSearchMapping:
                type: boolean
            date:
              elasticSearchMapping:
                type: date
                format: 'date_time_no_millis'
                include_in_all: false

    # NodeTypes.yaml (overrides)
    'TYPO3.Neos:Node': &node
      properties:
        '__identifier':
          search:
            elasticSearchMapping:
              type: string
              index: not_analyzed
              include_in_all: false
            indexing: '${node.identifier}'
  55. 2. ElasticSearch Indexing

    indexing: '${Indexing.buildAllPathPrefixes(node.parentPath)}'
    indexing: '${node.identifier}'
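
What indexing all path prefixes buys you can be shown with a small Python re-imagining. This is not the code of `Indexing.buildAllPathPrefixes`; in particular, whether the real helper also emits the root path `/` for deeper paths is not stated on the slide, and this sketch leaves it out. The example path is made up.

```python
def build_all_path_prefixes(path):
    """Return every ancestor path of `path`, ending with the path itself."""
    if path == "":
        return []
    if path == "/":
        return ["/"]
    prefixes = []
    current = ""
    for part in path.strip("/").split("/"):
        current += "/" + part
        prefixes.append(current)
    return prefixes


prefixes = build_all_path_prefixes("/sites/neosdemotypo3org/features")
print(prefixes)
```

With every ancestor path stored on a node, "everything underneath this site" becomes a plain exact-match filter on one of the stored values, with no tree traversal at query time.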

  56. 3. Fulltext Searching (diagram): the Fulltext Root, We at InspiringCon (Article), collects all content: main (ContentCollection), … (Headline), … (Text)

  57. 3. Fulltext Searching (diagram): We at InspiringCon (Article), main (ContentCollection), … (Headline), … (Text); buckets: h1, h2, ..., text
  58. 3. Fulltext Searching

    # predefined in Neos
    'TYPO3.Neos:Document':
      search:
        fulltext:
          isRoot: true
    'TYPO3.Neos.NodeTypes:Text':
      properties:
        'text':
          search:
            fulltextExtractor: '${Indexing.extractHtmlTags(value)}'
    'Sandstorm.News:Article':
      properties:
        'title':
          search:
            fulltextExtractor: '${Indexing.extractInto("h1", value)}'
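
The bucket idea behind the two extractors can be sketched in Python. These functions are loose imitations written for this transcript, not the implementations of `Indexing.extractHtmlTags` or `Indexing.extractInto`; the assumption is that heading text lands in a bucket named after its tag and everything else lands in `text`.

```python
from html.parser import HTMLParser


class BucketExtractor(HTMLParser):
    """Collect text per bucket: h1..h6 for headings, 'text' for everything else."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.buckets = {}
        self._current = "text"

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = tag

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._current = "text"

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.buckets.setdefault(self._current, []).append(data)


def extract_html_tags(html):
    """Sort the text content of an HTML fragment into fulltext buckets."""
    parser = BucketExtractor()
    parser.feed(html)
    parser.close()
    return {bucket: " ".join(parts) for bucket, parts in parser.buckets.items()}


def extract_into(bucket, value):
    """Put a plain value into one named bucket."""
    return {bucket: value}


print(extract_html_tags("<h2>Venue</h2><p>We meet in Kolbermoor.</p>"))
print(extract_into("h1", "We at InspiringCon"))
```

Keeping headings in separate buckets is presumably what allows a match in a title to count for more than a match in body text.
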
  59. Indexing Additional Data

  60. Index Aliases allow linking multiple indices. (alias: search; indices: typo3cr-1426882860, products-2976886808)

  61. ElasticSearch Rivers can poll data from other systems.

  62. None
  63. ElasticSearch too big for your project? Use SimpleSearch!

    composer require --prefer-source flowpack/simplesearch-contentrepositoryadaptor @dev
  64. None
  65. Resources http://www.elasticsearch.org/guide/ README of Flowpack.ElasticSearch.ContentRepositoryAdaptor

  66. Thank You!

  67. None