Searching in Neos with Elasticsearch - InspiringCon 2015 in Kolbermoor

Content search is a core feature of almost every website, especially larger ones. This talk explains the general approach of proactively indexing content and highlights the supported indexing systems, such as ElasticSearch. It focuses especially on indexing custom data and shows how to integrate it with data from external systems like Magento. We'll also highlight some features that can be built on top of a flexible indexing system, such as tagging or categorizing content. You'll see that search combined with custom content types solves many use cases that previously required custom, home-grown solutions.

Sebastian Kurfürst

March 28, 2015

Transcript

  1. Searching in Neos
    Sebastian Kurfürst
    @skurfuerst

  2. Sebastian
    Kurfürst
    @skurfuerst

  4. exply.io
    Enterprise Search meets Business Intelligence

  8. Tree of Nodes
    neostypo3org (Page)
      features (Page)
        main (ContentCollection)
          … (Headline)
          … (Text)
      roadmap (Page)
      unsere-codesprints (Page)
    Nodes exist in one or more language variants (de, en); unsere-codesprints exists in de only.

  9. TYPO3CR is great for Tree Traversal

  11. Find all articles written by Sebastian.
    Display the first three locations tagged with ConferenceLocation.
    What are the newest pages in a certain category?

  12. features (Page)
    main (ContentCollection)
    … (Headline)
    … (Text)
    roadmap (Page)
    neostypo3org (Page)
    unsere-codesprints (Page)

  13. features (Page)
    main (ContentCollection)
    … (Headline)
    … (Text)
    roadmap (Page)
    neostypo3org (Page)
    unsere-codesprints (Page)

  14. All Documents & Content
    Currently Relevant Content

  15. Currently, TYPO3CR does not yet effectively provide this set-based view on nodes.

  16. to the rescue!

  17. Getting Started

  18. 1. Set up ElasticSearch
    # ElasticSearch 1.4.4 - config/elasticsearch.yml
    script.disable_dynamic: sandbox
    script.groovy.sandbox.class_whitelist: java.util.LinkedHashMap
    script.groovy.sandbox.receiver_whitelist: java.util.Iterator, java.lang.Object, java.util.Map, java.util.Map$Entry
    script.groovy.sandbox.enabled: true
    cluster.name: [PUT_YOUR_CUSTOM_NAME_HERE]
    network.host: 127.0.0.1
    index.number_of_shards: 1
    index.number_of_replicas: 0

  19. 2. Start ElasticSearch
    bin/elasticsearch

  20. 2. Require the CR Adaptor
    composer require --prefer-source typo3/typo3cr-search @dev
    composer require --prefer-source flowpack/elasticsearch-contentrepositoryadaptor @dev
    TODO: no @dev anymore!

  21. 3. Indexing
    ./flow nodeindex:build

  22. 4. Debugging Tools
    http://localhost:9200/_plugin/head/
    http://localhost:9200/_plugin/sense/

  26. composer require --prefer-source flowpack/searchplugin @dev

  28. This is fulltext search.

  30. Node Querying

  31. Article, Category, Tag
    Category contains Article; Article tagged with Tag

  32. 1. Node References
    # build up relation in NodeTypes.yaml
    'Sandstorm.News:Article':
      superTypes: ['TYPO3.Neos:Document']
      ...
      properties:
        tags:
          type: references
          ui:
            label: 'Tags'
            inspector:
              editorOptions:
                # allow only references to tags
                nodeTypes: ['Sandstorm.News:Tag']

  33. 2. Query in TypoScript
    # replace main content area with a custom TypoScript object
    prototype(PrimaryContent).newsTag {
      condition = ${q(node).is('[instanceof Sandstorm.News:Tag]')}
      type = 'Sandstorm.News:Tag'
    }

  34. 2. Query in TypoScript
    # inherits from Template by default
    prototype(Sandstorm.News:Tag) {
      latestArticlesTaggedWithTag = ${...}
    }

  35. 2. Query in TypoScript
    latestArticlesTaggedWithTag =
      ${Search.query(site) # search underneath this site
        .nodeType('Sandstorm.News:Article') # filter by node type
        .exactMatch('tags', node) # where tags contain the current tag
        .limit(3) # first 3 results
        .sortDesc('publishDate') # and sort by publishing date desc
        .execute()}
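
Roughly speaking, a chain like this is translated into an Elasticsearch request body. The sketch below builds a comparable body in Python. The `__parentPath` and `__workspace` filters are copied from the query log shown later in this deck; the `__typeAndSupertypes` field for `nodeType()`, the plain term filter for `exactMatch()`, and the tag identifier are assumptions. It approximates the idea and is not the adaptor's actual output.

```python
import json


def build_tagged_articles_query(site_path, node_type, tag_identifier, limit, sort_field):
    """Approximate Elasticsearch 1.x request body for the Eel query chain above."""
    must = [
        {"term": {"__parentPath": site_path}},         # Search.query(site)
        {"terms": {"__workspace": ["live"]}},          # only the live workspace
        {"term": {"__typeAndSupertypes": node_type}},  # .nodeType(...) (assumed field name)
        {"term": {"tags": tag_identifier}},            # .exactMatch('tags', node)
    ]
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": must}},
            }
        },
        "size": limit,                              # .limit(3)
        "sort": [{sort_field: {"order": "desc"}}],  # .sortDesc('publishDate')
    }


body = build_tagged_articles_query(
    "/sites/neosdemotypo3org",      # site path as it appears in the query log
    "Sandstorm.News:Article",
    "hypothetical-tag-identifier",  # placeholder for the current tag node's identifier
    3,
    "publishDate",
)
print(json.dumps(body, indent=2))
```

All conditions are filters, so they only narrow the result set without affecting scoring; ordering comes entirely from the explicit sort.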

  36. 3. Use in Template

  37. ${Search.query(site)
        .fulltext('Alice')
        .execute()}

  38. Node References together with Search Queries

  40. ElasticSearch Core Concepts

  41. Normalized Data in a relational DB: A linked to T1, T2, T3
    Denormalized Data in an index: A T1 | A T2 | A T3
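
A minimal sketch of denormalization with hypothetical data: the relational side keeps articles, tags and a join table apart, while the index stores one document per article with the related tag data copied in. The function and field names are illustrative only.

```python
def denormalize(articles, tags, article_tags):
    """Build one index document per article with its tag titles copied in."""
    tag_titles = {tag["id"]: tag["title"] for tag in tags}
    documents = []
    for article in articles:
        # Resolve the join table once, at indexing time ...
        tag_ids = [tag_id for article_id, tag_id in article_tags if article_id == article["id"]]
        documents.append({
            "id": article["id"],
            "title": article["title"],
            # ... so that searching needs no join: the tag data lives in the document.
            "tags": [tag_titles[tag_id] for tag_id in tag_ids],
        })
    return documents


# Hypothetical normalized data: article A and tags T1..T3, linked via a join table.
articles = [{"id": "A", "title": "We at InspiringCon"}]
tags = [{"id": "T1", "title": "Neos"}, {"id": "T2", "title": "Flow"}, {"id": "T3", "title": "Search"}]
article_tags = [("A", "T1"), ("A", "T2"), ("A", "T3")]

documents = denormalize(articles, tags, article_tags)
print(documents)
```

The trade-off: queries get simple and fast, but renaming a tag means reindexing every document that copied it.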

  42. GET Index/Type/Document-ID

  43. GET Index/Type/_mapping

  44. GET Index/Type/_mapping
    _all

  45. Index Aliases allow index rebuilds!
    typo3cr (alias) points to typo3cr-1426882860 or typo3cr-1426885219
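
The index names look like Unix timestamps, which suggests the pattern: build a fresh index, then repoint the alias in one `POST /_aliases` request. The `_aliases` endpoint with its `remove`/`add` actions is standard Elasticsearch; the helper names below are hypothetical, and this is a sketch of the idea rather than the adaptor's code.

```python
import json
import time


def new_index_name(alias, now=None):
    """Timestamped index name following the typo3cr-<unix time> pattern on the slide."""
    return f"{alias}-{int(now if now is not None else time.time())}"


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases; Elasticsearch applies all actions in one atomic step."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


new = new_index_name("typo3cr", now=1426885219)
body = alias_swap_actions("typo3cr", "typo3cr-1426882860", new)
print(json.dumps(body))
```

Because both actions run in one request, searches against `typo3cr` see either the old or the new index, never a half-built one.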

  46. Indexing Pipeline
      Hey, InspiringCon 2015
      Tokenization: Hey | InspiringCon | 2015
      Token Filtering: hey | inspiringcon | 2015
    Search Pipeline
      InspiringCon 2015 → inspiringcon | 2015
      InspiringCon2015
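
A toy version of both pipelines, showing why the same analysis has to run at index time and at search time. Assumptions: tokenization splits on anything that is not an ASCII letter or digit, and the only token filter is lowercasing; Elasticsearch's real analyzers are more sophisticated. `analyze`, `build_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict


def analyze(text):
    """Tokenization (split on non-alphanumerics), then a lowercase token filter."""
    tokens = re.findall(r"[0-9A-Za-z]+", text)
    return [token.lower() for token in tokens]


def build_index(docs):
    """Inverted index: token -> set of document ids (the indexing pipeline)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """The query runs through the same analyzer; every token must match (AND)."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


idx = build_index({"greeting": "Hey, InspiringCon 2015"})
print(analyze("Hey, InspiringCon 2015"))  # ['hey', 'inspiringcon', '2015']
print(search(idx, "InspiringCon 2015"))   # {'greeting'}
print(search(idx, "InspiringCon2015"))    # set()
```

With this toy analyzer, `InspiringCon2015` becomes the single token `inspiringcon2015`, which never appears in the index, so the query finds nothing.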

  48. ${Search.query(site)
        .fulltext('Alice')
        .log()
        .execute()}
    Data/Logs/ElasticSearch.log:
    15-03-23 07:20:50 1820 DEBUG Query Log (): {"query":{"filtered":{"query":{"bool":{"must":[{"match_all":[]},{"query_string":{"query":"Alice"}}]}},"filter":{"bool":{"must":[{"term":{"__parentPath":"\/sites\/neosdemotypo3org"}},{"terms":{"__workspace":["live"]}}],"should":[],"must_not":[{"term":{"_hidden":true}},{"range":{"_hiddenBeforeDateTime":{"gt":"now"}}},{"range":{"_hiddenAfterDateTime":{"lt":"now"}}}]}}}},"fields":["__path"],"highlight":{"fields":{"__fulltext*":{"fragment_size":150,"no_match_size":150,"number_of_fragments":2}}}} -- execution time: 10.998010635376 ms -- Total Results: 28
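
When a query returns unexpected results, the logged request body can be pasted into Sense and tweaked there. A minimal sketch that pulls the JSON body, execution time and hit count out of such a line; it assumes every Query Log line has exactly the shape shown on this slide, and `parse_query_log` is a hypothetical helper.

```python
import json
import re

# Shape of a "Query Log" line in Data/Logs/ElasticSearch.log, as shown on the slide.
LOG_PATTERN = re.compile(
    r"Query Log \(\): (?P<body>\{.*\}) -- execution time: (?P<ms>[\d.]+) ms"
    r" -- Total Results: (?P<total>\d+)"
)


def parse_query_log(line):
    """Split a logged query into (request body, execution time in ms, total hits)."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("not a Query Log line")
    return (
        json.loads(match.group("body")),
        float(match.group("ms")),
        int(match.group("total")),
    )


# The log line from the slide, split into raw-string pieces for readability.
line = (
    r'15-03-23 07:20:50 1820 DEBUG Query Log (): {"query":{"filtered":{"query":{"bool":'
    r'{"must":[{"match_all":[]},{"query_string":{"query":"Alice"}}]}},"filter":{"bool":'
    r'{"must":[{"term":{"__parentPath":"\/sites\/neosdemotypo3org"}},{"terms":'
    r'{"__workspace":["live"]}}],"should":[],"must_not":[{"term":{"_hidden":true}},'
    r'{"range":{"_hiddenBeforeDateTime":{"gt":"now"}}},{"range":'
    r'{"_hiddenAfterDateTime":{"lt":"now"}}}]}}}},"fields":["__path"],"highlight":'
    r'{"fields":{"__fulltext*":{"fragment_size":150,"no_match_size":150,'
    r'"number_of_fragments":2}}}} -- execution time: 10.998010635376 ms -- Total Results: 28'
)

body, ms, total = parse_query_log(line)
print(total, ms)                   # 28 10.998010635376
print(json.dumps(body, indent=2))  # pretty-printed request, ready to paste into Sense
```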

  51. Aggregations calculate statistical information about the current result.
    TODO Kibana Screenshot

  52. Aggregations calculate statistical information about the current result.
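
Aggregations are requested alongside the query. A minimal sketch, assuming a `tags` field like the one from the node reference slides: `terms_aggregation` builds a standard Elasticsearch `terms` aggregation fragment, and `terms_buckets` is a hypothetical local helper (not an Elasticsearch or Neos API) that reproduces the kind of per-value counts such an aggregation reports for the current result.

```python
from collections import Counter


def terms_aggregation(field, size=10):
    """Request fragment for an Elasticsearch terms aggregation on `field`."""
    return {"aggs": {"by_" + field: {"terms": {"field": field, "size": size}}}}


def terms_buckets(hits, field):
    """Local illustration of a terms aggregation: how many hits carry each value."""
    counts = Counter(value for hit in hits for value in set(hit.get(field, [])))
    # Highest count first; ties sorted by key so the output is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


# Hypothetical result set of three articles.
hits = [
    {"title": "A", "tags": ["neos", "search"]},
    {"title": "B", "tags": ["neos"]},
    {"title": "C", "tags": ["flow", "neos"]},
]
print(terms_aggregation("tags"))
print(terms_buckets(hits, "tags"))
```

Counts like these are what a tag cloud or a category filter with result counts would be built from.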

  53. Fine-Tuning ElasticSearch+Neos

  54. 1. ElasticSearch Schema

  55. 1. ElasticSearch Schema
    Settings.yaml (defaults):
    TYPO3:
      TYPO3CR:
        Search:
          defaultConfigurationPerType:
            string:
              elasticSearchMapping:
                type: string
                include_in_all: false
            boolean:
              elasticSearchMapping:
                type: boolean
            date:
              elasticSearchMapping:
                type: date
                format: 'date_time_no_millis'
                include_in_all: false

    NodeTypes.yaml (overrides):
    'TYPO3.Neos:Node': &node
      properties:
        '__identifier':
          search:
            elasticSearchMapping:
              type: string
              index: not_analyzed
              include_in_all: false
            indexing: '${node.identifier}'
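
Settings.yaml supplies a default mapping per property type, and NodeTypes.yaml can override it per property. A small sketch of that layering, assuming overrides are deep-merged over the type default; `deep_merge` and `effective_mapping` are hypothetical helpers, not the package's code.

```python
import copy

# Defaults per property type, mirroring defaultConfigurationPerType from the slide.
DEFAULTS_PER_TYPE = {
    "string": {"type": "string", "include_in_all": False},
    "boolean": {"type": "boolean"},
    "date": {"type": "date", "format": "date_time_no_millis", "include_in_all": False},
}


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in; override wins on conflicts."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def effective_mapping(property_type, override=None):
    """Type default first, then the per-property override from NodeTypes.yaml."""
    return deep_merge(DEFAULTS_PER_TYPE.get(property_type, {}), override or {})


identifier_mapping = effective_mapping("string", {"index": "not_analyzed"})
print(identifier_mapping)
```

`not_analyzed` keeps identifiers as single exact terms, which is what term filters such as `exactMatch` need.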

  56. 2. ElasticSearch Indexing
    indexing: '${Indexing.buildAllPathPrefixes(node.parentPath)}'
    indexing: '${node.identifier}'
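
Why index all path prefixes? The logged query on slide 48 restricts results to a site with a single term filter on `__parentPath`. That works if every node stores all of its ancestor paths, not just its direct parent. A Python sketch of the idea; it is a hypothetical re-implementation, not the code of `Indexing.buildAllPathPrefixes`, and whether `/` itself is included is an assumption.

```python
def build_all_path_prefixes(path):
    """'/a/b/c' -> ['/', '/a', '/a/b', '/a/b/c'] (sketch of the idea only)."""
    segments = [segment for segment in path.split("/") if segment]
    prefixes = ["/"]
    for i in range(1, len(segments) + 1):
        prefixes.append("/" + "/".join(segments[:i]))
    return prefixes


def is_below(indexed_prefixes, ancestor_path):
    """A single exact term match on the stored prefixes finds every descendant."""
    return ancestor_path in indexed_prefixes


# Hypothetical Text node whose parent path is /sites/neosdemotypo3org/features/main.
prefixes = build_all_path_prefixes("/sites/neosdemotypo3org/features/main")
print(prefixes)
print(is_below(prefixes, "/sites/neosdemotypo3org"))  # True
```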

  57. 3. Fulltext Searching
    We at InspiringCon (Article) = Fulltext Root, collects all content below it:
      main (ContentCollection)
        … (Headline)
        … (Text)

  58. 3. Fulltext Searching
    We at InspiringCon (Article)
      main (ContentCollection)
        … (Headline)
        … (Text)
    sorted into: h1 h2 ... text

  59. 3. Fulltext Searching
    # predefined in Neos
    'TYPO3.Neos:Document':
      search:
        fulltext:
          isRoot: true
    'TYPO3.Neos.NodeTypes:Text':
      properties:
        'text':
          search:
            fulltextExtractor: '${Indexing.extractHtmlTags(value)}'
    'Sandstorm.News:Article':
      properties:
        'title':
          search:
            fulltextExtractor: '${Indexing.extractInto("h1", value)}'
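
The extractor settings sort each node's content into buckets (`h1`, `h2`, ..., `text`) that the fulltext root collects. A Python sketch of that behaviour under stated assumptions: `extract_html_tags` and `extract_into` are hypothetical stand-ins for `Indexing.extractHtmlTags` and `Indexing.extractInto`, and the bucket naming and concatenation rules are guesses, not the package's implementation.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class BucketExtractor(HTMLParser):
    """Sorts text into buckets: heading text by tag name, all other text into 'text'."""

    def __init__(self):
        super().__init__()
        self.buckets = {}
        self._headings = []  # stack of currently open heading tags

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._headings.append(tag)

    def handle_endtag(self, tag):
        if self._headings and self._headings[-1] == tag:
            self._headings.pop()

    def handle_data(self, data):
        data = data.strip()
        if data:
            bucket = self._headings[-1] if self._headings else "text"
            self.buckets.setdefault(bucket, []).append(data)


def extract_html_tags(value):
    """Hypothetical stand-in for Indexing.extractHtmlTags(value)."""
    parser = BucketExtractor()
    parser.feed(value)
    parser.close()
    return {bucket: " ".join(parts) for bucket, parts in parser.buckets.items()}


def extract_into(bucket, value):
    """Hypothetical stand-in for Indexing.extractInto(bucket, value)."""
    return {bucket: value}


def collect_fulltext(extractions):
    """What the fulltext root ends up with: every bucket of every node below it."""
    root = {}
    for extraction in extractions:
        for bucket, text in extraction.items():
            root[bucket] = f"{root[bucket]} {text}" if bucket in root else text
    return root


article = collect_fulltext([
    extract_into("h1", "We at InspiringCon"),                        # Article title
    extract_html_tags("<h2>Day one</h2><p>Talks about search</p>"),  # Text node
])
print(article)
```

Keeping headings in separate buckets lets a search boost hits in `h1` over hits in body text.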

  60. Indexing Additional Data

  61. Index Aliases allow linking multiple indices:
    search (alias) points to typo3cr-1426882860 and products-2976886808
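
One alias can point at several indices, so a single search spans Neos content and, for example, imported product data. A minimal sketch of the standard `POST /_aliases` body for that; `alias_for_indices` is a hypothetical helper.

```python
import json


def alias_for_indices(alias, indices):
    """POST /_aliases body that points one alias at several indices."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


body = alias_for_indices("search", ["typo3cr-1426882860", "products-2976886808"])
print(json.dumps(body))
```

A query sent to `search/_search` is then executed against both indices.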

  62. ElasticSearch Rivers can poll data from other systems.

  64. ElasticSearch too big for your project? Use SimpleSearch!
    composer require --prefer-source flowpack/simplesearch-contentrepositoryadaptor @dev

  66. Resources
    http://www.elasticsearch.org/guide/
    README of Flowpack.ElasticSearch.ContentRepositoryAdaptor

  67. Thank You!
