Searching in Neos with Elasticsearch - InspiringCon 2015 in Kolbermoor

Content search is a core feature of almost every website, especially larger ones. This talk explains the general approach of proactively indexing content and highlights the supported indexing systems, such as ElasticSearch. It focuses on indexing custom data and shows how this can be integrated with data from external systems like Magento. Furthermore, we'll highlight some features that can be built on a flexible indexing system, such as tagging or categorization of content. You'll see that search combined with custom content types solves many use cases that previously required home-grown solutions.

Sebastian Kurfürst

March 28, 2015

Transcript

  1. Searching in Neos
    Sebastian Kurfürst
    @skurfuerst

  2. Sebastian
    Kurfürst
    @skurfuerst

  3. exply.io
    Enterprise Search meets
    Business Intelligence

  4. Tree of Nodes
    features (Page)
    main (ContentCollection)
    … (Headline)
    … (Text)
    roadmap (Page)
    neostypo3org (Page)
    unsere-codesprints (Page) de
    [Figure: node tree; each node is labeled with the languages it exists in, de and/or en]

  5. TYPO3CR is great for
    Tree Traversal

  6. Find all articles written by Sebastian.
    Display the first three locations tagged with
    ConferenceLocation.
    What are the newest pages in a certain category?

  7. features (Page)
    main (ContentCollection)
    … (Headline)
    … (Text)
    roadmap (Page)
    neostypo3org (Page)
    unsere-codesprints (Page)

  8. features (Page)
    main (ContentCollection)
    … (Headline)
    … (Text)
    roadmap (Page)
    neostypo3org (Page)
    unsere-codesprints (Page)

  9. All Documents & Content
    Currently Relevant Content

  10. Currently, TYPO3CR does
    not yet effectively provide
    this set-based view on nodes.
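
To illustrate the difference between tree traversal and a set-based view, here is a minimal Python sketch. The `Node` class, the `author` property and the sample data are hypothetical stand-ins, not TYPO3CR APIs; the point is only that a flat index turns questions like "all pages written by Sebastian" into a simple filter.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a content repository node."""
    name: str
    node_type: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def flatten(node):
    """Walk the tree once and yield every node (what an indexer does up front)."""
    yield node
    for child in node.children:
        yield from flatten(child)


# Made-up sample data, loosely following the node names on the slides.
site = Node("neostypo3org", "Page", children=[
    Node("features", "Page", {"author": "Sebastian"}),
    Node("roadmap", "Page", {"author": "Alice"}),
    Node("unsere-codesprints", "Page", {"author": "Sebastian"}),
])

# Set-based question: "find all pages written by Sebastian".
# Against a flat index this is a filter, not a hand-written traversal.
index = list(flatten(site))
by_sebastian = [
    n.name for n in index
    if n.node_type == "Page" and n.properties.get("author") == "Sebastian"
]
print(by_sebastian)
```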

  11. ElasticSearch to the rescue!

  12. Getting Started

  13. 1. Set up ElasticSearch
    # ElasticSearch 1.4.4 - config/elasticsearch.yml
    script.disable_dynamic: sandbox
    script.groovy.sandbox.class_whitelist: java.util.LinkedHashMap
    script.groovy.sandbox.receiver_whitelist: java.util.Iterator, java.lang.Object, java.util.Map, java.util.Map$Entry
    script.groovy.sandbox.enabled: true
    cluster.name: [PUT_YOUR_CUSTOM_NAME_HERE]
    network.host: 127.0.0.1
    index.number_of_shards: 1
    index.number_of_replicas: 0

  14. 2. Start ElasticSearch
    bin/elasticsearch

  15. 2. Require the CR Adaptor
    composer require --prefer-source typo3/typo3cr-search @dev
    composer require --prefer-source flowpack/elasticsearch-contentrepositoryadaptor @dev
    TODO: no @dev anymore!

  16. 3. Indexing
    ./flow nodeindex:build

  17. 4. Debugging Tools
    http://localhost:9200/_plugin/head/
    http://localhost:9200/_plugin/sense/

  18. composer require --prefer-source flowpack/searchplugin @dev

  19. This is fulltext search.

  20. Node Querying

  21. [Diagram: a Category contains Articles; an Article is tagged with Tags]

  22. 1. Node References
    # build up relation in NodeTypes.yaml
    'Sandstorm.News:Article':
      superTypes: ['TYPO3.Neos:Document']
      ...
      properties:
        tags:
          type: references
          ui:
            label: 'Tags'
            inspector:
              editorOptions:
                # allow only references to tags
                nodeTypes: ['Sandstorm.News:Tag']

  23. 2. Query in TypoScript
    # replace main content area by a custom TypoScript object
    prototype(PrimaryContent).newsTag {
      condition = ${q(node).is('[instanceof Sandstorm.News:Tag]')}
      type = 'Sandstorm.News:Tag'
    }

  24. 2. Query in TypoScript
    # inherits from Template by default
    prototype(Sandstorm.News:Tag) {
      latestArticlesTaggedWithTag = ${...}
    }

  25. 2. Query in TypoScript
    latestArticlesTaggedWithTag = ${Search.query(site) # search underneath this site
      .nodeType('Sandstorm.News:Article') # filter by node type
      .exactMatch('tags', node) # where tag == current tag
      .limit(3) # first 3 results
      .sortDesc('publishDate') # and sort by publishing date desc
      .execute()}
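
As a rough illustration of what such a fluent query chain might translate to, the following Python sketch builds an ElasticSearch 1.x style request body. `QuerySketch`, its method names and the `__typeAndSupertypes` field are assumptions made for this example; only the `filtered` / `bool` / `term` shape and the `__parentPath` field are taken from the query log shown later in the deck. It is not the adaptor's actual implementation.

```python
import json


class QuerySketch:
    """Hypothetical fluent builder; NOT the adaptor's real implementation."""

    def __init__(self, site_path):
        # Restrict the search to nodes underneath the given site.
        self.filters = [{"term": {"__parentPath": site_path}}]
        self.request = {}

    def node_type(self, name):
        # The field name is an assumption for illustration.
        self.filters.append({"term": {"__typeAndSupertypes": name}})
        return self

    def exact_match(self, prop, value):
        self.filters.append({"term": {prop: value}})
        return self

    def limit(self, n):
        self.request["size"] = n
        return self

    def sort_desc(self, prop):
        self.request["sort"] = [{prop: {"order": "desc"}}]
        return self

    def build(self):
        body = {"query": {"filtered": {"filter": {"bool": {"must": self.filters}}}}}
        body.update(self.request)
        return body


body = (
    QuerySketch("/sites/neosdemotypo3org")
    .node_type("Sandstorm.News:Article")
    .exact_match("tags", "tag-node-identifier")  # placeholder identifier
    .limit(3)
    .sort_desc("publishDate")
    .build()
)
print(json.dumps(body, indent=2))
```

Each chained call only adds a filter or a request option, which is why the order of `limit` and `sortDesc` does not matter.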

  26. 3. Use in Template





  27. ${Search.query(site)
      .fulltext('Alice')
      .execute()}

  28. Node References
    together with
    Search Queries

  29. ElasticSearch Core Concepts

  30. Normalized Data in a relational DB: A links to T1, T2, T3
    Denormalized Data in an index: A T1, A T2, A T3
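
A small Python sketch of the idea, using made-up data: in the normalized form the article and its tags live in separate tables, while the index stores one self-contained document per article. The table names and tag labels are hypothetical.

```python
# Normalized: the article and its tags live in separate tables,
# connected through a link table.
articles = {"A": {"title": "Article A"}}
tags = {"T1": "Neos", "T2": "Search", "T3": "Conference"}
article_tags = [("A", "T1"), ("A", "T2"), ("A", "T3")]


def denormalize(article_id):
    """Build one self-contained document by copying the tag data into it."""
    doc = dict(articles[article_id])
    doc["tags"] = [tags[t] for a, t in article_tags if a == article_id]
    return doc


document = denormalize("A")
print(document)
```

The price of denormalization is that the document has to be re-indexed whenever one of the copied tags changes.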

  31. GET Index/Type/Document-ID

  32. GET Index/Type/_mapping

  33. GET Index/Type/_mapping
    _all

  34. Index Aliases allow index rebuilds!
    Alias: typo3cr
    Indices: typo3cr-1426882860, typo3cr-1426885219
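
The alias mechanism can be sketched in plain Python with two dictionaries standing in for the cluster state. The functions `rebuild` and `search` are hypothetical; the sketch only shows why readers never see a half-built index: the new index is filled completely before the alias is repointed.

```python
import time

indices = {}   # index name -> list of documents
aliases = {}   # alias name -> index name


def rebuild(alias, documents, now=None):
    """Fill a fresh timestamped index, then repoint the alias in one step."""
    timestamp = int(now if now is not None else time.time())
    new_name = f"{alias}-{timestamp}"
    indices[new_name] = list(documents)   # build completely before switching
    old_name = aliases.get(alias)
    aliases[alias] = new_name             # readers now see the new index
    if old_name is not None:
        indices.pop(old_name, None)       # the old index can be dropped afterwards
    return new_name


def search(alias):
    """Readers only ever address the alias, never a concrete index."""
    return indices[aliases[alias]]


rebuild("typo3cr", ["old doc"], now=1426882860)
rebuild("typo3cr", ["new doc"], now=1426885219)
print(aliases["typo3cr"], search("typo3cr"))
```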

  35. Indexing Pipeline
    Input: Hey, InspiringCon 2015
    Tokenization: Hey | InspiringCon | 2015
    Token Filtering: hey | inspiringcon | 2015
    Search Pipeline
    Input: InspiringCon 2015
    Tokenization: InspiringCon | 2015
    Token Filtering: inspiringcon | 2015
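
The two pipelines can be mimicked with a few lines of Python. This is a simplified sketch, assuming a tokenizer that splits on non-word characters and a single lowercase token filter; real ElasticSearch analyzers are configurable and do considerably more.

```python
import re


def tokenize(text):
    """Split the text into runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: normalize every token to lower case."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the same analysis at index time and at search time."""
    return lowercase_filter(tokenize(text))


indexed = analyze("Hey, InspiringCon 2015")
searched = analyze("InspiringCon 2015")
print(indexed)
print(searched)

# The query matches because all of its tokens occur in the indexed document.
print(set(searched) <= set(indexed))
```

Running the same analysis on both sides is what makes the differently cased and punctuated inputs comparable.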

  36. ${Search.query(site)
      .fulltext('Alice')
      .log()
      .execute()}
    15-03-23 07:20:50 1820 DEBUG Query Log (): {"query":{"filtered":{"query":{"bool":{"must":[{"match_all":[]},{"query_string":
    {"query":"Alice"}}]}},"filter":{"bool":{"must":[{"term":{"__parentPath":"\/sites\/neosdemotypo3org"}},{"terms":{"__workspace":["live"]}}],"should":[],"must_not":[{"term":
    {"_hidden":true}},{"range":{"_hiddenBeforeDateTime":{"gt":"now"}}},{"range":{"_hiddenAfterDateTime":{"lt":"now"}}}]}}}},"fields":["__path"],"highlight":{"fields":
    {"__fulltext*":{"fragment_size":150,"no_match_size":150,"number_of_fragments":2}}}} -- execution time: 10.998010635376 ms -- Total Results: 28
    Data/Logs/ElasticSearch.log

  37. Aggregations calculate
    statistical information
    about the current result.
    TODO Kibana Screenshot

  38. Aggregations calculate
    statistical information
    about the current result.
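
Conceptually, a simple "terms" style aggregation counts how often each value of a field occurs in the current result set. The following Python sketch shows that idea on made-up hits; the `category` field and the helper `terms_aggregation` are hypothetical and not an ElasticSearch API.

```python
from collections import Counter

# Hypothetical hits of the current search result.
hits = [
    {"title": "A", "category": "news"},
    {"title": "B", "category": "events"},
    {"title": "C", "category": "news"},
]


def terms_aggregation(results, field):
    """Count how often each value of `field` occurs in the result set."""
    return Counter(hit[field] for hit in results if field in hit)


buckets = terms_aggregation(hits, "category")
print(buckets.most_common())
```

Such counts are what typically drive faceted navigation, for example "news (2), events (1)" next to a result list.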

  39. Fine-Tuning ElasticSearch+Neos

  40. 1. ElasticSearch Schema

  41. 1. ElasticSearch Schema
    # Settings.yaml (defaults)
    TYPO3:
      TYPO3CR:
        Search:
          defaultConfigurationPerType:
            string:
              elasticSearchMapping:
                type: string
                include_in_all: false
            boolean:
              elasticSearchMapping:
                type: boolean
            date:
              elasticSearchMapping:
                type: date
                format: 'date_time_no_millis'
                include_in_all: false
    # NodeTypes.yaml (overrides)
    'TYPO3.Neos:Node': &node
      properties:
        '__identifier':
          search:
            elasticSearchMapping:
              type: string
              index: not_analyzed
              include_in_all: false
            indexing: '${node.identifier}'

  42. 2. ElasticSearch Indexing
    indexing: '${Indexing.buildAllPathPrefixes(node.parentPath)}'
    indexing: '${node.identifier}'
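
The helper name `buildAllPathPrefixes` suggests that a node's parent path is expanded into all of its ancestor paths, so that "everything underneath path X" becomes a single term filter. The Python function below is a sketch under that assumption; the exact return value of the real Eel helper is not shown in the deck.

```python
def build_all_path_prefixes(path):
    """Return every ancestor path of `path`, from the root down to `path` itself."""
    if not path:
        return []
    prefixes = ["/"]
    current = ""
    for part in path.strip("/").split("/"):
        if part:
            current += "/" + part
            prefixes.append(current)
    return prefixes


print(build_all_path_prefixes("/sites/neosdemotypo3org/features"))
```

Storing the prefixes at index time trades a little index size for a cheap exact-match lookup at query time.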

  43. 3. Fulltext Searching
    We at InspiringCon (Article)
    main (ContentCollection)
    … (Headline)
    … (Text)
    collect all
    content
    Fulltext Root

  44. 3. Fulltext Searching
    We at InspiringCon (Article)
    main (ContentCollection)
    … (Headline)
    … (Text)
    h1 h2 ... text

  45. 3. Fulltext Searching
    # predefined in Neos
    'TYPO3.Neos:Document':
      search:
        fulltext:
          isRoot: true
    'TYPO3.Neos.NodeTypes:Text':
      properties:
        'text':
          search:
            fulltextExtractor: '${Indexing.extractHtmlTags(value)}'
    'Sandstorm.News:Article':
      properties:
        'title':
          search:
            fulltextExtractor: '${Indexing.extractInto("h1", value)}'
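
To make the bucket idea more tangible, here is a Python sketch of how content below a fulltext root might be collected into `h1`, `h2`, ... and `text` buckets. The functions mirror the helper names from the configuration but are simplified re-implementations written for this example; the sample strings are made up and the real extractors may behave differently.

```python
import re
from collections import defaultdict

HEADING = re.compile(r"<(h[1-6])\b[^>]*>(.*?)</\1>", re.S | re.I)
TAG = re.compile(r"<[^>]+>")


def plain(fragment):
    """Drop remaining tags and collapse whitespace."""
    return " ".join(TAG.sub(" ", fragment).split())


def extract_into(bucket, value):
    """Put the whole value into a single named bucket."""
    return {bucket: value}


def extract_html_tags(html):
    """Headings go into their own bucket (h1..h6), everything else into 'text'."""
    buckets = defaultdict(list)
    for tag, inner in HEADING.findall(html):
        buckets[tag.lower()].append(plain(inner))
    rest = plain(HEADING.sub(" ", html))
    if rest:
        buckets["text"].append(rest)
    return {name: " ".join(texts) for name, texts in buckets.items()}


def collect_fulltext(root_parts, child_parts_list):
    """Merge the buckets of all content nodes into the fulltext root."""
    merged = defaultdict(list)
    for parts in [root_parts, *child_parts_list]:
        for bucket, text in parts.items():
            if text:
                merged[bucket].append(text)
    return {bucket: " ".join(texts) for bucket, texts in merged.items()}


article = extract_into("h1", "We at InspiringCon")
headline = extract_html_tags("<h2>Searching in Neos</h2>")
text = extract_html_tags("<p>Indexing <b>custom</b> data.</p>")
fulltext = collect_fulltext(article, [headline, text])
print(fulltext)
```

Keeping separate buckets allows a search to weight a match in a headline higher than a match in body text.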

  46. Indexing Additional Data

  47. Index Aliases allow linking multiple indices.
    Alias: search
    Indices: typo3cr-1426882860, products-2976886808

  48. ElasticSearch Rivers
    can poll data from other systems.

  49. ElasticSearch too big for your project?
    Use SimpleSearch!
    composer require --prefer-source flowpack/simplesearch-contentrepositoryadaptor @dev

  50. Resources
    http://www.elasticsearch.org/guide/
    README of Flowpack.ElasticSearch.ContentRepositoryAdaptor
