PoC for LLM search on Plone

「Plone Conference 2023 / LT」 2023-10-04
–Collaboration Wanted! –
Manabu TERADA (@terapyon)

Transcript

  1. copyright © 2023 CMS Communications Inc. all rights reserved.
    PoC for LLM search on Plone
    –Collaboration Wanted! –
    Manabu TERADA (@terapyon)
    「Plone Conference 2023 / LT」 2023-10-04

  2. Motivation
    ● We want more capable search for Plone intranet sites.
    ● Search should accept whole sentences, not only keywords.
    ● No OpenAI: intranet data must not leave the network boundary.

  3. Basics of LLM vector search
    ● Generate vectors from text documents.
    ● Store them in a vector DB.
    ● Generate a vector from the search text.
    ● Compare and rank items by vector similarity.
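
The four steps above can be sketched in a few lines of plain Python. This is only an illustration: `embed` here is a toy bag-of-words counter standing in for a real embedding model, the "vector DB" is a dict, and the document texts are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Steps 1 and 2: embed each document and keep the vectors in a "vector DB"
# (here just a dict keyed by document id). The texts are invented examples.
documents = {
    "doc1": "How to install Plone on a Linux server",
    "doc2": "Meeting notes from the intranet redesign project",
    "doc3": "Upgrade guide for Plone add-on packages",
}
vector_db = {doc_id: embed(text) for doc_id, text in documents.items()}


def search(query, top_k=2):
    """Steps 3 and 4: embed the query, then rank stored vectors by similarity."""
    q = embed(query)
    scored = [(cosine(q, vec), doc_id) for doc_id, vec in vector_db.items()]
    scored.sort(reverse=True)
    return [doc_id for _score, doc_id in scored[:top_k]]


results = search("install Plone server")
print(results)
```

With a real embedding model the vectors are dense and capture meaning rather than shared words, but the store, embed-query, compare loop stays the same.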

  4. PoC of LLM vector search – (1)
    ● I made a PoC system for vector search without Plone.
    ● The embedding model is "intfloat/multilingual-e5-large"; no OpenAI is involved.
    ● Hosted at https://huggingface.co/spaces/terapyon/gh-issue-search
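
A minimal sketch of how this model could be loaded and called locally, assuming the `sentence-transformers` library is used (the slides do not say how the PoC loads the model). The E5 model card asks for a "query: " or "passage: " prefix on every input, so the helper names below are illustrative wrappers for that convention, not part of the PoC.

```python
MODEL_NAME = "intfloat/multilingual-e5-large"


def as_passage(text):
    """Prefix for documents to be indexed, as the E5 model card recommends."""
    return "passage: " + text


def as_query(text):
    """Prefix for search text, as the E5 model card recommends."""
    return "query: " + text


def load_model():
    # Assumption: sentence-transformers is installed. Imported lazily so the
    # prefix helpers above can be used without the library. The first call
    # downloads the model weights, after which everything runs locally.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(MODEL_NAME)


def embed_passages(model, texts):
    """Return one normalized vector per document text."""
    return model.encode([as_passage(t) for t in texts], normalize_embeddings=True)


def embed_query(model, text):
    """Return a single normalized vector for the search text."""
    return model.encode([as_query(text)], normalize_embeddings=True)[0]
```

Because the vectors are normalized, the dot product of a query vector and a passage vector equals their cosine similarity.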

  5. The PoC worked! But it does not run on Plone.
    I want to use vector search in Plone.

  6. Structure of my sample package
    ● I made a sample package
    ● Vector search for a Plone site

  7. Technical Features
    ● A new index class, written with reference to ZCTextIndex.
    ● The index is added to portal_catalog for automatic indexing.
    ● The embedding model is "intfloat/multilingual-e5-large"; no OpenAI is involved.
    ● As a consequence, a new keyword argument becomes available for portal_catalog searches.
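
To illustrate why adding an index yields a new search keyword, here is a plain-Python stand-in. It is not the c2.search.llm code and does not import Zope: `ToyVectorIndex`, `ToyCatalog`, `toy_embed`, and the index id `text_vector` are all hypothetical, and `index_object` takes raw text here, whereas a real ZCatalog index receives the content object.

```python
import math

VOCAB = ["plone", "search", "holiday", "policy"]


def toy_embed(text):
    """Hypothetical embedder: counts of a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Stores one vector per document id and ranks ids by similarity."""

    def __init__(self, index_id, embed):
        self.id = index_id
        self._embed = embed   # callable: text -> list of floats
        self._vectors = {}    # document id -> vector

    def index_object(self, doc_id, text):
        self._vectors[doc_id] = self._embed(text)

    def unindex_object(self, doc_id):
        self._vectors.pop(doc_id, None)

    def query(self, text, limit=10):
        q = self._embed(text)
        ranked = sorted(
            self._vectors,
            key=lambda doc_id: cosine(q, self._vectors[doc_id]),
            reverse=True,
        )
        return ranked[:limit]


class ToyCatalog:
    """Routes a search keyword argument to the index with the same id."""

    def __init__(self):
        self._indexes = {}

    def add_index(self, index):
        self._indexes[index.id] = index

    def catalog_object(self, doc_id, text):
        # Automatic indexing: every registered index sees every document.
        for index in self._indexes.values():
            index.index_object(doc_id, text)

    def __call__(self, **query):
        # This sketch supports exactly one keyword per search.
        (name, value), = query.items()
        return self._indexes[name].query(value)


catalog = ToyCatalog()
catalog.add_index(ToyVectorIndex("text_vector", toy_embed))
catalog.catalog_object("page-1", "Plone search tips")
catalog.catalog_object("page-2", "Holiday policy for staff")
hits = catalog(text_vector="holiday policy")
print(hits)
```

The point of the sketch is the routing: once an index with a given id is registered, a keyword argument of the same name becomes a valid search parameter.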

  8. Added index and interface for Plone
    Added index on ZCatalog
    Added interface on portal_catalog

  9. Supporters and Contributors WANTED!
    ● My package is a PoC (sample)
    ○ https://github.com/cmscom/c2.search.llm
    ● The repo and the namespace might change.
    ○ Plone namespace or Collective namespace repo?
    ● Splitting the package by function
    ○ Base index
    ○ Setting up Plone

  10. Problem?
    ● The sample package requires a GPU
    ● I tried to run it on my MacBook Air (M1), but it did not work
    ● How to evaluate?

  11. Sprint
    ● I will join the conference sprint.
    ○ I will be there on Saturday until early evening.
    ● Please JOIN the sprint!!

  12. Thank you!
    –Collaboration Wanted–
    Manabu TERADA (@terapyon)

  13. PyCon APAC 2023
    ● https://2023-apac.pycon.jp/
    ● Tokyo, Japan
    ● Oct 27th and 28th.
    ● (50% of the talks are in English)
    ● Sprint: Oct 29th.
    ● Tickets are now on sale.
    ○ https://pretix.eu/pyconjp/2023-apac/