
PoC for LLM search on Plone

「Plone Conference 2023 / LT」 2023-10-04
–Collaboration Wanted! –
Manabu TERADA (@terapyon)

Manabu TERADA

October 04, 2023
Transcript

  1. PoC for LLM search on Plone –Collaboration Wanted!–

    Manabu TERADA (@terapyon) 「Plone Conference 2023 / LT」 2023-10-04
    copyright © 2023 CMS Communications Inc. all rights reserved.
  2. Motivation

    • We want better search functionality on intranet Plone sites.
    • Search should work on whole sentences, not only on keywords.
    • No use of OpenAI: intranet data must not go beyond the network boundary.
  3. Basics of LLM vector search

    • Generate vectors from text documents.
    • Store them in a vector DB.
    • Generate a vector from the search text.
    • Compare and search items by vector, using a similarity algorithm.
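The four steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration only, not the code from the talk: `embed` is a hypothetical toy stand-in (a hashed bag of words) for a real embedding model, a dict plays the role of the vector DB, and the document IDs and texts are made up.

```python
import hashlib
import math

DIM = 64  # toy size; real embedding models produce far larger vectors


def embed(text):
    """Toy stand-in for an embedding model: hashed bag of words,
    scaled to unit length (all zeros if the text is empty)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity; a plain dot product because the vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Steps 1 and 2: generate vectors from documents and store them.
# A dict stands in for the vector DB (made-up sample data).
documents = {
    "doc-password": "reset your password from the login page",
    "doc-holiday": "company holiday calendar for this year",
    "doc-vpn": "connect to the office vpn from home",
}
vector_db = {doc_id: embed(text) for doc_id, text in documents.items()}


def search(query, top_k=2):
    """Steps 3 and 4: embed the search text, then rank stored vectors by similarity."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, vec), doc_id) for doc_id, vec in vector_db.items()),
        reverse=True,
    )
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


results = search("reset password login")
print(results)
```

With a real model, only `embed` would change; the store-then-compare flow stays the same. The toy embedding matches on shared words only, so it does not show the sentence-level matching that motivates using an LLM embedding.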
  4. PoC of LLM vector search – (1)

    • I made a PoC system for vector search, without Plone.
    • The embedding model is "intfloat/multilingual-e5-large"; no OpenAI is involved.
    • Hosted at https://huggingface.co/spaces/terapyon/gh-issue-search
  5. The PoC has worked!

    But it does not run with Plone.
    I want to use vector search on Plone.
  6. Structure of my sample package

    • I made a sample package
    • Vector search for Plone site
  7. Technical Feature

    • A new Index class, written with reference to ZCTextIndex.
    • The index is added to portal_catalog for automatic indexing.
    • The embedding model is "intfloat/multilingual-e5-large"; no OpenAI is involved.
    • As a consequence, new keyword arguments are available on portal_catalog for searching.
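To make the idea of an embedding-backed catalog index more concrete, here is a rough, self-contained sketch. It is not the code of the sample package: the slides do not show its class names, method signatures, or the new portal_catalog keyword arguments, so `ToyVectorIndex`, `toy_encode`, and `query` are all hypothetical. The method names `index_object` and `unindex_object` loosely follow Zope's pluggable-index convention, and the "passage: " / "query: " prefixes follow the input format that the E5 model family expects.

```python
import hashlib
import math
from types import SimpleNamespace


def toy_encode(text, dim=64):
    """Hypothetical stand-in for the embedding model (hashed bag of words).
    A real setup could pass a function that wraps the actual model instead."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorIndex:
    """Simplified, hypothetical index: keeps one vector per document ID."""

    def __init__(self, field, encode=toy_encode):
        self.field = field      # attribute of the content object to index
        self._encode = encode   # text -> unit-length vector
        self._vectors = {}      # doc_id -> vector

    def index_object(self, doc_id, obj):
        """Embed the object's text; return 1 if indexed, 0 if there was no text."""
        text = getattr(obj, self.field, None)
        if not text:
            return 0
        # E5 models expect a "passage: " prefix on documents;
        # the toy encoder simply treats it as one more word.
        self._vectors[doc_id] = self._encode("passage: " + text)
        return 1

    def unindex_object(self, doc_id):
        """Forget the vector for a removed document."""
        self._vectors.pop(doc_id, None)

    def query(self, text, limit=10):
        """Return (doc_id, score) pairs, best match first."""
        q = self._encode("query: " + text)  # E5 models expect "query: " here
        scored = sorted(
            ((sum(a * b for a, b in zip(q, vec)), doc_id)
             for doc_id, vec in self._vectors.items()),
            reverse=True,
        )
        return [(doc_id, score) for score, doc_id in scored[:limit]]


# Made-up content objects, indexed the way a catalog would on save.
index = ToyVectorIndex("text")
index.index_object(1, SimpleNamespace(text="How to request paid leave and holidays"))
index.index_object(2, SimpleNamespace(text="Office wifi and vpn setup guide"))
hits = index.query("vpn setup")
print(hits)
```

Passing the encoder in as a function keeps the index independent of the model, which fits the idea of splitting a base index from the Plone setup.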
  8. Added index and interface for Plone

    Added Index on ZCatalog
    Added interface on portal_catalog
  9. Supporters and Contributors WANTED!

    • My package is a PoC (sample)
      ◦ https://github.com/cmscom/c2.search.llm
    • The repo and the namespace might be changed.
      ◦ Plone namespace or Collective namespace repo?
    • Splitting the package by function
      ◦ Base index
      ◦ Setting up Plone
  10. Problem?

    • The sample package requires a GPU.
    • I tried to run it on my MacBook Air (M1), but it did not work.
    • How to evaluate?
  11. Sprint

    • I will join this conference sprint.
      ◦ I will be there on Saturday, until early evening.
    • Please JOIN the sprint!!
  12. Thank you!

    –Collaboration Wanted– Manabu TERADA (@terapyon)
  13. PyCon APAC 2023

    • https://2023-apac.pycon.jp/
    • Tokyo, Japan
    • Oct 27th and 28th
    • (50% of the talks are in English)
    • Sprint: 29th
    • Tickets are now on sale.
      ◦ https://pretix.eu/pyconjp/2023-apac/