Slide 1

copyright © 2023 CMS Comunications Inc. all rights reserved.
PoC for LLM search on Plone
– Collaboration Wanted! –
Manabu TERADA (@terapyon)
「Plone Conference 2023 / LT」 2023-10-04

Slide 2

Motivation
● We want an intranet Plone site to have more capable search.
● Search should accept whole sentences, not only keywords.
● No use of OpenAI: intranet data must not leave the organization's boundary.

Slide 3

Basics of LLM vector search
● Generate vectors (embeddings) from text documents.
● Store them in a vector DB.
● Generate a vector from the search text.
● Compare/search items by vector with a similarity algorithm.
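
The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model (it hashes words into a bag-of-words vector, so it captures word overlap, not meaning), `InMemoryVectorStore` is a hypothetical substitute for a real vector DB, and the similarity algorithm is cosine similarity.

```python
import math
import re
import zlib

DIM = 1024  # hypothetical vector size for the toy embedder


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        # crc32 is deterministic across runs (unlike Python's salted hash())
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are already normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class InMemoryVectorStore:
    """Hypothetical minimal 'vector DB': keeps (doc_id, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Steps 1 and 2: generate a vector from the document and store it.
        self._items.append((doc_id, toy_embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        # Step 3: generate a vector from the search text.
        q = toy_embed(query)
        # Step 4: score every stored item by similarity and keep the top k.
        scored = [(doc_id, cosine(q, vec)) for doc_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("doc-1", "How to install Plone on Ubuntu")
store.add("doc-2", "Catalog indexes and search in Plone")
store.add("doc-3", "Cooking rice with a rice cooker")
print(store.search("search the Plone catalog", k=2))
```

A real setup replaces `toy_embed` with a language model and the list with a vector DB that can search without scoring every stored item, but the flow of the four steps stays the same.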

Slide 4

PoC of LLM vector search – (1)
● I made a PoC system for vector search without Plone.
● The embedding model is "intfloat/multilingual-e5-large"; no OpenAI is involved.
● Hosted at https://huggingface.co/spaces/terapyon/gh-issue-search
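
A minimal sketch of generating embeddings locally with the model named above, assuming the `sentence-transformers` package is installed. The model card for intfloat/multilingual-e5-large asks for a `query: ` prefix on search texts and a `passage: ` prefix on documents. The helper names `with_e5_prefix`, `load_model` and `embed` are hypothetical, not taken from the PoC. The weights are downloaded from the Hugging Face Hub on first use; encoding then runs on the local machine, so document text is not sent to an external API.

```python
from functools import lru_cache

MODEL_NAME = "intfloat/multilingual-e5-large"


def with_e5_prefix(texts: list[str], kind: str) -> list[str]:
    """E5 models expect every input to start with 'query: ' or 'passage: '."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


@lru_cache(maxsize=1)
def load_model():
    # Imported lazily so the prefix helper works without the library installed;
    # the model is loaded once and cached for later calls.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(MODEL_NAME)


def embed(texts: list[str], kind: str = "passage", model=None):
    """Embed texts locally; documents use kind='passage', searches kind='query'."""
    if model is None:
        model = load_model()
    # normalize_embeddings=True makes the dot product equal to cosine similarity.
    return model.encode(with_e5_prefix(texts, kind), normalize_embeddings=True)
```

For example, `doc_vecs = embed(["Plone is an open source CMS."])` and `query_vecs = embed(["Which CMS is open source?"], kind="query")` each return a NumPy array with one normalized 1024-dimensional row, and `query_vecs @ doc_vecs.T` gives their cosine similarity.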

Slide 5

PoC has worked!
But it does not run on Plone.
I want to use vector search on Plone.

Slide 6

Structure of my sample package
● I made a sample package
● Vector search for a Plone site

Slide 7

Technical Feature
● A new Index class, modeled on ZCTextIndex
● The Index is added to portal_catalog for automatic indexing.
● The embedding model is "intfloat/multilingual-e5-large"; no OpenAI is involved.
● As a consequence, new keyword args are available on portal_catalog for search.
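
Why a new index brings new keyword args can be shown with a toy catalog. In ZCatalog, query keywords such as `portal_type` or `SearchableText` correspond to index names, so registering an index is what makes its name usable in a catalog query. The sketch below does not use Zope and is not the code of c2.search.llm: `VectorIndex`, `MiniCatalog`, `Page`, `toy_embed` and the index id `vector_search` are all hypothetical, and `toy_embed` stands in for the e5 model.

```python
from typing import Any, Callable

Embedder = Callable[[str], list[float]]


def toy_embed(text: str) -> list[float]:
    # Hypothetical stand-in for the e5 model: normalized counts of a tiny vocabulary.
    vocab = ["plone", "search", "catalog", "rice"]
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


class VectorIndex:
    """Hypothetical vector index; method names loosely follow Zope pluggable
    indexes (index_object / unindex_object). Not the c2.search.llm code."""

    def __init__(self, index_id: str, embed: Embedder, attr: str = "SearchableText"):
        self.id = index_id
        self._embed = embed
        self._attr = attr
        self._vectors: dict[int, list[float]] = {}

    def index_object(self, document_id: int, obj: Any) -> bool:
        text = getattr(obj, self._attr, None)
        if callable(text):
            text = text()  # accept both plain attributes and methods
        if not text:
            return False
        self._vectors[document_id] = self._embed(text)
        return True

    def unindex_object(self, document_id: int) -> None:
        self._vectors.pop(document_id, None)

    def query(self, text: str, limit: int = 10) -> list[int]:
        # Rank stored vectors by dot product (= cosine, since vectors are normalized).
        q = self._embed(text)
        ranked = sorted(
            self._vectors.items(),
            key=lambda item: sum(a * b for a, b in zip(q, item[1])),
            reverse=True,
        )
        return [doc_id for doc_id, _ in ranked[:limit]]


class MiniCatalog:
    """Hypothetical stand-in for portal_catalog: each registered index id
    becomes a keyword argument accepted when the catalog is called."""

    def __init__(self) -> None:
        self._indexes: dict[str, VectorIndex] = {}
        self._objects: dict[int, Any] = {}

    def add_index(self, index: VectorIndex) -> None:
        self._indexes[index.id] = index

    def index_content(self, obj: Any, document_id: int) -> None:
        # "Auto indexing": every registered index is updated when content is cataloged.
        self._objects[document_id] = obj
        for index in self._indexes.values():
            index.index_object(document_id, obj)

    def __call__(self, **query: str) -> list[Any]:
        if len(query) != 1:
            raise ValueError("this sketch supports exactly one index per query")
        name, value = next(iter(query.items()))
        if name not in self._indexes:
            raise KeyError(f"no index named {name!r}")
        return [self._objects[i] for i in self._indexes[name].query(value)]


class Page:
    """Hypothetical content object with a SearchableText attribute."""

    def __init__(self, title: str, text: str) -> None:
        self.title = title
        self.SearchableText = text


catalog = MiniCatalog()
catalog.add_index(VectorIndex("vector_search", toy_embed))
catalog.index_content(Page("A", "plone catalog search"), 1)
catalog.index_content(Page("B", "rice rice"), 2)
# The new index id is now usable as a keyword argument, like portal_catalog(...).
hits = catalog(vector_search="how does plone search work")
print([page.title for page in hits])
```

A real ZCatalog index also has to persist its data and combine results with other indexes in the same query; the sketch leaves both out.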

Slide 8

Added index and interface for Plone
Added Index on ZCatalog
Added interface on portal_catalog

Slide 9

Supporters and Contributors WANTED!
● My package is a PoC (sample)
○ https://github.com/cmscom/c2.search.llm
● The repo and the namespace might change.
○ Plone namespace or Collective namespace repo?
● Splitting the package's functionality
○ Base index
○ Setting up Plone

Slide 10

Problem?
● The sample package requires a GPU
● I tried to run it on my MacBook Air (M1), but it did not work
● How to evaluate?
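
The slide leaves "How to evaluate?" open. If the question is about retrieval quality, one common approach is to hand-label a few queries with the documents that should be found and compute standard IR metrics such as recall@k and mean reciprocal rank (MRR). The sketch below assumes that interpretation; `evaluate`, the labels and the fake results are hypothetical.

```python
from typing import Callable, Iterable


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    search: Callable[[str], list[str]],
    labeled: Iterable[tuple[str, set[str]]],
    k: int = 5,
) -> dict[str, float]:
    """Average recall@k and MRR over hand-labeled (query, relevant ids) pairs."""
    labeled = list(labeled)
    if not labeled:
        raise ValueError("need at least one labeled query")
    recall_total = rr_total = 0.0
    for query, relevant in labeled:
        ranked = search(query)  # one search call per query
        recall_total += recall_at_k(ranked, relevant, k)
        rr_total += reciprocal_rank(ranked, relevant)
    n = len(labeled)
    return {f"recall@{k}": recall_total / n, "mrr": rr_total / n}


# Hypothetical ranked results standing in for a real search backend.
fake_results = {
    "install plone": ["d1", "d7", "d3"],
    "catalog index": ["d9", "d2", "d4"],
}
labels = [("install plone", {"d1"}), ("catalog index", {"d2", "d4"})]
scores = evaluate(lambda q: fake_results[q], labels, k=2)
print(scores)
```

Because `search` is any callable that returns ranked ids, the same labels can be run through the vector search and through a plain `SearchableText` keyword search for a like-for-like comparison.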

Slide 11

Sprint
● I will join this conference's sprint.
○ I will be there on Saturday, until early evening.
● Please JOIN the sprint!!

Slide 12

Thank you!
– Collaboration Wanted –
Manabu TERADA (@terapyon)

Slide 13

PyCon APAC 2023
● https://2023-apac.pycon.jp/
● Tokyo, Japan
● Oct 27th and 28th
● (50% of the talks are in English)
● Sprint: 29th
● Tickets now on sale.
○ https://pretix.eu/pyconjp/2023-apac/