Slide 1

Slide 1 text

Momento Confidential
LLM App with Momento
Shingo Yoshida, AWS Serverless Hero from Japan

Slide 2

Slide 2 text

ABOUT ME: CTO@CYDAS, CEO@Section

Slide 3

Slide 3 text

Momento is a PLATINUM sponsor

Slide 4

Slide 4 text

CYDAS PEOPLE = Talent Management SaaS on AWS
● Employee profile, 1on1, MBO, performance, HR FAQ, etc.
● Big issue
  1. A company with tens of thousands of employees: 500 inquiries per month handled by 2 HR staff
     → 90% of them are already answered in the FAQ
     → we were consulted about closing the inquiry function

Slide 5

Slide 5 text

“PEOPLE Copilot Chat” = RAG [Grounding] App
● RAG (Retrieval Augmented Generation) app
● HR FAQ & chat history with HR → embedding
● User question → retrieve and answer using ChatGPT (API)
● 6 days to build a demo for an HR conference
● 2 months to rebuild for PRODUCTION
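The retrieve-then-answer flow on this slide can be sketched roughly as follows. This is a minimal illustration, not the PEOPLE Copilot Chat implementation: the bag-of-words `embed` function stands in for a real embedding model, the `FAQ` entries are made-up sample data, and `build_prompt` only assembles the text that would be sent to the chat API.

```python
import math
import re
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real app calls an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical FAQ entries, embedded once up front (the "index").
FAQ = [
    "How do I reset my password? Use the reset link on the login page.",
    "How do I request paid leave? Submit the leave form to your manager.",
    "Where can I see my performance review? Open the Performance tab in your profile.",
]
INDEX = [(doc, embed(doc)) for doc in FAQ]

def retrieve(question: str, k: int = 1) -> List[str]:
    """Return the k FAQ entries most similar to the question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question: str) -> str:
    """Ground the model by putting the retrieved text in front of the question."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Calling `build_prompt("How do I request paid leave")` produces a prompt whose context is the paid-leave FAQ entry; the prompt would then be passed to the chat model.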

Slide 6

Slide 6 text

Architecture

Slide 7

Slide 7 text

What is LangChain? 🦜🔗
● LangChain is a framework for developing applications that use LLMs.
  ○ There are two implementations: Python and JavaScript/TypeScript.
  ○ The Python version is under more active development.
● LangChain is available as OSS and is updated daily.
● My recommendation: TypeScript is fine for creating a demo, but if you want to build a production version or take full advantage of LangChain’s functionality in the long term, use the Python version.

Slide 8

Slide 8 text

LangChain > Usage
● Sentence summarization
● Chatbot
● Q&A for documents
● chat2query
● etc.

LangChain > Module
● Models
● Prompts
● Indexes
● Chains
● Memory
● Agents

Slide 9

Slide 9 text

DEMO > No memory (Momento)

Slide 10

Slide 10 text

DEMO > No memory (Momento)

Slide 11

Slide 11 text

DEMO

Slide 12

Slide 12 text

DEMO

Slide 13

Slide 13 text

Why do we use Momento?
● Perfectly serverless
● Easy to integrate
  ○ Needs only a few lines of code.
● Super fast and reliable
  ○ Always responds within a few milliseconds.
● Higher security
  ○ We have to meet many compliance requirements because CYDAS hosts a large amount of personal data.
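What a cache adds to the chatbot is per-session conversation memory that expires on its own. The sketch below shows that idea with an in-memory stand-in; `TTLChatHistory` is a hypothetical class written for illustration and is not the Momento SDK or LangChain's memory API. A real deployment would keep the same data in a remote cache so that every Lambda invocation sees it.

```python
import time
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)

class TTLChatHistory:
    """In-memory stand-in for a cache-backed chat history with a time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, List[Message]]] = {}

    def get(self, session_id: str) -> List[Message]:
        """Return the session's messages, or an empty list if absent or expired."""
        entry = self._store.get(session_id)
        if entry is None or entry[0] <= self._clock():
            self._store.pop(session_id, None)
            return []
        return list(entry[1])

    def append(self, session_id: str, role: str, text: str) -> None:
        """Add a message and refresh the session's expiry time."""
        messages = self.get(session_id)
        messages.append((role, text))
        self._store[session_id] = (self._clock() + self._ttl, messages)
```

On each turn the app would call `get(session_id)`, place the returned messages in the prompt, and `append` both the user question and the model's answer. Without this step every question is answered in isolation, which is the "no memory" case in the demo.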

Slide 14

Slide 14 text

For Production
“It’s easy to make something cool with LLMs, but very hard to make something production-ready with them.” - Chip Huyen
● Security and Compliance
  ○ OpenAI → Azure OpenAI
  ○ Pinecone → Azure Cognitive Search
● Safety
  ○ Accuracy, Hallucination, Fairness

Slide 15

Slide 15 text

Lessons we learned for Production
1. A RAG app is easy to implement → Can a traditional search UI (without LLMs) solve this problem?
2. A workflow that takes advantage of LLM capabilities is important
  1. Combine deterministic programming with non-deterministic LLMs
  2. Chain multiple tasks together 🦜🔗
3. 🦜🔗 is a treasure trove of ideas + implementations
  1. ReAct → langchain.agents
  2. HyDE → the LLM imagines an answer to the question, then searches for knowledge similar to that answer
     from langchain.chains import HypotheticalDocumentEmbedder
4. Enterprise search is usually better for everything than Vector Similarity Search alone
5. LLMOps ≠ MLOps
  1. Changes in input / output are hard to notice
  2. What we can do about them is limited: replace the API or model, adjust prompts (+ version control)
  3. Response time should be captured, e.g. with LangSmith
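The HyDE idea from item 3 can be sketched without any LLM library. This is an illustration only, not LangChain's `HypotheticalDocumentEmbedder`: `fake_llm` is a hypothetical stand-in that returns a canned draft answer, and `embed` is a toy bag-of-words vector rather than a real embedding model. The point is the order of operations: draft an answer first, then search with the draft instead of the question.

```python
import math
import re
from collections import Counter
from typing import Callable, List

def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real app uses an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fake_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM call that drafts a plausible answer."""
    return "Submit the leave form to your manager to take paid leave."

def hyde_search(question: str, docs: List[str], llm: Callable[[str], str], k: int = 1) -> List[str]:
    """Rank documents by similarity to a drafted answer instead of the raw question."""
    hypothetical = llm(question)      # step 1: the model imagines an answer
    query_vec = embed(hypothetical)   # step 2: embed the draft, not the question
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]                 # step 3: return the nearest real documents
```

A question such as "How can I get a day off?" shares almost no words with a document about paid leave, but the drafted answer does, which is why searching with the draft can find documents that a direct question-to-document match would miss.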

Slide 16

Slide 16 text

OWASP Top 10 for LLM
1. Prompt Injection
2. Insecure Output Handling
3. Training Data Poisoning
4. Model Denial of Service
5. Supply Chain Vulnerabilities
6. Sensitive Information Disclosure
7. Insecure Plugin Design
8. Excessive Agency
9. Overreliance
10. Model Theft
OWASP Top 10 for Large Language Model Applications
https://owasp.org/www-project-top-10-for-large-language-model-applications/

Slide 17

Slide 17 text

OWASP Top 10 for LLM
1. Prompt Injection
2. Insecure Output Handling
3. Training Data Poisoning
4. Model Denial of Service
5. Supply Chain Vulnerabilities
6. Sensitive Information Disclosure
7. Insecure Plugin Design
8. Excessive Agency
9. Overreliance
10. Model Theft
OWASP Top 10 for Large Language Model Applications
https://owasp.org/www-project-top-10-for-large-language-model-applications/

Slide 18

Slide 18 text

from langchain.experimental to langchain_experimental
Move experimental to experimental package
Easy to push the PRs
As a result, CVEs will be removed from the core package (langchain)
It's just the beginning: more high-level thoughts, e.g. Modularity →
https://github.com/langchain-ai/langchain/discussions

Slide 19

Slide 19 text

🦜🔗 Move experimental to experimental package
● Big news
  ○ All features that carry CVEs (vulnerabilities) are now in a separate package (Experimental)
  ○ Streamlining of the 🦜🔗 core
  ○ Plans were mentioned for a package called Community Chain
● This means…
  ○ Cannot be used in production → Can be used
  ○ Unchecked growth over the past year or so meant the package would one day no longer fit in a Lambda Layer → its growth will now be kept under control
  ○ Implementations of papers and ambitious ideas will be more PR-friendly
● For AWS Lambda
  ○ Current size: 130 MB unpacked, including dependent libraries
  ○ Spin-up takes approximately 5 seconds → multiple measures are needed, such as lazy listeners and retry header checks when called from Slack
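The retry header check mentioned for Slack can be sketched as below. Slack's Events API resends an event when it is not acknowledged in time and marks the resend with an `X-Slack-Retry-Num` header, so a slow cold start can otherwise produce duplicate answers. The handler assumes an API Gateway proxy-style `event` dictionary; `answer_question` is a hypothetical helper, and the `textwrap` import inside it is only a stand-in for deferring the import of a heavy library. Lazy listeners (acknowledge first, run the work afterwards) are a separate measure and are not shown here.

```python
import json
from typing import Any, Dict, Optional

def answer_question(text: str) -> str:
    """Hypothetical worker; a real app would call the LLM chain here."""
    # Stand-in for a heavy dependency: importing inside the function keeps it
    # off the cold-start path of requests that return early.
    import textwrap
    return textwrap.shorten(f"echo: {text}", width=80)

def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Lambda handler sketch: acknowledge Slack retries without redoing the work."""
    # Header names may arrive in any case, so normalize before checking.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    if "x-slack-retry-num" in headers:
        # A retry means the first delivery is already being processed.
        return {"statusCode": 200, "body": json.dumps({"ok": True, "skipped": "retry"})}

    body = json.loads(event.get("body") or "{}")
    answer = answer_question(body.get("text", ""))
    return {"statusCode": 200, "body": json.dumps({"ok": True, "answer": answer})}
```

The first delivery runs the full path and pays the spin-up cost once; any resend that arrives during that time is acknowledged immediately and dropped.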

Slide 20

Slide 20 text

Extra: In Corp / Momento is Everywhere
ChatGPT Clone for Corporate Slackbot

Slide 21

Slide 21 text

Amazon Kendra + 🦜🔗
Build high-accuracy generative AI applications using Amazon Kendra, LangChain, and large language models
https://aws.amazon.com/jp/blogs/news/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/
Salesforce, Slack, Box, Mail…

Slide 22

Slide 22 text

Questions?