LLM App with Momento

2023.8.3 MoCon 2023 @ T-Mobile Park (Seattle)

吉田真吾

August 03, 2023

Transcript

  1. Momento Confidential
    LLM App with Momento
    Shingo Yoshida
    AWS Serverless Hero from Japan

  2. ABOUT ME:
    CTO @ CYDAS
    CEO @ Section

  3. Momento is a PLATINUM sponsor

  4. CYDAS PEOPLE = Talent Management SaaS on AWS
    ● Employee profiles, 1-on-1s, MBO,
    performance reviews, HR FAQ, etc.
    ● Big issue
    1. A company with tens of thousands of
    employees:
    500 inquiries per month / 2 staff members
    → 90% are already covered by the FAQ
    → the customer consulted us about closing the
    inquiry function

  5. “PEOPLE Copilot Chat” = RAG [Grounding] App
    ● RAG (Retrieval Augmented Generation) app
    ● HR FAQ & chat history with HR → embeddings
    ● User question → retrieve relevant entries and answer using
    the ChatGPT API
    ● 6 days to build a demo for an HR conference
    ● 2 months to rebuild for PRODUCTION

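The RAG flow on this slide (embed the FAQ, retrieve the entries closest to the user's question, then hand them to the model as context) can be sketched as below. This is a minimal illustration, not the PEOPLE Copilot Chat code: the FAQ entries are hypothetical, `embed` is a toy bag-of-words stand-in for a real embedding model, and the final ChatGPT call is left out; only the prompt is built.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ entries standing in for the real HR FAQ / chat-history corpus.
FAQ = [
    "How do I apply for paid leave? Submit a leave request in the portal.",
    "When is the 1on1 meeting held? Your manager schedules it monthly.",
    "How do I update my MBO goals? Edit them on the goals page.",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real app would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model: the answer must come from the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the HR question using only the context below.\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


question = "How do I apply for paid leave?"
prompt = build_prompt(question, retrieve(question, FAQ, k=1))
```

In a real deployment the string in `prompt` would be sent to the chat model; keeping retrieval and prompt building as plain functions makes each step testable without calling the API.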

  6. Architecture

  7. What is LangChain? 🦜🔗
    ● LangChain is a framework for developing applications that use LLMs.
    ○ There are two implementations: Python and JavaScript/TypeScript.
    ○ The Python version is more actively developed.
    ● LangChain is open source (OSS) and is updated daily.
    ● My recommendation: use TypeScript to build a demo, but if you
    want to build a production version or take full advantage of
    LangChain’s functionality in the long term, use the Python
    version.

  8. LangChain > Usage
    ● Text summarization
    ● Chatbots
    ● Q&A over documents
    ● chat2query
    ● etc.
    LangChain > Modules
    ● Models
    ● Prompts
    ● Indexes
    ● Chains
    ● Memory
    ● Agents

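The Chains module is about composing steps (prompt template, model call, output parser) so the output of one feeds the next. The sketch below shows only that idea in plain Python; it is not LangChain's API, and `chain`, `prompt_template`, `fake_llm`, and `output_parser` are hypothetical names, with `fake_llm` standing in for a real model call.

```python
from typing import Callable

Step = Callable[[str], str]


def chain(*steps: Step) -> Step:
    """Compose steps left to right: each step's output is the next step's input."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run


def prompt_template(question: str) -> str:
    # Prompts: wrap raw user input in an instruction.
    return f"Summarize in one line: {question}"


def fake_llm(prompt: str) -> str:
    # Models: hypothetical stand-in that upper-cases the text after the colon.
    return prompt.split(":", 1)[1].strip().upper()


def output_parser(raw: str) -> str:
    # Deterministic post-processing of the model output.
    return raw.rstrip(".")


summarize = chain(prompt_template, fake_llm, output_parser)
result = summarize("langchain composes steps.")
```

Swapping `fake_llm` for a real model client leaves the rest of the pipeline unchanged, which is the main point of treating each stage as an interchangeable step.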

  9. DEMO > No memory (Momento)

  10. DEMO > No memory (Momento)

  11. DEMO

  12. DEMO

  13. Why do we use Momento?
    ● Fully serverless
    ● Easy to integrate
    ○ Needs only a few lines of code.
    ● Super fast and reliable
    ○ Always responds within a few milliseconds.
    ● Higher security
    ○ We must meet many compliance requirements because CYDAS hosts a large amount of personal data.

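In the demo, Momento serves as the conversation memory: each session's chat history lives in a cache and expires after a TTL. LangChain provides a Momento-backed chat message history class for this; the sketch below does not use it. It is an in-process stand-in, assuming one history per session ID and a TTL that is refreshed on every write. `TtlChatHistory` is a hypothetical name, and the injectable `clock` exists only so expiry can be tested.

```python
import time
from typing import Callable


class TtlChatHistory:
    """Hypothetical in-process stand-in for a cache-backed (e.g. Momento) chat history."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # session_id -> (expires_at, messages)
        self._store: dict[str, tuple[float, list[dict[str, str]]]] = {}

    def messages(self, session_id: str) -> list[dict[str, str]]:
        """Return a copy of the session's messages, or [] if missing or expired."""
        entry = self._store.get(session_id)
        if entry is None or entry[0] <= self.clock():
            self._store.pop(session_id, None)  # behave like a cache miss
            return []
        return list(entry[1])

    def add(self, session_id: str, role: str, content: str) -> None:
        """Append a message; each write refreshes the TTL for the whole session."""
        history = self.messages(session_id)
        history.append({"role": role, "content": content})
        self._store[session_id] = (self.clock() + self.ttl, history)
```

Before each model call, the app would load `messages(session_id)` into the prompt; without this step the model sees every question in isolation, which is the "No memory" behavior shown in the demo.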

  14. For Production
    “It’s easy to make something cool with LLMs, but very hard to make
    something production-ready with them.” - Chip Huyen
    ● Security and compliance:
    ○ OpenAI → Azure OpenAI
    ○ Pinecone → Azure Cognitive Search
    ● Safety:
    ○ Accuracy, hallucination, fairness

  15. Lessons we learned for production
    1. RAG apps are easy to implement → but can a traditional search UI (without LLMs)
    solve this problem?
    2. A workflow that takes advantage of LLM capabilities is important
    1. Combine deterministic programming with non-deterministic LLMs
    2. Chain multiple tasks together 🦜🔗
    3. 🦜🔗 is a treasure trove of ideas + implementations
    1. ReAct → langchain.agents
    2. HyDE → the LLM imagines an answer to the question, then searches for knowledge similar to that
    answer: from langchain.chains import HypotheticalDocumentEmbedder
    4. Enterprise search is usually better overall than vector similarity
    search alone
    5. LLMOps ≠ MLOps
    1. Changes in inputs / outputs are hard to notice
    2. The levers are limited once you do notice: replace the API or model, adjust prompts (+ version control)
    3. Response times should be captured, e.g. with LangSmith

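HyDE, as summarized on the slide, searches with the embedding of a hypothetical answer rather than the question itself. LangChain packages this as `HypotheticalDocumentEmbedder`; the sketch below is not that class but a minimal illustration of the idea, assuming a toy bag-of-words embedding and a caller-supplied `generate` function (hypothetical) that stands in for the LLM.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    # Toy bag-of-words embedding; HyDE itself does not depend on the embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hyde_search(question: str, docs: list[str],
                generate: Callable[[str], str], k: int = 1) -> list[str]:
    # 1. Let the LLM write a hypothetical answer; it may be wrong, it only needs
    #    to be phrased like the documents we want to find.
    hypothetical = generate(f"Write a short answer to: {question}")
    # 2. Rank documents by similarity to the hypothetical answer, not the question.
    target = embed(hypothetical)
    return sorted(docs, key=lambda d: cosine(target, embed(d)), reverse=True)[:k]


docs = ["Paid leave: employees receive twenty days per year.",
        "The cafeteria opens at noon."]
fake_llm = lambda prompt: "Employees receive twenty days of paid leave per year."
top = hyde_search("How much vacation do I get?", docs, fake_llm)
```

With this toy embedding, the question shares no words with either document, so searching with the question alone cannot rank them; the hypothetical answer does share vocabulary with the leave-policy document.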

  16. OWASP Top 10 for LLM
    1. Prompt Injection
    2. Insecure Output Handling
    3. Training Data Poisoning
    4. Model Denial of Service
    5. Supply Chain Vulnerabilities
    6. Sensitive Information Disclosure
    7. Insecure Plugin Design
    8. Excessive Agency
    9. Overreliance
    10. Model Theft
    OWASP Top 10 for Large Language Model Applications
    https://owasp.org/www-project-top-10-for-large-language-model-applications/

  17. OWASP Top 10 for LLM
    1. Prompt Injection
    2. Insecure Output Handling
    3. Training Data Poisoning
    4. Model Denial of Service
    5. Supply Chain Vulnerabilities
    6. Sensitive Information Disclosure
    7. Insecure Plugin Design
    8. Excessive Agency
    9. Overreliance
    10. Model Theft
    OWASP Top 10 for Large Language Model Applications
    https://owasp.org/www-project-top-10-for-large-language-model-applications/

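Two items on the list, Insecure Output Handling and Sensitive Information Disclosure, can be partly mitigated with ordinary deterministic code around the model. A minimal sketch, assuming the answer is rendered as HTML and that email addresses are the personal data to mask (both are assumptions; the regex is illustrative, not a complete PII filter):

```python
import html
import re

# Illustrative pattern only; real PII detection needs far more than one regex.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text: str) -> str:
    """Mask email addresses in model output (Sensitive Information Disclosure)."""
    return EMAIL_PATTERN.sub("[redacted email]", text)


def render_answer(llm_output: str) -> str:
    """Treat model output as untrusted (Insecure Output Handling): redact, then HTML-escape."""
    safe = html.escape(redact_emails(llm_output))
    return f'<div class="answer">{safe}</div>'
```

Escaping means an injected `<script>` tag in a model answer is displayed as text instead of being executed in the user's browser.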

  18. from langchain.experimental to langchain_experimental
    Move experimental to experimental
    package
    Easy to push the PRs
    As a result, CVEs will be removed from the
    core package (langchain)

    It’s just the beginning, more high-level
    thoughts, e.g. modularity →
    https://github.com/langchain-ai/langchain/discussions

  19. 🦜🔗 Move experimental to experimental package
    ● Big news
    ○ All features involving CVEs (vulnerabilities) are now in a separate package
    (experimental)
    ○ Streamlining of the 🦜🔗 core
    ○ Plans were mentioned for a package called Community Chain
    ● What this means…
    ○ Could not be used in production → Can now be used
    ○ Unchecked growth over the past year or so meant the package would one day no longer
    fit in a Lambda Layer → Its size can now be kept under control
    ○ Implementing papers and ambitious ideas will be more PR-friendly
    ● For AWS Lambda
    ○ Current size: 130 MB unzipped, including dependent libraries
    ○ Cold start takes approximately 5 seconds → Several countermeasures are needed when calling from Slack, such as
    lazy listeners and retry header checks

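The retry-header check matters because Slack retries an event when it does not get a response within 3 seconds, adding an `X-Slack-Retry-Num` header; with a roughly 5-second cold start, the first attempt may still be running when the retry arrives. A minimal sketch of an AWS Lambda handler behind API Gateway that acknowledges retries instead of processing them twice. The handler shape, the plain-JSON (not base64-encoded) body, and the omission of request-signature verification are all assumptions made to keep the example short.

```python
import json
from typing import Any


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Hypothetical Lambda handler for a Slack Events API endpoint."""
    # Header casing differs between API Gateway versions, so normalize it.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}

    # A retry means Slack already sent this event once; the first invocation
    # is (probably) still working on it, so just acknowledge with 200.
    if "x-slack-retry-num" in headers:
        return {"statusCode": 200, "body": ""}

    body = json.loads(event.get("body") or "{}")

    # One-time endpoint verification when the Events API URL is configured.
    if body.get("type") == "url_verification":
        return {"statusCode": 200, "body": body.get("challenge", "")}

    # Slow LLM work would be handed off here (e.g. to a lazy listener or an
    # asynchronous invocation) so the acknowledgement stays under 3 seconds.
    return {"statusCode": 200, "body": ""}
```

Dropping retries trades a small risk of losing an event (if the first invocation really failed) for never answering the same Slack message twice.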

  20. Extra: In-house / Momento is Everywhere
    ChatGPT Clone for Corporate Slackbot

  21. Amazon Kendra + 🦜🔗
    Build high-accuracy generative AI applications using Amazon Kendra, LangChain, and large language models
    https://aws.amazon.com/jp/blogs/news/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/
    Salesforce, Slack, Box, Mail…

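Using Kendra as the retriever means the enterprise search index (fed by connectors such as Salesforce, Slack, and Box) supplies the RAG context instead of a pure vector store. A minimal sketch, assuming a boto3 Kendra client whose `retrieve` call returns `ResultItems` with `DocumentTitle` and `Content`. `kendra_context` is a hypothetical helper, not the code from the linked blog post, and the client is passed in so the function can be tested with a fake.

```python
from typing import Any, Protocol


class KendraLike(Protocol):
    """Anything with a Kendra-style retrieve(**kwargs) method, e.g. boto3.client("kendra")."""

    def retrieve(self, **kwargs: Any) -> dict[str, Any]: ...


def kendra_context(client: KendraLike, index_id: str, query: str, top_k: int = 3) -> str:
    """Fetch passages from an Amazon Kendra index and format them as LLM context."""
    resp = client.retrieve(IndexId=index_id, QueryText=query, PageSize=top_k)
    passages = [
        f"[{item.get('DocumentTitle', '')}] {item.get('Content', '')}"
        for item in resp.get("ResultItems", [])[:top_k]
    ]
    return "\n".join(passages)
```

Usage would look like `kendra_context(boto3.client("kendra"), "<index-id>", "How many days of paid leave do I get?")`, with the returned string placed into the prompt in the same way as any other retrieved context.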

  22. Questions?