Aletheia
May 26, 2024

Accelerate your GenAI go to market with serverless on AWS

In a fast-paced world where Generative AI is everywhere, companies struggle to find concrete applications that provide value to their customers while improving margins.
Serverless can offer a cost-effective approach to GenAI, reducing TCO through a pay-as-you-go model. Reproducible, automated provisioning and deployments, together with the cloud's composable nature, are key to the success of any GenAI strategy.

Transcript

  1. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Accelerate your GenAI go to market with serverless on AWS
    Luca Bianchi (He/Him), Chief Technology Officer, Neosperience
  2. Who am I?
    Chief Technology Officer @ Neosperience and Neosperience Health, proud AWS Serverless Hero, passionate about software architectures, serverless, and machine learning. Serverless Italy and [Gen]AI Milano Meetup co-founder. ServerlessDays Milano co-organizer.
    github.com/aletheia
    https://it.linkedin.com/in/lucabianchipavia
    https://speakerdeck.com/aletheia
    bianchiluca.com
    @bianchiluca
  3. 1. A generative AI use case
     2. Generative AI on serverless
     3. Infrastructure-as-Code for GenAI
     4. Putting it all together
  4. GenAI use case: Retrieval Augmented Generation
  5. What is Retrieval Augmented Generation (RAG)?
    RAG amplifies the capabilities of generative models, such as those used for text generation, by integrating them with a retrieval mechanism.
    • Retrieval Mechanism: searches a large database of documents to find relevant information.
    • Generative Model: the relevant information is fed into an LLM to generate a response.
    • Advantages: responses can draw on up-to-date or specialized information.
    • Applications: chatbots, question-answering systems, content creation tools, and more.
    • Continuous Learning: the document base is updated periodically, allowing the system to stay current with new information.
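
The "augmentation" step can be illustrated with a minimal Python sketch: retrieved passages are placed in front of the question so the LLM can ground its answer in them. The `Passage` type, the `build_augmented_prompt` helper, and the prompt wording are assumptions made for illustration; they do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical identifier of the originating document
    text: str    # passage text returned by the retrieval step


def build_augmented_prompt(question: str, passages: list[Passage]) -> str:
    """Number the retrieved passages and put them before the question."""
    context = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the context below and cite passages by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what would be sent to the generative model; numbering the passages makes it possible to ask for citations in the answer.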
  6. LOAD
    • Acquire source documentation
      • web crawling,
      • data lake extraction,
      • connecting to proprietary databases, etc.
    • Convert documentation from source formats (pdf, doc, html, etc.) to plain text / lightweight structured format (simple html, markdown)
    • Create text chunks
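
As an example of the conversion step, the sketch below turns HTML into plain text using only the Python standard library. The class and function names are illustrative, and the set of block-level tags is a simplifying assumption; production pipelines usually rely on a dedicated extraction library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    SKIPPED = {"script", "style"}
    BLOCK = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._parts.append("\n")  # keep block boundaries as line breaks

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Keeping one line per block element preserves paragraph boundaries, which the splitting step can later use as chunk separators.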
  7. SPLIT
    • Divide text into syntactically correlated parts (phrases, paragraphs, sections)
    • Aim for one chunk -> one topic
    • Split sentences using punctuation marks, newlines or bullets
    • Engineering decisions:
      • maximum chunk size
      • overlapping
    • Constraint: max(len(chunk)) * k + len(prompt) < len(context), where k is the number of chunks retrieved for each query
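
A minimal sketch of these decisions in Python: sentences are split on punctuation and newlines, packed greedily into chunks up to a maximum size, and a configurable number of trailing sentences is repeated in the next chunk as overlap. Lengths are measured in characters here as a stand-in for tokens, and all function names are illustrative assumptions. A sentence longer than `max_chars` is emitted as an oversized chunk rather than hard-split.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after ., ! or ? followed by whitespace, and on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p and p.strip()]


def chunk_sentences(sentences: list[str], max_chars: int, overlap: int = 1) -> list[str]:
    """Pack sentences into chunks, repeating the last `overlap` sentences in the next chunk."""
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
            # Drop carried-over sentences that leave no room for the new one.
            while current and len(" ".join(current + [sentence])) > max_chars:
                current.pop(0)
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def fits_context(chunks: list[str], k: int, prompt_len: int, context_len: int) -> bool:
    """Check max(len(chunk)) * k + len(prompt) < len(context)."""
    return max(len(c) for c in chunks) * k + prompt_len < context_len
```

With real models the same check should be done in tokens, using the tokenizer of the chosen LLM.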
  8. EMBED
    • Text embeddings are a representation of text in the form of numeric vectors
    • Map documents of different lengths into a fixed-length, lower-dimensional vector space (512, 768, 1536 dimensions)
    • The numeric distance between similar words is small compared to the distance to other words.
    • Use a neural network to learn the distance between words
    • Embedding is a projection from the language space to the vector space.
    • Many model choices (e.g. Amazon Titan Embeddings)
    • Use cases: semantic search, classification, clustering, outlier detection.
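
To make the "fixed-length vector" idea concrete, here is a toy Python sketch that hashes words into a small vector and compares texts with cosine similarity. This hashing scheme is a deliberately simple stand-in, not how a learned model such as Amazon Titan Embeddings works: it captures shared vocabulary, not meaning. The function names and the 16-dimension default are assumptions.

```python
import math
import zlib


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Map text of any length to a unit-length vector with `dim` components."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 gives a hash that is stable across runs (unlike built-in hash()).
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return a value close to 1.0 for vectors pointing in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Whatever the length of the input text, the output always has `dim` components, which is the property vector stores rely on.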
  9. STORE
    • A database designed to:
      • Store high-dimensional vector data
      • Efficiently search using a similarity metric (k-NN)
      • Scale, with indexing for fast retrieval
      • Support mixed queries (with metadata)
    • Traditional databases offer “vector extensions”
    • Available options include Amazon OpenSearch (serverless), Amazon Aurora, Pinecone (serverless), MongoDB
  10. SEARCH
    • k-NN (k-nearest neighbors)
      • Non-parametric supervised learning method that identifies the k “nearest” data points in a dataset to a given input
      • “nearest” is interpreted by an appropriate metric that defines a distance function between data points
      • k is a user-defined parameter
    • Tackle computational complexity with an efficient implementation
    • Include support for pre/post filtering
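
The search step can be sketched as a brute-force k-NN in Python with an optional metadata pre-filter. It compares the query against every stored vector, which is exactly the cost that vector databases avoid with approximate indexes. The item layout `(id, vector, metadata)` and the `where` parameter are assumptions made for this example.

```python
import heapq
import math


def knn(query, items, k, metric=math.dist, where=None):
    """Return the k nearest items as (distance, id) pairs, closest first.

    items: iterable of (id, vector, metadata) tuples.
    where: optional predicate on metadata, applied before the distance
           computation (pre-filtering).
    """
    candidates = (item for item in items if where is None or where(item[2]))
    scored = ((metric(query, vector), item_id) for item_id, vector, _meta in candidates)
    return heapq.nsmallest(k, scored)
```

For example, with three stored vectors and a language filter:

```python
items = [
    ("a", [0.0, 0.0], {"lang": "en"}),
    ("b", [1.0, 0.0], {"lang": "it"}),
    ("c", [3.0, 4.0], {"lang": "en"}),
]
knn([0.0, 0.0], items, k=2)                                        # [(0.0, 'a'), (1.0, 'b')]
knn([0.0, 0.0], items, k=2, where=lambda m: m["lang"] == "en")     # [(0.0, 'a'), (5.0, 'c')]
```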
  11. Generative AI on serverless?
  12. Why Serverless for LLM?
    • A collection of small microservices naturally acts as glue between the different components of the architecture
      • Storage for Documents
      • LLM for embeddings and response generation
      • Vector Store
      • Interaction with user
    • Pay-as-you-go model for scalability and a cost-effective approach
    • Interaction is well suited to Event Driven Architectures (EDA), natively supported by serverless functions
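
As an illustration of the event-driven "glue" role, the Python sketch below shows a Lambda-style handler reacting to an S3 "object created" notification, which is a common trigger for starting document ingestion. The handler name and the returned payload are assumptions; the event fields follow the S3 notification format, where object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the S3 URIs of newly uploaded documents from an S3 event."""
    documents = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in S3 notifications (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        documents.append(f"s3://{bucket}/{key}")
    # Hypothetical next step: hand each document over to the ingestion workflow.
    return {"documents": documents}
```

The function only runs when a document is uploaded, which is where the pay-as-you-go cost model comes from.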
  13. A managed Knowledge Base
    • Fully managed support for end-to-end RAG workflow
    • Securely connect FMs and agents to data sources
    • Retrieve relevant data and augment prompts through API or GUI
    • Information retrieved from Knowledge Bases for Amazon Bedrock is provided with citations
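
A hedged Python sketch of the API route: it builds a request for the `retrieve_and_generate` operation of the boto3 `bedrock-agent-runtime` client and collects the S3 URIs cited in a response. The helper names are illustrative, the knowledge base ID and model ARN must be supplied by the caller, and the response fields used here reflect the documented response shape as understood at the time of writing, so check them against the current API reference.

```python
def build_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def cited_sources(response: dict) -> list[str]:
    """Return the distinct S3 URIs referenced by the response citations."""
    uris: list[str] = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris


def ask(question: str, knowledge_base_id: str, model_arn: str):
    """Query the knowledge base; requires boto3 and AWS credentials."""
    import boto3  # third-party dependency, imported lazily

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **build_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"], cited_sources(response)
```

Only `ask` touches AWS; the two helpers are pure functions and can be unit-tested without credentials.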
  14. RAG Architecture with Bedrock Knowledge Base
  15. How to create a RAG using AWS Console
  16. How to create a RAG using AWS Console
  17. How to create a RAG using AWS Console
  18. In the past, bigger companies outcompeted smaller ones. Now faster companies outcompete slower ones.
    Marc Benioff, CEO @ Salesforce
  19. Automation is key to a fast go to market; Infrastructure-as-Code allows your infrastructure to move at the speed of cloud
  20. AWS Cloud Development Kit
    • A programming-language-friendly framework to describe cloud infrastructure (TypeScript, JavaScript, Python, Java, C#).
    • High-level abstractions available through hierarchical constructs (L3, L2).
    • Core Concepts
      • Constructs: Basic building blocks of CDK applications.
      • Stacks: Units of deployment, corresponding to AWS CloudFormation stacks.
      • Apps: Collection of one or more stacks.
    • Leverages AWS CloudFormation under the hood for deployment.
    • Strong typing for cloud resources with IDE support.
  21. Getting Started with CDK
    • Install the AWS CDK Toolkit.
    • Bootstrap your AWS Account for CDK.
    • Initialize a new CDK project.
    • Define and configure resources in your code.
    • Deploy your stack.
  22. Generative AI CDK Constructs
    • An open-source extension to AWS CDK
    • Provides well-architected multi-service patterns to create resources for GenAI
    • L3 constructs to support use cases:
      • Q&A on knowledge base
      • Summarization
      • Data ingestion
      • Model deployment
    • L2 constructs to support resource provisioning and configuration.
    GenerativeAI CDK Constructs https://github.com/awslabs/generative-ai-cdk-constructs/
  23. RAG with CDK L2 Constructs
    • S3 Bucket to store documents
    • Pinecone as vector database
    • Bedrock Knowledge Base configuration
      • chunking strategy
      • maximum tokens
      • overlap percentage
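
The three knowledge base settings correspond to a fixed-size chunking configuration. The Python sketch below builds that configuration in the shape used by the Amazon Bedrock data source API (`vectorIngestionConfiguration`), not in the shape of the CDK construct properties, whose names vary between library versions. The function name, the default values, and the bounds checked are assumptions for illustration.

```python
def fixed_size_chunking(max_tokens: int = 300, overlap_percentage: int = 20) -> dict:
    """Build a fixed-size chunking configuration for a Bedrock data source."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    # Assumed bounds: overlap is expressed as a percentage between 1 and 99.
    if not 1 <= overlap_percentage <= 99:
        raise ValueError("overlap_percentage must be between 1 and 99")
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_percentage,
            },
        }
    }
```

Keeping these values in code rather than in console settings is what makes the knowledge base reproducible across environments.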
  24. Data Ingestion Pipeline Construct
    • Ingests documents and then converts them into text formats.
    • PDF files and images are uploaded to an input Amazon Simple Storage Service (S3) bucket.
    • An AWS AppSync mutation starts the ingestion process; completion is notified asynchronously through a subscription.
  25. Data Ingestion Pipeline Construct
    • Input validation
    • Transformation:
      • PDFs
      • Text documents
      • label extraction from images
    • Chunk splitting
    • Embedding creation with Amazon Bedrock
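
The validation and transformation stages can be pictured as a routing decision on the file type. The Python sketch below is a hypothetical illustration: the transform names and the list of accepted extensions are invented for the example and are not taken from the construct's implementation.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the transformation to apply.
TRANSFORMS = {
    ".pdf": "pdf_to_text",
    ".txt": "text_passthrough",
    ".jpg": "image_label_extraction",
    ".jpeg": "image_label_extraction",
    ".png": "image_label_extraction",
}


def route_document(key: str) -> str:
    """Validate an uploaded object key and pick the transformation for it."""
    suffix = PurePosixPath(key).suffix.lower()
    if suffix not in TRANSFORMS:
        raise ValueError(f"unsupported file type: {key}")
    return TRANSFORMS[suffix]
```

Rejecting unsupported files at this point keeps later, more expensive stages (chunking and embedding) from running on input they cannot handle.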
  26. Data Ingestion Pipeline Construct
    • Question / answering workflow with Amazon Bedrock.
    • Provisioned Amazon OpenSearch cluster.
    • Anthropic's Claude-3 Sonnet (visual question answering).
  27. Data Ingestion Pipeline Construct
  28. Document Summarization Construct
    • Generates summaries for multiple PDF documents or images.
    • Integrates AppSync, Lambda, EventBridge, and Step Functions.
    • Input request validation and processing.
  29. Data Ingestion Pipeline Construct
  30. Putting it all together
  31. Document Explorer Example
    • Leverage various AWS services to implement:
      • A document ingestion pipeline
      • A document summarization pipeline
      • Q&A on documents
    • Validate and transform documents along the way.
    • Handle different file types.
    • APIs with AWS AppSync and logins with Amazon Cognito.
    • Pipeline workflow management with AWS Step Functions.
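
To show what workflow management with Step Functions amounts to, the Python sketch below generates a linear Amazon States Language definition in which each pipeline stage is a Task state backed by a Lambda function. The state names and the placeholder ARNs are assumptions; the actual Document Explorer workflow is defined by the sample's own CDK code.

```python
import json


def build_state_machine(steps: list[tuple[str, str]]) -> dict:
    """Chain (state_name, lambda_arn) pairs into a sequential state machine."""
    states = {}
    for index, (name, arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # continue with the following stage
        else:
            state["End"] = True  # last stage terminates the execution
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


# Placeholder ARNs: replace REGION and ACCOUNT_ID with real values.
definition = build_state_machine([
    ("Validate", "arn:aws:lambda:REGION:ACCOUNT_ID:function:validate"),
    ("Transform", "arn:aws:lambda:REGION:ACCOUNT_ID:function:transform"),
    ("Embed", "arn:aws:lambda:REGION:ACCOUNT_ID:function:embed"),
])
definition_json = json.dumps(definition, indent=2)
```

The JSON string is the kind of document a state machine definition expects; in a CDK project the equivalent chain would normally be expressed with constructs rather than raw JSON.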
  32. Document Explorer Solution
    Document Explorer https://github.com/aws-samples/generative-ai-cdk-constructs-samples/tree/main/samples/document_explorer
  33. AWS Project LakeChain
    • A framework based on the AWS Cloud Development Kit (CDK) that lets you express and deploy scalable document processing pipelines on AWS using infrastructure-as-code.
      • Composable
      • Scalable
      • Cost efficient
    • Provides ready-made use cases
    Project LakeChain https://awslabs.github.io/project-lakechain/
  34. Project LakeChain Use Cases
    • Document Index
    • Podcast Generator
    • RAG Pipeline
    • Search Engine
    • Video Chaptering
    • Video Subtitle
  35. Thank you!
    Please complete the session survey in the mobile app
    Luca Bianchi
    @bianchiluca
    www.bianchiluca.com