Accelerate your GenAI go to market with serverless on AWS

© 2024, Amazon Web Services, Inc. or its affiliates. All
rights reserved. Accelerate your GenAI go to market with serverless on AWS Luca Bianchi (He/Him) Chief Technology Officer Neosperience

rights reserved. Who am I? Chief Technology Officer @ Neosperience and Neosperience Health, proud AWS Serverless Hero, passionate about software architectures, serverless, and machine learning. Serverless Italy and [Gen]AI Milano Meetup co- founder. ServerlessDays Milano co-organizer. github.com/aletheia https://it.linkedin.com/in/lucabianchipavia https://speakerdeck.com/aletheia bianchiluca.com @bianchiluca

rights reserved.

rights reserved. 1. A generative AI Use Case 2. Generative AI on serverless 3. Infrastructure-as-Code for GenAI 4. Putting all together

rights reserved. GenAI use case: Retrieval Augmented Generation

rights reserved. What is Retrieval Augmented Generation (RAG)? RAG is ingeniously designed to amplify the capabilities of generative models, such as those used for text generation, by uniquely integrating them with a retrieval mechanism. • Retrieval Mechanism: searching a large database of documents to find relevant information. • Generative Model: relevant information is fed into an LLM to generate a response. • Advantages: allows to generate relevant responses from up-to-date or specialized information. • Applications: chatbots, question-answering systems, content creation tools, and more. • Continuous Learning: periodic updated, allowing the system to stay current with new information.

rights reserved. What is RAG?

rights reserved. LOAD • Acquire source documentation • web crawling, • data lake extraction, • connecting to proprietary databases, etc. • Convert documentation from source formats (pdf, doc, html, etc.) to plain text / lightweight structured format (simple html, markdown) • Create text chunks

rights reserved. SPLIT • Divide text into syntactically correlated parts (phrases, paragraphs, sections) • Aim to one chunk —> one topic • Split sentences using punctuation marks, newlines or bullets • Engineering decisions: • maximum chunk size • overlapping • Should be max(len(chunk)) * k + len(prompt) < len(context)

rights reserved. EMBED • Text embeddings are a representation of text in the form of numeric vectors • Map documents of different lengths into a fixed-length, smaller dimension vector- space (512, 768, 1536 dimensions) • Numeric distance between similar world is small compared to other words. • Use neural network to learn words distance • Embedding is a projection from the language space to the vector space. • Many model choices (i.e. Amazon Titan Embeddings) • Use cases: semantic search, classification, clustering, outlier detection.

rights reserved. STORE • A database designed to: • Store high-dimensional vector data • Efficiently searchiusing a similarity metric (knn) • Scalable, indexable for fast retrieval • Support for mixed queries (with metadata) • Traditional databases offer “vector extensions” • Available options include Amazon OpenSearch (serverless), Amazon Aurora, Pinecone (serverless), MongoDB

rights reserved. SEARCH • Knn • Non-parametric supervised learning method that identifies the k “nearest” data points in a dataset to a given input • “nearest” is interpreted by an appropriate metrics that defines a distance function between data points • k is a user-defined parameter • Tackle computational complexity with an efficient implementation • Include support for pre/post filtering

rights reserved. Generative AI on serverless?

rights reserved. Why Serverless for LLM? • A collection of small microservices natively acts as a glue within different components of the architecture • Storage for Documents • LLM for embeddings and response generation • Vector Store • Interaction with user • Pay as you go model for scalability and a cost-effective approach • Interaction well suited for Event Driven Architectures (EDA) natively supported by serverless functions

rights reserved. A managed Knowledge Base • Fully managed support for end-to- end RAG workflow • Securely connect FMs and agents to data sources • Retrieve relevant data and augment prompts through API or GUI • Retrieved Information from Knowledge Bases for Amazon Bedrock is provided with citations

rights reserved. RAG Architecture W I T H B E D R O C K K N O W L E D G E B A S E

rights reserved. How to create a RAG U S I N G A W S C O N S O L E

rights reserved. What about IaC?

rights reserved. In the past were bigger companies that outcompete smaller ones. Now are faster companies to outcompete slower ones. Marc Benioff CEO @ Salesforce

rights reserved. Automation is key to fast go to market, Infrastructure-as-Code allows your infrastructure to move at the speed of cloud

rights reserved. Amazon Cloud Development Kit • A programming language friendly framework to describe cloud infrastructure (TypeScript, JavaScript, Python, Java, C#). • High level of abstractions available with hierarchical constructs (L3, L2). • Core Concepts • Constructs: Basic building blocks of CDK applications. • Stacks: Units of deployment, corresponding to AWS CloudFormation stacks. • Apps: Collection of one or more stacks. • Leverages AWS CloudFormation under the hood for deployment. • Strong typing for cloud resources with IDE support.

rights reserved. Getting Started with CDK • Install the AWS CDK Toolkit. • Bootstrap your AWS Account for CDK. • Initialize a new CDK project Define and configure resources in your code. • Deploy your stack.

rights reserved. Generative AI CDK Constructs • An open-source extension to AWS CDK • Provides well-architected multi-service patterns to create resources for GenAI • L3 constructs to support use cases: • Q&A on knowledge base • Summarization • Data ingestion • Model deployment • L2 construct to support resource provisioning and configuration. GenerativeAI CDK Constructs https://github.com/awslabs/generative-ai-cdk- constructs/

rights reserved. RAG with CDK L2 Constructs • S3 Bucket to store documents • Pinecone as vector database • Bedrock Knowledge Base configuration • chunking strategy • maximum tokens • overlap percentage

rights reserved. Data Ingestion Pipeline Construct • Ingests documents and then converts them into text formats. • PDF files and images uploaded to an input Amazon Simple Storage Service (S3) bucket. • AWS AppSync mutation to start ingestion process, and async notification through subscription.

rights reserved. Data Ingestion Pipeline Construct • Input validation • Transformation: • PDFs • Text document • labels extraction from images • Chunk splitting • Embedding creation with Amazon Bedrock

rights reserved. Data Ingestion Pipeline Construct • Question / answering workflow with Amazon Bedrock. • Provisioned Amazon OpenSearch cluster. • Anthropic's Claude-3 Sonnet (visual question answering).

rights reserved. Data Ingestion Pipeline Construct

rights reserved. Document Summarization Construct • Generates summaries for multiple PDF documents or images integrated . • AppSync, Step Functions, Lambda, EventBridge, and Step Functions. • Input request validation and processing.

rights reserved. Data Ingestion Pipeline Construct

rights reserved. Putting all together

rights reserved. Document Explorer Example • Leverage various AWS services to implement: • A document ingestion pipeline • A document summarization Pipeline • Q&A on documents • Validate and transform documents along the way. • Handle different file types. • APIs with AWS AppSync APIs and logins with Amazon Cognito. • Pipeline Workflow management with AWS StepFunction.

rights reserved. Document Explorer Solution Document Explorer https://github.com/aws-samples/generative-ai-cdk- constructs-samples/tree/main/samples/ document_explorer

rights reserved. One more thing…

rights reserved. AWS Project LakeChain • A framework based on the AWS Cloud Development Kit (CDK), allowing to express and deploy scalable document processing pipelines on AWS using infrastructure-as-code. • Composable • Scalable • Cost efficient • Provides implemented use-cases Project LakeChain https://awslabs.github.io/project-lakechain/

rights reserved. Project LakeChain Use Cases • Document Index • Podcast Generator • RAG Pipeline • Search Engine • Video Chaptering • Video Subtitle

rights reserved. Thank you! Please complete the session survey in the mobile app Luca Bianchi @bianchiluca www.bianchiluca.com

Accelerate your GenAI go to market with serverl...

Accelerate your GenAI go to market with serverless on AWS

More Decks by Aletheia

Featured

Transcript