
AI Prototyping to Production with Promptflow


This is the deck I used during my session at the Epic AI Dev conference in 2024.

Daron Yondem

January 30, 2024



  1. AI Prototyping to Production with Promptflow Daron Yöndem Azure Application

    Innovation Tech Lead Microsoft http://daron.me @daronyondem
  2. 87% of organizations believe AI will give them a competitive

    edge Source: MIT Sloan Management Review
  3. What slows down GenAI adoption? Getting Started: The state of the

     art is evolving so quickly that it is difficult to decide what to use, and guidance and documentation are hard to find. Development: Applications often require multiple cutting-edge products and frameworks, which demands specialized expertise and new tools to stitch the components together. Context: Large language models don't know about your data. Evaluation: It is hard to figure out which model to use and how to optimize it for the use case. Operationalization: Concerns around privacy, security, and grounding; developers lack the experience and tools to evaluate, improve, and validate solutions for their proofs of concept, and to scale and operate them in production.
  4. Introducing LLMOps: how to bring LLM apps to production.

     Bring together people, process, and platform to automate LLM-infused software delivery and provide continuous value to users.
  5. The paradigm shift from MLOps to LLMOps

     • Target audiences: Traditional MLOps — ML engineers, data scientists; LLMOps — ML engineers, app developers. • Assets to share: Traditional MLOps — model, data, environments, features; LLMOps — LLM, agents, plugins, prompts, chains, APIs. • Metrics/evaluations: Traditional MLOps — accuracy; LLMOps — quality (accuracy, similarity), harm (bias, toxicity), correctness (groundedness), cost (tokens per request), latency (response time, RPS). • ML models: Traditional MLOps — built from scratch; LLMOps — pre-built or fine-tuned, served as an API (MaaS).
  6. LLM Lifecycle • Foundational LLM: Select the right foundation

     model, such as Azure OpenAI models, Llama 2, Falcon, or any model from Hugging Face; if necessary, a fine-tuned model. • Prompt Engineering: Prompt engineering or tuning with instructions describing the tasks the LLM will perform, along with several security measures. • Data & Services: Enrich LLMs with domain-specific grounding data (RAG pattern) or enable in-context learning with use-case-specific examples. • Experiment & Evaluate: Execute the flow (prompt plus additional data or services) end to end with sample input data; evaluate the LLM's responses on large datasets against ground truth (if any), or check whether each answer is relevant to the context. • CI, CE, and CD: Continuous Integration, Continuous Evaluation, and Continuous Deployment of LLM flows to maintain code quality with engineering best practices, compare LLM performance, and promote flows to higher environments. • Deployment & Inferencing: Package and deploy the LLM flow as a scalable container for making predictions; additionally, enable blue/green deployment with traffic-routing control so that A/B testing can be done on the LLM flow. • Online Evaluation: Online evaluation is critical for understanding performance, potential risks, etc.; the LLM's answers are evaluated by one or more evaluation mechanisms. • Monitor: Monitor performance metrics for the LLM flow, detect data drift, and communicate the model's performance to stakeholders.
  7. Operationalize LLM app development with prompt flow LLMOps is a

     complex process. Customers want: • Private data access and controls • Prompt engineering • CI/CD • Iterative experimentation • Versioning and reproducibility • Deployment and optimization • Safe and responsible AI. The workflow: Design and development (develop a flow based on a prompt to extend its capability; debug, run, and evaluate the flow with small data; modify the flow, its prompts and tools, until satisfied), Evaluation and refinement (evaluate the flow against a large dataset with different metrics such as quality, relevance, and safety, until satisfied), and Optimization and production (optimize the flow, deploy and monitor it, and gather end-user feedback).
  8. Azure AI Prompt Flow: streamline prompt engineering projects • Create

     AI workflows that connect various language models, APIs, and data sources to ground LLMs on your data • One platform to design, construct, tune, evaluate, test, and deploy LLM workflows • Evaluate the quality of workflows with a rich set of pre-built metrics and a safety system • Easy prompt tuning, comparison of prompt variants, and version control
  9. Flow Orchestration Develop your LLM flow from scratch • Use

    any framework such as LangChain or Semantic Kernel to build initial flows • Add your own reusable tools • Manage your flows as files on disk • Track run history
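Managing flows as files on disk means a flow is just a folder with a DAG definition plus its prompt and tool files. A minimal sketch of what such a definition might look like is below; the node, input, and connection names are illustrative placeholders, not from the deck.

```yaml
# flow.dag.yaml — illustrative sketch of a minimal Prompt Flow definition.
# Node names, the deployment name, and the connection name are placeholders.
inputs:
  question:
    type: string
outputs:
  answer:
    type: string
    reference: ${answer_node.output}
nodes:
- name: answer_node
  type: llm
  source:
    type: code
    path: answer.jinja2      # the prompt template lives beside the DAG file
  inputs:
    deployment_name: gpt-35-turbo
    question: ${inputs.question}
  connection: azure_open_ai_connection
  api: chat
```

Because the whole flow is plain files, it versions cleanly in git, which is what makes the run-history and CI/CD story on the later slides workable.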
  10. Integration Management Manage APIs and external data sources • Seamless

    integration with pre-built LLMs like Azure OpenAI Service • Built-in safety system with Azure AI Content Safety • Effectively manage credentials or secrets for APIs • Create your own connections in Python tools
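To sketch the last bullet: a custom Python tool in Prompt Flow is an ordinary function marked with the `@tool` decorator, and secrets reach it through a connection object instead of being hard-coded. The function and field names below are hypothetical, and the promptflow-specific pieces are stubbed with plain Python so the sketch is self-contained.

```python
# Stand-ins for promptflow's @tool decorator and connection types, so this
# sketch runs without the library installed. In a real flow you would use
# the decorator and connection classes that promptflow provides.
from dataclasses import dataclass, field

def tool(func):  # stub for the @tool decorator
    return func

@dataclass
class CustomConnection:
    # Prompt Flow stores secrets encrypted; the tool only sees the values.
    secrets: dict = field(default_factory=dict)
    configs: dict = field(default_factory=dict)

@tool
def call_external_api(query: str, conn: CustomConnection) -> str:
    """Hypothetical tool: build a request using a managed credential."""
    api_key = conn.secrets["api_key"]        # injected, never hard-coded
    endpoint = conn.configs["endpoint"]
    # A real tool would call the API here; we just return the request URL
    # with the key masked.
    return f"{endpoint}?q={query}&key=***{api_key[-2:]}"

conn = CustomConnection(secrets={"api_key": "sk-12345"},
                        configs={"endpoint": "https://api.example.com/search"})
print(call_external_api("prompt flow", conn))
```

The design point is that rotating a credential means updating the connection once, not editing every tool that uses it.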
  11. LLM Tuning Variants • Create dynamic prompts using external data

    and few shot samples • Edit your complex prompts in full screen • Quickly tune prompt and LLM configuration with variants
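To make the variant idea concrete: the same flow node can carry several prompt templates that differ in wording or few-shot examples, and each variant is rendered against the same inputs. The templates below are illustrative stand-ins (Prompt Flow uses Jinja2 templates; plain format strings stand in here).

```python
# Two illustrative prompt variants for the same summarization node:
# variant_0 is zero-shot, variant_1 prepends a few-shot example.
FEW_SHOT = (
    "Example:\n"
    "Text: The server returned 503 twice.\n"
    "Summary: Intermittent server errors.\n\n"
)

VARIANTS = {
    "variant_0": "Summarize the text in one sentence.\nText: {text}\nSummary:",
    "variant_1": FEW_SHOT
                 + "Summarize the text in one sentence.\nText: {text}\nSummary:",
}

def render(variant: str, text: str) -> str:
    """Fill a variant's template with the flow input."""
    return VARIANTS[variant].format(text=text)

for name in VARIANTS:
    prompt = render(name, "Latency doubled after the deploy.")
    print(f"--- {name} ({len(prompt)} chars) ---")
    print(prompt)
```

Running every variant over the same inputs is what makes the side-by-side comparison on the evaluation slide possible.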
  12. Prompt Evaluation • Evaluate flow performance with your own

     data • Use pre-built evaluation flows • Compare multiple variants or runs to pick the best flow • Ensure accuracy by scaling up the size of the evaluation data • Build your own custom evaluation flows [Diagram: flow variants (Tune Variant 0, 1, 2) feed a bulk test, whose outputs are evaluated]
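A custom evaluation flow usually has two parts: a line-level metric scored per row, and an aggregation step that rolls scores up per run so variants can be compared. The sketch below uses exact match as a deliberately simple metric; the variant names and data are made up for illustration.

```python
# Sketch of a custom evaluation flow: score each (answer, ground truth)
# pair, then aggregate per variant so runs can be compared side by side.
def exact_match(answer: str, truth: str) -> float:
    """Line-level metric: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if answer.strip().lower() == truth.strip().lower() else 0.0

def evaluate_run(rows: list) -> dict:
    """Aggregate line scores into run-level metrics."""
    scores = [exact_match(r["answer"], r["truth"]) for r in rows]
    return {"exact_match": sum(scores) / len(scores), "count": len(scores)}

# Hypothetical outputs from two flow variants over the same test data.
runs = {
    "variant_0": [{"answer": "Paris", "truth": "Paris"},
                  {"answer": "Lyon", "truth": "Marseille"}],
    "variant_1": [{"answer": "Paris", "truth": "Paris"},
                  {"answer": "Marseille", "truth": "Marseille"}],
}
for name, rows in runs.items():
    print(name, evaluate_run(rows))
```

With both runs scored on the same dataset, picking the best flow reduces to comparing the aggregated numbers.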
  13. Evaluation Metrics • Groundedness: evaluates how well the model's generated answers

     align with information from the input source. • Relevance: evaluates the extent to which the model's generated responses are pertinent and directly related to the given questions. • Coherence: evaluates how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. • Fluency: evaluates the language proficiency of a generative AI's predicted answer; it assesses how well the generated text adheres to grammatical rules, syntactic structures, and appropriate vocabulary, resulting in linguistically correct and natural-sounding responses. • Similarity: evaluates the similarity between a ground-truth sentence (or document) and the prediction generated by an AI model.
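The built-in metrics above are typically scored by an LLM judge, but a simple lexical proxy shows the shape of a similarity score. The token-level F1 below is a common, purely illustrative stand-in, not the formula Prompt Flow uses.

```python
# Token-level F1: a simple lexical similarity between a prediction and a
# ground-truth string. Illustrative only; the pre-built Similarity metric
# is LLM-scored rather than lexical.
def token_f1(prediction: str, truth: str) -> float:
    pred = prediction.lower().split()
    gold = truth.lower().split()
    common = 0
    gold_pool = list(gold)           # count overlaps with multiplicity
    for tok in pred:
        if tok in gold_pool:
            gold_pool.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(gold)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat on the mat", "the cat lay on the mat"))
```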
  14. Deployment • Seamless transition from development to production with

     AzureML's managed online endpoints, Azure Kubernetes Service, and Azure App Service. [Diagram: flow variants (Tune Variant 0, 1, 2) are tested, then promoted to a production app]
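For the managed-online-endpoint path, the endpoint itself is declared in a small YAML file and deployments are attached to it. The sketch below is illustrative; the endpoint name and traffic values are placeholders, and it ties the blue/green A/B testing mentioned on the lifecycle slide back to a concrete knob.

```yaml
# endpoint.yml — illustrative Azure ML managed online endpoint for a flow.
# The endpoint name is a placeholder.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: pf-chat-endpoint
auth_mode: key
# A blue/green traffic split for A/B testing can then be set with the CLI:
#   az ml online-endpoint update -n pf-chat-endpoint --traffic "blue=90 green=10"
```

Shifting traffic between the blue and green deployments, rather than redeploying, is what makes gradual rollout and rollback of a new flow version cheap.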
  15. Resources Go to https://aka.ms/prompt_flow • Getting Started with Azure AI

     Studio's Prompt Flow https://www.youtube.com/watch?v=vkM_sgaMTsU • LLMOps with Azure Prompt Flow & GitHub https://www.youtube.com/watch?v=j0YJ3BZjrFs