Building Large Language Model Powered Apps: Best Practices

In this session, Kacper Łukawski from Qdrant will discuss best practices for building LLM-based apps and their integrations. As these models become more widely adopted, it's essential to understand the technical hurdles that could affect a system's performance and scalability.

During the talk, we will review the existing tools and see how to move from development to production.

In this webinar, you will:

📌 Learn about the key considerations when designing applications powered by LLMs, including choosing the right model, understanding the computational requirements, and ensuring data privacy.

📌 Dive into the technical aspects of training and fine-tuning LLMs for your specific application needs.

📌 Discover the best practices for deploying and scaling LLM-based applications, including model versioning and A/B testing.

Kacper Łukawski

October 13, 2023


  1. Vector Search in production

    Qdrant is a vector search database using HNSW, one of the most promising algorithms for Approximate Nearest Neighbours.
    • Written in Rust.
    • HTTP / gRPC APIs + official SDKs.
    • Local in-memory, Docker & Cloud.
    • Metadata filtering built into the vector search phase.
  2. Freeware/Freemium is not open source

    1. “Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”
    2. “You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Llama 2 or derivative works thereof).”
    Source: https://opensourceconnections.com/blog/2023/07/19/is-llama-2...definition-of-open/
  3. Choosing the strategy

    1. Self-supervised - predict the next token in a document.
    2. Supervised - based on ideal responses to given prompts.
    3. RLHF - produce the best answer, based on feedback from a different model.
  4. Common issues with Retrieval Augmented Generation

    1. Wrong chunking strategy.
    2. Poor embedding models.
    3. Too few documents put into context (also too many).
    4. Bad prompts.
  5. Fine-tuning checklist

    ❏ Our language model cannot produce responses in a way we expect it to.
    ❏ Prompt engineering doesn’t help.
    ❏ RAG introduces relevant results, but they are improperly treated by the LLM.
  6. A framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.

    License: Apache 2.0
    Author: UC Berkeley’s RISELab
  7. The default prompt for one of the available chains Source:

  8. Using LLMs, the proper way

    - Dataset versioning
    - Model versioning
    - Model evaluation
    - Prompt versioning
  9. FastChat

    An open platform for training, serving, and evaluating large language model based chatbots.
    License: Apache 2.0
    Author: The Large Model System Organization (https://lmsys.org)
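
To make the first RAG issue from slide 4 (chunking strategy) more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not taken from the talk: the function name `chunk_words`, the word-based splitting, and the default sizes are all assumptions chosen for illustration. Real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Both parameters are worth tuning against retrieval quality: chunks that are too small lose context, while chunks that are too large dilute the embedding and use up the context window, which ties into the third issue on the same slide.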