Slide 1

Slide 1 text

LLMs in the enterprise market
Victor van den Broek
Big Data Expo Utrecht - 12 September 2023
©2022 Databricks Inc. — All rights reserved

Slide 2

Slide 2 text

Victor van den Broek
Solutions Architect @ Databricks
- Databricks since 2023
- Focus on Dutch & Belgian public sector and financial services
- 15 years of ‘data experience’ as DE, DS, DA, PO…
- linkedin.com/in/victorvdb

Slide 3

Slide 3 text

The Lakehouse Company
- $3B in investment
- 5000+ global employees
- $1B+ in revenue
- Inventor and pioneer of the data lakehouse
- Gartner-recognized Leader: Database Management Systems; Data Science and Machine Learning Platforms
- Creator of

Slide 4

Slide 4 text

LLM 101

Slide 5

Slide 5 text

Generative AI, LLMs and Foundation Models
- Artificial Intelligence (AI): multidisciplinary field of computer science that aims to create systems capable of emulating human intelligence
- Machine Learning (ML): learn from existing data and make predictions without being explicitly programmed
- Deep Learning (DL): use artificial neural networks to learn from data
- Generative AI: subfield of AI focusing on generating new data (images, text, audio, code, ...)
- LLM: models trained on massive datasets to achieve advanced language processing capabilities
- Foundation Models (GPT-4, Bard, MPT-7B, …): LLMs which can serve as the base for a wide range of applications

Slide 6

Slide 6 text

LLMs are not that new
Why should I care now?
Accuracy and effectiveness have hit a tipping point
• Many new use cases are unlocked!
• Accessible by all.
Readily available data and tooling
• Large datasets.
• Open-source model options.
• Powerful GPUs are required, but they are available in the cloud.

Slide 7

Slide 7 text

Language Models are everywhere…
- Machine Translation
- Text Summarization
- Chatbots & Conversational Interfaces

Slide 8

Slide 8 text

What is a language model?
Finds the most likely next word in a sequence
Stochastic Parrot
Avocados are …
- Green
- Fruit
- Delicious
- Luxurious
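The "most likely next word" idea can be illustrated with a toy bigram counter. This is only a sketch under loud assumptions: the corpus is made up for illustration, and a real LLM uses a neural network over tokens rather than word counts. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny, made-up corpus; a real LLM is trained on vastly more text.
CORPUS = [
    "avocados are green",
    "avocados are green",
    "avocados are delicious",
    "avocados are fruit",
    "limes are green",
]

def train_bigram_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts, prev_word):
    """Turn the follower counts for prev_word into probabilities."""
    followers = counts[prev_word]
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}

def most_likely_next_word(counts, prev_word):
    """Return the single most frequent follower of prev_word."""
    return counts[prev_word].most_common(1)[0][0]

counts = train_bigram_counts(CORPUS)
print(next_word_probabilities(counts, "are"))
print(most_likely_next_word(counts, "are"))
```

In this toy corpus "green" follows "are" most often, so it is the predicted next word; the model repeats what it has seen, which is where the "stochastic parrot" label comes from.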

Slide 9

Slide 9 text

I might read about 700 books in my lifetime

Slide 10

Slide 10 text

A large LM may be trained on 10,000,000 book equivalents

Slide 11

Slide 11 text

- Faster software development
- More users can leverage AI
- More use cases
- Reduce development cost
- Reduce monotonous tasks

Slide 12

Slide 12 text

How do you get to an enterprise deployment?

Slide 13

Slide 13 text

DATA PLATFORM
- Data Collection and Preparation
- Use Existing Model or Build Your Own
- Model Serving and Monitoring
UNITY CATALOG
- Datasets
- Models
- Applications

Slide 14

Slide 14 text

LLM level 0
Plain foundation models
“Everyone” has done this - go to ChatGPT and ask questions without much engineering.
Typical enterprise use cases:
- Text summarization
- Text classification
- Generic coding assistants

Slide 15

Slide 15 text


Slide 16

Slide 16 text

Open source models are an alternative to Azure OpenAI

Slide 17

Slide 17 text

DATA PLATFORM
- Data Collection and Preparation
- Use Existing Model or Build Your Own
- Model Serving and Monitoring
UNITY CATALOG
- Datasets
- Models
- Applications
Platform components:
- Curated AI Models
- Model Serving optimized for LLMs
- MLflow AI Gateway
- Lakehouse Monitoring
LLM levels covered:
- Plain LLM

Slide 18

Slide 18 text

LLM level 1
Prompt engineering
Add contextual information to the prompt to give the model specific information pertaining to the question.
Typical enterprise use cases:
- Customer service chatbots
- Specific coding assistants
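A minimal sketch of what adding context to a prompt can look like. The template wording, the `build_prompt` helper and the customer-service snippets are all assumptions made up for illustration; the resulting string would be sent to whichever model you use.

```python
def build_prompt(question, context_snippets):
    """Combine contextual information and the user question into one prompt."""
    context = "\n".join(f"- {snippet}" for snippet in context_snippets)
    return (
        "Answer the question using only the context below.\n"
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical customer-service facts, invented for illustration.
snippets = [
    "Orders can be returned within 30 days of delivery.",
    "Refunds are paid out to the original payment method.",
]
prompt = build_prompt("Can I return an order after two weeks?", snippets)
print(prompt)
```

The model itself is unchanged here; only the text it receives carries the company-specific information.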

Slide 19

Slide 19 text


Slide 20

Slide 20 text

DATA PLATFORM
- Data Collection and Preparation
- Use Existing Model or Build Your Own
- Model Serving and Monitoring
UNITY CATALOG
- Datasets
- Models
- Applications
Platform components:
- Curated AI Models
- Model Serving optimized for LLMs
- Lakehouse Monitoring
- MLflow AI Gateway
- Feature Serving
- MLflow Evaluation
LLM levels covered:
- Plain LLM
- Simple prompt engineering

Slide 21

Slide 21 text

LLM level 2
Fine tuning
Using data you have available, you can fine tune LLMs to fit your use case. Depending on whether they are open-source or closed-source, the methodology will differ. Regardless, it will require data specific to your use case, and engineering capabilities - humans and hardware!
Typical enterprise use cases:
- LLM fine tuned to answer questions in a specialist area (e.g. legal, medical)
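Whatever the tuning method, the "data specific to your use case" usually has to be brought into a training format first. The sketch below shows one possible shape, assuming prompt/completion records written as JSON Lines. The field names, the helper functions and the legal examples are hypothetical; the format actually required depends on the model and the training tool.

```python
import json

# Hypothetical question/answer pairs from a specialist (legal) domain.
EXAMPLES = [
    {"question": "What is a force majeure clause?",
     "answer": "A clause that frees parties from obligations when extraordinary events occur."},
    {"question": "What does 'indemnify' mean?",
     "answer": "To compensate another party for a loss or damage."},
]

def to_training_record(example):
    """Map one raw example to a prompt/completion record (assumed format)."""
    return {
        "prompt": f"Question: {example['question']}\nAnswer:",
        "completion": " " + example["answer"],
    }

def to_jsonl(examples):
    """Serialize all examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(to_training_record(e)) for e in examples)

jsonl = to_jsonl(EXAMPLES)
print(jsonl)
```

This covers only the data preparation step; the training run itself is where the hardware and engineering effort mentioned above come in.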

Slide 22

Slide 22 text

DATA PLATFORM
- Data Collection and Preparation
- Use Existing Model or Build Your Own
- Model Serving and Monitoring
UNITY CATALOG
- Datasets
- Models
- Applications
Platform components:
- Feature Serving
- Curated AI Models
- AutoML for LLM training
- Model Serving optimized for LLMs
- Lakehouse Monitoring
- MLflow AI Gateway
- MLflow Evaluation
LLM levels covered:
- Plain LLM
- Simple prompt engineering
- Fine tuning

Slide 23

Slide 23 text

LLM level 3
Retrieval Augmented Generation
Encode all relevant data you have with an LLM and store the encodings in a vector database. Then retrieve the most relevant data for each question and insert it into the prompt. Basically prompt engineering on steroids, but it requires you to encode all the data you already have, and to keep using the same LLM to encode the questions as well.
Typical enterprise use cases:
- LLM answering questions about specifics in documents, such as purchase orders and contracts
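The encode, retrieve and insert steps can be sketched end to end in a few lines. Everything here is a stand-in chosen for illustration: `embed` is a toy bag-of-words counter rather than an LLM encoder, the in-memory `INDEX` list plays the role of the vector database, and the documents are invented.

```python
import math
from collections import Counter

# Hypothetical document snippets, invented for illustration.
DOCUMENTS = [
    "Purchase order 1001 covers 50 laptops for the Utrecht office.",
    "The maintenance contract with the supplier ends on 31 December.",
    "Purchase order 1002 covers 20 monitors for the Brussels office.",
]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return Counter(words)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# "Vector database": every document stored next to its encoding.
INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]

def retrieve(question, k=1):
    """Encode the question like the documents and return the top-k matches."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_rag_prompt(question, k=1):
    """Insert the retrieved snippets into the prompt as context."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_rag_prompt("When does the maintenance contract end?"))
```

Note that the question goes through the same `embed` function as the documents, which mirrors the requirement to keep using the same encoder for questions and data.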

Slide 24

Slide 24 text

DATA PLATFORM
- Data Collection and Preparation
- Use Existing Model or Build Your Own
- Model Serving and Monitoring
UNITY CATALOG
- Datasets
- Models
- Applications
Platform components:
- Vector Search
- Feature Serving
- Curated AI Models
- AutoML for LLM training
- Model Serving optimized for LLMs
- Lakehouse Monitoring
- MLflow AI Gateway
- MLflow Evaluation
LLM levels covered:
- Plain LLM
- Simple prompt engineering
- Fine tuning
- Retrieval Augmented Generation

Slide 25

Slide 25 text

RAG vs Fine-Tuning
Generic answers with specific knowledge vs specific answers

Slide 28

Slide 28 text

LLM level 4
Training your own model from scratch
If all else fails, or you have specific governance / IP / risk requirements, then training a model from scratch becomes an option. However, this is both very difficult and very expensive, and there are currently very few enterprise use cases in which this is the solution. If you are one of them, you will know ;-)

Slide 29

Slide 29 text

DATA PLATFORM
- Data Collection and Preparation
- Use Existing Model or Build Your Own
- Model Serving and Monitoring
UNITY CATALOG
- Datasets
- Models
- Applications
Platform components:
- Vector Search
- Feature Serving
- Curated AI Models
- AutoML for LLM training
- Model Serving optimized for LLMs
- Lakehouse Monitoring
- MLflow AI Gateway
- MLflow Evaluation
LLM levels covered:
- Plain LLM
- Simple prompt engineering
- Fine tuning
- Retrieval Augmented Generation
- Training from scratch

Slide 30

Slide 30 text

Generative AI Fundamentals Course
Earn your badge today and share your accomplishment on LinkedIn or your résumé
Build foundational knowledge of generative AI, including large language models (LLMs), with this free training course.
➔ Welcome and Introduction to the Course
➔ Introducing Generative AI
➔ Finding Success With Generative AI
➔ Assessing Potential Risks and Challenges
Available on Databricks.com

Slide 31

Slide 31 text

23 November | Beurs van Berlage
Register now