Slide 1

Slide 1 text

Proprietary + Confidential

Lessons learned building a GenAI-powered app

Marc Cohen, Developer Advocate at Google, [email protected]
Mete Atamel, Developer Advocate at Google, @meteatamel, atamel.dev, speakerdeck.com/meteatamel

Slide 2

Slide 2 text

Agenda
01 Before GenAI
02 GenAI arrives
03 Architecture
04 After GenAI
05 Lessons Learned

Slide 3

Slide 3 text

Before GenAI

Slide 4

Slide 4 text

Every great invention started out as someone’s weekend project. This is gonna be huge!

Slide 5

Slide 5 text

August 2016

Slide 6

Slide 6 text

Demo: Initial app with Open Trivia DB https://opentdb.com/api_config.php
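
As a minimal sketch of what the initial app consumes, here is how an Open Trivia DB response can be parsed into question dicts. The sample payload mirrors the shape documented at opentdb.com; in the real app this JSON would come from an HTTP GET to the API, and the exact field handling here is illustrative, not the app's actual code.

```python
import html
import json

# Sample payload in the documented Open Trivia DB response shape.
SAMPLE_RESPONSE = json.dumps({
    "response_code": 0,
    "results": [{
        "category": "History",
        "type": "multiple",
        "difficulty": "easy",
        "question": "Who was the first US president?",
        "correct_answer": "George Washington",
        "incorrect_answers": ["Thomas Jefferson", "Alexander Hamilton", "Bill Clinton"],
    }],
})

def parse_quiz(raw: str) -> list[dict]:
    """Turn a raw Open Trivia DB JSON string into question dicts."""
    payload = json.loads(raw)
    if payload.get("response_code") != 0:
        raise ValueError("Open Trivia DB returned an error code")
    questions = []
    for item in payload["results"]:
        questions.append({
            # Open Trivia DB HTML-escapes its text, so unescape it.
            "question": html.unescape(item["question"]),
            "correct": html.unescape(item["correct_answer"]),
            "choices": [html.unescape(a) for a in item["incorrect_answers"]]
                       + [html.unescape(item["correct_answer"])],
        })
    return questions

quiz = parse_quiz(SAMPLE_RESPONSE)
```

Every question arrives with exactly one correct answer and three incorrect ones, which is where the "multiple choice with 4 answers" limitation comes from.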

Slide 7

Slide 7 text

Initial problems
● Limited list of topics
● Limited questions and answers
● Limited format: multiple choice with 4 answers
● English only
● No images
● Expanding quiz content is difficult

Slide 8

Slide 8 text

GenAI arrives

Slide 9

Slide 9 text

March 2023

Is it possible to have a more dynamic quiz app with infinite content using GenAI?

Slide 10

Slide 10 text

(pronounced like mosaic)

Slide 11

Slide 11 text

Demo: GenAI-powered app

Slide 12

Slide 12 text

Architecture

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Flutter for client

Slide 15

Slide 15 text

Cloud Run hosts the UI and API servers

Slide 16

Slide 16 text

Cloud Firestore as backend

Slide 17

Slide 17 text

Five Key Data Structures
1. admins
2. generators
3. quizzes
4. sessions
5. results

Slide 18

Slide 18 text

Four Key Personas
1. admin
2. creator
3. host
4. player

Slide 19

Slide 19 text

Data Access Model

Slide 20

Slide 20 text

API

Slide 21

Slide 21 text

Vertex AI on Google Cloud for LLMs

Slide 22

Slide 22 text

Quiz Generators

Name                 | Type   | Format
OpenTrivia           | static | multiple choice
PaLM                 | genAI  | multiple choice (possible: free-form)
Gemini (Pro, Ultra)  | genAI  | multiple choice (possible: free-form)

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Image Generator

Name             | Type  | Description
Imagen (v1, v2)  | genAI | Uses the Imagen model to generate images for quizzes

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

After GenAI

Slide 28

Slide 28 text

Revisit: Initial problems
● Limited → Unlimited list of topics
● Limited → Unlimited questions and answers
● Limited → Unlimited format
● English only → Any language
● No images → Unlimited images
● Expanding quiz content is difficult → easy with GenAI

Slide 29

Slide 29 text

New problems with GenAI
● Learning curve with GenAI
● Inconsistent or no outputs from LLMs
● Slow LLM calls
● Hallucinations
● Hard to check the accuracy and quality of LLM outputs
● Fast-changing landscape (models, APIs, libraries, etc.)

Slide 30

Slide 30 text

Lessons Learned

Slide 31

Slide 31 text

🎓 General

Surprisingly easy to do hard things with GenAI
● Quiz/image generation with a single API call

Hard to do things well and consistently
● Good results require prompt engineering
● You will get inconsistent outputs
● Hard to measure output quality

Slide 32

Slide 32 text

🎓 General

Accept the uncertainty of LLMs
● Same prompt, same model ⇒ different output
● Same prompt, same model gets updated ⇒ different output
● Same prompt, different model ⇒ different output

Slide 33

Slide 33 text

🎓 General

Free upgrades with new/updated models
● PaLM ⇒ Gemini Pro: better quizzes
● Gemini Pro ⇒ Gemini Ultra: even better quizzes
● Imagen v1 ⇒ v2: better images
● Little or no code changes

Slide 34

Slide 34 text

🎓 General

Do you even need an LLM?
● Grading free-form answers ⇒ LLM vs. the TheFuzz library
● Image of the app ⇒ Imagen vs. a good old photo editor
● Sometimes you don't need an expensive LLM call

Slide 35

Slide 35 text

🎓 Prompting

Be specific and clear with prompts
● More detailed prompts != better results

Manage prompts like code
● Version prompts for safe iteration
● Prompts and output parsers go hand-in-hand
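
One way to manage prompts like code is a versioned prompt registry, so a prompt change is an explicit, reviewable, pinnable step. This is a minimal sketch; the prompt texts and version tags below are illustrative, not the app's actual prompts.

```python
# Prompts live in one place, keyed by name and version, just like code.
PROMPTS = {
    "quiz/v1": "Generate a quiz about {topic}.",
    "quiz/v2": (
        "Generate a quiz about {topic} with exactly {num_questions} "
        "multiple-choice questions, each with 4 answers. Respond in JSON."
    ),
}

def render_prompt(name: str, version: str, **params) -> str:
    """Look up a pinned prompt version and fill in its parameters."""
    template = PROMPTS[f"{name}/{version}"]
    return template.format(**params)

# Callers pin a version, so moving to v2 is a deliberate one-line change
# that can be rolled back if the new prompt regresses.
prompt = render_prompt("quiz", "v2", topic="history", num_questions=5)
```

Because v2 demands a specific JSON shape, its output parser can be versioned alongside it, which is why prompts and parsers go hand-in-hand.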

Slide 36

Slide 36 text

🎓 Coding with LLMs

Code defensively
● LLM calls can fail ⇒ Retry and keep the user informed
● LLMs can give you malformed JSON ⇒ Can you still parse the JSON somehow?
● LLMs can return empty results ⇒ Can you live with no quizzes or no image?
● LLMs can be too cautious ⇒ Do you need to change the safety settings?
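
The first two defenses can be sketched as a retry wrapper plus a JSON salvager. This is an illustration under assumed failure modes (markdown fences, prose around the JSON), not the app's actual error handling.

```python
import json
import re
import time

def salvage_json(text: str):
    """Try to parse JSON from an LLM reply, tolerating common damage:
    markdown code fences and extra prose around the JSON object."""
    # Strip ```json ... ``` fences if present.
    text = re.sub(r"```(?:json)?", "", text)
    # Fall back to the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

def generate_with_retry(call, attempts=3, delay=1.0):
    """Retry a flaky LLM call with exponential backoff; return None on final
    failure so the app can degrade gracefully (e.g. a quiz without an image)."""
    for i in range(attempts):
        try:
            result = salvage_json(call())
            if result is not None:
                return result
        except Exception:
            pass  # transient API error: fall through to retry
        if i < attempts - 1:
            time.sleep(delay * (2 ** i))
    return None

messy = 'Sure! Here is your quiz:\n```json\n{"questions": []}\n```'
parsed = salvage_json(messy)  # recovers {"questions": []}
```

Returning None instead of raising is what makes "can you live with no image?" an explicit decision at the call site.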

Slide 37

Slide 37 text

🎓 Coding with LLMs

Pin model versions
● gemini-1.0-pro refers to the latest version and can silently change from gemini-1.0-pro@001 to gemini-1.0-pro@002, …
● Use a specific version such as gemini-1.0-pro@001

Slide 38

Slide 38 text

🎓 Coding with LLMs

Consider using a higher-level library like LangChain
● You can use Gemini from Google AI Studio and Vertex AI, but each has different libraries
● In Vertex AI, the libraries for PaLM and Gemini are different
● Other non-Google models have their own libraries
● LangChain can help abstract all of this away
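
LangChain is the real-world option here; the snippet below only illustrates the underlying idea in plain Python: code against one interface so swapping providers is a one-line change. The two backend classes are stand-ins, not real SDK clients.

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """The one interface the app depends on, regardless of vendor."""
    @abstractmethod
    def invoke(self, prompt: str) -> str: ...

class FakeGemini(ChatModel):
    # Stand-in for a real Vertex AI Gemini client.
    def invoke(self, prompt: str) -> str:
        return f"[gemini] answer to: {prompt}"

class FakePalm(ChatModel):
    # Stand-in for a real PaLM client.
    def invoke(self, prompt: str) -> str:
        return f"[palm] answer to: {prompt}"

def generate_quiz(model: ChatModel, topic: str) -> str:
    # App code depends only on ChatModel, not on any vendor SDK.
    return model.invoke(f"Generate a quiz about {topic}")

# Swapping providers touches only this line.
reply = generate_quiz(FakeGemini(), "history")
```

This is also what made the PaLM ⇒ Gemini upgrade nearly free: only the object passed in changes, not the call sites.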

Slide 39

Slide 39 text

🎓 Coding with LLMs

Good old software engineering tricks
● Minimize LLM calls by batching prompts
● Use parallel calls (e.g. quiz and image generation run in parallel)
● Cache common responses
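
Two of the tricks above can be sketched together: run independent calls (quiz text and quiz image) in parallel, and cache responses for repeated topics. The two generate_* functions are stand-ins for real LLM calls, with sleeps simulating latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

def generate_quiz_text(topic: str) -> str:
    time.sleep(0.1)  # simulate a slow LLM call
    return f"quiz about {topic}"

def generate_quiz_image(topic: str) -> str:
    time.sleep(0.1)  # simulate a slow image-model call
    return f"image for {topic}"

@lru_cache(maxsize=128)
def cached_quiz(topic: str) -> tuple[str, str]:
    # The two calls are independent, so wall time is ~max, not the sum.
    with ThreadPoolExecutor() as pool:
        text = pool.submit(generate_quiz_text, topic)
        image = pool.submit(generate_quiz_image, topic)
        return text.result(), image.result()

first = cached_quiz("history")   # parallel: ~0.1s instead of ~0.2s
second = cached_quiz("history")  # served from cache: effectively instant
```

In a real app a shared cache (e.g. Firestore or Redis) would replace lru_cache so the cache survives restarts and is shared across Cloud Run instances.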

Slide 40

Slide 40 text

🎓 Testing and Validation

Unit/functional tests are as important as ever
● Easy to check existence or format
● Is this a quiz with 5 questions and 4 answers?
● Is the image generated or not?
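
These structural checks need no LLM judge; a sketch is below. The quiz dict shape is an assumption for illustration, not the app's actual schema.

```python
def check_quiz_structure(quiz: dict, num_questions: int = 5,
                         num_answers: int = 4) -> list[str]:
    """Return a list of structural problems; an empty list means the quiz passes."""
    problems = []
    questions = quiz.get("questions", [])
    if len(questions) != num_questions:
        problems.append(f"expected {num_questions} questions, got {len(questions)}")
    for i, q in enumerate(questions):
        if not q.get("question"):
            problems.append(f"question {i} has no text")
        if len(q.get("answers", [])) != num_answers:
            problems.append(f"question {i} does not have {num_answers} answers")
        if q.get("correct") not in q.get("answers", []):
            problems.append(f"question {i}: correct answer is not among its answers")
    return problems

good = {"questions": [
    {"question": f"Q{i}?", "answers": ["a", "b", "c", "d"], "correct": "a"}
    for i in range(5)
]}
```

Running these cheap checks first means the expensive LLM-based quality checks only see quizzes that are at least well-formed.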

Slide 41

Slide 41 text

🎓 Testing and Validation

Testing quality and accuracy is more difficult
● Is the quiz actually on the topic of history?
● Is the answer actually correct?
● Is the generated image appropriate for the quiz? (still an open question)

Need a way to measure LLM outputs
● Automate it, and use it as a benchmark to work towards

Slide 42

Slide 42 text

🎓 Testing and Validation

Use an LLM to evaluate LLM outputs

Slide 43

Slide 43 text

🎓 Testing and Validation

How do you know if the validator works?
● Use OpenTrivia as a corpus of accurate quizzes
● See how the validator performs against OpenTrivia

Slide 44

Slide 44 text

Every multiple-choice quiz question can be decomposed into four assertions of the form:

Q: question
A: answer

For example:

Who was the first US president?
A. Thomas Jefferson
B. Alexander Hamilton
C. George Washington
D. Bill Clinton

can be decomposed into these four assertions:
● Q: Who was the first US president? A: Thomas Jefferson ⇒ False
● Q: Who was the first US president? A: Alexander Hamilton ⇒ False
● Q: Who was the first US president? A: George Washington ⇒ True
● Q: Who was the first US president? A: Bill Clinton ⇒ False
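
The decomposition above is mechanical; a minimal sketch:

```python
def decompose(question: str, answers: list[str], correct: str) -> list[tuple]:
    """One multiple-choice question becomes one
    (question, answer, expected_truth) assertion per answer option."""
    return [(question, a, a == correct) for a in answers]

assertions = decompose(
    "Who was the first US president?",
    ["Thomas Jefferson", "Alexander Hamilton", "George Washington", "Bill Clinton"],
    "George Washington",
)
# Four assertions, exactly one of which is expected to be True.
```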

Slide 45

Slide 45 text

Evaluation

“In one (and only one) word, are the following assertions true or false?”

Q: Who was the first US president? A: Thomas Jefferson
Q: Who was the first US president? A: Alexander Hamilton
Q: Who was the first US president? A: George Washington
Q: Who was the first US president? A: Bill Clinton

LLM: False False True False
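
Scoring that reply is then a string comparison against the expected truth values from the decomposition. A sketch (the one-word-per-assertion reply format is the assumption the parser relies on):

```python
def score_verdicts(llm_reply: str, expected: list[bool]) -> float:
    """Fraction of assertions where the LLM's True/False matches expectations."""
    verdicts = [w.strip().lower() == "true" for w in llm_reply.split()]
    if len(verdicts) != len(expected):
        raise ValueError("verdict count does not match assertion count")
    matches = sum(v == e for v, e in zip(verdicts, expected))
    return matches / len(expected)

# The LLM reply from the slide, scored against the known truth values:
accuracy = score_verdicts("False False True False", [False, False, True, False])
# → 1.0: the validator agreed with the corpus on all four assertions
```

Averaging this score over an OpenTrivia-sized corpus is what yields benchmark numbers like the 80% and 91% figures on the next slide.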

Slide 46

Slide 46 text

🎓 Testing and Validation

PaLM initially got around 80% accuracy
Gemini Ultra got 91% accuracy

Slide 47

Slide 47 text

🎓 Testing and Validation

Ultimately, you need grounding for more accuracy (e.g. grounding with Google Search)

Slide 48

Slide 48 text

Is it possible to have a more dynamic and richer quiz app with the help of GenAI?

7 years → 7 weeks

Slide 49

Slide 49 text

Thank you

Marc Cohen, Developer Advocate at Google, [email protected]
Mete Atamel, Developer Advocate at Google, @meteatamel, atamel.dev, speakerdeck.com/meteatamel