Slide 1

Slide 1 text

Confidential Customized for company name Version 1.0 Generative AI Vereniging Informatici Defensie - October 5, 2023 Midjourney - "a symposium for ministry of defense staff, with drones in attendance, sci-fi movie style" Ivo Jansch [email protected] @ijansch

Slide 2

Slide 2 text

DISCLAIMER By the time we finish this talk it's probably already outdated Midjourney - "people attending a presentation in an auditorium. There's a time machine phonebooth from Dr Who in the corner"

Slide 3

Slide 3 text

Chapter 1: How does Generative AI work? Midjourney - "a brain of a robot, full of neurons. Science fiction style"

Slide 4

Slide 4 text

2005 - 2015: Rise of Deep Learning
Introducing layers into the field of machine learning: pixels, grouped pixels, edges, green hues, …, a cat, a cat staring, a cat staring at you, a cat at a waterhole in the jungle

Slide 5

Slide 5 text

2017: Invention of the Transformer Architecture
Has nothing to do with The Transformers.
Has everything to do with a new, scalable architecture that made it possible to process large quantities of data and, most importantly:
- Self-supervised learning
- Self-attention

Slide 6

Slide 6 text

Understanding attention The cat drank from the waterhole until it was full The cat drank from the waterhole until it was empty

Slide 7

Slide 7 text

Understanding attention The cat drank from the waterhole until it was full The cat drank from the waterhole until it was empty

Slide 8

Slide 8 text

2018: GPT
Generative: predict the next word(s) from a sequence of words
Pre-trained: trained on a large corpus of text
Transformer: built on the transformer architecture

Slide 9

Slide 9 text

Training the model
Training data: 17 GB for GPT-3, 45 GB for GPT-4
Language model: 175 billion parameters for GPT-3, a rumored 1 trillion parameters for GPT-4 (OpenAI has not published the figure)

Slide 10

Slide 10 text

Training the model - word embeddings
Cat (other words encountered, with the distance to those words): Animal 0.3, Domestic 0.2, Hairy 0.25, Smelly 0.25, Food 0.3, PC 0.98, Mouse 0.3

Slide 11

Slide 11 text

Training the model - word embeddings
Cat: Animal 0.3, Domestic 0.2, Hairy 0.25, Smelly 0.25, Food 0.3, PC 0.98, Mouse 0.3
Mouse: Animal 0.3, Domestic 0.5, Hairy 0.25, Smelly 0.25, Food 0.7, PC 0.2, Cat 0.3
Dog: Animal 0.3, Domestic 0.2, Hairy 0.25, Smelly 0.4, Food 0.25, PC 0.99, Mouse 0.7
PC: Animal 0.98, Domestic 0.99, Hairy 0.7, Smelly 0.8, Food 0.8, Cat 0.98, Mouse 0.3
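
The distances on this slide can be read as distances between word vectors. The sketch below is only an illustration of that idea: the three-dimensional vectors and the choice of cosine distance are assumptions made for the example, not values from the talk or from any real model (real embeddings are learned and have hundreds or thousands of dimensions).

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration.
# Axes (roughly): animal-ness, domestic-ness, tech-ness.
EMBEDDINGS = {
    "cat":   [0.9, 0.8, 0.1],
    "dog":   [0.9, 0.9, 0.1],
    "mouse": [0.6, 0.3, 0.6],  # ambiguous: rodent or computer mouse
    "pc":    [0.0, 0.2, 1.0],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 means same direction, larger means further apart."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector is closest to the given word's vector."""
    others = [w for w in EMBEDDINGS if w != word]
    return min(others, key=lambda w: cosine_distance(EMBEDDINGS[word], EMBEDDINGS[w]))


if __name__ == "__main__":
    for word in EMBEDDINGS:
        print(word, "is closest to", nearest(word))
```

As on the slide, a small distance means two words tend to appear in similar contexts, and an ambiguous word such as "mouse" sits between the animal words and "pc".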

Slide 12

Slide 12 text

Word embeddings are useful for self-attention The cat drank from the waterhole until it was full Cat Full
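
A rough sketch of how word vectors could feed self-attention, using scaled dot-product attention weights. The two-dimensional vectors are invented for this example, and the same vector is used as both query and key; a real transformer applies learned projection matrices first, so treat this as an illustration rather than how GPT computes attention.

```python
import math

# Hypothetical 2-d vectors, invented for illustration.
# Axis 0: "a living thing that can be full", axis 1: "a container that can be empty".
VECTORS = {
    "cat":       [2.0, 0.0],
    "drank":     [0.5, 0.5],
    "waterhole": [0.0, 2.0],
    "full":      [2.0, 0.2],
    "empty":     [0.2, 2.0],
}


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def most_attended(query_word, context_words):
    """Return the context word that receives the largest attention weight."""
    weights = attention_weights(VECTORS[query_word], [VECTORS[w] for w in context_words])
    best = max(range(len(context_words)), key=lambda i: weights[i])
    return context_words[best]


if __name__ == "__main__":
    context = ["cat", "drank", "waterhole"]
    print("full  ->", most_attended("full", context))
    print("empty ->", most_attended("empty", context))
```

With these made-up vectors, "full" attends mostly to "cat" and "empty" mostly to "waterhole", which mirrors the two example sentences.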

Slide 13

Slide 13 text

The model 'knows' relationships between words Cat Drink Pet Water Fat Rat Dog PC Mouse Jungle Rhyme Vast

Slide 14

Slide 14 text

The model 'knows' relationships between words Cat Drink Pet Water Fat Rat Dog PC Mouse Jungle Rhyme Vast These are the 'parameters'

Slide 15

Slide 15 text

Then it starts predicting Given a prompt, what is the most likely word that comes next What is a cat? 96% 20% 94% Fat A pet An animal

Slide 16

Slide 16 text

Like humans, not always the same What is a cat? 96% 20% 94% Fat A pet An animal 'Temperature' introduces some randomness between similar answers
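
One common way to implement temperature is to divide the model's raw scores by the temperature before turning them into probabilities. The sketch below assumes that approach; the candidate scores are invented for the example and do not correspond to the percentages on the slide.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(candidates, scores, temperature, rng):
    """Pick one candidate at random, weighted by its temperature-adjusted probability."""
    probabilities = softmax_with_temperature(scores, temperature)
    return rng.choices(candidates, weights=probabilities, k=1)[0]


# Hypothetical raw scores for three candidate answers to "What is a cat?".
CANDIDATES = ["A pet", "An animal", "Fat"]
SCORES = [3.0, 2.9, 0.5]

if __name__ == "__main__":
    rng = random.Random(42)
    for temperature in (0.01, 1.0):
        picks = [sample(CANDIDATES, SCORES, temperature, rng) for _ in range(10)]
        print(temperature, picks)
```

At a very low temperature the top-scoring answer is chosen almost every time; at a temperature of 1.0 the two similar answers alternate, while the unlikely one stays rare.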

Slide 17

Slide 17 text

Word by word Given a prompt, what is the most likely word that comes next Input: Why does a cat drink water? Prediction: A cat

Slide 18

Slide 18 text

Word by word by word Given a prompt, what is the most likely word that comes next Input: Why does a cat drink water? A cat Prediction: needs

Slide 19

Slide 19 text

Word by word by word by word Given a prompt, what is the most likely word that comes next Input: Why does a cat drink water? A cat needs Prediction: nutrition

Slide 20

Slide 20 text

Word by word by word by word Given a prompt, what is the most likely word that comes next Done: Why does a cat drink water? A cat needs nutrition
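
The word-by-word loop from the last four slides can be sketched as follows. The lookup table stands in for a trained model and is entirely hypothetical: it only looks at the last word, whereas a real model conditions on the whole context and returns probabilities rather than a single fixed answer.

```python
END = "<end>"

# Hypothetical stand-in for a trained model: maps the last word to the next word.
TOY_MODEL = {
    "water?": "A",
    "A": "cat",
    "cat": "needs",
    "needs": "nutrition",
    "nutrition": END,
}


def predict_next(words):
    """Return the 'most likely' next word given the words so far."""
    return TOY_MODEL.get(words[-1], END)


def generate(prompt, max_words=20):
    """Append one predicted word at a time until the model signals the end."""
    words = prompt.split()
    for _ in range(max_words):
        next_word = predict_next(words)
        if next_word == END:
            break
        words.append(next_word)  # the output becomes part of the next input
    return " ".join(words)


if __name__ == "__main__":
    print(generate("Why does a cat drink water?"))
```

The key point is the feedback loop: every predicted word is appended to the input before the next prediction is made.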

Slide 21

Slide 21 text

Note: a model works with tokens, not words
The cat drank from the waterhole until it was full
All tokens are converted to numbers:
17 1345 98 45 17 2624 213 21 78 723
A language model is essentially an abstract model of relationships between numbers.
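
A minimal sketch of the text-to-numbers step, reusing the illustrative IDs from this slide. It assumes one token per lowercase word; real tokenizers usually split text into sub-word pieces and use much larger vocabularies.

```python
# Vocabulary taken from the numbers on the slide (illustrative IDs, not a real tokenizer's).
VOCAB = {
    "the": 17,
    "cat": 1345,
    "drank": 98,
    "from": 45,
    "waterhole": 2624,
    "until": 213,
    "it": 21,
    "was": 78,
    "full": 723,
}
ID_TO_WORD = {token_id: word for word, token_id in VOCAB.items()}


def encode(text):
    """Convert text to a list of token IDs (one token per lowercase word)."""
    return [VOCAB[word] for word in text.lower().split()]


def decode(token_ids):
    """Convert token IDs back to lowercase text."""
    return " ".join(ID_TO_WORD[token_id] for token_id in token_ids)


if __name__ == "__main__":
    ids = encode("The cat drank from the waterhole until it was full")
    print(ids)
    print(decode(ids))
```

Both occurrences of "the" map to the same ID, which is why 17 appears twice in the sequence.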

Slide 22

Slide 22 text

GPTs are "stochastic parrots" - They "understand" - Yet, they don't truly understand what they are talking about - They predict which numbers are most likely to follow other numbers, given the context of other numbers.

Slide 23

Slide 23 text

Disclaimer That was by no means a scientifically correct explanation, but it explains in a very simplified way how it works.

Slide 24

Slide 24 text

If it works for words, it can also work for other things Music… Images… Film…

Slide 25

Slide 25 text

News this week:

Slide 26

Slide 26 text

Chapter 2: Generative AI as assistant Midjourney - "a development team sitting around a table, everyone wearing AR glasses. Science fiction style"

Slide 27

Slide 27 text

Increase Coding Productivity

Slide 28

Slide 28 text

Coaching Junior Developers

Slide 29

Slide 29 text

Coaching Junior Developers

Slide 30

Slide 30 text

Designing Data Models LLMs can help with creating data models and normalization.

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Chapter 3: Generative AI application development Midjourney - "smart ai robots in a factory, performing tedious manual tasks for humans, science fiction style"

Slide 33

Slide 33 text

Application areas
Models are good at: enhancing, copy writing, gap filling, summarising, extracting, analysing, translating, converting, copy editing
But not so suitable for fact-finding / data accuracy

Slide 34

Slide 34 text

Application Approach 1: Train your own model Training Huge Dataset Custom LLM Prompt engineering Input Output

Slide 35

Slide 35 text

Application Approach 2: Model finetuning Training Additional Dataset Custom LLM Existing LLM Prompt engineering Input Output

Slide 36

Slide 36 text

Application Approach 3: Vector databases Vector Database Existing LLM Prompt engineering Additional Dataset Create Embeddings Input Output
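
The flow on this slide (create embeddings for an additional dataset, store them, retrieve the closest match for an input, and pass it to an existing LLM through the prompt) might look roughly like the sketch below. Everything here is a simplified assumption: `ToyVectorStore` is a hypothetical in-memory stand-in for a vector database, the "embedding" is a plain word count rather than a learned vector, and the final call to the LLM is left out.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (real systems use a learned model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Similarity between two count vectors; 0.0 when they share no words."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


def build_prompt(question, store):
    """Prompt engineering step: put the retrieved text in front of the question."""
    context = "\n".join(store.search(question, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("cats drink water from a bowl")
    store.add("the office printer needs new toner")
    print(build_prompt("what do cats drink", store))
```

The existing model is not retrained in this approach; the additional dataset only reaches it through the retrieved context in the prompt.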

Slide 37

Slide 37 text

Application Approach 4: LLMs as tool in a chain SQL Query Additional Dataset Factual Results Summarize Existing LLM Translate to query Input Output
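
A sketch of the chain on this slide, under stated assumptions: `translate_to_query` and `summarize` are hypothetical placeholders for the two LLM calls (hard-coded here so the example runs without any model), and the `questions` table is invented demo data. Only the `sqlite3` calls are real library functions.

```python
import sqlite3


def translate_to_query(question):
    """Placeholder for an LLM call that turns a question into SQL (hard-coded here)."""
    return "SELECT COUNT(*) FROM questions WHERE topic = 'drones'"


def summarize(question, rows):
    """Placeholder for an LLM call that phrases the factual result as an answer."""
    return f"Q: {question} -> database result: {rows[0][0]}"


def answer(question, conn):
    """Chain: question -> SQL query -> factual results from the database -> summary."""
    sql = translate_to_query(question)
    rows = conn.execute(sql).fetchall()
    return summarize(question, rows)


def make_demo_db():
    """Create an in-memory database with invented demo data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, topic TEXT)")
    conn.executemany(
        "INSERT INTO questions (topic) VALUES (?)",
        [("drones",), ("budget",), ("drones",)],
    )
    return conn


if __name__ == "__main__":
    print(answer("How many questions were about drones?", make_demo_db()))
```

The design point is that the facts come from the database, not from the model; the model only translates the question and words the result. In a real system, generated SQL should be validated or run with read-only permissions before execution.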

Slide 38

Slide 38 text

Example use case "Can AI assist with answering parliamentary questions?"

Slide 39

Slide 39 text

Example use case "Can AI assist with answering parliamentary questions?"

Slide 40

Slide 40 text

Application Application Proof of concept: LLM: Predict Parliamentary Questions Historic Parliamentary Questions Vector Database Create Embeddings News Prompt Engineering Formulate Response LLM: Verify against party program / prior statements LLM: Assist copy-writing for chamber, press and public Verify 💻 Party Programs Vector DB

Slide 41

Slide 41 text

Chapter 4: The dark side Midjourney - "a postapocalyptic city run by AI, full of drones. Humans have disappeared or are enslaved. Photorealistic image"

Slide 42

Slide 42 text

Hallucinations

Slide 43

Slide 43 text

Hallucinations

Slide 44

Slide 44 text

Hallucinations

Slide 45

Slide 45 text

Hallucinations

Slide 46

Slide 46 text

Hallucinations

Slide 47

Slide 47 text

What probably happened Tokens 5+ real methods starting with SecKeyCopy

Slide 48

Slide 48 text

Copyright concerns

Slide 49

Slide 49 text

Copyright concerns The output can contain copyrighted material!

Slide 50

Slide 50 text

Security concerns Your input can end up in training data!

Slide 51

Slide 51 text

Security & Quality concerns Paper: https://arxiv.org/abs/2211.03622

Slide 52

Slide 52 text

Environmental concerns

Slide 53

Slide 53 text

Environmental concerns Source: https://blogs.nvidia.com/blog/2022/03/25/what-is-a-transformer-model

Slide 54

Slide 54 text

Existential concerns Asimov's first law of robotics: "A robot may not injure a human being or, through inaction, allow a human being to come to harm"

Slide 55

Slide 55 text

Existential concerns FAKE NEWS Asimov's first law of robotics: "A robot may not injure a human being or, through inaction, allow a human being to come to harm"

Slide 56

Slide 56 text

Things to pay attention to
AI tools are useful, but:
● Ensure human supervision -> Code reviews
● Be transparent about the use of AI
● Pay attention to Terms & Conditions of the tools you use
● Keep an eye on copyright legislation
● Choose the right tool for the job
  ○ ChatGPT is not the only player in town
  ○ Consider open source alternatives such as Llama
    ■ https://github.com/eugeneyan/open-llms

Slide 57

Slide 57 text

Chapter 5: The future Midjourney - "A robot and a human holding hands, watching the sunset in the distance. Science fiction style"

Slide 58

Slide 58 text

Will AI replace developers? I gave it a try:

Slide 59

Slide 59 text

Let's see how far we can take this…

Slide 60

Slide 60 text

Let's see how far we can take this…

Slide 61

Slide 61 text

At this point, ChatGPT is lying through its teeth

Slide 62

Slide 62 text

Yay!

Slide 63

Slide 63 text

Now the tricky part…

Slide 64

Slide 64 text

But ChatGPT is easily convinced…

Slide 65

Slide 65 text

Chugging along…

Slide 66

Slide 66 text

And we're done! Little apple, little egg

Slide 67

Slide 67 text

Ouch…

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

AI PROOF!

Slide 70

Slide 70 text

Will AI replace developers?
Visual Basic didn't make developers obsolete
Outsourcing didn't make developers obsolete
No-code systems didn't make developers obsolete
AI won't make developers obsolete

Slide 71

Slide 71 text

Will AI replace developers? Development is so much more than producing code. AI will make programming more productive, but we will still require software engineering.

Slide 72

Slide 72 text

Thank you! Vereniging Informatici Defensie - October 5, 2023 Midjourney - "a symposium for ministry of defense staff, with drones in attendance, sci-fi movie style" Ivo Jansch [email protected] @ijansch