Slide 1

Slide 1 text

spaCy meets LLMs: Using Generative AI for Structured Data
Ines Montani, Explosion

Slide 2

Slide 2 text

spaCy: open-source library for industrial-strength natural language processing
225m+ downloads
spacy.io, github.com/explosion/spaCy

Slide 3

Slide 3 text

spaCy: open-source library for industrial-strength natural language processing
225m+ downloads
spacy.io, github.com/explosion/spaCy
ChatGPT can write spaCy code!

Slide 4

Slide 4 text

designed for production

Slide 5

Slide 5 text

Designed for production: extensible and programmable

Slide 6

Slide 6 text

Designed for production: carefully designed and consistent API; extensible and programmable

Slide 7

Slide 7 text

Designed for production: carefully designed and consistent API; extensible and programmable; serializable data structures

Slide 8

Slide 8 text

Designed for production: carefully designed and consistent API; extensible and programmable; serializable data structures; good error handling

Slide 9

Slide 9 text

Designed for production: carefully designed and consistent API; extensible and programmable; serializable data structures; good error handling; pipeline approach to combine techniques and share data

Slide 10

Slide 10 text

Designed for production: carefully designed and consistent API; extensible and programmable; serializable data structures; good error handling; pipeline approach to combine techniques and share data; "just works" pre-configured solutions
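
The "pipeline approach" and "serializable data structures" points can be illustrated with a minimal sketch. This is not spaCy's API: the `Doc` dataclass, the two components and `run_pipeline` are hypothetical names chosen only to show components sharing one container that survives a JSON round trip.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical shared container (not spaCy's Doc class)."""
    text: str
    tokens: List[str] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)


Component = Callable[[Doc], Doc]


def tokenizer(doc: Doc) -> Doc:
    # First component: writes whitespace tokens onto the shared Doc.
    doc.tokens = doc.text.split()
    return doc


def rule_labeler(doc: Doc) -> Doc:
    # Second component: reads the tokens written by the previous one.
    if any(tok.lower().strip(".,!?") == "refund" for tok in doc.tokens):
        doc.labels.append("BILLING")
    return doc


def run_pipeline(text: str, components: List[Component]) -> Doc:
    # Each component receives the Doc, adds to it, and passes it on.
    doc = Doc(text=text)
    for component in components:
        doc = component(doc)
    return doc


doc = run_pipeline("Please process my refund.", [tokenizer, rule_labeler])

# Serialize to JSON and restore: the data structure round-trips unchanged.
restored = Doc(**json.loads(json.dumps(asdict(doc))))
print(doc.labels, restored == doc)
```

Because every component has the same signature, a rule, a trained model or an LLM call could in principle be swapped in without changing the code around it.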

Slide 11

Slide 11 text

How to avoid the Prototype Plateau

Slide 12

Slide 12 text

How to avoid the Prototype Plateau: standardize inputs and outputs

Slide 13

Slide 13 text

How to avoid the Prototype Plateau: standardize inputs and outputs; start with evaluation

Slide 14

Slide 14 text

How to avoid the Prototype Plateau: standardize inputs and outputs; start with evaluation; assess utility, not just accuracy (explosion.ai/blog/applied-nlp-thinking)

Slide 15

Slide 15 text

How to avoid the Prototype Plateau: standardize inputs and outputs; start with evaluation; assess utility, not just accuracy (explosion.ai/blog/applied-nlp-thinking); work iteratively

Slide 16

Slide 16 text

How to avoid the Prototype Plateau: standardize inputs and outputs; start with evaluation; assess utility, not just accuracy (explosion.ai/blog/applied-nlp-thinking); consider structure and properties of language; work iteratively
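
"Start with evaluation" can be made concrete with a small scoring function. The sketch below assumes entities are stored as `(start, end, label)` tuples and that only exact matches count; the function name and the example spans are illustrative and do not come from the talk. It covers only the accuracy side. Whether the scores translate into utility for the application still has to be judged separately.

```python
from typing import Dict, Set, Tuple

# Assumed span format: (start character, end character, label).
Span = Tuple[int, int, str]


def score_spans(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 over labeled spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative data: one correct span, one miss, one spurious prediction.
gold = {(0, 4, "ORG"), (10, 16, "PERSON")}
predicted = {(0, 4, "ORG"), (20, 25, "GPE")}

scores = score_spans(gold, predicted)
print(scores)
```

Running the same function on every iteration, against the same held-out examples, gives a baseline that later versions of the pipeline can be compared to.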

Slide 17

Slide 17 text

Classic NLP pipeline (diagram): Doc

Slide 18

Slide 18 text

Classic NLP pipeline (diagram): Doc, Entity Recognizer

Slide 19

Slide 19 text

Classic NLP pipeline (diagram): Doc, Entity Recognizer, Text Categorizer

Slide 20

Slide 20 text

Classic NLP pipeline (diagram): Doc, Entity Recognizer, Text Categorizer, ...

Slide 21

Slide 21 text

Classic NLP pipeline (diagram): Doc, Entity Recognizer, Text Categorizer, ...
LLM-powered pipeline (diagram): Doc, llm, ...
spacy.io/usage/large-language-models

Slide 22

Slide 22 text

Classic NLP pipeline (diagram): Doc, Entity Recognizer, Text Categorizer, ...
LLM-powered pipeline (diagram): Doc, llm, ...; models: GPT-4, Falcon
spacy.io/usage/large-language-models

Slide 23

Slide 23 text

llm component (diagram): Doc, llm
spacy.io/usage/large-language-models

Slide 24

Slide 24 text

llm component (diagram): Doc, llm
Steps: Prompt Template
spacy.io/usage/large-language-models

Slide 25

Slide 25 text

llm component (diagram): Doc, llm
Steps: Prompt Template, Model
spacy.io/usage/large-language-models

Slide 26

Slide 26 text

llm component (diagram): Doc, llm
Steps: Prompt Template, Model, Response Parser
spacy.io/usage/large-language-models

Slide 27

Slide 27 text

llm component (diagram): Doc, llm
Steps: Prompt Template, Model, Response Parser, Structured Attributes
spacy.io/usage/large-language-models

Slide 28

Slide 28 text

llm component (diagram): Doc, llm
Steps: Prompt Template, Model, Response Parser, Structured Attributes
unified, model-agnostic API
spacy.io/usage/large-language-models

Slide 29

Slide 29 text

llm component (diagram): Doc, llm
Steps: Prompt Template, Model, Response Parser, Structured Attributes
unified, model-agnostic API
config.cfg
spacy.io/usage/large-language-models
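
For orientation, a `config.cfg` for a pipeline with a single `llm` component might look roughly like the fragment below. It is a sketch under assumptions. The registered names `spacy.NER.v3` and `spacy.GPT-4.v2` and the label set are not taken from the slides and should be checked against spacy.io/usage/large-language-models for the installed version.

```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

# Task: defines the prompt template and the response parser (assumed name).
[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["PERSON", "ORGANISATION", "LOCATION"]

# Model: which LLM receives the prompt (assumed name).
[components.llm.model]
@llm_models = "spacy.GPT-4.v2"
```

Swapping the model block for a different registered model is what the "unified, model-agnostic API" point refers to: the task definition stays the same.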

Slide 30

Slide 30 text

Pipeline (diagram): Doc, llm, ... other components
Steps: Prompt Template, Model, Response Parser, Structured Attributes
unified, model-agnostic API
config.cfg
spacy.io/usage/large-language-models

Slide 31

Slide 31 text

Pipeline (diagram): Doc, llm, ... other components (LLM, trained model, rules)
Steps: Prompt Template, Model, Response Parser, Structured Attributes
unified, model-agnostic API
config.cfg
spacy.io/usage/large-language-models

Slide 32

Slide 32 text

Pipeline (diagram): Doc, llm, ... other components (LLM, trained model, rules)
Steps: Prompt Template, Model, Response Parser, Structured Attributes
unified, model-agnostic API
config.cfg
Tasks: Entity Linking, Summarization, Entity Recognition, Span Categorization, Relation Extraction, Text Categorization, Sentiment Analysis, Translation
spacy.io/usage/large-language-models
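
The flow from prompt template through model and response parser to structured attributes can be sketched in plain Python. Everything here is hypothetical: the `Doc` dataclass, the prompt wording and `stub_model` (a canned reply standing in for a real LLM such as GPT-4 or Falcon) are assumptions for illustration, not the spacy-llm implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Doc:
    """Hypothetical container (not spaCy's Doc class)."""
    text: str
    ents: List[Tuple[int, int, str]] = field(default_factory=list)


# Prompt template: turns the Doc into the text sent to the model.
PROMPT_TEMPLATE = (
    "Extract entities with the labels {labels} from the text.\n"
    "Answer with one 'LABEL: phrase' pair per line.\n"
    "Text: {text}"
)


def build_prompt(doc: Doc, labels: List[str]) -> str:
    return PROMPT_TEMPLATE.format(labels=", ".join(labels), text=doc.text)


def stub_model(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned reply with one bad line.
    return "PERSON: Ines Montani\nORG: Explosion\nnot a valid line"


def parse_response(doc: Doc, response: str, labels: List[str]) -> Doc:
    # Response parser: free text in, structured (start, end, label) spans out.
    for line in response.splitlines():
        label, sep, phrase = line.partition(":")
        label, phrase = label.strip(), phrase.strip()
        if not sep or label not in labels or not phrase:
            continue  # skip malformed lines and unknown labels
        start = doc.text.find(phrase)
        if start == -1:
            continue  # skip phrases that do not occur in the text
        doc.ents.append((start, start + len(phrase), label))
    return doc


def llm_component(doc: Doc, model: Callable[[str], str], labels: List[str]) -> Doc:
    prompt = build_prompt(doc, labels)
    return parse_response(doc, model(prompt), labels)


labels = ["PERSON", "ORG"]
doc = llm_component(Doc("Ines Montani works at Explosion."), stub_model, labels)
print(doc.ents)
```

Since the model is just a callable from prompt to response, any backend could be passed in without touching the template or the parser.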

Slide 33

Slide 33 text

Prototype (diagram): Doc, llm

Slide 34

Slide 34 text

Prototype (diagram): Doc, llm
continuous evaluation, baseline

Slide 35

Slide 35 text

Prototype (diagram): Doc, llm
continuous evaluation, baseline
human-in-the-loop distillation

Slide 36

Slide 36 text

Prototype (diagram): Doc, llm
continuous evaluation, baseline
human-in-the-loop distillation: Entity Recognizer, Text Categorizer, …

Slide 37

Slide 37 text

Prototype (diagram): Doc, llm
continuous evaluation, baseline
human-in-the-loop distillation
Production (diagram): Doc, Entity Recognizer, Text Categorizer, …

Slide 38

Slide 38 text

Prototype (diagram): Doc, llm
continuous evaluation, baseline
human-in-the-loop distillation
Production (diagram): Doc, Entity Recognizer, Text Categorizer, …
distilled model
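
The path from prototype to distilled model can be sketched as follows, under loud assumptions. `llm_suggest` is a keyword stub standing in for the LLM prototype, the `corrections` dict stands in for a human reviewer, and the word-vote classifier stands in for a small task-specific model. None of these names or data come from the talk.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)


def llm_suggest(text: str) -> str:
    # Stand-in for the LLM prototype: a crude keyword guess.
    return "BILLING" if "refund" in text.lower() else "OTHER"


def collect_examples(
    texts: List[str],
    suggest: Callable[[str], str],
    corrections: Dict[str, str],
) -> List[Example]:
    # Human-in-the-loop step: a reviewer's correction overrides the suggestion.
    return [(text, corrections.get(text, suggest(text))) for text in texts]


def train_distilled(examples: List[Example]) -> Callable[[str], str]:
    # Toy "distilled model": each word votes for the labels it was seen with.
    votes: Dict[str, Counter] = defaultdict(Counter)
    for text, label in examples:
        for word in text.lower().split():
            votes[word][label] += 1

    def predict(text: str) -> str:
        tally: Counter = Counter()
        for word in text.lower().split():
            tally.update(votes.get(word, {}))
        return tally.most_common(1)[0][0] if tally else "OTHER"

    return predict


texts = ["I want a refund", "You charged me twice", "Where is the office"]
# The reviewer fixes the one example the LLM stub gets wrong.
corrections = {"You charged me twice": "BILLING"}

examples = collect_examples(texts, llm_suggest, corrections)
predict = train_distilled(examples)
print(predict("charged twice this month"))
```

The distilled predictor no longer calls the LLM at runtime, and its output can be scored against the LLM baseline on the same evaluation data as part of continuous evaluation.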

Slide 39

Slide 39 text

Thank you!
Explosion: explosion.ai
spaCy: spacy.io
Prodigy: prodigy.ai
Twitter: @_inesmontani
Mastodon: @[email protected]
Bluesky: @inesmontani.bsky.social
LinkedIn