Large Language Models (LLMs) have enormous potential, but they also challenge existing industry workflows that require modularity, transparency, and structured data. In this talk, I'll present practical approaches for using the latest generative models beyond chatbots. I'll dive deeper into spaCy's LLM integration, which lets you plug in open-source and proprietary models and provides a robust framework for extracting structured information from text, distilling large models into smaller task-specific components, and closing the gap between prototype and production.
https://spacy.io/usage/large-language-models
The spacy-llm package integrates LLMs into spaCy pipelines. It features a modular system for fast prototyping and prompting, and turns unstructured model responses into robust outputs for various NLP tasks.
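To make the modular system concrete, here is a minimal pipeline configuration in the style spacy-llm uses, declaring a task (what to extract) and a model (which LLM to prompt) as separate, swappable pieces. The registry names (`spacy.NER.v3`, `spacy.GPT-4.v2`) and labels are assumptions based on the package's documented conventions and may differ across versions:

```ini
[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

# The task defines the prompt and how to parse the response into Doc annotations
[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["PERSON", "ORG", "LOCATION"]

# The model backend is independent of the task and can be swapped out,
# e.g. for an open-source model, without touching the task definition
[components.llm.model]
@llm_models = "spacy.GPT-4.v2"
```

A config like this can be assembled into a regular `nlp` pipeline object, so downstream code works with structured `Doc` objects rather than raw LLM responses.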
https://explosion.ai/blog/human-in-the-loop-distillation
This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.
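The core distillation idea can be sketched in a few lines: an expensive teacher model labels raw text, and a much smaller student is trained on those labels. This is a toy, self-contained illustration, not the blog post's actual method; in practice the teacher would be an LLM and the student a spaCy component, and both are replaced here by stubs so the sketch runs anywhere:

```python
from collections import Counter, defaultdict

def teacher_predict(text: str) -> str:
    """Stand-in for an expensive LLM call that assigns a category."""
    return "FEEDBACK" if "great" in text or "terrible" in text else "QUESTION"

def train_student(texts):
    """Train a tiny student on teacher-provided (silver) labels."""
    weights = defaultdict(Counter)
    for text in texts:
        label = teacher_predict(text)  # the teacher annotates the raw data
        for word in text.lower().split():
            weights[word][label] += 1
    return weights

def student_predict(weights, text: str) -> str:
    """Cheap, fully local student: vote by per-word label counts."""
    votes = Counter()
    for word in text.lower().split():
        votes.update(weights[word])
    return votes.most_common(1)[0][0] if votes else "QUESTION"

corpus = [
    "this product is great",
    "terrible support experience",
    "how do I reset my password",
]
student = train_student(corpus)
print(student_predict(student, "great service"))
```

The human-in-the-loop part of the workflow sits between the two steps: a person reviews and corrects the teacher's labels before the student is trained, which is where the accuracy gains over blind distillation come from.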
https://prodi.gy/docs/large-language-models
Prodigy comes with preconfigured workflows that use LLMs to speed up and automate annotation, and to create datasets for distilling large generative models into smaller, faster, more accurate, and fully private task-specific components.
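As an illustration, an LLM-assisted annotation session might look like the following; the recipe name and arguments are assumptions based on Prodigy's documented `*.llm.*` recipes and `prodigy train`, and the dataset, config, and file names are placeholders:

```
# Have an LLM pre-annotate named entities; a human reviews and corrects
# the suggestions in the annotation UI
prodigy ner.llm.correct annotated_ner config.cfg examples.jsonl

# Distill: train a small, fully private spaCy component on the reviewed data
prodigy train ./model-out --ner annotated_ner
```

The resulting model runs in-house with no LLM in the loop, which is what makes the distilled component private and cheap to operate.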