Applied NLP in the Age of Generative AI

Ines Montani, Explosion

spaCy
Open-source library for industrial-strength natural language processing
spacy.io
270m+ downloads
ChatGPT can write spaCy code!

Prodigy
Modern scriptable annotation tool for machine learning developers
prodigy.ai
900+ companies, 10k+ users
[Illustration labels: Alex Smith (Developer), Kim Miller (Analyst), GPT-4 API]

Back to our roots!
explosion.ai/blog/back-to-our-roots
We’re back to running Explosion as a smaller, independent-minded and self-sufficient company.
• consulting
• open source
• developer tools

LLM: Falcon, Mixtral, GPT-4
• good contextual results
• easy to use & configure
• fast prototyping
⚠ data privacy
⚠ transparency
⚠ efficiency

Evolution
• programming & rules: rules or instructions ✍
• machine learning (supervised learning): examples 📝
• in-context learning (LLM prompt engineering): rules or instructions ✍
• ? (LLM)

Definitions
✍ instructions: human-shaped, easy for non-experts, risk of data drift
📝 examples: nuanced and intuitive behaviors, specific to use case, labor-intensive

Prototype
💬 prompt + 📖 text → LLM (GPT-4 API) → task-specific output
Prompt the model & transform output to structured data: github.com/explosion/spacy-llm

Production
📖 text → distilled task-specific components → task-specific output
✅ modular
✅ small & fast
✅ data-private
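The prototype step above (prompt a model, then transform its free-text reply into structured data) can be sketched roughly as follows. This is a minimal illustration and not the spacy-llm API: `fake_llm`, `PROMPT_TEMPLATE`, `Entity` and the reply format are assumptions made up for the example, and the canned reply stands in for a real API call.

```python
from dataclasses import dataclass

LABELS = ("INGREDIENT", "DISH", "EQUIPMENT")

# Hypothetical prompt format, chosen so the reply is easy to parse.
PROMPT_TEMPLATE = (
    "Extract entities from the text. Answer with one line per label, "
    "formatted as LABEL: comma-separated spans.\n"
    "Labels: {labels}\nText: {text}\n"
)


@dataclass(frozen=True)
class Entity:
    label: str
    text: str
    start: int
    end: int


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns a canned reply."""
    return "INGREDIENT: Cheetos, breadcrumbs\nDISH:\nEQUIPMENT: rolling pin\nMOOD: happy"


def parse_response(response: str, text: str) -> list[Entity]:
    """Turn the free-text reply into character-offset entities."""
    entities = []
    for line in response.splitlines():
        label, sep, spans = line.partition(":")
        label = label.strip().upper()
        if not sep or label not in LABELS:
            continue  # ignore lines that do not follow the requested format
        for span in spans.split(","):
            span = span.strip()
            start = text.find(span) if span else -1
            if start == -1:
                continue  # drop empty spans and spans that are not in the text
            entities.append(Entity(label, span, start, start + len(span)))
    return entities


text = "Mix the Cheetos with the breadcrumbs and crush them with a rolling pin."
prompt = PROMPT_TEMPLATE.format(labels=", ".join(LABELS), text=text)
entities = parse_response(fake_llm(prompt), text)
for ent in entities:
    print(ent.label, repr(ent.text), ent.start, ent.end)
```

Because the output is plain structured data with offsets, the same downstream code could later consume predictions from a smaller distilled component instead of the LLM.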

Human in the loop
explosion.ai/blog/human-in-the-loop-distillation
[Diagram labels: LLM, continuous evaluation, baseline]

COMPONENT
• LLM prompting
• transfer learning
• distilled model

Case Study: S&P Global
explosion.ai/blog/sp-global-commodities
• real-time commodities trading insights by extracting structured attributes
• high-security environment
• used LLM during annotation
• 10× data development speedup with humans and model in the loop
• 8 market pipelines in production
99% F-score · 6mb model size · 16k+ words/second

Refactor your code and data.

Software 1.0: 📄 code → compiler → 💾 program; ✅ tests, refactoring, iteration
Software 2.0: 📊 data → algorithm → 🔮 model; 📈 evaluation, refactoring, iteration
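If evaluation plays the role for a model that tests play for a program, a check like the one below could run on every iteration. This is a sketch under assumptions: the span offsets, the `prf` helper and the `BASELINE_F` threshold are invented for illustration and are not taken from the talk.

```python
def prf(gold: set, predicted: set) -> tuple[float, float, float]:
    """Precision, recall and F-score over exact-match annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical (start, end, label) annotations for one evaluation example.
gold = {(8, 15, "INGREDIENT"), (25, 36, "INGREDIENT"), (59, 70, "EQUIPMENT")}
predicted = {(8, 15, "INGREDIENT"), (59, 70, "EQUIPMENT"), (0, 3, "DISH")}

BASELINE_F = 0.6  # hypothetical score of the previous model version

precision, recall, f_score = prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")

# Like a failing unit test, a drop below the baseline should block the change.
assert f_score >= BASELINE_F, "model regressed below the baseline"
```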

“I love cats.” / “I hate cats.”
SIMILAR OR NOT?
Your application context always matters!
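One way to see why the answer depends on the application: two toy definitions of "similar" disagree on this exact pair. Both measures below (word overlap and a two-word sentiment lexicon) are deliberately simplistic assumptions for illustration, not methods from the talk.

```python
def tokens(sentence: str) -> set[str]:
    """Lowercased word set, with trailing punctuation removed."""
    return set(sentence.lower().strip(".!?").split())


def topic_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical two-word sentiment lexicon.
POLARITY = {"love": 1, "hate": -1}


def same_sentiment(a: str, b: str) -> bool:
    """True if both sentences have the same overall polarity."""
    score_a = sum(POLARITY.get(t, 0) for t in tokens(a))
    score_b = sum(POLARITY.get(t, 0) for t in tokens(b))
    return score_a == score_b


a, b = "I love cats.", "I hate cats."
print(topic_similarity(a, b))  # 0.5: same subject, half the words shared
print(same_sentiment(a, b))    # False: opposite opinions
```

A topic recommender might treat the two sentences as similar, while a review analysis tool should not.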

spacy.fyi/pydata-nyc
“Serve with a cold beer and a small bowl of Cheetos on the side.”
“Mix the Cheetos with the breadcrumbs and crush them with a rolling pin.”
WHICH LABEL? INGREDIENT, DISH, EQUIPMENT
[Annotations shown on the slide: EQUIPMENT, ADJ, NOUN]
We beat few-shot GPT baseline with 20× speedup!

Factor out business logic
result = business_logic(classification(text))

MODEL
• information in the text: words, grammar, syntax
• external knowledge: facts that can change over time
Pro tip: Try to think about the text from the model’s point of view!
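The split between what the model reads from the text and what belongs in business logic might look like this in practice. It is a sketch: the keyword-based `classification` is a hypothetical stand-in for a trained component, and the topics and escalation policy are invented for the example.

```python
import re

# Hypothetical stand-in for a trained classifier: it only uses the text itself.
TOPIC_KEYWORDS = {
    "outage": {"outage", "down", "unreachable"},
    "billing": {"invoice", "refund", "charge"},
}


def classification(text: str) -> set[str]:
    """Return the topics whose keywords occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {topic for topic, keys in TOPIC_KEYWORDS.items() if words & keys}


# External knowledge that can change over time lives outside the model.
ESCALATE_TOPICS = frozenset({"outage"})  # hypothetical current policy


def business_logic(topics: set[str], escalate: frozenset = ESCALATE_TOPICS) -> str:
    """Apply the current policy to the model's output."""
    return "escalate" if topics & escalate else "standard queue"


text = "Our instance has been unreachable since this morning."
result = business_logic(classification(text))
print(result)  # escalate
```

When the policy changes, only `ESCALATE_TOPICS` needs to change; the classifier and its training data stay as they are.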

Case Study: GitLab
explosion.ai/blog/gitlab-support-insights
• extract actionable insights from support tickets and usage questions
• high-security environment
• easy to adapt to new scenarios and business questions
• separated general-purpose features from product-specific logic
1 year of support tickets · 6× speedup

spacy.fyi/ie-bootstrapping
RAG: Retrieval-Augmented Generation
📖 snippets + ⚙ vectorizer → 📚 vector DB
💬 question → ⚙ vectorizer → query → answers

RIE: Retrieval via Information Extraction
📖 texts + 📦 NLP pipeline → data
💬 question → ⚙ text-to-SQL → query

refactoring and introducing constraints · iteration
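A minimal sketch of the RIE idea: extract structured rows from texts once, store them, and answer questions with a query over the data. The sentences are invented for the example, the regular expression is a hypothetical stand-in for an NLP pipeline, and the SQL is written by hand here rather than produced by a text-to-SQL step.

```python
import re
import sqlite3

# Made-up input texts; a real pipeline would read documents instead.
texts = [
    "ACME Inc. paid 432,032 for services in 2023.",
    "FooBar GmbH paid 82,000 for services in 2023.",
    "NLPCorp paid 1,500 for services in 2023.",
    "This sentence contains no revenue information.",
]

# Hypothetical stand-in for an information extraction pipeline.
PATTERN = re.compile(
    r"^(?P<client>.+?) paid (?P<amount>[\d,]+) for (?P<type>\w+) in (?P<year>\d{4})\.$"
)


def extract(text: str):
    """Return (client, type, year, amount), or None if nothing was found."""
    match = PATTERN.match(text)
    if match is None:
        return None
    amount = int(match["amount"].replace(",", ""))
    return match["client"], match["type"], int(match["year"]), amount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (client TEXT, type TEXT, year INTEGER, amount INTEGER)")
rows = [row for row in map(extract, texts) if row is not None]
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?, ?)", rows)

# The question becomes a deterministic query over the extracted data.
n_clients, total = conn.execute(
    "SELECT COUNT(DISTINCT client), SUM(amount) FROM revenue WHERE year = ? AND type = ?",
    (2023, "services"),
).fetchone()
print(n_clients, total)
```

The constraint is the point of the design: the extraction step can be evaluated on its own, and the aggregation is computed by the database rather than generated.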

Language is just another interface.

The Window-Knocking Machine Test
ines.io/blog/window-knocking-machine-test
“knocker-uppers”
Are you designing a window-knocking machine or an alarm clock?

ines.io/blog/window-knocking-machine-test
“window-knocking machine”:
Assistant: Hello, I’m Toni’s virtual assistant and I help schedule appointments. Do you have time at 1pm on Monday?
User: No, but Tuesday would work for me.
Assistant: Okay, please confirm: Tuesday at 1pm?
User: 1pm is unideal but 3pm would work.
Assistant: Toni doesn’t have availability at 3pm but I could offer a slot at 4pm or 5:30pm.
User: Which time zone is this by the way? I’m in CET.
“alarm clock”: Calendly

ines.io/blog/window-knocking-machine-test
Chat interface (🔮 LLM, 📚 database, 🤖 agents, ⚙ query, Retrieval-Augmented Generation):
Q: What’s the total services revenue from 2023?
A: $2,923,531
Q: How many clients is that in total?
A: 29

Filtered table view (Kim Miller, Analyst): Year: 2023, Type: Services
Clients (28)      Revenue
ACME Inc.         432,032
FooBar GmbH       82,000
NLPCorp           1,500
XKCD Ltd.         193,000
Python AG         91,320
                  $2,625,032

AI still needs product decisions!

Summary: APPLIED NLP & GEN AI
• Reason and refactor. The key to success lies in your data and may surprise you!
• Think beyond chat bots. You don’t want to build a “window-knocking machine”.
• Stay ambitious. Don’t compromise on best practices, efficiency and privacy.

Explosion: explosion.ai
spaCy: spacy.io
Prodigy: prodigy.ai
Twitter: @_inesmontani
Mastodon: @[email protected]
Bluesky: @inesmontani.bsky.social
LinkedIn


Resources

A practical guide to human-in-the-loop distillation

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

How GitLab uses spaCy to analyze support tickets and empower their community

Applied NLP Thinking: How to Translate Problems into Solutions

The Window-Knocking Machine Test

Practical Tips for Bootstrapping Information Extraction Pipelines
