Vibe NLP for Applied NLP

Ines Montani Explosion

spaCy: open-source library for industrial-strength natural language processing
spacy.io
600m+ downloads
LLMs are really good at spaCy code!

Prodigy: modern scriptable annotation tool for machine learning developers
prodigy.ai
12k+ users, 1000+ companies
fully scriptable in Python
Alex Smith (Developer), Kim Miller (Analyst), GPT-5 (API)

of coding assistants
help developer implement code for the given tools
help developer pick the right tools and implement code
solve a business problem
“I need to analyze these company reports and create a table of the total spending on different types of IT services over time.”
2025.pdf 2024.pdf 2023.pdf

prompt: reverse these strings
program: write a script to reverse strings
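The contrast on this slide can be made concrete. A prompt returns an answer for one input, while a program can be rerun, tested and inspected. A minimal sketch of what the "program" side might look like (the function name `reverse_strings` is illustrative, not from the talk):

```python
def reverse_strings(strings):
    """Return a new list in which every string is reversed."""
    return [s[::-1] for s in strings]


if __name__ == "__main__":
    # Example input chosen for illustration only.
    print(reverse_strings(["spaCy", "Prodigy"]))
```

Once the logic lives in a script, the model is no longer needed at runtime.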

list all company names in the text
write a script to extract company names from text
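As a rough illustration of the second request, here is a deliberately simple, rule-based sketch that uses only the standard library. It assumes company names end in a legal-form suffix; the suffix list, the pattern and the function name are all hypothetical. A real project would more likely use a trained named entity recognizer, for example in spaCy.

```python
import re

# Hypothetical, intentionally short list of legal-form suffixes.
SUFFIXES = r"(?:Inc\.?|Ltd\.?|LLC|GmbH|Corp\.?|AG)"
# One or more capitalized words followed by one of the suffixes.
COMPANY_RE = re.compile(rf"\b((?:[A-Z][\w&-]*\s+)+{SUFFIXES})(?!\w)")


def extract_companies(text):
    """Return unique company-like names in order of first appearance."""
    found = []
    for match in COMPANY_RE.finditer(text):
        name = match.group(1).rstrip(".")
        if name not in found:
            found.append(name)
    return found
```

The rule is crude: it misses names without a suffix and will absorb a capitalized word that happens to precede the name. That is the kind of gap that annotated data and a trained model are meant to close.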

as the system
to build

sort these documents into custom categories
create data and train a classifier for custom categories
accurate, fast, small, private, cheap, modular, ...
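To show how small a task-specific classifier can be, here is a sketch of a multinomial Naive Bayes text classifier in plain Python. It is a stand-in for illustration only, not the approach used in the talk. The whitespace tokenizer and the function names are assumptions, and a real project would train something like a spaCy text classifier on annotated data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["labels"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)
        for token in tokenize(text):
            # Add-one smoothing so unseen words do not rule out a label.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The resulting model is a handful of counters: it runs locally, needs no API calls, and can be retrained whenever the categories change.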

Why don’t we just build ?
lots of steps
operational complexity
expertise
coding agents are really good at software 1.0
Software 1.0
Software 2.0: code + data
agentic
kind of

✳ Claude Code
beta.ellf.ai
distilled expertise from 10+ years of solving NLP problems
PROJECT_PLAN.md

❯ We need to extract product mentions and pricing from company press releases. We started with an LLM inference API but it's too slow and expensive, plus we can't send confidential data to an external API. How can we train our own model?

⏺ Yes, and this is exactly the right call. What you're describing, structured extraction from a known document type, is precisely where a supervised model beats an LLM on every dimension: speed, cost, privacy, and accuracy once you have enough examples. The LLM you've been using is essentially doing a job that a small NER model will do 10–100× faster, for low marginal cost, entirely on-premise. My rough pipeline sketch for your case:

✳ Claude Code ❯ /
beta.ellf.ai/docs/modules
/ellf-project: Plan the project and break down business problem
/ellf-annotate: Define labels, components and annotate data
/ellf-prodigy: Implement workflows for the Prodigy annotation tool
/ellf-patterns: Interactively develop patterns and rule-based logic
/ellf-train: Train and export custom models and components
/ellf-ask: Ask specific questions and query the knowledge base
skills: will be open-sourced soon!

We need to run lots of stuff!
training, data transformation, evaluation, various scripts, analysis, testing, optimization, annotation tasks, auto-annotation agents, …
often sensitive data
GPU
fully custom

Kubernetes cluster
hosted by user
local or cloud
tasks, actions, agents
beta.ellf.ai

beta.ellf.ai
humans, agents

Human + coding agent collaboration
beta.ellf.ai

Recipe:

    @task_recipe(
        title="My Recipe",
        field_props={
            "input": TextProps(title="Input data"),
            "labels": ListProps(title="Labels"),
            "dataset": props.dataset_choice,
        }
    )
    def my_recipe(input: Input, labels: List[str], dataset: Dataset):
        ...

CLI:

    $ ellf tasks create my_recipe --input "Corpus" --labels PRODUCT,ORG --dataset products

UI:

    Input data (input): Corpus
    Labels (labels): PRODUCT, ORG
    Dataset (dataset): products
    [Start task]

scripting the SaaS application!
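The slide shows one recipe definition surfacing as a Python function, a CLI command and a UI form. How Ellf does this internally is not shown, so the following is only a sketch of the general idea under stated assumptions: the `recipe` decorator, the `RECIPES` registry and the helper functions are hypothetical and are not the Ellf API. It uses `inspect` and `argparse` from the standard library to derive both interfaces from one signature.

```python
import argparse
import inspect
from typing import List

# Hypothetical registry; names here are illustrative, not the Ellf API.
RECIPES = {}


def recipe(title, field_titles):
    """Register a function together with human-readable field titles."""
    def decorator(func):
        RECIPES[func.__name__] = {"func": func, "title": title, "fields": field_titles}
        return func
    return decorator


@recipe(
    title="My Recipe",
    field_titles={"input": "Input data", "labels": "Labels", "dataset": "Dataset"},
)
def my_recipe(input: str, labels: List[str], dataset: str):
    # Placeholder body: echo the arguments back.
    return {"input": input, "labels": labels, "dataset": dataset}


def build_parser(name):
    """Derive CLI flags from the recipe's signature."""
    spec = RECIPES[name]
    parser = argparse.ArgumentParser(prog=name, description=spec["title"])
    for param in inspect.signature(spec["func"]).parameters.values():
        is_list = param.annotation == List[str]
        parser.add_argument(
            f"--{param.name}",
            required=True,
            help=spec["fields"][param.name],
            # Comma-separated values become a list, as in "PRODUCT,ORG".
            type=(lambda value: value.split(",")) if is_list else str,
        )
    return parser


def form_fields(name):
    """Derive UI form fields (argument name plus label) from the same metadata."""
    spec = RECIPES[name]
    return [
        {"name": param, "label": spec["fields"][param]}
        for param in inspect.signature(spec["func"]).parameters
    ]


def run_from_cli(name, argv):
    """Parse CLI-style arguments and call the registered recipe."""
    args = build_parser(name).parse_args(argv)
    return RECIPES[name]["func"](**vars(args))
```

Because the CLI flags and the form fields come from the same declaration, a person filling in the form and an agent calling the command line are guaranteed to be describing the same task.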

App and developer API need to speak the same language!
beta.ellf.ai
Chat with Ellf
✳ Claude Code
machine, human

❯ /ellf-todo
⏺ Context summary: Phase 1 of an e-commerce aspect extraction project. The task is to write loader.py for the internal data warehouse API that segments sentences and yields {id, text, date} dictionaries.
✔ Successfully updated coding request 523060d8-b2db-4fd5-b541-5ee119007501
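The coding request describes a loader that segments sentences and yields {id, text, date} dictionaries. The data warehouse API itself is not shown, so this sketch assumes the records already arrive as dictionaries with `id`, `text` and `date` keys. The function name, the per-sentence id scheme and the regex-based splitter are all hypothetical; a real loader would more likely use a proper sentence segmenter.

```python
import re
from typing import Dict, Iterable, Iterator

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def load_sentences(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield one {id, text, date} dictionary per sentence in each record."""
    for record in records:
        sentences = SENTENCE_BOUNDARY.split(record["text"].strip())
        for index, sentence in enumerate(sentences):
            if sentence:
                yield {
                    # Hypothetical id scheme: record id plus sentence index.
                    "id": f"{record['id']}-{index}",
                    "text": sentence,
                    "date": record["date"],
                }
```

Writing it as a generator means the corpus never has to fit in memory, which suits a loader that feeds annotation or training tasks.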

Software 2.0
agentic
We’ll still need...
software
ML systems
classifiers
interfaces for humans

Use LLMs to build the system, not as the system.

Apps, developer APIs and agents need to speak the same language.

Software 2.0 is a good idea and we can finally make it easy!

Explosion: explosion.ai
spaCy: spacy.io
Prodigy: prodigy.ai
Ellf: beta.ellf.ai (more details and waitlist)
Bluesky: @inesmontani.bsky.social
Mastodon: @[email protected]
LinkedIn


Resources

Ellf: Your Virtual NLP Engineer (beta)

Building AI with AI

A practical guide to human-in-the-loop distillation
