Ines Montani & Ryan Wesslen Explosion
Half hour of
labeling power
Can we beat GPT?
spacy.io
Open-source library for
industrial-strength natural
language processing
170m+
downloads
prodigy.ai
Modern scriptable
annotation tool for
machine learning
developers
9k+
users
800+
companies
prodigy.ai/teams
Collaborative data
development platform
GPT-4
API
Alex Smith
Developer
Annotation Guidelines
DISH: known food dishes, e.g. lobster ravioli, garlic bread
INGREDIENT: individual parts of a food dish, including herbs and spices
EQUIPMENT: any kind of cooking equipment, e.g. oven, cooking pot, grill
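A minimal sketch of how guidelines like these could be turned into a prompt for a generative model. The prompt wording, the `build_prompt` helper, and the sample sentence are assumptions for illustration, not the format used in the talk; no API call is made.

```python
# Label definitions copied from the annotation guidelines above.
GUIDELINES = {
    "DISH": "known food dishes, e.g. lobster ravioli, garlic bread",
    "INGREDIENT": "individual parts of a food dish, including herbs and spices",
    "EQUIPMENT": "any kind of cooking equipment, e.g. oven, cooking pot, grill",
}


def build_prompt(text: str, guidelines: dict[str, str]) -> str:
    """Assemble a plain-text NER prompt (hypothetical format, not from the talk)."""
    lines = ["Extract entities from the text using these labels:"]
    for label, definition in guidelines.items():
        lines.append(f"- {label}: {definition}")
    lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Answer with one 'LABEL: span' pair per line.")
    return "\n".join(lines)


# Hypothetical example sentence.
prompt = build_prompt("Grill the garlic bread until golden.", GUIDELINES)
print(prompt)
```

Keeping the definitions in one dictionary means that a change to the guidelines also changes the prompt, so human annotators and the model work from the same label definitions.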
1. annotate
2. evaluate
  resolve disagreements
  retrospective meetings
  assess if more data is needed
3. update
  update annotation guidelines
  add more examples
  expand label definitions
Pro tip: Use task routing to distribute workloads and determine inter-annotator agreement.
prodigy.ai/features/task-routing
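The slide does not say which agreement metric is meant. As one possible illustration, the sketch below computes Cohen's kappa for two annotators who labeled the same items. The `cohens_kappa` function and the sample labels are assumptions and are not part of Prodigy's API.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for the same six spans.
annotator_1 = ["DISH", "DISH", "INGREDIENT", "EQUIPMENT", "INGREDIENT", "DISH"]
annotator_2 = ["DISH", "INGREDIENT", "INGREDIENT", "EQUIPMENT", "INGREDIENT", "DISH"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A low score on a particular label is a signal that its definition in the guidelines may need more examples.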
Pro tip: Focus on examples where models disagree, similar to active learning.
koaning.io/posts/large-disagreement-models
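A rough sketch of the idea, assuming each model's output is represented as a set of `(span, label)` pairs per text. The `find_disagreements` helper and all sample predictions are hypothetical.

```python
def find_disagreements(texts, preds_a, preds_b):
    """Return (text, pred_a, pred_b) triples where the two models differ."""
    return [
        (text, a, b)
        for text, a, b in zip(texts, preds_a, preds_b)
        if a != b
    ]


texts = [
    "Grill the garlic bread.",
    "Add basil to the pot.",
    "Serve the lobster ravioli.",
]
# Hypothetical predictions from a generative model and a task-specific model.
llm_preds = [
    {("garlic bread", "DISH")},
    {("basil", "INGREDIENT"), ("pot", "EQUIPMENT")},
    {("lobster ravioli", "DISH")},
]
small_model_preds = [
    {("garlic bread", "DISH")},
    {("basil", "INGREDIENT")},
    {("lobster", "INGREDIENT")},
]

# Only the texts where the models differ go to a human annotator.
review_queue = find_disagreements(texts, llm_preds, small_model_preds)
for text, a, b in review_queue:
    print(text, sorted(a), sorted(b))
```

Examples on which both models agree are skipped, so annotation time goes to the cases most likely to expose an error in one of them.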
ChatGPT
Pro tip: Use generative models to create spaCy rule sets!
spacy.io/usage/rule-based-matching
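One way this could look, assuming the generative model returns plain phrases per label. The sketch converts them into token patterns in the dictionary format that spaCy's entity ruler accepts. The phrase list and the `phrases_to_patterns` helper are hypothetical, and spaCy itself is not imported here.

```python
import json


def phrases_to_patterns(label: str, phrases: list[str]) -> list[dict]:
    """Turn plain phrases into case-insensitive token patterns."""
    patterns = []
    for phrase in phrases:
        # Naive whitespace split; assumes it matches the tokenizer's output.
        token_specs = [{"LOWER": token.lower()} for token in phrase.split()]
        patterns.append({"label": label, "pattern": token_specs})
    return patterns


# Hypothetical phrases, standing in for what a generative model might suggest.
suggested = {
    "DISH": ["lobster ravioli", "garlic bread"],
    "EQUIPMENT": ["oven", "cooking pot", "grill"],
}
patterns = [
    pattern
    for label, phrases in suggested.items()
    for pattern in phrases_to_patterns(label, phrases)
]
# One JSON object per line, so the rule set can be reviewed and versioned.
jsonl = "\n".join(json.dumps(pattern) for pattern in patterns)
print(jsonl)
```

With spaCy installed, a list like `patterns` can be passed to an entity ruler created with `nlp.add_pipe("entity_ruler")` through its `add_patterns` method. Reviewing the suggested phrases by hand before loading them keeps a human in the loop.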
Takeaways
Generative complements predictive, it
doesn't replace it.
Use generative models to create
better, more accurate, faster, smaller
and private task-specific models.
With good tooling, you can make
human input more efficient.