Slide 1

Slide 1 text

Ines Montani & Ryan Wesslen
Explosion
Half hour of labeling power
Can we beat GPT?

Slide 2

Slide 2 text

spacy.io

Slide 3

Slide 3 text

spacy.io: Open-source library for industrial-strength natural language processing (170m+ downloads)

Slide 4

Slide 4 text

spacy.io: Open-source library for industrial-strength natural language processing (170m+ downloads)
prodigy.ai

Slide 5

Slide 5 text

spacy.io: Open-source library for industrial-strength natural language processing (170m+ downloads)
prodigy.ai: Modern scriptable annotation tool for machine learning developers (9k+ users, 800+ companies)

Slide 6

Slide 6 text

spacy.io: Open-source library for industrial-strength natural language processing (170m+ downloads)
prodigy.ai: Modern scriptable annotation tool for machine learning developers (9k+ users, 800+ companies)
prodigy.ai/teams

Slide 7

Slide 7 text

spacy.io: Open-source library for industrial-strength natural language processing (170m+ downloads)
prodigy.ai: Modern scriptable annotation tool for machine learning developers (9k+ users, 800+ companies)
prodigy.ai/teams: Collaborative data development platform
[Screenshot text: GPT-4 API; Alex Smith, Developer]

Slide 8

Slide 8 text

Generative: single/multi-doc summarization, reasoning, ✅ problem solving, ✍ paraphrasing, style transfer, ❓ question answering
Predictive: text classification, relation extraction, coreference, grammar & morphology, entity recognition, semantic parsing, discourse structure

Slide 9

Slide 9 text

[Chart: Text Classification. Series: SST2, AG News, Banking77, GPT-3. Axis ticks: 65 to 100 and 1% to 100%.]

Slide 10

Slide 10 text

[Chart: Text Classification. Series: SST2, AG News, Banking77, GPT-3. Axis ticks: 65 to 100 and 1% to 100%.]
[Chart: Entity Recognition. Series: FabNER, Claude 2. Axis ticks: 10 to 100 and 0 to 500.]

Slide 11

Slide 11 text

[Chart: Text Classification. Series: SST2, AG News, Banking77, GPT-3. Axis ticks: 65 to 100 and 1% to 100%.]
[Chart: Entity Recognition. Series: FabNER, Claude 2. Axis ticks: 10 to 100 and 0 to 500.]

Slide 12

Slide 12 text

1

Slide 13

Slide 13 text

1 2

Slide 14

Slide 14 text

Correct LLM few-shot results

Slide 15

Slide 15 text

Correct LLM few-shot results

Slide 16

Slide 16 text

Annotation Guidelines
DISH: known food dishes, e.g. lobster ravioli, garlic bread
INGREDIENT: individual parts of a food dish, including herbs and spices
EQUIPMENT: any kind of cooking equipment, e.g. oven, cooking pot, grill

Slide 17

Slide 17 text

annotate, evaluate, update

Slide 18

Slide 18 text

annotate, evaluate, update
1 (annotate)

Slide 19

Slide 19 text

annotate, evaluate, update
1 (annotate)
2 (evaluate): resolve disagreements; retrospective meetings; assess if more data is needed

Slide 20

Slide 20 text

annotate, evaluate, update
1 (annotate)
2 (evaluate): resolve disagreements; retrospective meetings; assess if more data is needed
3 (update): update annotation guidelines; add more examples; expand label definitions

Slide 21

Slide 21 text

spacy-llm config prompt template spacy.io/usage/large-language-models
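As a rough illustration of what a prompt template does, the sketch below renders a zero-shot entity recognition prompt from the label definitions in the annotation guidelines. It uses only the Python standard library. The template wording and the `build_prompt` helper are assumptions made for this example; they are not the template or API that spacy-llm ships.

```python
from string import Template

# Label definitions as given on the "Annotation Guidelines" slide.
LABEL_DEFINITIONS = {
    "DISH": "known food dishes, e.g. lobster ravioli, garlic bread",
    "INGREDIENT": "individual parts of a food dish, including herbs and spices",
    "EQUIPMENT": "any kind of cooking equipment, e.g. oven, cooking pot, grill",
}

# Hypothetical template text, written for this sketch only.
PROMPT_TEMPLATE = Template(
    "Extract the entities from the text below.\n"
    "Use only these labels:\n"
    "$label_block\n\n"
    "Text: $text\n"
    "Entities:"
)


def build_prompt(text, label_definitions):
    """Render a zero-shot NER prompt from label definitions and one text."""
    label_block = "\n".join(
        f"- {label}: {definition}"
        for label, definition in label_definitions.items()
    )
    return PROMPT_TEMPLATE.substitute(label_block=label_block, text=text)


if __name__ == "__main__":
    print(build_prompt("Bake the garlic bread in the oven.", LABEL_DEFINITIONS))
```

Keeping the label definitions in one place means that an update to the guidelines can flow into the prompt without editing the template itself.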

Slide 22

Slide 22 text

Evaluation Results
[Bar chart, 0 to 100. Groups: Zero-shot, Chain-of-thought, Few-shot, Task-specific (shown as "?"). Bars: F, DISH (F), INGREDIENT (F), EQUIPMENT (F).]
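The chart reports an overall F-score and one per label. A minimal sketch of how per-label scores can be computed from exact-match entity spans is shown below. The `(start, end, label)` span representation, the function name and the toy data are assumptions for illustration; the talk does not say how its numbers were produced.

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Compute precision, recall and F-score per label.

    gold and predicted are lists with one set of (start, end, label)
    spans per document. A predicted span only counts as correct if it
    matches a gold span exactly.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_spans, pred_spans in zip(gold, predicted):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1
            else:
                fp[span[2]] += 1
        for span in gold_spans:
            if span not in pred_spans:
                fn[span[2]] += 1

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        p = tp[label] / n_pred if n_pred else 0.0
        r = tp[label] / n_gold if n_gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores


if __name__ == "__main__":
    # Toy data, made up for this example.
    gold = [{(0, 2, "DISH"), (5, 6, "EQUIPMENT")}]
    predicted = [{(0, 2, "DISH"), (3, 4, "INGREDIENT")}]
    for label, score in per_label_scores(gold, predicted).items():
        print(label, score)
```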

Slide 23

Slide 23 text

Evaluation Results
[Bar chart, 0 to 100. Groups: Zero-shot, Chain-of-thought, Few-shot, Task-specific (shown as "?"). Bars: F, DISH (F), INGREDIENT (F), EQUIPMENT (F).]

Slide 24

Slide 24 text

Annotation Guidelines
DISH: known food dishes, e.g. lobster ravioli, garlic bread
INGREDIENT: individual parts of a food dish, including herbs and spices
EQUIPMENT: any kind of cooking equipment, e.g. oven, cooking pot, grill

Slide 25

Slide 25 text

Evaluation Results
[Bar chart, 0 to 100. Groups: Zero-shot, Chain-of-thought, Few-shot, Task-specific. Bars: F, DISH (F), INGREDIENT (F), EQUIPMENT (F).]
2000 words/second

Slide 26

Slide 26 text

Pro tip: Use task routing to distribute workloads and determine inter-annotator agreement.
prodigy.ai/features/task-routing
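One common way to quantify inter-annotator agreement between two annotators is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. The sketch below is a plain-Python illustration of that metric under the assumption that both annotators labeled the same items in the same order; it is not a description of what Prodigy computes.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy annotations, made up for this example.
    annotator_a = ["DISH", "DISH", "INGREDIENT", "EQUIPMENT"]
    annotator_b = ["DISH", "INGREDIENT", "INGREDIENT", "EQUIPMENT"]
    print(round(cohens_kappa(annotator_a, annotator_b), 3))
```

To get overlapping annotations in the first place, at least some examples have to be routed to more than one annotator.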

Slide 27

Slide 27 text

Pro tip: Focus on examples where models disagree, similar to active learning.
koaning.io/posts/large-disagreement-models
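The idea can be sketched in a few lines: run two models over the same texts and keep only the examples where their outputs differ, so that human attention goes to those first. The two keyword "models" below are hypothetical stand-ins so the example runs on its own; in practice they might be an LLM-based pipeline and a task-specific model.

```python
def find_disagreements(examples, predict_a, predict_b):
    """Return the examples on which two prediction functions differ."""
    disagreements = []
    for text in examples:
        pred_a, pred_b = predict_a(text), predict_b(text)
        if pred_a != pred_b:
            disagreements.append({"text": text, "a": pred_a, "b": pred_b})
    return disagreements


# Hypothetical stand-in models: each returns the set of equipment
# keywords it finds in the text.
def narrow_model(text):
    return {word for word in ("oven", "grill") if word in text.lower()}


def broad_model(text):
    return {word for word in ("oven", "grill", "pot") if word in text.lower()}


if __name__ == "__main__":
    texts = ["Preheat the oven.", "Simmer in a pot.", "Use the grill."]
    for item in find_disagreements(texts, narrow_model, broad_model):
        print(item)
```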

Slide 28

Slide 28 text

Pro tip: Use generative models to create spaCy rule sets!
ChatGPT
spacy.io/usage/rule-based-matching
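A minimal sketch of this workflow, assuming a generative model has already proposed phrase lists per label and a human has reviewed them: the phrases are converted into token patterns in the `{"label": ..., "pattern": [...]}` shape used by spaCy's entity ruler and written out as JSONL. The helper names and phrase lists are hypothetical, and splitting on whitespace is a simplification that will not always match spaCy's own tokenization.

```python
import json


def phrases_to_patterns(phrases_by_label):
    """Turn reviewed phrase lists into case-insensitive token patterns."""
    patterns = []
    for label, phrases in phrases_by_label.items():
        for phrase in phrases:
            tokens = phrase.split()  # simplification: whitespace tokens only
            if not tokens:
                continue
            patterns.append(
                {
                    "label": label,
                    "pattern": [{"LOWER": token.lower()} for token in tokens],
                }
            )
    return patterns


def to_jsonl(patterns):
    """Serialize patterns as one JSON object per line."""
    return "\n".join(json.dumps(pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical phrases, as if suggested by a generative model.
    suggested = {
        "DISH": ["Garlic bread", "lobster ravioli"],
        "EQUIPMENT": ["oven", "cooking pot"],
    }
    print(to_jsonl(phrases_to_patterns(suggested)))
```

Rules generated this way are cheap to run and easy to inspect, which makes them a reasonable baseline or pre-annotation step before training a model.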

Slide 29

Slide 29 text

Takeaways
Generative complements predictive, it doesn't replace it.

Slide 30

Slide 30 text

Takeaways
Generative complements predictive, it doesn't replace it.
Use generative models to create better, more accurate, faster, smaller and private task-specific models.

Slide 31

Slide 31 text

Takeaways
Generative complements predictive, it doesn't replace it.
Use generative models to create better, more accurate, faster, smaller and private task-specific models.
With good tooling, you can make human input more efficient.

Slide 32

Slide 32 text

thank you!
Explosion: explosion.ai
spaCy: spacy.io
Prodigy: prodigy.ai
Twitter: @explosion_ai
Mastodon: @[email protected]
Bluesky: @explosion-ai.bsky.social
LinkedIn