Slide 1

Slide 1 text

PRACTICAL TIPS FOR BOOTSTRAPPING INFORMATION EXTRACTION PIPELINES Matthew Honnibal Explosion 🤠 You Developer GPT-4 API

Slide 2

Slide 2 text

Open-source library for industrial-strength natural language processing spacy.io SPACY 250m+ downloads

Slide 3

Slide 3 text

Open-source library for industrial-strength natural language processing spacy.io SPACY 250m+ downloads ChatGPT can write spaCy code!

Slide 4

Slide 4 text

900+ companies 10k+ users Modern scriptable annotation tool for machine learning developers prodigy.ai PRODIGY

Slide 5

Slide 5 text

900+ companies 10k+ users Alex Smith Developer Kim Miller Analyst GPT-4 API Modern scriptable annotation tool for machine learning developers prodigy.ai PRODIGY

Slide 6

Slide 6 text

We're back to running Explosion as a smaller, independent-minded and self-sufficient company. explosion.ai/blog/back-to-our-roots BACK TO OUR ROOTS

Slide 7

Slide 7 text

We're back to running Explosion as a smaller, independent-minded and self-sufficient company. explosion.ai/blog/back-to-our-roots Consulting open source developer tools BACK TO OUR ROOTS

Slide 8

Slide 8 text

WHAT I MEAN BY INFORMATION EXTRACTION

Slide 9

Slide 9 text

WHAT I MEAN BY INFORMATION EXTRACTION 📝 Turn text into data. Make a database from earnings reports, or skills in job postings, or product feedback in social media – many more.

Slide 10

Slide 10 text

WHAT I MEAN BY INFORMATION EXTRACTION 📝 Turn text into data. Make a database from earnings reports, or skills in job postings, or product feedback in social media – many more. 🗂 Lots of subtasks. Text classification, named entity recognition, entity linking, relation extraction can all be part of an information extraction pipeline.

Slide 11

Slide 11 text

WHAT I MEAN BY INFORMATION EXTRACTION 📝 Turn text into data. Make a database from earnings reports, or skills in job postings, or product feedback in social media – many more. 🗂 Lots of subtasks. Text classification, named entity recognition, entity linking, relation extraction can all be part of an information extraction pipeline. 🎯 Mostly static schema. Most people are solving one problem at a time, so that's what I'll focus on.

Slide 12

Slide 12 text

Database “Hooli raises $5m to revolutionize search, led by ACME Ventures”

Slide 13

Slide 13 text

COMPANY COMPANY named entity recognition Database “Hooli raises $5m to revolutionize search, led by ACME Ventures”

Slide 14

Slide 14 text

COMPANY COMPANY named entity recognition MONEY currency normalization Database “Hooli raises $5m to revolutionize search, led by ACME Ventures”

Slide 15

Slide 15 text

COMPANY COMPANY named entity recognition MONEY currency normalization 5923214 1681056 custom database lookup entity disambiguation Database “Hooli raises $5m to revolutionize search, led by ACME Ventures”

Slide 16

Slide 16 text

COMPANY COMPANY named entity recognition MONEY currency normalization INVESTOR entity relation extraction 5923214 1681056 custom database lookup entity disambiguation Database “Hooli raises $5m to revolutionize search, led by ACME Ventures”
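As a rough illustration of the first step in this pipeline, here is a minimal spaCy sketch, assuming the stock en_core_web_sm model is installed; it only gives generic spans and labels, whereas the slide's COMPANY/INVESTOR labels, currency normalization, entity disambiguation, and relation extraction would come from custom components layered on top.

import spacy

# assumes: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Hooli raises $5m to revolutionize search, led by ACME Ventures")
for ent in doc.ents:
    # stock pipelines use generic labels such as ORG and MONEY;
    # a custom pipeline would use COMPANY, INVESTOR, etc.
    print(ent.text, ent.label_)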

Slide 17

Slide 17 text

💬 question ⚙ text-to-SQL query data 📦 NLP pipeline 📖 texts + RIE: RETRIEVAL VIA INFORMATION EXTRACTION

Slide 18

Slide 18 text

💬 question ⚙ text-to-SQL query data 📦 NLP pipeline 📖 texts + RIE: RETRIEVAL VIA INFORMATION EXTRACTION RAG: RETRIEVAL-AUGMENTED GENERATION 💬 question ⚙ vectorizer query answers 📚 vector DB 📖 snippets + ⚙ vectorizer

Slide 19

Slide 19 text

TALK OUTLINE 💡

Slide 20

Slide 20 text

TALK OUTLINE 💡 1. Training tips

Slide 21

Slide 21 text

TALK OUTLINE 💡 1. Training tips 2. Modelling tips

Slide 22

Slide 22 text

TALK OUTLINE 💡 1. Training tips 2. Modelling tips 3. Data annotation tips

Slide 23

Slide 23 text

SUPERVISED LEARNING IS STILL VERY STRONG Example data is super powerful.

Slide 24

Slide 24 text

SUPERVISED LEARNING IS STILL VERY STRONG Example data is super powerful. Example data can do things that instructions can't.

Slide 25

Slide 25 text

SUPERVISED LEARNING IS STILL VERY STRONG Example data is super powerful. Example data can do things that instructions can't. In-context learning can't use examples scalably.

Slide 26

Slide 26 text

KNOW YOUR ENEMIES What makes supervised learning hard?

Slide 27

Slide 27 text

product vision 👁 chicken-and-egg problem KNOW YOUR ENEMIES What makes supervised learning hard?

Slide 28

Slide 28 text

product vision 👁 chicken-and-egg problem KNOW YOUR ENEMIES What makes supervised learning hard? accuracy estimate 📈

Slide 29

Slide 29 text

product vision 👁 chicken-and-egg problem KNOW YOUR ENEMIES What makes supervised learning hard? accuracy estimate 📈 training & evaluation 🔮

Slide 30

Slide 30 text

product vision 👁 chicken-and-egg problem KNOW YOUR ENEMIES What makes supervised learning hard? accuracy estimate 📈 training & evaluation 🔮 labelled data 📚

Slide 31

Slide 31 text

product vision 👁 chicken-and-egg problem KNOW YOUR ENEMIES What makes supervised learning hard? accuracy estimate 📈 training & evaluation 🔮 labelled data 📚 annotation scheme 🏷

Slide 32

Slide 32 text

RESULTS ARE HARD TO INTERPRET

Slide 33

Slide 33 text

RESULTS ARE HARD TO INTERPRET 😬 Model doesn't train at all. Is the data messed up somehow?

Slide 34

Slide 34 text

RESULTS ARE HARD TO INTERPRET 😬 Model doesn't train at all. Is the data messed up somehow? 🤨 Model learns barely better than chance. Could be data, hyper-parameters, modelling…

Slide 35

Slide 35 text

RESULTS ARE HARD TO INTERPRET 😬 Model doesn't train at all. Is the data messed up somehow? 🤨 Model learns barely better than chance. Could be data, hyper-parameters, modelling… 🥹 Results are decent! But can it be better? How do I know if I'm missing out?

Slide 36

Slide 36 text

RESULTS ARE HARD TO INTERPRET 😬 Model doesn't train at all. Is the data messed up somehow? 🤨 Model learns barely better than chance. Could be data, hyper-parameters, modelling… 🥹 Results are decent! But can it be better? How do I know if I'm missing out? 🤔 Results are too good to be true. Probably messed up the data…

Slide 37

Slide 37 text

Training ⚗ 1

Slide 38

Slide 38 text

FORM AND FALSIFY HYPOTHESES

Slide 39

Slide 39 text

This is the bit that's broken. HYPOTHESIS

Slide 40

Slide 40 text

This is the bit that's broken. HYPOTHESIS If this bit is broken, what should I expect to see? QUESTION

Slide 41

Slide 41 text

This is the bit that's broken. HYPOTHESIS If this bit is broken, what should I expect to see? QUESTION Is that what actually happens? TEST

Slide 42

Slide 42 text

This is the bit that's broken. HYPOTHESIS If this bit is broken, what should I expect to see? QUESTION Is that what actually happens? TEST “I can't connect to this site.”

Slide 43

Slide 43 text

This is the bit that's broken. HYPOTHESIS If this bit is broken, what should I expect to see? QUESTION Is that what actually happens? TEST “Maybe it'll work if I reconnect to the wi-fi or if I restart my router.” SOLUTION MINDSET “I can't connect to this site.”

Slide 44

Slide 44 text

This is the bit that's broken. HYPOTHESIS If this bit is broken, what should I expect to see? QUESTION Is that what actually happens? TEST “Maybe it'll work if I reconnect to the wi-fi or if I restart my router.” SOLUTION MINDSET SCIENTIFIC MINDSET “If the problem is between me and the site, other sites won't load either. If the problem is between me and the router, I won't be able to ping it.” “I can't connect to this site.”

Slide 45

Slide 45 text

EXAMPLES OF DEBUGGING TRAINING

Slide 46

Slide 46 text

EXAMPLES OF DEBUGGING TRAINING 📉 What happens if I train on a tiny amount of data? Does the model converge?

Slide 47

Slide 47 text

EXAMPLES OF DEBUGGING TRAINING 📉 What happens if I train on a tiny amount of data? Does the model converge? 🔀 What happens if I randomize the training labels? Does the model still learn?

Slide 48

Slide 48 text

EXAMPLES OF DEBUGGING TRAINING 📉 What happens if I train on a tiny amount of data? Does the model converge? 🔀 What happens if I randomize the training labels? Does the model still learn? 🪄 Are my model weights changing at all during training?

Slide 49

Slide 49 text

EXAMPLES OF DEBUGGING TRAINING 📉 What happens if I train on a tiny amount of data? Does the model converge? 🔀 What happens if I randomize the training labels? Does the model still learn? 🪄 Are my model weights changing at all during training? 🧮 What's the mean and variance of my gradients?
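A minimal sketch of the last two checks, assuming a PyTorch model (the same idea works in any framework): snapshot the parameters before training, then compare them and inspect gradient statistics after a few updates.

import torch

def snapshot(model):
    # clone parameters before training so we can check they actually change
    return {name: p.detach().clone() for name, p in model.named_parameters()}

def weights_changed(model, before):
    return any(not torch.equal(p, before[name]) for name, p in model.named_parameters())

def gradient_stats(model):
    # call after loss.backward(); near-zero mean and variance can point to vanishing gradients
    grads = [p.grad.flatten() for p in model.parameters() if p.grad is not None]
    flat = torch.cat(grads)
    return flat.mean().item(), flat.var().item()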

Slide 50

Slide 50 text

PRIORITIZE ROBUSTNESS NOT ACCURACY

Slide 51

Slide 51 text

📈 Better needs to look better. You need it to not be like this:

Slide 52

Slide 52 text

📈 Better needs to look better. You need it to not be like this: 📦 Larger models are often less practical.

Slide 53

Slide 53 text

📈 Better needs to look better. You need it to not be like this: 📦 Larger models are often less practical. 🤏 You need it to work with small samples.

Slide 54

Slide 54 text

📈 Better needs to look better. You need it to not be like this: 📦 Larger models are often less practical. 🤏 You need it to work with small samples. 🌪 Large models are less stable with small batch sizes.

Slide 55

Slide 55 text

🔮 2 Modelling

Slide 56

Slide 56 text

ITERATE ON YOUR DATA AND SCALE DOWN

Slide 57

Slide 57 text

task-specific output 💬 prompt 📖 text 🔮 PROTOTYPE GPT-4 API

Slide 58

Slide 58 text

task-specific output 💬 prompt 📖 text 🔮 PROTOTYPE github.com/explosion/spacy-llm prompt model & transform output to structured data GPT-4 API

Slide 59

Slide 59 text

task-specific output 💬 prompt 📖 text 🔮 PROTOTYPE github.com/explosion/spacy-llm prompt model & transform output to structured data GPT-4 API 📖 text task-specific output PRODUCTION

Slide 60

Slide 60 text

distilled task-specific components 📦 📦 📦 task-specific output 💬 prompt 📖 text 🔮 PROTOTYPE github.com/explosion/spacy-llm prompt model & transform output to structured data GPT-4 API 📖 text task-specific output PRODUCTION

Slide 61

Slide 61 text

distilled task-specific components 📦 📦 📦 task-specific output 💬 prompt 📖 text 🔮 PROTOTYPE github.com/explosion/spacy-llm prompt model & transform output to structured data GPT-4 API 📖 text task-specific output PRODUCTION modular

Slide 62

Slide 62 text

distilled task-specific components 📦 📦 📦 task-specific output 💬 prompt 📖 text 🔮 PROTOTYPE github.com/explosion/spacy-llm prompt model & transform output to structured data GPT-4 API 📖 text task-specific output PRODUCTION modular small & fast

Slide 63

Slide 63 text

distilled task-specific components 📦 📦 📦 task-specific output 💬 prompt 📖 text 🔮 PROTOTYPE github.com/explosion/spacy-llm prompt model & transform output to structured data GPT-4 API 📖 text task-specific output PRODUCTION modular small & fast data-private
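The PRODUCTION side of this diagram is an ordinary spaCy pipeline trained on the annotations collected with the LLM's help; as a hedged sketch of that step, using placeholder paths and a regular training config (distinct from the LLM config shown next):

$ python -m spacy train training.cfg --output ./model --paths.train ./train.spacy --paths.dev ./dev.spacy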

Slide 64

Slide 64 text

config.cfg spacy.io/usage/large-language-models ⚙

Slide 65

Slide 65 text

config.cfg spacy.io/usage/large-language-models component ⚙

Slide 66

Slide 66 text

config.cfg spacy.io/usage/large-language-models model and provider component ⚙

Slide 67

Slide 67 text

config.cfg spacy.io/usage/large-language-models model and provider task definition and labels Named Entity Recognition, Text Classification, Relation Extraction, … component ⚙

Slide 68

Slide 68 text

config.cfg spacy.io/usage/large-language-models label definitions to use in prompt model and provider task definition and labels Named Entity Recognition, Text Classification, Relation Extraction, … component ⚙

Slide 69

Slide 69 text

config.cfg spacy.io/usage/large-language-models label definitions to use in prompt model and provider task definition and labels Named Entity Recognition, Text Classification, Relation Extraction, … component ⚙ example from case study explosion.ai/blog/sp-global-commodities
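A hedged sketch of what such a config.cfg can look like for the funding example: the label names and definitions here are invented, and the task and model entries follow the spacy-llm documentation (an NER task backed by GPT-4), not the exact config from the case study.

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["COMPANY", "INVESTOR", "MONEY"]

[components.llm.task.label_definitions]
COMPANY = "A company that is raising or spending money."
INVESTOR = "A company or fund participating in an investment round."
MONEY = "A monetary amount such as '$5m'."

[components.llm.model]
@llm_models = "spacy.GPT-4.v2"

The assembled pipeline then behaves like any other spaCy pipeline, e.g. via spacy-llm's assemble helper (an API key for the provider is expected in the environment):

from spacy_llm.util import assemble

nlp = assemble("config.cfg")
doc = nlp("Hooli raises $5m to revolutionize search, led by ACME Ventures")
print([(ent.text, ent.label_) for ent in doc.ents])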

Slide 70

Slide 70 text

Data annotation 📒 3

Slide 71

Slide 71 text

How much data do you need?

Slide 72

Slide 72 text

TRAINING =============== Train curve diagnostic =============== Training 4 times with 25%, 50%, 75%, 100% of the data % Score ner ---- ------ ------ 0% 0.00 0.00 25% 0.31 ▲ 0.31 ▲ 50% 0.44 ▲ 0.44 ▲ 75% 0.43 ▼ 0.43 ▼ 100% 0.56 ▲ 0.56 ▲ Prodigy How much data do you need?

Slide 73

Slide 73 text

TRAINING =============== Train curve diagnostic =============== Training 4 times with 25%, 50%, 75%, 100% of the data % Score ner ---- ------ ------ 0% 0.00 0.00 25% 0.31 ▲ 0.31 ▲ 50% 0.44 ▲ 0.44 ▲ 75% 0.43 ▼ 0.43 ▼ 100% 0.56 ▲ 0.56 ▲ Prodigy How much data do you need? (chart: accuracy vs. % of examples, with projection)
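The output above comes from Prodigy's train-curve recipe, which retrains on growing slices of the annotations to show whether more data is still helping; roughly, the invocation looks like this (the dataset name is a placeholder):

$ prodigy train-curve --ner funding_ner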

Slide 74

Slide 74 text

TRAINING =============== Train curve diagnostic =============== Training 4 times with 25%, 50%, 75%, 100% of the data % Score ner ---- ------ ------ 0% 0.00 0.00 25% 0.31 ▲ 0.31 ▲ 50% 0.44 ▲ 0.44 ▲ 75% 0.43 ▼ 0.43 ▼ 100% 0.56 ▲ 0.56 ▲ Prodigy How much data do you need? (chart: accuracy vs. % of examples, with projection) EVALUATION ⚠ You need enough data to avoid reporting meaningless precision.

Slide 75

Slide 75 text

TRAINING =============== Train curve diagnostic =============== Training 4 times with 25%, 50%, 75%, 100% of the data % Score ner ---- ------ ------ 0% 0.00 0.00 25% 0.31 ▲ 0.31 ▲ 50% 0.44 ▲ 0.44 ▲ 75% 0.43 ▼ 0.43 ▼ 100% 0.56 ▲ 0.56 ▲ Prodigy How much data do you need? (chart: accuracy vs. % of examples, with projection) EVALUATION ⚠ You need enough data to avoid reporting meaningless precision. 📊 Ten samples per significant figure is a good rule of thumb.

Slide 76

Slide 76 text

TRAINING =============== Train curve diagnostic =============== Training 4 times with 25%, 50%, 75%, 100% of the data % Score ner ---- ------ ------ 0% 0.00 0.00 25% 0.31 ▲ 0.31 ▲ 50% 0.44 ▲ 0.44 ▲ 75% 0.43 ▼ 0.43 ▼ 100% 0.56 ▲ 0.56 ▲ Prodigy How much data do you need? (chart: accuracy vs. % of examples, with projection) EVALUATION ⚠ You need enough data to avoid reporting meaningless precision. 📊 Ten samples per significant figure is a good rule of thumb. 1,000 samples is pretty good – enough for 94% vs. 95%.
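A quick back-of-the-envelope check of that claim: the standard error of an accuracy estimate p measured on n evaluation examples is roughly sqrt(p(1-p)/n), so at around 1,000 examples the error drops to about 0.7 percentage points, which is what makes a 94% vs. 95% comparison start to mean something.

from math import sqrt

p = 0.95  # assumed true accuracy
for n in (100, 1_000, 10_000):
    # standard error of the accuracy estimate on n evaluation examples
    print(n, f"± {sqrt(p * (1 - p) / n):.2%}")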

Slide 77

Slide 77 text

KEEP TASKS SMALL

Slide 78

Slide 78 text

KEEP TASKS SMALL GOOD for i in range(rows): access_data(array[i]) ✅ BAD for j in range(columns): access_data(array[:, j]) ❌

Slide 79

Slide 79 text

KEEP TASKS SMALL Humans have a cache, too! GOOD for i in range(rows): access_data(array[i]) ✅ BAD for j in range(columns): access_data(array[:, j]) ❌

Slide 80

Slide 80 text

KEEP TASKS SMALL Humans have a cache, too! GOOD for i in range(rows): access_data(array[i]) ✅ BAD for j in range(columns): access_data(array[:, j]) ❌ DO THIS for annotation_type in annotation_types: for example in examples: annotate(example, annotation_type) ✅ NOT THIS for example in examples: for annotation_type in annotation_types: annotate(example, annotation_type) ❌

Slide 81

Slide 81 text

USE MODEL ASSISTANCE

Slide 82

Slide 82 text

USE MODEL ASSISTANCE 🔮 Suggest annotations however you can. Rule-based, initial trained model, an LLM – or a combination of all.

Slide 83

Slide 83 text

USE MODEL ASSISTANCE 🔮 Suggest annotations however you can. Rule-based, initial trained model, an LLM – or a combination of all. Suggestions improve efficiency. Common cases are common, so getting them preset speeds up annotation a lot. 🔥

Slide 84

Slide 84 text

USE MODEL ASSISTANCE 🔮 Suggest annotations however you can. Rule-based, initial trained model, an LLM – or a combination of all. Suggestions improve efficiency. Common cases are common, so getting them preset speeds up annotation a lot. 🔥 Suggestions improve accuracy. You need the common cases to be annotated consistently. Humans suck at this. 📈
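For the rule-based flavour of this, a minimal sketch using spaCy's entity_ruler; the patterns are invented for the funding example, and similar match patterns can also be used to pre-highlight spans during annotation.

import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "COMPANY", "pattern": "Hooli"},  # exact phrase match
    {"label": "INVESTOR", "pattern": [{"LOWER": "acme"}, {"LOWER": "ventures"}]},  # token-level match
])
doc = nlp("Hooli raises $5m to revolutionize search, led by ACME Ventures")
print([(ent.text, ent.label_) for ent in doc.ents])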

Slide 85

Slide 85 text

🔮 explosion.ai/blog/human-in-the-loop-distillation HUMAN IN THE LOOP

Slide 86

Slide 86 text

🔮 explosion.ai/blog/human-in-the-loop-distillation continuous evaluation baseline HUMAN IN THE LOOP

Slide 87

Slide 87 text

🔮 explosion.ai/blog/human-in-the-loop-distillation continuous evaluation baseline prompting HUMAN IN THE LOOP

Slide 88

Slide 88 text

🔮 explosion.ai/blog/human-in-the-loop-distillation continuous evaluation baseline prompting HUMAN IN THE LOOP

Slide 89

Slide 89 text

🔮 explosion.ai/blog/human-in-the-loop-distillation continuous evaluation baseline prompting transfer learning 📦 HUMAN IN THE LOOP

Slide 90

Slide 90 text

🔮 explosion.ai/blog/human-in-the-loop-distillation continuous evaluation baseline prompting transfer learning 📦 distilled model HUMAN IN THE LOOP

Slide 91

Slide 91 text

prodigy.ai/docs/large-language-models $ prodigy ner.llm.correct todo_eval ./config.cfg ./examples.jsonl ⚙

Slide 92

Slide 92 text

prodigy.ai/docs/large-language-models $ prodigy ner.llm.correct todo_eval ./config.cfg ./examples.jsonl recipe function with workflow ⚙

Slide 93

Slide 93 text

prodigy.ai/docs/large-language-models $ prodigy ner.llm.correct todo_eval ./config.cfg ./examples.jsonl dataset to save annotations to recipe function with workflow ⚙

Slide 94

Slide 94 text

prodigy.ai/docs/large-language-models $ prodigy ner.llm.correct todo_eval ./config.cfg ./examples.jsonl dataset to save annotations to recipe function with workflow [components.llm.model] @llm_models = "spacy.GPT-4.v2" ⚙

Slide 95

Slide 95 text

prodigy.ai/docs/large-language-models $ prodigy ner.llm.correct todo_eval ./config.cfg ./examples.jsonl dataset to save annotations to recipe function with workflow raw data [components.llm.model] @llm_models = "spacy.GPT-4.v2" ⚙

Slide 96

Slide 96 text

✨ Starting the web server at localhost:8080 ... Open the app and start annotating! GPT-4 API prodigy.ai/docs/large-language-models $ prodigy ner.llm.correct todo_eval ./config.cfg ./examples.jsonl dataset to save annotations to recipe function with workflow raw data [components.llm.model] @llm_models = "spacy.GPT-4.v2" ⚙

Slide 97

Slide 97 text

✨ Starting the web server at localhost:8080 ... Open the app and start annotating! GPT-4 API prodigy.ai/docs/large-language-models $ prodigy ner.llm.correct todo_eval ./config.cfg ./examples.jsonl dataset to save annotations to recipe function with workflow raw data 🤠 You Developer [components.llm.model] @llm_models = "spacy.GPT-4.v2" ⚙

Slide 98

Slide 98 text

explosion.ai/blog/guardian case study ANNOTATION STARTS AT HOME

Slide 99

Slide 99 text

explosion.ai/blog/guardian case study annotation guidelines ANNOTATION STARTS AT HOME

Slide 100

Slide 100 text

explosion.ai/blog/guardian case study annotation guidelines annotation meeting ANNOTATION STARTS AT HOME

Slide 101

Slide 101 text

📒 🔮 ⚗

Slide 102

Slide 102 text

📒 🔮 Form and falsify hypotheses. ⚗

Slide 103

Slide 103 text

📒 🔮 Form and falsify hypotheses. ⚗ Prioritize robustness.

Slide 104

Slide 104 text

📒 🔮 Form and falsify hypotheses. ⚗ Prioritize robustness. Scale down and iterate.

Slide 105

Slide 105 text

📒 🔮 Form and falsify hypotheses. ⚗ Prioritize robustness. Scale down and iterate. Imagine you're the model.

Slide 106

Slide 106 text

📒 🔮 Form and falsify hypotheses. ⚗ Prioritize robustness. Scale down and iterate. Imagine you're the model. Finish the pipeline to production.

Slide 107

Slide 107 text

📒 🔮 Form and falsify hypotheses. ⚗ Prioritize robustness. Scale down and iterate. Imagine you're the model. Finish the pipeline to production. Be agile and annotate yourself.

Slide 108

Slide 108 text

📒 🔮 Form and falsify hypotheses. ⚗ Prioritize robustness. Scale down and iterate. Imagine you're the model. Finish the pipeline to production. Be agile and annotate yourself. Keep tasks small.

Slide 109

Slide 109 text

📒 🔮 Form and falsify hypotheses. ⚗ Prioritize robustness. Scale down and iterate. Imagine you're the model. Finish the pipeline to production. Be agile and annotate yourself. Keep tasks small. Use model assistance.

Slide 110

Slide 110 text

LinkedIn Explosion spaCy Prodigy Twitter Mastodon Bluesky explosion.ai spacy.io prodigy.ai @honnibal @honnibal@sigmoid.social @honnibal.bsky.social THANK YOU!