Slide 1

Slide 1 text

Rapid NLP annotation through binary decisions, pattern bootstrapping and active learning
Ines Montani · Explosion AI

Slide 2

Slide 2 text

Why we need annotations
Machine Learning is “programming by example”
annotations let us specify the output we’re looking for
even unsupervised methods need to be evaluated on labelled examples

Slide 3

Slide 3 text

Why annotation tools need to be efficient
annotation needs iteration: we can’t expect to define the task correctly the first time
good annotation teams are small – and should collaborate with the data scientist
lots of high-value opportunities need specialist knowledge and expertise

Slide 4

Slide 4 text

Why annotation needs to be semi-automatic
impossible to perform boring, unstructured or multi-step tasks reliably
humans make mistakes a computer never would, and vice versa
humans are good at context, ambiguity and precision
computers are good at consistency, memory and recall

Slide 5

Slide 5 text

“But annotation sucks!”
1. Excel spreadsheets
Problem: Excel. Spreadsheets.


Slide 6

Slide 6 text

“But annotation sucks!”
“But it’s just cheap click work. Can’t we outsource that?”
1. Excel spreadsheets
Problem: Excel. Spreadsheets.
2. Mechanical Turk or external annotators
Problem: If your results are bad, is it your label scheme, your data or your model?

Slide 7

Slide 7 text

“But annotation sucks!”
1. Excel spreadsheets
Problem: Excel. Spreadsheets.
2. Mechanical Turk or external annotators
Problem: If your results are bad, is it your label scheme, your data or your model?
3. Unsupervised learning
Problem: So many clusters – but now what?

Slide 8

Slide 8 text

Labelled data is not the problem. It’s data collection.

Slide 9

Slide 9 text

Ask simple questions, even for complex tasks – ideally binary
better annotation speed
better, easier-to-measure reliability
in theory: any task can be broken down into a sequence of binary (yes or no) decisions – it just makes your gradients sparse

Slide 10

Slide 10 text

Prodigy Annotation Tool · https://prodi.gy

Slide 11

Slide 11 text

Prodigy Annotation Tool · https://prodi.gy

Slide 12

Slide 12 text

How can we train from incomplete information?

Slide 13

Slide 13 text

“Barack H. Obama was the president of America”
PERSON: Barack H. Obama · LOC: America
['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']
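For reference, a tag sequence like this can be produced programmatically. Below is a minimal sketch, assuming spaCy v3 and its offsets_to_biluo_tags helper; the character offsets for the two spans are worked out by hand from the example sentence, not taken from the slides.

import spacy
from spacy.training import offsets_to_biluo_tags

nlp = spacy.blank("en")
doc = nlp("Barack H. Obama was the president of America")

# character offsets of the two annotated spans (worked out by hand)
entities = [(0, 15, "PERSON"), (37, 44, "LOC")]
print(offsets_to_biluo_tags(doc, entities))
# ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']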

Slide 14

Slide 14 text

Learning from complete information
gradient_of_loss = predicted - target
In the simple case with one known correct label:
target = zeros(len(classes))
target[classes.index(true_label)] = 1.0
But what if we don’t know the full target distribution?
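As a runnable version of the pseudocode above – a sketch only, assuming a softmax output layer with cross-entropy loss, where the gradient with respect to the logits is predicted - target:

import numpy as np

classes = ['ORG', 'LOC', 'PERSON']
predicted = np.array([0.5, 0.2, 0.3])   # model's predicted distribution

# complete information: one known correct label -> one-hot target
target = np.zeros(len(classes))
target[classes.index('PERSON')] = 1.0

gradient = predicted - target
print(gradient)   # [ 0.5  0.2 -0.7]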

Slide 15

Slide 15 text

“Barack H. Obama was the president of America”
ORG
['?', '?', 'U-ORG', '?', '?', '?', '?', '?']

Slide 16

Slide 16 text

“Barack H. Obama was the president of America”
LOC
['?', '?', 'U-ORG', '?', '?', '?', '?', '?']
['?', '?', '?', '?', '?', '?', '?', 'U-LOC']

Slide 17

Slide 17 text

“Barack H. Obama was the president of America”
PERSON
['?', '?', 'U-ORG', '?', '?', '?', '?', '?']
['?', '?', '?', '?', '?', '?', '?', 'U-LOC']
['B-PERSON', 'L-PERSON', '?', '?', '?', '?', '?', '?']

Slide 18

Slide 18 text

“Barack H. Obama was the president of America”
PERSON
['?', '?', 'U-ORG', '?', '?', '?', '?', '?']
['?', '?', '?', '?', '?', '?', '?', 'U-LOC']
['B-PERSON', 'L-PERSON', '?', '?', '?', '?', '?', '?']
['B-PERSON', 'I-PERSON', 'L-PERSON', '?', '?', '?', '?', '?']

Slide 19

Slide 19 text


Training from sparse labels
goal: update the model in the best possible way with what we know
just like multi-label classification, where examples can have more than one right answer
update towards: wrong labels get 0 probability, rest is split proportionally

Slide 20

Slide 20 text

token = 'Obama'
labels = ['ORG', 'LOC', 'PERSON']
predicted = [ 0.5, 0.2, 0.3 ]

Slide 21

Slide 21 text

token = 'Obama'
labels = ['ORG', 'LOC', 'PERSON']
predicted = [ 0.5, 0.2, 0.3 ]
target = [ 0.0, 0.0, 1.0 ]
gradient = predicted - target

Slide 22

Slide 22 text

token = 'Obama'
labels = ['ORG', 'LOC', 'PERSON']
predicted = [ 0.5, 0.2, 0.3 ]
target = [ 0.0, ?, ? ]

Slide 23

Slide 23 text

token = 'Obama'
labels = ['ORG', 'LOC', 'PERSON']
predicted = [ 0.5, 0.2, 0.3 ]
target = [ 0.0, 0.2 / (1.0 - 0.5), 0.3 / (1.0 - 0.5) ]
target = [ 0.0, 0.4, 0.6 ]
redistribute proportionally
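The same redistribution as a small self-contained sketch in plain Python (the helper name sparse_target is invented here for illustration): labels we know are wrong get probability 0.0, and the model’s own predictions are renormalised over the remaining labels.

def sparse_target(predicted, rejected):
    # rejected labels get 0.0; the remaining mass is split proportionally
    remaining = sum(p for i, p in enumerate(predicted) if i not in rejected)
    return [0.0 if i in rejected else p / remaining
            for i, p in enumerate(predicted)]

labels = ['ORG', 'LOC', 'PERSON']
predicted = [0.5, 0.2, 0.3]
target = sparse_target(predicted, rejected={labels.index('ORG')})
print(target)   # [0.0, 0.4, 0.6]
gradient = [p - t for p, t in zip(predicted, target)]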

Slide 24

Slide 24 text

“Barack H. Obama was the president of America”
0.40  ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']
0.35  ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O']
0.20  ['O', 'O', 'U-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']
0.05  ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
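The same idea carries over to whole analyses. Here is a hedged sketch (not Prodigy’s actual implementation) of how one binary decision rules out some candidate tag sequences, with the surviving probability mass renormalised just like in the token-level case:

candidates = [
    (0.40, ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']),
    (0.35, ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O']),
    (0.20, ['O', 'O', 'U-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']),
    (0.05, ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']),
]
# say the annotator accepts "Barack H. Obama" as PERSON:
# only sequences starting with B-PERSON remain consistent
consistent = [(p, seq) for p, seq in candidates if seq[0] == 'B-PERSON']
total = sum(p for p, _ in consistent)                       # 0.75
renormalised = [(p / total, seq) for p, seq in consistent]  # ~0.53, ~0.47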

Slide 25

Slide 25 text

Training from sparse labels
if we have a model that predicts something, we can work with that
once the model’s already quite good, its second choice is probably correct
new label: even from cold start, model will still converge – it’s just slow

Slide 26

Slide 26 text

How to get over the cold start when training a new label?
model needs to see enough positive examples
rule-based models are often quite good
rules can pre-label entity candidates
write rules, annotate the exceptions

Slide 27

Slide 27 text

{ "label": "GPE", "pattern": [ {"lower": "virginia"} ] }

Slide 28

Slide 28 text

Does this work for other structured prediction tasks?
approach can be applied to other non-NER tasks: dependency parsing, coreference resolution, relation extraction, summarization etc.
structures we’re predicting are highly correlated
annotating it all at once is super inefficient – binary supervision can be much better

Slide 29

Slide 29 text

Benefits of binary annotation workflows
better data quality, reduced human error
automate what humans are bad at, focus on what humans are needed for
enable rapid iteration on data selection and label scheme

Slide 30

Slide 30 text

Iterate on your code and your data.

Slide 31

Slide 31 text

“Regular” programming: source code → compiler → runtime program
(the part you work on: the source code)

Slide 32

Slide 32 text

“Regular” programming: source code → compiler → runtime program
Machine Learning: training data → training algorithm → runtime model
(the part you should work on: the training data)

Slide 33

Slide 33 text

If you can master annotation...
... you can try out more ideas quickly. Most ideas don’t work – but some succeed wildly.
... fewer projects will fail. Figure out what works before trying to scale it up.
... you can build entirely custom solutions and nobody can lock you in.

Slide 34

Slide 34 text

Thanks!
Explosion AI · explosion.ai
Follow us on Twitter: @_inesmontani · @explosion_ai