Belgium NLP Meetup: Rapid NLP Annotation Through Binary Decisions, Pattern Bootstrapping and Active Learning


Ines Montani

October 31, 2018

Transcript

  1. Rapid NLP annotation through binary decisions, pattern bootstrapping and active learning. Ines Montani, Explosion AI
  2. Why we need annotations: Machine Learning is “programming by example”. Annotations let us specify the output we’re looking for. Even unsupervised methods need to be evaluated on labelled examples.
  3. Why annotation tools need to be efficient: annotation needs iteration, since we can’t expect to define the task correctly the first time. Good annotation teams are small and should collaborate with the data scientist. Lots of high-value opportunities need specialist knowledge and expertise.
  4. Why annotation needs to be semi-automatic: it’s impossible to perform boring, unstructured or multi-step tasks reliably. Humans make mistakes a computer never would, and vice versa. Humans are good at context, ambiguity and precision; computers are good at consistency, memory and recall.
  5. “But annotation sucks!” 1. Excel spreadsheets. Problem: Excel. Spreadsheets.
  6. “But annotation sucks!” “But it’s just cheap click work. Can’t we outsource that?” 1. Excel spreadsheets. Problem: Excel. Spreadsheets. 2. Mechanical Turk or external annotators. Problem: If your results are bad, is it your label scheme, your data or your model?
  7. “But annotation sucks!” 1. Excel spreadsheets. Problem: Excel. Spreadsheets. 2. Mechanical Turk or external annotators. Problem: If your results are bad, is it your label scheme, your data or your model? 3. Unsupervised learning. Problem: So many clusters – but now what?
  8. Labelled data is not the problem. It’s data collection.

  9. Ask simple questions, even for complex tasks – ideally binary: better annotation speed, and better, easier-to-measure reliability. In theory, any task can be broken down into a sequence of binary (yes or no) decisions – it just makes your gradients sparse.
  10. Prodigy Annotation Tool · https://prodi.gy

  11. Prodigy Annotation Tool · https://prodi.gy

  12. How can we train from incomplete information?

  13. Barack H. Obama was the president of America (PERSON, LOC)
 ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']
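The BILUO tag sequence on this slide can be derived mechanically from entity spans. As a minimal sketch of the scheme (not Prodigy's or spaCy's actual code; the helper name and span format are my own):

```python
def spans_to_biluo(tokens, spans):
    """Convert (start, end, label) token-index spans to BILUO tags.
    B = begin, I = inside, L = last, U = unit (single token), O = outside.
    end is exclusive. Sketch of the scheme shown on the slide."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"          # single-token entity
        else:
            tags[start] = f"B-{label}"          # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # middle tokens
            tags[end - 1] = f"L-{label}"        # last token
    return tags

tokens = ["Barack", "H.", "Obama", "was", "the", "president", "of", "America"]
print(spans_to_biluo(tokens, [(0, 3, "PERSON"), (7, 8, "LOC")]))
# → ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']
```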
  14. Learning from complete information: gradient_of_loss = predicted - target. In the simple case with one known correct label:
 target = zeros(len(classes))
 target[classes.index(true_label)] = 1.0
 But what if we don’t know the full target distribution?
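The complete-information case from the slide, written out as a runnable pure-Python sketch (the class names and numbers are the slide's running example):

```python
# Dense update when the full target is known: one-hot target, then
# gradient_of_loss = predicted - target, as in the slide's pseudocode.
classes = ["ORG", "LOC", "PERSON"]
true_label = "PERSON"

predicted = [0.5, 0.2, 0.3]           # model's output distribution
target = [0.0] * len(classes)         # zeros(len(classes))
target[classes.index(true_label)] = 1.0

gradient_of_loss = [p - t for p, t in zip(predicted, target)]
# gradient is positive for wrong classes, negative for the true class
```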
  15. Barack H. Obama was the president of America (ORG)
 ['?', '?', 'U-ORG', '?', '?', '?', '?', '?']
  16. Barack H. Obama was the president of America (LOC)
 ['?', '?', 'U-ORG', '?', '?', '?', '?', '?']
 ['?', '?', '?', '?', '?', '?', '?', 'U-LOC']
  17. Barack H. Obama was the president of America (PERSON)
 ['?', '?', 'U-ORG', '?', '?', '?', '?', '?']
 ['?', '?', '?', '?', '?', '?', '?', 'U-LOC']
 ['B-PERSON', 'L-PERSON', '?', '?', '?', '?', '?', '?']
  18. Barack H. Obama was the president of America (PERSON)
 ['?', '?', 'U-ORG', '?', '?', '?', '?', '?']
 ['?', '?', '?', '?', '?', '?', '?', 'U-LOC']
 ['B-PERSON', 'L-PERSON', '?', '?', '?', '?', '?', '?']
 ['B-PERSON', 'I-PERSON', 'L-PERSON', '?', '?', '?', '?', '?']
  19. Training from sparse labels: the goal is to update the model in the best possible way with what we know – just like multi-label classification, where examples can have more than one right answer. Update towards: wrong labels get 0 probability, the rest is split proportionally.
  20. token = 'Obama'
 labels = ['ORG', 'LOC', 'PERSON']
 predicted = [0.5, 0.2, 0.3]
  21. token = 'Obama'
 labels = ['ORG', 'LOC', 'PERSON']
 predicted = [0.5, 0.2, 0.3]
 target = [0.0, 0.0, 1.0]
 gradient = predicted - target
  22. token = 'Obama'
 labels = ['ORG', 'LOC', 'PERSON']
 predicted = [0.5, 0.2, 0.3]
 target = [0.0, ?, ?]
  23. token = 'Obama'
 labels = ['ORG', 'LOC', 'PERSON']
 predicted = [0.5, 0.2, 0.3]
 target = [0.0, 0.2 / (1.0 - 0.5), 0.3 / (1.0 - 0.5)]
 target = [0.0, 0.4, 0.6]  # redistribute proportionally
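The proportional redistribution from slides 22–23 can be written as a small function. A hedged sketch (function name and constraint format are my own, not Prodigy's API): labels the annotator rejected are pinned to probability 0.0, and the prediction's remaining mass is renormalised over the unknown classes.

```python
def sparse_gradient(predicted, rejected):
    """Gradient update from partial supervision. `rejected` maps a class
    index to its fixed target probability (0.0 for a rejected label);
    the remaining probability mass of `predicted` is redistributed
    proportionally among the unconstrained classes, as on the slide."""
    fixed_mass = sum(predicted[i] for i in rejected)
    remaining = 1.0 - fixed_mass
    target = list(predicted)
    for i, value in rejected.items():
        target[i] = value
    for i in range(len(predicted)):
        if i not in rejected:
            target[i] = predicted[i] / remaining   # proportional split
    return [p - t for p, t in zip(predicted, target)]

predicted = [0.5, 0.2, 0.3]                 # ORG, LOC, PERSON
grad = sparse_gradient(predicted, {0: 0.0}) # annotator rejected ORG
# target becomes [0.0, 0.4, 0.6], so the gradient only "pushes" on ORG
```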
  24. Barack H. Obama was the president of America
 0.40 ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']
 0.35 ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O']
 0.20 ['O', 'O', 'U-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']
 0.05 ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
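One way to read slide 24: the model proposes several complete analyses with scores, and the probability of a tag at a given position is the summed weight of the analyses that contain it. A small sketch of that marginalisation (my own helper, using the slide's candidates and weights):

```python
# Weighted candidate analyses from the slide.
candidates = [
    (0.40, ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']),
    (0.35, ['B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O']),
    (0.20, ['O', 'O', 'U-PERSON', 'O', 'O', 'O', 'O', 'U-LOC']),
    (0.05, ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']),
]

def tag_marginal(candidates, position, tag):
    """Summed weight of all candidate analyses assigning `tag` at `position`."""
    return sum(w for w, tags in candidates if tags[position] == tag)

# Probability that token 7 ("America") is tagged U-LOC: 0.40 + 0.20 ≈ 0.6
p_loc = tag_marginal(candidates, 7, 'U-LOC')
```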
  25. Training from sparse labels: if we have a model that predicts something, we can work with that. Once the model’s already quite good, its second choice is probably correct. For a new label, even from a cold start the model will still converge – it’s just slow.
  26. How to get over the cold start when training a new label? The model needs to see enough positive examples. Rule-based models are often quite good, and rules can pre-label entity candidates: write rules, annotate the exceptions.
  27. { "label": "GPE", "pattern": [ {"lower": "virginia"} ] }
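The pattern on this slide is in spaCy's match-pattern format: a label plus a list of per-token attribute dicts. To illustrate how such a rule can pre-label entity candidates, here is a deliberately tiny matcher in plain Python – a sketch of the idea only, not spaCy's implementation (the `match_pattern` helper is my own, and it only handles the `"lower"` attribute):

```python
pattern = {"label": "GPE", "pattern": [{"lower": "virginia"}]}

def match_pattern(tokens, pattern):
    """Return (start, end, label) spans where each token dict in the
    pattern matches the corresponding token (toy: lowercase text only)."""
    specs = pattern["pattern"]
    spans = []
    for start in range(len(tokens) - len(specs) + 1):
        window = tokens[start:start + len(specs)]
        if all(tok.lower() == spec["lower"] for tok, spec in zip(window, specs)):
            spans.append((start, start + len(specs), pattern["label"]))
    return spans

tokens = ["She", "moved", "to", "Virginia", "last", "year"]
print(match_pattern(tokens, pattern))  # → [(3, 4, 'GPE')]
```

Each matched span becomes a candidate annotation for the annotator to accept or reject, which supplies the positive examples the cold-start model needs.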

  28. Does this work for other structured prediction tasks? The approach can be applied to other, non-NER tasks: dependency parsing, coreference resolution, relation extraction, summarization etc. The structures we’re predicting are highly correlated, so annotating it all at once is super inefficient – binary supervision can be much better.
  29. Benefits of binary annotation workflows: better data quality and reduced human error; automate what humans are bad at, focus on what humans are needed for; enable rapid iteration on data selection and label scheme.
  30. Iterate on your code and your data.

  31. “Regular” programming: source code (the part you work on) → compiler → runtime → program
  32. “Regular” programming: source code (the part you work on) → compiler → runtime → program. Machine Learning: training data (the part you should work on) → training algorithm → runtime → model
  33. If you can master annotation... you can try out more ideas quickly – most ideas don’t work, but some succeed wildly. Fewer projects will fail: figure out what works before trying to scale it up. And you can build entirely custom solutions, so nobody can lock you in.
  34. Thanks! Explosion AI: explosion.ai
 Follow us on Twitter: @_inesmontani, @explosion_ai