PyData Global 2022: Spancat

Named entity recognition models might not be able to handle a wide variety of spans, but Spancat certainly can! Within spaCy, our open-source library for NLP, we’ve created an NER model that handles overlapping and arbitrary text spans. Dive into named entity recognition, its limitations, and how we’ve solved them with a solution-focused talk and practical applications.

Victoria Slocum

December 05, 2022

Transcript

  1. Sam Bankman-Fried announced he would resign as CEO of FTX amid the crisis in late 2022.

    NER labels: PERSON (Sam Bankman-Fried), ORG (FTX), DATE (late 2022).
    This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.
  2. Sam Bankman-Fried announced he would resign as CEO of FTX amid the crisis in late 2022.

    NER labels: PERSON (Sam Bankman-Fried), ORG (FTX), DATE (late 2022).
    This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.
    Span labels on the second example: COND, COND, COND, BENEFIT, ADE, ADE, BENEFIT.
  3. What is named entity recognition?

    What are the limitations of NER?
    Spancat, and how it’s different from NER
  4. Sam Bankman-Fried announced he would wind down operations at Alameda Research and resigned as CEO of FTX amid the crisis in late 2022.

    NER labels: PERSON (Sam Bankman-Fried), ORG (Alameda Research), ORG (FTX), DATE (late 2022).
  5. Sam Bankman-Fried was the CEO of FTX (token tags: B-PER, I-PER, B-ORG)

    import spacy

    # Load the small English pipeline and print one IOB tag per token
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Sam Bankman-Fried was the CEO of FTX")
    print([token.ent_iob_ for token in doc])
    >>> ['B', 'I', 'O', 'O', 'O', 'O', 'B']
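The tag list on this slide gives each token exactly one tag, which is where NER's limitation comes from. A minimal plain-Python sketch of that constraint follows; it is not spaCy code. The `spans_to_bio` helper, the whitespace tokenization, and the `TITLE` label are assumptions made up for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode labelled spans as one BIO tag per token.

    Raises ValueError if two spans share a token, because a token
    can carry only a single tag in this scheme.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        for i in range(start, end):
            if tags[i] != "O":
                raise ValueError(f"token {i} is already tagged {tags[i]}")
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


# Hypothetical whitespace tokenization of the slide's sentence (7 tokens).
tokens = "Sam Bankman-Fried was the CEO of FTX".split()

# Non-overlapping entities encode without trouble.
flat = spans_to_bio(len(tokens), [(0, 2, "PER"), (6, 7, "ORG")])
print(flat)

# A nested span ("CEO of FTX" containing "FTX") cannot be encoded.
try:
    spans_to_bio(len(tokens), [(4, 7, "TITLE"), (6, 7, "ORG")])
except ValueError as err:
    print("cannot encode overlap:", err)
```

The second call fails because "FTX" would need two tags at once, which is the case a span categorizer is meant to cover.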
  6. This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.

    Labels: COND, COND, COND.
  7. Text Classification: "This is great for joint pain but it also caused headaches."

    Each clause goes through text classification on its own: "This is great for joint pain" and "it also caused headaches".
    More on this: https://explosion.ai/blog/healthsea
  8. This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.

    Span labels: COND, COND, COND, BENEFIT, ADE, ADE, BENEFIT.
  9. Customizability without compromising the developer experience

    - swappable components so you can customize to your specific use
    - sensible defaults and easy to get started with a project
    - transparent configuration and implementation
    More on this: https://explosion.ai/blog/spacy-design-concepts
  10. spancat component

    - part of the trainable text processing pipeline
    - different ways to get started
    Pipeline: Text → nlp (tokenizer → tagger → parser → spancat → ...) → Doc

    # Construction via add_pipe with default model
    spancat = nlp.add_pipe("spancat")

    # Construction via add_pipe with custom model
    config = {"model": {"@architectures": "my_spancat"}}
    parser = nlp.add_pipe("spancat", config=config)
  11. config.cfg

    - the single source of truth, includes all settings and records all defaults
    - customize the architecture by swapping out components
    - preset with sensible defaults to get you started
    Generate a config: https://spacy.io/usage/training#quickstart

    # config.cfg with the built-in n-gram suggester
    [components.spancat.suggester]
    @misc = "spacy.ngram_suggester.v1"
    sizes = [1,2,3]

    # config.cfg with a custom suggester swapped in
    [components.spancat.suggester]
    @misc = "custom_suggester.v1"
    max_output = 10
  12. displaCy: https://spacy.io/usage/visualizers

    import spacy
    from spacy import displacy
    from spacy.tokens import Span

    text = "Welcome to the Bank of China."
    nlp = spacy.blank("en")
    doc = nlp(text)
    # Two overlapping spans stored under the "sc" key
    doc.spans["sc"] = [
        Span(doc, 3, 6, "ORG"),
        Span(doc, 5, 6, "GPE"),
    ]
    displacy.serve(doc, style="span")
  13. "This has helped my joint pain." (COND)

    Suggester (n-gram, size 2): This has | has helped | helped my | my joint | joint pain
    Classifier (label: condition): This has COND 0.1 | has helped COND 0.1 | helped my COND 0.1 | my joint COND 0.25 | joint pain COND 0.99
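The size-2 candidates on this slide can be reproduced with a few lines of plain Python. This is a sketch of the idea only, not spaCy's implementation; the `ngram_suggester` function below is a stand-in written for this example and returns plain `(start, end)` tuples.

```python
from typing import List, Sequence, Tuple


def ngram_suggester(tokens: Sequence[str], sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every n-gram of the given sizes."""
    spans = []
    for size in sizes:
        # range() is empty when the text is shorter than the n-gram size
        for start in range(len(tokens) - size + 1):
            spans.append((start, start + size))
    return spans


tokens = ["This", "has", "helped", "my", "joint", "pain"]
candidates = [" ".join(tokens[s:e]) for s, e in ngram_suggester(tokens, sizes=[2])]
print(candidates)
# ['This has', 'has helped', 'helped my', 'my joint', 'joint pain']
```

Every candidate is then scored by the classifier independently, so overlapping candidates such as "my joint" and "joint pain" are not a problem.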
  14. Suggested span "joint pain": Tok2vec, then Pooling, then Scoring

    Tok2vec: joint = [3,2], pain = [1,8]
    Pooling: First [3,2], Last [1,8], Mean [2,5], Max [3,8]
    Scoring: COND: 0.99, EFFECT: 0.06
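The four pooled vectors on this slide follow directly from the two token vectors [3,2] and [1,8]. A small plain-Python sketch of the arithmetic, with `pool_span` as a hypothetical helper name; how the pooled vectors are combined before scoring is not shown on the slide and is left out here.

```python
from typing import Dict, List


def pool_span(vectors: List[List[float]]) -> Dict[str, List[float]]:
    """Reduce a span's token vectors to fixed-size vectors."""
    dims = list(zip(*vectors))  # group values by dimension
    return {
        "first": list(vectors[0]),
        "last": list(vectors[-1]),
        "mean": [sum(d) / len(d) for d in dims],
        "max": [max(d) for d in dims],
    }


# Token vectors for "joint" and "pain" as shown on the slide.
pooled = pool_span([[3, 2], [1, 8]])
print(pooled)
# {'first': [3, 2], 'last': [1, 8], 'mean': [2.0, 5.0], 'max': [3, 8]}
```

Pooling is what lets spans of any length be scored by the same classifier, since each reducer returns a vector of the same width regardless of how many tokens the span has.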
  15. Swappable suggester functions

    - Subtree suggester: syntactic dependencies
    - Chunk suggester: noun chunk iterator
    - Sentence suggester: full sentences
    Learn more: github.com/explosion/spacy-experimental#span-finder
  16. Swappable suggester functions

    - Subtree suggester: syntactic dependencies
    - Chunk suggester: noun chunk iterator
    - Sentence suggester: full sentences
    - n-gram suggester: a set number of tokens
    Learn more: github.com/explosion/spacy-experimental#span-finder
  17. Swappable suggester functions

    - Subtree suggester: syntactic dependencies
    - Chunk suggester: noun chunk iterator
    - Sentence suggester: full sentences
    - n-gram suggester: a set number of tokens
    - SpanFinder: machine learning approach, learns start and end tokens
    Learn more: github.com/explosion/spacy-experimental#span-finder
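"Swappable" here means that every suggester answers the same question: given the tokens, which `(start, end)` spans are candidates? The plain-Python sketch below illustrates that shared interface with two toy suggesters. All function names and the punctuation-based sentence split are assumptions for illustration and do not mirror the spacy-experimental implementations.

```python
from typing import Callable, List, Sequence, Tuple

Offsets = List[Tuple[int, int]]
Suggester = Callable[[Sequence[str]], Offsets]


def sentence_suggester(tokens: Sequence[str]) -> Offsets:
    """Suggest one span per sentence, splitting after ".", "!" or "?"."""
    spans, start = [], 0
    for i, tok in enumerate(tokens):
        if tok in {".", "!", "?"}:
            spans.append((start, i + 1))
            start = i + 1
    if start < len(tokens):  # trailing text without final punctuation
        spans.append((start, len(tokens)))
    return spans


def single_token_suggester(tokens: Sequence[str]) -> Offsets:
    """Suggest every token as its own one-token span."""
    return [(i, i + 1) for i in range(len(tokens))]


def candidate_texts(tokens: Sequence[str], suggester: Suggester) -> List[str]:
    """Run any suggester and return the candidate spans as text."""
    return [" ".join(tokens[s:e]) for s, e in suggester(tokens)]


tokens = "This drug helped my joint pain . However , I feel dizzy .".split()
print(candidate_texts(tokens, sentence_suggester))
print(len(candidate_texts(tokens, single_token_suggester)))
```

Because `candidate_texts` only depends on the call signature, changing the candidate strategy is a one-argument change.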
  18. n-gram suggester on "Sam Bankman-Fried, CEO of FTX."

    import spacy
    from spacy.util import registry

    nlp = spacy.blank("en")
    doc = nlp("Sam Bankman-Fried, CEO of FTX.")
    # "spacy.ngram_suggester.v1" takes explicit sizes; the range variant takes min_size and max_size
    build_suggester = registry.misc.get("spacy.ngram_suggester.v1")
    suggester = build_suggester(sizes=[1, 2, 3])
    spans = [doc[start:end] for (start, end) in suggester([doc]).data]
    >>> [Sam, Bankman-Fried, CEO, of, FTX, Sam Bankman-Fried, Bankman-Fried CEO, CEO of, of FTX, Sam Bankman-Fried CEO, Bankman-Fried CEO of, CEO of FTX]
  19. Explicit control of candidate spans

    - via suggester function
    - bias your model towards precision or recall
  20. Explicit control of candidate spans

    - via suggester function
    - bias your model towards precision or recall
    Access to confidence scores
    - label probabilities over the whole span
    - includes the full context of the span
  21. Explicit control of candidate spans

    - via suggester function
    - bias your model towards precision or recall
    Access to confidence scores
    - label probabilities over the whole span
    - includes the full context of the span
    Less edge-sensitivity
    - doesn’t predict single token-based tags
    - more useful for other types of phrases or overlapping spans
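Having a probability for every candidate span makes the precision/recall trade-off something you can adjust by hand. The sketch below reuses the COND scores from slide 13 and filters them with a cutoff. The `keep_spans` helper and the cutoff values 0.5 and 0.2 are illustrative assumptions, not settings taken from the talk.

```python
from typing import List, Tuple

# (candidate span, COND probability) pairs as shown on slide 13
scored: List[Tuple[str, float]] = [
    ("This has", 0.1),
    ("has helped", 0.1),
    ("helped my", 0.1),
    ("my joint", 0.25),
    ("joint pain", 0.99),
]


def keep_spans(scored_spans: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep the spans whose label probability reaches the threshold."""
    return [text for text, score in scored_spans if score >= threshold]


strict = keep_spans(scored, threshold=0.5)   # favours precision
lenient = keep_spans(scored, threshold=0.2)  # favours recall
print(strict)   # ['joint pain']
print(lenient)  # ['my joint', 'joint pain']
```

A higher cutoff returns fewer, more certain spans. A lower one returns more spans at the cost of extra false positives such as "my joint".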
  22. Resources

    youtube.com/ExplosionAI
    - Annotate data for spancat
    - Speed up process with patterns and training temporary models
    - Tips and tricks for consistent annotation
    - ... and even more
    explosion.ai/blog/spancat
    - SpanCategorizer vs Named Entity Recognizer
    - How spancat works
    - Debug spancat datasets
    - Visualize spancat with displaCy
    - ... and more
    github.com/explosion/projects
    - Annotate data for spancat
    - See an application of spancat
    - Template for your project
    - ... and more
  23. Thank you for listening!

    victoriaslocum.com
    twitter.com/victorialslocum
    linkedin.com/in/victorialslocum
    victoria@explosion.ai