Slide 1

Slide 1 text

Entities within Entities within Entities with Spancat

Slide 2

Slide 2 text

Sam Bankman-Fried announced he would resign as CEO of FTX amid the crisis in late 2022.
Entity labels: PERSON (Sam Bankman-Fried), ORG (FTX), DATE (late 2022)

This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.

Slide 3

Slide 3 text

Sam Bankman-Fried announced he would resign as CEO of FTX amid the crisis in late 2022.
Entity labels: PERSON (Sam Bankman-Fried), ORG (FTX), DATE (late 2022)

This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.
Span labels: COND (3 spans), BENEFIT (2 spans), ADE (2 spans)

Slide 4

Slide 4 text

- What is named entity recognition?
- What are the limitations of NER
- Spancat - and how it's different from NER

Slide 5

Slide 5 text

Sam Bankman-Fried announced he would wind down operations at Alameda Research and resigned as CEO of FTX amid the crisis in late 2022.
Entity labels: PERSON (Sam Bankman-Fried), ORG (Alameda Research), ORG (FTX), DATE (late 2022)

Slide 6

Slide 6 text

Sam Bankman-Fried was the CEO of FTX
Tags: Sam (B-PER), Bankman-Fried (I-PER), FTX (B-ORG)

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Sam Bankman-Fried was the CEO of FTX")
    print([token.ent_iob_ for token in doc])

Output shown on the slide:

    ['B', 'I', 'O', 'O', 'O', 'O', 'B']
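The tagging scheme on this slide can be sketched without spaCy. The helper below is a minimal illustration, not spaCy's implementation: `spans_to_bio` is a hypothetical name, and the token list and offsets are assumed to match the slide's simplified tokenization. It also shows why one tag per token leaves no room for overlapping spans.

```python
def spans_to_bio(n_tokens, spans):
    """Encode (start, end, label) token spans as BIO tags (end is exclusive)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        # Each token can hold only one tag, so overlapping spans cannot be encoded.
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}, {label}) overlaps another span")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Simplified tokenization as drawn on the slide (an assumption).
tokens = ["Sam", "Bankman-Fried", "was", "the", "CEO", "of", "FTX"]
spans = [(0, 2, "PER"), (6, 7, "ORG")]

bio_tags = spans_to_bio(len(tokens), spans)
print(list(zip(tokens, bio_tags)))
```

Passing two spans that share a token raises `ValueError`, which is the limitation the following slides build on.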

Slide 7

Slide 7 text

This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.
Entity labels: COND (3 entities)

Slide 8

Slide 8 text

This is great for joint pain but it also caused headaches

Text Classification: "This is great for joint pain"
Text Classification: "it also caused headaches"

More on this: https://explosion.ai/blog/healthsea

Slide 9

Slide 9 text

This drug helped my joint pain and increased my ability to be active. However, now I am starting to feel dizzy and get headaches.
Span labels: COND (3 spans), BENEFIT (2 spans), ADE (2 spans)
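A rough sketch of why these labels need spans rather than token tags: if spans are stored as (start, end, label) offsets, nested spans are easy to represent and to detect. The offsets below are an assumed annotation for illustration only; the slide does not give exact span boundaries.

```python
from itertools import combinations

tokens = (
    "This drug helped my joint pain and increased my ability "
    "to be active ."
).split()

# Hypothetical annotation (token offsets, end exclusive); not taken from the slide.
spans = [
    (4, 6, "COND"),      # joint pain
    (2, 6, "BENEFIT"),   # helped my joint pain
    (7, 13, "BENEFIT"),  # increased my ability to be active
]


def overlapping_pairs(spans):
    """Return every pair of spans that shares at least one token."""
    pairs = []
    for a, b in combinations(spans, 2):
        if a[0] < b[1] and b[0] < a[1]:
            pairs.append((a, b))
    return pairs


overlaps = overlapping_pairs(spans)
for a, b in overlaps:
    print(" ".join(tokens[a[0]:a[1]]), "<->", " ".join(tokens[b[0]:b[1]]))
```

Under these assumed offsets, the COND span sits inside the first BENEFIT span, which a single BIO tag per token cannot express.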

Slide 10

Slide 10 text

Spancat (COMPONENT)
https://spacy.io/api/spancategorizer

Slide 11

Slide 11 text

Customizability without compromising the developer experience
- swappable components so you can customize to your specific use
- sensible defaults and easy to get started with a project
- transparent configuration and implementation

More on this: https://explosion.ai/blog/spacy-design-concepts

Slide 12

Slide 12 text

- trainable component
- different ways to get started
- part of the text processing pipeline

    # Construction via add_pipe with default model
    spancat = nlp.add_pipe("spancat")

    # Construction via add_pipe with custom model
    config = {"model": {"@architectures": "my_spancat"}}
    parser = nlp.add_pipe("spancat", config=config)

Pipeline: Text -> nlp (tokenizer -> tagger -> parser -> spancat -> ...) -> Doc

Slide 13

Slide 13 text

Generate a config: https://spacy.io/usage/training#quickstart
- the single source of truth, includes all settings and records all defaults
- customize the architecture by swapping out components
- preset with sensible defaults to get you started

config.cfg

    [components.spancat.suggester]
    @misc = "spacy.ngram_suggester.v1"
    sizes = [1,2,3]

config.cfg

    [components.spancat.suggester]
    @misc = "custom_suggester.v1"
    max_output = 10

Slide 14

Slide 14 text

displaCy
https://spacy.io/usage/visualizers

    import spacy
    from spacy import displacy
    from spacy.tokens import Span

    text = "Welcome to the Bank of China."

    nlp = spacy.blank("en")
    doc = nlp(text)

    doc.spans["sc"] = [
        Span(doc, 3, 6, "ORG"),
        Span(doc, 5, 6, "GPE"),
    ]

    displacy.serve(doc, style="span")

Slide 15

Slide 15 text

This has helped my joint pain.
Span label: COND (joint pain)

Suggester (n-gram, size 2): This has | has helped | helped my | my joint | joint pain

Classifier (label: condition)

    Span         Label   Score
    This has     COND    0.1
    has helped   COND    0.1
    helped my    COND    0.1
    my joint     COND    0.25
    joint pain   COND    0.99
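The last step implied by this slide, keeping only the candidates the classifier is confident about, can be sketched as a simple cutoff. The scores are the ones shown on the slide; `filter_spans` is a hypothetical helper, and the 0.5 default is an assumption chosen to mirror spancat's `threshold` setting.

```python
# (span text, label, score) triples as shown on the slide.
scored = [
    ("This has", "COND", 0.10),
    ("has helped", "COND", 0.10),
    ("helped my", "COND", 0.10),
    ("my joint", "COND", 0.25),
    ("joint pain", "COND", 0.99),
]


def filter_spans(scored, threshold=0.5):
    """Keep candidates whose score reaches the threshold."""
    return [(text, label, score) for text, label, score in scored if score >= threshold]


kept = filter_spans(scored)
kept_loose = filter_spans(scored, threshold=0.2)
print(kept)
print(kept_loose)
```

Lowering the cutoff lets more candidates through, which trades precision for recall.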

Slide 16

Slide 16 text

Suggested span: joint pain
Tok2vec: joint -> [3,2], pain -> [1,8]

Slide 17

Slide 17 text

Suggested span: joint pain
Tok2vec: joint -> [3,2], pain -> [1,8]
Pooling: First [3,2], Last [1,8], Mean [2,5], Max [3,8]
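The four pooled vectors on this slide follow directly from the two token vectors. A plain-Python sketch (the `pool_span` helper is hypothetical, and a real model would do this with array operations):

```python
def pool_span(vectors):
    """Reduce a span's token vectors to fixed-size summaries."""
    columns = list(zip(*vectors))  # group values by dimension
    return {
        "first": list(vectors[0]),
        "last": list(vectors[-1]),
        "mean": [sum(col) / len(col) for col in columns],
        "max": [max(col) for col in columns],
    }


# Token vectors for "joint" and "pain" as shown on the slide.
pooled = pool_span([[3, 2], [1, 8]])
for name, vector in pooled.items():
    print(name, vector)
```

Pooling gives every candidate span a representation of the same size, whatever its length in tokens.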

Slide 18

Slide 18 text

Suggested span: joint pain
Tok2vec: joint -> [3,2], pain -> [1,8]
Pooling: First [3,2], Last [1,8], Mean [2,5], Max [3,8]
Scoring: COND: 0.99, EFFECT: 0.06
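The two scores on this slide add up to more than 1, which is consistent with each label being scored independently. The sketch below assumes a per-label sigmoid; the logit values are invented for illustration and chosen only so that the results round to the numbers on the slide.

```python
import math


def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical raw scores for the span "joint pain"; not taken from the slide.
logits = {"COND": 4.6, "EFFECT": -2.75}

# Each label is scored on its own, so the probabilities need not sum to 1.
scores = {label: sigmoid(value) for label, value in logits.items()}
print({label: round(score, 2) for label, score in scores.items()})
```

Because labels are scored independently under this assumption, one span can receive several labels, or none.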

Slide 19

Slide 19 text

Swappable suggester functions
- Subtree suggester: syntactic dependencies
- Chunk suggester: noun chunk iterator
- Sentence suggester: full sentences

Learn more: github.com/explosion/spacy-experimental#span-finder

Slide 20

Slide 20 text

Swappable suggester functions
- Subtree suggester: syntactic dependencies
- Chunk suggester: noun chunk iterator
- Sentence suggester: full sentences
- n-gram suggester: a certain number of tokens

Learn more: github.com/explosion/spacy-experimental#span-finder

Slide 21

Slide 21 text

Swappable suggester functions
- Subtree suggester: syntactic dependencies
- Chunk suggester: noun chunk iterator
- Sentence suggester: full sentences
- n-gram suggester: a certain number of tokens
- SpanFinder: machine learning approach, learns start and end tokens

Learn more: github.com/explosion/spacy-experimental#span-finder

Slide 22

Slide 22 text

    import spacy
    from spacy.util import registry

    nlp = spacy.blank("en")
    doc = nlp("Sam Bankman-Fried, CEO of FTX.")
    # The suggester that takes a list of sizes is registered as "spacy.ngram_suggester.v1"
    build_suggester = registry.misc.get("spacy.ngram_suggester.v1")
    suggester = build_suggester(sizes=[1, 2, 3])
    spans = [doc[start:end] for (start, end) in suggester([doc]).data]

Output as shown on the slide:

    [Sam, Bankman-Fried, CEO, of, FTX,
     Sam Bankman-Fried, Bankman-Fried CEO, CEO of, of FTX,
     Sam Bankman-Fried CEO, Bankman-Fried CEO of, CEO of FTX]
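What the n-gram suggester on this slide produces can be approximated in plain Python. This is a sketch of the idea, not spaCy's implementation: `ngram_offsets` is a hypothetical helper, and the token list assumes the slide's simplified tokenization without punctuation.

```python
def ngram_offsets(n_tokens, sizes):
    """Return (start, end) offsets for every n-gram of the given sizes."""
    offsets = []
    for size in sizes:
        for start in range(n_tokens - size + 1):
            offsets.append((start, start + size))
    return offsets


# Simplified tokens as listed on the slide (an assumption).
tokens = ["Sam", "Bankman-Fried", "CEO", "of", "FTX"]

candidates = [
    " ".join(tokens[start:end])
    for start, end in ngram_offsets(len(tokens), sizes=[1, 2, 3])
]
print(candidates)
```

Five tokens with sizes 1 to 3 give 5 + 4 + 3 = 12 candidates, matching the count on the slide.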

Slide 23

Slide 23 text

Explicit control of candidate spans
- via suggester function
- bias your model towards precision or recall

Slide 24

Slide 24 text

Explicit control of candidate spans
- via suggester function
- bias your model towards precision or recall

Access to confidence scores
- label probabilities over the whole span
- includes the full context of the span

Slide 25

Slide 25 text

Explicit control of candidate spans
- via suggester function
- bias your model towards precision or recall

Access to confidence scores
- label probabilities over the whole span
- includes the full context of the span

Less edge-sensitivity
- doesn't predict single token-based tags
- more useful for other types of phrases or overlapping spans
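One way to read "bias your model towards precision or recall" is that the suggester caps what the classifier can ever find: a gold span that is never proposed can never be predicted. The sketch below measures that ceiling for two n-gram settings. The helper names and gold offsets are hypothetical, picked only to illustrate the trade-off.

```python
def ngram_offsets(n_tokens, sizes):
    """Return (start, end) offsets for every n-gram of the given sizes."""
    return [
        (start, start + size)
        for size in sizes
        for start in range(n_tokens - size + 1)
    ]


def candidate_coverage(gold, candidates):
    """Fraction of gold spans that appear among the suggested candidates."""
    proposed = set(candidates)
    return sum(1 for span in gold if span in proposed) / len(gold)


n_tokens = 14
gold = [(4, 6), (2, 6), (7, 13)]  # hypothetical gold spans of length 2, 4 and 6

narrow = ngram_offsets(n_tokens, sizes=[1, 2, 3])
wide = ngram_offsets(n_tokens, sizes=[1, 2, 3, 4, 5, 6])

coverage_narrow = candidate_coverage(gold, narrow)
coverage_wide = candidate_coverage(gold, wide)
print(len(narrow), coverage_narrow)
print(len(wide), coverage_wide)
```

The wider setting proposes every gold span but also hands the classifier many more candidates to reject, which is the lever for trading recall against precision.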

Slide 26

Slide 26 text

youtube.com/ExplosionAI
- Annotate data for spancat
- Speed up process with patterns and training temporary models
- Tips and tricks for consistent annotation
- ... and even more

explosion.ai/blog/spancat
- SpanCategorizer vs Named Entity Recognizer
- How spancat works
- Debug spancat datasets
- Visualize spancat with displaCy
- ... and more

github.com/explosion/projects
- Annotate data for spancat
- See an application of spancat
- Template for your project
- ... and more

Slide 27

Slide 27 text

Thank you for listening!
- victoriaslocum.com
- twitter.com/victorialslocum
- linkedin.com/in/victorialslocum
- victoria@explosion.ai