2021-02-01-rel_video

Creating a custom trainable component with spaCy: an example use-case for Relation Extraction

Slides to support this tutorial video: https://www.youtube.com/watch?v=8HL-Ap5_Axo

Sofie Van Landeghem

February 01, 2021

Transcript

  1. spaCy v3: a custom trainable component for relation extraction (between named entities)
    Sofie Van Landeghem
  2. spaCy v3 (spacy.io): models written in any framework; multi-task learning with transformers like BERT; production-ready training system, model packaging & workflow management.
  3. Same list as slide 2, adding: fully custom trainable pipeline components.
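  4. A document (text) becomes a Doc that passes through the pipeline components ner and rel; the rel component runs a machine learning model that outputs a predictions matrix.
    Step #1: implement the model. Step #2: implement the pipeline component.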
  5. As slide 4, plus step #3: enhance accuracy with a transformer.
  6. Document: "GATA3 inhibits FOXP3 expression"
    Tokens + NER: [GATA3, inhibits, FOXP3, expression]
    Token vectors (one row per token):
    [[-0.42,  1.93, -1.08,  0.28, -0.71]
     [ 3.84,  2.59, -0.14, -3.77, -0.66]
     [ 3.35, -1.51,  1.23, -0.88, -2.19]
     [ 3.77, -2.17, -0.48, -1.73,  1.10]]
  7. As slide 6, plus the two candidate instances, each built by concatenating the token vectors of its two entities:
    Instance 1, GATA3 -> FOXP3: [-0.42, 1.93, -1.08, 0.28, -0.71, 3.35, -1.51, 1.23, -0.88, -2.19]
    Instance 2, FOXP3 -> GATA3: [ 3.35, -1.51, 1.23, -0.88, -2.19, -0.42, 1.93, -1.08, 0.28, -0.71]
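A minimal sketch of the instance construction on slides 6 and 7, in plain Python so it runs without spaCy or Thinc. The helper name `make_instances` is hypothetical, and entities are assumed to be single tokens, as GATA3 and FOXP3 are here; multi-token entities need the pooling step that appears later in the deck. Every ordered pair of distinct entities yields one instance, so both directions are scored.

```python
from itertools import permutations
from typing import List, Tuple

Vector = List[float]


def make_instances(token_vectors: List[Vector], entity_indices: List[int]) -> List[Tuple[Tuple[int, int], Vector]]:
    """Hypothetical helper: one instance per ordered pair of distinct entity
    tokens, built by concatenating the head vector with the tail vector."""
    instances = []
    for head, tail in permutations(entity_indices, 2):
        instances.append(((head, tail), token_vectors[head] + token_vectors[tail]))
    return instances


# Token vectors from slide 6: GATA3, inhibits, FOXP3, expression
tokens = ["GATA3", "inhibits", "FOXP3", "expression"]
token_vectors = [
    [-0.42, 1.93, -1.08, 0.28, -0.71],
    [3.84, 2.59, -0.14, -3.77, -0.66],
    [3.35, -1.51, 1.23, -0.88, -2.19],
    [3.77, -2.17, -0.48, -1.73, 1.10],
]

# GATA3 (index 0) and FOXP3 (index 2) are the entities
for (head, tail), vector in make_instances(token_vectors, [0, 2]):
    print(f"{tokens[head]} -> {tokens[tail]}: {vector}")
```

The two printed vectors correspond to Instance 1 and Instance 2 on slide 7.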
  8. Tokens: [GATA3, inhibits, FOXP3, expression]
    Instance data (one row per instance):
    [[-0.42,  1.93, -1.08,  0.28, -0.71,  3.35, -1.51,  1.23, -0.88, -2.19]
     [ 3.35, -1.51,  1.23, -0.88, -2.19, -0.42,  1.93, -1.08,  0.28, -0.71]]
  9. The instance data from slide 8 is passed to a classification layer.
    Relation types: BINDING, ACTIVATION, INHIBITION
  10. As slide 9, plus the predictions matrix (one row per instance; columns BINDING, ACTIVATION, INHIBITION):
    [[0.09, 0.14, 0.93]
     [0.11, 0.15, 0.31]]
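A sketch of what the classification layer could compute, assuming (the slides do not show this) a dense layer followed by an element-wise sigmoid. A sigmoid fits the slide's numbers: the first row, 0.09 + 0.14 + 0.93 = 1.16, does not sum to 1, so each relation type appears to get its own independent probability rather than a softmax share. The weights below are random and untrained, so the output will not reproduce the slide's values; `classification_layer` is a hypothetical name.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def classification_layer(instances: Matrix, weights: Matrix, bias: List[float]) -> Matrix:
    """Hypothetical dense layer + element-wise sigmoid: one independent
    probability per relation type (multi-label), not a softmax over types."""
    predictions = []
    for x in instances:
        row = []
        for w_label, b in zip(weights, bias):
            score = sum(w * xi for w, xi in zip(w_label, x)) + b
            row.append(1.0 / (1.0 + math.exp(-score)))  # sigmoid squashes into (0, 1)
        predictions.append(row)
    return predictions


labels = ["BINDING", "ACTIVATION", "INHIBITION"]
# Instance data from slide 8 (one 10-dimensional row per instance)
instance_data = [
    [-0.42, 1.93, -1.08, 0.28, -0.71, 3.35, -1.51, 1.23, -0.88, -2.19],
    [3.35, -1.51, 1.23, -0.88, -2.19, -0.42, 1.93, -1.08, 0.28, -0.71],
]
# Untrained, randomly initialised parameters: one weight row and one bias per label
rng = random.Random(0)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(10)] for _ in labels]
bias = [0.0 for _ in labels]

predictions = classification_layer(instance_data, weights, bias)
for row in predictions:
    print(dict(zip(labels, (round(p, 2) for p in row))))
```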
  11. As slide 10, with the predictions converted into a decision per relation type:
    Instance 1, GATA3 -> FOXP3: BINDING: False, ACTIVATION: False, INHIBITION: True
    Instance 2, FOXP3 -> GATA3: BINDING: False, ACTIVATION: False, INHIBITION: False
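Slide 11 turns each probability into a yes/no decision. A minimal sketch, assuming a cut-off of 0.5: the slides do not state the threshold, but 0.5 is consistent with 0.93 becoming True and 0.31 becoming False. `decide` is a hypothetical helper.

```python
from typing import Dict, List


def decide(predictions: List[List[float]], labels: List[str], threshold: float = 0.5) -> List[Dict[str, bool]]:
    """Mark a relation type as present when its probability reaches the threshold."""
    return [
        {label: score >= threshold for label, score in zip(labels, row)}
        for row in predictions
    ]


labels = ["BINDING", "ACTIVATION", "INHIBITION"]
predictions = [[0.09, 0.14, 0.93], [0.11, 0.15, 0.31]]  # predictions matrix from slide 10

for name, decision in zip(["GATA3 -> FOXP3", "FOXP3 -> GATA3"], decide(predictions, labels)):
    print(name, decision)
```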
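  12. create_instance_tensor, with the type of each intermediate result:
    Documents (List[Doc]) -> tok2vec -> Token vectors (List[Floats2d])
    Documents (List[Doc]) -> get_instances -> Candidate instances (List[Tuple[Span, Span]])
    Token vectors + candidate instances -> pooling -> Entity vectors (List[Floats2d])
    Entity vectors -> Instance tensor (Floats2d)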
  13. As slide 12, plus the final step:
    Instance tensor (Floats2d) -> classification layer -> Predictions matrix (Floats2d)
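`get_instances` on slides 12 and 13 returns candidate entity pairs for each document. Below is a sketch of one plausible strategy, assuming it takes all ordered pairs of distinct entities whose start tokens lie within a hypothetical `max_length` window. `EntSpan` is a stand-in for spaCy's `Span`, so the example runs without spaCy.

```python
from itertools import permutations
from typing import List, NamedTuple, Tuple


class EntSpan(NamedTuple):
    """Hypothetical stand-in for spaCy's Span: token offsets [start, end) and text."""
    start: int
    end: int
    text: str


def get_instances(ents: List[EntSpan], max_length: int = 20) -> List[Tuple[EntSpan, EntSpan]]:
    """All ordered pairs of distinct entities whose start tokens are at most max_length apart."""
    return [
        (head, tail)
        for head, tail in permutations(ents, 2)
        if abs(head.start - tail.start) <= max_length
    ]


# Entities of "GATA3 inhibits FOXP3 expression"
ents = [EntSpan(0, 1, "GATA3"), EntSpan(2, 3, "FOXP3")]
for head, tail in get_instances(ents):
    print(f"{head.text} -> {tail.text}")
```

The window keeps the number of candidates manageable: without it, n entities always yield n·(n-1) pairs, including pairs that are unlikely to be related because they are far apart.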
  14. Document: "TGF-beta signalling induces Id2"
    Tokens + NER: [TGF, -, beta, signalling, induces, Id2]
    Token vectors (one row per token):
    [[ 1.22, -3.12, -0.19,  0.51, -0.46]
     [-1.71,  0.92, -0.67,  0.86,  2.70]
     [-1.32,  2.52, -0.26, -0.86, -1.74]
     [-1.27,  2.21, -0.75,  1.07, -0.48]
     [-1.03,  0.94,  1.64, -0.05, -0.98]
     [-0.81,  0.72, -0.52,  0.67, -0.16]]
  15. As slide 14, plus the token vectors of the entities, stored as a Ragged array (the entity TGF-beta spans three tokens, Id2 one):
    Lengths: [3, 1, 1, 3]
    Data:
    [[ 1.22, -3.12, -0.19,  0.51, -0.46]
     [-1.71,  0.92, -0.67,  0.86,  2.70]
     [-1.32,  2.52, -0.26, -0.86, -1.74]
     [-0.81,  0.72, -0.52,  0.67, -0.16]
     [-0.81,  0.72, -0.52,  0.67, -0.16]
     [ 1.22, -3.12, -0.19,  0.51, -0.46]
     [-1.71,  0.92, -0.67,  0.86,  2.70]
     [-1.32,  2.52, -0.26, -0.86, -1.74]]
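Multi-token entities such as TGF-beta contribute a variable number of rows, so slide 15 stores them in a Ragged array: one flat data matrix plus a lengths vector recording how many rows belong to each entity. Thinc's `Ragged` type pairs exactly these two arrays. The sketch below builds them with plain lists instead, and the helper name and `(start, end)` token offsets are assumptions for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Offsets = Tuple[int, int]  # token offsets [start, end) of one entity


def gather_entity_rows(
    token_vectors: List[Vector], instances: List[Tuple[Offsets, Offsets]]
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: flatten the entity token vectors of all
    instances (head first, then tail) into Ragged-style (data, lengths)."""
    data: List[Vector] = []
    lengths: List[int] = []
    for head, tail in instances:
        for start, end in (head, tail):
            data.extend(token_vectors[start:end])
            lengths.append(end - start)
    return data, lengths


# Token vectors from slide 14: TGF, -, beta, signalling, induces, Id2
token_vectors = [
    [1.22, -3.12, -0.19, 0.51, -0.46],
    [-1.71, 0.92, -0.67, 0.86, 2.70],
    [-1.32, 2.52, -0.26, -0.86, -1.74],
    [-1.27, 2.21, -0.75, 1.07, -0.48],
    [-1.03, 0.94, 1.64, -0.05, -0.98],
    [-0.81, 0.72, -0.52, 0.67, -0.16],
]
tgf_beta, id2 = (0, 3), (5, 6)
data, lengths = gather_entity_rows(token_vectors, [(tgf_beta, id2), (id2, tgf_beta)])
print(lengths)
```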
  16. As slide 15, with the Ragged segments grouped by instance: Instance 1 (TGF-beta -> Id2, lengths 3 and 1) and Instance 2 (Id2 -> TGF-beta, lengths 1 and 3).
  17. Tokens: [TGF, -, beta, signalling, induces, Id2], together with the entity Ragged array from slide 15 (lengths [3, 1, 1, 3], same data).
  18. As slide 17, plus the pooled entities (Floats2d). Each Ragged segment is averaged into one row, and the rows are grouped as Instance 1 and Instance 2:
    [[-0.60,  0.11, -0.37,  0.17,  0.17]   TGF-beta
     [-0.81,  0.72, -0.52,  0.67, -0.16]   Id2
     [-0.81,  0.72, -0.52,  0.67, -0.16]   Id2
     [-0.60,  0.11, -0.37,  0.17,  0.17]]  TGF-beta
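The pooled values on slide 18 are the mean of each Ragged segment. For example, the first column of TGF-beta is (1.22 - 1.71 - 1.32) / 3 ≈ -0.60. Thinc ships a `reduce_mean` layer that works on Ragged input. The pure-Python `mean_pool` below is only a hypothetical sketch of the same computation.

```python
from typing import List


def mean_pool(data: List[List[float]], lengths: List[int]) -> List[List[float]]:
    """Average each consecutive segment of `data` (sizes given by `lengths`) into one row."""
    pooled, offset = [], 0
    for length in lengths:
        segment = data[offset:offset + length]
        pooled.append([sum(column) / length for column in zip(*segment)])
        offset += length
    return pooled


# Ragged data and lengths from slide 15
tgf = [1.22, -3.12, -0.19, 0.51, -0.46]
dash = [-1.71, 0.92, -0.67, 0.86, 2.70]
beta = [-1.32, 2.52, -0.26, -0.86, -1.74]
id2 = [-0.81, 0.72, -0.52, 0.67, -0.16]
data = [tgf, dash, beta, id2, id2, tgf, dash, beta]
lengths = [3, 1, 1, 3]

for row in mean_pool(data, lengths):
    print([round(v, 2) for v in row])
```

Mean pooling gives each entity a vector of fixed size, whatever its token count, which is what allows instances to be concatenated into a fixed-width tensor in the next step.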
  19. As slide 18, plus the instance tensor (Floats2d), which concatenates the two pooled entity vectors of each instance:
    [[-0.60,  0.11, -0.37,  0.17,  0.17, -0.81,  0.72, -0.52,  0.67, -0.16]   Instance 1
     [-0.81,  0.72, -0.52,  0.67, -0.16, -0.60,  0.11, -0.37,  0.17,  0.17]]  Instance 2
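Slide 19 concatenates the pooled vectors pairwise: rows 1 and 2 form Instance 1, and rows 3 and 4 form Instance 2. The sketch below assumes, as on the slide, that pooled rows come in head/tail order. The function here is a hypothetical plain-Python stand-in, not the model of the same name from slide 12.

```python
from typing import List


def create_instance_tensor(pooled: List[List[float]]) -> List[List[float]]:
    """Concatenate consecutive (head, tail) pooled entity rows into one instance row."""
    if len(pooled) % 2 != 0:
        raise ValueError("expected an even number of pooled rows (head, tail pairs)")
    return [pooled[i] + pooled[i + 1] for i in range(0, len(pooled), 2)]


# Pooled entities from slide 18
tgf_beta = [-0.60, 0.11, -0.37, 0.17, 0.17]
id2 = [-0.81, 0.72, -0.52, 0.67, -0.16]

instance_tensor = create_instance_tensor([tgf_beta, id2, id2, tgf_beta])
for row in instance_tensor:
    print(row)
```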
  20. Generate a starter config at spacy.io/usage/training: select the components to train and optimize the model settings for accuracy or efficiency.
  21. config.cfg: a structured section defines each component, naming the factory function used to create the component and the registered function that creates its model architecture.
  22. As slide 21, plus the function arguments.
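Here is a sketch of the config layout that slides 21 and 22 describe. All component, architecture and registry names (`relation_extractor`, `rel_model.v1`, `rel_instance_tensor.v1`, `rel_instance_generator.v1`, `rel_classification_layer.v1`) and all argument values are hypothetical placeholders. In spaCy, `factory` refers to a function registered with `@Language.factory`, and `@architectures` refers to one registered with `@spacy.registry.architectures`. spaCy reads the file with its own config system, so the stdlib `configparser` used here only demonstrates the nested section structure.

```python
import configparser

# Hypothetical config fragment: one component section, its model section,
# and nested sections holding the arguments of each registered function.
CONFIG = """
[components.relation_extractor]
factory = "relation_extractor"
threshold = 0.5

[components.relation_extractor.model]
@architectures = "rel_model.v1"

[components.relation_extractor.model.create_instance_tensor]
@architectures = "rel_instance_tensor.v1"

[components.relation_extractor.model.create_instance_tensor.get_instances]
@misc = "rel_instance_generator.v1"
max_length = 20

[components.relation_extractor.model.classification_layer]
@architectures = "rel_classification_layer.v1"
"""

# interpolation=None: treat values literally (no %-style substitution)
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG)
for section in parser.sections():
    print(section, dict(parser[section]))
```

Each dotted section name nests its arguments under the function that consumes them. For example, `max_length` belongs to the `get_instances` function, which is itself an argument of `create_instance_tensor`.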