2023-06-15-odsc

Sofie Van Landeghem
June 21, 2023

spaCy: A customizable NLP toolkit designed for developers

Presentation given by Sofie Van Landeghem at ODSC 2023

Transcript

  1. Sofie Van Landeghem, PhD.
    Core maintainer of spaCy
    Open Source Team Lead @ Explosion
    ODSC, London, June 2023
    spaCy:
    A customizable NLP toolkit
    designed for developers

  2. Natural Language Processing
    ... There were 26 complete responses (16%) and 0 partial responses (0%) …
    … The median progression-free survival time was 65 months ...
    CR (%)   PR (%)   PFS (months)
    16       0        65
    Sofie Van Landeghem, ODSC 2023
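
The text-to-table step on this slide can be imitated for this one example with two regular expressions. This is only a sketch: the patterns and the key names (`cr_pct`, `pr_pct`, `pfs_months`) are assumptions made here, the two quoted fragments are joined without the parts elided on the slide, and a real pipeline would rely on trained components rather than hand-written patterns.

```python
import re

# The two fragments quoted on the slide (elided parts left out).
text = (
    "There were 26 complete responses (16%) and 0 partial responses (0%). "
    "The median progression-free survival time was 65 months."
)

# Hypothetical patterns, written for this one example only.
RESPONSE = re.compile(r"(\d+) (complete|partial) responses \((\d+)%\)")
PFS = re.compile(r"median progression-free survival time was (\d+) months")

row = {}
for _count, kind, pct in RESPONSE.findall(text):
    key = "cr_pct" if kind == "complete" else "pr_pct"
    row[key] = int(pct)

pfs_match = PFS.search(text)
if pfs_match:
    row["pfs_months"] = int(pfs_match.group(1))

print(row)
```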

  3. Outline
    Part I
    spaCy:
    A toolkit for Natural Language Processing,
    designed for developers
    Part II
    spacy-llm:
    Integrating Large Language Models
    into structured NLP pipelines

  4. spaCy
    ➢ Free, open-source library
    ➢ Designed for production use
    ➢ Focus on developer productivity
    ➢ Free course: https://course.spacy.io
    https://github.com/explosion/spaCy

  5. spaCy pipelines

    A modular, pipeline approach for linguistic analysis

    Transforming unstructured text into structured data objects like spaCy’s Doc

  6. spaCy: Empower developers
    Business case
    ➢ Understand business application
    ➢ Analyse downstream needs of NLP solution
    Quick prototype
    ➢ Pre-trained models
    ➢ Built-in functionality
    ➢ Reasonable default settings
    Customize solution
    ➢ Implement custom models & algorithms
    ➢ Powerful configuration system
    ➢ Fine-tune iteratively

  7. Pre-trained models
    $ python -m spacy download en_core_web_trf
    https://spacy.io/models

    import spacy
    from spacy import displacy

    # text: the string you want to analyse
    nlp = spacy.load("en_core_web_trf")
    doc = nlp(text)
    for ent in doc.ents:
        print(ent.text, ent.label_)
    displacy.serve(doc, style="ent")

  8. Use-case: clinical trial results
    Hemodynamic Effects of Phenylephrine, Vasopressin, and Epinephrine in Children With Pulmonary Hypertension: A Pilot Study
    Abstract
    Objectives: During a pulmonary hypertensive crisis, the marked increase in pulmonary vascular resistance can result in acute right ventricular failure and death. Currently, there
    are no therapeutic guidelines for managing an acute crisis. This pilot study examined the hemodynamic effects of phenylephrine, arginine vasopressin, and epinephrine in
    pediatric patients with pulmonary hypertension.
    Design: In this prospective, open-label, nonrandomized pilot study, we enrolled pediatric patients previously diagnosed with pulmonary hypertensive who were scheduled
    electively for cardiac catheterization. Primary outcome was a change in the ratio of pulmonary-to-systemic vascular resistance. Baseline hemodynamic data were collected before
    and after the study drug was administered.
    Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean pulmonary artery
    pressure was 49 ± 19 mm Hg, and mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary hypertensive varied, and all were on systemic
    pulmonary hypertensive medications.
    Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03 U/kg; and patients 11-15 received epinephrine 1 μg/kg.
    Hemodynamics was measured continuously for up to 10 minutes following study drug administration.
    Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving
    phenylephrine, five of five patients receiving arginine vasopressin, and three of five patients receiving epinephrine. Although all three medications resulted in an increase in
    aortic pressure, only arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary artery-to-aortic pressure.
    Conclusions: This prospective pilot study of phenylephrine, arginine vasopressin, and epinephrine in pediatric patients with pulmonary hypertensive showed an increase in aortic
    pressure with all drugs although only vasopressin resulted in a consistent decrease in the ratio of pulmonary-to-systemic vascular resistance. Studies with more subjects are
    warranted to define optimal dosing strategies of these medications in an acute pulmonary hypertensive crisis.
    Stephanie L Siehr, Jeffrey A Feinstein, Weiguang Yang, Lynn F Peng, Michelle T Ogawa, Chandra Ramamoorthy. Pediatr Crit Care Med (2016)
    PMID: 27144689

  9. Goal: Identify treatments and outcomes
    Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median
    weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean pulmonary artery pressure was 49 ± 19 mm Hg, and
    mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary hypertensive
    varied, and all were on systemic pulmonary hypertensive medications.
    Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03
    U/kg; and patients 11-15 received epinephrine 1 μg/kg. Hemodynamics was measured continuously for up to
    10 minutes following study drug administration.
    Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular
    resistance decreased in three of five patients receiving phenylephrine, five of five patients receiving arginine
    vasopressin, and three of five patients receiving epinephrine. Although all three medications resulted in an
    increase in aortic pressure, only arginine vasopressin consistently resulted in a decrease in the ratio of systolic
    pulmonary artery-to-aortic pressure.
    → The pre-trained English models are not tailored to biomedical texts,
    so there is no out-of-the-box functionality for patient groups, drugs, etc.

  10. Dependency parsing
    import spacy
    from spacy import displacy

    # sentence: the string you want to parse
    nlp = spacy.load("en_core_web_trf")
    doc = nlp(sentence)
    options = {"collapse_punct": False}
    displacy.serve(doc, style="dep", options=options)
    → How can we leverage this information?

  11. spaCy’s Matcher
    from spacy import util
    from spacy.matcher import Matcher

    # nlp and doc: the pipeline and parsed Doc from the previous slides
    matcher = Matcher(nlp.vocab)
    pattern = [
        {"LOWER": "patients"},
        {"POS": {"IN": ["SYM", "NUM", "PUNCT"]}, "OP": "+"},
        {"LOWER": "received"},
        {"POS": {"IN": ["ADJ", "NOUN", "NUM", "ADP", "SYM"]}, "OP": "+"},
    ]
    matcher.add("TreatmentGroup", [pattern])
    matches = matcher(doc, as_spans=True)
    # Keep only the longest of any overlapping matches
    matches = util.filter_spans(matches)
    for span in matches:
        print(span.text)
    > Patients 1-5 received phenylephrine 1 μg/kg
    > patients 6-10 received arginine vasopressin 0.03 U/kg
    > patients 11-15 received epinephrine 1 μg/kg
    → Matcher rules help you explore the data & get a hands-on feel of the complexity of the task
    → They can help bootstrap annotation through weak labeling
    https://spacy.io/usage/rule-based-matching

  12. Training supervised models

    Typically, you want to train a supervised model
    tailored to your domain & business case

    In this case, we probably need (at least):

    ➢ NER & spancat: identify patient groups, drugs, doses, frequencies, outcomes, …

    ➢ Relation extraction: find the correct relations between patient groups, treatments
    and outcomes.
    https://spacy.io/usage/training

  13. Annotate training data

  14. Config file: capture all training settings
    $ python -m spacy init config my_config.cfg --lang en --pipeline ner,spancat
    [nlp]
    lang = "en"
    pipeline = ["tok2vec","ner","spancat"]
    batch_size = 1000

    [training]
    seed = 342
    dropout = 0.1
    max_steps = 20000
    ...
    [components.spancat]
    factory = "spancat"
    spans_key = "sc"

    [components.spancat.model]
    @architectures = "spacy.SpanCategorizer.v1"

    [components.ner]
    factory = "ner"
    ...
    → A config file allows for serializability & reproducibility of your NLP pipelines
    → spaCy has built-in architectures for NER, spancat, textcat, tagger, dependency parser, …
    → You can also implement and register your own models and components!
    https://github.com/explosion/projects/tree/v3/tutorials/rel_component
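
To peek at the settings captured in such a file without loading spaCy, a rough stand-in is Python's standard `configparser`, decoding every value as JSON. This is an illustration only, assuming that all values are JSON literals as in the excerpt on this slide; spaCy's own config loader additionally resolves registered functions such as `@architectures`, which this sketch does not attempt.

```python
import configparser
import json

# Excerpt of the config shown on the slide (elided parts left out).
CONFIG_TEXT = """
[nlp]
lang = "en"
pipeline = ["tok2vec","ner","spancat"]
batch_size = 1000

[training]
seed = 342
dropout = 0.1
max_steps = 20000

[components.spancat]
factory = "spancat"
spans_key = "sc"
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Decode each raw string value as JSON to recover lists, numbers and strings.
settings = {
    section: {key: json.loads(value) for key, value in parser[section].items()}
    for section in parser.sections()
}

print(settings["nlp"]["pipeline"])
print(settings["training"])
```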

  15. Training a supervised model
    $ python -m spacy train my_config.cfg --output ./my_output
    E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
    ---  ------  ------------  --------  ------  ------  ------  ------
      0       0          0.00     23.79    0.00    0.00    0.00    0.00
      6     200        105.40   2586.38   37.21   57.14   27.59    0.37
     14     400        255.98    360.81   40.00   47.62   34.48    0.40
     23     600         60.01     47.55   34.04   44.44   27.59    0.34
     33     800         35.52     20.49   40.00   47.62   34.48    0.40
     45    1000         89.50     36.39   32.00   38.10   27.59    0.32
     59    1200         47.41     22.91   43.90   75.00   31.03    0.44
    ...
    Saves the best & the last trained model to the specified output directory.
    You can load either one as an ‘nlp’ object to use for inference / further fine-tuning.

    import spacy

    nlp = spacy.load("my_output/model-best")
    doc = nlp(text)
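
The ENTS_F column in the training log can be checked against ENTS_P and ENTS_R: it is the F-score, i.e. the harmonic mean of precision and recall. A small sketch that recomputes three rows of the log (the function name `f_score` is chosen here for illustration):

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (ENTS_P, ENTS_R) pairs copied from steps 200, 400 and 1200 of the log.
rows = [(57.14, 27.59), (47.62, 34.48), (75.00, 31.03)]
for p, r in rows:
    print(f"P={p:.2f} R={r:.2f} -> F={f_score(p, r):.2f}")
```

Low recall pulls the F-score down even when precision is high, which is why the step with 75.00 precision still only reaches an F-score in the low forties.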

  16. Integrating Large Language Models
    into structured NLP pipelines
    Part II: spacy-llm

  17. ChatGPT (OpenAI)
    Summarize the trial results in
    a structured fashion like so:
    Patient group:
    Number of patients in the group:
    Treatment drug or substance:
    Treatment dose:
    Treatment frequency of administration:
    Treatment duration:
    Outcome:

    Input text
    Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10
    received arginine vasopressin 0.03 U/kg; and patients 11-15
    received epinephrine 1 μg/kg
    (...)
    After study drug administration, the ratio of pulmonary-to-systemic
    vascular resistance decreased in three of five patients
    receiving phenylephrine, five of five patients receiving arginine
    vasopressin, and three of five patients receiving epinephrine.

    LLM output (ChatGPT)
    Patient group: Phenylephrine Group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three
    of five patients receiving phenylephrine. Increase in aortic pressure observed.
    Patient group: Arginine Vasopressin Group
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five
    patients receiving arginine vasopressin. Increase in aortic pressure observed.
    Consistent decrease in the ratio of systolic pulmonary artery-to-aortic pressure
    observed.
    Patient group: Epinephrine Group
    (...)

  18. ChatGPT in production?
    Free web interface
    No API
    No batching
    No robustness
    No data privacy
    No reproducibility
    Whole new meaning of “black box”

  19. Integrate LLMs into NLP solutions
    spacy-llm: integrate LLMs into production-ready,
    structured NLP pipelines

    Backends:

    External APIs, e.g. OpenAI, Cohere, Anthropic

    Open-source models, e.g. Dolly v2, OpenLLaMa, StableLM
    (via HuggingFace hub)

    Connect your favourite model by writing a custom backend!

    Tasks:

    Define prompt to send to the LLM

    Parse the LLM’s response and turn this into
    structured annotations on spaCy’s Doc objects

    Write a custom task definition for your specific use-case!
    https://github.com/explosion/spacy-llm
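
The split between a backend (sends prompts, returns raw text) and a task (builds prompts, parses responses) can be sketched without any dependency. Everything below is a toy under stated assumptions: `fake_backend`, `DrugTask` and `run` are hypothetical names, the backend makes no network call, and plain dicts stand in for the Doc objects that spacy-llm annotates.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical backend type: takes prompts, returns one raw response per prompt.
Backend = Callable[[Iterable[str]], Iterable[str]]


def fake_backend(prompts: Iterable[str]) -> Iterable[str]:
    """Pretend LLM that always gives the same answer; no network call."""
    for _ in prompts:
        yield "Drug: phenylephrine"


class DrugTask:
    """Toy task: one prompt per text, responses parsed as 'Drug: <name>' lines."""

    def generate_prompts(self, texts: Iterable[str]) -> Iterable[str]:
        for text in texts:
            yield f"List every drug in the text as 'Drug: <name>'.\n\n{text}"

    def parse_responses(
        self, texts: Iterable[str], responses: Iterable[str]
    ) -> Iterable[Dict[str, object]]:
        for text, response in zip(texts, responses):
            drugs = [
                line.split(":", 1)[1].strip()
                for line in response.splitlines()
                if line.startswith("Drug:")
            ]
            yield {"text": text, "drugs": drugs}


def run(task: DrugTask, backend: Backend, texts: List[str]) -> List[Dict[str, object]]:
    """Prompt -> backend -> parse, mirroring the task/backend split."""
    prompts = task.generate_prompts(texts)
    responses = backend(prompts)
    return list(task.parse_responses(texts, responses))


results = run(DrugTask(), fake_backend, ["Patients 1-5 received phenylephrine 1 μg/kg"])
print(results)
```

Because the task never talks to the model directly, the backend can be swapped without touching the prompt or the parsing logic.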

  20. spacy-llm: Empower developers
    Business case
    ➢ Understand business application
    ➢ Analyse downstream needs of NLP solution
    Quick prototype
    ➢ Built-in backends/models
    ➢ Built-in tasks
    ➢ Reasonable default settings
    Customize solution
    ➢ Implement custom tasks & backends
    ➢ Powerful configuration system
    ➢ Fine-tune iteratively

  21. Using built-in NER functionality
    my_config.cfg
    [nlp]
    lang = "en"
    pipeline = ["llm"]

    [components]

    [components.llm]
    factory = "llm"

    [components.llm.backend]
    @llm_backends = "spacy.REST.v1"
    api = "OpenAI"

    [components.llm.backend.config]
    model = "gpt-3.5-turbo"

    [components.llm.task]
    @llm_tasks = "spacy.NER.v2"
    labels = "Drug,Dose"

    Zero-shot results with spacy-llm:
    from spacy_llm.util import assemble

    # _read_trial and _MY_CONFIG_DIR are not defined on the slide
    text = _read_trial(pmid=27144689)
    nlp = assemble(_MY_CONFIG_DIR / "my_config.cfg")
    doc = nlp(text)

  22. Easily swap in other backends/tasks
    my_config.cfg
    [nlp]
    lang = "en"
    pipeline = ["llm"]

    [components]

    [components.llm]
    factory = "llm"

    [components.llm.backend]
    @llm_backends = "spacy.REST.v1"
    api = "OpenAI"

    [components.llm.backend.config]
    model = "gpt-3.5-turbo"

    [components.llm.task]
    @llm_tasks = "spacy.NER.v2"
    labels = "Drug,Dose"

    Alternative backend:
    [components.llm.backend]
    @llm_backends = "spacy.DollyHF.v1"
    model = "databricks/dolly-v2-12b"

    Alternative tasks:
    [components.llm.task]
    @llm_tasks = "spacy.NER.v2"
    labels = "Patient_group,Treatment"

    [components.llm.task]
    @llm_tasks = "spacy.TextCat.v2"
    labels = "Trial,Patent,News,Research"

  23. Writing a custom task
    from typing import Iterable

    from spacy.tokens import Doc
    from spacy_llm.registry import registry

    INSTRUCTION = """
    Summarize the trial results in a structured fashion like so:
    Patient group:
    Number of patients in the group:
    Treatment drug or substance:
    Treatment dose:
    Treatment frequency of administration:
    Treatment duration:
    Outcome:
    """

    class TrialSummaryTask:
        def generate_prompts(self, docs: Iterable[Doc]) -> Iterable[str]:
            # One prompt per document: fixed instruction followed by the abstract
            for doc in docs:
                prompt = "Below this instruction, I will provide you with a clinical trial abstract. "
                prompt += INSTRUCTION + doc.text
                yield prompt

        def parse_responses(self, docs: Iterable[Doc], responses: Iterable[str]) -> Iterable[Doc]:
            ...

    # Register the task so the config can refer to it by name
    @registry.llm_tasks("hedgy.TrialSummary.v1")
    def make_trial_task():
        return TrialSummaryTask()

  24. gpt-3.5-turbo (spacy-llm)
    Summarize the trial results in
    a structured fashion like so:
    Patient group:
    Number of patients in the group:
    Treatment drug or substance:
    Treatment dose:
    Treatment frequency of administration:
    Treatment duration:
    Outcome:

    [components.llm.task]
    @llm_tasks = "hedgy.TrialSummary.v1"

    Input text
    Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received
    arginine vasopressin 0.03 U/kg; and patients 11-15 received
    epinephrine 1 μg/kg
    (...)
    After study drug administration, the ratio of pulmonary-to-systemic
    vascular resistance decreased in three of five patients receiving
    phenylephrine, five of five patients receiving arginine vasopressin,
    and three of five patients receiving epinephrine. Although all three
    medications resulted in an increase in aortic pressure, only arginine
    vasopressin consistently resulted in a decrease in the ratio of
    systolic pulmonary artery-to-aortic pressure.

    LLM output (GPT 3.5)
    Patient group: Pediatric patients with pulmonary hypertension
    Number of patients in the group: 15
    Treatment drug or substance: Phenylephrine, arginine vasopressin, and
    epinephrine
    Treatment dose: Phenylephrine 1 μg/kg, arginine vasopressin 0.03
    U/kg, epinephrine 1 μg/kg
    Treatment frequency of administration: Administered once
    Treatment duration: Hemodynamic measurements taken continuously for up to
    10 minutes following drug administration
    Outcome: All three medications resulted in an increase in aortic pressure.
    Only arginine vasopressin consistently resulted in a decrease in the ratio of systolic
    pulmonary artery-to-aortic pressure. The ratio of pulmonary-to-systemic vascular
    resistance decreased in three out of five patients receiving phenylephrine, five out of
    five patients receiving arginine vasopressin, and three out of five patients receiving
    epinephrine. Studies with more patients are needed to define optimal dosing
    strategies for these medications in an acute pulmonary hypertensive crisis.

  25. Reminder: ChatGPT (browser)
    Summarize the trial results in
    a structured fashion like so:
    Patient group:
    Number of patients in the group:
    Treatment drug or substance:
    Treatment dose:
    Treatment frequency of administration:
    Treatment duration:
    Outcome:

    Input text
    Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10
    received arginine vasopressin 0.03 U/kg; and patients 11-15
    received epinephrine 1 μg/kg
    (...)
    After study drug administration, the ratio of pulmonary-to-systemic
    vascular resistance decreased in three of five patients
    receiving phenylephrine, five of five patients receiving arginine
    vasopressin, and three of five patients receiving epinephrine.

    LLM output (ChatGPT)
    Patient group: Phenylephrine Group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three
    of five patients receiving phenylephrine. Increase in aortic pressure observed.
    Patient group: Arginine Vasopressin Group
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five
    patients receiving arginine vasopressin. Increase in aortic pressure observed.
    Consistent decrease in the ratio of systolic pulmonary artery-to-aortic pressure
    observed.
    Patient group: Epinephrine Group
    (...)

  26. Prompt engineering
    Summarize the trial results in a structured fashion.
    First, identify all patient groups with distinct treatments.
    Then, for each patient group, write down the following:
    Patient group:
    Number of patients in the group:
    Treatment drug or substance:
    Treatment dose:
    Treatment frequency of administration:
    Treatment duration:
    Outcome:

    [components.llm.task]
    @llm_tasks = "hedgy.TrialSummary.v2"

    Input text
    Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10
    received arginine vasopressin 0.03 U/kg; and patients 11-15
    received epinephrine 1 μg/kg
    (...)
    After study drug administration, the ratio of pulmonary-to-systemic
    vascular resistance decreased in three of five patients
    receiving phenylephrine, five of five patients receiving arginine
    vasopressin, and three of five patients receiving epinephrine.

    LLM output (GPT 3.5)
    Patient group: Phenylephrine group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: One-time administration
    Treatment duration: Not specified
    Outcome: Ratio of pulmonary-to-systemic vascular resistance decreased in three
    out of five patients after study drug administration
    Patient group: Arginine vasopressin group
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: One-time administration
    Treatment duration: Not specified
    Outcome: Ratio of pulmonary-to-systemic vascular resistance decreased in all five
    patients after study drug administration. Consistent decrease in the ratio of systolic
    pulmonary artery-to-aortic pressure noted.
    Patient group: Epinephrine group
    (...)
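
Output in this "Field: value" shape can be split into one record per patient group with plain string handling. The helper below is a hypothetical sketch, not the parsing code used in the talk: `parse_groups` and `FIELDS` are names introduced here, and it assumes the model repeats the field names exactly as given in the prompt.

```python
from typing import Dict, List, Optional

# Field names exactly as they appear in the prompt.
FIELDS = (
    "Patient group",
    "Number of patients in the group",
    "Treatment drug or substance",
    "Treatment dose",
    "Treatment frequency of administration",
    "Treatment duration",
    "Outcome",
)


def parse_groups(response: str) -> List[Dict[str, str]]:
    """Return one dict per 'Patient group:' block found in the response."""
    groups: List[Dict[str, str]] = []
    current_field: Optional[str] = None
    for raw_line in response.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        name, sep, value = line.partition(":")
        if sep and name in FIELDS:
            if name == FIELDS[0]:
                groups.append({})  # a new patient group starts here
            if groups:
                groups[-1][name] = value.strip()
                current_field = name
        elif groups and current_field:
            # Wrapped line: append it to the field currently being read.
            groups[-1][current_field] += " " + line
    return groups


response = """Patient group: Phenylephrine group
Number of patients in the group: 5
Treatment dose: 1 μg/kg
Outcome: Ratio of pulmonary-to-systemic vascular resistance decreased in three
out of five patients after study drug administration
Patient group: Arginine vasopressin group
Number of patients in the group: 5
Treatment dose: 0.03 U/kg
"""

groups = parse_groups(response)
for group in groups:
    print(group["Patient group"], "|", group["Treatment dose"])
```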

  27. Task: parse into structured fields
    def parse_responses(self, docs: Iterable[Doc], responses: Iterable[str]) -> Iterable[Doc]:
        for doc, response in zip(docs, responses):
            patient_groups = []
            ...
            while ...
                patient_group = response[start_index:end_index].strip()
                patient_groups.append(patient_group)
            ...
            matcher.add("Patient_Group", [nlp.make_doc(text) for text in patient_groups])
            ...
            matches = matcher(doc, as_spans=True)
            doc.ents = spacy.util.filter_spans(matches)
            yield doc
    → Downstream processes can now use the LLM output in a structured way via the Doc object
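
The last step, projecting strings taken from the LLM response back onto the Doc as entities, could look roughly like this with spaCy's `PhraseMatcher`. This is a sketch under assumptions: it needs spaCy installed, uses a blank English pipeline (tokenizer only), and the `drug_names` list and the `Drug` label are made up here in place of the values parsed from a real response.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.util import filter_spans

nlp = spacy.blank("en")  # tokenizer only; no trained components needed
doc = nlp(
    "Patients 1-5 received phenylephrine 1 μg/kg; "
    "patients 6-10 received arginine vasopressin 0.03 U/kg."
)

# Assumed to come from an earlier parsing step over the LLM response.
drug_names = ["Phenylephrine", "Arginine vasopressin"]

# Match on lowercase token text so casing differences do not matter.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("Drug", [nlp.make_doc(name) for name in drug_names])

# filter_spans drops overlapping matches, which doc.ents does not allow.
doc.ents = filter_spans(matcher(doc, as_spans=True))
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```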

  28. NLP is solved!

  29. (or maybe not)

  30. Reliability & robustness
    Patient group: Phenylephrine group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine 1 μg/kg
    Treatment dose: As mentioned above

    Number of patients in the group: 15
    Treatment drug or substance:
    Group 1:
    Patient 1-5 received phenylephrine 1 μg/kg
    Group 2:
    Patient 6-10 received arginine vasopressin 0.03 U/kg
    Group 3:
    Patient 11-15 received epinephrine 1 μg/kg

    Treatment frequency of administration:
    “Administered once”
    “Single administration”
    “One-time dose”
    “One time”
    “Single dose”
    “One-time administration”
    “once”

    openai.error.RateLimitError
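
Rate-limit errors such as the one on this slide are commonly handled by retrying with exponential backoff. The sketch below uses a local `RateLimitError` class as a stand-in for the provider's exception and a hypothetical helper `call_with_retry`; neither is part of spaCy or spacy-llm.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for a provider's rate-limit exception."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, waiting base_delay * 2**attempt between rate-limited attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Simulated request that is rate-limited twice before succeeding.
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("slow down")
    return "Patient group: Phenylephrine group"


delays: List[float] = []
result = call_with_retry(flaky_request, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```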

  31. Performance features
    Accuracy
    Inference speed
    Memory usage
    Reliability / reproducibility
    Maintainability
    Customizability
    Runtime cost
    Annotation / implementation cost
    Compute power
    Quick prototype
    Interpretability
    Data privacy

  32. Performance trade-offs (1)
    Rules/patterns
    Supervised ML
    Large Language Models

  33. Performance trade-offs (2)
    Closed source LLMs
    Open source LLMs
    Note: make sure to inspect the license and the terms of use!

  34. From prototype to production
    Business application
    Quick prototype (LLM)
    Integrate / replace with supervised ML, rules, ...
    Fine-tune

  35. Ex 1: LLM-assisted annotation
    LLM zero-shot predictions
    Manual curation
    ➢ Evaluation data: measure pipeline performance
    ➢ Training data: train a supervised model
    ➢ Examples for few-shot learning: tune the LLM
    https://prodigy.ai/features/large-language-models

  36. Ex 2: Pre-process texts
    PII NER → LLM
    ➢ Avoid sending sensitive data to third parties
    ➢ Recognize & replace Personally Identifiable Information
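
The recognize-and-replace step can be illustrated with a couple of regular expressions. This is a deliberately simplified stand-in for the NER component on this slide: the two patterns, the `redact` helper and the sample message are assumptions made for this sketch, and real PII detection needs far broader coverage (names, addresses, identifiers).

```python
import re

# Illustrative patterns only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each PII match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Made-up message; only the redacted version would be sent to a third party.
message = "Contact jane.doe@example.org or +44 20 7946 0958 about patient 12."
print(redact(message))
```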

  37. Ex 3: Filter input texts
    TextCat / NER → LLM
    ➢ Only send texts/sentences with certain topics/entities to the LLM
    ➢ Avoid incurring unnecessary costs
    ➢ Adjust prompt according to earlier classification and/or identified entities
    ➢ ...
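
A minimal sketch of such a gate in front of the LLM, with a keyword check standing in for the TextCat or NER component. The names `keyword_gate` and `filter_for_llm`, the keyword list and the second example sentence are all made up for illustration.

```python
from typing import Callable, Iterable, Iterator


def keyword_gate(keywords: Iterable[str]) -> Callable[[str], bool]:
    """Cheap stand-in for a text classifier: True if any keyword occurs."""
    lowered = [keyword.lower() for keyword in keywords]

    def is_relevant(text: str) -> bool:
        text_lower = text.lower()
        return any(keyword in text_lower for keyword in lowered)

    return is_relevant


def filter_for_llm(texts: Iterable[str], is_relevant: Callable[[str], bool]) -> Iterator[str]:
    """Yield only the texts worth paying an LLM call for."""
    for text in texts:
        if is_relevant(text):
            yield text


sentences = [
    "Patients 1-5 received phenylephrine 1 μg/kg.",
    "The study was approved by the local ethics board.",
    "Patients 11-15 received epinephrine 1 μg/kg.",
]
gate = keyword_gate(["received", "dose"])
to_send = list(filter_for_llm(sentences, gate))
print(f"{len(to_send)} of {len(sentences)} sentences would be sent to the LLM")
```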

  38. Ex 4: Post-process LLM responses
    LLM → Rules / Entity linking
    ➢ Normalize the (free-text) LLM responses
    ➢ Connect to a knowledge base (e.g. through entity linking)
    ➢ Make the (unpredictable) LLM responses more robust for ingestion by downstream processes
    ➢ ...
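
As a small example of rule-based normalization, the seven phrasings of the administration frequency listed on the "Reliability & robustness" slide can be mapped to one canonical value. The labels `ONCE` and `UNKNOWN` and the helper name `normalize_frequency` are assumptions made for this sketch.

```python
# Phrasings observed on the "Reliability & robustness" slide, lowercased.
SINGLE_DOSE_PHRASES = {
    "administered once",
    "single administration",
    "one-time dose",
    "one time",
    "single dose",
    "one-time administration",
    "once",
}


def normalize_frequency(value: str) -> str:
    """Map a free-text frequency to a canonical label (hypothetical labels)."""
    cleaned = value.strip().strip("“”\"'.").strip().lower()
    return "ONCE" if cleaned in SINGLE_DOSE_PHRASES else "UNKNOWN"


for raw in ["“Administered once”", "Single dose", "once", "twice daily"]:
    print(f"{raw!r} -> {normalize_frequency(raw)}")
```

Anything that falls through to `UNKNOWN` is a candidate for a new rule or for manual review, which keeps the downstream schema stable even when the LLM rephrases its answer.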

  39. Recap

    NLP unlocks information from text and makes it available to
    downstream business applications in a structured form

    Large Language Models have impressive text generation/understanding abilities

    It’s become super easy to prototype NLP applications with LLMs

    When building a production-ready pipeline, you need to consider other traits
    such as customizability, robustness, inference cost, network latency, etc.

    spaCy is a production-ready NLP framework written for developers

    Its latest extension spacy-llm allows easy integration of LLMs into structured NLP pipelines

    LLM-assisted annotation allows fast bootstrapping of training/evaluation data

  40. Thanks
    https://twitter.com/OxyKodit
    https://www.linkedin.com/in/sofievanlandeghem/
    https://github.com/explosion/spaCy
    https://github.com/explosion/spacy-llm
    https://explosion.ai/