
2024-01-23-az


Presentation given to AZ NLP Community of Practice

Sofie Van Landeghem

January 23, 2024

Transcript

  1. spacy-llm
    From quick prototyping with LLMs
    to more reliable and efficient NLP solutions
    Sofie Van Landeghem, PhD.
    Core maintainer of spaCy @ Explosion
    NLP and ML freelancer @ OxyKodit


  2. Briefly about me ...
    ● 2008 - 2012: PhD in BioNLP, mostly working on Biomedical event extraction
    ○ SVMs were the new kid on the block - feature engineering is fun!
    ● 2013 - 2014: PostDoc in Bioinformatics, combining BioNLP with network analysis
    ○ Seeing the rise of word embeddings with the publication of the word2vec paper by Mikolov et al.
    ● 2015 - 2018: Data scientist @ Johnson & Johnson
    ○ Bridging the gap between state-of-the-art NLP and (often harsh) business reality
    ● 2019 - 2024: Freelancing + maintainer of open-source NLP toolbox spaCy
    ○ Front row seat to the transformer-based revolution and LLM disruption

  3. This talk ...
    ● Will explain the design principles of the open-source toolbox spaCy
    ● Will showcase how to use its recent plugin spacy-llm to perform rapid prototyping
    with Large Language Models (LLMs)
    ● Will demonstrate how to move beyond a prototype into a more reliable,
    efficient, and maintainable solution
    ● Will use clinical trial analysis as an example application

  4. Clinical trial abstract
    Hemodynamic Effects of Phenylephrine, Vasopressin, and Epinephrine in Children With Pulmonary Hypertension: A Pilot Study
    Objectives: During a pulmonary hypertensive crisis, the marked increase in pulmonary vascular resistance can result in acute right ventricular failure and death.
    Currently, there are no therapeutic guidelines for managing an acute crisis. This pilot study examined the hemodynamic effects of phenylephrine, arginine
    vasopressin, and epinephrine in pediatric patients with pulmonary hypertension.
    Design: In this prospective, open-label, nonrandomized pilot study, we enrolled pediatric patients previously diagnosed with pulmonary hypertension who were
    scheduled electively for cardiac catheterization. Primary outcome was a change in the ratio of pulmonary-to-systemic vascular resistance. Baseline hemodynamic data
    were collected before and after the study drug was administered.
    Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean
    pulmonary artery pressure was 49 ± 19 mm Hg, and mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary hypertension
    varied, and all were on systemic pulmonary hypertension medications.
    Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03 U/kg; and patients 11-15 received epinephrine 1 μg/kg.
    Hemodynamics was measured continuously for up to 10 minutes following study drug administration.
    Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving
    phenylephrine, five of five patients receiving arginine vasopressin, and three of five patients receiving epinephrine. Although all three medications resulted in an
    increase in aortic pressure, only arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary artery-to-aortic pressure.
    Conclusions: This prospective pilot study of phenylephrine, arginine vasopressin, and epinephrine in pediatric patients with pulmonary hypertension showed an
    increase in aortic pressure with all drugs although only vasopressin resulted in a consistent decrease in the ratio of pulmonary-to-systemic vascular resistance. Studies
    with more subjects are warranted to define optimal dosing strategies of these medications in an acute pulmonary hypertensive crisis.
    Stephanie L Siehr, Jeffrey A Feinstein, Weiguang Yang, Lynn F Peng, Michelle T Ogawa, Chandra Ramamoorthy. Pediatr Crit Care Med (2016)
    PMID: 27144689


  5. Clinical trial abstract - treatment groups
    Design: In this prospective, open-label, nonrandomized pilot study, we enrolled pediatric patients
    previously diagnosed with pulmonary hypertension who were scheduled electively for cardiac
    catheterization. (...)
    Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median
    weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean pulmonary artery pressure was 49 ± 19 mm Hg,
    and mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary
    hypertension varied, and all were on systemic pulmonary hypertension medications.
    Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin
    0.03 U/kg; and patients 11-15 received epinephrine 1 μg/kg. Hemodynamics was measured continuously
    for up to 10 minutes following study drug administration.

  6. Clinical trial abstract - outcomes
    Design: (...) Primary outcome was a change in the ratio of pulmonary-to-systemic vascular resistance.
    (...)
    Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic
    vascular resistance decreased in three of five patients receiving phenylephrine, five of five patients
    receiving arginine vasopressin, and three of five patients receiving epinephrine. Although all three
    medications resulted in an increase in aortic pressure, only arginine vasopressin consistently resulted in a
    decrease in the ratio of systolic pulmonary artery-to-aortic pressure.

  7. Clinical trial abstract - NLP output
    Ideally, you want your NLP solution to extract a structured summary:

                # patients   Treatment Drug         Treatment Dose   Outcome: Decreased ratio of PVR to SVR
    Group 1     5            phenylephrine          1 μg/kg          3
    Group 2     5            arginine vasopressin   0.03 U/kg        5
    Group 3     5            epinephrine            1 μg/kg          3

  8. NLP complexity of this challenge (1)
    ● Named Entities like drugs, diseases, ...
    ○ Standard NLP challenge, pre-trained models often exist
    ○ https://github.com/AstraZeneca/KAZU ;-)
    ● Treatment dose and frequency
    ○ Probably pretty doable with some type of pattern matching
    ● Patient/treatment groups
    ○ Non-standard, challenging NLP target
    ○ Groups can be unique because of different prior conditions, prior treatments, patient
    characteristics, behavioural patterns, treatment drug or dose, treatment frequency, ...
    ○ Group size can be mentioned in various ways, e.g. "5 patients" or "3 women and 2 men"

  9. NLP complexity of this challenge (2)
    ● Primary/secondary endpoints of the study
    ○ Can be partly dictionary-based: e.g. "Progression-free survival", "PFS", "CR", ...
    ○ Can be much more complex: e.g. "a change in the ratio of pulmonary-to-systemic vascular
    resistance"
    ● Outcomes per patient group & endpoint
    ○ Requires linking across different paragraphs, resolving references and synonyms, and
    understanding complex sentence structures
    ○ As NLP tasks go, this one's pretty challenging!
    ○ With a classical NLP approach, might require a mix of NER, entity linking, coreference
    resolution, relation extraction

  10. LLMs to the rescue?
    Generative vs predictive tasks
    ● LLMs are primarily used for text generation
    ○ Often user-facing tasks
    ○ Text summarization, question answering, writing a poem, etc.
    ● They can be useful for structured NLP as well
    ○ Extracting structured attributes such as named entities, part-of-speech tags, ...
    ○ Allows for better automated integration with downstream applications
    ■ Can we extract a structured table of results from the clinical trial abstract?

  11. Prototype with ChatGPT - prompt
    Summarize the trial results in a structured fashion like so:
    Patient group:
    Number of patients in the group:
    Treatment drug or substance:
    Treatment dose:
    Treatment frequency of administration:
    Treatment duration:
    Outcome:

  12. Prototype with ChatGPT - output
    Patient group: Phenylephrine Group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving
    phenylephrine. Increase in aortic pressure observed.
    Patient group: Arginine Vasopressin Group
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five patients receiving arginine
    vasopressin. Increase in aortic pressure observed. Consistent decrease in the ratio of systolic pulmonary
    artery-to-aortic pressure observed.
    Patient group: Epinephrine Group
    (...)

  13. A few observations ...
    ● ChatGPT manages to link information spread across paragraphs to each other,
    e.g. the drug + dose + outcome per patient group
    ● ChatGPT assumes that the absence of any mention of frequency of administration in the
    abstract means "single administration"
    ● ChatGPT is able to admit what it doesn't know, e.g. "Not specified" for the
    Treatment duration (which is, indeed, not specified in the abstract)
    ● Not bad for a quick prototype!

  14. Disadvantages of using the ChatGPT web interface
    ● No API
    ● No batching
    ● No robustness
    ● No data privacy
    ● No reproducibility
    ● Whole new meaning of "black box"

  15. spacy-llm: Integrating LLMs into structured NLP pipelines
    ● Support for external APIs (OpenAI, Cohere, Anthropic, ...)
    as well as open-source models (via HuggingFace)
    ● Built-in support for various standard NLP tasks such as text classification,
    NER, relation extraction, text summarization, ...
    ● Relatively easy to implement your own, custom tasks
    ● https://github.com/explosion/spacy-llm
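    Getting started is a pip install plus an API key for the chosen backend; a minimal sketch, assuming
    the OpenAI backend and a config file like the my_config.cfg shown on a later slide:

    # pip install spacy-llm
    import os

    from spacy_llm.util import assemble

    # OpenAI-backed models read the API key from the environment
    os.environ["OPENAI_API_KEY"] = "sk-..."

    nlp = assemble("my_config.cfg")
    doc = nlp("Patients 6-10 received arginine vasopressin 0.03 U/kg.")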

  16. spacy-llm follows the main design principles of spaCy
    ● Free, open-source library
    ● Designed for production use
    ● Focus on developer productivity
    ○ Built-in functionality to help you hit the ground running
    ○ Customizability & extensibility of the framework to implement anything your use-case needs
    ● Reproducibility of experiments by using a detailed config file
    ● Use rich data structures for results and metadata
    ● Break NLP challenge down into a pipeline of highly specific chained tasks
    ● https://github.com/explosion/spacy

  17. Built-in zero-shot NER with spacy-llm
    my_config.cfg:

    [nlp]
    lang = "en"
    pipeline = ["llm"]
    batch_size = 128

    [components]

    [components.llm]
    factory = "llm"

    [components.llm.model]
    @llm_models = "spacy.GPT-4.v2"

    [components.llm.task]
    @llm_tasks = "spacy.NER.v2"
    labels = ["Drug", "Dose"]

    my_script.py:

    from spacy_llm.util import assemble

    config_path = "my_config.cfg"
    text = _read_trial(pmid=27144689)  # helper that fetches the abstract text by PMID
    nlp = assemble(config_path)
    doc = nlp(text)
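    The LLM's answers are parsed back into standard spaCy annotations, so downstream code reads them
    like the output of any other pipeline; a minimal sketch continuing from my_script.py above:

    # Entities predicted by the zero-shot LLM component are available on doc.ents
    for ent in doc.ents:
        print(ent.text, ent.label_)   # e.g. "phenylephrine Drug", "1 μg/kg Dose"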

  18. Zero-shot NER with LLMs
    ● Performance highly dependent on the label(s)
    ○ How commonly known these types of entities are
    ○ How descriptive & accurate the label text is, e.g.
    ■ "Dose" vs. "TreatmentDose"
    ■ "Drug" vs. "Chemical" etc
    ● Reproducibility can be tricky because the LLM's responses may vary
    ○ For classification (not generation) tasks, you'll typically want to set temperature to 0.0
    ○ You can provide model-specific parameters in the config file:
    [components.llm.model]
    @llm_models = "spacy.GPT-4.v2"
    config = {"seed": 342, "temperature": 0.0}

  19. Few-shot NER with "chain-of-thought" prompting
    ● Based on the PromptNER paper by Ashok and Lipton (2023)
    ● Asking the LLM to explain its reasoning - giving it "tokens to think"
    ● Reimplemented in spacy-llm, available as spacy.NER.v3
    ● Increase of 15 percentage points F-score on an internal use-case
    ● Works best when providing label definitions and examples -
    these allow you to tune the prompt towards the desired results

  20. Built-in few-shot NER with spacy-llm (1)
    my_config.cfg:

    [components.llm.task]
    @llm_tasks = "spacy.NER.v3"
    labels = ["Drug", "Dose"]
    description = Entities are drugs or their doses. They can be uppercased, title-cased, or lowercased. Each occurrence of an entity in the text should be extracted.

    [components.llm.task.label_definitions]
    Drug = "A medicine or drug given to a patient as a treatment. Can be a generic name or brand name, e.g. paracetamol, Aspirin"
    Dose = "The measured quantity (dose) of a certain medicine given to patients, e.g. 1mg. This should exclude the drug name."

    [components.llm.task.examples]
    @misc = "spacy.FewShotReader.v1"
    path = "my_fewshot.json"

  21. Built-in few-shot NER with spacy-llm (2)
    my_fewshot.json:

    [
      {
        "text": "The patient was given 1mg of paracetamol.",
        "spans": [
          {
            "text": "paracetamol",
            "is_entity": true,
            "label": "Drug",
            "reason": "is a drug name, used as medication"
          },
          {
            "text": "1mg",
            "is_entity": true,
            "label": "Dose",
            "reason": "is the quantity or dose of the given medication"
          },
          {
            "text": "patient",
            "is_entity": false,
            "label": "==NONE==",
            "reason": "is a person, not a drug or dose"
          }
        ]
      }
    ]
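    A quick way to sanity-check the examples file before wiring it into the config; a minimal sketch
    using only the standard library:

    import json

    with open("my_fewshot.json", encoding="utf8") as f:
        examples = json.load(f)

    for example in examples:
        for span in example["spans"]:
            # every annotated span should literally occur in its example text
            assert span["text"] in example["text"], span["text"]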

  22. Implementation of a custom task
    my_task.py:

    from spacy_llm.registry import registry

    INSTRUCTION = "Summarize the following clinical trial in a structured fashion. (...)"

    @registry.llm_tasks("tutorial.TrialSummary.v1")
    def make_trial_task() -> "TrialSummaryTask":
        return TrialSummaryTask(INSTRUCTION)

    class TrialSummaryTask(LLMTask):
        def __init__(self, instruction: str):
            self.instruction = instruction

        def generate_prompts(self, docs):
            for doc in docs:
                yield self.instruction + "\n\n" + doc.text

        def parse_responses(self, docs):
            ...

    my_config.cfg:

    [nlp]
    lang = "en"
    pipeline = ["llm"]
    batch_size = 128

    [components]

    [components.llm]
    factory = "llm"

    [components.llm.model]
    @llm_models = "spacy.GPT-4.v2"

    [components.llm.task]
    @llm_tasks = "tutorial.TrialSummary.v1"

  23. Output from the spacy-llm pipeline using GPT-4
    Patient group: Group 1
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single dose
    Treatment duration: Up to 10 minutes following study drug administration
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients. An increase in
    aortic pressure was observed.
    Patient group: Group 2
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single dose
    Treatment duration: Up to 10 minutes following study drug administration
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five patients. An increase in aortic
    pressure was observed. Arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary
    artery-to-aortic pressure.
    Patient group: Group 3
    (...)

  24. Unfortunately, the output is still text ...
    ● The "outcome" field contains full sentences
    ● Different ways LLMs express a "single" treatment frequency:
    ○ Administered once
    ○ Single administration
    ○ One-time dose
    ○ One time
    ○ Single dose
    ○ One-time administration
    ○ Once
    ● You still need to post-process the results to structured fields
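    Post-processing those free-text fields is mostly mundane normalization work. A minimal sketch for
    the frequency field (the mapping and the normalize_frequency helper are illustrative):

    # Map the many surface forms the LLM uses for "administered once" onto one canonical value
    SINGLE_DOSE_FORMS = {
        "administered once",
        "single administration",
        "one-time dose",
        "one time",
        "single dose",
        "one-time administration",
        "once",
    }

    def normalize_frequency(value: str) -> str:
        if value.strip().lower() in SINGLE_DOSE_FORMS:
            return "1/trial"
        return "unknown"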

  25. From text to structured data: a pipeline approach
    Generic view:      .txt file  →  spaCy pipeline  →  Doc object

    [nlp]
    lang = "en"
    pipeline = ["llm_ner", "entity_linker"]

    Concrete view:     .txt file  →  llm_ner  →  entity_linker  →  Doc object

    After llm_ner, doc.ents holds:
        ent.text = "phenylephrine"
        ent.label = "Drug"

    After entity_linker, doc.ents holds:
        ent.text = "phenylephrine"
        ent.label = "Drug"
        ent.kb_id = CHEMBL1215
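    After the entity_linker has run, the same entity loop also exposes the knowledge-base identifier; a
    minimal sketch (the ChEMBL identifier is the one from the diagram):

    for ent in doc.ents:
        # kb_id_ is filled in by the entity_linker component
        print(ent.text, ent.label_, ent.kb_id_)   # e.g. "phenylephrine Drug CHEMBL1215"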


  26. From a trial text to structured output
    .txt file  →  llm_trial  →  normalizer  →  Doc object

    Output of llm_trial (parsed responses):

    {
      "Group 1": {
        "Number of patients": 5,
        "Drug": "Phenylephrine",
        "Dose": "1 μg/kg",
        "Frequency": "Single dose",
        "Outcome": "The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients."
      },
      "Group 2": ...
    }

    Output of the normalizer (entity_linker; standardize, summarize):

    {
      "Group 1": {
        "Number of patients": 5,
        "Drug": "CHEMBL1215",
        "Dose": "1 μg/kg",
        "Frequency": "1/trial",
        "Outcome": {
          "Ratio PVR to SVR": {
            "Decrease": 3
          }
        }
      },
      "Group 2": ...
    }


  27. Swap out the LLM backend
    ● Closed-source LLM
    ○ Sometimes better accuracy out-of-the box
    ○ Service can be unreliable (time-out, rate limits, ...)
    ○ All data sent to a third party
    ○ Often costly
    ● Open-source LLM
    ○ Can be customized / fine-tuned
    ○ More reliable
    ○ Requires dedicated hardware
    ○ Data privacy
    Closed-source backend:

    [components.llm]
    factory = "llm"

    [components.llm.model]
    @llm_models = "spacy.GPT-4.v2"

    Open-source backend:

    [components.llm]
    factory = "llm"

    [components.llm.model]
    @llm_models = "spacy.Mistral.v1"
    name = "Mistral-7B-v0.1"
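    The calling code stays identical when the backend is swapped; only the config changes. A minimal
    sketch (the two config file names are illustrative):

    from spacy_llm.util import assemble

    # Same task, different backend: only the [components.llm.model] block differs between the files
    nlp_gpt4 = assemble("trial_gpt4.cfg")
    nlp_mistral = assemble("trial_mistral.cfg")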

  28. Swap out a component architecture
    ● LLM
    ○ Quick prototyping
    ○ Can be unreliable/unstable
    ○ Expensive
    ● Supervised Machine Learning
    ○ Manual annotation effort
    ○ Faster & more reliable inference
    ○ Train your own or source a pretrained model
    ○ Cost-efficient
    ● Rules/patterns
    ○ Manual effort & maintenance burden
    ○ Higher customizability & interpretability
    LLM-based NER:

    [components.my_ner]
    factory = "llm"

    [components.my_ner.task]
    @llm_tasks = "spacy.NER.v3"

    Sourcing a pretrained NER component:

    [components.my_ner]
    source = "en_core_sci_lg"
    name = "ner"

    Rule-based NER:

    [components.my_ner]
    factory = "span_ruler"
    annotate_ents = true

    Training your own NER component:

    [components.my_ner]
    factory = "ner"

  29. Combine the best of both worlds (example)
    ● Machine learning textcat model
    ○ Identifies topics of sentences, paragraphs or full documents
    ○ Classifies the document as relevant or not (e.g. "Clinical trial abstract" or not?)
    ● LLM
    ○ Only processes those documents that were deemed relevant in the previous step
    ○ Avoid inducing unnecessary costs
    .txt file  →  textcat  →  llm  →  Doc object
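    One way to wire this up is with two separate pipelines, so only texts classified as relevant ever
    reach the LLM; a hedged sketch (the model name, category label, and 0.5 threshold are illustrative):

    import spacy
    from spacy_llm.util import assemble

    textcat_nlp = spacy.load("my_textcat_model")   # cheap supervised classifier
    llm_nlp = assemble("my_config.cfg")            # expensive LLM pipeline

    texts = ["...abstract 1...", "...abstract 2..."]
    relevant = [
        doc.text
        for doc in textcat_nlp.pipe(texts)
        if doc.cats.get("CLINICAL_TRIAL", 0.0) >= 0.5
    ]
    llm_docs = list(llm_nlp.pipe(relevant))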

  30. Annotations are still important
    ● Evaluation
    ○ Every project requires at least a representative evaluation set!
    ○ Measure performance of single components
    ○ Measure performance of the full pipeline end-to-end
    ○ Measure progress while changing/fine-tuning the pipeline
    ● Training a supervised model
    ○ Smaller, specialized models can be more cost-efficient
    ● Tuning an LLM
    ○ Providing "difficult" examples as few-shot examples in the prompt
    ○ Actually running a fine-tuning step on your LLM

  31. LLM-assisted annotation (NER)
    Curate zero-shot LLM predictions

  32. LLM-assisted annotation (REL)

  33. Summary
    ● LLMs are great for quick prototyping and for bootstrapping annotation
    ● NLP solutions need to balance various metrics including accuracy, reliability,
    maintainability, customizability and cost
    ○ Mix and match LLMs with supervised models or rule-based components
    ○ spaCy pipelines are very versatile
    ○ Easily swap out one component while keeping other components in the pipeline the same
    ● spacy-llm lets you easily integrate LLMs into structured NLP pipelines
    ○ Swap out backends easily, switching between closed-source LLMs (API) and open-source ones
    ○ Use built-in standard NLP tasks
    ○ Write your own custom task, fine-tune the prompt, etc

  34. Thanks!
    ● Contact
    ○ sofi[email protected]
    ○ http://www.oxykodit.com
    ○ https://www.linkedin.com/in/sofievanlandeghem/
    ○ https://twitter.com/OxyKodit
    ○ https://explosion.ai/tailored-solutions
    ● Resources
    ○ https://github.com/explosion/spacy-llm/
    ○ https://spacy.io/usage/large-language-models
    ○ https://prodi.gy/docs/large-language-models
    Core maintainer of spaCy @ Explosion
    NLP and ML freelancer @ OxyKodit