2024-01-23-az

Presentation given to AZ NLP Community of Practice

Sofie Van Landeghem

January 23, 2024

Transcript

  1. spacy-llm: From quick prototyping with LLMs to more reliable and efficient NLP solutions

    Sofie Van Landeghem, PhD.
    Core maintainer of spaCy @ Explosion
    NLP and ML freelancer @ OxyKodit
  2. Briefly about me ...

    • 2008 - 2012: PhD in BioNLP, mostly working on Biomedical event extraction
      ◦ SVMs were the new kid on the block - feature engineering is fun!
    • 2013 - 2014: PostDoc in Bioinformatics, combining BioNLP with network analysis
      ◦ Seeing the rise of word embeddings with the publication of the word2vec paper by Mikolov et al.
    • 2015 - 2018: Data scientist @ Johnson & Johnson
      ◦ Bridging the gap between state-of-the-art NLP and (often harsh) business reality
    • 2019 - 2024: Freelancing + maintainer of open-source NLP toolbox spaCy
      ◦ Front row seat to the transformer-based revolution and LLM disruption

    Sofie Van Landeghem - NLP Community - January 23, 2024
  3. This talk ...

    • Will explain the design principles of the open-source toolbox spaCy
    • Will showcase how to use its recent plugin spacy-llm to perform rapid prototyping with Large Language Models (LLMs)
    • Will demonstrate how to move beyond a prototype into a more reliable, efficient, and maintainable solution
    • Will use clinical trial analysis as an example application
  4. Clinical trial abstract Hemodynamic Effects of Phenylephrine, Vasopressin, and Epinephrine

    in Children With Pulmonary Hypertension: A Pilot Study Objectives: During a pulmonary hypertensive crisis, the marked increase in pulmonary vascular resistance can result in acute right ventricular failure and death. Currently, there are no therapeutic guidelines for managing an acute crisis. This pilot study examined the hemodynamic effects of phenylephrine, arginine vasopressin, and epinephrine in pediatric patients with pulmonary hypertension. Design: In this prospective, open-label, nonrandomized pilot study, we enrolled pediatric patients previously diagnosed with pulmonary hypertensive who were scheduled electively for cardiac catheterization. Primary outcome was a change in the ratio of pulmonary-to-systemic vascular resistance. Baseline hemodynamic data were collected before and after the study drug was administered. Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean pulmonary artery pressure was 49 ± 19 mm Hg, and mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary hypertensive varied, and all were on systemic pulmonary hypertensive medications. Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03 U/kg; and patients 11-15 received epinephrine 1 μg/kg. Hemodynamics was measured continuously for up to 10 minutes following study drug administration. Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine, five of five patients receiving arginine vasopressin, and three of five patients receiving epinephrine. Although all three medications resulted in an increase in aortic pressure, only arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary artery-to-aortic pressure. 
Conclusions: This prospective pilot study of phenylephrine, arginine vasopressin, and epinephrine in pediatric patients with pulmonary hypertensive showed an increase in aortic pressure with all drugs although only vasopressin resulted in a consistent decrease in the ratio of pulmonary-to-systemic vascular resistance. Studies with more subjects are warranted to define optimal dosing strategies of these medications in an acute pulmonary hypertensive crisis.
    Stephanie L Siehr, Jeffrey A Feinstein, Weiguang Yang, Lynn F Peng, Michelle T Ogawa, Chandra Ramamoorthy. Pediatr Crit Care Med (2016) PMID: 27144689
  5. Clinical trial abstract - treatment groups

    Design: In this prospective, open-label, nonrandomized pilot study, we enrolled pediatric patients previously diagnosed with pulmonary hypertensive who were scheduled electively for cardiac catheterization. (...)
    Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean pulmonary artery pressure was 49 ± 19 mm Hg, and mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary hypertensive varied, and all were on systemic pulmonary hypertensive medications.
    Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03 U/kg; and patients 11-15 received epinephrine 1 μg/kg. Hemodynamics was measured continuously for up to 10 minutes following study drug administration.
  6. Clinical trial abstract - outcomes

    Design: (...) Primary outcome was a change in the ratio of pulmonary-to-systemic vascular resistance. (...)
    Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine, five of five patients receiving arginine vasopressin, and three of five patients receiving epinephrine. Although all three medications resulted in an increase in aortic pressure, only arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary artery-to-aortic pressure.
  7. Clinical trial abstract - NLP output

    Ideally, you want your NLP solution to extract a structured summary:

              # patients   Treatment Drug         Treatment Dose   Outcome: Decreased ratio of PVR to SVR
    Group 1   5            phenylephrine          1 μg/kg          3
    Group 2   5            arginine vasopressin   0.03 U/kg        5
    Group 3   5            epinephrine            1 μg/kg          3
  8. NLP complexity of this challenge (1)

    • Named Entities like drugs, diseases, ...
      ◦ Standard NLP challenge, pre-trained models often exist
      ◦ https://github.com/AstraZeneca/KAZU ;-)
    • Treatment dose and frequency
      ◦ Probably pretty doable with some type of pattern matching
    • Patient/treatment groups
      ◦ Non-standard, challenging NLP target
      ◦ Groups can be unique because of different prior conditions, prior treatments, patient characteristics, behavioural patterns, treatment drug or dose, treatment frequency, ...
      ◦ Group size can be mentioned in various ways, e.g. "5 patients" or "3 women and 2 men"
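    The "pattern matching" idea for treatment doses can be sketched with a plain regular expression. This is a minimal illustration, not the approach used in the talk: the pattern, the list of units and the helper name find_doses are all assumptions made for this example.

```python
import re

# Illustrative pattern (an assumption, not from the talk): a number followed by
# a unit, optionally expressed per kg of body weight, e.g. "1 μg/kg".
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
DOSE_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[\u03bc\u00b5]g|mcg|mg|g|U|IU|ml)"
    r"(?P<per_kg>/kg)?\b"
)


def find_doses(text: str) -> list:
    """Return every dose-like mention with its value, unit and per-kg flag."""
    doses = []
    for match in DOSE_PATTERN.finditer(text):
        doses.append(
            {
                "text": match.group(0),
                "value": float(match.group("value")),
                "unit": match.group("unit"),
                "per_kg": match.group("per_kg") is not None,
            }
        )
    return doses


# Sentence taken from the "Interventions" section of the abstract.
interventions = (
    "Patients 1-5 received phenylephrine 1 \u03bcg/kg; patients 6-10 received "
    "arginine vasopressin 0.03 U/kg; and patients 11-15 received epinephrine 1 \u03bcg/kg."
)
doses = find_doses(interventions)
for dose in doses:
    print(dose["text"], dose["value"], dose["unit"], dose["per_kg"])
```

    Patient ranges such as "1-5" are skipped because no unit follows the number. A pattern like this would likely need more units and formats before it is useful on a larger set of abstracts.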
  9. NLP complexity of this challenge (2)

    • Primary/secondary endpoints of the study
      ◦ Can be partly dictionary-based: e.g. "Progression-free survival", "PFS", "CR", ...
      ◦ Can be much more complex: e.g. "a change in the ratio of pulmonary-to-systemic vascular resistance"
    • Outcomes per patient group & endpoint
      ◦ Requires linking across different paragraphs, resolving references and synonyms, and understanding complex sentence structures
      ◦ As NLP tasks go, this one's pretty challenging!
      ◦ With a classical NLP approach, might require a mix of NER, entity linking, coreference resolution, relation extraction
  10. LLMs to the rescue? Generative vs predictive tasks

    • LLMs are primarily used for text generation
      ◦ Often user-facing tasks
      ◦ Text summarization, question answering, writing a poem, etc.
    • They can be useful for structured NLP as well
      ◦ Extracting structured attributes such as named entities, part-of-speech tags, ...
      ◦ Better allows automated integration with downstream applications
        ▪ Can we extract a structured table of results from the clinical trial abstract?
  11. Prototype with ChatGPT - prompt

    Summarize the trial results in a structured fashion like so:

    Patient group: <name>
    Number of patients in the group: <number>
    Treatment drug or substance: <drug>
    Treatment dose: <dose>
    Treatment frequency of administration: <frequency>
    Treatment duration: <duration>
    Outcome: <outcome>
  12. Prototype with ChatGPT - output

    Patient group: Phenylephrine Group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine. Increase in aortic pressure observed.

    Patient group: Arginine Vasopressin Group
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five patients receiving arginine vasopressin. Increase in aortic pressure observed. Consistent decrease in the ratio of systolic pulmonary artery-to-aortic pressure observed.

    Patient group: Epinephrine Group
    (...)
  13. A few observations ...

    • ChatGPT manages to link information spread across paragraphs, e.g. the drug + dose + outcome per patient group
    • ChatGPT assumes that the absence of any mention of administration frequency in the abstract means "single administration"
    • ChatGPT is able to admit what it doesn't know, e.g. "Not specified" for the Treatment duration (which is, indeed, not specified in the abstract)
    • Not bad for a quick prototype!
  14. Disadvantages of using the ChatGPT web interface

    • No API
    • No batching
    • No robustness
    • No data privacy
    • No reproducibility
    • Whole new meaning of "black box"
  15. spacy-llm: Integrating LLMs into structured NLP pipelines

    • Support for external APIs (OpenAI, Cohere, Anthropic, ...) as well as open-source models (via HuggingFace)
    • Built-in support for various standard NLP tasks such as text classification, NER, relation extraction, text summarization, ...
    • Relatively easy to implement your own, custom tasks
    • https://github.com/explosion/spacy-llm
  16. spacy-llm follows the main design principles of spaCy

    • Free, open-source library
    • Designed for production use
    • Focus on developer productivity
      ◦ Built-in functionality to help you hit the ground running
      ◦ Customizability & extensibility of the framework to implement anything your use-case needs
    • Reproducibility of experiments by using a detailed config file
    • Use rich data structures for results and metadata
    • Break the NLP challenge down into a pipeline of highly specific chained tasks
    • https://github.com/explosion/spacy
  17. Built-in zero-shot NER with spacy-llm

    my_config.cfg

        [nlp]
        lang = "en"
        pipeline = ["llm"]
        batch_size = 128

        [components]

        [components.llm]
        factory = "llm"

        [components.llm.model]
        @llm_models = "spacy.GPT-4.v2"

        [components.llm.task]
        @llm_tasks = "spacy.NER.v2"
        labels = ["Drug", "Dose"]

    my_script.py

        from spacy_llm.util import assemble

        # Path to the config file shown above
        config_path = "my_config.cfg"

        # _read_trial is a helper not shown on the slide; it returns the abstract text
        text = _read_trial(pmid=27144689)
        nlp = assemble(config_path)
        doc = nlp(text)
  18. Zero-shot NER with LLMs

    • Performance highly dependent on the label(s)
      ◦ How commonly known these types of entities are
      ◦ How descriptive & accurate the label text is, e.g.
        ▪ "Dose" vs. "TreatmentDose"
        ▪ "Drug" vs. "Chemical" etc
    • Reproducibility can be tricky because the LLM's responses may vary
      ◦ For classification (not generation) tasks, you'll typically want to set temperature to 0.0
      ◦ You can provide model-specific parameters in the config file:

        [components.llm.model]
        @llm_models = "spacy.GPT-4.v2"
        config = {"seed": 342, "temperature": 0.0}
  19. Few-shot NER "Chain-of-thought" prompting

    • Based on the PromptNER paper by Ashok and Lipton (2023)
    • Asking the LLM to explain its reasoning - giving it "tokens to think"
    • Reimplemented in spacy-llm, available as spacy.NER.v3
    • Increase of 15 percentage points F-score on an internal use-case
    • Works best when providing label definitions and examples - these allow you to tune the prompt towards the desired results
  20. Built-in few-shot NER with spacy-llm (1)

    my_config.cfg

        [components.llm.task]
        @llm_tasks = "spacy.NER.v3"
        labels = ["Drug", "Dose"]
        description = Entities are drugs or their doses. They can be uppercased, title-cased, or lowercased. Each occurrence of an entity in the text should be extracted.

        [components.llm.task.label_definitions]
        Drug = "A medicine or drug given to a patient as a treatment. Can be a generic name or brand name, e.g. paracetamol, Aspirin"
        Dose = "The measured quantity (dose) of a certain medicine given to patients, e.g. 1mg. This should exclude the drug name."

        [components.llm.task.examples]
        @misc = "spacy.FewShotReader.v1"
        path = "my_fewshot.json"
  21. Built-in few-shot NER with spacy-llm (2)

    my_fewshot.json

        [
          {
            "text": "The patient was given 1mg of paracetamol.",
            "spans": [
              {
                "text": "paracetamol",
                "is_entity": true,
                "label": "Drug",
                "reason": "is a drug name, used as medication"
              },
              {
                "text": "1mg",
                "is_entity": true,
                "label": "Dose",
                "reason": "is the quantity or dose of the given medication"
              },
              {
                "text": "patient",
                "is_entity": false,
                "label": "==NONE==",
                "reason": "is a person, not a drug or dose"
              }
            ]
          }
        ]
  22. Implementation of a custom task

    my_task.py

        INSTRUCTION = "Summarize the following clinical trial in a structured fashion. (...)"

        @registry.llm_tasks("tutorial.TrialSummary.v1")
        def make_trial_task() -> "TrialSummaryTask":
            return TrialSummaryTask(INSTRUCTION)

        class TrialSummaryTask(LLMTask):
            def __init__(self, instruction: str):
                self.instruction = instruction

            def generate_prompts(self, docs):
                for doc in docs:
                    yield self.instruction + "\n\n" + doc.text

            def parse_responses(self, docs):
                ...

    my_config.cfg

        [nlp]
        lang = "en"
        pipeline = ["llm"]
        batch_size = 128

        [components]

        [components.llm]
        factory = "llm"

        [components.llm.model]
        @llm_models = "spacy.GPT-4.v2"

        [components.llm.task]
        @llm_tasks = "tutorial.TrialSummary.v1"
  23. Output from the spacy-llm pipeline using GPT-4

    Patient group: Group 1
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single dose
    Treatment duration: Up to 10 minutes following study drug administration
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients. An increase in aortic pressure was observed.

    Patient group: Group 2
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single dose
    Treatment duration: Up to 10 minutes following study drug administration
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five patients. An increase in aortic pressure was observed. Arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary artery-to-aortic pressure.

    Patient group: Group 3
    (...)
  24. Unfortunately, the output is still text ...

    • The "outcome" field contains full sentences
    • Different ways LLMs express a "single" treatment frequency:
      ◦ Administered once
      ◦ Single administration
      ◦ One-time dose
      ◦ One time
      ◦ Single dose
      ◦ One-time administration
      ◦ Once
    • You still need to post-process the results into structured fields
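    One simple way to post-process the frequency field is a lookup over the known variants. The sketch below uses the variants listed on this slide and the "1/trial" notation that appears later in the deck; the function name normalize_frequency and the "unknown" fallback are assumptions made for this example.

```python
import re

# The variants are the ones listed on the slide. The canonical value "1/trial"
# mirrors the normalized output shown later in the deck; the "unknown" fallback
# is an assumption made for this sketch.
SINGLE_DOSE_VARIANTS = {
    "administered once",
    "single administration",
    "one-time dose",
    "one time",
    "single dose",
    "one-time administration",
    "once",
}


def normalize_frequency(raw: str) -> str:
    """Map a free-text treatment frequency onto a canonical value."""
    # Lowercase, collapse whitespace and drop a trailing full stop before lookup.
    cleaned = re.sub(r"\s+", " ", raw.strip().lower()).rstrip(".")
    if cleaned in SINGLE_DOSE_VARIANTS:
        return "1/trial"
    return "unknown"


for variant in ["Single dose", " One-time administration. ", "Twice daily"]:
    print(repr(variant), "->", normalize_frequency(variant))
```

    A fixed lookup only covers phrasings seen so far, so unmatched values are better logged and reviewed than silently dropped.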
  25. From text to structured data: a pipeline approach

    .txt file → spaCy pipeline → Doc object

        [nlp]
        lang = "en"
        pipeline = ["llm_ner", "entity_linker"]

    .txt file → llm_ner → entity_linker → Doc object

    After llm_ner:
        doc.ents: ent.text = "phenylephrine", ent.label = "Drug"
    After entity_linker:
        doc.ents: ent.text = "phenylephrine", ent.label = "Drug", ent.kb_id = CHEMBL1215
  26. From a trial text to structured output

    .txt file → llm_trial → normalizer → entity_linker → Doc object

    After llm_trial (parse responses):

        {
          "Group 1": {
            "Number of patients": 5,
            "Drug": "Phenylephrine",
            "Dose": "1 μg/kg",
            "Frequency": "Single dose",
            "Outcome": "The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients."
          },
          "Group 2": ...
        }

    After normalizer and entity_linker (standardize, summarize):

        {
          "Group 1": {
            "Number of patients": 5,
            "Drug": "CHEMBL1215",
            "Dose": "1 μg/kg",
            "Frequency": "1/trial",
            "Outcome": {
              "Ratio PVR to SVR": {
                "Decrease": 3
              }
            }
          },
          "Group 2": ...
        }
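    The "parse responses" step can be sketched as a small line-based parser over the "field: value" text that the LLM returns. This is an illustration only, not the implementation behind the talk: the function names, the shortened field names and the number-word table are assumptions, and the sample response is copied from the GPT-4 output shown earlier in the deck.

```python
import re
from typing import Optional

# Maps the field names used in the prompt onto shorter keys (assumed naming).
FIELD_NAMES = {
    "Number of patients in the group": "Number of patients",
    "Treatment drug or substance": "Drug",
    "Treatment dose": "Dose",
    "Treatment frequency of administration": "Frequency",
    "Treatment duration": "Duration",
    "Outcome": "Outcome",
}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_response(response: str) -> dict:
    """Turn 'field: value' lines into one dictionary per patient group."""
    groups = {}
    current = None
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            continue  # blank line or a line without a 'field: value' shape
        if key == "Patient group":
            current = groups.setdefault(value, {})
        elif current is not None and key in FIELD_NAMES:
            current[FIELD_NAMES[key]] = value
    for fields in groups.values():
        size = fields.get("Number of patients", "")
        if size.isdigit():
            fields["Number of patients"] = int(size)
    return groups


def count_decreased(outcome: str) -> Optional[int]:
    """Extract N from 'decreased in N of M patients' or 'decreased in all N patients'."""
    match = re.search(r"decreased in (?:all )?(\w+)(?: of \w+)? patients", outcome)
    if match is None:
        return None
    word = match.group(1).lower()
    return int(word) if word.isdigit() else NUMBER_WORDS.get(word)


RESPONSE = """Patient group: Group 1
Number of patients in the group: 5
Treatment drug or substance: Phenylephrine
Treatment dose: 1 \u03bcg/kg
Treatment frequency of administration: Single dose
Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients. An increase in aortic pressure was observed.
"""

summary = parse_response(RESPONSE)
decreased = count_decreased(summary["Group 1"]["Outcome"])
print(summary["Group 1"]["Drug"], summary["Group 1"]["Number of patients"], decreased)
```

    Relying on a fixed sentence pattern for the outcome is brittle, which is part of why the deck argues for splitting the problem into dedicated pipeline components.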
  27. Swap out the LLM backend

    • Closed-source LLM
      ◦ Sometimes better accuracy out-of-the-box
      ◦ Service can be unreliable (time-out, rate limits, ...)
      ◦ All data sent to a third party
      ◦ Often costly

        [components.llm]
        factory = "llm"

        [components.llm.model]
        @llm_models = "spacy.GPT-4.v2"

    • Open-source LLM
      ◦ Can be customized / fine-tuned
      ◦ More reliable
      ◦ Requires dedicated hardware
      ◦ Data privacy

        [components.llm]
        factory = "llm"

        [components.llm.model]
        @llm_models = "spacy.Mistral.v1"
        name = "Mistral-7B-v0.1"
  28. Swap out a component architecture

    • LLM
      ◦ Quick prototyping
      ◦ Can be unreliable/unstable
      ◦ Expensive

        [components.my_ner]
        factory = "llm"

        [components.my_ner.task]
        @llm_tasks = "spacy.NER.v3"

    • Supervised Machine Learning
      ◦ Manual annotation effort
      ◦ Faster & more reliable inference
      ◦ Train your own or source a pretrained model
      ◦ Cost-efficient

        [components.my_ner]
        source = "en_core_sci_lg"
        name = "ner"

        [components.my_ner]
        factory = "ner"

    • Rules/patterns
      ◦ Manual effort & maintenance burden
      ◦ Higher customizability & interpretability

        [components.my_ner]
        factory = "span_ruler"
        annotate_ents = true
  29. Combine the best of both worlds (example)

    • Machine learning textcat model
      ◦ Identifies topics of sentences, paragraphs or full documents
      ◦ Classifies the document as relevant or not (e.g. "Clinical trial abstract" or not?)
    • LLM
      ◦ Only processes those documents that were deemed relevant in the previous step
      ◦ Avoids incurring unnecessary costs

    .txt file → textcat → llm → Doc object
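    The gating idea can be sketched in plain Python: a cheap relevance check runs on every document, and only relevant documents reach the costly LLM call. Both keyword_textcat and fake_llm below are hypothetical stand-ins for a trained textcat component and a real LLM backend; neither comes from the talk.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def gated_pipeline(
    texts: Iterable[str],
    is_relevant: Callable[[str], bool],
    call_llm: Callable[[str], str],
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (text, summary) pairs, calling the LLM only for relevant texts."""
    for text in texts:
        summary = call_llm(text) if is_relevant(text) else None
        yield text, summary


def keyword_textcat(text: str) -> bool:
    # Hypothetical stand-in for a trained text classifier.
    lowered = text.lower()
    return "patients" in lowered and "study" in lowered


llm_calls = []


def fake_llm(text: str) -> str:
    # Hypothetical stand-in for the LLM; records each call so cost can be inspected.
    llm_calls.append(text)
    return "Patient group: (...)"


documents = [
    "In this prospective, open-label, nonrandomized pilot study, we enrolled pediatric patients.",
    "spaCy is a free, open-source library designed for production use.",
]
results = list(gated_pipeline(documents, keyword_textcat, fake_llm))
print(f"{len(llm_calls)} LLM call(s) for {len(documents)} documents")
```

    In a spaCy pipeline the same effect would come from ordering the components so that the classifier runs first and the LLM step checks its result before sending a prompt.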
  30. Annotations are still important

    • Evaluation
      ◦ Every project requires at least a representative evaluation set!
      ◦ Measure performance of single components
      ◦ Measure performance of the full pipeline end-to-end
      ◦ Measure progress while changing/fine-tuning the pipeline
    • Training a supervised model
      ◦ Smaller and specialized models can be more cost efficient
    • Tuning an LLM
      ◦ Providing "difficult" examples as few-shot examples in the prompt
      ◦ Actually running a fine-tuning step on your LLM
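    Measuring a single component against an evaluation set can be as simple as comparing predicted and gold annotations. The sketch below computes precision, recall and F-score over (text, label) pairs. The gold set uses entities from the example abstract, while the predicted set is invented purely for illustration and is not a real model result.

```python
def precision_recall_f(gold: set, predicted: set) -> dict:
    """Exact-match precision, recall and F-score between two annotation sets."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f_score = 0.0
    else:
        f_score = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Gold annotations: drugs and doses mentioned in the example abstract.
gold = {
    ("phenylephrine", "Drug"),
    ("arginine vasopressin", "Drug"),
    ("epinephrine", "Drug"),
    ("1 \u03bcg/kg", "Dose"),
    ("0.03 U/kg", "Dose"),
}
# Hypothetical predictions, made up for this example: one boundary error
# ("vasopressin") and one missed dose.
predicted = {
    ("phenylephrine", "Drug"),
    ("vasopressin", "Drug"),
    ("epinephrine", "Drug"),
    ("1 \u03bcg/kg", "Dose"),
}

scores = precision_recall_f(gold, predicted)
print({name: round(value, 3) for name, value in scores.items()})
```

    Exact matching is strict: the partial hit on "vasopressin" counts as both a false positive and a false negative, which is worth keeping in mind when comparing components.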
  31. Summary

    • LLMs are great for quick prototyping and for bootstrapping annotation
    • NLP solutions need to balance various metrics including accuracy, reliability, maintainability, customizability and cost
      ◦ Mix and match LLMs with supervised models or rule-based components
      ◦ spaCy pipelines are very versatile
      ◦ Easily swap out one component while keeping other components in the pipeline the same
    • spacy-llm lets you easily integrate LLMs into structured NLP pipelines
      ◦ Swap out backends easily, switching between closed-source LLMs (API) and open-source ones
      ◦ Use built-in standard NLP tasks
      ◦ Write your own custom task, fine-tune the prompt, etc
  32. Thanks!

    • Contact
      ◦ sofi[email protected]
      ◦ http://www.oxykodit.com
      ◦ https://www.linkedin.com/in/sofievanlandeghem/
      ◦ https://twitter.com/OxyKodit
      ◦ https://explosion.ai/tailored-solutions
    • Resources
      ◦ https://github.com/explosion/spacy-llm/
      ◦ https://spacy.io/usage/large-language-models
      ◦ https://prodi.gy/docs/large-language-models

    Core maintainer of spaCy @ Explosion
    NLP and ML freelancer @ OxyKodit