Slide 1

Herding LLMs Towards Structured NLP With spacy-llm
2023-12-12 | Raphael Mitsch (Explosion) | Global AI Conference 2023

Slide 2

On Structured NLP (SNLP)
● Goal: extract a defined set of attributes from texts
  ○ Entities (locations, persons, …), lemmas, categories, …
  ○ "Classic" NLP: predictive models
  ○ SOTA models: often BERT-level transformer models
● Real-life applications chain together several of these tasks
  ○ E.g. entity recognition, entity linking, …
● Downstream applications often depend on tangible, unambiguous information
  ○ On doc level: e.g. doc category; on span level: e.g. entity; on token level: e.g. lemma, POS tags, … (see the sketch below)
● Libraries: spaCy, Stanza, Gensim, Hugging Face, …
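To make the doc/span/token levels concrete, here is a minimal spaCy sketch (not from the slides; it assumes the small English pipeline en_core_web_sm is installed):

import spacy

# Load a small predictive English pipeline (tagger, parser, NER, ...).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Berlin.")

# Doc level: doc.cats would hold document categories if a textcat component were present.
# Span level: named entities.
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Apple" ORG, "Berlin" GPE

# Token level: lemmas and POS tags.
for token in doc:
    print(token.text, token.lemma_, token.pos_)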

Slide 3

spaCy
● Free, open-source library
● Designed for production use
● Focus on dev productivity
● Free course: https://course.spacy.io

Slide 4

spaCy
● A modular pipeline approach for linguistic analysis
● Transforming unstructured text into structured data objects (see the sketch below)
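A minimal sketch of the pipeline idea (component names are those of the stock English pipeline; not part of the slides):

import spacy

# A pipeline is a sequence of components that all write into a shared Doc object.
nlp = spacy.load("en_core_web_sm")
print(nlp.pipe_names)
# e.g. ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

# Components can be added, removed or swapped without touching the rest of the pipeline.
nlp.remove_pipe("ner")
doc = nlp("spaCy turns raw text into a structured Doc object.")
print([(token.text, token.pos_) for token in doc])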

Slide 5

On Large Language Models
● Generative as opposed to predictive
● Pros
  ○ Great for prototyping, zero-shot use, low dev effort, versatility, …
  ○ Can yield great results, possibly surpassing small predictive models for some tasks with proper prompting
● Cons
  ○ Latency, costs/hardware requirements, free-form text, hallucinations
● Libraries: Hugging Face, llama.cpp, LangChain, …
● Providers: OpenAI, Anthropic, Cohere, Google, Amazon, …

Slide 6

SNLP & LLMs: Not a Dichotomy
● Doing SNLP with LLMs is possible, but usually gets less attention than "AI magic"
● LLMs generate relatively unconstrained responses
  ○ They turn text into… text
  ○ Output can be narrowed down via e.g. pre-training, fine-tuning, prompts, guardrails
  ○ Parsing is necessary for SNLP
● Modular vs. monolithic approach (not necessarily, but often)
● SNLP is more likely to be suitable for industrial/real-life applications

Slide 7

"It's never great to realize you're the guy on the left. But, here I am."
https://explosion.ai/blog/against-llm-maximalism

Slide 8

Use Case: Clinical Trial Results
● Task: extract information from human-written notes on clinical trial results (e.g. from a paper)
● How does the workflow compare between a predictive (small) model and a generative one (LLM)?
● Specialized domain: available predictive models trained on general corpora are not accurate enough

Slide 9

Use Case: Clinical Trial Results

Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean pulmonary artery pressure was 49 ± 19 mm Hg, and mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary hypertension varied, and all were on systemic pulmonary hypertensive medications.

Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03 U/kg; and patients 11-15 received epinephrine 1 μg/kg. Hemodynamics was measured continuously for up to 10 minutes following study drug administration.

Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine, five of five patients receiving arginine vasopressin, and three of five patients receiving epinephrine. Although all three medications resulted in an increase in aortic pressure, only arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary artery-to-aortic pressure.

Slide 10

Use Case: Clinical Trial Results - Predictive
● Annotate data, then train a supervised model
● For this use case, required steps include:
  ○ NER/span categorization: identify patient groups, drugs, doses, frequencies, outcomes, …
  ○ Relation extraction: find the relations between identified entities

Slide 11

Use Case: Clinical Trial Results - Predictive

[Slide shows a Prodigy annotation interface - prodi.gy]

Slide 12

Use Case: Clinical Trial Results - Predictive
● Config for serializability & reproducibility of NLP pipelines
● spaCy has built-in architectures for NER, spancat, textcat, tagger, dependency parser, …, plus support for custom models and components
● Training run: python -m spacy train my_config.cfg --output ./my_output
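For illustration, the pipeline-related sections of such a config could look like the excerpt below (a partial sketch only; a complete training config also needs [paths], [corpora] and [training] sections, which python -m spacy init config can generate):

[nlp]
lang = "en"
pipeline = ["tok2vec", "ner"]

[components]

[components.tok2vec]
factory = "tok2vec"

[components.ner]
factory = "ner"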

Slide 13

Use Case: Clinical Trial Results - Generative
● Write a prompt
● Optional: add few-shot examples
● Assumption: the model was trained on the relevant domain knowledge

Prompt:
Summarize the trial results in a structured fashion like so:
● Patient group:
● Number of patients in the group:
● Treatment drug or substance:
● Treatment dose:
● Treatment frequency of administration:
● Treatment duration:
● Outcome:
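Not part of the slides: a minimal sketch of how such a prompt could be sent to a hosted LLM during prototyping, here using the openai Python client (v1 API) as one possible provider; the model name and the abridged trial_abstract are illustrative:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Abridged trial text from slide 9; in practice, the full abstract would be appended.
trial_abstract = "Interventions: Patients 1-5 received phenylephrine 1 ug/kg; ..."

prompt = (
    "Summarize the trial results in a structured fashion like so:\n"
    "Patient group:\nNumber of patients in the group:\n"
    "Treatment drug or substance:\nTreatment dose:\n"
    "Treatment frequency of administration:\nTreatment duration:\nOutcome:\n\n"
    + trial_abstract
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)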

Slide 14

Use Case: Clinical Trial Results - Generative

Prompt → response (excerpt):

Patient group: Phenylephrine Group
Number of patients in the group: 5
Treatment drug or substance: Phenylephrine
Treatment dose: 1 μg/kg
Treatment frequency of administration: Single administration
Treatment duration: Not specified
Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine. Increase in aortic pressure observed.

Patient group: Arginine Vasopressin Group
Number of patients in the group: 5
Treatment drug or substance: Arginine vasopressin
Treatment dose: 0.03 U/kg
Treatment frequency of administration: Single administration
Treatment duration: Not specified
Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five patients receiving arginine vasopressin. Increase in aortic pressure observed.

…

Slide 15

Use Case: Clinical Trial Results - Generative

NLP is solved!

Slide 16

Use Case: Clinical Trial Results - Generative

(or maybe not)

Slide 17

Issues with LLMs for (S)NLP
● Hallucinations / incorrect replies
● Variability in responses
● Issues with quantities
● No mapping to standardized data structures
● Latency + rate limits
● Limited context length
● Redundant/unnecessary work
● Costly
● Data leakage
● …
→ Challenges in adaptation for real-world use cases

Examples of problematic responses:

Treatment frequency of administration: "Administered once", "Single administration", "One-time dose", "One time", "Single dose", "One-time administration", "once"...

Number of patients: 15
Treatment drug or substance:
- Group 1: Patient 1-5 received phenylephrine 1 μg/kg
- Group 2: Patient 6-10 received arginine vasopressin 0.03 U/kg
- Group 3: Patient 11-15 received epinephrine 1 μg/kg

Slide 18

Issues with LLMs for (S)NLP
● Some issues are inherent to models/APIs:
  ○ Hallucinations, math problems, round-trip times, etc.
  ○ Can be mitigated with pre-training, fine-tuning, RLHF, tool use, etc.
● Others can be mitigated with tooling around LLMs:
  ○ Break the problem down into several chained tasks
  ○ Robust parsing (see the sketch below)
  ○ Quality assurance
  ○ Rich data structures for results and metadata
  ○ Swap generative models for predictive ones where appropriate
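Not from the slides: a toy illustration of what "robust parsing" could mean for the response format shown on slide 14, turning the free-form reply into one dictionary per patient group while tolerating missing or reordered fields:

import re

# Fields expected in the LLM response (matching the prompt on slide 13).
FIELDS = {
    "Patient group",
    "Number of patients in the group",
    "Treatment drug or substance",
    "Treatment dose",
    "Treatment frequency of administration",
    "Treatment duration",
    "Outcome",
}

def parse_trial_response(response: str) -> list[dict]:
    """Parse 'Field: value' lines into one dict per patient group."""
    groups, current = [], {}
    for line in response.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(.*)$", line)
        if not match:
            continue  # skip anything that is not a 'Field: value' line
        field, value = match.group(1).strip(), match.group(2).strip()
        if field not in FIELDS:
            continue  # ignore fields the prompt did not ask for
        if field == "Patient group" and current:
            groups.append(current)  # a new patient group starts
            current = {}
        current[field] = value
    if current:
        groups.append(current)
    return groups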

Slide 19

Issues with LLMs for (S)NLP
(Same content as the previous slide; the "tooling around LLMs" mitigations are bracketed and labeled: spacy-llm)

Slide 20

spacy-llm: SNLP with LLMs
● spaCy extension: uses spaCy's data structures, pipeline concept, config system and other functionality
● Core idea: a pipeline of SNLP problems, solved with LLMs
  ○ Each problem is solved by a task, which is responsible for the prompt, prompt splitting and parsing
  ○ Highly configurable
  ○ Results are mapped onto spaCy's data structures
  ○ Within a pipeline it is easy to swap LLMs for predictive models and vice versa → easy prototyping
● Currently at 0.6.4, 1.0.0 coming soon

Slide 21

spacy-llm: Integrations
● Models: integration with Hugging Face, LLM providers, LangChain
● Tasks:
  ○ Built-in tasks include NER, REL, sentiment analysis, summarization, translation, QA, entity linking, lemmatization, span categorization, text categorization, …
  ○ Easy to add new tasks
● Is a spaCy component: integrates into spaCy's config and pipeline system and supports the usual features like parallelization* and serialization
● Batching, response logging for easier debugging, caching (see the config excerpt below)
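For instance, caching and response logging are configured on the llm component itself. A sketch of the relevant config sections (option names taken from the spacy-llm documentation; double-check them against the version you use):

[components.llm_ner]
factory = "llm"
save_io = true  # keep prompts/responses on the Doc for debugging

[components.llm_ner.cache]
@llm_misc = "spacy.FileCache.v1"
path = "local-llm-cache"
batch_size = 64
max_batches_in_mem = 4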

Slide 22

spacy-llm: Workflow & Use Cases

Slide 23

spacy-llm: Workflow and Use Cases I
● LLM-assisted annotation, e.g. for evaluation data, training data, examples for few-shot learning
● Flow: LLM zero-shot predictions → manual curation in prodi.gy (see the sketch below)
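Not shown on the slides: a minimal sketch of generating zero-shot pre-annotations with an assembled spacy-llm pipeline and dumping them to JSONL for manual curation in an annotation tool; the config path and the JSONL layout are illustrative assumptions:

import json
from spacy_llm.util import assemble

# Assemble an LLM-backed NER pipeline from a spacy-llm config (see the how-to slide).
nlp = assemble("ner_zeroshot.cfg")  # hypothetical config file

texts = [
    "Patients 6-10 received arginine vasopressin 0.03 U/kg.",
]

# Write zero-shot predictions to JSONL so a human can correct them in an annotation tool.
with open("preannotated.jsonl", "w", encoding="utf8") as f:
    for doc in nlp.pipe(texts):
        record = {
            "text": doc.text,
            "spans": [
                {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                for ent in doc.ents
            ],
        }
        f.write(json.dumps(record) + "\n")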

Slide 24

spacy-llm: Workflow and Use Cases II
● Preprocessing text before prompting the LLM
  ○ PII: recognize and replace personally identifiable information
  ○ Remove non-informative boilerplate snippets
  ○ …
● Flow: PII NER → LLM (see the sketch below)
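A minimal sketch of the PII idea, using a predictive spaCy NER model to mask names and places before a text is sent to an LLM (the placeholder scheme and the chosen entity labels are illustrative assumptions):

import spacy

nlp = spacy.load("en_core_web_sm")

def mask_pii(text: str) -> str:
    """Replace person and location entities with placeholders before prompting an LLM."""
    doc = nlp(text)
    masked = text
    # Replace from the end of the text so earlier character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.label_ in {"PERSON", "GPE"}:
            masked = masked[: ent.start_char] + f"[{ent.label_}]" + masked[ent.end_char :]
    return masked

print(mask_pii("Dr. Jane Smith enrolled the patient in Boston."))
# e.g. "Dr. [PERSON] enrolled the patient in [GPE]." (exact output is model-dependent)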

Slide 25

spacy-llm: Workflow and Use Cases III
● Only send texts (sentences/paragraphs/documents) with certain topics or entities to the LLM
  ○ Avoid unnecessary costs
  ○ Adjust the prompt according to the earlier classification and/or identified entities
  ○ …
● Flow: TextCat → LLM NER (see the sketch below)
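A rough sketch of such routing, using a cheap predictive text classifier to decide which texts are worth an LLM call (the textcat pipeline name, the "clinical_trial" label and the threshold are illustrative assumptions):

import spacy
from spacy_llm.util import assemble

nlp_textcat = spacy.load("my_textcat_model")  # hypothetical trained textcat pipeline
nlp_llm = assemble("llm_ner.cfg")             # hypothetical spacy-llm NER config

def extract_entities(texts):
    for doc in nlp_textcat.pipe(texts):
        # Only send clearly relevant texts to the slower, costlier LLM component.
        if doc.cats.get("clinical_trial", 0.0) > 0.5:
            llm_doc = nlp_llm(doc.text)
            yield doc.text, [(ent.text, ent.label_) for ent in llm_doc.ents]
        else:
            yield doc.text, []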

Slide 26

spacy-llm: Workflow and Use Cases IV
● LLM response postprocessing
  ○ Quality assurance / fact-checking
  ○ Response normalization: improve response robustness for downstream tasks
  ○ Hook up to external knowledge bases
  ○ …
● Flow: LLM → Rules → Entity linking (see the sketch below)
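As a toy example of rule-based normalization, mapping the frequency variants from the issues slide onto one canonical value (the variant list and the canonical form are illustrative):

import re

# Free-form frequency phrasings observed in LLM replies (see the issues slide).
SINGLE_DOSE_PATTERNS = [
    r"administered once",
    r"single administration",
    r"one[- ]time dose",
    r"one time",
    r"single dose",
    r"one[- ]time administration",
    r"\bonce\b",
]

def normalize_frequency(value: str) -> str:
    """Map known phrasings to 'single dose'; leave unknown phrasings untouched."""
    lowered = value.strip().lower()
    if any(re.search(pattern, lowered) for pattern in SINGLE_DOSE_PATTERNS):
        return "single dose"
    return value

print(normalize_frequency("One-time administration"))  # -> "single dose"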

Slide 27

spacy-llm: How To

Configure the pipeline:

[nlp]
lang = "en"
pipeline = ["llm_ner"]

[components]

[components.llm_ner]
factory = "llm"

[components.llm_ner.task]
@llm_tasks = "spacy.NER.v3"
labels = SIZE,TYPE,TOPPING,PRODUCT

[components.llm_ner.model]
@llm_models = "spacy.GPT-3-5.v3"
name = "gpt-3.5-turbo"

Run the pipeline:

from spacy_llm.util import assemble

nlp = assemble(config_path)
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)

Swapping in an open-source model only requires changing the task/model sections, e.g.:

[components.llm_ner.task]
@llm_tasks = "spacy.NER.v3"
labels = SIZE,TYPE

[components.llm_ner.model]
@llm_models = "spacy.Mistral.v1"
name = "Mistral-7B-v0.1"

Slide 28

Recap
● SNLP unlocks information from text and makes it available to downstream business applications in a structured form
● LLMs have impressive text generation/understanding abilities
● It's become super easy to prototype NLP applications with LLMs
● When building a production-ready pipeline, you need to consider other traits such as customizability, robustness, inference cost, network latency, etc.
● spaCy is a production-ready NLP framework written for developers
● Its extension spacy-llm allows easy integration of LLMs into structured NLP pipelines
● LLM-assisted annotation allows fast bootstrapping of training/evaluation data

Slide 29

Thank you!
● [email protected]
● https://www.linkedin.com/in/raphaelmitsch/
● https://github.com/explosion/spaCy
● https://github.com/explosion/spacy-llm
● https://explosion.ai/
● https://prodi.gy
● https://explosion.ai/blog/against-llm-maximalism