
Herding LLMs Towards Structured NLP

spacy-llm is spaCy's recent integration with large language models, maintained by the team behind spaCy. In this talk I cover the motivation behind the library, the problems it tries to solve, and lessons learned while working on it.

Recent LLMs exhibit impressive capabilities and enable extremely fast prototyping of NLP applications. spacy-llm addresses some of the challenges in making LLMs production-ready:
- LLMs turn text into … more text. NLP applications, however, often aim to extract structured information from text and use it further downstream. spacy-llm parses LLM responses and maps the parsed output onto existing data structures for documents, spans and tokens.
- Closed and open models have markedly different drawbacks. Closed models aren't free, add network latency, are inflexible black boxes, aren't suitable for all (commercial) use cases - check the TOS! - may leak user data, and may rate-limit you. Open models require quite a bit of computing power to run locally, are more complicated to set up, and are still less capable than (some) closed models.
- Your mileage with closed and open models may therefore vary depending on where you are in your dev cycle. spacy-llm allows you to transition smoothly between open and closed LLMs without touching any code.
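The parse-and-map idea can be sketched in a few lines of plain Python. This is a hypothetical, naive parser for illustration only - spacy-llm's real tasks map responses onto spaCy's Doc/Span/Token objects:

```python
import re

def parse_entity_response(text: str, response: str) -> list[dict]:
    """Map lines like 'PERSON: Alice' in an LLM reply to character spans in `text`."""
    spans = []
    for match in re.finditer(r"^(\w+):\s*(.+)$", response, flags=re.MULTILINE):
        label, mention = match.group(1), match.group(2).strip()
        start = text.find(mention)  # naive: first occurrence only
        if start != -1:  # drop mentions the LLM hallucinated into the reply
            spans.append(
                {"label": label, "text": mention, "start": start, "end": start + len(mention)}
            )
    return spans

doc_text = "Alice flew to Berlin."
llm_reply = "PERSON: Alice\nLOCATION: Berlin\nLOCATION: Paris"
print(parse_entity_response(doc_text, llm_reply))
```

Note that "Paris" is silently dropped: it does not occur in the source text, which is exactly the kind of robustness a parsing layer adds.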

spacy-llm plugs open and proprietary LLMs into spaCy, leveraging its modular and customizable framework for working with text. This allows for a cheaper, faster and more robust NLP workflow - driven by cutting-edge LLMs.

Raphael Mitsch

December 12, 2023


Transcript

  1. 2023-12-12 | Raphael Mitsch (Explosion) | Global AI Conference 2023

    Herding LLMs Towards Structured NLP With spacy-llm
  2. On Structured NLP (SNLP) • Goal of extracting a defined set of attributes from texts ◦ Entities (locations, persons, …), lemmas, categories, … ◦ “Classic” NLP: predictive models ◦ SOTA models: often BERT-level transformer models • Real-life applications chain together several of these tasks ◦ E.g. entity recognition, entity linking, … • Downstream applications often depend on tangible, unambiguous information ◦ On doc level: e.g. doc category; on span level: e.g. entity; on token level: e.g. lemma, POS tags, … • Libraries: spaCy, Stanza, Gensim, Hugging Face, …
  3. spaCy • Free, open-source library • Designed for production use • Focus on dev productivity • Free course: https://course.spacy.io
  4. spaCy • A modular pipeline approach for linguistic analysis • Transforming unstructured text into structured data objects
  5. On Large Language Models • Generative as opposed to predictive • Pros ◦ Great for prototyping, zero-shot, low dev effort, versatility, … ◦ Can yield great results, sometimes surpassing small predictive models for some tasks with proper prompting • Cons ◦ Latency, costs/hardware requirements, free-form text, hallucinations • Libraries: Hugging Face, llama.cpp, LangChain, … • Providers: OpenAI, Anthropic, Cohere, Google, Amazon, …
  6. SNLP & LLMs: Not a Dichotomy • Doing SNLP with LLMs is possible, but usually gets less attention than “AI magic” • LLMs generate relatively unconstrained responses ◦ Turn text into … text ◦ Can be narrowed down via e.g. pre-training, fine-tuning, prompts, guardrails ◦ Parsing necessary for SNLP • Modular vs. monolithic approach (not necessarily, but often) • SNLP more likely to be suitable for industrial/real-life applications
  7. “It's never great to realize you're the guy on the left. But, here I am.” https://explosion.ai/blog/against-llm-maximalism
  8. Use Case: Clinical Trial Results • Task: extract information from human-written notes on clinical trial results (e.g. from a paper) • How does the workflow compare between predictive (small) models and generative ones (LLMs)? • Specialized domain: available predictive models trained on general corpora are not accurate enough
  9. Use Case: Clinical Trial Results

    Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median weight was 26.8 kg (range, 8.5-55.2 kg). Baseline mean pulmonary artery pressure was 49 ± 19 mm Hg, and mean indexed pulmonary vascular resistance was 10 ± 5.4 Wood units. Etiology of pulmonary hypertension varied, and all were on systemic pulmonary hypertensive medications.

    Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03 U/kg; and patients 11-15 received epinephrine 1 μg/kg. Hemodynamics was measured continuously for up to 10 minutes following study drug administration.

    Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine, five of five patients receiving arginine vasopressin, and three of five patients receiving epinephrine. Although all three medications resulted in an increase in aortic pressure, only arginine vasopressin consistently resulted in a decrease in the ratio of systolic pulmonary artery-to-aortic pressure.
  10. Use Case: Clinical Trial Results - Predictive • Annotate data, then train a supervised model • For this use case, required steps include ◦ NER/span categorization: identify patient groups, drugs, doses, frequencies, outcomes, … ◦ Relation extraction: find the relations between identified entities
  11. Use Case: Clinical Trial Results - Predictive (annotation screenshot in prodi.gy)
  12. Use Case: Clinical Trial Results - Predictive • Config for serializability & reproducibility of NLP pipelines • spaCy has built-in architectures for NER, spancat, textcat, tagger, dependency parser, …, plus support for custom models and components • python -m spacy train my_config.cfg --output ./my_output
  13. Use Case: Clinical Trial Results - Generative • Write prompt • Optional: few-shot examples • Assumption: model was trained on domain knowledge

    Summarize the trial results in a structured fashion like so:
    • Patient group: <name>
    • Number of patients in the group: <number>
    • Treatment drug or substance: <drug>
    • Treatment dose: <dose>
    • Treatment frequency of administration: <frequency>
    • Treatment duration: <duration>
    • Outcome: <outcome>
  14. Use Case: Clinical Trial Results - Generative (prompt and LLM responses)

    Patient group: Phenylephrine Group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine. Increase in aortic pressure observed.

    Patient group: Arginine Vasopressin Group
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five patients receiving arginine vasopressin. Increase in aortic pressure observed.

    …
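Responses in this shape only become useful downstream once they are parsed. A minimal stdlib sketch - not spacy-llm's actual parser, with field names assumed from the prompt template - could look like this:

```python
def parse_trial_summary(response: str) -> list[dict]:
    """Split an LLM reply of repeated 'Field: value' lines into one dict per patient group."""
    groups, current = [], {}
    for line in response.splitlines():
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field == "Patient group" and current:
            groups.append(current)  # a new block starts
            current = {}
        current[field] = value
    if current:
        groups.append(current)
    return groups

reply = (
    "Patient group: Phenylephrine Group\n"
    "Number of patients in the group: 5\n"
    "Treatment dose: 1 μg/kg\n"
    "Patient group: Arginine Vasopressin Group\n"
    "Number of patients in the group: 5\n"
    "Treatment dose: 0.03 U/kg\n"
)
for group in parse_trial_summary(reply):
    print(group["Patient group"], "-", group["Treatment dose"])
```

The "Patient group" field doubles as the record delimiter here, which only works because the prompt pins down the field order - one reason prompt and parser belong together.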
  15. Use Case: Clinical Trial Results - Generative • NLP is solved!
  16. Use Case: Clinical Trial Results - Generative • (or maybe not)
  17. Issues with LLMs for (S)NLP • Hallucinations / incorrect replies • Variability in responses • Issues with quantities • No mapping to standardized data structures • Latency + rate limits • Limited context length • Redundant/unnecessary work • Costly • Data leakage • … → Challenges in adoption for real-world use cases

    Example (variability) - Treatment frequency of administration: “Administered once”, “Single administration”, “One-time dose”, “One time”, “Single dose”, “One-time administration”, “once” …

    Example (quantities/structure) - Number of patients: 15; Treatment drug or substance: Group 1: Patients 1-5 received phenylephrine 1 μg/kg; Group 2: Patients 6-10 received arginine vasopressin 0.03 U/kg; Group 3: Patients 11-15 received epinephrine 1 μg/kg
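Variability like the frequency strings above can be tamed with post-hoc normalization. A sketch using a hand-made, hypothetical alias table:

```python
# Hypothetical alias table for the frequency variants listed above.
FREQUENCY_ALIASES = {
    "single dose": {
        "administered once", "single administration", "one-time dose",
        "one time", "single dose", "one-time administration", "once",
    },
}

def normalize_frequency(raw: str) -> str:
    """Collapse LLM response variants onto one canonical frequency value."""
    cleaned = raw.strip().strip('"“”').lower()
    for canonical, aliases in FREQUENCY_ALIASES.items():
        if cleaned in aliases:
            return canonical
    return "unknown"  # flag for review instead of passing noise downstream

print(normalize_frequency("One-time administration"))  # -> single dose
print(normalize_frequency("twice daily"))              # -> unknown
```

Returning an explicit "unknown" instead of the raw string keeps unexpected phrasings from silently leaking into downstream systems.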
  18. Issues with LLMs for (S)NLP • Some issues are inherent to models/APIs: ◦ Hallucinations, math problems, round-trip times, etc. ◦ Can be mitigated with pre-training, fine-tuning, RLHF, tool use, etc. • Others can be mitigated with tooling around LLMs: ◦ Break the problem down into several chained tasks ◦ Robust parsing ◦ Quality assurance ◦ Rich data structures for results and metadata ◦ Swap generative for predictive models for tasks where appropriate
  19. Issues with LLMs for (S)NLP (same slide as before; the tooling-based mitigations are bracketed and labeled: spacy-llm)
  20. spacy-llm: SNLP with LLMs • spaCy extension - uses spaCy’s data structures, pipeline concept, config system and other functionality • Core idea: a pipeline of SNLP problems, solved with LLMs ◦ Each problem is solved by a task, which is responsible for the prompt, prompt splitting and parsing ◦ Highly configurable ◦ Maps results onto spaCy’s data structures ◦ In a pipeline, it is easy to swap out LLMs for predictive models and vice versa → easy prototyping • Currently at 0.6.4, 1.0.0 coming soon
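The task concept can be sketched as a tiny class. This is a simplified, hypothetical stand-in - real spacy-llm tasks operate on spaCy Doc objects and are registered via the config system - but it shows the division of labor: the task owns both the prompt and the parsing.

```python
from typing import Iterable

class SentimentTask:
    """Toy stand-in for a spacy-llm task: builds prompts, parses responses."""

    LABELS = {"POSITIVE", "NEGATIVE"}

    def generate_prompts(self, docs: Iterable[str]) -> Iterable[str]:
        for doc in docs:
            yield f"Reply with exactly POSITIVE or NEGATIVE.\nText: {doc}"

    def parse_responses(
        self, docs: Iterable[str], responses: Iterable[str]
    ) -> Iterable[tuple[str, str]]:
        for doc, response in zip(docs, responses):
            label = response.strip().upper()
            # Robust parsing: anything outside the expected label set is rejected.
            yield doc, label if label in self.LABELS else "UNKNOWN"

task = SentimentTask()
docs = ["Great pizza!", "Terrible service."]
fake_llm_replies = ["positive\n", "It was bad"]  # stand-in for real LLM output
print(list(task.parse_responses(docs, fake_llm_replies)))
```

Because prompting and parsing live in one object, swapping the backing model leaves the task untouched - which is what makes the open/closed model swap a pure config change.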
  21. spacy-llm: Integrations • Models: integration with Hugging Face, LLM providers, LangChain • Tasks: ◦ Built-in tasks include NER, REL, sentiment analysis, summarization, translation, QA, entity linking, lemmatization, span categorization, text categorization, … ◦ Easy to add new tasks • Is a spaCy component: integrates into its config and pipeline system and supports all the usual features like parallelization* and serialization • Batching, response logging for easier debugging, caching
  22. spacy-llm: Workflow and Use Cases I • LLM-assisted annotation - for: evaluation data, training data, examples for few-shot learning • Flow: LLM zero-shot predictions → manual curation in prodi.gy
  23. spacy-llm: Workflow and Use Cases II • Preprocessing text before prompting the LLM ◦ PII: recognize and replace personally identifiable information ◦ Remove non-informative boilerplate snippets ◦ … • Pipeline: PII NER → LLM
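The PII step can be illustrated with a naive, regex-only masker. This is a sketch with made-up patterns - a real pipeline would use an NER/PII component rather than hand-written regexes:

```python
import re

# Hypothetical patterns: mask e-mail addresses and phone-like numbers
# before the text ever reaches an external LLM provider.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s/-]{7,}\d"), "[PHONE]"),
]

def mask_pii(text: str) -> str:
    """Replace matched PII spans with placeholders, in pattern order."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(mask_pii("Contact jane.doe@example.com or +43 660 1234567."))
```

The placeholders survive the LLM round trip, so the original values can stay on-premise and be re-substituted into the response afterwards.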
  24. spacy-llm: Workflow and Use Cases III • Only send texts (sentences/paragraphs/documents) with certain topics or entities to the LLM ◦ Avoid unnecessary costs ◦ Adjust the prompt according to earlier classification and/or identified entities ◦ … • Pipeline: TextCat / NER → LLM
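The gating step can be sketched with a simple keyword filter standing in for a trained text classifier (the topic list is made up for illustration):

```python
def should_send_to_llm(text: str, topics: set[str]) -> bool:
    """Gate: only forward texts that mention a relevant topic to the costly LLM."""
    lowered = text.lower()
    return any(topic in lowered for topic in topics)

TOPICS = {"clinical trial", "dosage", "hemodynamics"}
texts = [
    "Quarterly earnings were up 5%.",
    "The clinical trial enrolled 15 patients.",
]
forwarded = [t for t in texts if should_send_to_llm(t, TOPICS)]
print(forwarded)  # only the trial-related text reaches the LLM
```

In a real spacy-llm pipeline the same effect comes from running a cheap textcat/NER component first and prompting the LLM only for documents that pass the gate.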
  25. spacy-llm: Workflow and Use Cases IV • LLM response postprocessing ◦ Quality assurance / fact-checking ◦ Response normalization: improve response robustness for downstream tasks ◦ Hook up to external knowledge bases ◦ … • Pipeline: LLM → rules → entity linking
  26. spacy-llm: How To

    [nlp]
    lang = "en"
    pipeline = ["llm_ner"]

    [components]

    [components.llm_ner]
    factory = "llm"

    [components.llm_ner.task]
    @llm_tasks = "spacy.NER.v3"
    labels = SIZE,TYPE,TOPPING,PRODUCT

    [components.llm_ner.model]
    @llm_models = "spacy.GPT-3-5.v3"
    name = "gpt-3.5-turbo"

    Run pipeline:

    nlp = assemble(config_path)
    doc = nlp(text)
    for ent in doc.ents:
        print(ent.text, ent.label_)

    Swapping in an open model only changes the config:

    [components.llm_ner.task]
    @llm_tasks = "spacy.NER.v3"
    labels = SIZE,TYPE

    [components.llm_ner.model]
    @llm_models = "spacy.Mistral.v1"
    name = "Mistral-7B-v0.1"
  27. Recap • SNLP unlocks information from text and makes it available to downstream business applications in a structured form • LLMs have impressive text generation/understanding abilities • It’s become super easy to prototype NLP applications with LLMs • When building a production-ready pipeline, you need to consider other traits such as customizability, robustness, inference cost, network latency, etc. • spaCy is a production-ready NLP framework written for developers • Its extension spacy-llm allows easy integration of LLMs into structured NLP pipelines • LLM-assisted annotation allows fast bootstrapping of training/evaluation data
  28. Thank you! • [email protected] • https://www.linkedin.com/in/raphaelmitsch/ • https://github.com/explosion/spaCy • https://github.com/explosion/spacy-llm • https://explosion.ai/ • https://prodi.gy • https://explosion.ai/blog/against-llm-maximalism