Herding LLMs Towards Structured NLP

spacy-llm is spaCy's recent integration with large language models and is maintained by the team behind spaCy. In this talk I elaborate on the motivation behind this library, which problems it tries to solve, and lessons learned while working on it.

Recent LLMs exhibit impressive capabilities and enable extremely fast prototyping of NLP applications. spacy-llm addresses some of the challenges in making LLMs production-ready:
- LLMs turn text into … more text. NLP applications however often aim to extract structured information from text and use it further downstream.
- spacy-llm parses LLM responses and maps the parsed responses onto existing data structures for documents, spans and tokens.
- Closed and open models have markedly different drawbacks. Closed models aren't free, add network latency, are inflexible black boxes, aren't suitable for all (commercial) use cases - check the TOS! - and may leak user data. You may also get rate-limited.
- Open models require considerable computing power to run locally, are more complicated to set up, and are still less capable than (some) closed models.
- Your mileage with closed and open models may therefore vary depending on where you are in your dev cycle. spacy-llm allows smoothly transitioning between open and closed LLMs without touching any code.

spacy-llm plugs open and proprietary LLMs into spaCy, leveraging its modular and customizable framework for working with text. This allows for a cheaper, faster and more robust NLP workflow - driven by cutting-edge LLMs.
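As an illustration of that code-free swap: changing the model is a config-level edit in spacy-llm. A minimal sketch (model registry names follow spacy-llm's conventions; the rest of the pipeline config is omitted):

```ini
# Hosted, closed model:
[components.llm.model]
@llm_models = "spacy.GPT-3-5.v3"
name = "gpt-3.5-turbo"

# Local, open model - swap in by replacing the section above:
# [components.llm.model]
# @llm_models = "spacy.Mistral.v1"
# name = "Mistral-7B-v0.1"
```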

Raphael Mitsch

December 12, 2023
Transcript

  1. 2023-12-12 | Raphael Mitsch (Explosion) | Global AI Conference 2023
    Herding LLMs Towards
    Structured NLP
    With spacy-llm

  2. On Structured NLP (SNLP)
    ● Goal of extracting a defined set of attributes from texts
    ○ Entities (locations, persons, …), lemmas, categories, …
    ○ “Classic” NLP: predictive models
    ○ SOTA models: often BERT-level transformer models
    ● Real-life applications chain together several of these tasks
    ○ E.g. entity recognition, entity linking, …
    ● Downstream applications often depend on tangible, unambiguous information
    ○ On doc level: e.g. doc category; on span level: e.g. entity; on token level: e.g. lemma, POS tags, …
    ● Libraries: spaCy, Stanza, Gensim, Hugging Face, …

  3. spaCy
    ● Free, open-source library
    ● Designed for production use
    ● Focus on dev productivity
    ● Free course: https://course.spacy.io

  4. spaCy
    ● A modular pipeline approach for linguistic analysis
    ● Transforms unstructured text into structured data objects
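A minimal sketch of that unstructured-text-to-structured-objects step (assumes spaCy is installed; a blank English pipeline, so only the tokenizer runs):

```python
import spacy

# Even a blank pipeline turns a raw string into a structured Doc object
# whose tokens expose attributes like text, offsets and whitespace.
nlp = spacy.blank("en")
doc = nlp("Patients received epinephrine.")
tokens = [token.text for token in doc]
print(tokens)
```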

  5. On Large Language Models
    ● Generative as opposed to predictive
    ● Pros
    ○ Great for prototyping, zero-shot, low dev effort, versatility, …
    ○ Can yield great results, maybe surpassing small predictive models for some tasks with proper prompting
    ● Cons
    ○ Latency, costs/hardware requirements, free-form text, hallucinations
    ● Libraries: Hugging Face, llama.cpp, LangChain, …
    ● Providers: OpenAI, Anthropic, Cohere, Google, Amazon, …

  6. SNLP & LLMs: Not a Dichotomy
    ● Doing SNLP with LLMs is possible, but usually gets less attention than “AI magic”
    ● LLMs generate relatively unconstrained responses
    ○ Turn text into…text
    ○ Can be narrowed down via e.g. pre-training, fine-tuning, prompts, guardrails
    ○ Parsing necessary for SNLP
    ● Modular vs. monolithic approach (not necessarily, but often)
    ● SNLP more likely to be suitable for industrial/real-life applications
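To make the parsing point concrete, a pure-Python sketch (field handling is illustrative, not spacy-llm's actual parser) that maps a key-value style LLM response onto a structured record:

```python
def parse_response(response: str) -> dict:
    """Map a 'Key: value' style LLM response onto a dict of snake_case fields."""
    record = {}
    for line in response.splitlines():
        if ":" not in line:
            continue  # skip free-form chatter around the structured part
        key, _, value = line.partition(":")
        record[key.strip().lower().replace(" ", "_")] = value.strip()
    return record

raw = "Patient group: Phenylephrine Group\nTreatment dose: 1 ug/kg"
parsed = parse_response(raw)
print(parsed)
```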

  7. “It's never great to realize you're the guy on the left. But, here I am.”
    https://explosion.ai/blog/against-llm-maximalism

  8. Use Case: Clinical Trial Results
    ● Task: extract information from human-written notes on clinical trial results (e.g. from a paper)
    ● How does workflow compare between predictive (small) and generative (LLM)?
    ● Specialized domain, available predictive models trained on general corpora not accurate enough

  9. Use Case: Clinical Trial Results
    Patients: Eleven of 15 participants were women, median age was 9.2 years (range, 1.7-14.9 yr), and median weight was 26.8 kg (range,
    8.5-55.2 kg). Baseline mean pulmonary artery pressure was 49 ± 19 mm Hg, and mean indexed pulmonary vascular resistance was 10 ±
    5.4 Wood units. Etiology of pulmonary hypertension varied, and all were on systemic pulmonary hypertension medications.
    Interventions: Patients 1-5 received phenylephrine 1 μg/kg; patients 6-10 received arginine vasopressin 0.03 U/kg; and patients 11-15
    received epinephrine 1 μg/kg. Hemodynamics was measured continuously for up to 10 minutes following study drug administration.
    Measurements and main results: After study drug administration, the ratio of pulmonary-to-systemic vascular resistance decreased in
    three of five patients receiving phenylephrine, five of five patients receiving arginine vasopressin, and three of five patients receiving
    epinephrine. Although all three medications resulted in an increase in aortic pressure, only arginine vasopressin consistently resulted in
    a decrease in the ratio of systolic pulmonary artery-to-aortic pressure.

  10. Use Case: Clinical Trial Results - Predictive
    ● Annotate data, then train supervised model
    ● For this use case: required steps include
    ○ NER/span categorization: identify patient groups, drugs, doses, frequencies, outcomes, …
    ○ Relation extraction: find the relations between identified entities

  11. Use Case: Clinical Trial Results - Predictive
    prodi.gy

  12. Use Case: Clinical Trial Results - Predictive
    ● Config for serializability & reproducibility of NLP pipelines
    ● spaCy has built-in architectures for NER, spancat, textcat, tagger, dependency parser, …; support for custom models and components
    ● python -m spacy train my_config.cfg --output ./my_output
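For reference, my_config.cfg above is spaCy's INI-style training config; a heavily truncated sketch of its shape (real configs are typically generated with python -m spacy init config):

```ini
[paths]
train = "./train.spacy"
dev = "./dev.spacy"

[nlp]
lang = "en"
pipeline = ["ner"]

[components]

[components.ner]
factory = "ner"
```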

  13. Use Case: Clinical Trial Results - Generative
    ● Write prompt
    ● Optional: few-shot examples
    ● Assumption: model was trained on domain knowledge
    Summarize the trial results in a structured fashion like so:
    ● Patient group:
    ● Number of patients in the group:
    ● Treatment drug or substance:
    ● Treatment dose:
    ● Treatment frequency of administration:
    ● Treatment duration:
    ● Outcome:

  14. Use Case: Clinical Trial Results - Generative
    Responses:

    Patient group: Arginine Vasopressin Group
    Number of patients in the group: 5
    Treatment drug or substance: Arginine vasopressin
    Treatment dose: 0.03 U/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in all five patients receiving arginine vasopressin. Increase in aortic pressure observed. …

    Patient group: Phenylephrine Group
    Number of patients in the group: 5
    Treatment drug or substance: Phenylephrine
    Treatment dose: 1 μg/kg
    Treatment frequency of administration: Single administration
    Treatment duration: Not specified
    Outcome: The ratio of pulmonary-to-systemic vascular resistance decreased in three of five patients receiving phenylephrine. Increase in aortic pressure observed.

  15. Use Case: Clinical Trial Results - Generative
    NLP is solved!

  16. Use Case: Clinical Trial Results - Generative
    (or maybe not)

  17. Issues with LLMs for (S)NLP
    ● Hallucinations / incorrect replies
    ● Variability in responses
    ● Issues with quantities
    ● No mapping to standardized data structures
    ● Latency + rate limits
    ● Limited context length
    ● Redundant/unnecessary work
    ● Costly
    ● Data leakage
    ● …
    → Challenges in adaptation for real-world use cases
    Example - variability in responses:
    Treatment frequency of administration: “Administered once”, “Single administration”, “One-time dose”, “One time”, “Single dose”, “One-time administration”, “once”...

    Example - doses and groups mixed into one field:
    Number of patients: 15
    Treatment drug or substance:
    - Group 1: Patients 1-5 received phenylephrine 1 μg/kg
    - Group 2: Patients 6-10 received arginine vasopressin 0.03 U/kg
    - Group 3: Patients 11-15 received epinephrine 1 μg/kg

  18. Issues with LLMs for (S)NLP
    ● Some issues are inherent to models/APIs:
    ○ Hallucinations, math problems, round trip times etc.
    ○ Can be mitigated with pre-training, fine-tuning, RLHF, tool use etc.
    ● Others can be mitigated with tooling around LLMs:
    ○ Break problem down into several chained tasks
    ○ Robust parsing
    ○ Quality assurance
    ○ Rich data structures for results and metadata
    ○ Swap generative with predictive models for tasks where appropriate

  19. Issues with LLMs for (S)NLP
    (Same slide as before, with the tooling-based mitigations highlighted: spacy-llm)

  20. spacy-llm: SNLP with LLMs
    ● spaCy extension - uses spaCy’s data structures, pipeline concept, config system, and other functionality
    ● Core idea: a pipeline of SNLP problems, solved with LLMs
    ○ Each problem is solved by a task, which is responsible for the prompt, prompt splitting, and parsing
    ○ Highly configurable
    ○ Maps results onto spaCy’s data structures
    ○ In a pipeline: easy to swap LLMs with predictive models and vice versa → easy prototyping
    ● Currently at 0.6.4; 1.0.0 coming soon

  21. spacy-llm: Integrations
    ● Models: integration with Hugging Face, LLM providers, LangChain
    ● Tasks:
    ○ Built-in tasks include NER, REL, sentiment analysis, summarization, translation, QA, entity linking, lemmatization, span categorization, text categorization, …
    ○ Easy to add new tasks
    ● Is a spaCy component: integrates into its config and pipeline system and supports all the usual features like parallelization and serialization
    ● Batching, response logging for easier debugging, caching

  22. spacy-llm: Workflow & Use Cases

  23. spacy-llm: Workflow and Use Cases I
    ● LLM-assisted annotation - for: evaluation data, training data, examples for few-shot learning
    ● Pipeline: LLM zero-shot predictions → manual curation (in prodi.gy)

  24. spacy-llm: Workflow and Use Cases II
    ● Preprocessing text before prompting LLM
    ○ PII: recognize and replace personally identifiable information
    ○ Remove non-informative boilerplate snippets
    ○ …
    ● Pipeline: PII NER → LLM

  25. spacy-llm: Workflow and Use Cases III
    ● Only send texts (sentences/paragraphs/documents) with certain topics or entities to the LLM
    ○ Avoid unnecessary costs
    ○ Adjust prompt according to earlier classification and/or identified entities
    ○ …
    ● Pipeline: TextCat / NER → LLM
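The gating idea sketched in plain Python (classify_topic stands in for a cheap predictive TextCat component, and the returned prompt string for the LLM call; both are hypothetical stand-ins, not spacy-llm API):

```python
from typing import Optional

RELEVANT_TOPICS = {"clinical_trial"}

def classify_topic(text: str) -> str:
    # Stand-in for a fast, cheap predictive text classifier.
    return "clinical_trial" if "patients" in text.lower() else "other"

def maybe_prompt(text: str) -> Optional[str]:
    """Only build (and pay for) an LLM prompt when the classifier says so."""
    topic = classify_topic(text)
    if topic not in RELEVANT_TOPICS:
        return None  # skip the LLM entirely, saving cost and latency
    # The earlier classification can also shape the prompt itself:
    return f"Summarize this {topic} text in a structured fashion:\n{text}"

print(maybe_prompt("Eleven of 15 patients were women."))
print(maybe_prompt("The weather was nice."))
```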

  26. spacy-llm: Workflow and Use Cases IV
    ● LLM response postprocessing
    ○ Quality assurance / fact-checking
    ○ Response normalization: improve response robustness for downstream tasks
    ○ Hook up to external knowledge bases
    ○ …
    ● Pipeline: LLM → rules → entity linking
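Response normalization can be as simple as collapsing wording variants onto canonical values; a sketch reusing the frequency wordings from the issues slide (the synonym table is illustrative):

```python
# Illustrative synonym table mapping free-text variants to one canonical label.
FREQUENCY_SYNONYMS = {
    "administered once": "once",
    "single administration": "once",
    "one-time dose": "once",
    "one time": "once",
    "single dose": "once",
    "one-time administration": "once",
}

def normalize_frequency(value: str) -> str:
    """Collapse free-text frequency wordings onto one canonical label."""
    key = value.strip().lower().rstrip(".")
    return FREQUENCY_SYNONYMS.get(key, key)

print(normalize_frequency("Single administration"))
print(normalize_frequency("twice daily"))
```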

  27. spacy-llm: How To

    Config:

    [nlp]
    lang = "en"
    pipeline = ["llm_ner"]

    [components]

    [components.llm_ner]
    factory = "llm"

    [components.llm_ner.task]
    @llm_tasks = "spacy.NER.v3"
    labels = SIZE,TYPE,TOPPING,PRODUCT

    [components.llm_ner.model]
    @llm_models = "spacy.GPT-3-5.v3"
    name = "gpt-3.5-turbo"

    Run pipeline:

    from spacy_llm.util import assemble

    nlp = assemble(config_path)
    doc = nlp(text)
    for ent in doc.ents:
        print(ent.text, ent.label_)

    Swap in an open model by changing only the config:

    [components.llm_ner.task]
    @llm_tasks = "spacy.NER.v3"
    labels = SIZE,TYPE

    [components.llm_ner.model]
    @llm_models = "spacy.Mistral.v1"
    name = "Mistral-7B-v0.1"

  28. Recap
    ● SNLP unlocks information from text and makes it available to downstream business applications in a structured form
    ● LLMs have impressive text generation/understanding abilities
    ● It’s become super easy to prototype NLP applications with LLMs
    ● When building a production-ready pipeline, you need to consider other traits such as customizability, robustness, inference cost, network latency, etc.
    ● spaCy is a production-ready NLP framework written for developers
    ● Its extension spacy-llm allows easy integration of LLMs into structured NLP pipelines
    ● LLM-assisted annotation allows fast bootstrapping of training/evaluation data

  29. Thank you!
    [email protected]
    ● https://www.linkedin.com/in/raphaelmitsch/
    ● https://github.com/explosion/spaCy
    ● https://github.com/explosion/spacy-llm
    ● https://explosion.ai/
    ● https://prodi.gy
    ● https://explosion.ai/blog/against-llm-maximalism