spaCy meets LLMs: Using Generative AI for Structured Data

Large Language Models (LLMs) have enormous potential, but also challenge existing workflows in industry that require modularity, transparency and structured data. In this talk, I'll present pragmatic and practical approaches for how to use the latest generative models beyond just chat bots. I'll dive deeper into spaCy's LLM integration, which lets you plug in open-source and proprietary models and provides a robust framework for extracting structured information from text, distilling large models into smaller task-specific components, and closing the gap between prototype and production.

Ines Montani

June 11, 2024

Transcript

  1. carefully designed and consistent API; extensible and programmable; serializable data structures; good error handling; designed for production
  2. carefully designed and consistent API; extensible and programmable; serializable data structures; good error handling; pipeline approach to combine techniques and share data; designed for production
  3. carefully designed and consistent API; extensible and programmable; serializable data structures; good error handling; pipeline approach to combine techniques and share data; designed for production; “just works” pre-configured solutions
  4. HOW TO AVOID THE Prototype Plateau: standardize inputs and outputs; start with evaluation; assess utility, not just accuracy; explosion.ai/blog/applied-nlp-thinking
  5. HOW TO AVOID THE Prototype Plateau: standardize inputs and outputs; start with evaluation; assess utility, not just accuracy; work iteratively; explosion.ai/blog/applied-nlp-thinking
  6. HOW TO AVOID THE Prototype Plateau: standardize inputs and outputs; start with evaluation; assess utility, not just accuracy; consider structure and properties of language; work iteratively; explosion.ai/blog/applied-nlp-thinking
  7. Doc; llm; ... other components; Prompt Template; Model; Response Parser; Structured Attributes; unified, model-agnostic API; config.cfg; spacy.io/usage/large-language-models
  8. Doc; llm; ... other components; LLM; trained model; rules; Prompt Template; Model; Response Parser; Structured Attributes; unified, model-agnostic API; config.cfg; spacy.io/usage/large-language-models
  9. Doc; llm; ... other components; LLM; trained model; rules; Prompt Template; Model; Response Parser; Structured Attributes; unified, model-agnostic API; config.cfg; Entity Linking, Summarization, Entity Recognition, Span Categorization, Relation Extraction, Text Categorization, Sentiment Analysis, Translation; spacy.io/usage/large-language-models
  10. prototype; production; Doc; Doc; llm; Entity Recognizer; Text Categorizer; …; human-in-the-loop distillation; continuous evaluation; baseline
  11. prototype; production; Doc; Doc; llm; Entity Recognizer; Text Categorizer; …; human-in-the-loop distillation; continuous evaluation; baseline; distilled model
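
The flow named on the llm slides (Doc, Prompt Template, Model, Response Parser, Structured Attributes) can be sketched in plain Python. This is a minimal illustration using only the standard library, not spaCy's or spacy-llm's actual API: the `Doc` class, `PROMPT_TEMPLATE`, `stub_model`, `parse_response` and `llm_component` are all hypothetical names, and the "model" is a canned function standing in for any open-source or proprietary LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a document object: text plus structured attributes."""
    text: str
    # Each entity is (start_char, end_char, label).
    ents: List[Tuple[int, int, str]] = field(default_factory=list)


# Assumed prompt format for this sketch; a real task would define its own.
PROMPT_TEMPLATE = (
    "Extract entities with the labels {labels} from the text.\n"
    "Answer with one 'LABEL: span' pair per line.\n"
    "Text: {text}"
)


def render_prompt(doc: Doc, labels: List[str]) -> str:
    """Prompt Template step: turn the Doc into a text prompt."""
    return PROMPT_TEMPLATE.format(labels=", ".join(labels), text=doc.text)


def stub_model(prompt: str) -> str:
    """Model step: placeholder for an LLM call, returning a canned response.

    The last line names a span that is not in the text, to exercise the parser.
    """
    return (
        "PERSON: Ada Lovelace\n"
        "PERSON: Charles Babbage\n"
        "LOCATION: London\n"
        "PERSON: Alan Turing"
    )


def parse_response(doc: Doc, response: str, labels: List[str]) -> Doc:
    """Response Parser step: map free-text output onto structured attributes."""
    ents = []
    for line in response.splitlines():
        label, sep, span = line.partition(":")
        label, span = label.strip(), span.strip()
        if not sep or label not in labels or not span:
            continue  # skip malformed lines and unknown labels
        start = doc.text.find(span)
        if start == -1:
            continue  # skip spans that do not occur in the text
        ents.append((start, start + len(span), label))
    doc.ents = sorted(ents)
    return doc


def llm_component(doc: Doc, model: Callable[[str], str], labels: List[str]) -> Doc:
    """Doc in, Doc out: the model behind it can be swapped without changing callers."""
    prompt = render_prompt(doc, labels)
    response = model(prompt)
    return parse_response(doc, response, labels)


doc = llm_component(
    Doc("Ada Lovelace worked with Charles Babbage in London."),
    stub_model,
    ["PERSON", "LOCATION"],
)
for start, end, label in doc.ents:
    print(label, doc.text[start:end])
```

Because the component takes a Doc and returns a Doc with structured attributes, the same interface could in principle be served by an LLM, a trained model or rules, which is what makes it possible to evaluate a prototype against a baseline and later swap in a distilled model.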