Incorporating LLMs into practical NLP workflows

In this talk, I'll show how large language models such as GPT-3 complement rather than replace existing machine learning workflows. Initial annotations are gathered from the OpenAI API via zero- or few-shot learning, and then corrected by a human decision maker using an annotation tool. The resulting annotations can then be used to train and evaluate models as normal. This process results in higher accuracy than can be achieved from the OpenAI API alone, with the added benefit that you'll own and control the model for runtime.
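
A minimal sketch of the first half of this loop, assuming a plain-text few-shot prompt for entity extraction. The `Example` class, the `build_prompt` helper and the prompt wording are illustrative assumptions, not the code of any particular tool, and the actual call to the OpenAI API is deliberately left out. The idea is that every annotation a human has corrected can be added to the prompt as a new example:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    """A human-corrected annotation (hypothetical structure)."""
    text: str
    entities: Dict[str, List[str]]  # label -> entity strings found in the text


def format_entities(labels: List[str], entities: Dict[str, List[str]]) -> str:
    """Render one 'LABEL: item1, item2' line per label."""
    return "\n".join(
        f"{label}: {', '.join(entities.get(label, []))}" for label in labels
    )


def build_prompt(labels: List[str], examples: List[Example], text: str) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt."""
    lines = [
        f"Extract the entities ({', '.join(labels)}) from the text.",
        "Answer with one line per label, formatted as 'LABEL: item1, item2'.",
        "",
    ]
    # Each corrected annotation becomes a worked example in the prompt.
    for example in examples:
        lines += [f"Text: {example.text}", format_entities(labels, example.entities), ""]
    lines.append(f"Text: {text}")
    return "\n".join(lines)


labels = ["ORG", "MONEY"]
corrected = Example(
    text="Microsoft acquires software development platform GitHub for $7.5 billion",
    entities={"ORG": ["Microsoft", "GitHub"], "MONEY": ["$7.5 billion"]},
)
zero_shot = build_prompt(labels, [], "Some new text")
few_shot = build_prompt(labels, [corrected], "Some new text")
```

With an empty example list this is a zero-shot prompt; each correction made in the annotation tool can be appended to `examples` to turn it into a few-shot prompt.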

Video: https://youtu.be/Bd2ciwinFUE

Ines Montani

April 17, 2023

Transcript

  1. Ines Montani
    Explosion
    incorporating llms into practical nlp workflows

  2. spaCy
    Open-source library for industrial-strength Natural Language Processing
    100k+ USERS
    130m+ DOWNLOADS
    → spacy.io

  3. → spacy.io

  4. prodigy
    Annotation tool for creating training data for machine learning models
    8000+ USERS
    → prodigy.ai

  5. → prodigy.ai

  6. incorporating llms* into practical nlp workflows
    * large language models

  7. practical workflows

  8. practical workflows
    • supervised learning

  9. practical workflows
    • supervised learning
    • tell computers exactly what to do

  10. practical workflows
    • supervised learning
    • tell computers exactly what to do
    • needs enough good data

  11. practical workflows
    • supervised learning
    • tell computers exactly what to do
    • needs enough good data
    • ML + business logic

  12. LLMs as a tool #1
    specific is better

  13. LLMs as a tool #2
    faster is better

  14. LLMs as a tool #3
    private is better

  15. LLMs as a tool #4
    better is better

  16. problems

  17. problems
    • prompt engineering

  18. problems
    • prompt engineering
    • inconsistent results

  19. problems
    • prompt engineering
    • inconsistent results
    • unstructured responses
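
To illustrate the last two problems, here is a minimal sketch of a tolerant parser, assuming the model was asked to answer with one `LABEL: item1, item2` line per label. The response format, the `parse_response` name and the sample response are assumptions made for this example. It ignores chatty preamble and unrequested labels, and drops any item that does not literally occur in the input text:

```python
import re
from typing import Dict, List

# Optional bullet or list number, then "LABEL: items" (assumed response format).
LINE_PATTERN = re.compile(r"^\s*(?:[-*•]|\d+\.)?\s*([A-Za-z_]+)\s*:\s*(.*)$")


def parse_response(response: str, labels: List[str], text: str) -> Dict[str, List[str]]:
    """Turn a free-form LLM answer into a dict of label -> entity strings."""
    known = {label.upper(): label for label in labels}
    parsed: Dict[str, List[str]] = {label: [] for label in labels}
    for line in response.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # preamble, blank lines and other non-matching output
        label = known.get(match.group(1).upper())
        if label is None:
            continue  # a label that was not asked for
        # Naive comma split: an item that itself contains a comma would be broken up.
        for item in match.group(2).split(","):
            item = item.strip().strip("\"'")
            # Keep only strings that really appear in the text, without duplicates.
            if item and item in text and item not in parsed[label]:
                parsed[label].append(item)
    return parsed


text = "Microsoft acquires software development platform GitHub for $7.5 billion"
response = """Sure! Here are the entities:
- org: Microsoft, GitHub, OpenAI
MONEY: "$7.5 billion"
PERSON: none"""
result = parse_response(response, ["ORG", "MONEY"], text)
```

Whatever survives this kind of parsing is still only a suggestion: it is what gets shown to the human annotator for correction.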

  20. working with llms

  21. working with llms
    • iterative (prompting, parsing)

  22. working with llms
    • iterative (prompting, parsing)
    • evaluation is extremely important

  23. working with llms
    • iterative (prompting, parsing)
    • evaluation is extremely important
    • improve, not replace task-specific models

  24. working with llms
    • iterative (prompting, parsing)
    • evaluation is extremely important
    • improve, not replace task-specific models
    scriptable workflows

  25. working with llms
    • iterative (prompting, parsing)
    • evaluation is extremely important
    • improve, not replace task-specific models
    scriptable workflows
    human in the loop

  26. working with llms
    • iterative (prompting, parsing)
    • evaluation is extremely important
    • improve, not replace task-specific models
    scriptable workflows
    human in the loop
    business logic
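
The point about evaluation can be sketched in a few lines, assuming each example's annotations are represented as a set of `(label, text)` pairs and that the human-corrected annotations serve as the gold standard. The `precision_recall_f1` helper and the sample data are assumptions for illustration only:

```python
from typing import List, Set, Tuple

Annotation = Tuple[str, str]  # (label, entity text), an assumed representation


def precision_recall_f1(
    gold: List[Set[Annotation]], predicted: List[Set[Annotation]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over aligned examples."""
    true_pos = false_pos = false_neg = 0
    for gold_set, pred_set in zip(gold, predicted):
        true_pos += len(gold_set & pred_set)   # predicted and correct
        false_pos += len(pred_set - gold_set)  # predicted but wrong
        false_neg += len(gold_set - pred_set)  # missed by the model
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Human-corrected annotations versus raw LLM suggestions for one example.
gold = [{("ORG", "Microsoft"), ("ORG", "GitHub"), ("MONEY", "$7.5 billion")}]
llm = [{("ORG", "Microsoft"), ("ORG", "GitHub"), ("ORG", "OpenAI")}]
precision, recall, f1 = precision_recall_f1(gold, llm)
```

Running the same function on the output of a task-specific model trained on the corrected data gives a like-for-like comparison with the raw LLM suggestions.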

  27. → github.com/explosion/prodigy-openai-recipes

  28. → prodigy.ai

  29. → prodigy.ai
    query LLM and parse response

  30. → prodigy.ai
    query LLM and parse response
    tune prompt if needed

  31. → prodigy.ai

  32. → prodigy.ai

  33. → prodigy.ai
    correct mistakes

  34. → prodigy.ai
    correct mistakes

  35. → prodigy.ai
    correct mistakes
    add correct answer to prompt to tune it

  36. → prodigy.ai

  37. → prodigy.ai
    generate and display reason

  38. → prodigy.ai

  39. reality is not an end-to-end prediction problem

  40. “Microsoft acquires software development
    platform GitHub for $7.5 billion”

  41. “Microsoft acquires software development
    platform GitHub for $7.5 billion”

  42. TEXT CLASSIFIER
    “Microsoft acquires software development
    platform GitHub for $7.5 billion”

  43. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    “Microsoft acquires software development
    platform GitHub for $7.5 billion”

  44. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    ENTITY LINKER
    “Microsoft acquires software development
    platform GitHub for $7.5 billion”

  45. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    ENTITY LINKER
    ATTRIBUTE LOOKUP
    “Microsoft acquires software development
    platform GitHub for $7.5 billion”

  46. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    ENTITY LINKER
    ATTRIBUTE LOOKUP
    CURRENCY NORMALIZER
    “Microsoft acquires software development
    platform GitHub for $7.5 billion”

  47. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    ENTITY LINKER
    ATTRIBUTE LOOKUP
    CURRENCY NORMALIZER
    “Microsoft acquires software development
    platform GitHub for $7.5 billion”
    *
    *
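
Not every step in this pipeline has to be a learned model. As a rough sketch of the last component, a currency normalizer can be plain business logic. The `normalize_currency` function, the supported symbols and scale words, and the choice to read "$" as US dollars are all assumptions made for this example:

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Assumed lookup tables; a real system would need more symbols and formats.
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9, "trillion": 10**12}
SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}  # assumption: "$" means US dollars

MONEY_PATTERN = re.compile(
    r"([$€£])\s*(\d+(?:\.\d+)?)\s*(thousand|million|billion|trillion)?",
    flags=re.IGNORECASE,
)


def normalize_currency(span: str) -> Optional[Tuple[Decimal, str]]:
    """Convert a money span like '$7.5 billion' to (amount, currency code)."""
    match = MONEY_PATTERN.fullmatch(span.strip())
    if match is None:
        return None  # leave unrecognized formats for a human or another rule
    symbol, number, scale = match.groups()
    # A missing scale word means the number is already the full amount.
    amount = Decimal(number) * SCALES.get((scale or "").lower(), 1)
    return amount, SYMBOLS[symbol]


normalized = normalize_currency("$7.5 billion")
```

The entity recognizer only has to find the span "$7.5 billion"; turning it into a number is deterministic and easy to test on its own.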

  48. → github.com/explosion/prodigy-openai-recipes
    summary

  49. → github.com/explosion/prodigy-openai-recipes
    summary
    • LLMs are a great tool for creating better data faster and iteratively

  50. → github.com/explosion/prodigy-openai-recipes
    summary
    • LLMs are a great tool for creating better data faster and iteratively
    • you’ll always need task-specific data

  51. → github.com/explosion/prodigy-openai-recipes
    summary
    • LLMs are a great tool for creating better data faster and iteratively
    • you’ll always need task-specific data
    • many new applications in the future

  52. future work

  53. future work
    • data structures for result parsing

  54. future work
    • data structures for result parsing
    • workflows for robust evaluation

  55. future work
    • data structures for result parsing
    • workflows for robust evaluation
    • interactive prompt testing

  56. future work
    • data structures for result parsing
    • workflows for robust evaluation
    • interactive prompt testing
    • support for open-source models

  57. 💥 Explosion: explosion.ai
    📲 Twitter: @_inesmontani
    📲 Mastodon: @[email protected]
    thank you!