Incorporating LLMs into practical NLP workflows

Ines Montani, Explosion

spaCy
Open-source library for industrial-strength Natural Language Processing
• 100k+ users
• 130m+ downloads
→ spacy.io


prodigy
Annotation tool for creating training data for machine learning models
• 8000+ users
→ prodigy.ai


incorporating LLMs* into practical NLP workflows
* large language models

practical workflows
• supervised learning
• tell computers exactly what to do
• needs enough good data
• ML + business logic

LLMs as a tool
#1 specific is better
#2 faster is better
#3 private is better
#4 better is better

problems
• prompt engineering
• inconsistent results
• unstructured responses
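
One way to cope with unstructured responses is to ask for a fixed line format and parse defensively. The sketch below is only an illustration: the `LABEL: item, item` response format, the function name `parse_ner_response` and the sample response are assumptions, not something shown in the talk.

```python
import re
from typing import Dict, List


def parse_ner_response(response: str, labels: List[str]) -> Dict[str, List[str]]:
    """Parse lines like 'ORG: Microsoft, GitHub' into a dict keyed by label.

    Lines that do not match the expected format, or that use an unknown
    label, are ignored instead of raising an error.
    """
    allowed = {label.upper(): label for label in labels}
    result: Dict[str, List[str]] = {label: [] for label in labels}
    for line in response.splitlines():
        match = re.match(r"^\s*([A-Za-z_]+)\s*:\s*(.*)$", line)
        if not match:
            continue  # conversational filler, not a "LABEL: ..." line
        key = match.group(1).upper()
        if key not in allowed:
            continue  # label the model made up
        for item in match.group(2).split(","):
            item = item.strip().strip("\"'")
            if item and item.lower() not in {"none", "n/a"}:
                result[allowed[key]].append(item)
    return result


# Hypothetical model output: inconsistent casing plus extra chatter.
response = """Sure! Here are the entities:
org: Microsoft, GitHub
MONEY: $7.5 billion
Person: none
"""
parsed = parse_ner_response(response, ["ORG", "MONEY", "PERSON"])
print(parsed)
```

Splitting on commas is a simplification: an amount such as "$7,500" would be split in two, so a real parser would need a stricter output format.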

working with LLMs
• iterative (prompting, parsing)
• evaluation is extremely important
• improve, not replace task-specific models
scriptable workflows
human in the loop
business logic
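
To make the evaluation point concrete, here is a minimal sketch that scores predicted entity spans against hand-corrected gold spans using exact-match precision, recall and F1. The helper names and the "predicted" spans are invented for illustration and do not come from the talk.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def make_span(text: str, phrase: str, label: str) -> Span:
    """Build a span from the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def score_spans(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match scores: a span counts only if offsets and label agree."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


text = "Microsoft acquires software development platform GitHub for $7.5 billion"
gold = {
    make_span(text, "Microsoft", "ORG"),
    make_span(text, "GitHub", "ORG"),
    make_span(text, "$7.5 billion", "MONEY"),
}
# Hypothetical model output with one wrong label.
predicted = {
    make_span(text, "Microsoft", "ORG"),
    make_span(text, "GitHub", "PRODUCT"),
    make_span(text, "$7.5 billion", "MONEY"),
}
scores = score_spans(gold, predicted)
print(scores)
```

Running the same scoring after each prompt change gives a rough way to tell whether an iteration actually helped.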

→ github.com/explosion/prodigy-openai-recipes

→ prodigy.ai
• query LLM and parse response
• tune prompt if needed
• correct mistakes
• add correct answer to prompt to tune it
• generate and display reason
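
The step "add correct answer to prompt to tune it" can be sketched as few-shot prompt construction: each corrected annotation becomes a worked example in the next prompt. The prompt wording, the function name `build_prompt` and the second example text are assumptions for illustration. This is not the format used by prodigy-openai-recipes, and the actual LLM query is left out.

```python
from typing import Dict, List


def build_prompt(text: str, labels: List[str], examples: List[Dict]) -> str:
    """Assemble a few-shot prompt from human-corrected examples."""
    lines = [
        "Extract the named entities from the text.",
        "Answer with one line per label in the form 'LABEL: item, item'.",
        f"Labels: {', '.join(labels)}",
        "",
    ]
    for example in examples:
        lines.append(f"Text: {example['text']}")
        for label in labels:
            items = example["entities"].get(label, [])
            lines.append(f"{label}: {', '.join(items) if items else 'none'}")
        lines.append("")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


labels = ["ORG", "MONEY"]
# A corrected annotation, as it might come back from the human in the loop.
corrected = [
    {
        "text": "Microsoft acquires software development platform GitHub for $7.5 billion",
        "entities": {"ORG": ["Microsoft", "GitHub"], "MONEY": ["$7.5 billion"]},
    }
]
new_text = "Explosion releases a new version of spaCy"  # invented input
prompt = build_prompt(new_text, labels, corrected)
print(prompt)
```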

reality is not an end-to-end prediction problem

“Microsoft acquires software development platform GitHub for $7.5 billion”
• TEXT CLASSIFIER
• ENTITY RECOGNIZER
• ENTITY LINKER
• ATTRIBUTE LOOKUP
• CURRENCY NORMALIZER
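
Some of these components are plain business logic rather than prediction. As a rough sketch, a currency normalizer for the example sentence could be a few lines of rules. The pattern, the lookup tables and the choice to map "$" to USD are assumptions made for this illustration.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

MULTIPLIERS = {
    "thousand": 10**3,
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
}
# Assumption: "$" is read as US dollars, which is not always true.
SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}

PATTERN = re.compile(
    r"([$€£])\s*(\d+(?:\.\d+)?)\s*(thousand|million|billion|trillion)?",
    re.IGNORECASE,
)


def normalize_money(text: str) -> Optional[Tuple[Decimal, str]]:
    """Return (amount, currency code) for the first money mention, if any."""
    match = PATTERN.search(text)
    if match is None:
        return None
    symbol, number, scale = match.groups()
    multiplier = MULTIPLIERS[scale.lower()] if scale else 1
    # Decimal avoids floating-point rounding on amounts like 7.5.
    return Decimal(number) * multiplier, SYMBOLS[symbol]


sentence = "Microsoft acquires software development platform GitHub for $7.5 billion"
result = normalize_money(sentence)
print(result)
```

Rules like these are deterministic and cheap to test, so there is little reason to hand this step to a model.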

summary
• LLMs are a great tool for creating better data faster and iteratively
• you’ll always need task-specific data
• many new applications in the future
→ github.com/explosion/prodigy-openai-recipes

future work
• data structures for result parsing
• workflows for robust evaluation
• interactive prompt testing
• support for open-source models

thank you!
💥 Explosion explosion.ai
📲 Twitter @_inesmontani
📲 Mastodon @[email protected]
