
Workshop: Half hour of labeling power: Can we beat GPT?

Video: https://www.youtube.com/watch?v=Ta45SfbZNcM

Large Language Models (LLMs) offer a lot of value for modern NLP and can typically achieve surprisingly good accuracy on predictive NLP tasks with a reasonably structured prompt and pretty much no labelled examples. But can we do even better than that? It’s much more effective to use LLMs to create classifiers, instead of using them as classifiers. By using LLMs to assist with annotation, we can quickly create labelled data and systems that are much faster and much more accurate than using LLM prompts alone. In this workshop, we'll show you how to use LLMs at development time to create high-quality datasets and train specific, smaller, private and more accurate fine-tuned models for your business problems.

Ines Montani

November 01, 2023

Transcript

  1. Half hour of labeling power: Can we beat GPT?
    Ines Montani & Ryan Wesslen, Explosion

  2. spacy.io

  3. spacy.io
    Open-source library for industrial-strength natural language processing
    170m+ downloads

  4. spacy.io
    Open-source library for industrial-strength natural language processing
    170m+ downloads
    prodigy.ai

  5. spacy.io
    Open-source library for industrial-strength natural language processing
    170m+ downloads
    prodigy.ai
    Modern scriptable annotation tool for machine learning developers
    9k+ users, 800+ companies

  6. spacy.io
    Open-source library for industrial-strength natural language processing
    170m+ downloads
    prodigy.ai
    Modern scriptable annotation tool for machine learning developers
    9k+ users, 800+ companies
    prodigy.ai/teams

  7. spacy.io
    Open-source library for industrial-strength natural language processing
    170m+ downloads
    prodigy.ai
    Modern scriptable annotation tool for machine learning developers
    9k+ users, 800+ companies
    prodigy.ai/teams
    Collaborative data development platform
    GPT-4
    API
    Alex Smith
    Developer

  8. Generative
    single/multi-doc summarization
    reasoning
    problem solving
    paraphrasing
    style transfer
    question answering
    Predictive
    text classification
    relation extraction
    coreference
    grammar & morphology
    entity recognition
    semantic parsing
    discourse structure

  9. Text Classification
    [Chart. Series: SST2, AG News, Banking77, GPT-3. x-axis ticks: 1% to 100%. y-axis ticks: 65 to 100.]

  10. Text Classification
    [Chart. Series: SST2, AG News, Banking77, GPT-3. x-axis ticks: 1% to 100%. y-axis ticks: 65 to 100.]
    Entity Recognition
    [Chart. Series: FabNER, Claude 2. x-axis ticks: 0 to 500. y-axis ticks: 10 to 100.]

  11. Text Classification
    [Chart. Series: SST2, AG News, Banking77, GPT-3. x-axis ticks: 1% to 100%. y-axis ticks: 65 to 100.]
    Entity Recognition
    [Chart. Series: FabNER, Claude 2. x-axis ticks: 0 to 500. y-axis ticks: 10 to 100.]

  12. 1

  13. 1
    2

  14. Correct LLM few-shot results

  15. Correct LLM few-shot results

  16. Annotation Guidelines
    DISH: known food dishes, e.g. lobster ravioli, garlic bread
    INGREDIENT: individual parts of a food dish, including herbs and spices
    EQUIPMENT: any kind of cooking equipment, e.g. oven, cooking pot, grill

  17. annotate
    evaluate
    update

  18. annotate
    evaluate
    update
    1

  19. annotate, evaluate, update
    1. annotate
    2. evaluate: resolve disagreements, retrospective meetings, assess if more data is needed

  20. annotate, evaluate, update
    1. annotate
    2. evaluate: resolve disagreements, retrospective meetings, assess if more data is needed
    3. update: update annotation guidelines, add more examples, expand label definitions

  21. spacy-llm config prompt template
    spacy.io/usage/large-language-models
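
As a rough illustration of what a prompt template does with annotation guidelines, here is a minimal sketch in plain Python. It is not the spacy-llm template or its API: the function name `build_prompt` and the prompt layout are assumptions made for this example. Only the three label definitions come from the Annotation Guidelines slide.

```python
# Label definitions copied from the Annotation Guidelines slide.
LABELS = {
    "DISH": "known food dishes, e.g. lobster ravioli, garlic bread",
    "INGREDIENT": "individual parts of a food dish, including herbs and spices",
    "EQUIPMENT": "any kind of cooking equipment, e.g. oven, cooking pot, grill",
}


def build_prompt(text, labels, examples=()):
    """Render a plain-text entity recognition prompt.

    Hypothetical layout for illustration only, not spacy-llm's own template.
    `examples` is a sequence of (text, [(label, span_text), ...]) pairs used
    as few-shot demonstrations.
    """
    lines = ["Extract the entities in the text below.", "Labels:"]
    # One line per label, so updated guidelines flow straight into the prompt.
    for name, definition in labels.items():
        lines.append(f"- {name}: {definition}")
    # Optional few-shot examples, each followed by its annotated entities.
    for example_text, entities in examples:
        lines.append("")
        lines.append(f"Text: {example_text}")
        for label, span_text in entities:
            lines.append(f"{label}: {span_text}")
    lines.append("")
    lines.append(f"Text: {text}")
    return "\n".join(lines)


few_shot = [("Add basil to the pot.", [("INGREDIENT", "basil"), ("EQUIPMENT", "pot")])]
prompt = build_prompt("Grill the salmon.", LABELS, few_shot)
```

Keeping the label definitions in one dictionary means that an update to the guidelines changes both the annotators' instructions and the prompt at the same time.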

  22. Evaluation Results
    [Chart. Groups: Zero-shot, Chain-of-thought, Few-shot, Task-specific. Legend: F, DISH (F), INGREDIENT (F), EQUIPMENT (F). y-axis ticks: 0 to 100. A "?" placeholder is shown.]
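
The per-label F-scores in this chart can be computed along the following lines. This is a minimal sketch assuming exact-match scoring of `(start, end, label)` spans. The function name `f_scores` and the data layout are illustrative assumptions, not the evaluation code used in the workshop.

```python
from collections import Counter


def f_scores(gold, pred):
    """Per-label F-score with exact span matching.

    `gold` and `pred` are lists with one entry per document; each entry is a
    set of (start, end, label) tuples. A predicted span counts as correct only
    if the same tuple appears in the gold set for that document.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_spans, pred_spans in zip(gold, pred):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1  # correct prediction
            else:
                fp[span[2]] += 1  # predicted but not in gold
        for span in gold_spans - pred_spans:
            fn[span[2]] += 1  # in gold but missed
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


gold = [{(0, 1, "DISH"), (3, 4, "INGREDIENT"), (6, 7, "EQUIPMENT"), (8, 9, "EQUIPMENT")}]
pred = [{(0, 1, "DISH"), (4, 5, "INGREDIENT"), (6, 7, "EQUIPMENT"), (9, 10, "EQUIPMENT")}]
scores = f_scores(gold, pred)
```

Scoring the LLM prompt and the task-specific model against the same gold spans keeps the comparison between approaches like-for-like.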

  23. Evaluation Results
    [Chart. Groups: Zero-shot, Chain-of-thought, Few-shot, Task-specific. Legend: F, DISH (F), INGREDIENT (F), EQUIPMENT (F). y-axis ticks: 0 to 100. A "?" placeholder is shown.]

  24. Annotation Guidelines
    DISH: known food dishes, e.g. lobster ravioli, garlic bread
    INGREDIENT: individual parts of a food dish, including herbs and spices
    EQUIPMENT: any kind of cooking equipment, e.g. oven, cooking pot, grill

  25. Evaluation Results
    [Chart. Groups: Zero-shot, Chain-of-thought, Few-shot, Task-specific. Legend: F, DISH (F), INGREDIENT (F), EQUIPMENT (F). y-axis ticks: 0 to 100.]
    2000 words/second

  26. Pro tip: Use task routing to distribute workloads and determine inter-annotator agreement.
    prodigy.ai/features/task-routing
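
One common way to quantify inter-annotator agreement for two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a plain-Python illustration of that statistic. It is not Prodigy's API, and the function name `cohen_kappa` is an assumption made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Share of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


annotator_a = ["DISH", "DISH", "INGREDIENT", "INGREDIENT"]
annotator_b = ["DISH", "INGREDIENT", "INGREDIENT", "INGREDIENT"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

A low value on a particular label is a hint that its definition in the annotation guidelines needs more detail or more examples.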

  27. Pro tip: Focus on examples where models disagree, similar to active learning.
    koaning.io/posts/large-disagreement-models
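
A minimal sketch of the disagreement idea: run two predictors over the same texts and queue only the examples where their outputs differ. The function name `disagreements` and the two toy keyword predictors are hypothetical stand-ins, not code from the linked post. In practice one predictor might be an LLM prompt and the other a trained model.

```python
def disagreements(texts, predict_a, predict_b):
    """Return the examples on which two predictors give different outputs."""
    queue = []
    for text in texts:
        label_a, label_b = predict_a(text), predict_b(text)
        if label_a != label_b:
            # Keep both outputs so the annotator can see what each model said.
            queue.append({"text": text, "model_a": label_a, "model_b": label_b})
    return queue


# Toy keyword predictors, purely illustrative.
def model_a(text):
    return "EQUIPMENT" if "pot" in text else "INGREDIENT"


def model_b(text):
    return "EQUIPMENT" if "pot" in text or "oven" in text else "INGREDIENT"


to_review = disagreements(["cooking pot", "hot oven", "basil"], model_a, model_b)
```

Annotating the disagreement queue first spends human time on the examples most likely to expose an error in one of the models.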

  28. Pro tip: Use generative models to create spaCy rule sets!
    ChatGPT
    spacy.io/usage/rule-based-matching
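
One way to turn model-suggested terms into rules is to convert them into token patterns in the `{"label": ..., "pattern": [...]}` shape that spaCy's entity ruler accepts. The sketch below uses only the standard library. The term list is made up for illustration, the helper names are assumptions, and splitting on whitespace only approximates spaCy's tokenizer, so generated patterns should be reviewed before use.

```python
import json


def terms_to_patterns(terms_by_label):
    """Convert {label: [term, ...]} into entity-ruler-style token patterns."""
    patterns = []
    for label, terms in terms_by_label.items():
        for term in terms:
            # One case-insensitive token constraint per whitespace-separated word.
            tokens = [{"LOWER": word.lower()} for word in term.split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def to_jsonl(patterns):
    """Serialize patterns as JSONL, one pattern per line."""
    return "\n".join(json.dumps(pattern) for pattern in patterns)


# Hypothetical terms, e.g. suggested by a generative model and then reviewed.
suggested = {"EQUIPMENT": ["cooking pot", "Grill"], "INGREDIENT": ["basil"]}
patterns = terms_to_patterns(suggested)
jsonl = to_jsonl(patterns)
```

Rules produced this way are cheap to inspect and edit, which makes them a useful starting point for pre-highlighting candidates during annotation.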

  29. Takeaways
    Generative complements predictive, it
    doesn't replace it.

  30. Takeaways
    Generative complements predictive, it
    doesn't replace it.
    Use generative models to create
    better, more accurate, faster, smaller
    and private task-specific models.

  31. Takeaways
    Generative complements predictive, it
    doesn't replace it.
    Use generative models to create
    better, more accurate, faster, smaller
    and private task-specific models.
    With good tooling, you can make
    human input more efficient.

  32. Thank you!
    Explosion: explosion.ai
    spaCy: spacy.io
    Prodigy: prodigy.ai
    Twitter: @explosion_ai
    Mastodon: @[email protected]
    Bluesky: @explosion-ai.bsky.social
    LinkedIn
