Efficient data development: Optimizing annotation workflows

Tips and advice for how to build efficient human-in-the-loop data development workflows, break down business problems into actionable annotation steps and make the most of automation and model assistance. All examples are inspired by real use cases.

Blog post: https://explosion.ai/blog/optimizing-annotation-workflows

Ines Montani

February 24, 2026

Resources

The ultimate guide to optimizing annotation workflows

https://explosion.ai/blog/optimizing-annotation-workflows

More extensive blog posts based on this talk, featuring additional tips and examples

How S&P Global is making markets more transparent with NLP, spaCy and Prodigy

https://explosion.ai/blog/sp-global-commodities

A case study on S&P Global’s efficient information extraction pipelines for real-time commodities trading insights in a high-security environment using human-in-the-loop distillation.

How the Guardian approaches quote extraction with NLP

https://explosion.ai/blog/guardian

A case study of the Guardian's spaCy-Prodigy workflow to modularize quote extraction for content creation.

A practical guide to human-in-the-loop distillation

https://explosion.ai/blog/human-in-the-loop-distillation

This blog post presents practical solutions for using the latest state-of-the-art models in real-world applications and distilling their knowledge into smaller and faster components that you can run and maintain in-house.

Prodigy

https://prodi.gy

A modern and powerful annotation and model improvement tool used for human-in-the-loop training, rapid iteration, and custom NLP workflows.

Transcript

  1. spaCy: open-source library for industrial-strength natural language processing. 520m+ downloads. spacy.io
     Prodigy: modern scriptable annotation tool for machine learning developers. 12,000+ users. prodigy.ai
  2. DO THIS ✅
         for annotation_type in annotation_types:
             for example in examples:
                 annotate(example, annotation_type)
     NOT THIS ❌
         for example in examples:
             for annotation_type in annotation_types:
                 annotate(example, annotation_type)
     Humans have a cache, too!
  3. DO THIS ✅
         for annotation_type in annotation_types:
             for example in examples:
                 annotate(example, annotation_type)
     NOT THIS ❌
         for example in examples:
             for annotation_type in annotation_types:
                 annotate(example, annotation_type)
     Humans have a cache, too!
     1. labels, classification, spans etc.
     2. individual examples
  4. Automate what machines are better at!
     Selection “snaps” to token boundaries.
     GPT-4 API: model pre-annotates.
     prodigy.ai/docs
  5. Will more data improve the model?
     Prodigy train curve experiment: prodigy.ai/docs/recipes

     =============== Train curve diagnostic ===============
     Training 4 times with 25%, 50%, 75%, 100% of the data

     %      Score     ner
     ----   ------    ------
       0%   0.00      0.00
      25%   0.31 ▲    0.31 ▲
      50%   0.44 ▲    0.44 ▲
      75%   0.43 ▼    0.43 ▼
     100%   0.56 ▲    0.56 ▲
  6. Will more data improve the model?
     Prodigy train curve experiment: prodigy.ai/docs/recipes

     =============== Train curve diagnostic ===============
     Training 4 times with 25%, 50%, 75%, 100% of the data

     %      Score     ner
     ----   ------    ------
       0%   0.00      0.00
      25%   0.31 ▲    0.31 ▲
      50%   0.44 ▲    0.44 ▲
      75%   0.43 ▼    0.43 ▼
     100%   0.56 ▲    0.56 ▲

     [Plot: Accuracy (0 to 100) against % of examples (25 to 150), with a projection]
  7. Workflow: 🧑💻 human experts in the loop, 🔮 suggestions from LLM
     explosion.ai/blog/sp-global-commodities
  8. Workflow: 🧑💻 human experts in the loop, 📚 task-specific data ➕ 🔮 suggestions from LLM
     explosion.ai/blog/sp-global-commodities
  9. Workflow: 🧑💻 human experts in the loop, 📦 model package, 📚 task-specific data ➕ 🔮 suggestions from LLM
     explosion.ai/blog/sp-global-commodities
  10. Workflow: 🧑💻 human experts in the loop, 🚀 structured data, 📦 model package, 📚 task-specific data ➕ 🔮 suggestions from LLM
      explosion.ai/blog/sp-global-commodities
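
The loop ordering on slides 2 and 3 can be illustrated with a small sketch. It builds the annotation queue both ways and counts how often an annotator would have to switch between annotation types. The helper names (`build_queue`, `count_context_switches`) and the sample data are hypothetical and only serve the illustration; they are not part of Prodigy or spaCy.

```python
def build_queue(examples, annotation_types):
    """Order tasks so one annotation type is finished before the next begins."""
    return [
        (example, annotation_type)
        for annotation_type in annotation_types  # outer loop: annotation type
        for example in examples                  # inner loop: individual examples
    ]


def count_context_switches(queue):
    """Count how often consecutive tasks ask for a different annotation type."""
    types = [annotation_type for _, annotation_type in queue]
    return sum(1 for prev, cur in zip(types, types[1:]) if prev != cur)


# Hypothetical sample data
examples = ["text A", "text B", "text C"]
annotation_types = ["labels", "spans"]

type_first = build_queue(examples, annotation_types)
example_first = [(e, t) for e in examples for t in annotation_types]

print(count_context_switches(type_first))     # 1
print(count_context_switches(example_first))  # 5
```

With the type-first ordering, the number of switches depends only on the number of annotation types, whereas the example-first ordering switches on almost every task.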
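
Slide 4 mentions that a selection “snaps” to token boundaries. The following is a minimal sketch of that idea, assuming simple whitespace tokenization; it is not Prodigy's implementation, and the function names and sample text are hypothetical.

```python
import re


def token_offsets(text):
    """Return (start, end) character offsets of each whitespace-delimited token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def snap_to_tokens(text, start, end):
    """Expand a character selection so it covers every token it overlaps.

    Returns None if the selection does not touch any token.
    """
    overlapping = [(s, e) for s, e in token_offsets(text) if s < end and e > start]
    if not overlapping:
        return None
    return overlapping[0][0], overlapping[-1][1]


# Hypothetical sample text: a sloppy selection of "rude oi" snaps to "crude oil"
text = "Prices of crude oil rose sharply"
start, end = snap_to_tokens(text, 11, 18)
print(text[start:end])  # crude oil
```

A real tokenizer would also split punctuation from words, so an annotation tool would use the same tokenization as the model being trained.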
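
The train curve output on slides 5 and 6 can be read programmatically. The sketch below takes the scores shown on the slide, computes the change between runs, and checks whether the score still rose in the last segment, which is used here as a rough rule of thumb for "more data may help". The function names are hypothetical, and the rule of thumb is an assumption rather than a guarantee.

```python
def train_curve_deltas(scores):
    """Return (percent, score, change from previous run) for each training run."""
    rows = []
    previous = 0.0  # baseline score at 0% of the data
    for percent, score in scores:
        rows.append((percent, score, round(score - previous, 2)))
        previous = score
    return rows


def last_segment_improved(scores):
    """Rule of thumb (assumption): a rising final segment suggests more data may help."""
    return train_curve_deltas(scores)[-1][2] > 0


# Scores copied from the train curve diagnostic on the slide
scores = [(25, 0.31), (50, 0.44), (75, 0.43), (100, 0.56)]

for percent, score, delta in train_curve_deltas(scores):
    marker = "▲" if delta > 0 else "▼"
    print(f"{percent:>3}%  {score:.2f}  {marker} {delta:+.2f}")

print(last_segment_improved(scores))  # True
```

The dip at 75% shows why a single segment is a noisy signal, so the overall trend matters more than any one step.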