Designing Practical NLP Solutions

Ines Montani

June 18, 2020

Transcript

  1. Early 2015: spaCy is first released • open-source library for industrial-strength Natural Language Processing • focused on production use
  2. Early 2015: spaCy is first released • open-source library for industrial-strength Natural Language Processing • focused on production use • Current stats: 17m+ total downloads, 16k+ stars on GitHub, 400+ contributors, 80+ extension packages
  3. Late 2016: Explosion • new company for AI developer tools • bootstrapped through consulting for the first 6 months • funded through software sales since 2017 • remote team, centered in Berlin
  4. Late 2016: Explosion • new company for AI developer tools • bootstrapped through consulting for the first 6 months • funded through software sales since 2017 • remote team, centered in Berlin • Current stats: 8 team members, 100% independent & profitable
  5. Late 2017: Prodigy • first commercial product • modern annotation tool • fully scriptable in Python • Current stats: 4000+ users, including 500+ companies; 1600+ forum members
  6. Coming soon • spaCy v2.3: Models for Chinese, Japanese and many more • spaCy v3.0: Transformer-based pipelines, custom models using any library, new training workflow • Prodigy v1.10: Dependencies & relation annotation, audio & video annotation & lots of new features • Prodigy Teams: Manage large annotation projects in your cloud
  7. How to maximize your project’s risk of failure: 1. Imagineer. 2. Forecast. 3. Outsource. 4. Wire. 5. Ship. • Step 1, Imagineer: Decide what your application ought to do. Be ambitious! Nobody changed the world saying “uh, will that work?”
  8. How to maximize your project’s risk of failure: 1. Imagineer. 2. Forecast. 3. Outsource. 4. Wire. 5. Ship. • Step 2, Forecast: Figure out what accuracy you’ll need. If you’re not sure here, just say 90%.
  9. How to maximize your project’s risk of failure: 1. Imagineer. 2. Forecast. 3. Outsource. 4. Wire. 5. Ship. • Step 3, Outsource: Pay someone else to gather your data. Think carefully about your accuracy requirements, and then ask for 10k rows.
  10. How to maximize your project’s risk of failure: 1. Imagineer. 2. Forecast. 3. Outsource. 4. Wire. 5. Ship. • Step 4, Wire: Implement your network. This is the fun part! Tensor all your flows, descend every gradient!
  11. How to maximize your project’s risk of failure: 1. Imagineer. 2. Forecast. 3. Outsource. 4. Wire. 5. Ship. • Step 5, Ship: Put it all together. If it doesn’t work, maybe blame the intern?
  12. Requirements #1: We’re building a crime database based on news reports. We want to label the following: victim name • perpetrator name • crime location • offence date • arrest date
  13. Requirements #2: We’re adding data from financial news about company sales to our internal database, so we can connect it to our analytics. We need to extract: buyer (official company name) and stock ticker • acquired company with stock ticker • sale price and currency
  14. TEXT CLASSIFIER • ENTITY RECOGNIZER • ENTITY LINKER • ATTRIBUTE LOOKUP • “Microsoft acquires software development platform GitHub for $7.5 billion”
  15. TEXT CLASSIFIER • ENTITY RECOGNIZER • ENTITY LINKER • ATTRIBUTE LOOKUP • CURRENCY NORMALIZER • “Microsoft acquires software development platform GitHub for $7.5 billion”
  16. The great thing about practical NLP: you can choose to make the problem simpler and the solution cheaper. #1
  17. The most interesting problems are very specific and also need specific solutions. That’s what makes them valuable. #2
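
Slide 15 lists a CURRENCY NORMALIZER as the last pipeline component for the headline “Microsoft acquires software development platform GitHub for $7.5 billion”. The slides do not show how that component is built, so the following is only a minimal sketch of what such a step might look like, using a regular expression and the Python standard library. The names `normalize_money`, `Money`, `CURRENCY_SYMBOLS` and `MAGNITUDES` are hypothetical, and mapping `$` to USD is an assumption that would not hold for every news source.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional

# Hypothetical lookup tables for illustration only. A real system would need
# fuller coverage and a way to disambiguate "$" (USD, CAD, AUD, ...).
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}
MAGNITUDES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}

# Matches a currency symbol, a number, and an optional magnitude word.
MONEY_PATTERN = re.compile(
    r"(?P<symbol>[$€£])\s?(?P<number>\d+(?:\.\d+)?)"
    r"(?:\s(?P<magnitude>thousand|million|billion))?",
    re.IGNORECASE,
)


class Money(NamedTuple):
    amount: Decimal  # value in whole currency units
    currency: str    # three-letter code looked up from the symbol


def normalize_money(text: str) -> Optional[Money]:
    """Return the first money mention in `text`, or None if there is none."""
    match = MONEY_PATTERN.search(text)
    if match is None:
        return None
    amount = Decimal(match.group("number"))
    magnitude = match.group("magnitude")
    if magnitude:
        # Scale "7.5" by the magnitude word, e.g. "billion" -> 10**9.
        amount *= MAGNITUDES[magnitude.lower()]
    return Money(amount, CURRENCY_SYMBOLS[match.group("symbol")])


headline = "Microsoft acquires software development platform GitHub for $7.5 billion"
sale_price = normalize_money(headline)
```

A rule-based step like this is one way to make the problem simpler, in the spirit of slide 16: the statistical components find the price mention, and a small deterministic function turns it into a sale price and currency that can be stored in a database.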