The Future of NLP in Python (Keynote, PyCon Colombia 2020)

Ines Montani
February 08, 2020


Transcript

  1. None
  2. None
  3. SEBASTIÁN says hi!

  4. spaCy: Open-source library for industrial-strength Natural Language Processing (100k+ USERS)

  5. None
  6. PRODIGY: Annotation tool for creating training data for machine learning models (3000+ USERS)

  7. None
  8. THINC: Lightweight deep learning library for composing models with a functional, type-checked API (new RELEASE)

  9. None
  10. None
  11. Python GROWTH

  12. Why PYTHON? C extensions, dynamic language, general-purpose

  13. General PURPOSE: better than specialized “AI language”, easier for developers to branch out

  14. GENERALISTS SPECIALISTS

  15. GENERALISTS SPECIALISTS COMPLEMENTARY

  16. GENERALISTS SPECIALISTS COMPLEMENTARY Tree-shaped SKILLS T-SHAPED TREE-SHAPED

  17. you ship your organizational structure GENERALISTS SPECIALISTS COMPLEMENTARY Tree-shaped SKILLS

    T-SHAPED TREE-SHAPED
  18. None
  19. Processing PIPELINE TEXT DOC

  20. Processing PIPELINE PART-OF-SPEECH TAGGER NAMED ENTITY RECOGNIZER SYNTACTIC DEPENDENCY PARSER TEXT DOC

  21. Processing PIPELINE PART-OF-SPEECH TAGGER NAMED ENTITY RECOGNIZER SYNTACTIC DEPENDENCY PARSER TEXT DOC PERSON
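
The pipeline slides describe text going in, a series of components each adding annotations, and a Doc coming out. The sketch below illustrates that idea in plain Python. It is not spaCy's API: the `Doc` class, the component functions and their toy rules are hypothetical stand-ins for trained models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Doc:
    """Container that accumulates annotations as it moves through the pipeline."""
    text: str
    tokens: List[str] = field(default_factory=list)
    tags: Dict[int, str] = field(default_factory=dict)              # token index -> tag
    ents: List[Tuple[int, int, str]] = field(default_factory=list)  # (start, end, label)


def tokenize(text: str) -> Doc:
    # Whitespace split only; a real tokenizer also handles punctuation.
    return Doc(text=text, tokens=text.split())


def toy_tagger(doc: Doc) -> Doc:
    # Toy rule standing in for a statistical tagger: capitalized -> PROPN.
    for i, token in enumerate(doc.tokens):
        doc.tags[i] = "PROPN" if token[:1].isupper() else "X"
    return doc


def toy_entity_recognizer(doc: Doc) -> Doc:
    # Toy rule: every maximal run of PROPN tokens becomes a PERSON span.
    start = None
    for i in range(len(doc.tokens) + 1):
        is_propn = i < len(doc.tokens) and doc.tags.get(i) == "PROPN"
        if is_propn and start is None:
            start = i
        elif not is_propn and start is not None:
            doc.ents.append((start, i, "PERSON"))
            start = None
    return doc


# Components run in order; each one reads and extends the same Doc.
PIPELINE: List[Callable[[Doc], Doc]] = [toy_tagger, toy_entity_recognizer]


def process(text: str) -> Doc:
    doc = tokenize(text)
    for component in PIPELINE:
        doc = component(doc)
    return doc


doc = process("yesterday Ada Lovelace wrote a program")
print(doc.ents)  # [(1, 3, 'PERSON')]
```

Because every component takes a `Doc` and returns a `Doc`, components can be added, removed or reordered without touching the others. The entity recognizer here depends on the tagger's output, so in this sketch the tagger has to run first.
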
  22. Transfer LEARNING

  23. Transfer LEARNING TASK-SPECIFIC MODEL TEXT

  24. Transfer LEARNING TASK-SPECIFIC MODEL TEXT GENERAL LANGUAGE MODEL

  25. Transformer MODELS: accurate and reusable subnetwork; different workflows: working at the tensor level

  26. None
  27. Problem #1. BREAKING: Local AI startup’s code base “kind of hard to read”. Matt (25, Senior Engineer): “array[:, ..., :4] – what does this even mean?”

  28. How many DIMENSIONS?

  29. How many DIMENSIONS? 2

  30. How many DIMENSIONS?

  31. How many DIMENSIONS? 2

  32. How many DIMENSIONS?

  33. 1 How many DIMENSIONS?
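
The expression quoted on the "Problem #1" slide is NumPy-style indexing: `:` keeps the whole first axis, `...` (Ellipsis) stands for however many full axes lie in between, and `:4` keeps the first four entries of the last axis. The number of dimensions therefore cannot be read off the expression itself. The sketch below reproduces the same selection on nested lists without any dependencies; the helper names `ndim` and `take_last_axis` are made up for this illustration.

```python
from typing import Any, List


def ndim(nested: Any) -> int:
    """Count nesting depth, assuming a regular (non-ragged), non-empty nested list."""
    depth = 0
    while isinstance(nested, list):
        depth += 1
        nested = nested[0]
    return depth


def _slice_innermost(nested: List[Any], n: int) -> List[Any]:
    # Recurse until the innermost lists are reached, then keep their first n items.
    if not isinstance(nested[0], list):
        return nested[:n]
    return [_slice_innermost(row, n) for row in nested]


def take_last_axis(nested: List[Any], n: int) -> List[Any]:
    """Mimic array[:, ..., :n]: keep every leading axis, truncate the last one."""
    if ndim(nested) < 2:
        # The expression has two explicit indices, so one dimension is not enough.
        raise IndexError("array[:, ..., :n] needs at least 2 dimensions")
    return _slice_innermost(nested, n)


matrix = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]  # 2 dimensions
batch = [matrix, matrix]                               # 3 dimensions

print(ndim(matrix), take_last_axis(matrix, 4))  # 2 [[1, 2, 3, 4], [7, 8, 9, 10]]
print(ndim(batch), take_last_axis(batch, 4)[0])  # 3 [[1, 2, 3, 4], [7, 8, 9, 10]]
```

The same call works on the 2-dimensional and the 3-dimensional input, which is exactly why a reader of `array[:, ..., :4]` cannot tell which one the author had in mind.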

  34. None
  35. None
  36. None
  37. Y: Floats3d. Incompatible return value type (got "Tuple[Floats3d, Callable[[Any], Any]]", expected "Tuple[Floats1d, Callable[..., Any]]")

  38. None
  39. None
  40. Relu: Relu. Layer outputs type (thinc.types.Floats2d) but the next layer expects (thinc.types.Ragged) as an input

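
The messages on these slides come from checking that one layer's output type matches the next layer's input type before any data flows through the model. The sketch below shows the same idea as a runtime check at composition time, which is a simplification: the slides show the errors being reported by a type checker. The `Layer` class, the `chain` function and the string type names are hypothetical and only mimic the wording on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    in_type: str   # e.g. "Floats2d"; plain strings stand in for real array types
    out_type: str


def chain(*layers: Layer) -> List[Layer]:
    """Compose layers in order, rejecting neighbours whose types disagree."""
    for current, following in zip(layers, layers[1:]):
        if current.out_type != following.in_type:
            raise TypeError(
                f"{current.name}: layer outputs type ({current.out_type}) but the "
                f"next layer expects ({following.in_type}) as an input"
            )
    return list(layers)


relu = Layer("Relu", in_type="Floats2d", out_type="Floats2d")
softmax = Layer("Softmax", in_type="Floats2d", out_type="Floats2d")
pool = Layer("RaggedPool", in_type="Ragged", out_type="Floats2d")  # hypothetical layer

model = chain(relu, softmax)  # accepted: Floats2d feeds Floats2d

try:
    chain(relu, pool)         # rejected: Floats2d fed into a layer expecting Ragged
except TypeError as err:
    message = str(err)
    print(message)
```

The mismatch is reported when the model is assembled, not several batches into training, and the message names both the offending layer and the type its neighbour expected.
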
  41. None
  42. Problem #2 HYPER-PARAMETERS WEIGHTS OTHER SETTINGS MODEL CODE MACHINE LEARNING LIBRARY

  43. Problem #2 HYPER-PARAMETERS WEIGHTS OTHER SETTINGS MODEL CODE MACHINE LEARNING LIBRARY
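
Problem #2 lists hyperparameters, weights, other settings, model code and the machine learning library as pieces that have to be kept together. One common way to handle the settings part is a declarative config file that gets resolved into function calls. The sketch below uses only the standard library. The section names, the `REGISTRY` dict and the `@factory` key are illustrative assumptions, not the config format of any particular library.

```python
import configparser
import json
from typing import Any, Callable, Dict

# Maps names used in the config file to the functions that build objects.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = func
        return func
    return decorator


@register("adam.v1")
def make_adam(learn_rate: float, beta1: float = 0.9) -> Dict[str, Any]:
    # Stand-in for constructing a real optimizer object.
    return {"kind": "adam", "learn_rate": learn_rate, "beta1": beta1}


CONFIG_TEXT = """
[training]
n_epochs = 10
dropout = 0.2

[optimizer]
@factory = "adam.v1"
learn_rate = 0.001
"""


def load_config(text: str) -> Dict[str, Dict[str, Any]]:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Values are parsed as JSON so numbers and strings keep their types.
    return {
        section: {key: json.loads(value) for key, value in parser[section].items()}
        for section in parser.sections()
    }


def resolve(section: Dict[str, Any]) -> Any:
    """Call the function named by '@factory' with the remaining keys as arguments."""
    settings = dict(section)
    factory = REGISTRY[settings.pop("@factory")]
    return factory(**settings)


config = load_config(CONFIG_TEXT)
optimizer = resolve(config["optimizer"])
print(config["training"], optimizer)
```

With this arrangement the experiment's settings live in one file that can be diffed and versioned, while the code only declares which functions exist and what arguments they take.
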
  44. THINC.AI →

  45. 1 THINC.AI →

  46. 1 2 THINC.AI →

  47. 1 2 THINC.AI →

  48. 1 2 3 THINC.AI →

  49. None
  50. Under THE HOOD 1

  51. Under THE HOOD 1 2

  52. Coming SOON

  53. None
  54. Problem #3: “WE NEED A DATABASE OF COMPANY ACQUISITIONS WITH PRICES AND STOCK TICKERS.” Search: “pytorch predict company acquisitions with prices and stock tickers”. No results. “OKAY, I'M ON IT!”

  55. Microsoft acquires software development platform GitHub for $7.5 billion

  56. Microsoft acquires software development platform GitHub for $7.5 billion

  57. TEXT CLASSIFIER Microsoft acquires software development platform GitHub for $7.5

    billion
  58. TEXT CLASSIFIER ENTITY RECOGNIZER Microsoft acquires software development platform GitHub

    for $7.5 billion
  59. TEXT CLASSIFIER ENTITY RECOGNIZER ENTITY LINKER Microsoft acquires software development

    platform GitHub for $7.5 billion
  60. TEXT CLASSIFIER ENTITY RECOGNIZER ENTITY LINKER ATTRIBUTE LOOKUP Microsoft acquires

    software development platform GitHub for $7.5 billion
  61. TEXT CLASSIFIER ENTITY RECOGNIZER ENTITY LINKER ATTRIBUTE LOOKUP CURRENCY NORMALIZER

    Microsoft acquires software development platform GitHub for $7.5 billion
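
The last slide in this build splits the acquisitions task into small components. The final one, currency normalization, needs no machine learning and can be sketched with a regular expression. The function name, the pattern and the multiplier table below are assumptions for illustration, and they only cover dollar amounts written like the example on the slide.

```python
import re
from decimal import Decimal
from typing import Optional

MULTIPLIERS = {
    "thousand": 10**3,
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
}

AMOUNT_RE = re.compile(
    r"\$\s?(?P<number>\d+(?:\.\d+)?)\s*(?P<unit>thousand|million|billion|trillion)?",
    re.IGNORECASE,
)


def normalize_usd(text: str) -> Optional[int]:
    """Return the first dollar amount in `text` as a whole number of dollars, or None."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    number = Decimal(match.group("number"))  # Decimal avoids float rounding on 7.5
    unit = match.group("unit")
    factor = MULTIPLIERS[unit.lower()] if unit else 1
    return int(number * factor)


headline = "Microsoft acquires software development platform GitHub for $7.5 billion"
print(normalize_usd(headline))  # 7500000000
```

Keeping this step rule-based means the statistical components only have to find the money span, and the arithmetic stays deterministic and easy to test.
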
  62. None
  63. Problem #4 in practice CODE DATA in theory DATA CODE

  64. Pope Francis visits U.S. Which is CORRECT? PERSON

    Pope Francis visits U.S. PERSON
  65. I love cats. I hate cats. Similar OR NOT?
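
One way to read this slide is that whether the two sentences count as "similar" depends on what the application needs. The toy example below uses word overlap (Jaccard similarity), a measure chosen here for simplicity and not named in the talk. By shared vocabulary the sentences score as fairly close, even though their sentiment is opposite.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: |A ∩ B| / |A ∪ B| over lowercased tokens."""
    tokens_a = set(a.lower().replace(".", " ").split())
    tokens_b = set(b.lower().replace(".", " ").split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


score = jaccard("I love cats.", "I hate cats.")
print(score)  # 0.5
```

A topic-oriented application might accept that score as "similar", while a sentiment-oriented one would need labels that treat the pair as opposites. That decision belongs to the people defining the annotation scheme, not to the metric.
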
  66. PRODIGY.AI →

  67. PRODIGY.AI →

  68. PRODIGY.AI →

  69. PRODIGY.AI →

  70. PRODIGY.AI →

  71. Typical project. Axes: Effort (training data size, time, experimenting), Effectiveness (accuracy, quality)

  72. Typical project: SWAMP OF UNCERTAINTY

  73. Typical project: HILL OF HOPE, SWAMP OF UNCERTAINTY

  74. Typical project: PLATEAU OF FRUSTRATION, HILL OF HOPE, SWAMP OF UNCERTAINTY

  75. Typical project. Axes: Effort (training data size, time, experimenting), Effectiveness (accuracy, quality)

  76. Typical project: SWAMP OF UNCERTAINTY

  77. Typical project: QUICKSAND OF SUNK COSTS, SWAMP OF UNCERTAINTY

  78. Typical project: QUICKSAND OF SUNK COSTS, SWAMP OF UNCERTAINTY. When TO STOP?

  79. Future project: ITERATIVE WETLANDS OF SLIGHTLY LESS UNCERTAINTY. Axes: Effort (training data size, time, experimenting), Effectiveness (accuracy, quality)

  80. Future project: ITERATIVE WETLANDS OF SLIGHTLY LESS UNCERTAINTY. STOP, STOP

  81. Future project: MEADOWS OF SUCCESS, GO! ITERATIVE WETLANDS OF SLIGHTLY LESS UNCERTAINTY. STOP, STOP
  82. None
  83. Future OUTLOOK

  84. Future OUTLOOK. WHO? lots of developers, generalists & specialists

  85. Future OUTLOOK. WHO? lots of developers, generalists & specialists. WHAT? transfer learning, component pipelines

  86. Future OUTLOOK. WHO? lots of developers, generalists & specialists. WHAT? transfer learning, component pipelines. HOW? iterative, in-house
  87. None