The Future of NLP in Python (Keynote, PyCon Colombia 2020)

Ines Montani
February 08, 2020

Transcript

  3. SEBASTIÁN
    says hi!

  4. SPACY
    Open-source library for industrial-strength Natural Language Processing
    100k+ USERS

  6. PRODIGY
    Annotation tool for creating training data for machine learning models
    3000+ USERS

  8. THINC
    Lightweight deep learning library for composing models with a functional type-checked API
    new RELEASE

  11. Python
    GROWTH

  12. Why
    PYTHON?
    C extensions
    dynamic language
    general-purpose

  13. General PURPOSE
    better than specialized “AI language”
    easier for developers to branch out

  14. GENERALISTS
    SPECIALISTS

  15. GENERALISTS
    SPECIALISTS COMPLEMENTARY

  16. GENERALISTS
    SPECIALISTS COMPLEMENTARY
    Tree-shaped
    SKILLS
    T-SHAPED TREE-SHAPED

  17. you ship your
    organizational
    structure
    GENERALISTS
    SPECIALISTS COMPLEMENTARY
    Tree-shaped
    SKILLS
    T-SHAPED TREE-SHAPED

  19. Processing
    PIPELINE
    TEXT DOC

  20. Processing
    PIPELINE
    PART-OF-SPEECH TAGGER
    NAMED ENTITY RECOGNIZER
    SYNTACTIC DEPENDENCY PARSER
    TEXT DOC

  21. Processing
    PIPELINE
    PART-OF-SPEECH TAGGER
    NAMED ENTITY RECOGNIZER
    SYNTACTIC DEPENDENCY PARSER
    TEXT DOC
    PERSON

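The pipeline on these slides can be sketched as a list of functions that each take a `Doc`, add annotations and pass it on, so later components (here the entity recognizer) can build on the output of earlier ones. The `Doc` class and the rule-based components below are hypothetical toys, not spaCy's API, where these components are statistical models; the dependency parser is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Doc:
    """Hypothetical container for a text plus the annotations components add."""
    text: str
    tokens: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    ents: List[Tuple[str, str]] = field(default_factory=list)


Component = Callable[[Doc], Doc]


def tokenizer(doc: Doc) -> Doc:
    doc.tokens = doc.text.split()
    return doc


def tagger(doc: Doc) -> Doc:
    # Toy rule: capitalized tokens are proper nouns, everything else is "X"
    doc.tags = ["PROPN" if tok[:1].isupper() else "X" for tok in doc.tokens]
    return doc


def entity_recognizer(doc: Doc) -> Doc:
    # Toy rule that reuses the tagger's output: every proper noun is a PERSON
    doc.ents = [(tok, "PERSON") for tok, tag in zip(doc.tokens, doc.tags) if tag == "PROPN"]
    return doc


def make_pipeline(components: List[Component]) -> Callable[[str], Doc]:
    """Text goes in, a Doc comes out; components run in the given order."""
    def nlp(text: str) -> Doc:
        doc = Doc(text)
        for component in components:
            doc = component(doc)
        return doc
    return nlp


nlp = make_pipeline([tokenizer, tagger, entity_recognizer])
doc = nlp("Sebastián says hi!")
print(doc.ents)  # [('Sebastián', 'PERSON')]
```

Because the recognizer reads `doc.tags`, swapping the order of `tagger` and `entity_recognizer` would leave `doc.ents` empty, which is why the order of components in a pipeline matters.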

  22. Transfer
    LEARNING

  23. Transfer
    LEARNING
    TASK-SPECIFIC MODEL
    TEXT

  24. Transfer
    LEARNING
    TASK-SPECIFIC MODEL
    TEXT
    GENERAL LANGUAGE MODEL

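A standard-library sketch of the transfer learning setup on these slides: a frozen "general" model provides features, and only a small task-specific model is trained on top. The hand-written two-dimensional word vectors in `GENERAL_VECTORS` are a hypothetical stand-in for a pretrained language model, and the logistic regression head is just one simple choice of task-specific model.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a pretrained general language model. A real one
# would be learned from large amounts of raw text, not written by hand.
GENERAL_VECTORS: Dict[str, List[float]] = {
    "great": [0.9, 0.1], "good": [0.8, 0.2], "love": [0.85, 0.0],
    "bad": [-0.8, 0.1], "awful": [-0.9, 0.0], "hate": [-0.85, 0.2],
    "movie": [0.0, 0.9], "film": [0.05, 0.85],
}


def embed(text: str) -> List[float]:
    """Frozen general features: the mean of the known word vectors."""
    vecs = [GENERAL_VECTORS[w] for w in text.lower().split() if w in GENERAL_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def predict(w: List[float], b: float, text: str) -> float:
    """Probability of the positive class from the task-specific head."""
    z = sum(wi * xi for wi, xi in zip(w, embed(text))) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(texts: Sequence[str], labels: Sequence[int],
               epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Task-specific model: logistic regression trained with SGD.
    Only w and b are updated; GENERAL_VECTORS stays frozen."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            grad = predict(w, b, text) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, embed(text))]
            b -= lr * grad
    return w, b


w, b = train_head(["great movie", "bad film"], [1, 0])
print(predict(w, b, "love film") > 0.5, predict(w, b, "awful movie") < 0.5)  # True True
```

Neither "love" nor "awful" appears in the two training examples; the head still classifies them because the general vectors place them near "great" and "bad". That knowledge comes from the general model, not from the task data.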

  25. Transformer
    MODELS
    accurate and reusable subnetwork
    different workflows: working at the tensor level

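The "reusable subnetwork" idea can be illustrated with one expensive encoder whose output tensors are computed once per text and then shared by several task-specific heads that work directly on those tensors. In this sketch, `SharedEncoder` is a hypothetical toy standing in for a transformer, and the heads are plain functions; nothing here is a specific library's API.

```python
from typing import Any, Callable, Dict, List

Tensor = List[List[float]]  # one feature vector per token


class SharedEncoder:
    """Hypothetical stand-in for an expensive transformer.
    It counts its calls so the reuse is visible."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> Tensor:
        self.calls += 1
        # Toy features per token: its length and whether it is capitalized
        return [[float(len(tok)), float(tok[:1].isupper())] for tok in text.split()]


def run_heads(encoder: SharedEncoder,
              heads: Dict[str, Callable[[Tensor], Any]],
              text: str) -> Dict[str, Any]:
    tensors = encoder(text)  # computed once ...
    return {name: head(tensors) for name, head in heads.items()}  # ... reused by every head


heads: Dict[str, Callable[[Tensor], Any]] = {
    "n_tokens": lambda t: len(t),
    "capitalized": lambda t: [bool(row[1]) for row in t],
}
encoder = SharedEncoder()
out = run_heads(encoder, heads, "Sebastián says hi")
print(out, encoder.calls)  # {'n_tokens': 3, 'capitalized': [True, False, False]} 1
```

Adding a third head costs only that head's computation; the encoder still runs once per text.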

  27. Problem #1
    BREAKING
    Local AI startup’s code base “kind of hard to read”
    Matt (25, Senior Engineer): “array[:, ..., :4] – what does this even mean?”

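For readers who share Matt's question: in NumPy-style indexing, `:` keeps a whole axis, `...` (Python's `Ellipsis`) stands for as many full axes as needed in between, and `:4` keeps the first four entries of the last axis. The first part below shows what Python actually passes to `__getitem__`. The second part, `indexed_shape`, is a hypothetical helper that computes the resulting shape for slices and `...` only, assuming NumPy's basic-indexing rules. It also shows why the answer depends on how many dimensions the array has.

```python
from typing import Any, List, Tuple


class ShowKey:
    """Echoes back whatever Python passes to __getitem__."""

    def __getitem__(self, key: Any) -> Any:
        return key


key = ShowKey()[:, ..., :4]
print(key)  # (slice(None, None, None), Ellipsis, slice(None, 4, None))


def indexed_shape(shape: Tuple[int, ...], key: Any) -> Tuple[int, ...]:
    """Result shape of basic indexing with slices and at most one Ellipsis."""
    if not isinstance(key, tuple):
        key = (key,)
    if any(k is not Ellipsis and not isinstance(k, slice) for k in key):
        raise TypeError("this sketch only supports slices and Ellipsis")
    if sum(k is Ellipsis for k in key) > 1:
        raise IndexError("an index can only have a single ellipsis")
    n_explicit = sum(k is not Ellipsis for k in key)
    if n_explicit > len(shape):
        raise IndexError("too many indices for the array")
    expanded: List[slice] = []
    for k in key:
        if k is Ellipsis:
            # "..." expands to as many full slices as the array needs
            expanded.extend([slice(None)] * (len(shape) - n_explicit))
        else:
            expanded.append(k)
    # Axes not mentioned at the end are kept whole
    expanded.extend([slice(None)] * (len(shape) - len(expanded)))
    return tuple(len(range(*s.indices(dim))) for s, dim in zip(expanded, shape))


print(indexed_shape((32, 10, 50, 300), key))  # (32, 10, 50, 4)
print(indexed_shape((32, 300), key))          # (32, 4)
```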

  28. How many
    DIMENSIONS?

  29. How many
    DIMENSIONS?
    2

  30. How many
    DIMENSIONS?

  31. How many
    DIMENSIONS?
    2

  32. How many
    DIMENSIONS?

  33. How many
    DIMENSIONS?
    1

  37. Y: Floats3d
    Incompatible return value type
    (got "Tuple[Floats3d, Callable[[Any], Any]]",
    expected "Tuple[Floats1d, Callable[..., Any]]")

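Thinc provides array types such as `Floats1d`, `Floats2d` and `Floats3d` so that a static type checker like mypy can flag mismatched dimensions, which is the kind of error this slide shows. The sketch below imitates the idea with two hypothetical frozen dataclasses; they are not Thinc's classes, and the runtime row check is an extra assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Floats1d:
    """Hypothetical 1d float array type (not Thinc's class)."""
    data: Tuple[float, ...]


@dataclass(frozen=True)
class Floats2d:
    """Hypothetical 2d float array type: a tuple of equal-length rows."""
    data: Tuple[Tuple[float, ...], ...]

    def __post_init__(self) -> None:
        if len({len(row) for row in self.data}) > 1:
            raise ValueError("all rows must have the same length")


def sum_rows(X: Floats2d) -> Floats1d:
    # Writing `return X` here would still run, but mypy would report an
    # "Incompatible return value type" error, since Floats2d is not Floats1d.
    return Floats1d(tuple(sum(row) for row in X.data))


print(sum_rows(Floats2d(((1.0, 2.0), (3.0, 4.0)))))  # Floats1d(data=(3.0, 7.0))
```

Encoding the number of dimensions in the type moves questions like "how many dimensions?" from the reader's head into the type checker.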

  40. Relu: Relu
    Layer outputs type (thinc.types.Floats2d) but
    the next layer expects (thinc.types.Ragged) as
    an input

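The message on this slide comes from checking a composed model: the `Relu` layer outputs `Floats2d`, but the next layer expects `Ragged`. Thinc can report this kind of mismatch statically through its mypy plugin. The sketch below performs a similar check at runtime instead, at the moment layers are chained; `Layer`, `chain` and the two marker types are hypothetical stand-ins, not Thinc's API.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Floats2d(list):
    """Hypothetical marker type: a batch of fixed-width rows."""


class Ragged(list):
    """Hypothetical marker type: a batch of variable-length sequences."""


@dataclass
class Layer:
    name: str
    input_type: type
    output_type: type
    forward: Callable[[Any], Any]


def chain(*layers: Layer) -> Layer:
    """Compose layers left to right, rejecting mismatched neighbours up front."""
    for prev, nxt in zip(layers, layers[1:]):
        if not issubclass(prev.output_type, nxt.input_type):
            raise TypeError(
                f"Layer {prev.name} outputs type ({prev.output_type.__name__}) but "
                f"the next layer expects ({nxt.input_type.__name__}) as an input"
            )

    def forward(x: Any) -> Any:
        for layer in layers:
            x = layer.forward(x)
        return x

    return Layer("chain", layers[0].input_type, layers[-1].output_type, forward)


relu = Layer("relu", Floats2d, Floats2d,
             lambda X: Floats2d([[max(0.0, v) for v in row] for row in X]))
mean_pool = Layer("mean_pool", Ragged, Floats2d,
                  lambda X: Floats2d([[sum(seq) / len(seq)] for seq in X]))

model = chain(relu, relu)
print(model.forward(Floats2d([[-1.0, 2.0]])))  # [[0.0, 2.0]]

try:
    chain(relu, mean_pool)
except TypeError as err:
    print(err)  # Layer relu outputs type (Floats2d) but the next layer expects (Ragged) as an input
```

Failing when the model is defined, rather than deep inside a training run, is the main benefit; a static checker moves the failure even earlier, to before the code runs at all.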

  42. Problem #2
    HYPERPARAMETERS
    WEIGHTS
    OTHER SETTINGS
    MODEL CODE
    MACHINE LEARNING LIBRARY

  43. Problem #2
    HYPERPARAMETERS
    WEIGHTS
    OTHER SETTINGS
    MODEL CODE
    MACHINE LEARNING LIBRARY

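One way to keep hyperparameters, other settings and the code that builds the model in sync is a single config file whose sections are resolved against registered functions. Thinc's config system is built around a similar idea (a config file plus a function registry), but the sketch below is only a standard-library approximation: `REGISTRY`, `register`, `mlp.v1` and `build_model` are hypothetical names, and the format is plain INI rather than Thinc's.

```python
import configparser
from typing import Callable, Dict, Union

# Hypothetical registry mapping names used in the config to Python functions
REGISTRY: Dict[str, Callable[..., dict]] = {}


def register(name: str) -> Callable[[Callable[..., dict]], Callable[..., dict]]:
    def decorator(func: Callable[..., dict]) -> Callable[..., dict]:
        REGISTRY[name] = func
        return func
    return decorator


@register("mlp.v1")
def make_mlp(width: int, depth: int, dropout: float) -> dict:
    # Stand-in for real model code: describes the model it would build
    return {"layers": [width] * depth, "dropout": dropout}


CONFIG = """
[model]
architecture = mlp.v1
width = 128
depth = 2
dropout = 0.2

[training]
learn_rate = 0.001
seed = 0
"""


def parse_value(value: str) -> Union[int, float, str]:
    # This sketch only converts ints and floats; anything else stays a string
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def build_model(config: configparser.ConfigParser) -> dict:
    settings = dict(config["model"])
    factory = REGISTRY[settings.pop("architecture")]
    return factory(**{key: parse_value(val) for key, val in settings.items()})


config = configparser.ConfigParser()
config.read_string(CONFIG)
model = build_model(config)
print(model)  # {'layers': [128, 128], 'dropout': 0.2}
```

Saving this one file next to the trained weights records every setting needed to rebuild the same model later, instead of scattering them across scripts and command-line flags.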

  44. THINC.AI

  45. 1
    THINC.AI

  46. 1
    2
    THINC.AI

  47. 1
    2
    THINC.AI

  48. 1
    2
    3
    THINC.AI

  50. Under
    THE HOOD
    1

  51. Under
    THE HOOD
    1
    2

  52. Coming
    SOON

  54. Problem #3
    WE NEED A DATABASE OF COMPANY ACQUISITIONS WITH PRICES AND STOCK TICKERS.
    pytorch predict company acquisitions with prices and stock tickers
    No results.
    OKAY, I'M ON IT!

  55. Microsoft acquires software development platform GitHub for $7.5 billion

  56. Microsoft acquires software development
    platform GitHub for $7.5 billion

  57. TEXT CLASSIFIER
    Microsoft acquires software development
    platform GitHub for $7.5 billion

  58. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    Microsoft acquires software development
    platform GitHub for $7.5 billion

  59. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    ENTITY LINKER
    Microsoft acquires software development
    platform GitHub for $7.5 billion

  60. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    ENTITY LINKER
    ATTRIBUTE LOOKUP
    Microsoft acquires software development
    platform GitHub for $7.5 billion

  61. TEXT CLASSIFIER
    ENTITY RECOGNIZER
    ENTITY LINKER
    ATTRIBUTE LOOKUP
    CURRENCY NORMALIZER
    Microsoft acquires software development
    platform GitHub for $7.5 billion

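The five components on these slides (text classifier, entity recognizer, entity linker, attribute lookup, currency normalizer) can be sketched as a chain of functions that turns the headline into a database row. This is a toy: it assumes rule-based stand-ins (a keyword check, a dictionary lookup, a regular expression) for what would be trained models, and a tiny hypothetical knowledge base whose IDs are made up. None of the function names are spaCy's API.

```python
import re
from decimal import Decimal
from typing import Any, Dict, Optional

# Hypothetical knowledge base: the IDs are invented; MSFT is Microsoft's ticker
KB: Dict[str, Dict[str, Optional[str]]] = {
    "Microsoft": {"id": "ORG_1", "ticker": "MSFT"},
    "GitHub": {"id": "ORG_2", "ticker": None},
}
MONEY = re.compile(r"\$(\d+(?:\.\d+)?)\s*(million|billion)")
SCALES = {"million": Decimal(10) ** 6, "billion": Decimal(10) ** 9}
Doc = Dict[str, Any]


def text_classifier(doc: Doc) -> Doc:
    # Toy keyword rule instead of a trained classifier
    doc["is_acquisition"] = re.search(r"\bacquires\b", doc["text"]) is not None
    return doc


def entity_recognizer(doc: Doc) -> Doc:
    # Toy dictionary + regex matcher instead of a statistical NER model
    text = doc["text"]
    ents = [{"text": n, "start": text.find(n), "label": "ORG"} for n in KB if n in text]
    ents += [{"text": m.group(0), "start": m.start(), "label": "MONEY"}
             for m in MONEY.finditer(text)]
    doc["ents"] = sorted(ents, key=lambda e: e["start"])
    return doc


def entity_linker(doc: Doc) -> Doc:
    for ent in doc["ents"]:
        if ent["label"] == "ORG":
            ent["kb_id"] = KB[ent["text"]]["id"]
    return doc


def attribute_lookup(doc: Doc) -> Doc:
    for ent in doc["ents"]:
        if "kb_id" in ent:
            ent["ticker"] = KB[ent["text"]]["ticker"]
    return doc


def currency_normalizer(doc: Doc) -> Doc:
    for ent in doc["ents"]:
        if ent["label"] == "MONEY":
            amount, scale = MONEY.match(ent["text"]).groups()
            ent["usd"] = int(Decimal(amount) * SCALES[scale])
    return doc


PIPELINE = [text_classifier, entity_recognizer, entity_linker,
            attribute_lookup, currency_normalizer]


def extract_acquisition(text: str) -> Optional[Dict[str, Any]]:
    doc: Doc = {"text": text}
    for component in PIPELINE:
        doc = component(doc)
    orgs = [e for e in doc["ents"] if e["label"] == "ORG"]
    money = [e for e in doc["ents"] if e["label"] == "MONEY"]
    if not doc["is_acquisition"] or len(orgs) < 2 or not money:
        return None
    # Toy assumption: the acquirer is mentioned before the target
    acquirer, target = orgs[0], orgs[1]
    return {
        "acquirer": acquirer["text"], "acquirer_ticker": acquirer["ticker"],
        "target": target["text"], "target_ticker": target["ticker"],
        "price_usd": money[0]["usd"],
    }


print(extract_acquisition(
    "Microsoft acquires software development platform GitHub for $7.5 billion"))
```

Each component does one small, testable job, and any of them can later be replaced by a trained model without changing the others.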

  63. Problem #4
    in practice
    CODE
    DATA
    in theory
    DATA
    CODE

  64. Which is CORRECT?
    Pope Francis visits U.S.
    PERSON
    Pope Francis visits U.S.
    PERSON

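Presumably the two versions differ in where the PERSON span starts: "Pope Francis" or just "Francis". Both are defensible, but the choice has to be fixed in the annotation guidelines, because exact-match span scoring, as commonly used for named entity recognition, counts the two as complete disagreement. In the sketch below the character offsets are worked out for this sentence, and `span_f1` is a hypothetical helper.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)

text = "Pope Francis visits U.S."
annotation_a: Set[Span] = {(0, 12, "PERSON")}  # "Pope Francis"
annotation_b: Set[Span] = {(5, 12, "PERSON")}  # "Francis"


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match span F1: a span only counts if start, end and label all match."""
    if not gold and not pred:
        return 1.0
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


for start, end, label in sorted(annotation_a | annotation_b):
    print(repr(text[start:end]), label)
print(span_f1(annotation_a, annotation_b))  # 0.0
```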

  65. Similar OR NOT?
    I love cats .
    I hate cats .

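Whether these two sentences count as similar depends on what the application needs. A simple bag-of-words cosine similarity, sketched below as one possible measure (not one the slides prescribe), rates them as fairly similar because three of their four tokens are shared, even though their sentiment is opposite.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors (whitespace tokens)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


print(bow_cosine("I love cats .", "I hate cats ."))  # 0.75
```

For a topic classifier that score is reasonable; for a sentiment model the two sentences should be far apart. The definition of "similar" is part of the task design, not something the data decides on its own.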

  66. PRODIGY.AI

  67. PRODIGY.AI

  68. PRODIGY.AI

  69. PRODIGY.AI

  70. PRODIGY.AI

  71. [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  72. SWAMP OF UNCERTAINTY
    [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  73. HILL OF HOPE
    SWAMP OF UNCERTAINTY
    [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  74. PLATEAU OF FRUSTRATION
    HILL OF HOPE
    SWAMP OF UNCERTAINTY
    [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  75. [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  76. SWAMP OF UNCERTAINTY
    [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  77. QUICKSAND OF SUNK COSTS
    SWAMP OF UNCERTAINTY
    [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  78. QUICKSAND OF SUNK COSTS
    SWAMP OF UNCERTAINTY
    [Chart: typical project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]
    when TO STOP?

  79. ITERATIVE WETLANDS OF SLIGHTLY LESS UNCERTAINTY
    [Chart: future project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]

  80. ITERATIVE WETLANDS OF SLIGHTLY LESS UNCERTAINTY
    [Chart: future project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]
    STOP
    STOP

  81. MEADOWS OF SUCCESS
    GO!
    ITERATIVE WETLANDS OF SLIGHTLY LESS UNCERTAINTY
    [Chart: future project; axes: effort (training data size, time, experimenting) and effectiveness (accuracy, quality)]
    STOP
    STOP

  83. Future
    OUTLOOK

  84. Future
    OUTLOOK
    lots of developers
    generalists & specialists
    WHO?

  85. Future
    OUTLOOK
    transfer learning
    component pipelines
    WHAT?
    lots of developers
    generalists & specialists
    WHO?

  86. Future
    OUTLOOK
    transfer learning
    component pipelines
    WHAT?
    iterative
    in-house
    HOW?
    lots of developers
    generalists & specialists
    WHO?
