The state of NLP in production 🥽

NLP in production vs real life

Abdur-Rahmaan Janhangeer

August 27, 2023

Transcript

  1. The state of NLP in production

  2.

  3. Python Mauritius Usergroup
    site
    fb
    linkedin
    mailing list

  4. url
    pymug.com site

  5. About me
    compileralchemy.com

  6. slides

  7. The state of NLP in production

  8. Hardest part of a real-world project

  9. ?

  10. Is it cooking up an awesome model?

  11. No, the world is more complex than this

  12. Elements of an NLP project

  13. NLP project
    gather data
    clean
    store
    train
    use model
    retrain model

  14. gather data

  15. Toy project
    use curated data set
    quick extraction

  16. Real project
    a lot of data needed
    data must correspond to the business case; such data probably does not exist yet
    speed of data gathering
    find ingenious / better ways of getting data
    automate collection

  17. clean/preprocess data

  18. Toy project
    use an existing parser / curator, e.g. NLTK's built-in options

  19. Real project
    use a parser intended for it, several custom steps
    parallel processing of data
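
A minimal sketch of what "several custom steps" plus "parallel processing" could look like, using only the standard library. The cleaning steps (`clean_text`) and the helper `clean_corpus` are hypothetical illustrations, not something taken from the talk; a real project would plug in its own parser here.

```python
import re
import unicodedata
from concurrent.futures import Executor
from typing import Iterable, List, Optional

TAG_RE = re.compile(r"<[^>]+>")  # naive markup matcher, good enough for a sketch


def clean_text(text: str) -> str:
    """Hypothetical custom steps: normalize Unicode, drop markup, lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = TAG_RE.sub(" ", text)
    return " ".join(text.lower().split())


def clean_corpus(
    texts: Iterable[str],
    executor: Optional[Executor] = None,
    chunksize: int = 64,
) -> List[str]:
    """Clean every document, optionally fanning the work out over an executor.

    With no executor the documents are cleaned sequentially. Passing a
    ProcessPoolExecutor spreads CPU-bound cleaning over several cores;
    chunksize batches documents per task to reduce inter-process overhead
    (thread pools ignore it).
    """
    if executor is None:
        return [clean_text(t) for t in texts]
    return list(executor.map(clean_text, texts, chunksize=chunksize))
```

To actually run in parallel, one would create a `concurrent.futures.ProcessPoolExecutor` inside an `if __name__ == "__main__":` guard and pass it as `executor`; results come back in input order.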

  20. store data

  21. Toy project
    laptop

  22. Real project
    cloud database
    hot / cold data
    TTL
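
To make "hot / cold data" and "TTL" concrete, here is a toy in-memory model of the idea. The class `TieredStore` and its parameters are assumptions made for illustration; a cloud database would normally provide tiering and expiry as built-in features rather than application code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class TieredStore:
    """Toy hot/cold store: records age out of hot into cold, then expire."""

    hot_ttl: float   # seconds a record stays in the hot tier
    cold_ttl: float  # seconds after which a record is dropped entirely
    hot: Dict[str, Tuple[float, Any]] = field(default_factory=dict)
    cold: Dict[str, Tuple[float, Any]] = field(default_factory=dict)

    def put(self, key: str, value: Any, now: float) -> None:
        """Store a fresh record in the hot tier, stamped with the current time."""
        self.cold.pop(key, None)
        self.hot[key] = (now, value)

    def sweep(self, now: float) -> None:
        """Demote stale hot records to cold and delete cold records past their TTL."""
        for key, (ts, value) in list(self.hot.items()):
            if now - ts >= self.hot_ttl:
                self.cold[key] = (ts, value)
                del self.hot[key]
        for key, (ts, _) in list(self.cold.items()):
            if now - ts >= self.cold_ttl:
                del self.cold[key]

    def get(self, key: str, now: float) -> Optional[Any]:
        """Return the value from whichever tier holds it, or None if it expired."""
        self.sweep(now)
        for tier in (self.hot, self.cold):
            if key in tier:
                return tier[key][1]
        return None
```

The clock is passed in explicitly (`now`) so the behaviour is easy to test; production code would more likely rely on `time.time()` or on the database's own expiry mechanism.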

  23. training

  24. Toy project
    use laptop / external GPU

  25. Real project
    training on the cloud
    cloud knowledge
    cross-cloud skills
    fault tolerance
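
One common way to get fault tolerance in training is checkpointing: save progress regularly so a crashed or preempted job can resume instead of restarting. The sketch below assumes a JSON checkpoint file and a placeholder "training step"; the function names and the `fail_at` parameter are hypothetical and exist only to simulate a crash.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_checkpoint(path: Path) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "loss": None}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def train(total_steps: int, ckpt_path: Path, fail_at: Optional[int] = None) -> dict:
    """Run (or resume) training, checkpointing after every step."""
    state = load_checkpoint(ckpt_path)
    for step in range(state["step"] + 1, total_steps + 1):
        if step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        # Placeholder for a real optimisation step.
        state = {"step": step, "loss": 1.0 / step}
        save_checkpoint(ckpt_path, state)
    return state
```

Calling `train` again with the same `ckpt_path` after a failure picks up from the last completed step. On a cloud platform the checkpoint would usually live in object storage rather than on local disk.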

  26. use model

  27. Toy project
    local website / code

  28. Real project
    continuation of pipeline
    web service architecture
    devops / deploy
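
As a rough illustration of putting a model behind a web service, here is a minimal WSGI application using only the standard library. The `/predict` route, the JSON shape, and the `predict` function (a keyword rule standing in for a trained model) are all assumptions for the sake of the sketch.

```python
import json


def predict(text: str) -> dict:
    """Placeholder for a trained model: a hypothetical keyword rule, not a real classifier."""
    label = "positive" if "good" in text.lower() else "negative"
    return {"label": label}


def app(environ, start_response):
    """Minimal WSGI app exposing POST /predict with a JSON body {"text": ...}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        status, payload = "404 Not Found", {"error": "not found"}
    else:
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            body = json.loads(environ["wsgi.input"].read(length) or b"{}")
            status, payload = "200 OK", predict(str(body["text"]))
        except (ValueError, KeyError, TypeError):
            status, payload = "400 Bad Request", {"error": "expected JSON with a 'text' field"}
    data = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

For local experiments, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` can serve this app. A production deployment would more likely run it under a dedicated WSGI server, replicated across pods, which is where the devops skills come in.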

  29. retraining

  30. Toy project
    uh, does this even exist?

  31. Real project
    learn cloud offerings for continuous learning
    ways to retrain / fine tune

  32. It's more than serving a model

  33. Operation model

    [ pipeline ]
    data collection --- process --- train -<-
                                      |      |
    ------------------------------- model    ^
    |                                 |      |
    |                                 --->---
    V
    web service [pod] [pod] --- happy user
    |
    -> users service [pod] [pod]
    |
    -> db service [pod]

  35. skills chart

  36. skills
    ---------------   ---------------
    |             |   |             |
    |   backend   |   |   devops    |
    |             |   |             |
    ---------------   ---------------
    ---------------   ---------------
    |             |   |             |
    |     ml      |   |  data eng   |
    |             |   |             |
    ---------------   ---------------

  37. skills
    ---------------   ---------------
    |             |   |             |
    |   backend   |   |   devops    |
    |             |   |             |
    ---------------   ---------------
      web service         deploy
    ---------------   ---------------
    |             |   |             |
    |     ml      |   |  data eng   |
    |             |   |             |
    ---------------   ---------------
        models          pipelining

  38. code blueprint
    [ architecture repos ]
    [ pipeline repos ]
    [ ml repos ]
    [ backend repos ]

  39. Tools

  40. Pandas
    Good queries
    Plenty of resources
    Reads SQL

  41. Dask
    Good for its purpose: parallelizing tasks
    Poor docs

  42. Polars
    Awesome parallelization
    Great docs

  43. NLTK
    use spaCy if possible

  44. Notebooks
    great for cloud
    used in production on the cloud

  45. Advice to research / science folks
    keep everything clean
    people will come after you
    always in a hurry / messy / "I'll clean it later" mood
    good practices? is this phrase in the Korean dictionary?

  46. General advice
    have great docs
    good onboarding
    have great standards

  47. Keep learning!