Slide 1

Slide 1 text

The state of NLP in production

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Python Mauritius Usergroup: site, fb, linkedin, mailing list

Slide 4

Slide 4 text

Site url: pymug.com

Slide 5

Slide 5 text

About me: compileralchemy.com

Slide 6

Slide 6 text

slides

Slide 7

Slide 7 text

The state of NLP in production

Slide 8

Slide 8 text

Hardest part of a real-world project

Slide 9

Slide 9 text

?

Slide 10

Slide 10 text

Is it cooking up an awesome model?

Slide 11

Slide 11 text

No, the world is more complex than this

Slide 12

Slide 12 text

Elements of an NLP project

Slide 13

Slide 13 text

NLP project:
- gather data
- clean
- store
- train
- use model
- retrain model

Slide 14

Slide 14 text

gather data

Slide 15

Slide 15 text

Toy project:
- use a curated data set
- quick extraction

Slide 16

Slide 16 text

Real project:
- a lot of data is needed
- the data must correspond to the business case
- the data probably does not exist yet
- speed of data gathering matters
- find ingenious / better ways of getting data
- automate collection

Slide 17

Slide 17 text

clean/preprocess data

Slide 18

Slide 18 text

Toy project:
- use an existing parser / curator, e.g. NLTK
- existing options

Slide 19

Slide 19 text

Real project:
- use a parser intended for the use case, with several custom steps
- parallel processing of data
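The "several custom steps" plus "parallel processing" idea can be sketched roughly as below. This is only an illustration under stated assumptions: the three cleaning steps and the names `STEPS`, `clean` and `clean_all` are hypothetical, not taken from the talk. It uses a thread pool from the standard library for simplicity; for CPU-heavy cleaning, `ProcessPoolExecutor` from the same module has the same `map` interface.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical custom cleaning steps; a real project would define its own.
def strip_html(text: str) -> str:
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

def lowercase(text: str) -> str:
    """Lowercase the whole document."""
    return text.lower()

# The order matters: tags are removed before whitespace is normalized.
STEPS = [strip_html, normalize_whitespace, lowercase]

def clean(text: str) -> str:
    """Apply every step, in order, to one document."""
    for step in STEPS:
        text = step(text)
    return text

def clean_all(documents, max_workers=4):
    """Clean documents concurrently; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(clean, documents))
```

Keeping each step as a small function makes it easy to add, remove or reorder steps without touching the code that parallelizes them.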

Slide 20

Slide 20 text

store data

Slide 21

Slide 21 text

Toy project: laptop

Slide 22

Slide 22 text

Real project:
- cloud database
- hot / cold data
- TTL
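One way to picture hot / cold data and TTL is a rule that assigns each record to a tier based on its age. The sketch below is an assumption-heavy illustration: the 7-day hot window, the 90-day TTL and the names `HOT_WINDOW`, `TTL` and `tier_for` are made up for the example, and a single timestamp is used for both decisions to keep it short.

```python
from datetime import datetime, timedelta

# Assumed thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=7)   # recently touched data stays in fast storage
TTL = timedelta(days=90)         # older than this, the record can be deleted

def tier_for(last_touched: datetime, now: datetime) -> str:
    """Classify a record as 'hot', 'cold' or 'expired' from its age."""
    age = now - last_touched
    if age > TTL:
        return "expired"
    if age <= HOT_WINDOW:
        return "hot"
    return "cold"
```

In practice many cloud databases and object stores offer TTL and storage tiers as built-in settings, so a rule like this usually lives in configuration rather than in application code.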

Slide 23

Slide 23 text

training

Slide 24

Slide 24 text

Toy project: use a laptop / external GPU

Slide 25

Slide 25 text

Real project: on cloud
- knowledge of training on cloud
- cross-cloud skills
- fault tolerance
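Fault tolerance during training often comes down to checkpointing: save progress regularly so that a job killed halfway can resume instead of starting over. The sketch below illustrates that idea only; the JSON state layout, the `fail_at` parameter used to simulate a crash, and the placeholder "training" step are all assumptions for the example.

```python
import json
import os

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "loss_history": []}

def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the real one,
    so a crash mid-write never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def train(path, total_epochs, fail_at=None):
    """Run (or resume) training, checkpointing after every epoch."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        if fail_at is not None and epoch == fail_at:
            raise RuntimeError("simulated node failure")
        # Placeholder for one real training epoch.
        state["loss_history"].append(1.0 / (epoch + 1))
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state
```

Calling `train` again with the same path after a failure picks up at the first epoch that was not checkpointed.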

Slide 26

Slide 26 text

use model

Slide 27

Slide 27 text

Toy project: local website / code

Slide 28

Slide 28 text

Real project:
- continuation of pipeline
- web service architecture
- devops / deploy
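To make "web service" concrete, here is a minimal sketch of a model exposed over HTTP using only the standard library. Everything model-related is an assumption: `predict` is a hypothetical keyword rule standing in for a trained model, and the JSON request shape is invented for the example. A real deployment would more likely use a web framework behind a production server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text: str) -> dict:
    """Stand-in for a trained model: a hypothetical keyword rule."""
    label = "positive" if "good" in text.lower() else "negative"
    return {"label": label}

def handle_payload(raw: bytes):
    """Turn a raw request body into (status_code, response_dict)."""
    try:
        text = json.loads(raw)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    if not isinstance(text, str):
        return 400, {"error": "'text' must be a string"}
    return 200, predict(text)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port: int = 8000):
    """Start the service; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Keeping `handle_payload` separate from the HTTP handler means the request logic can be tested without starting a server.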

Slide 29

Slide 29 text

retraining

Slide 30

Slide 30 text

Toy project: uh, does this even exist?

Slide 31

Slide 31 text

Real project:
- learn cloud offerings for continuous learning
- ways to retrain / fine tune

Slide 32

Slide 32 text

It's more than serving a model

Slide 33

Slide 33 text

Operation model

Slide 34

Slide 34 text

[ pipeline ]

data collection --- process --- train -<-
                                  |      |
                                  V      |
                                model    ^
                                  |      |
                                  |--->--
                                  V
web service [pod] [pod] --- happy user
  |
  |-> users service [pod] [pod]
  |
  |-> db service [pod]

Slide 35

Slide 35 text

skills chart

Slide 36

Slide 36 text

skills

---------------  ---------------
|             |  |             |
|   backend   |  |   devops    |
|             |  |             |
---------------  ---------------

---------------  ---------------
|             |  |             |
|     ml      |  |  data eng   |
|             |  |             |
---------------  ---------------

Slide 37

Slide 37 text

skills

---------------  ---------------
|             |  |             |
|   backend   |  |   devops    |
|             |  |             |
---------------  ---------------
  web service        deploy

---------------  ---------------
|             |  |             |
|     ml      |  |  data eng   |
|             |  |             |
---------------  ---------------
    models         pipelining

Slide 38

Slide 38 text

code blueprint:
[ architecture repos ]
[ pipeline repos ]
[ ml repos ]
[ backend repos ]

Slide 39

Slide 39 text

Tools

Slide 40

Slide 40 text

Pandas:
- good queries
- plenty of resources
- can read SQL

Slide 41

Slide 41 text

Dask:
- good for its purpose: parallelizing tasks
- poor docs

Slide 42

Slide 42 text

Polars:
- awesome parallelization
- great docs

Slide 43

Slide 43 text

NLTK: use spaCy if possible

Slide 44

Slide 44 text

Notebooks:
- great for cloud
- used in production on the cloud

Slide 45

Slide 45 text

Advice to research / science folks:
- keep everything clean
- people will come after you
- always in a hurry / messy / "I'll clean it later" mood
- good practices? Is this phrase in the Korean dictionary?

Slide 46

Slide 46 text

General advice:
- have great docs
- good onboarding
- have great standards

Slide 47

Slide 47 text

Keep learning!