Slide 1

Slide 1 text

Matteo Grella 2021
Natural Language Processing in Go
GopherCon Europe 2021

Slide 2

Slide 2 text

Matteo Grella
★ Head of Artificial Intelligence, Resp. Italian Branch at EXOP Group
★ Applying AI to detect security incidents
★ Go/Ada/Lisp Software Developer
★ Researcher on NLP
★ AI/ML/NLP Advisor
★ Creator and maintainer of spaGO
@GrellaMatteo | matteo-grella | Matteo Grella

Slide 3

Slide 3 text

Outline
1. Introducing spaGO
   https://github.com/nlpodyssey/spago/
2. Tour of the current NLP capabilities
   https://github.com/matteo-grella/gophercon-eu-2021
3. Conclusion and Future Work

Slide 4

Slide 4 text

What is spaGO?
Self-contained ML/NLP library in pure Go:
➢ Go module
➢ CLI mode / Library mode
➢ Statically linked binary (no burden of Deep Learning frameworks)
➢ Suitable for embedded systems
➢ Docker image as small as 20 MB
➢ Support for both training and inference of DL models

Slide 5

Slide 5 text

Internal ML Framework
Lightweight “define-by-run” expression graph

    package main

    import (
        "fmt"
        "log"

        mat "github.com/nlpodyssey/spago/pkg/mat32"
        "github.com/nlpodyssey/spago/pkg/ml/ag"
        "github.com/nlpodyssey/spago/pkg/ml/ag/encoding/dot"
    )

    func main() {
        g := ag.NewGraph()
        // w and b require gradients (trainable); x is a plain input.
        w := g.NewVariableWithName(mat.NewScalar(3.0), true, "w")
        b := g.NewVariableWithName(mat.NewScalar(1), true, "b")
        x := g.NewVariableWithName(mat.NewScalar(2), false, "x")

        y := g.Add(g.Mul(w, x), b) // 7.0
        _ = g.Sigmoid(y)           // 0.99

        // Export the graph in Graphviz DOT format.
        out, err := dot.Marshal(g)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(out))
    }

Render the graph with:

    go run main.go | dot -Tsvg > example.svg

Slide 6

Slide 6 text

Internal ML Framework
Optimize mathematical expressions by back-propagating gradients (i.e. Learning)
[Figure: diagram with the labels Input, Parameter (×2), Output, Loss(output, expected) and Gradient (Δ)]

Slide 7

Slide 7 text

What is a Neural Model?

Trained parameters (weights and biases):

    // Model for the LSTM
    type Model struct {
        WIn      nn.Param `spago:"type:weights"`
        WInRec   nn.Param `spago:"type:weights"`
        BIn      nn.Param `spago:"type:biases"`
        WOut     nn.Param `spago:"type:weights"`
        WOutRec  nn.Param `spago:"type:weights"`
        BOut     nn.Param `spago:"type:biases"`
        WFor     nn.Param `spago:"type:weights"`
        WForRec  nn.Param `spago:"type:weights"`
        BFor     nn.Param `spago:"type:biases"`
        WCand    nn.Param `spago:"type:weights"`
        WCandRec nn.Param `spago:"type:weights"`
        BCand    nn.Param `spago:"type:biases"`
    }

Mathematical expressions (Forward() method):

    // Forward performs the forward step.
    func (m *Model) Forward(x, yPrev, cellPrev ag.Node) (cell, y ag.Node) {
        g := m.Graph()
        // Input, output and forget gates.
        inG := g.Sigmoid(nn.Affine(g, m.BIn, m.WIn, x, m.WInRec, yPrev))
        outG := g.Sigmoid(nn.Affine(g, m.BOut, m.WOut, x, m.WOutRec, yPrev))
        forG := g.Sigmoid(nn.Affine(g, m.BFor, m.WFor, x, m.WForRec, yPrev))
        // Candidate cell state.
        cand := g.Tanh(nn.Affine(g, m.BCand, m.WCand, x, m.WCandRec, yPrev))
        cell = g.Add(g.Prod(inG, cand), g.Prod(forG, cellPrev))
        y = g.Prod(outG, g.Tanh(cell))
        return
    }

Slide 8

Slide 8 text

Long Short-Term Memory (LSTM)
https://github.com/nlpodyssey/spago/blob/main/pkg/ml/nn/recurrent/lstm/lstm.go
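Read as equations, the Forward method from the previous slide corresponds to the standard LSTM cell below. This transcription rests on two assumptions: that nn.Affine(g, b, W1, x1, W2, x2) computes b + W1·x1 + W2·x2, and that Prod is the element-wise product. The subscripts mirror the struct field names; x_t is the input, y_{t-1} and c_{t-1} are the previous output and cell state.

```latex
\begin{aligned}
i_t         &= \sigma\left(W_{In}\,x_t + W_{InRec}\,y_{t-1} + b_{In}\right) \\
o_t         &= \sigma\left(W_{Out}\,x_t + W_{OutRec}\,y_{t-1} + b_{Out}\right) \\
f_t         &= \sigma\left(W_{For}\,x_t + W_{ForRec}\,y_{t-1} + b_{For}\right) \\
\tilde{c}_t &= \tanh\left(W_{Cand}\,x_t + W_{CandRec}\,y_{t-1} + b_{Cand}\right) \\
c_t         &= i_t \odot \tilde{c}_t + f_t \odot c_{t-1} \\
y_t         &= o_t \odot \tanh(c_t)
\end{aligned}
```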

Slide 9

Slide 9 text

Use Cases
➢ Social Media Monitoring
➢ Chatbot
➢ Customer Support
➢ Improvement of Search Tools
➢ Content Classification
➢ Language Translation
➢ ...

Slide 10

Slide 10 text

SOTA NLP Neural Models
https://huggingface.co/

Slide 11

Slide 11 text

SOTA NLP Neural Models
https://huggingface.co/models

Slide 12

Slide 12 text

Neural Model Importer
1. Build the spaGO CLI `Hugging Face Importer`
2. Use the tool to:
   a. Download pre-trained PyTorch models from the Hugging Face Model Hub
   b. Convert them into spaGO format
➢ Compatible with PyTorch pre-trained models (GoPickle)

    #!/usr/bin/env bash
    set -e

    # Destination directory; defaults to the current directory.
    models_path=${1:-.}
    mkdir -p "$models_path"

    model_names=(
      # Masked Language Model
      'bert-base-cased'
      # Question Answering
      'deepset/bert-base-cased-squad2'
      # Natural Language Inference
      'valhalla/distilbart-mnli-12-3'
      # Language-agnostic BERT Sentence Embedding
      'pvl/labse_bert'
      # Machine Translation
      'Helsinki-NLP/opus-mt-it-en'
    )

    for model_name in "${model_names[@]}"; do
      hf-importer --repo "$models_path" --model "$model_name"
    done

Slide 13

Slide 13 text

Named Entity Recognition

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler"
    )

    func main() {
        // The model directory is passed as the first argument.
        model, err := sequencelabeler.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        fn := func(text string) {
            result := model.Analyze(text, true, true)
            for _, token := range result.Tokens {
                fmt.Printf("%s -> %s\n", token.Text, token.Label)
            }
            fmt.Println()
        }
        // forEachInput is a helper that is not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

Example session:

    > Matteo Grella was born in Turin (Italy) on December 18, 1987.
    Matteo Grella -> PERSON
    Turin -> GPE
    Italy -> GPE
    December 18 , 1987 -> DATE

    > He is the Head of AI at EXOP GmbH.
    EXOP GmbH -> ORG

Slide 14

Slide 14 text

Named Entity Recognition (gRPC)

client.go

    package main

    import (
        "context"
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler/grpcapi"
        "google.golang.org/grpc"
    )

    func main() {
        conn, err := grpc.Dial("localhost:3264", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        client := grpcapi.NewSequenceLabelerClient(conn)

        fn := func(text string) {
            result, err := client.Analyze(context.Background(),
                &grpcapi.AnalyzeRequest{
                    Text:              text,
                    MergeEntities:     true,
                    FilterNotEntities: true,
                })
            if err != nil {
                log.Fatal(err)
            }
            for _, token := range result.Tokens {
                fmt.Printf("%s -> %s\n", token.Text, token.Label)
            }
            fmt.Println()
        }
        // forEachInput is a helper that is not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

server.go

    package main

    import (
        "fmt"
        "log"
        "net"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler"
        "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler/grpcapi"
        "google.golang.org/grpc"
    )

    func main() {
        model, err := sequencelabeler.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        grpcServer := grpc.NewServer()
        grpcapi.RegisterSequenceLabelerServer(grpcServer,
            sequencelabeler.NewServer(model))

        listener, err := net.Listen("tcp", "localhost:3264")
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println("Listening...")
        err = grpcServer.Serve(listener)
        if err != nil {
            log.Fatal(err)
        }
    }

Slide 15

Slide 15 text

Character-Level Language Model

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/charlm"
    )

    func main() {
        model, err := charlm.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        generator := charlm.NewGenerator(model,
            charlm.GeneratorConfig{
                MaxCharacters: 300, StopAtEOS: true, Temperature: 0.4,
            },
        )

        fn := func(text string) {
            // The second return value is ignored, as on the original slide.
            result, _ := generator.GenerateText(text)
            fmt.Printf("%s\n", result)
        }
        // forEachInput is a helper that is not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

Example session:

    > Italy
    Italy , a country that has been seen as a partner and the only country to have a population .

    > I really enjoy this
    I really enjoy this beautiful thing .

    > I am very sad for
    I am very sad for the course of the season , where the results were dead .

Slide 16

Slide 16 text

Masked Language Model

    package main

    import (
        "fmt"
        "log"
        "os"
        "strings"

        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
    )

    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        fn := func(text string) {
            result := model.PredictMLM(text + " ")
            // Fill each [MASK] in order with the predicted token.
            for _, token := range result {
                text = strings.Replace(text,
                    "[MASK]", green(token.Text), 1)
            }
            fmt.Printf("%s\n\n", text)
        }
        // forEachInput and green are helpers that are not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

Example session:

    > I'm so [MASK] to talk about this topic!
    I'm so excited to talk about this topic!

    > The [MASK] of this neural [MASK] is impressive.
    The efficiency of this neural network is impressive.

    > [MASK] is a programming language
    Python is a programming language

    > Berlin is the capital of [MASK] .
    Berlin is the capital of Germany .

    > [MASK] is the capital of Germany.
    Frankfurt is the capital of Germany.

Slide 17

Slide 17 text

Question-Answering

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
    )

    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        // The paragraph to answer questions about is read from a file.
        paragraph := readFile(os.Args[2])

        fn := func(text string) {
            result := model.Answer(text, paragraph)
            // Checking the length first avoids indexing an empty result.
            if len(result) == 0 || result[0].Confidence < 0.5 {
                fmt.Print("Sorry, I'm not sure.\n\n")
                return
            }
            for i, answer := range result {
                fmt.Printf("%d. %s [%.2f]\n",
                    i, answer.Text, answer.Confidence)
            }
            fmt.Println()
        }
        // forEachInput and readFile are helpers that are not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

Paragraph:

    Go is a programming language designed at Google in 2007 by Robert Griesemer, Rob Pike, and Ken Thompson to address criticism of other languages.

Example session:

    > What is Go?
    0. a programming language [0.93]

    > When Go was created?
    0. 2007 [0.96]

    > What is the purpose of Go?
    Sorry, I'm not sure.

    > Why Go was created?
    0. to address criticism of other languages [0.53]

    > Who invented Go?
    0. Robert Griesemer, Rob Pike, and Ken Thompson [0.96]

    > Where was Robert working when he created Go?
    0. Google [1.00]

Slide 18

Slide 18 text

Zero-shot Text Classification

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bart/tasks/zsc"
    )

    func main() {
        model, err := zsc.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        // arbitrary list of topics
        classes := []string{"positive", "negative"}

        fn := func(text string) {
            result, err := model.Classify(text, "", classes, false)
            if err != nil {
                log.Fatal(err)
            }
            for i, item := range result.Distribution {
                fmt.Printf("%d. %s [%.2f]\n",
                    i, item.Class, item.Confidence)
            }
            fmt.Println()
        }
        // forEachInput is a helper that is not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

Example session:

    > I got a promotion at work!
    0. positive [0.98]
    1. negative [0.02]

    > I've been working at a company but I got fired.
    0. negative [0.90]
    1. positive [0.10]

    > I am pleased with my new phone, but it has a short battery life.
    0. positive [0.61]
    1. negative [0.39]

Slide 19

Slide 19 text

Cross-Lingual Text Similarity

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/mat32"
        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
    )

    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        // vectorize returns the L2-normalized sentence embedding.
        vectorize := func(text string) []float32 {
            vector, err := model.Vectorize(text, bert.ClsToken)
            if err != nil {
                log.Fatal(err)
            }
            return vector.(*mat32.Dense).Normalize2().Data()
        }

        fn := func(text string) {
            text1, text2 := splitByPipe(text)
            // On unit vectors the dot product is the cosine similarity.
            score := dotProduct(vectorize(text1), vectorize(text2))
            fmt.Printf("Similarity: %s\n", colorize(score))
        }
        // forEachInput, splitByPipe, dotProduct and colorize are helpers
        // that are not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

Example session:

    > I love my dog | I enjoy my puppy
    Similarity: 0.81

    > I love my dog | I love playing guitar
    Similarity: 0.54

    > I love my dog | Amo il mio cane
    Similarity: 0.94

    > I love my dog | Ich liebe meinen Hund
    Similarity: 0.93

    > I love my dog | Я люблю свою собаку
    Similarity: 0.93

Slide 20

Slide 20 text

Cross-Lingual Text Similarity (2)

    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        vectorize := func(text string) []float32 {
            vector, err := model.Vectorize(text, bert.ClsToken)
            if err != nil {
                log.Fatal(err)
            }
            return vector.(*mat32.Dense).Normalize2().Data()
        }

        sentences := getSentencesFromFile(os.Args[2])
        vectors := make([][]float32, len(sentences))
        for i, sentence := range sentences {
            vectors[i] = vectorize(sentence) // can be concurrent
        }

        fn := func(text string) {
            hits := rankBySimilarity(vectors, vectorize(text))
            for i, item := range limit(hits, 5) {
                fmt.Printf("%d. %s [%s]\n",
                    i, sentences[item.id], colorize(item.score))
            }
            fmt.Println()
        }
        forEachInput(os.Stdin, fn)
    }

Example session:

    > I am very satisfied with my phone.
    0. I really like my phone. [0.82]
    1. I love my wife very much. [0.56]
    2. I'm looking forward to dining with you. [0.39]
    3. Did you get a new car? [0.22]
    4. How old are you? [0.12]

    > ¿Qué edad tienes?
    0. How old are you? [0.86]
    1. Did you get a new car? [0.48]
    2. Will it snow tomorrow? [0.37]
    3. I love my wife very much. [0.22]
    4. I really like my phone. [0.20]

    > Hai una macchina nuova?
    0. Did you get a new car? [0.88]
    1. Will it snow tomorrow? [0.46]
    2. How old are you? [0.42]
    3. I really like my phone. [0.33]
    4. I love my wife very much. [0.26]
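The ranking step relies on rankBySimilarity and limit, which the slide does not define. A minimal sketch of how they could work is given below. The function names and the id and score fields come from the slide; the hit type, the function bodies and the toy two-dimensional vectors are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// hit pairs the index of a stored sentence with its similarity score.
type hit struct {
	id    int
	score float32
}

// dotProduct returns the inner product of two equal-length vectors.
func dotProduct(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// rankBySimilarity scores every stored vector against the query and
// returns the hits sorted from most to least similar.
func rankBySimilarity(vectors [][]float32, query []float32) []hit {
	hits := make([]hit, len(vectors))
	for i, v := range vectors {
		hits[i] = hit{id: i, score: dotProduct(v, query)}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		return hits[i].score > hits[j].score
	})
	return hits
}

// limit returns at most the first n hits.
func limit(hits []hit, n int) []hit {
	if n < 0 {
		n = 0
	}
	if len(hits) > n {
		return hits[:n]
	}
	return hits
}

func main() {
	// Toy unit vectors standing in for sentence embeddings.
	vectors := [][]float32{{1, 0}, {0, 1}, {0.6, 0.8}}
	query := []float32{0.8, 0.6}
	for i, item := range limit(rankBySimilarity(vectors, query), 2) {
		fmt.Printf("%d. vector %d [%.2f]\n", i, item.id, item.score)
	}
}
```

This is a brute-force scan over all stored vectors, which is fine for a handful of sentences as in the demo; the stable sort keeps the original order for ties.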

Slide 21

Slide 21 text

Machine Translation

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bart/tasks/seq2seq"
    )

    func main() {
        model, err := seq2seq.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        fn := func(text string) {
            result, err := model.Generate(text)
            if err != nil {
                log.Fatal(err)
            }
            fmt.Printf("%s\n\n", result)
        }
        // forEachInput is a helper that is not shown on the slide.
        forEachInput(os.Stdin, fn)
    }

Example session:

    > Questo è un traduttore automatico scritto in puro Go... ci puoi credere?!
    This is a machine translation written in pure Go... Can you believe it?!

Production-ready HTTP JSON and gRPC API translation service:
https://github.com/SpecializedGeneralist/translator

Slide 22

Slide 22 text

Conclusion and Future Work
➔ Go is suitable for AI development!
➔ Roadmap:
  ◆ Include other SOTA models (e.g. Google T5)
  ◆ Optimize like mainstream DL frameworks
  ◆ Quantization (float32 → int)
  ◆ Make GPU/TPU-friendly (Gorgonia Tensors)
★ Join the project ;) https://github.com/nlpodyssey/spago/

Slide 23

Slide 23 text

Thanks!
Matteo Grella
@GrellaMatteo
https://github.com/matteo-grella
https://github.com/nlpodyssey/spago/