(Neural) Natural Language Processing in Go

At GopherCon Europe 2021, Matteo Grella discussed the spaGO package, the first and only pure Go library that focuses on cutting-edge neural technologies for Natural Language Processing (NLP).

Matteo Grella

May 28, 2021

Transcript

  1. Matteo Grella
     ★ Head of Artificial Intelligence, Resp. Italian Branch at EXOP Group
     ★ Applying AI to detect security incidents
     ★ Go/Ada/Lisp Software Developer
     ★ Researcher on NLP
     ★ AI/ML/NLP Advisor
     ★ Creator and maintainer of spaGO
     @GrellaMatteo | matteo-grella | Matteo Grella
     Matteo Grella 2021
  2. Outline
     1. Introducing spaGO
        https://github.com/nlpodyssey/spago/
     2. Tour of the current NLP capabilities
        https://github.com/matteo-grella/gophercon-eu-2021
     3. Conclusion & Future Works
  3. What is spaGO?
     Self-contained ML/NLP library in pure Go:
     ➢ Go module
     ➢ CLI mode / Library mode
     ➢ Static-linked binary (no burden of Deep Learning frameworks)
     ➢ Suitable for embedded systems
     ➢ Docker image as small as 20 MB
     ➢ Support for both training and inference of DL models
  4. Internal ML Framework: lightweight "define-by-run" expression graph

         package main

         import (
             "fmt"
             "log"

             mat "github.com/nlpodyssey/spago/pkg/mat32"
             "github.com/nlpodyssey/spago/pkg/ml/ag"
             "github.com/nlpodyssey/spago/pkg/ml/ag/encoding/dot"
         )

         func main() {
             g := ag.NewGraph()
             // w and b require gradients (trainable); x is a plain input.
             w := g.NewVariableWithName(mat.NewScalar(3.0), true, "w")
             b := g.NewVariableWithName(mat.NewScalar(1), true, "b")
             x := g.NewVariableWithName(mat.NewScalar(2), false, "x")

             y := g.Add(g.Mul(w, x), b) // 7.0
             _ = g.Sigmoid(y)           // 0.99

             // Serialize the expression graph in Graphviz DOT format.
             out, err := dot.Marshal(g)
             if err != nil {
                 log.Fatal(err)
             }
             fmt.Println(string(out))
         }

     Render the graph with Graphviz:

         go run main.go | dot -Tsvg > example.svg
  5. Internal ML Framework
     Optimize mathematical expressions by back-propagating gradients (i.e. Learning)
     [Diagram labels: Input, Parameter, Parameter, Output, Loss (output, expected), Gradient, Δ]
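The learning loop named on this slide can be sketched without any library. This is a minimal illustration, not spaGO code: it assumes a scalar model y = w*x + b, a squared-error loss, and hand-derived gradients. The names `params`, `step`, and `lr`, as well as the training values, are hypothetical.

```go
package main

import "fmt"

// params holds the trainable values of the scalar model y = w*x + b.
type params struct{ w, b float64 }

// forward computes the model output for input x.
func (p params) forward(x float64) float64 { return p.w*x + p.b }

// step performs one gradient-descent update for the squared-error loss
// L = (y - expected)^2 and returns the loss measured before the update.
func step(p *params, x, expected, lr float64) float64 {
	y := p.forward(x)
	diff := y - expected
	// Chain rule: dL/dy = 2*diff, dy/dw = x, dy/db = 1.
	gradW := 2 * diff * x
	gradB := 2 * diff
	p.w -= lr * gradW
	p.b -= lr * gradB
	return diff * diff
}

func main() {
	// Illustrative starting point: w=3, b=1, input x=2, target output 5.
	p := params{w: 3, b: 1}
	for i := 0; i < 20; i++ {
		loss := step(&p, 2, 5, 0.05)
		fmt.Printf("iter %2d  loss %.6f\n", i, loss)
	}
	fmt.Printf("w=%.3f b=%.3f y=%.3f\n", p.w, p.b, p.forward(2))
}
```

With these particular values the error halves on every iteration, so the loss shrinks quickly. An automatic-differentiation graph replaces the hand-written gradient lines with a generic backward pass.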
  6. What is a Neural Model?

     Trained Parameters (weights and biases):

         // Model for the LSTM
         type Model struct {
             WIn      nn.Param `spago:"type:weights"`
             WInRec   nn.Param `spago:"type:weights"`
             BIn      nn.Param `spago:"type:biases"`
             WOut     nn.Param `spago:"type:weights"`
             WOutRec  nn.Param `spago:"type:weights"`
             BOut     nn.Param `spago:"type:biases"`
             WFor     nn.Param `spago:"type:weights"`
             WForRec  nn.Param `spago:"type:weights"`
             BFor     nn.Param `spago:"type:biases"`
             WCand    nn.Param `spago:"type:weights"`
             WCandRec nn.Param `spago:"type:weights"`
             BCand    nn.Param `spago:"type:biases"`
         }

     Mathematical expressions (Forward() method):

         // Forward performs the forward step.
         func (m *Model) Forward(x, yPrev, cellPrev ag.Node) (cell, y ag.Node) {
             g := m.Graph()
             // Input, output, and forget gates.
             inG := g.Sigmoid(nn.Affine(g, m.BIn, m.WIn, x, m.WInRec, yPrev))
             outG := g.Sigmoid(nn.Affine(g, m.BOut, m.WOut, x, m.WOutRec, yPrev))
             forG := g.Sigmoid(nn.Affine(g, m.BFor, m.WFor, x, m.WForRec, yPrev))
             // Candidate cell values.
             cand := g.Tanh(nn.Affine(g, m.BCand, m.WCand, x, m.WCandRec, yPrev))
             cell = g.Add(g.Prod(inG, cand), g.Prod(forG, cellPrev))
             y = g.Prod(outG, g.Tanh(cell))
             return
         }
  7. Use Cases
     ➢ Social Media Monitoring
     ➢ Chatbot
     ➢ Customer Support
     ➢ Improvement of Search Tools
     ➢ Content Classification
     ➢ Language Translation
     ➢ ...
  8. Neural Model Importer
     1. Build the spaGO CLI `Hugging Face Importer`
     2. Use the tool to:
        a. Download pre-trained PyTorch models from the Hugging Face Model Hub
        b. Convert them into spaGO format
     ➢ Compatible with Hugging Face and PyTorch pre-trained models (GoPickle)

         #!/usr/bin/env bash
         set -e
         # Target directory for the converted models (defaults to the current one).
         models_path="${1:-.}"
         mkdir -p "$models_path"

         model_names=(
           # Masked Language Model
           'bert-base-cased'
           # Question Answering
           'deepset/bert-base-cased-squad2'
           # Natural Language Inference
           'valhalla/distilbart-mnli-12-3'
           # Language-agnostic BERT Sentence Embedding
           'pvl/labse_bert'
           # Machine Translation
           'Helsinki-NLP/opus-mt-it-en'
         )

         for model_name in "${model_names[@]}"; do
           hf-importer --repo "$models_path" --model "$model_name"
         done
  9. Named Entities Recognition

         package main

         import (
             "fmt"
             "log"
             "os"

             "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler"
         )

         func main() {
             model, err := sequencelabeler.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             fn := func(text string) {
                 result := model.Analyze(text, true, true)
                 for _, token := range result.Tokens {
                     fmt.Printf("%s -> %s\n", token.Text, token.Label)
                 }
                 fmt.Println()
             }
             // forEachInput is a helper that is not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

     Example session:

         > Matteo Grella was born in Turin (Italy) on December 18, 1987.
         Matteo Grella -> PERSON
         Turin -> GPE
         Italy -> GPE
         December 18 , 1987 -> DATE

         > He is the Head of AI at EXOP GmbH.
         EXOP GmbH -> ORG
  10. Named Entities Recognition (gRPC)

      client.go:

         package main

         import (
             "context"
             "fmt"
             "log"
             "os"

             "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler/grpcapi"
             "google.golang.org/grpc"
         )

         func main() {
             // grpc.WithInsecure is deprecated in newer grpc-go releases.
             conn, err := grpc.Dial("localhost:3264", grpc.WithInsecure())
             if err != nil {
                 log.Fatal(err)
             }
             defer conn.Close()

             client := grpcapi.NewSequenceLabelerClient(conn)

             fn := func(text string) {
                 result, err := client.Analyze(context.Background(),
                     &grpcapi.AnalyzeRequest{
                         Text:              text,
                         MergeEntities:     true,
                         FilterNotEntities: true,
                     })
                 if err != nil {
                     log.Fatal(err)
                 }
                 for _, token := range result.Tokens {
                     fmt.Printf("%s -> %s\n", token.Text, token.Label)
                 }
                 fmt.Println()
             }
             // forEachInput is a helper that is not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      server.go:

         package main

         import (
             "fmt"
             "log"
             "net"
             "os"

             "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler"
             "github.com/nlpodyssey/spago/pkg/nlp/sequencelabeler/grpcapi"
             "google.golang.org/grpc"
         )

         func main() {
             model, err := sequencelabeler.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             grpcServer := grpc.NewServer()
             grpcapi.RegisterSequenceLabelerServer(grpcServer,
                 sequencelabeler.NewServer(model))

             listener, err := net.Listen("tcp", "localhost:3264")
             if err != nil {
                 log.Fatal(err)
             }

             fmt.Println("Listening...")
             err = grpcServer.Serve(listener)
             if err != nil {
                 log.Fatal(err)
             }
         }
  11. Character-Level Language Model

         package main

         import (
             "fmt"
             "log"
             "os"

             "github.com/nlpodyssey/spago/pkg/nlp/charlm"
         )

         func main() {
             model, err := charlm.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             generator := charlm.NewGenerator(model,
                 charlm.GeneratorConfig{
                     MaxCharacters: 300, StopAtEOS: true, Temperature: 0.4,
                 },
             )

             fn := func(text string) {
                 // The second return value is ignored, as on the slide.
                 result, _ := generator.GenerateText(text)
                 fmt.Printf("%s\n", result)
             }
             // forEachInput is a helper that is not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      Example session:

         > Italy
         Italy , a country that has been seen as a partner and the only country to have a population .

         > I really enjoy this
         I really enjoy this beautiful thing .

         > I am very sad for
         I am very sad for the course of the season , where the results were dead .
  12. Masked Language Model

         package main

         import (
             "fmt"
             "log"
             "os"
             "strings"

             "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
         )

         func main() {
             model, err := bert.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             fn := func(text string) {
                 result := model.PredictMLM(text + " ")
                 // Replace each [MASK], left to right, with the predicted token.
                 for _, token := range result {
                     text = strings.Replace(text,
                         "[MASK]", green(token.Text), 1)
                 }
                 fmt.Printf("%s\n\n", text)
             }
             // green and forEachInput are helpers that are not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      Example session:

         > I'm so [MASK] to talk about this topic!
         I'm so excited to talk about this topic!

         > The [MASK] of this neural [MASK] is impressive.
         The efficiency of this neural network is impressive.

         > [MASK] is a programming language
         Python is a programming language

         > Berlin is the capital of [MASK] .
         Berlin is the capital of Germany .

         > [MASK] is the capital of Germany.
         Frankfurt is the capital of Germany.
  13. Question-Answering

         package main

         import (
             "fmt"
             "log"
             "os"

             "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
         )

         func main() {
             model, err := bert.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             paragraph := readFile(os.Args[2])

             fn := func(text string) {
                 result := model.Answer(text, paragraph)
                 // Checking the length avoids indexing an empty result.
                 if len(result) > 0 && result[0].Confidence < 0.5 {
                     fmt.Print("Sorry, I'm not sure.\n\n")
                     return
                 }
                 for i, answer := range result {
                     fmt.Printf("%d. %s [%.2f]\n",
                         i, answer.Text, answer.Confidence)
                 }
                 fmt.Println()
             }
             // readFile and forEachInput are helpers that are not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      Input paragraph:

         Go is a programming language designed at Google in 2007 by Robert Griesemer, Rob Pike, and Ken Thompson to address criticism of other languages.

      Example session:

         > What is Go?
         0. a programming language [0.93]

         > When Go was created?
         0. 2007 [0.96]

         > What is the purpose of Go?
         Sorry, I'm not sure.

         > Why Go was created?
         0. to address criticism of other languages [0.53]

         > Who invented Go?
         0. Robert Griesemer, Rob Pike, and Ken Thompson [0.96]

         > Where was Robert working when he created Go?
         0. Google [1.00]
  14. Zero-shot Text Classification

         package main

         import (
             "fmt"
             "log"
             "os"

             "github.com/nlpodyssey/spago/pkg/nlp/transformers/bart/tasks/zsc"
         )

         func main() {
             model, err := zsc.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             // arbitrary list of topics
             classes := []string{"positive", "negative"}

             fn := func(text string) {
                 result, err := model.Classify(text, "", classes, false)
                 if err != nil {
                     log.Fatal(err)
                 }
                 for i, item := range result.Distribution {
                     fmt.Printf("%d. %s [%.2f]\n",
                         i, item.Class, item.Confidence)
                 }
                 fmt.Println()
             }
             // forEachInput is a helper that is not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      Example session:

         > I got a promotion at work!
         0. positive [0.98]
         1. negative [0.02]

         > I've been working at a company but I got fired.
         0. negative [0.90]
         1. positive [0.10]

         > I am pleased with my new phone, but it has a short battery life.
         0. positive [0.61]
         1. negative [0.39]
  15. Cross-Lingual Text Similarity

         package main

         import (
             "fmt"
             "log"
             "os"

             "github.com/nlpodyssey/spago/pkg/mat32"
             "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
         )

         func main() {
             model, err := bert.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             // vectorize returns the L2-normalized sentence embedding.
             vectorize := func(text string) []float32 {
                 vector, err := model.Vectorize(text, bert.ClsToken)
                 if err != nil {
                     log.Fatal(err)
                 }
                 return vector.(*mat32.Dense).Normalize2().Data()
             }

             fn := func(text string) {
                 text1, text2 := splitByPipe(text)
                 score := dotProduct(vectorize(text1), vectorize(text2))
                 fmt.Printf("Similarity: %s\n", colorize(score))
             }
             // splitByPipe, dotProduct, colorize, and forEachInput are helpers
             // that are not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      Example session:

         > I love my dog | I enjoy my puppy
         Similarity: 0.81

         > I love my dog | I love playing guitar
         Similarity: 0.54

         > I love my dog | Amo il mio cane
         Similarity: 0.94

         > I love my dog | Ich liebe meinen Hund
         Similarity: 0.93

         > I love my dog | Я люблю свою собаку
         Similarity: 0.93
  16. Cross-Lingual Text Similarity (2)

         func main() {
             model, err := bert.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             // vectorize returns the L2-normalized sentence embedding.
             vectorize := func(text string) []float32 {
                 vector, err := model.Vectorize(text, bert.ClsToken)
                 if err != nil {
                     log.Fatal(err)
                 }
                 return vector.(*mat32.Dense).Normalize2().Data()
             }

             // Pre-compute one vector per candidate sentence.
             sentences := getSentencesFromFile(os.Args[2])
             vectors := make([][]float32, len(sentences))
             for i, sentence := range sentences {
                 vectors[i] = vectorize(sentence) // can be concurrent
             }

             fn := func(text string) {
                 hits := rankBySimilarity(vectors, vectorize(text))
                 for i, item := range limit(hits, 5) {
                     fmt.Printf("%d. %s [%s]\n",
                         i, sentences[item.id], colorize(item.score))
                 }
                 fmt.Println()
             }
             // getSentencesFromFile, rankBySimilarity, limit, colorize, and
             // forEachInput are helpers that are not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      Example session:

         > I am very satisfied with my phone.
         0. I really like my phone. [0.82]
         1. I love my wife very much. [0.56]
         2. I'm looking forward to dining with you. [0.39]
         3. Did you get a new car? [0.22]
         4. How old are you? [0.12]

         > ¿Qué edad tienes?
         0. How old are you? [0.86]
         1. Did you get a new car? [0.48]
         2. Will it snow tomorrow? [0.37]
         3. I love my wife very much. [0.22]
         4. I really like my phone. [0.20]

         > Hai una macchina nuova?
         0. Did you get a new car? [0.88]
         1. Will it snow tomorrow? [0.46]
         2. How old are you? [0.42]
         3. I really like my phone. [0.33]
         4. I love my wife very much. [0.26]
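The ranking step is a brute-force nearest-neighbor search: score every stored vector against the query and sort by score. The slide does not include `rankBySimilarity` or `limit`, so the following is an assumed implementation. The `hit` type and its fields only mirror the `item.id` and `item.score` accesses in the slide code.

```go
package main

import (
	"fmt"
	"sort"
)

// hit pairs the index of a stored vector with its similarity to the query.
type hit struct {
	id    int
	score float32
}

// dotProduct returns the sum of element-wise products of a and b.
func dotProduct(a, b []float32) float32 {
	if len(a) != len(b) {
		panic("dotProduct: vectors must have the same length")
	}
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// rankBySimilarity scores every vector against query and returns the hits
// sorted from most to least similar.
func rankBySimilarity(vectors [][]float32, query []float32) []hit {
	hits := make([]hit, len(vectors))
	for i, v := range vectors {
		hits[i] = hit{id: i, score: dotProduct(v, query)}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		return hits[i].score > hits[j].score
	})
	return hits
}

// limit returns at most the first n hits.
func limit(hits []hit, n int) []hit {
	if n < 0 {
		n = 0
	}
	if len(hits) > n {
		return hits[:n]
	}
	return hits
}

func main() {
	// Toy unit-length vectors standing in for sentence embeddings.
	vectors := [][]float32{{1, 0}, {0, 1}, {0.6, 0.8}}
	for i, item := range limit(rankBySimilarity(vectors, []float32{0, 1}), 2) {
		fmt.Printf("%d. vector %d [%.2f]\n", i, item.id, item.score)
	}
}
```

A linear scan like this is fine for a handful of sentences. Larger collections would typically call for an approximate nearest-neighbor index instead.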
  17. Machine Translation

         package main

         import (
             "fmt"
             "log"
             "os"

             "github.com/nlpodyssey/spago/pkg/nlp/transformers/bart/tasks/seq2seq"
         )

         func main() {
             model, err := seq2seq.LoadModel(os.Args[1])
             if err != nil {
                 log.Fatal(err)
             }
             defer model.Close()

             fn := func(text string) {
                 result, err := model.Generate(text)
                 if err != nil {
                     log.Fatal(err)
                 }
                 fmt.Printf("%s\n\n", result)
             }
             // forEachInput is a helper that is not shown on the slide.
             forEachInput(os.Stdin, fn)
         }

      Example session:

         > Questo è un traduttore automatico scritto in puro Go... ci puoi credere?!
         This is a machine translation written in pure Go... Can you believe it?!

      Production-ready HTTP JSON and gRPC API translation service:
      https://github.com/SpecializedGeneralist/translator
  18. Conclusion and Future Works
      ➔ Go is suitable for AI development!
      ➔ Roadmap:
         ◆ Include other SOTA models (e.g. Google T5)
         ◆ Optimize like mainstream DL frameworks
         ◆ Quantization (float32 → int)
         ◆ Make GPU/TPU-friendly (Gorgonia Tensors)
      ★ Join the project ;) https://github.com/nlpodyssey/spago/