At GopherCon Europe 2021, Matteo Grella discussed the spaGO package, the first and only pure Go library that focuses on cutting-edge neural technologies for Natural Language Processing (NLP).
★ Head of the Italian Branch at EXOP Group
★ Applying AI to detect security incidents
★ Go/Ada/Lisp Software Developer
★ Researcher on NLP
★ AI/ML/NLP Advisor
★ Creator and maintainer of spaGO
@GrellaMatteo | matteo-grella | Matteo Grella
Pure Go:
➢ Go module
➢ CLI mode / Library mode
➢ Static-linked binary (no burden of Deep Learning frameworks)
➢ Suitable for embedded systems
➢ Docker image as small as 20 MB
➢ Support for both training and inference of DL models
Neural Model Importer

1. ... Importer`
2. Use the tool to:
   a. Download pre-trained PyTorch models from the Hugging Face Model Hub
   b. Convert them into spaGO format

    #!/usr/bin/env bash
    set -e

    # Destination directory for the converted models (defaults to the current directory).
    models_path=${1:-.}
    mkdir -p "$models_path"

    model_names=(
      # Masked Language Model
      'bert-base-cased'
      # Question Answering
      'deepset/bert-base-cased-squad2'
      # Natural Language Inference
      'valhalla/distilbart-mnli-12-3'
      # Language-agnostic BERT Sentence Embedding
      'pvl/labse_bert'
      # Machine Translation
      'Helsinki-NLP/opus-mt-it-en'
    )

    # Download and convert each model in turn.
    for model_name in "${model_names[@]}"; do
      hf-importer --repo "$models_path" --model "$model_name"
    done

➢ Compatible with PyTorch pre-trained models (GoPickle)
    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/charlm"
    )

    // Note: forEachInput is a helper that is not shown on the slide.
    func main() {
        // Load the character-level language model from the path given as first argument.
        model, err := charlm.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        generator := charlm.NewGenerator(model,
            charlm.GeneratorConfig{
                MaxCharacters: 300, StopAtEOS: true, Temperature: 0.4,
            },
        )

        // Continue each input line with generated text.
        fn := func(text string) {
            result, _ := generator.GenerateText(text)
            fmt.Printf("%s\n", result)
        }
        forEachInput(os.Stdin, fn)
    }

Example session (lines starting with ">" are user input):

    > Italy
    Italy , a country that has been seen as a partner and the only country to have a population .
    > I really enjoy this
    I really enjoy this beautiful thing .
    > I am very sad for
    I am very sad for the course of the season , where the results were dead .
    package main

    import (
        "fmt"
        "log"
        "os"
        "strings"

        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
    )

    // Note: forEachInput and green are helpers that are not shown on the slide.
    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        fn := func(text string) {
            // Predict a token for every [MASK] in the input.
            result := model.PredictMLM(text + " ")
            // Substitute the predictions, one [MASK] at a time, left to right.
            for _, token := range result {
                text = strings.Replace(text,
                    "[MASK]", green(token.Text), 1)
            }
            fmt.Printf("%s\n\n", text)
        }
        forEachInput(os.Stdin, fn)
    }

Example session (lines starting with ">" are user input):

    > I'm so [MASK] to talk about this topic!
    I'm so excited to talk about this topic!
    > The [MASK] of this neural [MASK] is impressive.
    The efficiency of this neural network is impressive.
    > [MASK] is a programming language
    Python is a programming language
    > Berlin is the capital of [MASK] .
    Berlin is the capital of Germany .
    > [MASK] is the capital of Germany.
    Frankfurt is the capital of Germany.
    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
    )

    // Note: forEachInput and readFile are helpers that are not shown on the slide.
    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        // The paragraph in which the answers are searched.
        paragraph := readFile(os.Args[2])

        fn := func(text string) {
            result := model.Answer(text, paragraph)
            // Give up when there is no answer or the best one is below 50% confidence.
            if len(result) == 0 || result[0].Confidence < 0.5 {
                fmt.Print("Sorry, I'm not sure.\n\n")
                return
            }
            for i, answer := range result {
                fmt.Printf("%d. %s [%.2f]\n",
                    i, answer.Text, answer.Confidence)
            }
            fmt.Println()
        }
        forEachInput(os.Stdin, fn)
    }

Input paragraph:

    Go is a programming language designed at Google in 2007 by Robert Griesemer, Rob Pike, and Ken Thompson to address criticism of other languages.

Example session (lines starting with ">" are user input):

    > What is Go?
    0. a programming language [0.93]
    > When Go was created?
    0. 2007 [0.96]
    > What is the purpose of Go?
    Sorry, I'm not sure.
    > Why Go was created?
    0. to address criticism of other languages [0.53]
    > Who invented Go?
    0. Robert Griesemer, Rob Pike, and Ken Thompson [0.96]
    > Where was Robert working when he created Go?
    0. Google [1.00]
    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/mat32"
        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
    )

    // Note: forEachInput, splitByPipe, dotProduct and colorize are helpers
    // that are not shown on the slide.
    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        // vectorize returns the L2-normalized sentence embedding of text.
        vectorize := func(text string) []float32 {
            vector, err := model.Vectorize(text, bert.ClsToken)
            if err != nil {
                log.Fatal(err)
            }
            return vector.(*mat32.Dense).Normalize2().Data()
        }

        fn := func(text string) {
            text1, text2 := splitByPipe(text)
            // The dot product of two unit vectors is their cosine similarity.
            score := dotProduct(vectorize(text1), vectorize(text2))
            fmt.Printf("Similarity: %s\n", colorize(score))
        }
        forEachInput(os.Stdin, fn)
    }

Example session (lines starting with ">" are user input):

    > I love my dog | I enjoy my puppy
    Similarity: 0.81
    > I love my dog | I love playing guitar
    Similarity: 0.54
    > I love my dog | Amo il mio cane
    Similarity: 0.94
    > I love my dog | Ich liebe meinen Hund
    Similarity: 0.93
    > I love my dog | Я люблю свою собаку
    Similarity: 0.93
    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/nlpodyssey/spago/pkg/mat32"
        "github.com/nlpodyssey/spago/pkg/nlp/transformers/bert"
    )

    // Note: forEachInput, getSentencesFromFile, rankBySimilarity, limit and
    // colorize are helpers that are not shown on the slide.
    func main() {
        model, err := bert.LoadModel(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer model.Close()

        // vectorize returns the L2-normalized sentence embedding of text.
        vectorize := func(text string) []float32 {
            vector, err := model.Vectorize(text, bert.ClsToken)
            if err != nil {
                log.Fatal(err)
            }
            return vector.(*mat32.Dense).Normalize2().Data()
        }

        // Pre-compute one embedding per sentence in the corpus.
        sentences := getSentencesFromFile(os.Args[2])
        vectors := make([][]float32, len(sentences))
        for i, sentence := range sentences {
            vectors[i] = vectorize(sentence) // can be concurrent
        }

        // Print the five sentences most similar to each query.
        fn := func(text string) {
            hits := rankBySimilarity(vectors, vectorize(text))
            for i, item := range limit(hits, 5) {
                fmt.Printf("%d. %s [%s]\n",
                    i, sentences[item.id], colorize(item.score))
            }
            fmt.Println()
        }
        forEachInput(os.Stdin, fn)
    }

Example session (lines starting with ">" are user input):

    > I am very satisfied with my phone.
    0. I really like my phone. [0.82]
    1. I love my wife very much. [0.56]
    2. I'm looking forward to dining with you. [0.39]
    3. Did you get a new car? [0.22]
    4. How old are you? [0.12]

    > ¿Qué edad tienes?
    0. How old are you? [0.86]
    1. Did you get a new car? [0.48]
    2. Will it snow tomorrow? [0.37]
    3. I love my wife very much. [0.22]
    4. I really like my phone. [0.20]

    > Hai una macchina nuova?
    0. Did you get a new car? [0.88]
    1. Will it snow tomorrow? [0.46]
    2. How old are you? [0.42]
    3. I really like my phone. [0.33]
    4. I love my wife very much. [0.26]
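The search example calls `rankBySimilarity` and `limit`, neither of which appears on the slide. The following is a minimal sketch of how they could work, assuming a brute-force scan that scores every stored vector against the query and sorts by descending score. The `hit` type and both function bodies are hypothetical. Only the field names `id` and `score` are taken from the slide's usage.

```go
package main

import (
	"fmt"
	"sort"
)

// hit (hypothetical) pairs the index of a stored vector with its score.
type hit struct {
	id    int
	score float32
}

// dotProduct returns the sum of element-wise products of two
// equal-length vectors.
func dotProduct(a, b []float32) float32 {
	if len(a) != len(b) {
		panic("dotProduct: vectors must have the same length")
	}
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

// rankBySimilarity (hypothetical) scores every vector against the query
// and returns the hits sorted from most to least similar.
func rankBySimilarity(vectors [][]float32, query []float32) []hit {
	hits := make([]hit, len(vectors))
	for i, v := range vectors {
		hits[i] = hit{id: i, score: dotProduct(v, query)}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		return hits[i].score > hits[j].score
	})
	return hits
}

// limit (hypothetical) returns at most the first n hits.
func limit(hits []hit, n int) []hit {
	if n < 0 {
		n = 0
	}
	if len(hits) > n {
		return hits[:n]
	}
	return hits
}

func main() {
	// Toy unit vectors in place of real sentence embeddings.
	vectors := [][]float32{{1, 0}, {0, 1}, {0.6, 0.8}}
	query := []float32{0.8, 0.6}

	for i, item := range limit(rankBySimilarity(vectors, query), 2) {
		fmt.Printf("%d. vector %d [%.2f]\n", i, item.id, item.score)
	}
}
```

A linear scan like this is adequate for a small list of sentences such as the one in the demo. A larger corpus would likely call for an approximate nearest-neighbor index instead.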
Conclusion and Future Works

➔ Roadmap:
  ◆ Include other SOTA models (e.g. Google T5)
  ◆ Optimize like mainstream DL frameworks
  ◆ Quantization (float32 → int)
  ◆ Make GPU/TPU-friendly (Gorgonia Tensors)
★ Join the project ;) https://github.com/nlpodyssey/spago/