Slide 1

Slide 1 text

Data Science in Go @chewxy

Slide 2

Slide 2 text

WHY GO?
Follow @chewxy on Twitter

Slide 3

Slide 3 text

GO PROVERBS

Slide 4

Slide 4 text

Go Proverbs
gofmt's style is no one's favourite, yet gofmt is everyone's favourite.

Slide 5

Slide 5 text

Go Proverbs
gofmt's style is no one's favourite, yet gofmt is everyone's favourite.
Clear is better than clever.

Slide 6

Slide 6 text

Go Proverbs
gofmt's style is no one's favourite, yet gofmt is everyone's favourite.
Clear is better than clever.
Don't just check errors. Handle them gracefully.

Slide 7

Slide 7 text

THE ZEN OF PYTHON

Slide 8

Slide 8 text

The Zen of Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Slide 12

Slide 12 text

DATA SCIENCE, BRIEFLY

Slide 13

Slide 13 text

Statistics S/W Eng X Data Science

Slide 14

Slide 14 text

[Chart] Axis: Ad-hocness of work (longer-lived programs to shorter-lived programs)

Slide 15

Slide 15 text

[Chart] Axis: Ad-hocness of work (work that exists in production to exploratory work)

Slide 16

Slide 16 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort

Slide 17

Slide 17 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort
Plotted: Python
*Mere visual approximation. No hard data. Purely anecdotal with plenty of heuristics.

Slide 18

Slide 18 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort
Plotted: Python, Go
*Mere visual approximation. No hard data. Purely anecdotal with plenty of heuristics.

Slide 19

Slide 19 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort
Plotted: Python, Go, Haskell
*Mere visual approximation. No hard data. Purely anecdotal with plenty of heuristics.

Slide 20

Slide 20 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort
Plotted: Python, Go
Annotation: Most data science programs are here
*Mere visual approximation. No hard data. Purely anecdotal with plenty of heuristics.

Slide 21

Slide 21 text

"Nothing more permanent than a temporary hack" By Joy Leelawat (2016) Follow @chewxy on Twi/er

Slide 22

Slide 22 text

ROBUST DATA SCIENCE

Slide 23

Slide 23 text

Robust Data Science
•  Good statistical understanding

Slide 24

Slide 24 text

Robust Data Science
•  Good statistical understanding
    – Use the right statistical underpinnings

Slide 25

Slide 25 text

Robust Data Science
•  Good statistical understanding
    – Use the right statistical underpinnings
    – Do it on pen and paper to check understanding

Slide 26

Slide 26 text

Robust Data Science
•  Good statistical understanding
    – Use the right statistical underpinnings
    – Do it on pen and paper to check understanding
    – Topic for another day

Slide 27

Slide 27 text

Robust Data Science
•  Good statistical understanding
    – Use the right statistical underpinnings
    – Do it on pen and paper to check understanding
•  Robust software engineering

Slide 28

Slide 28 text

WHAT DOES A DATA SCIENTIST DO?

Slide 29

Slide 29 text

Task                                           Share
Building Training Sets                         3%
Cleaning Data                                  60%
Collecting Data                                19%
Statistical Analysis                           9%
Refining Algorithms                            4%
Other                                          4%
Telling people you shouldn't use pie charts    1%
*Data from Forbes

Slide 31

Slide 31 text

Robust software engineering to the rescue!

Slide 32

Slide 32 text

PYTHON V GO
DAWN OF ROBUST

Slide 33

Slide 33 text

example.csv
1,testval1
2,testval2
3,testval3
*example taken from Dan Whitenack

Slide 34

Slide 34 text

import pandas as pd
data = pd.read_csv('example.csv', names=['fst','snd'])
print(data['fst'].max())

f, _ := os.Open("example.csv")
r := csv.NewReader(bufio.NewReader(f))
records, _ := r.ReadAll()

var intMax int
for _, record := range records {
	intVal, err := strconv.Atoi(record[0])
	if err != nil {
		err = errors.Wrap(err, "Parse failed")
		log.Fatal(err)
	}
	if intVal > intMax {
		intMax = intVal
	}
}

fmt.Println(intMax)

Slide 36

Slide 36 text

$ python ex.py
3
$ go run ex.go
3

Slide 37

Slide 37 text

example.csv
1,testval1
2,testval2
,testval3

Slide 38

Slide 38 text

$ python ex.py
2.0
$ go run ex.go
Parse failed: strconv.Atoi: parsing "": invalid syntax
exit status 1

Slide 39

Slide 39 text

$ python ex.py
2.0
$ go run ex.go
Parse failed: strconv.Atoi: parsing "": invalid syntax
exit status 1

WTF?
•  Suddenly a float?!
•  Why does it even work??

Slide 40

Slide 40 text

The Zen of Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Slide 41

Slide 41 text

A Closer Look

f, _ := os.Open("example.csv")
r := csv.NewReader(bufio.NewReader(f))
records, _ := r.ReadAll()

var intMax int
for _, record := range records {
	intVal, err := strconv.Atoi(record[0])
	if err != nil {
		err = errors.Wrap(err, "Parse failed")
		log.Fatal(err)
	}
	if intVal > intMax {
		intMax = intVal
	}
}

fmt.Println(intMax)
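
The slide shows only a fragment. A self-contained version might look like the sketch below, under a few assumptions that are not in the slides: the CSV is passed in as a string rather than read from example.csv, the logic lives in a hypothetical function maxFirstColumn, the read error is checked instead of discarded, and fmt.Errorf with %w stands in for errors.Wrap, which is not part of the standard library errors package.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strconv"
	"strings"
)

// maxFirstColumn (hypothetical name) parses CSV text and returns the
// largest integer found in the first column.
func maxFirstColumn(data string) (int, error) {
	r := csv.NewReader(strings.NewReader(data))
	records, err := r.ReadAll()
	if err != nil {
		return 0, fmt.Errorf("read failed: %w", err)
	}

	// Starts at the zero value, as on the slide. This assumes the
	// column holds non-negative numbers.
	var intMax int
	for _, record := range records {
		intVal, err := strconv.Atoi(record[0])
		if err != nil {
			return 0, fmt.Errorf("parse failed: %w", err)
		}
		if intVal > intMax {
			intMax = intVal
		}
	}
	return intMax, nil
}

func main() {
	m, err := maxFirstColumn("1,testval1\n2,testval2\n3,testval3\n")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(m) // prints 3
}
```

Returning the error instead of calling log.Fatal inside the loop leaves the decision about how to react to the caller.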

Slide 46

Slide 46 text

Go Proverbs
gofmt's style is no one's favourite, yet gofmt is everyone's favourite.
Clear is better than clever.
Don't just check errors. Handle them gracefully.

Slide 47

Slide 47 text

A Closer Look

f, _ := os.Open("example.csv")
r := csv.NewReader(bufio.NewReader(f))
records, _ := r.ReadAll()

var intMax int
for _, record := range records {
	intVal, err := strconv.Atoi(record[0])
	if err != nil {
		err = errors.Wrap(err, "Parse failed")
		log.Fatal(err)
	}
	if intVal > intMax {
		intMax = intVal
	}
}

fmt.Println(intMax)

Slide 48

Slide 48 text

A Closer Look

f, _ := os.Open("example.csv")
r := csv.NewReader(bufio.NewReader(f))
records, _ := r.ReadAll()

var intMax int
for i, record := range records {
	intVal, err := strconv.Atoi(record[0])
	if err != nil {
		err = errors.Wrapf(err, "Failed at %d", i)
		log.Fatal(err)
	}
	if intVal > intMax {
		intMax = intVal
	}
}

fmt.Println(intMax)

Slide 49

Slide 49 text

A Closer Look

f, _ := os.Open("example.csv")
r := csv.NewReader(bufio.NewReader(f))
records, _ := r.ReadAll()

var intMax int
for _, record := range records {
	intVal, err := strconv.Atoi(record[0])
	if err != nil {
		err = errors.Wrap(err, "Parse failed")
		log.Fatal(err)
	}
	if intVal > intMax {
		intMax = intVal
	}
}

fmt.Println(intMax)

Slide 50

Slide 50 text

Go Proverbs
gofmt's style is no one's favourite, yet gofmt is everyone's favourite.
Clear is better than clever.
Don't just check errors. Handle them gracefully.
Make the zero value useful.

Slide 51

Slide 51 text

A Closer Look

f, _ := os.Open("example.csv")
r := csv.NewReader(bufio.NewReader(f))
records, _ := r.ReadAll()

var intMax int
for _, record := range records {
	intVal, err := strconv.Atoi(record[0])
	if err != nil {
		err = errors.Wrap(err, "Parse failed")
		log.Fatal(err)
	}
	if intVal > intMax {
		intMax = intVal
	}
}

fmt.Println(intMax)

Slide 52

Slide 52 text

ON PANDAS

Slide 53

Slide 53 text

On Pandas
•  Pandas is great! I <3 Pandas.

Slide 54

Slide 54 text

On Pandas
•  Pandas is great! I <3 Pandas.
•  Pandas makes assumptions for you.

Slide 55

Slide 55 text

On Pandas
•  Pandas is great! I <3 Pandas.
•  Pandas makes assumptions for you.
•  90% of the time, the assumption works 100% of the time.

Slide 56

Slide 56 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort
Plotted: Python, Go
Annotation: Most data science programs are here
*Mere visual approximation. No hard data. Purely anecdotal with plenty of heuristics.

Slide 57

Slide 57 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort
Plotted: Python, Go
Annotation: Pandas+Jupyter services this area
*Mere visual approximation. No hard data. Purely anecdotal with plenty of heuristics.

Slide 58

Slide 58 text

On Pandas
•  Pandas is great! I <3 Pandas.
•  Pandas makes assumptions for you.
•  90% of the time, the assumption works 100% of the time.
•  Pandas + Jupyter = match made in heaven.

Slide 59

Slide 59 text

[Chart] Axes: Ad-hocness of work (longer-lived programs to shorter-lived programs); Complexity/Relative Effort
Plotted: Python, Go
Annotation: What about here?
*Mere visual approximation. No hard data. Purely anecdotal with plenty of heuristics.

Slide 60

Slide 60 text

C/C++ TO THE RESCUE!

Slide 61

Slide 61 text


Slide 62

Slide 62 text

JAVA TO THE RESCUE?

Slide 63

Slide 63 text


Slide 64

Slide 64 text

GO TO THE RESCUE!

Slide 65

Slide 65 text

WHY GO?

Slide 66

Slide 66 text

Why Go?
•  Philosophy that drives robust software.

Slide 67

Slide 67 text

Why Go?
•  Philosophy that drives robust software.
•  Language that promotes mechanical sympathy.

Slide 68

Slide 68 text

Why Go?
•  Philosophy that drives robust software.
•  Language that promotes mechanical sympathy.
    – Data structures map closely to machine layout.

Slide 69

Slide 69 text

Why Go?
•  Philosophy that drives robust software.
•  Language that promotes mechanical sympathy.
    – Data structures map closely to machine layout.
    – As a result, fast(ish)!

Slide 70

Slide 70 text

Why Go?
•  Philosophy that drives robust software.
•  Language that promotes mechanical sympathy.

Slide 71

Slide 71 text

Why Go?
•  Philosophy that drives robust software.
•  Language that promotes mechanical sympathy.
•  Right levels of abstraction.

Slide 72

Slide 72 text

Why Go?
•  Philosophy that drives robust software.
•  Language that promotes mechanical sympathy.
•  Right levels of abstraction.
    – Encourages users to understand underlying data structures and algorithms.

Slide 73

Slide 73 text

Why Go?
•  Philosophy that drives robust software.
•  Language that promotes mechanical sympathy.
•  Right levels of abstraction.
    – Encourages users to understand underlying data structures and algorithms.
    – High level enough to be productive.

Slide 74

Slide 74 text

USING GO FOR DATA SCIENCE

Slide 75

Slide 75 text

Introducing Go
There are Go libraries for data science.

Slide 76

Slide 76 text

Introducing Go
There are Go libraries for data science.
•  Gonum – set of packages for numerical and scientific algorithms

Slide 77

Slide 77 text

Introducing Go
There are Go libraries for data science.
•  Gonum – set of packages for numerical and scientific algorithms
•  Gophernotes – like Jupyter for Go

Slide 78

Slide 78 text

Introducing Go
There are Go libraries for data science.
•  Gonum – set of packages for numerical and scientific algorithms
•  Gophernotes – like Jupyter for Go
•  Gota – data frames for Go

Slide 79

Slide 79 text

Introducing Go
There are Go libraries for data science.
•  Gonum – set of packages for numerical and scientific algorithms
•  Gophernotes – like Jupyter for Go
•  Gota – data frames for Go
•  Gorgonia* – packages for deep learning in Go
* @chewxy is the author of Gorgonia

Slide 80

Slide 80 text

Introducing Go
There are Go libraries for data science.
•  Gonum – set of packages for numerical and scientific algorithms
•  Gophernotes – like Jupyter for Go
•  Gota – data frames for Go
•  Gorgonia – packages for deep learning in Go

Slide 81

Slide 81 text

Gonum + Gorgonia = <3
Coming from Numpy/Scipy? Handy guide here.

Slide 82

Slide 82 text

Other Resources

Slide 83

Slide 83 text

Q&A

Slide 84

Slide 84 text

THE END
FOLLOW @CHEWXY ON TWITTER