Clojure for Machine Learning

Slide 1

Slide 1 text

CLOJURE FOR MACHINE LEARNING @henrygarner

Slide 2

Slide 2 text

CLOJURE FOR STATISTICAL INFERENCE & PREDICTIVE ANALYTICS WITH TRANSDUCERS AND VISUALISATION TOO!

Slide 3

Slide 3 text

CLOJURE FOR STATISTICAL INFERENCE & PREDICTIVE ANALYTICS WITH TRANSDUCERS AND VISUALISATION TOO!

Slide 4

Slide 4 text

“If you can convince an engineer they're doing machine learning, you can get them to do anything” Josh Wills, Director of Engineering at Slack

Slide 5

Slide 5 text

“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” Josh Wills, Director of Engineering at Slack

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

#SORRYNOTSORRY

Slide 8

Slide 8 text

CONTENTS
Bandit testing
Inference & significance
kixi.stats
Regression models
Goodness of fit
redux
Neural networks
Feature learning
cortex

Slide 9

Slide 9 text

http://writeandimprove.com/

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

H H T H T T H H H T H H H H H H H H H H H H H H H H H H H H

Slide 15

Slide 15 text

♥
Reagent: interactive UIs
thi.ng/geom-viz: SVG graphs
jStat: JavaScript distributions

Slide 16

Slide 16 text

[Plot: Distribution over k]

Slide 17

Slide 17 text

Parameters: n: 25, p (x 100): 50
[Plot: Distribution over k]
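The plot on this slide is a binomial distribution over k for n = 25 and p = 0.5. As a rough sketch only (the helper names `choose` and `binomial-pmf` are hypothetical and do not come from the talk), the probability of each k could be computed like this:

```clojure
;; Hypothetical helpers for illustration; not part of the original slides.
(defn choose
  "Number of ways to pick k items from n, built up one factor at a time."
  [n k]
  (reduce (fn [acc i] (/ (* acc (- n i)) (inc i))) 1 (range k)))

(defn binomial-pmf
  "Probability of exactly k successes in n independent trials,
  each succeeding with probability p."
  [n p k]
  (* (choose n k)
     (reduce * (repeat k p))
     (reduce * (repeat (- n k) (- 1 p)))))

;; Distribution over k for the slide's parameters (n = 25, p = 0.5).
;; Passing p as the ratio 1/2 keeps the arithmetic exact.
(def distribution
  (mapv #(binomial-pmf 25 1/2 %) (range 26)))
```

Because the probabilities are exact ratios, they sum to exactly 1, which makes the sketch easy to sanity-check at the REPL.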

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Observed: n: 0, k: 0
Parameters: Alpha: 1, Beta: 1
[Plot: Distribution over p]
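The distribution over p shown here is presumably the standard conjugate update for a Beta prior with binomial data. This is textbook Bayesian inference rather than something spelled out on the slide, so treat the notation as an assumption:

```latex
p \sim \mathrm{Beta}(\alpha, \beta)
\quad\Longrightarrow\quad
p \mid k, n \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k)
```

With Alpha = Beta = 1 the prior is uniform, and with n = 0 and k = 0 the posterior is identical to the prior.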

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

[Video, 0:19]

Slide 24

Slide 24 text

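;; sample-beta is not defined on the slide; the call has the shape of
;; incanter.stats/sample-beta (an assumption).
(def trials {:trial-1 {:n 10 :k 5}
             :trial-2 {:n 20 :k 10}})

(defn bayes-bandit [trials]
  ;; Score each trial with one draw from its Beta posterior, then pick the highest.
  (let [score (fn [{:keys [n k]}]
                (sample-beta 1 :alpha (inc k) :beta (inc (- n k))))]
    (key (apply max-key (comp score val) trials))))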
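The `bayes-bandit` function relies on a `sample-beta` function that the slide does not define. A minimal dependency-free sketch is given below, assuming integer parameters only. It uses the fact that the alpha-th smallest of (alpha + beta - 1) uniform draws follows a Beta(alpha, beta) distribution. The names `sample-beta-int` and `bayes-bandit-sketch` are hypothetical, and the explicit `rng` argument is added here only to make runs repeatable.

```clojure
;; Hypothetical stand-in for the slide's sample-beta; integer parameters only.
(defn sample-beta-int
  "Draws one sample from Beta(alpha, beta) for positive integers alpha and beta,
  as the alpha-th order statistic of (alpha + beta - 1) uniform draws."
  [^java.util.Random rng alpha beta]
  (let [n (+ alpha beta -1)]
    (nth (sort (repeatedly n #(.nextDouble rng))) (dec alpha))))

(defn bayes-bandit-sketch
  "Returns the key of the trial whose sampled conversion rate is highest."
  [rng trials]
  (let [score (fn [{:keys [n k]}]
                (sample-beta-int rng (inc k) (inc (- n k))))]
    (key (apply max-key (comp score val) trials))))

;; Usage: a seeded generator makes the choice reproducible.
(def choice
  (bayes-bandit-sketch (java.util.Random. 42)
                       {:trial-1 {:n 10 :k 5}
                        :trial-2 {:n 20 :k 10}}))
```

Sorting uniform draws is slow for large counts; a library sampler would be the better choice in practice.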

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

BEWARE
Ensure trials are independent: test users, not visits.
Has the variation been seen? Assigned variations may not be active variations.
Don't call too early: conversion may take a day or longer. Wait and see.
Not a panacea: a sensible prior will stabilise early fluctuations.

Slide 27

Slide 27 text

(def trials {:trial-1 {:n 10 :k 5}
             :trial-2 {:n 20 :k 10}})

(defn bayes-bandit [trials]
  ;; Same as before, but with a prior of 10 successes and 40 failures
  ;; added to every trial's counts.
  (let [score (fn [{:keys [n k]}]
                (sample-beta 1
                             :alpha (inc (+ k 10))
                             :beta (inc (+ (- n k) 40))))]
    (key (apply max-key (comp score val) trials))))
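To see what the extra 10 and 40 do, it may help to compare the mean of the Beta distribution with and without them. The function below is a hypothetical helper, not from the talk; it only uses the standard result that Beta(alpha, beta) has mean alpha / (alpha + beta).

```clojure
;; Hypothetical helper for illustration; not part of the original slides.
(defn posterior-mean
  "Mean of the Beta distribution the bandit samples from, where prior-k and
  prior-fail are pseudo-counts of successes and failures added to the data."
  [{:keys [n k]} prior-k prior-fail]
  (let [alpha (inc (+ k prior-k))
        beta  (inc (+ (- n k) prior-fail))]
    (/ alpha (+ alpha beta))))

(posterior-mean {:n 10 :k 5} 0 0)   ;; => 1/2
(posterior-mean {:n 10 :k 5} 10 40) ;; => 8/31
```

With only ten observations the prior pulls the estimate from 0.5 down towards the prior rate of roughly 0.2, which is how it damps early fluctuations.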

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

(transduce (map inc) + (range 10))

Slide 30

Slide 30 text

Init Step Complete

(fn +'
  ([] 0)                ;; init
  ([acc x] (+ acc x))   ;; step
  ([acc] acc))          ;; complete
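The completion arity matters once the accumulator is not the final answer. As a small sketch (the name `mean-rf` is hypothetical and this is not kixi.stats' implementation), a mean can be written as a reducing function that accumulates a sum and a count, then divides on completion:

```clojure
;; Hypothetical reducing function for illustration.
(defn mean-rf
  ([] [0 0])                               ;; init: [sum count]
  ([[sum n] x] [(+ sum x) (inc n)])        ;; step: add x, bump the count
  ([[sum n]] (when (pos? n) (/ sum n))))   ;; complete: divide, nil if empty

(transduce (map inc) + (range 10))       ;; => 55
(transduce (map inc) mean-rf (range 10)) ;; => 11/2
```

Because `transduce` calls the zero-argument arity for the initial value and the one-argument arity at the end, the same `mean-rf` can be combined with any transducer.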

Slide 31

Slide 31 text

KIXI.STATS
https://github.com/mastodonc/kixi.stats
Mean
Variance
Standard deviation
Covariance
Correlation
Simple linear regression

Slide 32

Slide 32 text

"AWKWARD-SIZED DATA"

Slide 33

Slide 33 text

https://www.theguardian.com/sport/datablog/2012/aug/07/olym 2012-athletes-age-weight-height

{:sport "Swimming",
 :age 27,
 :sex "M",
 :birth-place "Towson (USA)",
 :name "Michael Phelps",
 :bronze 0,
 :birth-date "6/30/1985",
 :gold 2,
 :weight 88,
 :silver 2,
 :height 193}

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

(require '[kixi.stats.core :as kixi])

(->> (data-source "athletes.txt")
     (transduce (map :height) kixi/mean))
;; => 1603855/9038

(->> (data-source "athletes.txt")
     (transduce (map :height) kixi/standard-deviation))
;; => 11.202506235734145

Slide 36

Slide 36 text

(require '[kixi.stats.core :as kixi])

(def rf
  (kixi/correlation-matrix {:height :height
                            :weight :weight
                            :age :age}))

(->> (data-source "athletes.txt")
     (transduce identity rf))

;; {[:height :weight] 0.7602753595140576,
;;  [:height :age] 0.0835619870171009,
;;  [:weight :height] 0.7602753595140576,
;;  [:weight :age] 0.1263794369985025,
;;  [:age :height] 0.0835619870171009,
;;  [:age :weight] 0.1263794369985025}

Slide 37

Slide 37 text

(require '[kixi.stats.core :as kixi])

(def rf
  (kixi/correlation-matrix {:height :height
                            :weight :weight
                            :age :age}))

(->> (data-source "athletes.txt")
     (transduce (filter swimmer?) rf))

;; {[:height :weight] 0.8649145683086642,
;;  [:height :age] 0.3011551185677323,
;;  [:weight :height] 0.8649145683086642,
;;  [:weight :age] 0.32150444584208426,
;;  [:age :height] 0.3011551185677323,
;;  [:age :weight] 0.32150444584208426}

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

y = ax + b

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

Fahrenheit = 1.8 ∗ Centigrade + 32

Slide 42

Slide 42 text

(require '[kixi.stats.core :as kixi])

(def rf (kixi/simple-linear-regression :height :weight))

(->> (data-source "athletes.txt")
     (transduce (filter swimmer?) rf))

;; [-1286496024/11650283 11809306/11650283]
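For readers without kixi.stats to hand, the least-squares fit of y = ax + b can be sketched in plain Clojure. This is an illustrative version under the usual textbook formulas (slope = covariance of x and y over variance of x), not the library's implementation, and the name `fit-line` is hypothetical. It returns `[b a]`, intercept first, to match the order the later slides destructure.

```clojure
;; Hypothetical helper for illustration; not kixi.stats' implementation.
(defn fit-line
  "Ordinary least-squares fit of y = a*x + b. Returns [b a]."
  [xs ys]
  (let [n      (count xs)
        mean-x (/ (reduce + xs) n)
        mean-y (/ (reduce + ys) n)
        sxy    (reduce + (map (fn [x y] (* (- x mean-x) (- y mean-y))) xs ys))
        sxx    (reduce + (map (fn [x] (* (- x mean-x) (- x mean-x))) xs))
        a      (/ sxy sxx)]
    [(- mean-y (* a mean-x)) a]))

;; Centigrade readings and their Fahrenheit equivalents.
(fit-line [0 10 20 30 100] [32 50 68 86 212]) ;; => [32 9/5]
```

The temperature data lies exactly on a line, so the fit recovers the familiar slope of 1.8 and intercept of 32.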

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

No content

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

http://www.topendsports.com/athletes/swimming/spitz-mark.htm

(def fy :weight)
(def fx :height)

(def regression (kixi/simple-linear-regression fx fy))

(def data (filter swimmer? (data-source "athletes")))

(let [[b a] (transduce identity regression data)
      predict (fn [x] (double (+ (* a x) b)))]
  (predict 185))

;; 77.09903579166274

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

(def fy :weight)
(def fx :height)

(def estimate-error (kixi/standard-error-prediction fx fy 185))

(def data (filter swimmer? (data-source "athletes")))

(let [[b a] (transduce identity regression data)
      std-e (transduce identity estimate-error data)
      confidence-interval (fn [x]
                            (let [estimate (double (+ (* a x) b))]
                              [(- estimate (* std-e 1.94))
                               (+ estimate (* std-e 1.94))]))]
  (confidence-interval 185))

;; [65.97046903896646 88.22760254435903]

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

[65.97 ≤ 79 ≤ 88.22]

Slide 53

Slide 53 text

[65.97 ≤ 79 ≤ 88.22] ✅

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

REDUX
https://github.com/henrygarner/redux
pre-step
post-complete
fuse
+ more!

Slide 57

Slide 57 text

(def rf
  (fuse {:mean kixi/mean
         :sd kixi/standard-deviation}))

(transduce (map :height) rf (data-source "athletes.txt"))

;; => {:mean 1603855/9038, :sd 11.202506235734145}

Slide 58

Slide 58 text

R-SQUARE

$R^2 = 1 - \frac{\operatorname{var}(e)}{\operatorname{var}(Y)}$

Slide 59

Slide 59 text

R-SQUARE

(defn residual [fy-hat fy]
  #(- (fy-hat %) (fy %)))

(defn r-square [fy-hat fy]
  (pre-step kixi/variance (residual fy-hat fy))
  (pre-step kixi/variance fy))

Slide 60

Slide 60 text

R-SQUARE

(defn residual [fy-hat fy]
  #(- (fy-hat %) (fy %)))

(defn r-square [fy-hat fy]
  (fuse {:var-e (pre-step kixi/variance (residual fy-hat fy))
         :var-y (pre-step kixi/variance fy)}))

Slide 61

Slide 61 text

R-SQUARE

(defn residual [fy-hat fy]
  #(- (fy-hat %) (fy %)))

(defn r-square [fy-hat fy]
  (post-complete
   (fuse {:var-e (pre-step kixi/variance (residual fy-hat fy))
          :var-y (pre-step kixi/variance fy)})
   (fn [{:keys [var-e var-y]}]
     (- 1 (/ var-e var-y)))))
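The same quantity can be checked without redux by working on plain sequences. The sketch below is illustrative only: `variance` is assumed to be the sample variance (dividing by n - 1), and `r-square-seq` is a hypothetical name. It takes the predicted and observed values directly rather than accessor functions.

```clojure
;; Hypothetical helpers for illustration; eager, sequence-based versions.
(defn variance
  "Sample variance of xs (divides by n - 1)."
  [xs]
  (let [n    (count xs)
        mean (/ (reduce + xs) n)]
    (/ (reduce + (map (fn [x] (let [d (- x mean)] (* d d))) xs))
       (dec n))))

(defn r-square-seq
  "1 - var(e) / var(y), where e is the residual between prediction and observation."
  [y-hats ys]
  (- 1 (/ (variance (map - y-hats ys))
          (variance ys))))

(r-square-seq [1 2 3 4] [1 2 3 4])         ;; perfect predictions give 1
(r-square-seq [5/2 5/2 5/2 5/2] [1 2 3 4]) ;; always predicting the mean gives 0
```

The two calls mark the ends of the scale: a model that explains everything scores 1, and one that does no better than the mean scores 0.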

Slide 62

Slide 62 text

R-SQUARE

(def fy :weight)
(def fx :height)

(def regression (kixi/simple-linear-regression fx fy))

(def data (filter swimmer? (data-source "athletes")))

(let [[b a] (transduce identity regression data)
      estimate (fn [x] (+ (* a x) b))
      goodness-of-fit (r-square (comp estimate fx) fy)]
  (double (transduce identity goodness-of-fit data)))

;; => 0.748

Slide 63

Slide 63 text

$\theta = (X^T X)^{-1} X^T y$

(require '[clojure.core.matrix :refer [mmul transpose]]
         '[clojure.core.matrix.linear :refer [solve]])

(defn normal-equation [x y]
  (let [xt (transpose x)
        xtx (mmul xt x)
        xty (mmul xt y)]
    (mmul (solve xtx) xty)))
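The normal equation follows from minimising the squared error of the linear model. The derivation below is the standard one and is included as background; it assumes the matrix $X^T X$ is invertible.

```latex
\begin{aligned}
J(\theta) &= \lVert X\theta - y \rVert^2 \\
\nabla_\theta J(\theta) &= 2 X^T (X\theta - y) = 0 \\
X^T X \theta &= X^T y \\
\theta &= (X^T X)^{-1} X^T y
\end{aligned}
```

In the code, `xtx` and `xty` correspond to $X^T X$ and $X^T y$ in the third line.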

Slide 64

Slide 64 text

(defn features [& fns]
  (apply juxt fns))

(def fx (features (constantly 1.0) :height))
(def fy :weight)

(let [coefs (normal-equation (map fx data) (map fy data))
      estimate (fn [x] (mmul (transpose coefs) x))
      goodness-of-fit (r-square (comp estimate fx) fy)]
  (transduce identity goodness-of-fit data))

;; 0.7480772104725628

Slide 65

Slide 65 text

(defn dummy-mf [athlete]
  (if (= (:sex athlete) "F")
    0.0
    1.0))

(def fx (features (constantly 1.0) :height dummy-mf))
(def fy :weight)

(let [coefs (normal-equation (map fx data) (map fy data))
      estimate (fn [x] (mmul (transpose coefs) x))
      goodness-of-fit (r-square (comp estimate fx) fy)]
  (double (transduce identity goodness-of-fit data)))

;; 0.8022246027673994

Slide 66

Slide 66 text

“Either building, or contributing to, or forming a nice Clojure-first solution for deep learning would be huge” Eric Weinstein, Clojure for Machine Learning

Slide 67

Slide 67 text

CORTEX https://github.com/thinktopic/cortex

Slide 68

Slide 68 text

LOSS FUNCTION

Slide 69

Slide 69 text

LOSS FUNCTIONS MSE Cross Entropy Softmax Log Likelihood Softmax
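Of the losses listed, MSE is the one used for the regression example later in the talk. Its usual textbook definition is given below for reference; whether cortex scales it in exactly this way is not stated in the slides.

```latex
\mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```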

Slide 70

Slide 70 text

OPTIMISATION FUNCTION

Slide 71

Slide 71 text

OPTIMISATION FUNCTIONS
Gradient descent
Newton's method
Adam
Adadelta
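The simplest of these, gradient descent, repeatedly steps against the gradient of the loss. The one-dimensional sketch below is only meant to show the update rule; `gradient-descent` is a hypothetical function and has nothing to do with cortex's optimiser API.

```clojure
;; Hypothetical helper for illustration; not cortex's optimiser.
(defn gradient-descent
  "Applies x <- x - learning-rate * gradient(x) for n-steps, starting at x0."
  [gradient x0 learning-rate n-steps]
  (nth (iterate (fn [x] (- x (* learning-rate (gradient x)))) x0)
       n-steps))

;; Minimise f(x) = (x - 3)^2, whose gradient is 2(x - 3).
(def minimum
  (gradient-descent (fn [x] (* 2 (- x 3.0))) 0.0 0.1 100))
```

With a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 per step, so `minimum` ends up very close to 3.0.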

Slide 72

Slide 72 text

(def summary-stats
  (fuse {:mean kixi/mean
         :sd kixi/standard-deviation}))

(defn normalizer [& args]
  (let [normalize (fn [x {:keys [mean sd]}]
                    (/ (- x mean) sd))
        summarise (fn [k] [k (pre-step summary-stats k)])]
    (post-complete
     (fuse (into {} (map summarise) args))
     (fn [stats]
       (map #(merge-with normalize % stats))))))

(def normalize
  (transduce identity (normalizer :height :weight) data))

(sequence (comp normalize (map fx)) data)

;; ([1.0 -1.3898594098622594 0.0] [1.0 1.2798851553621058 1.0] ...)
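Stripped of the transducer machinery, the normalisation is a z-score: subtract the mean and divide by the standard deviation. A minimal sketch on a plain vector, assuming the sample standard deviation (n - 1 in the denominator) and using the hypothetical name `z-scores`:

```clojure
;; Hypothetical helper for illustration; eager, single-column version.
(defn z-scores
  "Rescales xs to have mean 0 and sample standard deviation 1."
  [xs]
  (let [n    (count xs)
        mean (/ (reduce + xs) n)
        sd   (Math/sqrt (/ (reduce + (map (fn [x] (let [d (- x mean)] (* d d))) xs))
                           (dec n)))]
    (mapv (fn [x] (/ (- x mean) sd)) xs)))

(z-scores [2.0 4.0 6.0]) ;; => [-1.0 0.0 1.0]
```

The slide's version does the same per key, but gathers the statistics in one pass and returns a transducer so the scaling can be composed with other steps.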

Slide 73

Slide 73 text

(def n-epochs 100)
(def batch-size 1)
(def loss (opt/mse-loss))
(def optimiser (opt/newton-optimiser))

Slide 74

Slide 74 text

(def fx (features :height dummy-mf))
(def fy (features :weight))

(let [xs (vec (sequence (comp normalize (map fx)) data))
      ys (vec (sequence (comp normalize (map fy)) data))
      network (layers/linear-layer 2 1)
      trained (net/train network optimiser loss xs ys batch-size n-epochs)
      predict (fn [x] (ffirst (net/run trained [x])))
      goodness-of-fit (r-square (comp predict fx)
                                (comp first fy))]
  (transduce normalize goodness-of-fit data))

;; 0.8021682807900807

Slide 75

Slide 75 text

No content

Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

(defn fy [i]
  (cond
    (zero? (mod i 15)) [0.0 0.0 0.0 1.0]
    (zero? (mod i 5))  [0.0 0.0 1.0 0.0]
    (zero? (mod i 3))  [0.0 1.0 0.0 0.0]
    :else              [1.0 0.0 0.0 0.0]))
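Going the other way, from a four-element output vector back to text, amounts to picking the position with the largest value. The `decode` function below is a hypothetical sketch, not taken from the talk; it repeats `fy` so that the block runs on its own.

```clojure
;; fy is copied from the slide so this block is self-contained.
(defn fy [i]
  (cond
    (zero? (mod i 15)) [0.0 0.0 0.0 1.0]
    (zero? (mod i 5))  [0.0 0.0 1.0 0.0]
    (zero? (mod i 3))  [0.0 1.0 0.0 0.0]
    :else              [1.0 0.0 0.0 0.0]))

;; Hypothetical decoder: index 0 means "print the number itself".
(defn decode
  "Maps an output vector for the number i back to its FizzBuzz string."
  [i ys]
  (let [labels [(str i) "Fizz" "Buzz" "FizzBuzz"]
        best   (apply max-key (fn [idx] (nth ys idx)) (range 4))]
    (nth labels best)))

(mapv (fn [i] (decode i (fy i))) (range 1 16))
```

Applied to a trained network's output instead of `(fy i)`, the same function would pick the class the network considers most likely.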

Slide 78

Slide 78 text

(defn fx [i]
  (map #(if (bit-test i %) 1.0 0.0) (range 10)))

(fx 4)
;; (0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0)

(fx 9)
;; (1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0)

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

$x_{\mathrm{feature}} = [\mathit{year}, \mathit{year}^2, \ldots, \mathit{year}^{11}]$

Slide 81

Slide 81 text

No content

Slide 82

Slide 82 text

No content

Slide 83

Slide 83 text

No content

Slide 84

Slide 84 text

(require '[cortex.nn.core :as core]
         '[cortex.nn.layers :as layers])

(defn create-network []
  ;; 10 binary inputs -> 100 hidden units with a logistic activation -> 4 outputs
  (let [network-modules [(layers/linear-layer 10 100)
                         (layers/logistic [100])
                         (layers/linear-layer 100 4)]]
    (core/stack-module network-modules)))
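Conceptually, this stack computes a linear map, applies the logistic function element-wise, and then applies a second linear map. The plain-Clojure sketch below shows that forward pass for explicitly supplied weights. All names here (`logistic`, `linear`, `forward`, and the `:w1 :b1 :w2 :b2` keys) are hypothetical; cortex manages its own parameters and its internals are not shown in the talk.

```clojure
;; Hypothetical forward pass for illustration; not cortex's implementation.
(defn logistic [x]
  (/ 1.0 (+ 1.0 (Math/exp (- x)))))

(defn linear
  "One linear layer: each output is a bias plus the dot product of a weight row with xs."
  [weights biases xs]
  (mapv (fn [row b] (+ b (reduce + (map * row xs))))
        weights
        biases))

(defn forward
  "linear -> logistic -> linear, mirroring the layer order on the slide."
  [{:keys [w1 b1 w2 b2]} xs]
  (->> xs
       (linear w1 b1)
       (mapv logistic)
       (linear w2 b2)))

;; A tiny 2 -> 2 -> 1 network with hand-picked weights.
(forward {:w1 [[0 0] [0 0]] :b1 [0 0]
          :w2 [[1 1]]       :b2 [0]}
         [1.0 0.0])
;; => [1.0]
```

With all first-layer weights at zero, both hidden units output logistic(0) = 0.5, and the second layer sums them to 1.0.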

Slide 85

Slide 85 text

No content

Slide 86

Slide 86 text

1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz 16 17 Fizz 19 Buzz Fizz 22 23 Fizz Buzz 26 Fizz 28 29 FizzBuzz 31 32 Fizz 34 Buzz Fizz 37 38 Fizz Buzz 41 Fizz 43 44 FizzBuzz 46 47 Fizz 49 Buzz Fizz 52 53 Fizz Buzz 56 Fizz 58 59 FizzBuzz 61 62 Fizz 64 Buzz Fizz 67 68 Fizz Buzz 71 Fizz 73 74 FizzBuzz 76 77 Fizz 79 Buzz Fizz 82 83 Fizz Buzz 86 Fizz 88 89 FizzBuzz 91 92 Fizz 94 Buzz Fizz 97 98 Fizz Buzz

Slide 87

Slide 87 text

WE'RE DOOMED

Slide 88

Slide 88 text

(A SAMPLE OF) THINGS I SKIPPED
Significance tests
t-distribution
Classifier evaluators
Cross-validation
Recurrent NNs
LSTM NNs
…

Slide 89

Slide 89 text

FURTHER READING
Fizzbuzz in Tensorflow: http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/
Examples of deeper neural networks: http://github.com/thinktopic/cortex
Java Deep Learning: https://deeplearning4j.org/
Probability and statistics: https://www.chrisstucchio.com/
ALTA Institute: http://www.wiki.cl.cam.ac.uk/rowiki/NaturalLanguage/ALTA

Slide 90

Slide 90 text

IF YOU LIKED THIS…
http://cljds.com/cljds-book
http://cljds.com/cljds-amzn
https://github.com/clojuredatascience

Slide 91

Slide 91 text

THANKS! https://github.com/henrygarner/cljx-december-2016 Henry Garner @henrygarner