Simulating Cassandra Cluster Sizes
disclaimer: model is simplified and numbers are made up!
1 node can handle 10K requests per second
Latency is normally distributed, with a mean of 20ms and a standard deviation of 5ms
“Extra” requests add overhead exponentially
Anglican
Slide 25
;; Capacity of a single node (requests per second).
(def base-requests (* 10 1000))

(defquery cluster-latency [n write-rate]
  (let [per-node (/ write-rate n)
        ;; Rate of the exponential overhead term: the further a node is
        ;; pushed past its capacity, the heavier the latency tail.
        overhead (/ 1000
                    (if (> per-node base-requests)
                      (- per-node base-requests)
                      1))]
    (predict :latency
             (+ (sample (exponential overhead))
                (sample (normal 20 5))))))
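A hedged sketch of how such a query might be run (this assumes the pre-1.0 Anglican API that the `predict` form above belongs to, with `doquery` as the entry point and `get-predicts` to pull predicted values out of each sample; the namespace layout and the :lmh sampler choice are likewise assumptions):

(ns cluster-sim
  (:use [anglican core runtime emit]
        [anglican.state :only [get-predicts]]))

;; Approximate the latency distribution for 5 nodes at 50K writes/sec.
(->> (doquery :lmh cluster-latency [5 50000])
     (map get-predicts)
     (map :latency)
     (take 1000))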
disclaimer: model is simplified and numbers are made up!
Simulating Cassandra Cluster Sizes
Slide 26
5 Nodes / 50K requests per second
Simulating Cassandra Cluster Sizes
Slide 27
5 Nodes / 500K requests per second
Simulating Cassandra Cluster Sizes
Subset of Clojure, compiled into CPS-style fns
Stackless language
Built-in memoisation
DSL for building sampling fns for distributions (see the sketch below)
Anglican
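As a hedged illustration of that distribution DSL (the protocol method names changed between Anglican versions; this sketch assumes the `defdist` macro from anglican.runtime with `sample*`/`observe*` methods):

(require '[anglican.runtime :refer [defdist]])

;; A point-mass ("dirac") distribution: sampling always returns x,
;; observing anything else has log-probability negative infinity.
(defdist dirac [x] []
  (sample* [this] x)
  (observe* [this value]
    (if (= x value) 0.0 Double/NEGATIVE_INFINITY)))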
Slide 30
Statistiker
En statistiker er en person som jobber innen faget statistikk.
(Norwegian: "A statistician is a person who works in the field of statistics.")
Slide 31
Implementing the Gaussian Naïve Bayes Algorithm
Slide 32
Implementing Naïve Bayes Algorithm
Slide 33
P(blue) = Number of Blue / Total number of objects
P(red) = Number of Red / Total number of objects
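A quick worked example with made-up counts, just to ground the formulas: with 40 blue and 20 red training points, P(blue) = 40/60 ≈ 0.67 and P(red) = 20/60 ≈ 0.33.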
Slide 34
P(X | blue) = Number of Blue near X / Total number of Blue
P(X | red) = Number of Red near X / Total number of Red
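Continuing the made-up counts: if 3 of the 40 blue points and 1 of the 20 red points fall in the neighbourhood of X, then P(X | blue) = 3/40 = 0.075 and P(X | red) = 1/20 = 0.05.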
Slide 35
P(blue) = Number of Blue / Total number of objects
P(red) = Number of Red / Total number of objects
Model (prior)
(defn make-model
  [train-data]
  (let [total (->> train-data
                   vals
                   (map count)
                   (reduce +))]
    (for [[k v] train-data]
      [k {:p        (/ (count v) total)      ; class prior
          :evidence (->> v
                         transpose           ; one seq per feature column
                         (map (fn [v]
                                {:mean     (mean v)
                                 :variance (variance v)})))}])))
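A hedged usage sketch (the data is invented; the input shape, a map from class label to a collection of feature vectors, is implied by the `vals`, `count` and `transpose` calls above):

(def train-data
  {:blue [[1.0 2.1] [0.9 1.8] [1.2 2.3]]
   :red  [[4.0 0.5] [3.8 0.7]]})

(make-model train-data)
;; => ([:blue {:p 3/5, :evidence ({:mean ... :variance ...} {:mean ... :variance ...})}]
;;     [:red  {:p 2/5, :evidence (...)}])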
Slide 36
Classifier (posterior)
(defn posterior-prob
  "Gaussian density of `point` under the given mean and variance."
  [point variance mean]
  (* (/ 1 (sqrt (* 2 pi variance)))
     (exp (/ (* -1 (pow (- point mean) 2))
             (* 2 variance)))))

;; Per-feature likelihoods for one class: map the density over the
;; point's feature values and that class's :evidence entries.
(map
 (fn [x {:keys [mean variance]}]
   (posterior-prob x variance mean))
 point
 evidence)
P(X | blue) = Number of Blue near X / Total number of Blue
P(X | red) = Number of Red near X / Total number of Red
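Putting prior and likelihood together, as a minimal sketch (the `classify` name is hypothetical; `make-model` and `posterior-prob` are the functions from the slides above): score each class by its prior times the product of per-feature Gaussian likelihoods, then pick the highest-scoring label.

(defn classify
  [model point]
  (->> model
       (map (fn [[label {:keys [p evidence]}]]
              [label (* p
                        (reduce * (map (fn [x {:keys [mean variance]}]
                                         (posterior-prob x variance mean))
                                       point
                                       evidence)))]))
       (apply max-key second)
       first))

(classify (make-model train-data) [1.1 2.0])
;; => :blue (for the made-up train-data above)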
Slide 37
P(X | blue) = Number of Blue near X / Total number of Blue
P(X | red) = Number of Red near X / Total number of Red
Slide 38
Implementing Linear Regression with Gradient Descent
Slide 39
Linear Regression with Gradient Descent
(s/defrecord GradientProblem
  [^{:s ObjectiveFunction} objective-fn
   ^{:s ObjectiveFunctionGradient} objective-fn-gradient])
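A hedged sketch, not the library's actual solver, of how such a record could drive plain gradient descent; the `gradient-descent` name, the fixed learning rate `alpha` and the iteration cap are assumptions:

(defn gradient-descent
  [{:keys [objective-fn-gradient]} initial-point alpha iterations]
  (loop [point initial-point
         i     0]
    (if (= i iterations)
      point
      ;; Standard descent step: move against the gradient.
      (let [grad (apply objective-fn-gradient point)]
        (recur (mapv (fn [p g] (- p (* alpha g))) point grad)
               (inc i))))))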
Slide 40
Linear Regression: Objective Function
Basically, the sum of squared distances between the predicted and the actual Y:
(objective-function
 (fn [intercept slope]
   (let [f   (line intercept slope)
         res (->> points
                  (map (fn [[x y]]
                         (sqr (- y (f x)))))
                  (reduce +))]
     res)))
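The `line` and `sqr` helpers are not shown on the slides; plausible definitions (assumptions, not the original code) would be:

(defn line
  "Linear model y = intercept + slope * x."
  [intercept slope]
  (fn [x] (+ intercept (* slope x))))

(defn sqr
  [x]
  (* x x))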
Slide 41
Linear Regression: Objective Function Gradient
And the gradient part, computed here from the closed-form least-squares solution:
(objective-function-gradient
 (let [factors (->> points
                    (map butlast)
                    (map #(cons 1 %)))
       y       (map last points)]
   (fn [& point]
     (let [xT (matrix/transpose factors)
           m! (matrix/inverse (matrix/dot xT factors))
           b  (matrix/dot xT y)]
       (ops/- (matrix/mmul m! b)
              point)))))
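In matrix terms, with X being `factors` (the design matrix with a leading column of ones) and y the observed targets, the function above returns

    (Xᵀ X)⁻¹ Xᵀ y − point

i.e. the difference between the closed-form least-squares solution and the current parameter vector, a direction pointing from the current estimate towards the minimum.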
Slide 42
Linear Regression: Objective Function
Slide 43
A bunch of JVM libraries are available
clojure.matrix is great
clojure.match greatly helps with algorithms
Clojure fns are easy to test
With immutable data structures, nothing goes wrong
Experience Report
Slide 44
Balagan
When `update-in` and `get-in` are not enough
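To illustrate the pain point with a small, hypothetical example using only core Clojure (not Balagan's API): with map-inside-vector-inside-map data, `get-in` and `update-in` need every index spelled out by hand.

(def conf
  {:servers [{:host "a" :port 8080}
             {:host "b" :port 8081}]})

;; Reading or updating one known position is fine...
(get-in conf [:servers 0 :host])          ;=> "a"
(update-in conf [:servers 1 :port] inc)

;; ...but touching every :port already needs manual iteration:
(update conf :servers
        (fn [servers] (mapv #(update % :port inc) servers)))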
Slide 45
Slide 46
Nested data structures
Map-inside-vector-inside-map
Straightforward query language
Balagan
Reduce boilerplate for processing topologies
Implicit wiring between occurring parts
No changes to the base API
Attach parts of the stream for better composition
DSLs