Slide 1

Slide 1 text

How To Succeed In Business Simulations Without Really Trying
Keegan Kang
24th April 2016
Outline: Rapid prototyping / Exploratory simulations / Large scale simulations

Slide 2

Slide 2 text

What this talk is not about
+ Writing extensive and rigorous unit tests
+ Writing exquisite code

Slide 3

Slide 3 text

Today’s Talk
1. Rapid prototyping
2. Exploratory simulations
3. Large scale simulations

Slide 4

Slide 4 text

Thought of the Day
It helps to think of the long term as well as the bigger picture!

Slide 5

Slide 5 text

Subsections of Rapid prototyping: Partitioning / Modularization; Building On Previous Code; Partitioning Out The Randomness; Other Techniques

Univariate / Multivariate Optimization
A small list of algorithms:
+ Newton-Raphson
+ Coordinate Ascent
+ Golden Section
+ Expectation Maximization
+ Neural nets
+ Levenberg-Marquardt
+ ⋯

Slide 6

Slide 6 text

Big Picture
+ Initialize starting point / points
+ While {convergence criteria not met}
+     Do {Update Step}
+ Return converged value

Slide 7

Slide 7 text

More Precisely...
+ Initialize starting point / points
+ While {convergence criteria not met}
+     Tweaks to update step
+ Return converged value

Slide 8

Slide 8 text

The Idea
+ If the framework works for a known algorithm, the framework is probably right (you could compare against packages, but you want the framework for your own original code)
+ Can use the same framework (swapping update steps) to quickly compare algorithms (more in a later part of the talk)
+ Can also make minor tweaks to the update steps to compare several ansätze at once (more in a later part of the talk)
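The framework idea can be sketched in Python as follows. This is only an illustration, not the code from Demo 1: the names `iterate`, `newton_step` and `damped_newton_step`, the tolerance, and the test objective f(x) = log(x) - x (maximized at x = 1) are all assumptions made for the example. The loop stays fixed and only the update step is swapped.

```python
def iterate(update, x0, tol=1e-10, max_iter=1000):
    """Generic loop: apply `update` until two successive points differ by < tol.

    Returns the converged value and the number of update steps used.
    """
    x = x0
    for n_iter in range(1, max_iter + 1):
        x_new = update(x)
        if abs(x_new - x) < tol:      # convergence criterion
            return x_new, n_iter
        x = x_new
    raise RuntimeError("convergence criterion not met within max_iter steps")


# Hypothetical objective f(x) = log(x) - x, with known maximizer x = 1.
def grad(x):
    return 1.0 / x - 1.0


def hess(x):
    return -1.0 / x ** 2


def newton_step(x):
    """Plain Newton-Raphson update for a stationary point."""
    return x - grad(x) / hess(x)


def damped_newton_step(x, step=0.5):
    """A minor tweak to the update step: Newton with a fixed step size."""
    return x - step * grad(x) / hess(x)


# Same framework, different update steps.
results = {
    "newton": iterate(newton_step, x0=0.5),
    "damped": iterate(damped_newton_step, x0=0.5),
}
for name, (value, n_iter) in results.items():
    print(f"{name}: converged to {value:.8f} in {n_iter} steps")
```

Because the known algorithm (Newton-Raphson on a function with a known maximizer) recovers x = 1, there is some evidence that `iterate` itself is right before an original update step is plugged in.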

Slide 9

Slide 9 text

Demo 1

Slide 10

Slide 10 text

How Can We Generalize This?
+ My research: compute unbiased estimators for a parameter of interest f(x).
+ Want an estimator g(x) with E[g(x)] = f(x)
+ Usually, have E[g(x)] = a f(x) + b
+ *Could* come up with the correct g(x), or have various educated guesses for g(x), plug them in, and see how they perform

Slide 11

Slide 11 text

An Example: Who is here only for the free food?
for each person do
    Generate a Bernoulli random variable B with p = 1/2
    if B = 1 then
        answer truthfully
    else
        say “I am here for the free food”
    end
end
Expected fraction of those here for free food:
$\dfrac{N_{\text{free food}} - \frac{1}{2} N_{\text{total}}}{\frac{1}{2} N_{\text{total}}}$
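A small simulation of this survey, as a sketch: the audience (30% here for the food), the seed, and the function names `simulate_survey` and `estimate_fraction` are made up for illustration. The estimator is the expression on the slide.

```python
import random


def simulate_survey(true_flags, rng):
    """Return the number of 'here for the free food' answers.

    true_flags[i] is 1 if person i is truly here for the food, else 0.
    """
    n_free = 0
    for here_for_food in true_flags:
        b = rng.random() < 0.5        # Bernoulli random variable with p = 1/2
        if b:
            n_free += here_for_food   # answer truthfully
        else:
            n_free += 1               # say "I am here for the free food"
    return n_free


def estimate_fraction(n_free, n_total):
    """(N_free - N_total / 2) / (N_total / 2), as on the slide."""
    return (n_free - 0.5 * n_total) / (0.5 * n_total)


rng = random.Random(0)
true_flags = [1] * 3000 + [0] * 7000   # hypothetical audience, true fraction 0.3
n_free = simulate_survey(true_flags, rng)
estimate = estimate_fraction(n_free, len(true_flags))
print(f"estimated fraction: {estimate:.3f}")
```

The correction works because a person answers "free food" with probability 1/2 + (1/2) × (true fraction), so subtracting the forced half and rescaling recovers the true fraction in expectation.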

Slide 12

Slide 12 text

Extension Of Problem
+ What if we generate a Bernoulli random variable with different probabilities?
+ What if we used some variance reduction methods?
+ Pair generation of random variable with several “guesses” of what expectation should be, verify via simulation
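A sketch of pairing one random draw with several guesses, assuming the truthful-answer probability is changed from 1/2 to a hypothetical 0.8. The three candidate estimators, their names, and the audience are all invented for the example; only one of the guesses is actually unbiased, and the simulation is what tells them apart.

```python
import random


def count_free_food(true_flags, p_truth, rng):
    """The randomness lives here: answer truthfully with probability p_truth."""
    n_free = 0
    for here_for_food in true_flags:
        if rng.random() < p_truth:
            n_free += here_for_food   # truthful answer
        else:
            n_free += 1               # forced "free food" answer
    return n_free


# Several educated guesses for the estimator; n is the number of people.
guesses = {
    "half_hard_coded": lambda n_free, n, p: (n_free - 0.5 * n) / (0.5 * n),
    "swap_p": lambda n_free, n, p: (n_free - p * n) / ((1 - p) * n),
    "general_p": lambda n_free, n, p: (n_free - (1 - p) * n) / (p * n),
}

rng = random.Random(0)
true_flags = [1] * 300 + [0] * 700    # hypothetical audience, true fraction 0.3
true_fraction = sum(true_flags) / len(true_flags)
p_truth, nsims = 0.8, 200

totals = {name: 0.0 for name in guesses}
for _ in range(nsims):
    # One draw of the survey is shared by every guess.
    n_free = count_free_food(true_flags, p_truth, rng)
    for name, guess in guesses.items():
        totals[name] += guess(n_free, len(true_flags), p_truth)

bias = {name: totals[name] / nsims - true_fraction for name in guesses}
for name, value in bias.items():
    print(f"{name}: estimated bias {value:+.4f}")
```

Sharing the same draw across guesses keeps the comparison cheap and removes one source of noise between candidates.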

Slide 13

Slide 13 text

Scenario
+ Code for specific case works
+ Need to modify code for general case

Slide 14

Slide 14 text

Demo 2

Slide 15

Slide 15 text

Scenario
+ Hard to create unit tests for random outputs
+ Could repeatedly run and take averages until the error is minimal (or the probability of failing is small), but this is time consuming.

Slide 16

Slide 16 text

Bootstrap Example
Initialize empty vector b (matrix B)
For i in 1:nsims
    Sample with replacement (w.r.) from data
    Compute quantity of interest
    Update b[i] (or B[i,])
Do {relevant computation} with b (or B)

Slide 17

Slide 17 text

Bootstrap Example
+ Can create a function that takes in an input of random indices
+ Ensure this function works with fixed indices
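A minimal sketch of this split, with hypothetical names (`bootstrap_stat`, `bootstrap`) and toy data. The deterministic part takes the indices as an argument, so it can be checked with indices chosen by hand; the random part only generates indices.

```python
import random
import statistics


def bootstrap_stat(data, indices, stat=statistics.mean):
    """Deterministic part: compute `stat` on the resample selected by `indices`."""
    return stat([data[i] for i in indices])


def bootstrap(data, nsims, rng, stat=statistics.mean):
    """Random part: draw indices with replacement, then call bootstrap_stat."""
    n = len(data)
    b = []
    for _ in range(nsims):
        indices = [rng.randrange(n) for _ in range(n)]
        b.append(bootstrap_stat(data, indices, stat))
    return b


data = [2.0, 4.0, 6.0, 8.0]

# Check with fixed indices: the resample is [2, 2, 8, 8], whose mean is 5.
fixed = bootstrap_stat(data, [0, 0, 3, 3])

# Relevant computation with b: here, the bootstrap standard error of the mean.
b = bootstrap(data, nsims=500, rng=random.Random(1))
standard_error = statistics.stdev(b)
```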

Slide 18

Slide 18 text

Graphical Plot
+ Or, plot the results, and see if the shape / bias / MSE / etc. is what you’d expect for general cases

Slide 19

Slide 19 text

Other Techniques / Reminders
+ Sanity checks (garbage in should give garbage out)
+ Tolerance levels (“boundary conditions” at 0 are never exact)
+ Rule of thumb: If errors take more than a minute or so to debug, add in more helper functions
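The tolerance point in a few lines of Python. The absolute tolerance of 1e-12 is an arbitrary choice for illustration; a sensible value depends on the scale of the problem.

```python
import math

# A quantity that is exactly 0 on paper but not in floating point.
residual = 0.1 + 0.2 - 0.3

exact_check = residual == 0.0                                  # exact comparison
default_check = math.isclose(residual, 0.0)                    # relative tolerance only
tolerant_check = math.isclose(residual, 0.0, abs_tol=1e-12)    # absolute tolerance
```

`math.isclose` uses only a relative tolerance by default, which is of no help when comparing against 0, so `abs_tol` has to be set explicitly.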

Slide 20

Slide 20 text

Subsections of Exploratory simulations: (ensuring no) Propagation Of Error; General Trend Of Plots

Exploratory Simulations
+ See this as convincing colleagues / people around you
+ Improve the code written for rapid prototyping so that it can run exploratory simulations
+ Ensure code works without any (major) propagation of error
+ Number of repetitions not as important (yet), but need to see the general trend

Slide 21

Slide 21 text

Simulation Structure
+ Depending on the type of research you are doing, it would be good to have a specific simulation structure
+ My research is on random projections; I usually plot MSE vs columns of matrix, or time taken vs columns of matrix

Slide 22

Slide 22 text

Simulation Structure Pseudocode
# data: some data
# mymethod: function that calls my method
# kvec: vector of ks to plot
# opt_para: list of optional parameters
runSims <- function(data, mymethod, kvec, opt_para = ...){
  output = mymethod(data, kvec, opt_para)  # Generally more complex
  plot(.....)
}
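A Python analogue of this structure, as a rough sketch. Everything beyond the shape of `runSims` is an assumption: the method `subsample_mean`, the `truth` argument, the toy data, and the choice to return and print the MSE per k instead of plotting it.

```python
import random
import statistics


def run_sims(data, mymethod, kvec, truth, nsims=200, seed=0):
    """Return {k: MSE of mymethod(data, k, rng) against `truth`} for k in kvec."""
    rng = random.Random(seed)
    mse = {}
    for k in kvec:
        sq_errors = [(mymethod(data, k, rng) - truth) ** 2 for _ in range(nsims)]
        mse[k] = statistics.mean(sq_errors)
    return mse


def subsample_mean(data, k, rng):
    """Hypothetical method: mean of k points drawn without replacement."""
    return statistics.mean(rng.sample(data, k))


data = [float(i) for i in range(100)]
mse = run_sims(data, subsample_mean, kvec=[5, 20, 80], truth=statistics.mean(data))
for k, value in mse.items():
    print(f"k = {k:3d}   MSE = {value:.3f}")
```

Swapping in another method only requires a function with the same `(data, k, rng)` signature, which is the point of fixing the structure once.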

Slide 23

Slide 23 text

Toy Example (inspired by whiteboard at Gates 416)
Suppose we want to compute
$y_n = \int_0^1 x^n \exp\{x\}\,dx$
using the recursion
$y_0 := \exp\{1\} - 1$
$y_1 := 1$
$y_n := \exp\{1\} - n\,y_{n-1}$
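The propagation of error in this recursion can be seen with a short sketch. The reference values here come from composite Simpson's rule, which is a substitute chosen for this example (the slides compare against the actual value and a Monte Carlo integral); the function names and the number of panels are likewise assumptions. Each step multiplies the error in y_{n-1} by n, so an initial error of order 1e-16 grows roughly like n!.

```python
import math


def y_recursion(n_max):
    """Forward recursion y_n = e - n * y_{n-1}, starting from y_0 = e - 1."""
    ys = [math.e - 1.0]
    for n in range(1, n_max + 1):
        ys.append(math.e - n * ys[-1])
    return ys


def y_quadrature(n, panels=2000):
    """Composite Simpson's rule for the integral of x^n * exp(x) over [0, 1]."""
    def f(x):
        return x ** n * math.exp(x)

    h = 1.0 / panels                  # panels must be even
    total = f(0.0) + f(1.0)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


recursion = y_recursion(20)
for n in (5, 10, 15, 20):
    print(f"n = {n:2d}   recursion = {recursion[n]: .6e}   "
          f"quadrature = {y_quadrature(n): .6e}")
```

The two columns agree closely for small n and part ways well before n = 20, which is the kind of general trend a plot makes obvious.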

Slide 24

Slide 24 text

Toy Example (inspired by whiteboard at Gates 416)
[Figure: Plot of integral with respect to n. x-axis: Value of n; y-axis: Value of y(n); legend: Actual, MC Integral, Recursion.]

Slide 25

Slide 25 text

Demo 3

Slide 26

Slide 26 text

Subsections of Large scale simulations: Subsetting Of Data; Vectorization

Large Scale Simulations
+ Code would usually have gone through a few iterations before this stage
+ Last step: modify code to make it run faster (usually down to experience)

Slide 27

Slide 27 text

An Example
Consider the following experiment:
+ Want to estimate $\|x\|_2^2$, where $x \in \mathbb{R}^p$.
+ Compute $v = \frac{1}{\sqrt{k}} R x$, where $R$ is a random matrix with $k$ rows and $p$ columns. ($\|v\|_2^2$ is an unbiased estimator of $\|x\|_2^2$.)
+ As $k \uparrow$, how does the error of our estimate vary?

Slide 28

Slide 28 text

Generic Pseudocode
kvec = [2:2:100, 150:50:500, 600:100:1000]
M = zeros(nsims, length(kvec))
for iter = 1:nsims
    for kval = 1:length(kvec)
        k = kvec(kval)
        compute R of size k by p
        compute estimate of norm
        store estimate of norm in M
    end
end
Plot estimates

Slide 29

Slide 29 text

Better Pseudocode
kvec = [2:2:100, 150:50:500, 600:100:1000]
M = zeros(nsims, length(kvec))
for iter = 1:nsims
    compute bigR of size 1000 by p
    for kval = 1:length(kvec)
        k = kvec(kval)
        set R to be the appropriate subset (k rows) of bigR
        compute estimate of norm
        store estimate of norm in M
    end
end
Plot estimates
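A scaled-down Python sketch of the subsetting version. The sizes, the seed, the all-ones test vector, and the choice of i.i.d. N(0, 1) entries for R are assumptions made to keep the example small; the first k rows of the big matrix play the role of R.

```python
import random


def estimate_sq_norm(R_rows, x):
    """||v||_2^2 for v = R x / sqrt(k), where R_rows holds the k rows of R."""
    k = len(R_rows)
    return sum(sum(r * xi for r, xi in zip(row, x)) ** 2 for row in R_rows) / k


rng = random.Random(0)
p, nsims = 10, 200
kvec = [2, 10, 50]                     # much smaller than the slide's kvec
x = [1.0] * p                          # hypothetical vector with ||x||^2 = p
true_sq_norm = sum(xi * xi for xi in x)

M = [[0.0] * len(kvec) for _ in range(nsims)]
for it in range(nsims):
    # Generate the largest matrix once per iteration (assumed N(0, 1) entries).
    big_R = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(max(kvec))]
    for j, k in enumerate(kvec):
        R = big_R[:k]                  # appropriate subset: the first k rows
        M[it][j] = estimate_sq_norm(R, x)

mse = [sum((M[it][j] - true_sq_norm) ** 2 for it in range(nsims)) / nsims
       for j in range(len(kvec))]
mean_at_largest_k = sum(M[it][-1] for it in range(nsims)) / nsims
```

One side effect worth keeping in mind: estimates for different k within the same iteration now share random numbers, so they are correlated across k even though each one is still a valid draw.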

Slide 30

Slide 30 text

Why Subset?
+ Fewer random numbers to generate.
+ If $k = 1, 2, 3, \dots, K$, generating multiple R would take $O(1 + 2 + \dots + K) = O(K^2)$ time
+ Generating one big R would take $O(K)$ time.
+ Not just in random projections: the same applies to, say, MCMC algorithms (deciding thinning factor, burn-in period, etc.)

Slide 31

Slide 31 text

Demo 4

Slide 32

Slide 32 text

Vectorization
Vectorization is sometimes not easy to spot (but can be done).

Slide 33

Slide 33 text

Example: K nearest neighbors
+ Training set: $X_{\text{train}}, Y_{\text{train}}$
+ Testing set: $X_{\text{test}}, Y_{\text{test}}$
+ For each $x \in X_{\text{test}}$, compute the Euclidean distance between $x$ and every $x' \in X_{\text{train}}$.
+ Look at the mode of the labels of the top $k$ neighbors
+ How can we do this computation?
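One common way to vectorize this computation is to expand the squared distance so that all pairs are handled by a single matrix product. This is offered as an assumption about what a vectorized solution could look like, not as the method used in the talk. The symbols $m$, $n$, $a$, $b$ and $D$ are introduced here: $X_{\text{test}}$ has $m$ rows, $X_{\text{train}}$ has $n$ rows, $a \in \mathbb{R}^m$ and $b \in \mathbb{R}^n$ hold the squared norms of their rows, and $D_{ij}$ is the squared distance between test point $i$ and training point $j$.

```latex
\|x - x'\|_2^2 = \|x\|_2^2 - 2\,x^\top x' + \|x'\|_2^2
\quad\Longrightarrow\quad
D = a\,\mathbf{1}_n^\top - 2\,X_{\text{test}} X_{\text{train}}^\top + \mathbf{1}_m\,b^\top
```

Sorting each row of $D$ then gives the neighbors of each test point without an explicit double loop over pairs.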

Slide 34

Slide 34 text

Example: K nearest neighbors

Slide 35

Slide 35 text

Demo 5

Slide 36

Slide 36 text

If time permits
+ Simplify equations where possible (likelihood functions, sufficient statistics, factoring out, etc.)
+ Some computations are not essential. E.g., there is no point in rescaling already scaled data by a constant scaling factor for KNN (see Demo 5)
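A small check of the KNN remark, using toy data and a hypothetical `knn_predict` written for this example (not the Demo 5 code). Multiplying every distance by the same positive constant, or skipping the square root, leaves the ordering of the neighbors unchanged, so the predicted label is the same.

```python
import math
from collections import Counter


def knn_predict(x, X_train, y_train, k, scale=1.0, use_sqrt=True):
    """Label of x by majority vote among its k nearest training points.

    `scale` multiplies all coordinates by a constant; `use_sqrt=False` ranks
    by squared distance. Neither changes the ranking of the neighbors.
    """
    def dist(a, b):
        d2 = sum((scale * (ai - bi)) ** 2 for ai, bi in zip(a, b))
        return math.sqrt(d2) if use_sqrt else d2

    order = sorted(range(len(X_train)), key=lambda i: dist(x, X_train[i]))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy training set: two well-separated clusters.
X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["a", "a", "a", "b", "b", "b"]
queries = [(1, 1), (4, 4)]

plain = [knn_predict(q, X_train, y_train, k=3) for q in queries]
cheap = [knn_predict(q, X_train, y_train, k=3, scale=7.5, use_sqrt=False)
         for q in queries]
```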