Debugging and Testing Scientific Codes

Presentation courtesy of Keegan Kang.

Running simulations to verify results is important in research, and computational papers usually show the resulting graphs. We'll cover some tips and tricks on what goes on behind this process: modularization of code, debugging heuristics, making preliminary plots, optimizing large-scale simulations, and more.

Presented at SSW: https://cornell-ssw.github.io/meetings/2017-04-24

CUSSW Hosted

April 24, 2017

Transcript

  1. How To Succeed In Business Simulations Without Really Trying

    Keegan Kang, 24th April 2016
    Sections: Rapid prototyping; Exploratory simulations; Large scale simulations
  2. What this talk is not about

    + Writing extensive and rigorous unit tests
    + Writing exquisite code
  3. Today’s Talk

    1. Rapid prototyping
    2. Exploratory simulations
    3. Large scale simulations
  4. Thought of the Day

    It helps to think of the long term as well as the bigger picture!
  5. Univariate / Multivariate Optimization

    Rapid prototyping subsections: Partitioning / Modularization; Building On Previous Code; Partitioning Out The Randomness; Other Techniques

    A small list of algorithms:
    + Newton Raphson
    + Coordinate Ascent
    + Golden Section
    + Expectation Maximization
    + Neural nets
    + Levenberg-Marquardt
    + ⋯
  6. Big Picture

    + Initialize starting point / points
    + While {convergence criteria not met}
        + Do {Update Step}
    + Return converged value
  7. More Precisely...

    + Initialize starting point / points
    + While {convergence criteria not met}
        + Tweaks to update step
    + Return converged value
  8. The Idea

    + If the framework works for a known algorithm, the framework is probably right (you could compare against existing packages, but you want the framework for your own original code).
    + The same framework can be reused, swapping update steps, to quickly compare algorithms (more on this later in the talk).
    + Minor tweaks to the update step also let you compare several ansatz at once (more on this later in the talk).
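
The initialize / update / return framework on the preceding slides can be sketched as below. This is a minimal illustration, not the presenter's demo code: the names `iterate`, `newton_step`, and `damped_newton_step`, the toy problem (the root of x^2 - 2), and the step-size convergence criterion are all assumptions made for the example.

```python
import math

def iterate(start, update, tol=1e-10, max_iter=200):
    """Generic framework: initialize, update until converged, return the value."""
    x = start
    for _ in range(max_iter):
        x_new = update(x)
        if abs(x_new - x) < tol:  # convergence criterion: step size below tol
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

def f(x):
    return x * x - 2.0  # toy problem (assumed): find the root of x^2 - 2

def f_prime(x):
    return 2.0 * x

def newton_step(x):
    # A known algorithm, used to check that the framework itself is right.
    return x - f(x) / f_prime(x)

def damped_newton_step(x, damping=0.5):
    # A "minor tweak" to the update step: take only part of the Newton step.
    return x - damping * f(x) / f_prime(x)

root_newton = iterate(1.0, newton_step)
root_damped = iterate(1.0, damped_newton_step)
print(root_newton, root_damped, math.sqrt(2.0))
```

Because only the `update` argument changes between runs, a bug that shows up for both update steps points at the shared loop, while a bug that shows up for one points at that step.
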
  9. Demo 1
  10. How Can We Generalize This?

    My research: compute unbiased estimators for a parameter of interest $f(x)$.
    + Want an estimator $g(x)$ with $E[g(x)] = f(x)$.
    + Usually have $E[g(x)] = a f(x) + b$.
    + *Could* come up with the correct $g(x)$, or plug in various educated guesses for $g(x)$ and see how they perform.
  11. An Example: Who is here only for the free food?

        for each person do
            Generate a Bernoulli random variable B with p = 1/2
            if B = 1 then
                answer truthfully
            else
                say “I am here for the free food”
            end
        end

    Expected fraction of those here for free food:
    $$\frac{N_{\text{free food}} - \frac{1}{2} N_{\text{total}}}{\frac{1}{2} N_{\text{total}}}$$
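
The survey scheme on this slide can be checked by simulation. The sketch below is illustrative only: the function names, the population of 10,000 people, the true fraction of 0.3, and the seed are assumptions chosen for the example.

```python
import random

def randomized_response(truths, rng):
    """Each person draws B ~ Bernoulli(1/2): B = 1 answers truthfully, B = 0 says 'free food'."""
    return [truth if rng.random() < 0.5 else True for truth in truths]

def estimate_fraction(answers):
    """(N_free_food - N_total / 2) / (N_total / 2), as on the slide."""
    n_total = len(answers)
    n_free_food = sum(answers)  # True counts as 1
    return (n_free_food - 0.5 * n_total) / (0.5 * n_total)

rng = random.Random(0)
truths = [i < 3000 for i in range(10000)]  # assumed population: true fraction 0.3
answers = randomized_response(truths, rng)
estimate = estimate_fraction(answers)
print(estimate)  # should land near 0.3
```

The estimator follows from the expected count: with true fraction π, the expected number of "free food" answers is N_total (π/2 + 1/2), which rearranges to the formula on the slide.
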
  12. Extension Of Problem

    + What if we generate a Bernoulli random variable with different probabilities?
    + What if we used some variance reduction methods?
    + Pair the generation of the random variable with several “guesses” of what the expectation should be, and verify via simulation.
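
One way to pair a single random draw with several guesses is sketched below for a general truth-telling probability p. Everything here is an assumption for illustration: the two candidate estimators, their labels, the sample sizes, and the seed. With probability p of answering truthfully, the expected fraction of "free food" answers is pπ + (1 − p), so the first guess has expectation pπ (the a f(x) + b form with b = 0) and the second has expectation π.

```python
import random

def simulate_fraction_free(true_frac, p, n_people, rng):
    """Fraction answering 'free food' when each person is truthful with probability p."""
    count = 0
    for _ in range(n_people):
        truthful = rng.random() < p
        here_for_food = rng.random() < true_frac
        count += here_for_food if truthful else 1
    return count / n_people

# Hypothetical candidate estimators ("guesses") of the true fraction.
guesses = {
    "shift only": lambda frac, p: frac - (1 - p),
    "shift and rescale": lambda frac, p: (frac - (1 - p)) / p,
}

def mean_estimates(true_frac, p, n_people=500, n_sims=400, seed=1):
    rng = random.Random(seed)
    totals = {name: 0.0 for name in guesses}
    for _ in range(n_sims):
        # One simulated survey per repetition, shared by every guess.
        frac = simulate_fraction_free(true_frac, p, n_people, rng)
        for name, guess in guesses.items():
            totals[name] += guess(frac, p)
    return {name: total / n_sims for name, total in totals.items()}

means = mean_estimates(true_frac=0.3, p=0.7)
print(means)
```

Sharing the draw across guesses means differences between the averages come from the estimators themselves rather than from different random samples.
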
  13. Scenario

    + Code for specific case works
    + Need to modify code for general case
  14. Demo 2
  15. Scenario

    + Hard to create unit tests for random outputs.
    + Could repeatedly run and take averages until the error is minimal (or the probability of failing is small), but this is time consuming.
  16. Bootstrap Example

        Initialize empty vector b (matrix B)
        For i in 1:nsims
            Sample with replacement from data
            Compute quantity of interest
            Update b[i] (or B[i,])
        Do {relevant computation} with b (or B)
  17. Bootstrap Example

    + Can create a function that takes the random indexes as an input.
    + Ensure this function works with fixed indexes.
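
A minimal sketch of partitioning out the randomness in a bootstrap: the computation takes the resampling indexes as an argument, so it can be tested with fixed indexes before random ones are supplied. The function names and the choice of the mean as the quantity of interest are assumptions for this example.

```python
import random

def bootstrap_means(data, index_sets):
    """Mean of each resample; the resampling indexes are passed in, not drawn here."""
    return [sum(data[i] for i in idx) / len(idx) for idx in index_sets]

def random_index_sets(n, nsims, rng):
    """Draw nsims sets of n indexes, sampling with replacement."""
    return [[rng.randrange(n) for _ in range(n)] for _ in range(nsims)]

data = [2.0, 4.0, 6.0, 8.0]

# Deterministic check with fixed indexes: the answers are known by hand.
fixed = bootstrap_means(data, [[0, 0, 0, 0], [0, 1, 2, 3], [3, 3, 2, 2]])
print(fixed)  # [2.0, 5.0, 7.0]

# Same function, now fed random indexes for the actual bootstrap.
b = bootstrap_means(data, random_index_sets(len(data), 1000, random.Random(0)))
```

Only `random_index_sets` touches the random number generator, so an ordinary unit test covers `bootstrap_means` without repeated runs or averaging.
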
  18. Graphical Plot

    Or, plot the results out, and see if the shape / bias / MSE / etc is what you’d expect for general cases.
  19. Other Techniques / Reminders

    + Sanity checks (garbage in should give garbage out)
    + Tolerance levels (“boundary conditions” at 0 are never exact)
    + Rule of thumb: if errors take more than a minute or so to debug, add more helper functions
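
The point about tolerance levels at 0 can be illustrated with a short sketch. The helper `check_probability` and the tolerance of 1e-12 are hypothetical choices for this example; `math.isclose` is the standard-library comparison, and its default uses a relative tolerance only, which is why comparisons against 0 need an explicit `abs_tol`.

```python
import math

def check_probability(p, abs_tol=1e-12):
    """Hypothetical helper: reject garbage input, clamp values within tolerance of [0, 1]."""
    if math.isnan(p) or p < -abs_tol or p > 1.0 + abs_tol:
        raise ValueError(f"not a probability: {p!r}")
    return min(1.0, max(0.0, p))

boundary = math.sin(math.pi)  # mathematically 0, but not exactly 0 in floating point

exact_equal = (boundary == 0.0)                          # exact comparison
close_default = math.isclose(boundary, 0.0)              # relative tolerance only
close_abs = math.isclose(boundary, 0.0, abs_tol=1e-12)   # absolute tolerance near 0

print(boundary, exact_equal, close_default, close_abs)
```
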
  20. Exploratory Simulations

    Exploratory simulations subsections: (ensuring no) Propagation Of Error; General Trend Of Plots

    + See this as convincing colleagues / people around you.
    + Improve the code written for rapid prototyping so that it can run exploratory simulations.
    + Ensure the code works without any (major) propagation of error.
    + The number of repetitions is not as important (yet), but you need to see the general trend.
  21. Simulation Structure

    + Depending on the type of research you are doing, it is good to have a specific simulation structure.
    + My research is on random projections; I usually plot MSE vs. columns of matrix, or time taken vs. columns of matrix.
  22. Simulation Structure Pseudocode

        # data: some data
        # mymethod: function that calls my method
        # kvec: vector of ks to plot
        # opt_para: list of optional parameters
        runSims <- function(data, mymethod, kvec, opt_para = ...){
          output = mymethod(data, kvec, opt_para)  # Generally more complex
          plot(.....)
        }
  23. Toy Example (inspired by whiteboard at Gates 416)

    Suppose we want to compute
    $$y_n = \int_0^1 x^n \exp\{x\}\,dx$$
    using the recursion
    $$y_0 := \exp\{1\} - 1, \qquad y_1 := 1, \qquad y_n := \exp\{1\} - n\,y_{n-1}.$$
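
The recursion follows from integration by parts, and each step multiplies any error already present in y(n − 1) by n, so rounding error grows roughly like n!. The sketch below compares the forward recursion with an independent reference. Composite Simpson's rule with 1000 panels is used as that reference; this is an assumption of the example and not necessarily how the "Actual" or "MC Integral" values in the talk were computed.

```python
import math

def y_recursive(n_max):
    """Forward recursion y_n = e - n * y_{n-1}, starting from y_0 = e - 1."""
    y = [math.e - 1.0]
    for n in range(1, n_max + 1):
        y.append(math.e - n * y[n - 1])
    return y

def y_simpson(n, panels=1000):
    """Reference value of the integral of x^n e^x over [0, 1] (composite Simpson's rule)."""
    f = lambda x: x ** n * math.exp(x)
    h = 1.0 / panels  # panels must be even
    total = f(0.0) + f(1.0)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

recursion = y_recursive(20)
reference = [y_simpson(n) for n in range(21)]
for n in (5, 10, 15, 20):
    print(n, recursion[n], reference[n])
```

For small n the two columns agree closely; by n = 20 the recursion has drifted far from the reference, which is the kind of error propagation a plot against n makes visible.
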
  24. Toy Example (inspired by whiteboard at Gates 416)

    [Figure, shown in four build steps: "Plot of integral with respect to n". x-axis: Value of n (5 to 20); y-axis: Value of y(n) (0.0 to 0.8); legend: Actual, MC Integral, Recursion.]
  25. Demo 3
  26. Large Scale Simulations

    Large scale simulations subsections: Subsetting Of Data; Vectorization

    + Code would usually have gone through a few iterations before this stage.
    + Last step: modify the code to make it run faster (usually down to experience).
  27. An Example

    Consider the following experiment:
    + Want to estimate $\|x\|_2^2$, where $x \in \mathbb{R}^p$.
    + Compute $v = \frac{1}{\sqrt{k}} R x$, where $R$ is a random matrix with $k$ rows and $p$ columns ($\|v\|_2^2$ is an unbiased estimator of $\|x\|_2^2$).
    + As $k \uparrow$, how does the error of our estimate vary?
  28. Generic Pseudocode

        kvec = [2:2:100, 150:50:500, 600:100:1000]
        M = zeros(nsims, length(kvec))
        for iter = 1:nsims
            for kval = 1:length(kvec)
                k = kvec(kval)
                compute R of size k by p
                compute estimate of norm
                store estimate of norm in M
            end
        end
        Plot estimates
  29. Better Pseudocode

        kvec = [2:2:100, 150:50:500, 600:100:1000]
        M = zeros(nsims, length(kvec))
        for iter = 1:nsims
            compute bigR of size 1000 by p
            for kval = 1:length(kvec)
                k = kvec(kval)
                set R to be appropriate subset of bigR
                compute estimate of norm
                store estimate of norm in M
            end
        end
        Plot estimates
  30. Why Subset?

    + Fewer random numbers are generated.
    + If $k = 1, 2, 3, \dots, K$, generating multiple $R$ would take $O(1 + 2 + \dots + K) = O(K^2)$ time.
    + Generating one big $R$ would take $O(K)$ time.
    + This applies beyond random projections, for example to MCMC algorithms (deciding thinning factor, burn-in period, etc.).
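
A small pure-Python sketch of the subsetting idea follows. It assumes R has i.i.d. standard normal entries (the slides only say "random matrix") and takes the first k rows of the big matrix as the subset; the function names, the test vector, and the short `kvec` are illustrative choices, not the talk's settings.

```python
import random

def estimate_sq_norm(R_rows, x):
    """||v||^2 for v = R x / sqrt(k), where R_rows holds the k rows of R."""
    k = len(R_rows)
    return sum(sum(r * xj for r, xj in zip(row, x)) ** 2 for row in R_rows) / k

def run_sims(x, kvec, nsims, rng):
    p, kmax = len(x), max(kvec)
    M = [[0.0] * len(kvec) for _ in range(nsims)]
    for it in range(nsims):
        # One big random matrix per iteration (assumed N(0, 1) entries) ...
        big_R = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(kmax)]
        for col, k in enumerate(kvec):
            # ... and every k reuses its first k rows instead of a fresh draw.
            M[it][col] = estimate_sq_norm(big_R[:k], x)
    return M

x = [1.0, 2.0, 2.0]  # squared norm is 9
kvec = [2, 10, 50]
M = run_sims(x, kvec, nsims=200, rng=random.Random(0))

true_value = sum(xj * xj for xj in x)
mse = [sum((row[c] - true_value) ** 2 for row in M) / len(M) for c in range(len(kvec))]
print(mse)  # mean squared error for each k
```

One trade-off of this design is that estimates for different k within the same iteration share rows and are therefore correlated; estimates across iterations remain independent.
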
  31. Demo 4
  32. Vectorization

    Vectorization is sometimes not easy to spot (but can be done).
  33. Example: K nearest neighbors

    + Training set: $X_{\text{train}}, Y_{\text{train}}$
    + Testing set: $X_{\text{test}}, Y_{\text{test}}$
    + For each $x \in X_{\text{test}}$, compute the Euclidean distance to every $x' \in X_{\text{train}}$.
    + Look at the mode of the labels of the top $k$ neighbors.

    How can we do this computation?
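
One common way to vectorize this computation, offered here as a sketch and not necessarily the approach shown in the talk, expands the squared distance so that all pairs are obtained from matrix products. The notation is an assumption: the $m$ test points are the rows of $X_{\text{test}}$, the $n$ training points are the rows of $X_{\text{train}}$, $\operatorname{diag}(\cdot)$ is the column vector of diagonal entries, and $\mathbf{1}_n$ is a column of $n$ ones.

```latex
\begin{align*}
\|x - x'\|_2^2 &= \|x\|_2^2 - 2\, x^\top x' + \|x'\|_2^2, \\
D &= \operatorname{diag}\!\left(X_{\text{test}} X_{\text{test}}^\top\right) \mathbf{1}_n^\top
   - 2\, X_{\text{test}} X_{\text{train}}^\top
   + \mathbf{1}_m \operatorname{diag}\!\left(X_{\text{train}} X_{\text{train}}^\top\right)^\top,
\end{align*}
```

Here $D_{ij}$ is the squared distance between test point $i$ and training point $j$. Since the first term is constant within each row of $D$, it does not change the ranking of neighbors for a given test point and can be dropped when only the top $k$ are needed.
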
  34. Example: K nearest neighbors
  35. Demo 5
  36. If time permits

    + Simplify equations where possible (likelihood functions, sufficient statistics, factoring out, etc.).
    + Some computations are not essential. E.g., there is no point in rescaling scaled data with a constant scaling factor for KNN (see Demo 5).