Improving Performance with Parallel Programming

Further reading

Clojure Data Analysis Cookbook, Second Edition, chapter 4

Nakamura, Ryotaro

April 18, 2017

Transcript

  1. Agenda

    • Parallelism and Concurrency: What’s the difference?
    • Improving performance
      ◦ Running example
      ◦ Parallel programming
        ▪ Parallelizing processing with pmap
        ▪ Choosing the optimal number of threads
        ▪ Reducers
          • Removing wasteful intermediate results
          • Parallelizing processing with fold
      ◦ Type hints
  2. Parallelism and Concurrency: What’s the Difference?

    Source: Clay Breshears, The Art of Concurrency: A Thread Monkey’s Guide to Writing Parallel Applications

    Parallel is a subset of concurrent.

                                    Concurrent                     Parallel
    Two or more actions can be ..   in progress at the same time   executed simultaneously
    Are multiple cores required?    No                             Yes
    Execution example               task1, task2, resumed task1    task1, task2
  3. Running example

    wc counts the number of occurrences of each word in a given input.

        (defn wc [words]
          (reduce #(assoc %1 %2 (inc (get %1 %2 0))) {} words))

        ;; Infinite lazy sequence of random one-letter strings, "a" to "z".
        (defn gen-az []
          (lazy-seq (cons (str (char (+ (int \a) (rand-int 26)))) (gen-az))))

        (wc (take 3 (gen-az))) ; => e.g. {"a" 2, "b" 1}
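
As a quick sanity check on wc, the sketch below runs it on a fixed input and compares the result with clojure.core/frequencies, which builds the same kind of map. The sample-words vector is a hypothetical input, chosen so that the result does not depend on random data; it is not part of the slides.

```clojure
;; wc as defined on the slide.
(defn wc [words]
  (reduce #(assoc %1 %2 (inc (get %1 %2 0))) {} words))

;; Hypothetical fixed input, so the result does not depend on rand-int.
(def sample-words ["a" "b" "a" "c" "a" "b"])

(def sample-counts (wc sample-words))
;; sample-counts => {"a" 3, "b" 2, "c" 1}

;; The built-in frequencies function returns the same map.
(def same-as-frequencies? (= sample-counts (frequencies sample-words)))
```
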
  4. Agenda

    • Parallelism and Concurrency: What’s the difference?
    • Improving performance
      ◦ Running example
      ◦ Parallel programming
        ▪ Parallelizing processing with pmap
        ▪ Choosing the optimal number of threads
        ▪ Reducers
          • Removing wasteful intermediate results
          • Parallelizing processing with fold
      ◦ Type hints
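  5. clojure.core/pmap

    Like map, except that the given function is applied in parallel.

        (defn wc-p [lst chunk-num]
          ;; Split lst into chunks of about (count lst)/chunk-num items (at least 1),
          ;; count each chunk in parallel, then merge the partial counts.
          (let [parts (partition-all (max 1 (quot (count lst) chunk-num)) lst)]
            (apply merge-with + (pmap wc parts))))

        (wc-p (take 10 (gen-az)) 2)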
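
The chunked version should return exactly the same counts as the sequential one, because merging partial counts with + does not depend on how the input was split. A minimal sketch of that check follows; sample-words is a hypothetical deterministic input, and the (max 1 ...) guard on the chunk size is an addition to the slide's wc-p.

```clojure
(defn wc [words]
  (reduce #(assoc %1 %2 (inc (get %1 %2 0))) {} words))

;; Variant of the slide's wc-p; the (max 1 ...) guard is an addition that keeps
;; the chunk size positive when chunk-num exceeds the input length.
(defn wc-p [lst chunk-num]
  (let [parts (partition-all (max 1 (quot (count lst) chunk-num)) lst)]
    (apply merge-with + (pmap wc parts))))

;; Hypothetical deterministic input: 1,000 words cycling through "a", "b", "c".
(def sample-words (vec (take 1000 (cycle ["a" "b" "c"]))))

(def sequential-counts (wc sample-words))
(def parallel-counts (wc-p sample-words 4))
;; 1000 = 3 * 333 + 1, so "a" occurs 334 times and "b" and "c" 333 times each.
```

When this is run as a standalone script, calling (shutdown-agents) at the very end lets the JVM exit promptly, since pmap runs its work on the agent thread pool.
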
  6. Context switch overhead

    Context switches are more frequent in applications with many threads, and they have significant costs.

    Function  Threads    Length of the argument  Execution time in seconds
    wc        1          1,000,000               1.059627
    wc-p      1,000,000  1,000,000               4.964622

    Execution time measured on a MacBook Pro (Retina, 13-inch, Early 2015).
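
To get a feel for how the chunk count affects run time, wc-p can be timed for a handful of values. The sketch below is only a rough illustration: elapsed-ms is a hypothetical helper, sample-words is a hypothetical input, the chunk counts are arbitrary, and single-shot timings on the JVM are noisy, so a benchmarking library would give more trustworthy numbers. No timings are shown because they depend on the machine.

```clojure
(defn wc [words]
  (reduce #(assoc %1 %2 (inc (get %1 %2 0))) {} words))

;; Variant of the slide's wc-p with a guard that keeps the chunk size positive.
(defn wc-p [lst chunk-num]
  (let [parts (partition-all (max 1 (quot (count lst) chunk-num)) lst)]
    (apply merge-with + (pmap wc parts))))

;; Hypothetical helper: runs thunk once and returns [result elapsed-milliseconds].
(defn elapsed-ms [thunk]
  (let [start (System/nanoTime)
        result (thunk)]
    [result (/ (- (System/nanoTime) start) 1e6)]))

;; Hypothetical deterministic input of 100,000 words.
(def sample-words (vec (take 100000 (cycle ["a" "b" "c" "d"]))))

;; Number of processors visible to the JVM.
(def processors (.availableProcessors (Runtime/getRuntime)))

;; Time wc-p for a few chunk counts; each entry is {:chunks n, :ms t, :counts m}.
(def timings
  (vec (for [n [1 processors (* 4 processors) 10000]]
         (let [[counts ms] (elapsed-ms #(wc-p sample-words n))]
           {:chunks n :ms ms :counts counts}))))

(doseq [{:keys [chunks ms]} timings]
  (println chunks "chunks:" ms "ms"))
```

Whatever the timings turn out to be, every run must produce the same counts; only the cost of splitting, scheduling, and merging changes with the chunk count.
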
  7. Simulated annealing

    A probabilistic method for approximating the global minimum of a cost function.

    [Figure: simulated annealing, cost plotted against state]
  8. Generic annealing function

        (defn annealing [init max-iter neighbor-fn cost-fn p-fn temp-fn]
          (loop [state init
                 cost (cost-fn init)
                 k 1
                 best-seq [{:state state, :cost cost}]]
            (if (<= k max-iter)
              (let [t (temp-fn (/ k max-iter))
                    next-state (neighbor-fn state)
                    next-cost (cost-fn next-state)
                    next-place {:state next-state :cost next-cost}]
                (if (> (p-fn cost next-cost t) (rand))
                  (recur next-state next-cost (inc k) (conj best-seq next-place))
                  (recur state cost (inc k) best-seq)))
              best-seq)))

    Source: Eric Rochester, Clojure Data Analysis Cookbook Second Edition
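  9. Function parameters

        ;; neighbor-fn: move at most 5 steps away, clamped to the range 1..20
        (defn get-neighbor [state]
          (max 1 (min 20 (+ state (- (rand-int 11) 5)))))

        (def words (take 1000000 (gen-az)))

        ;; cost-fn: mean execution time of wc-p with `state` chunks
        ;; (q is assumed to be an alias for criterium.core)
        (def get-wc-cst
          (memoize
            (fn [state]
              (-> (q/quick-benchmark (wc-p words state) {})
                  :mean
                  first))))

        ;; p-fn: a move to a higher-cost state is a quarter as likely to be accepted
        (defn should-move [c0 c1 t]
          (* t (if (< c0 c1) 0.25 1.0)))

        ;; temp-fn: the temperature falls linearly from 1.0 to 0.0
        (defn get-temp [r]
          (- 1.0 (float r)))

    Source: Eric Rochester, Clojure Data Analysis Cookbook Second Edition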
  10. Execution example

        (annealing 10 10 get-neighbor get-wc-cst should-move get-temp)
        ;=> [{:state 10, :cost 0.6465979106666667}
        ;    {:state 10, :cost 0.6465979106666667}
        ;    {:state 14, :cost 0.5729814278333334}
        ;    {:state 14, :cost 0.5729814278333334}]

    Source: Eric Rochester, Clojure Data Analysis Cookbook Second Edition
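
Because the benchmark-based cost function is slow, the annealing loop is easier to try out with a cheap stand-in. The sketch below reuses annealing, get-neighbor, should-move, and get-temp from the slides and swaps in toy-cost, a hypothetical cost function with its minimum at state 7; the starting state 10 and the 50 iterations are likewise arbitrary choices. The path differs from run to run because the neighbor choice and the acceptance test are random.

```clojure
;; annealing, get-neighbor, should-move, and get-temp follow the slides.
(defn annealing [init max-iter neighbor-fn cost-fn p-fn temp-fn]
  (loop [state init
         cost (cost-fn init)
         k 1
         best-seq [{:state state, :cost cost}]]
    (if (<= k max-iter)
      (let [t (temp-fn (/ k max-iter))
            next-state (neighbor-fn state)
            next-cost (cost-fn next-state)
            next-place {:state next-state :cost next-cost}]
        (if (> (p-fn cost next-cost t) (rand))
          (recur next-state next-cost (inc k) (conj best-seq next-place))
          (recur state cost (inc k) best-seq)))
      best-seq)))

(defn get-neighbor [state]
  (max 1 (min 20 (+ state (- (rand-int 11) 5)))))

(defn should-move [c0 c1 t]
  (* t (if (< c0 c1) 0.25 1.0)))

(defn get-temp [r]
  (- 1.0 (float r)))

;; Hypothetical cheap cost function: squared distance from state 7.
(defn toy-cost [state]
  (let [d (- state 7)]
    (* d d)))

;; Accepted states, starting from state 10, over 50 iterations.
(def path (annealing 10 50 get-neighbor toy-cost should-move get-temp))

;; The accepted state with the lowest cost seen during the run.
(def best (apply min-key :cost path))
(println "accepted" (count path) "states, best:" best)
```
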
  11. Wasteful intermediate results

    Repeatedly allocating lists and immediately throwing them away is wasteful.

        ;; str is assumed to be an alias for clojure.string
        (defn wc-debug [lst]
          (reduce #(do (println "wc")
                       (assoc %1 %2 (inc (get %1 %2 0))))
                  {} lst))

        ;; map creates an intermediate list before reduce starts
        (wc-debug (map #(do (println "uc") (str/upper-case %))
                       (vec (take 2 (gen-az)))))
        ; ---
        ; uc
        ; uc
        ; wc
        ; wc
        ; {"H" 1, "W" 1}
  12. clojure.core.reducers

    No intermediate collections are produced: the two functions are composed into one.

        ;; str is assumed to be an alias for clojure.string
        (defn wc-reduce [lst]
          (clojure.core.reducers/reduce
            #(do (println "wc")
                 (assoc %1 %2 (inc (get %1 %2 0))))
            {} lst))

        (wc-reduce (clojure.core.reducers/map
                     #(do (println "uc") (str/upper-case %))
                     (vec (take 2 (gen-az)))))
        ; ---
        ; uc
        ; wc
        ; uc
        ; wc
        ; {"D" 1, "Z" 1}
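
The same idea can be checked without the println calls. In the sketch below, which assumes the aliases r for clojure.core.reducers and str for clojure.string and uses a hypothetical fixed input, r/map only describes the transformation; the upper-casing happens element by element once the result is reduced.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical fixed input.
(def sample-words ["a" "b" "a" "c" "b" "a"])

;; r/map returns a reducible "recipe"; upper-case is not applied until it is reduced.
(def upper-words (r/map str/upper-case sample-words))

;; Reducing applies upper-case and the counting step to one element at a time.
(def upper-counts
  (reduce (fn [freqs w] (assoc freqs w (inc (get freqs w 0))))
          {}
          upper-words))
;; upper-counts => {"A" 3, "B" 2, "C" 1}

;; into also accepts a reducer when a concrete collection is needed.
(def upper-vec (into [] upper-words))
;; upper-vec => ["A" "B" "A" "C" "B" "A"]
```
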
  13. clojure.core.reducers/fold

    fold implements a parallel reduce and combine.

        (fold combinef reducef coll)

    • The collection is partitioned into groups
    • combinef
      ◦ associative: a(bc) = (ab)c
      ◦ takes the results of applying reducef to each group
      ◦ must produce its identity value when called with no arguments
    • reducef
      ◦ reduces each group
      ◦ like a function that is passed to reduce

    https://clojure.org/reference/reducers
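
The requirements on combinef and reducef can be illustrated with numbers. This is a minimal sketch, assuming the alias r for clojure.core.reducers; the numbers vector and the partition size of 100 are arbitrary choices for illustration.

```clojure
(require '[clojure.core.reducers :as r])

(def numbers (vec (range 1000)))

;; With a single function, + serves as both combinef and reducef;
;; (+) returns 0, the identity value that fold needs.
(def total (r/fold + numbers))
;; total => 499500

;; Explicit form: partition size 100, separate combinef and reducef.
;; r/monoid builds a combinef from an associative operator and an
;; identity constructor.
(def even-count
  (r/fold 100
          (r/monoid + (constantly 0))
          (fn [n x] (if (even? x) (inc n) n))
          numbers))
;; even-count => 500
```

The input is a vector on purpose: fold can split vectors and maps for parallel work, while other sequences fall back to an ordinary sequential reduce.
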
  14. Applying fold

    Example: word count using fold (r is an alias for clojure.core.reducers)

        (defn count-words ; reducef
          ([] {})
          ([freqs word]
            (assoc freqs word (inc (get freqs word 0)))))

        (defn merge-counts ; combinef
          ([] {})
          ([& m] (apply merge-with + m)))

        (defn word-frequency [text]
          (r/fold merge-counts count-words (clojure.string/split text #"\s+")))

    https://clojure.org/reference/reducers
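
A short usage sketch for word-frequency follows. The two input strings are hypothetical; the second one is long enough (5,000 words, above fold's default partition size of 512) that the vector is actually split and merge-counts gets to combine partial results.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

(defn count-words ; reducef
  ([] {})
  ([freqs word]
   (assoc freqs word (inc (get freqs word 0)))))

(defn merge-counts ; combinef
  ([] {})
  ([& m] (apply merge-with + m)))

(defn word-frequency [text]
  (r/fold merge-counts count-words (str/split text #"\s+")))

(def small (word-frequency "the quick fox jumps over the lazy dog the end"))
;; small => {"the" 3, "quick" 1, "fox" 1, "jumps" 1,
;;           "over" 1, "lazy" 1, "dog" 1, "end" 1}

;; Hypothetical larger text: 5,000 words cycling through five distinct words.
(def big-text (str/join " " (take 5000 (cycle ["a" "b" "c" "d" "e"]))))
(def big (word-frequency big-text))
;; Each of the five words occurs 1,000 times.
```
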
  15. Agenda

    • Parallelism and Concurrency: What’s the difference?
    • Improving performance
      ◦ Running example
      ◦ Parallel programming
        ▪ Parallelizing processing with pmap
        ▪ Choosing the optimal number of threads
        ▪ Reducers
          • Removing wasteful intermediate results
          • Parallelizing processing with fold
      ◦ Type hints
  16. Type hints

    Type hints assist the compiler in avoiding reflection.

        (defn wc [lst]
          (reduce #(assoc %1 %2 (inc (get %1 %2 0))) {} lst))

        (time (wc ["a" "b" "c"]))
        ; "Elapsed time: 0.03914 msecs"
        ; {"a" 1, "b" 1, "c" 1}

        ;; PersistentArrayMap and PersistentVector are assumed to be imported from clojure.lang
        (defn wc2 ^PersistentArrayMap [^PersistentVector lst]
          (reduce #(^PersistentArrayMap assoc %1 %2 (^Integer inc (^Integer get %1 %2 0))) {} lst))

        (time (wc2 ["a" "b" "c"]))
        ; "Elapsed time: 0.038082 msecs"
        ; {"a" 1, "b" 1, "c" 1}
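
Type hints pay off mainly where the code calls Java methods, since that is where the compiler may otherwise fall back to reflection. The sketch below is a separate, minimal illustration of that case and is not taken from the slides; len-reflective and len-hinted are hypothetical names.

```clojure
;; Ask the compiler to print a warning whenever it has to fall back to reflection.
(set! *warn-on-reflection* true)

;; Without a hint the compiler cannot know the class of s, so .length is
;; resolved by reflection at run time (and a reflection warning is printed).
(defn len-reflective [s]
  (.length s))

;; With ^String the call compiles to a direct String.length() invocation.
(defn len-hinted [^String s]
  (.length s))

(def reflective-result (len-reflective "parallel"))
(def hinted-result (len-hinted "parallel"))
;; both => 8
```

Both functions return the same value; the difference is that only the unhinted one triggers a reflection warning at compile time and a reflective lookup on each call.
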