Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Performance optimization with Code as Data in C...

Performance optimization with Code as Data in Clojure

Slides from the talk presented at FunctionalConf 2016 on 2016-October-14

Shantanu Kumar

October 14, 2016
Tweet

More Decks by Shantanu Kumar

Other Decks in Technology

Transcript

  1. Who am I? • Principal Engineer at Concur • Author

    of “Clojure High Performance Programming” • Open Source contributor: https://github.com/ kumarshantanu • Using Clojure since early 2009 • @kumarshantanu on Twitter
  2. Vocabulary • Profiling • Latency: Median, Average, 99 Percentile •

    Throughput: Time window, Sustained • System Load
  3. Profiling • Benchmarking • Performance metrics collection • Baseline •

    Simulating load • Sampling • Tracing (Instrumentation)
  4. Code as Data • Opportunity to construct/manipulate code • Macros

    (compile time only) • Eval • Challenges • Debugging, Stack traces • Hard to Compose • JVM’s inlining budget
  5. Macros: String concatenation • Stringer `strcat` (alternative to clojure.core/str) •

    Stringer `strdel` (alternative to clojure.string/join) • In-place `java.lang.StringBuilder` manipulation • Caveats: Macro, Not fns
  6. Macros: Formatted string • Stringer `strfmt` (alternative to clojure.core/format) •

    In-place `java.lang.StringBuilder` manipulation • Faster than Java’s `String.format(..)`! • Caveats: Compile-time format string, Only common formatting specifiers supported
  7. Not macro: Textual table • Stringer `strtbl` (instead of clojure.pprint/print-table)

    • Uses arrays and eager `StringBuilder` manipulation • All optimizations need not leverage code as data!
  8. Macros: Logging • Logging library: clojure/tools.logging • Level macros: Disabled

    levels don’t eval • Cambium (extends clojure/tools.logging) • Same as c.t.l: Disabled levels don’t eval • How it composes: Macros emitting macros
  9. Faster web routing with Eval https://github.com/ kumarshantanu/calfpath Image source: https://www.pexels.com/photo/timelapse-

    photography-of-vehicle-on-concrete-road-near-in-high- rise-building-during-nighttime-169677/
  10. Eval: Web routing • Baseline: Iteration through route data •

    Optimization: Code generation with Eval • Optimization technique: Loop unrolling* • Constraint: Ahead-of-time code generation *https://en.wikipedia.org/wiki/Loop_unrolling
  11. Latency Breakup • Latency measurement using macro • Measure only

    selected points (low overhead) • Thread-local metrics context • Field use: Instrumentation • Field use: Trigger report on threshold exceed
  12. Latency Breakup Chart | :name |:cumulative|:cumul-%|:individual|:indiv-%| :thrown?| |------------------+-----------+--------+-----------+--------+-------------------| |web.post.order |

    357.82 ms|100.00 %| 103.037 ms| 28.80 %| | | biz.item.fetch | 204.407 ms| 57.13 %| 150.723 ms| 42.12 %| | | db.item.fetch | 53.684 ms| 15.00 %| 53.684 ms| 15.00 %| | | queue.post.order| 50.376 ms| 14.08 %| 50.376 ms| 14.08 %|java.lang.Exception|
  13. Benchmarks: Hardware • Intel i7-4770 @ 3.40 GHz processor •

    Quad-core, 64-bit physical machine • L1d cache: 32K, L1i cache: 32K • L2 cache: 256K, L3 cache: 8192K • RAM: 16GB
  14. Benchmarks: Software • OS: CentOS 7.2 (stock kernel 3.10.0-327.el7.x86_64) •

    Java: Oracle JDK 1.8.0_102_b14 • JVM args: “-server -Xms2048m -Xmx2048m” • Clojure 1.8.0 • Criterium 0.4.4, Citius 0.2.3 • Stringer 0.3.0, Calfpath 0.4.0, Espejito 0.1.1