Slide 1

Slide 1 text

Performance optimization with Code as Data in Clojure
Shantanu Kumar (@kumarshantanu)

Slide 2

Slide 2 text

Who am I?
• Principal Engineer at Concur
• Author of “Clojure High Performance Programming”
• Open Source contributor: https://github.com/kumarshantanu
• Using Clojure since early 2009
• @kumarshantanu on Twitter

Slide 3

Slide 3 text

Vocabulary
• Profiling
• Latency: Median, Average, 99th Percentile
• Throughput: Time window, Sustained
• System Load
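A minimal sketch of how the three latency figures above can be computed from raw samples. It assumes the nearest-rank definition of a percentile; the function names and the sample values are hypothetical, not from the talk.

```clojure
(defn percentile
  "Nearest-rank percentile of a non-empty collection of numbers.
  p is a percentage in the range (0, 100]."
  [samples p]
  (let [sorted (vec (sort samples))
        n      (count sorted)
        rank   (long (Math/ceil (* (/ p 100.0) n)))]
    (nth sorted (dec (max rank 1)))))

(defn latency-summary
  "Returns the median, average and 99th percentile of the samples."
  [samples]
  {:median  (percentile samples 50)
   :average (/ (reduce + samples) (double (count samples)))
   :p99     (percentile samples 99)})

;; Hypothetical request latencies in milliseconds, with one slow outlier.
(def latencies-ms [12 15 11 14 13 250 12 16 13 14])

(latency-summary latencies-ms)
;; => {:median 13, :average 37.0, :p99 250}
```

The single outlier drags the average far above the median, which is why the median and a high percentile are usually reported together.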

Slide 4

Slide 4 text

Profiling
• Benchmarking
• Performance metrics collection
• Baseline
• Simulating load
• Sampling
• Tracing (Instrumentation)

Slide 5

Slide 5 text

Microbenchmarking
https://github.com/hugoduncan/criterium

Slide 6

Slide 6 text

Comparative Benchmarking
https://github.com/kumarshantanu/citius
Threads: 1
Threads: 40

Slide 7

Slide 7 text

Code as Data
• Opportunity to construct/manipulate code
• Macros (compile time only)
• Eval
• Challenges
• Debugging, Stack traces
• Hard to Compose
• JVM’s inlining budget
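A small sketch of the two mechanisms named above, using only clojure.core. The `unless` macro is a hypothetical example, not something from the talk.

```clojure
;; Code is ordinary data: this is a list holding a symbol and two numbers.
(def form (list '+ 1 2))

;; Eval turns that data into running code at run time.
(eval form)
;; => 3

;; A macro receives its arguments as unevaluated data at compile time
;; and returns the code that replaces the call.
(defmacro unless
  "Hypothetical example macro: runs body only when test is falsy."
  [test & body]
  `(if ~test nil (do ~@body)))

(unless false :ran)
;; => :ran
```

Calling `(macroexpand-1 '(unless false :ran))` at a REPL shows the generated `if` form, which is a practical aid for the debugging challenge listed above.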

Slide 8

Slide 8 text

Faster string concatenation with Macros
https://github.com/kumarshantanu/stringer
Image source: https://www.pexels.com/photo/colorful-knitwear-wool-knitting-110876/

Slide 9

Slide 9 text

Macros: String concatenation
• Stringer `strcat` (alternative to clojure.core/str)
• Stringer `strdel` (alternative to clojure.string/join)
• In-place `java.lang.StringBuilder` manipulation
• Caveats: Macros, not fns
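A rough sketch of the idea behind macro-based concatenation: the macro sees each argument at compile time and emits one `StringBuilder` append per argument, so no argument sequence is built or walked at run time. `concat-str` is a hypothetical name; this is not Stringer's actual implementation.

```clojure
(defmacro concat-str
  "Hypothetical sketch: expands into direct StringBuilder appends.
  String literals are appended as-is; other expressions go through
  `str` so that nil becomes an empty string, as with clojure.core/str."
  [& args]
  (let [sb (gensym "sb")]
    `(let [~sb (StringBuilder.)]
       ~@(map (fn [arg]
                (if (string? arg)
                  `(.append ~sb ~arg)          ; literal known at compile time
                  `(.append ~sb (str ~arg))))  ; arbitrary expression
              args)
       (.toString ~sb))))

(concat-str "id=" 42 ", name=" nil "!")
;; => "id=42, name=!"
```

The caveat on the slide applies here too: being a macro, `concat-str` cannot be passed to `map` or `apply` the way `str` can.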

Slide 10

Slide 10 text

String concatenation Threads: 40

Slide 11

Slide 11 text

Delimited string concat Threads: 40

Slide 12

Slide 12 text

Macros: Formatted string
• Stringer `strfmt` (alternative to clojure.core/format)
• In-place `java.lang.StringBuilder` manipulation
• Faster than Java’s `String.format(..)`!
• Caveats: Compile-time format string, Only common formatting specifiers supported
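A sketch of why the format string must be known at compile time: the macro parses it once during macro expansion and emits plain appends, so nothing is parsed per call. `format-str` is a hypothetical name, it handles only `%s` and `%d` (no `%%`, widths or flags), and it is not Stringer's actual implementation.

```clojure
(require '[clojure.string :as string])

(defmacro format-str
  "Hypothetical sketch: splits a literal format string on %s / %d at
  macro-expansion time and emits one StringBuilder append per piece."
  [fmt & args]
  (when-not (string? fmt)
    (throw (IllegalArgumentException. "format string must be a literal")))
  (let [parts (string/split fmt #"%[sd]" -1) ; -1 keeps trailing empty parts
        sb    (gensym "sb")
        lit   (fn [part] `(.append ~sb ~part))
        value (fn [arg] `(.append ~sb (str ~arg)))]
    (when-not (= (count args) (dec (count parts)))
      (throw (IllegalArgumentException.
              "argument count must match the number of specifiers")))
    `(let [~sb (StringBuilder.)]
       ~(lit (first parts))
       ~@(mapcat (fn [arg part] [(value arg) (lit part)]) args (rest parts))
       (.toString ~sb))))

(format-str "order %d for %s" 42 "alice")
;; => "order 42 for alice"
```

One behavioural difference in this sketch: a nil argument renders as an empty string here, whereas `clojure.core/format` renders it as "null".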

Slide 13

Slide 13 text

Formatted string Threads: 40

Slide 14

Slide 14 text

Not macro: Textual table
• Stringer `strtbl` (instead of clojure.pprint/print-table)
• Uses arrays and eager `StringBuilder` manipulation
• All optimizations need not leverage code as data!

Slide 15

Slide 15 text

Textual table Threads: 40

Slide 16

Slide 16 text

High performance Logging with Macros
https://github.com/clojure/tools.logging
https://github.com/kumarshantanu/cambium
Image source: https://www.pexels.com/photo/brown-and-grey-fire-wood-131051/

Slide 17

Slide 17 text

Macros: Logging
• Logging library: clojure/tools.logging
• Level macros: Disabled levels don’t eval
• Cambium (extends clojure/tools.logging)
• Same as c.t.l: Disabled levels don’t eval
• How it composes: Macros emitting macros
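A sketch of both points above using only clojure.core: a logging macro whose message expression is not evaluated when the level is disabled, and a macro that emits the per-level macros. All names here (`enabled-levels`, `log`, `deflevel`, `expensive-message`) are hypothetical; this is not the tools.logging or Cambium API.

```clojure
;; Hypothetical run-time configuration: :debug is switched off.
(def enabled-levels (atom #{:info :warn :error}))

(defmacro log
  "The message expression sits inside `when`, so it is only evaluated
  if the level is enabled."
  [level message]
  `(when (contains? @enabled-levels ~level)
     (println (name ~level) ~message)))

(defmacro deflevel
  "Macro emitting a macro: defines a level macro such as `info`
  that expands into a call to `log`."
  [level-name]
  (let [level (keyword level-name)]
    `(defmacro ~level-name [message#]
       (list (quote log) ~level message#))))

(deflevel debug)
(deflevel info)

;; Counts how often the message is actually built.
(def message-builds (atom 0))
(defn expensive-message []
  (swap! message-builds inc)
  "order accepted")

(debug (expensive-message)) ; disabled level: expensive-message is never called
(info (expensive-message))  ; enabled level: prints "info order accepted"
```

After both calls `@message-builds` is 1. A plain function could not offer this, because its arguments are evaluated before the call.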

Slide 18

Slide 18 text

Faster web routing with Eval
https://github.com/kumarshantanu/calfpath
Image source: https://www.pexels.com/photo/timelapse-photography-of-vehicle-on-concrete-road-near-in-high-rise-building-during-nighttime-169677/

Slide 19

Slide 19 text

Eval: Web routing
• Baseline: Iteration through route data
• Optimization: Code generation with Eval
• Optimization technique: Loop unrolling*
• Constraint: Ahead-of-time code generation
*https://en.wikipedia.org/wiki/Loop_unrolling
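A sketch contrasting the baseline with the optimization, assuming exact-match URIs only. The route table, `dispatch-by-iteration` and `compile-routes` are hypothetical and much simpler than Calfpath's routing; the point is only that the loop over route data is replaced by an unrolled `cond` generated once with `eval`.

```clojure
;; Hypothetical route data: exact-match URI plus a handler function.
(def routes
  [{:uri "/users"  :handler (fn [_] {:status 200 :body "users"})}
   {:uri "/orders" :handler (fn [_] {:status 200 :body "orders"})}
   {:uri "/health" :handler (fn [_] {:status 200 :body "ok"})}])

(defn dispatch-by-iteration
  "Baseline: walks the route data on every request."
  [routes request]
  (if-let [route (first (filter #(= (:uri %) (:uri request)) routes))]
    ((:handler route) request)
    {:status 404}))

(defn compile-routes
  "Builds one unrolled `cond` form from the route data, evals it once,
  and returns the resulting dispatch function."
  [routes]
  (let [handlers 'handlers
        request  'request
        uri      'uri
        h-syms   (mapv #(symbol (str "h" %)) (range (count routes)))
        ;; Bind each handler to its own local: h0, h1, ...
        bindings (mapcat (fn [h i] [h `(nth ~handlers ~i)]) h-syms (range))
        ;; One test/call pair per route; this is the unrolled loop.
        clauses  (mapcat (fn [route h]
                           [`(= ~uri ~(:uri route)) `(~h ~request)])
                         routes h-syms)
        form     `(fn [~handlers]
                    (let [~@bindings]
                      (fn [~request]
                        (let [~uri (:uri ~request)]
                          (cond ~@clauses :else {:status 404})))))]
    ((eval form) (mapv :handler routes))))

;; Code generation happens once, ahead of time, for example at startup.
(def dispatch (compile-routes routes))

(dispatch {:uri "/orders"})
;; => {:status 200, :body "orders"}
```

The handlers are passed into the generated function rather than embedded in the form, since `eval` cannot reliably embed arbitrary function objects in code.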

Slide 20

Slide 20 text

Web routing with Eval Threads: 40

Slide 21

Slide 21 text

Latency Breakup charts with Macros
https://github.com/kumarshantanu/espejito
Image source: https://www.pexels.com/photo/nature-water-rocks-stream-128184/

Slide 22

Slide 22 text

Latency Breakup
• Latency measurement using macro
• Measure only selected points (low overhead)
• Thread-local metrics context
• Field use: Instrumentation
• Field use: Trigger report when a threshold is exceeded
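A sketch of the mechanism described above: a macro wraps selected code in timing calls, and results are collected in a dynamically bound (hence thread-local) context. `*latencies*`, `measure-latency` and `collect-latencies` are hypothetical names; this is not Espejito's API.

```clojure
;; Thread-local metrics context: nil means nothing is being collected.
(def ^:dynamic *latencies* nil)

(defmacro measure-latency
  "Times body and records the elapsed nanoseconds under label,
  but only when a metrics context is bound on this thread."
  [label & body]
  `(let [start# (System/nanoTime)]
     (try
       (do ~@body)
       (finally
         (when *latencies*
           (swap! *latencies* conj
                  {:name ~label :nanos (- (System/nanoTime) start#)}))))))

(defmacro collect-latencies
  "Binds a fresh metrics context around body and returns the body's
  result together with everything that was measured."
  [& body]
  `(binding [*latencies* (atom [])]
     (let [result# (do ~@body)]
       {:result result# :latencies @*latencies*})))

;; Only the selected points are measured; inner measurements finish first.
(def outcome
  (collect-latencies
    (measure-latency "web.post.order"
      (measure-latency "db.item.fetch" (Thread/sleep 5))
      :order-posted)))

(map :name (:latencies outcome))
;; => ("db.item.fetch" "web.post.order")
```

Because the recording sits in `finally`, a measurement is still captured when the measured code throws. A threshold trigger could compare the outermost `:nanos` against a limit before deciding to emit a report.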

Slide 23

Slide 23 text

Latency Breakup Chart
|             :name|:cumulative|:cumul-%|:individual|:indiv-%|           :thrown?|
|------------------+-----------+--------+-----------+--------+-------------------|
|web.post.order    |  357.82 ms|100.00 %| 103.037 ms| 28.80 %|                   |
| biz.item.fetch   | 204.407 ms| 57.13 %| 150.723 ms| 42.12 %|                   |
|  db.item.fetch   |  53.684 ms| 15.00 %|  53.684 ms| 15.00 %|                   |
| queue.post.order |  50.376 ms| 14.08 %|  50.376 ms| 14.08 %|java.lang.Exception|

Slide 24

Slide 24 text

Benchmarks: Hardware
• Intel i7-4770 @ 3.40 GHz processor
• Quad-core, 64-bit physical machine
• L1d cache: 32K, L1i cache: 32K
• L2 cache: 256K, L3 cache: 8192K
• RAM: 16GB

Slide 25

Slide 25 text

Benchmarks: Software
• OS: CentOS 7.2 (stock kernel 3.10.0-327.el7.x86_64)
• Java: Oracle JDK 1.8.0_102_b14
• JVM args: “-server -Xms2048m -Xmx2048m”
• Clojure 1.8.0
• Criterium 0.4.4, Citius 0.2.3
• Stringer 0.3.0, Calfpath 0.4.0, Espejito 0.1.1

Slide 26

Slide 26 text

Thank You! @kumarshantanu