Michiel Borkent
September 21, 2023

Babashka: a meta-circular Clojure interpreter for the command line @ Strange Loop 2023

Babashka is a Clojure interpreter for cross-platform scripting. It is available as a single binary that starts instantly. It makes Clojure a viable replacement for writing bash scripts. Babashka comes with a handful of libraries out of the box (JSON, command line parsing, etc.) and supports loading libraries from the Clojure ecosystem. The interpreter is written in a meta-circular approach, akin to Structure and Interpretation of Computer Programs. It is compiled to a single binary using GraalVM native-image, which is why it starts fast and uses less memory than the JVM, Clojure's original runtime. As such, babashka brings together many exciting technologies to broaden the reach of Clojure even more. This talk explores the high-level use cases of babashka, its impact on the Clojure community, its history, technical implementation details and the author's approach to open source development.

Transcript

  1. Clojure and me

    • University: functional programming Miranda, internship Common Lisp
    • CS lecturer: Java ... Clojure as a way to learn about the JVM
    • Clojure is Lisp on the JVM that emphasises FP
    • I tried other languages. Haskell, PureScript, Scala, ... I just love Clojure + can make a living from it ¯\_(ツ)_/¯
    • I want to use Clojure's standard library everywhere
  2. Is Clojure dead (yet)?

    • Making a modest income from Clojure OSS for the past two years
    • Clojurists Together: supporting the OSS Clojure ecosystem
    • Nubank (1200+ Clojure devs), largest online bank in South America
    • Red Planet, Griffin, Penpot, Roam Research, Logseq, Pitch, Appsflyer, Cisco, Walmart, Exoscale, Nextjournal, Metabase, ...
    • Once you know Clojure, you don't ask that many questions on StackOverflow
    • Never breaking changes: ideal for long lived stable projects (as is the JVM)
    • Babashka is a sub-community within the Clojure community with its own conference
    • Long live "dead" languages! 👻
  3.

    • Native Clojure scripting tool, single binary, no JVM
    • Can be used to replace “the grey areas” of bash
    • Easily installable via script, brew (macOS, linux), aur (linux), scoop (Windows)

        $ time bb -e '(+ 1 2 3)'
        6
        0.00s user 0.00s system 67% cpu 0.013 total
  4. Startup time: clj (JVM) vs bb (native-image)

    [Chart comparing startup time for: bb (native-image) + lots of deps loaded at startup; JVM Clojure + lots of deps loaded at startup; vanilla JVM Clojure, no deps]
  5. Included libs

    • clojure.{core, edn, java.shell, java.io, pprint, set, string, test, walk, zip}
    • clojure.tools.cli
    • clojure.core.async
    • clojure.data.csv
    • cheshire.core (JSON)
    • clojure.xml
    • cognitect.transit
    • clj-yaml
    • httpkit.server
    • babashka.http-client, babashka.process, babashka.fs
    • many others

    🔋 🔋
  6. Included libs

    • Babashka aims to hold the promise of Clojure: No API changes
    • A script that runs now, should run forever
    • Included libraries should hold to this promise as well (Clojure ecosystem culture)
    • No removing of libs (even if they are deprecated)
    • Imagine unix command line tools changing their output every year...
    • Spec-ulation talk by Rich Hickey

    🔋 🔋
  7.

  8.

  9.

    • CLI tools with instant startup! (< 10ms)
    • clj-kondo: linter and static analyzer for Clojure
    • jet: convert between JSON, EDN and Transit

    + native-image
  10.

  11. jet (jq-ish for Clojure)

    DSL -> scripting

    • Added a query DSL to jet, a GraalVM CLI:

        $ jet --query '(map :id)' <<< '[{:id 1} {:id 2}]'
        [1 2]

    • Extend this DSL to a significant subset of Clojure? Or just use clojure.core/eval ...?
  12. native-image + eval

    The Clojure compiler emits JVM bytecode, but GraalVM native-image transforms bytecode to native code "ahead of time", so bytecode that eval would generate at runtime cannot be loaded in the native binary
  13. Small Clojure Interpreter

        (def f (sci/eval-string "#(+ 1 2 %)"))
        (f 1) ;; => 4

    • Works on JVM / GraalVM native-image / JS
    • Sandboxing
    • Works in CLJS advanced compiled apps
    • Supports almost all of Clojure
    • https://github.com/babashka/sci
  14. SCI: adding libs

        (require '[cheshire.core :as json])

        (def sci-opts
          {:namespaces
           {'cheshire.core
            {'generate-string json/generate-string}}})

        (sci/eval-string "(require '[cheshire.core :as json]) (json/generate-string {:a 1})" sci-opts)
        ;; => "{\"a\":1}"
  15. SCI: meta-circular

    • Implements language features in terms of similar host language features
    • Self-interpreter: interpreted language = host-language
    • Function application can be implemented using apply
    • loop in SCI is implemented using loop in Clojure
    • Pure core functions like +, assoc, etc. can just be direct references to host functions
    • Side-effecting functions often re-implemented to make SCI sandboxed
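The bullets above can be made concrete with a toy evaluator. This is a minimal sketch for a tiny Clojure-like subset, written in Clojure itself. It is not SCI's code: `core-env` and `toy-eval` are hypothetical names invented for this illustration, and it does no sandboxing.

```clojure
;; Toy meta-circular evaluator: illustrative only, not SCI's implementation.
;; `core-env` and `toy-eval` are hypothetical names.

(def core-env
  ;; Pure core functions are direct references to host functions.
  {'+ + '- - '* * '< < 'inc inc 'assoc assoc})

(defn toy-eval
  "Evaluates `form`, resolving symbols in `env` (a map of symbol -> value)."
  [env form]
  (cond
    (symbol? form)
    (if (contains? env form)
      (get env form)
      (throw (ex-info (str "Unresolved symbol: " form) {:symbol form})))

    (seq? form)
    (let [[op & args] form]
      (case op
        ;; `if` in the toy language is implemented with the host's `if`.
        if (let [[test then else] args]
             (if (toy-eval env test)
               (toy-eval env then)
               (toy-eval env else)))
        ;; `fn` in the toy language becomes a host `fn` closing over `env`.
        fn (let [[params body] args]
             (fn [& vals]
               (toy-eval (merge env (zipmap params vals)) body)))
        ;; Function application is implemented with the host's `apply`.
        (apply (toy-eval env op)
               (map #(toy-eval env %) args))))

    ;; Numbers, strings, keywords, maps, etc. evaluate to themselves here.
    :else form))

(toy-eval core-env '(+ 1 2 3))                             ;; => 6
(toy-eval core-env '((fn [x] (if (< x 10) (inc x) x)) 1))  ;; => 2
```

Because the interpreted language and the host language coincide, most of the evaluator is a thin mapping onto host features, which is the sense in which the approach is meta-circular.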
  16. SCI's initial (naive) approach

    Parser (string -> s-expr): "(+ 1 x)" -> '(+ 1 x)
    Evaluator (s-expr -> result): '(+ 1 x) -> 3

    Walk s-expression:
    • Decide that + is a symbol that denotes function, resolve '+ to clojure.core/+
    • Decide that 1 is constant and resolve to itself
    • Decide that symbol x is a local, look up local binding in ctx
    • Finally, call + on 1 and resolved value for 'x, e.g. 2

    Lots of work happening over and over which could be done only once...

        (for [x (range 100000)]
          (+ 1 x))
  17.

  18. Do as much preparation as you can in analysis

    Do as little as possible here (avoid repeated analysis)
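One way to picture the split between analysis and evaluation is to compile each s-expression into a closure once and then call only that closure inside the loop. The following is a minimal sketch under that assumption, not SCI's actual analyzer; `core-fns` and `analyze` are hypothetical names.

```clojure
;; Sketch of "analyze once, evaluate many times"; not SCI's implementation.
;; `core-fns` and `analyze` are hypothetical names.

(def core-fns {'+ + '* * 'inc inc})

(defn analyze
  "Returns a function (fn [bindings] result) for `form`.
  `locals` is the set of symbols known to be local bindings."
  [locals form]
  (cond
    (symbol? form)
    (cond
      ;; Local: only the lookup itself is left for evaluation time.
      (contains? locals form)
      (fn [bindings] (get bindings form))

      ;; Core function: resolved right now, during analysis.
      (contains? core-fns form)
      (let [f (get core-fns form)]
        (fn [_bindings] f))

      :else
      (throw (ex-info (str "Unresolved symbol: " form) {:symbol form})))

    (seq? form)
    ;; Analyze the operator and every argument once, up front.
    (let [[f-node & arg-nodes] (mapv #(analyze locals %) form)]
      (fn [bindings]
        (apply (f-node bindings)
               (map #(% bindings) arg-nodes))))

    ;; Constants resolve to themselves.
    :else (fn [_bindings] form)))

;; Analysis happens once...
(def add-one (analyze #{'x} '(+ 1 x)))

;; ...and only the prepared closure runs on each iteration.
(reduce + (map #(add-one {'x %}) (range 5)))  ;; => 15
```

In this sketch, deciding that `+` is a function, that `1` is a constant and that `x` is a local happens a single time in `analyze`, instead of once per iteration as in the naive walk.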
  19. Performance

        (time
         (loop [val 0 cnt 10000000]
           (if (pos? cnt)
             (recur (inc val) (dec cnt))
             val)))
        "Elapsed time: 575.72325 msecs"
        10000000

    • Performance was not the initial goal, later concern
    • Example: Loop of 10M iterations, conditional, 3 function calls
    • This is fast in compiled code, but SCI is an interpreter

    (Measured on Macbook Air M1)
  20. SCI loop perf

    [Chart: 16176 ms vs 991 ms (15x faster than July 2020)]
    (Measured on linux amd64 machine)
  21. Accessing locals

    https://github.com/babashka/sci/issues/416

    • Hash-map is dynamically sized, implementation easy and thread-safe
    • But fixed sized arrays are 15x faster...
    • Need to pre-allocate + pre-calculate indexes
    • Copy over closed over values to each new closure (thread safety)
    • Set local value: (assoc bindings 'a 1) -> (aset bindings 0 1)
    • Read a: (get bindings 'a) -> (aget bindings 0)
    • Read c: (get bindings 'c) -> (aget bindings 3)

        (let [a 1, b 2]
          (fn [x]
            (let [c (+ a b x)]
              (inc c))))

    Closed over locals: a, b
    Function argument: x
    Let binding: c
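The two strategies for storing locals can be compared side by side. This is a hand-written sketch of the example function, not code generated by SCI; `local->idx`, `run-with-map` and `make-closure` are hypothetical names, and copying the array on every call is a simplification chosen for this sketch.

```clojure
;; Sketch of map-based vs array-based locals; not SCI's implementation.
;; All names here are hypothetical.

;; Indexes pre-calculated once, during analysis: a -> 0, b -> 1, x -> 2, c -> 3.
(def local->idx {'a 0, 'b 1, 'x 2, 'c 3})

;; Map-based: every read is a hash lookup keyed by symbol.
(defn run-with-map [x]
  (let [bindings (assoc {'a 1, 'b 2} 'x x)
        bindings (assoc bindings 'c (+ (get bindings 'a)
                                       (get bindings 'b)
                                       (get bindings 'x)))]
    (inc (get bindings 'c))))

;; Array-based: a fixed-size array is pre-allocated, reads and writes use indexes.
(defn make-closure
  "Returns a function of x that closes over a and b."
  [a b]
  (let [^objects closed (object-array (count local->idx))]
    (aset closed 0 a)
    (aset closed 1 b)
    (fn [x]
      ;; Work on a copy so concurrent calls never share mutable state.
      (let [^objects bindings (aclone closed)]
        (aset bindings 2 x)
        (aset bindings 3 (+ (aget bindings 0)
                            (aget bindings 1)
                            (aget bindings 2)))
        (inc (aget bindings 3))))))

(run-with-map 3)        ;; => 7
((make-closure 1 2) 3)  ;; => 7
```

The array version gives up the convenience of an immutable map, which is why the closed-over values have to be copied rather than shared.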
  22. Interop

        (Thread/sleep 100)

    • SCI classes must be configured:

        {:classes {'java.lang.Thread Thread}
         :imports {'Thread 'java.lang.Thread}}

    • Graal reflect-config.json keeps classes + metadata around for reflection
    • Methods are looked up via Java reflection API (cached) and invoked
    • Room for performance improvements by looking up method at analysis time
  23. Babashka pods

    • External binaries that expose namespaces + functions to bb via RPC
    • Moar GraalVM binaries, but can also be built in Golang, Haskell, etc. as long as they return the right data
    • Gives room for experimentation before including more libs
    • Access to other language ecosystems
  24. Conclusion

    Clojure might not be the best language for everything, like scripting
    Clojure is the best language for scripting ;)