Learning Rust by Crafting Interpreters
Mario Sangiorgio

In this presentation I'll cover what I learnt about Rust.
Lots of nice things, but by no means exhaustive. There is not enough time to talk much about interpreters. The code snippets are copied and pasted from my project.

Why Rust?
"Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety" (rust-lang.org)
and also
"Rust won first place for 'most loved programming language' in the Stack Overflow Developer Survey in 2016, 2017, and 2018." (Wikipedia.org)

My development environment
• Visual Studio Code
• Rust Language Server (rls)
• Debugger (based on lldb)
• cargo from the command line
• other non-Rust-specific tools
It all runs fine on my old laptop.

Crafting Interpreters
Ongoing work by Bob Nystrom. Describes Lox, a toy programming language, and implements
• a tree-walking interpreter in Java (complete)
• a bytecode VM in C (in progress)
Tries to keep things as simple as possible, but shows all the code. It also has nice illustrations.

Lox
    class Cake {
      taste() {
        var adjective = "delicious";
        print "The " + this.flavor + " cake is " + adjective + "!";
      }
    }

    var cake = Cake();
    cake.flavor = "German chocolate";
    cake.taste(); // Prints "The German chocolate cake is delicious!".
This is an actual example from the book.

A tree-walk interpreter
    fn run(&mut self, source: &str) -> Result<(), LoxError> {
        // Handcrafted recursive descent parser
        let statements = self.scan_and_parse(source).map_err(LoxError::Input)?;
        let lexical_scope = self
            .lexical_scope_resolver
            .resolve_all(&statements)
            .map_err(LoxError::LexicalScopesResolution)?;
        for statement in &statements {
            self.interpreter
                .execute(&lexical_scope, &statement)
                .map_err(LoxError::Runtime)?;
        }
        Ok(())
    }

Lexical scoping
    var a = "global";
    {
      fun showA() {
        print a;
      }

      showA();
      var a = "block";
      showA();
    }
Which a is captured depends only on the text of the program.

A bytecode virtual machine
    fn run(&mut self, source: &str) -> Result<(), RunError> {
        // Single-pass Pratt parser emitting LoxVm bytecode
        let chunk = compiler::compile(source).map_err(|_| RunError::Error)?;
        let stdout = stdout();
        let handle = stdout.lock();
        let mut writer = LineWriter::new(handle);
        bytecode::disassemble(&chunk, "Test", &mut writer).map_err(|_| RunError::Error)?;
        interpreter::trace(&chunk, &mut writer).map_err(|_| RunError::Error)?;
        Ok(())
    }
Still work in progress. Produces lots of debug output.

How to write Rust code
It is important to be somewhat idiomatic. Attempting to "write Java or C in Rust" often won't even compile. My process is more like:
1. read the full chapter, trying to understand the concepts
2. think about how to represent them in Rust
3. go back to the code snippets and 'translate' them

Expressive and for systems programming
    pub enum Expr {
        Literal(Literal),
        Identifier(Identifier),
        Unary(Box<UnaryExpr>),
        Binary(Box<BinaryExpr>),
        ...
    }
You have very expressive constructs (e.g. enum) but they don't hide what happens under the hood (e.g. memory).
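
The payoff of an enum like this shows up when matching on it. A minimal sketch, assuming a hypothetical and much smaller expression type than the project's (`MiniExpr`, its variants and `eval` are illustrative names, not taken from rulox):

```rust
/// Hypothetical, simplified expression type (not the project's `Expr`).
/// `Box` makes the heap allocation of the recursive cases explicit.
#[derive(Debug, Clone, PartialEq)]
pub enum MiniExpr {
    Number(f64),
    Negate(Box<MiniExpr>),
    Add(Box<MiniExpr>, Box<MiniExpr>),
}

/// Evaluates an expression by matching on its shape.
/// The compiler rejects the match if a variant is not handled.
pub fn eval(expr: &MiniExpr) -> f64 {
    match expr {
        MiniExpr::Number(n) => *n,
        MiniExpr::Negate(inner) => -eval(inner),
        MiniExpr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}
```

Adding a new variant to `MiniExpr` turns every non-exhaustive `match` into a compile error, which is roughly what a visitor gives you in Java, with far less ceremony.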

Values
Value types imply:
• They can live on the stack
• RAII (Resource acquisition is initialization) style of memory management
• Smart pointers are values too
• Mutability is defined for a particular value, not for a class member
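
RAII can be observed directly by implementing `Drop`. The sketch below is illustrative only: `Guard` and `raii_demo` are made-up names, and the shared log exists just to record the order in which values are cleaned up.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical resource that records when it is released.
pub struct Guard {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Guard {
    // Runs automatically when the owning variable goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

pub fn raii_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Guard { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Guard { name: "inner", log: Rc::clone(&log) };
            log.borrow_mut().push("inner scope ends".to_string());
        } // `_inner` is dropped here
        log.borrow_mut().push("outer scope ends".to_string());
    } // `_outer` is dropped here
    let result = log.borrow().clone();
    result
}
```

No explicit `free` or `close` call appears anywhere: the clean-up is tied to scope, which is the same mechanism smart pointers such as `Box` and `Rc` rely on.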

Ownership
Follow these rules:
1. Each value in Rust has a variable that’s called its owner.
2. There can only be one owner at a time.
3. When the owner goes out of scope, the value will be dropped.
Values can be borrowed:
• mutably, as long as the value is mutable and there is only one such borrow at a time;
• immutably, which can happen multiple times.
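
The rules above can be sketched with three small functions, one per way of passing a value. All names here (`consume`, `total`, `double_all`, `ownership_demo`) are hypothetical and exist only for illustration.

```rust
/// Takes ownership of the vector; the caller can no longer use it.
pub fn consume(values: Vec<i32>) -> usize {
    values.len()
} // `values` is dropped here

/// Borrows immutably: many such borrows may coexist.
pub fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Borrows mutably: only one such borrow may exist at a time.
pub fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

pub fn ownership_demo() -> (i32, usize) {
    let mut numbers = vec![1, 2, 3]; // `numbers` owns the vector
    double_all(&mut numbers); // exclusive borrow, ends when the call returns
    let sum = total(&numbers); // shared borrow
    let len = consume(numbers); // ownership moves into `consume`
    // total(&numbers); // would not compile: `numbers` was moved
    (sum, len)
}
```

Uncommenting the last call to `total` produces a "borrow of moved value" error at compile time rather than a use-after-free at run time.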

References
    impl LoxImplementation for LoxVm {
        fn run(&mut self, source: &str) -> Result<(), LoxError> {
            let chunk = compiler::compile(source).unwrap();
            let stdout = stdout();
            let handle = stdout.lock();
            let mut writer = LineWriter::new(handle);
            // `&chunk` is a shared borrow, `&mut writer` an exclusive one
            bytecode::disassemble(&chunk, "Test", &mut writer)?;
            interpreter::trace(&chunk, &mut writer)?;
            Ok(())
        }
    }

Lifetimes
    struct Vm<'a> {
        chunk: &'a Chunk,
        program_counter: usize,
        stack: Vec<Value>,
        objects: Vec<ObjectReference>,
    }

    impl<'a> Vm<'a> {
        fn new(chunk: &'a Chunk) -> Vm<'a> {
            Vm {
                chunk,
                program_counter: 0,
                stack: vec![],
                objects: vec![],
            }
        }
    }
References should live long enough. 'a denotes a lifetime.

Explicit lifetimes
In most cases you won't need to specify lifetimes:
• if a function gets a value, it owns it and can do whatever it wants;
• if a function gets a reference and only uses it.
They are required when we need to show that a reference doesn't outlive the value it refers to:
• reference stored in a data structure;
• reference returned from a method/function.
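
Both cases that need explicit lifetimes fit in one small type. This is a sketch under assumptions: `Cursor` and `next_word` are hypothetical names, loosely inspired by what a scanner does, and not taken from the project.

```rust
/// Stores a reference, so the struct needs a lifetime parameter.
pub struct Cursor<'a> {
    source: &'a str,
    position: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(source: &'a str) -> Cursor<'a> {
        Cursor { source, position: 0 }
    }

    /// Returns the next whitespace-separated word.
    /// The result is tied to `'a` (the source text), not to the
    /// borrow of the cursor, so it stays usable after later calls.
    pub fn next_word(&mut self) -> Option<&'a str> {
        let rest = self.source[self.position..].trim_start();
        if rest.is_empty() {
            return None;
        }
        let start = self.source.len() - rest.len();
        let end = rest
            .find(char::is_whitespace)
            .map_or(self.source.len(), |i| start + i);
        self.position = end;
        Some(&self.source[start..end])
    }
}
```

Had `next_word` been declared as returning `Option<&str>`, lifetime elision would tie the result to `&mut self`, and holding two words at once would be rejected by the borrow checker.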

Lexical lifetimes
Earlier we saw what lexical scope is. Rust uses it to determine lifetimes. It is safe and fast, but sometimes too restrictive.
    pub fn compile(text: &str) -> Result<Chunk, Vec<CompilationError>> {
        let mut chunk = Chunk::default(); // Value created. chunk owns it.
        let tokens = scan_into_iterator(text);
        {
            let parser = Parser::new(&mut chunk, tokens); // parser borrows the value.
            let _ = parser.parse()?;
        } // parser goes out of scope.
        Ok(chunk) // chunk can be moved
    }
Non-lexical lifetimes are almost ready (#![feature(nll)] on nightly).

Tip: using .clone() is okay
Ideally references should be preferred to copies, but it's better to have code that works than code that doesn't compile. It's always possible to go back and remove copies once we have learnt how to do it.

Value vs reference types
Lox has a few different types, which behave differently:
    #[derive(Debug, PartialEq, Clone)]
    pub enum Value {
        Nil,
        Boolean(bool),
        Number(f64),
        String(String),
        Callable(Callable),
        Instance(Instance),
    }
Rust prefers value types. It uses references only when explicitly told to.

Lox class instances
They are references to objects on the heap. We can have them, but we need to be explicit.
    #[derive(Debug, PartialEq)]
    pub struct _Instance {
        class: Rc<Class>,
        fields: FnvHashMap<Identifier, Value>,
    }

    #[derive(Debug, PartialEq, Clone)]
    pub struct Instance(Rc<RefCell<_Instance>>);

Compile-time vs run-time
    impl Instance {
        fn find_method(&self, property: Identifier) -> Option<Callable> {
            let class = &self.0.borrow().class;
            let method = class.methods.get(&property).cloned().or_else(|| {
                let superclass = class.superclass.clone();
                superclass.and_then(|s| s.methods.get(&property).cloned())
            });
            method.map(|m| m.bind(self))
        }
    }
RefCell::borrow() is checked at runtime. It might panic if something already has a mutable borrow.
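
The run-time check can be observed without panicking by using the `try_` variants that `RefCell` provides. A minimal sketch (`refcell_demo` is an illustrative name):

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns (conflict detected, borrow free again, final value).
pub fn refcell_demo() -> (bool, bool, i32) {
    let shared = Rc::new(RefCell::new(0));
    let alias = Rc::clone(&shared); // second handle to the same cell

    let conflict = {
        let _writer = shared.borrow_mut(); // exclusive borrow is active
        // `alias.borrow()` would panic here; `try_borrow` reports an error instead.
        alias.try_borrow().is_err()
    }; // `_writer` is released at the end of the block

    let free_again = alias.try_borrow().is_ok();
    *alias.borrow_mut() += 1; // mutation through a shared handle
    let value = *shared.borrow();
    (conflict, free_again, value)
}
```

The borrowing rules are the same as for `&` and `&mut`; `RefCell` just moves their enforcement from compile time to run time.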

To recap
                  Where does it live?   Can be shared?   Can be mutated?   Panic-free?
&T                Stack                 ✅               ❌                ✅
&mut T            Stack                 ❌               ✅                ✅
Box<T>            Heap                  ❌               ✅                ✅
Rc<T>             Heap                  ✅               ❌                ✅
Rc<Cell<T>>       Heap                  ✅               ✅                ✅
Rc<RefCell<T>>    Heap                  ✅               ✅                ❌
Each type gives different guarantees. Pay only for what you need!
There are other types that add the guarantees needed for multi-threading.

Error handling
Result<T, E> for errors that must be handled.
    fn interpret_next(&mut self) -> Result<bool, RuntimeError> {
        /* ... */
        match self.chunk.get(self.program_counter - 1) {
            OpCode::Negate => {
                match self.pop()? {
                    Value::Number(n) => self.stack.push(Value::Number(-n)),
                    _ => return Err(RuntimeError::TypeError),
                };
            }
            /* ... */
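
The `?` operator and `map_err` do most of the work in this style of error handling. Here is a self-contained sketch; `CalcError`, `parse_number` and `divide` are hypothetical and unrelated to the interpreter's own error types.

```rust
/// Hypothetical error type for the example.
#[derive(Debug, PartialEq)]
pub enum CalcError {
    Parse(String),
    DivisionByZero,
}

/// Converts the standard library's parse error into our own type.
fn parse_number(text: &str) -> Result<f64, CalcError> {
    text.trim()
        .parse::<f64>()
        .map_err(|_| CalcError::Parse(text.to_string()))
}

/// `?` returns early with the error, or unwraps the success value.
pub fn divide(lhs: &str, rhs: &str) -> Result<f64, CalcError> {
    let a = parse_number(lhs)?;
    let b = parse_number(rhs)?;
    if b == 0.0 {
        return Err(CalcError::DivisionByZero);
    }
    Ok(a / b)
}
```

The caller cannot ignore the failure cases by accident: the `f64` is only reachable by handling the `Result`, for instance with `match`, `?`, or `unwrap_or`.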

panic!
    impl Parser {
        /// Peeks the first *valid* token in the iterator
        fn peek(&mut self) -> Option<&TokenWithContext> {
            self.skip_to_valid();
            self.tokens.peek().map(|result| match result {
                Ok(ref token_with_context) => token_with_context,
                Err(_) => unreachable!("We already skipped errors"),
            })
        }
    }

Traits
    pub trait LoxImplementation {
        fn run(&mut self, source: &str) -> Result<(), RunError>;
    }

    impl LoxImplementation for TreeWalkInterpreter {
        fn run(&mut self, source: &str) -> Result<(), RunError> {
            /* ... */
        }
    }

    impl LoxImplementation for LoxVm {
        fn run(&mut self, source: &str) -> Result<(), RunError> {
            /* ... */
        }
    }

Traits - dynamic dispatch
    pub struct Runner {
        lox: Box<dyn LoxImplementation>,
    }

    impl Runner {
        fn run_prompt(&mut self) -> Result<(), RunError> {
            let mut source = String::new();
            loop {
                println!("> ");
                io::stdout().flush().unwrap();
                let _ = io::stdin().read_line(&mut source);
                self.lox.run(&source).unwrap_or_else(|error| {
                    println!("{:?}", error);
                    io::stdout().flush().unwrap()
                });
                source.clear();
            }
        }
    }

Traits - dynamic dispatch
• useful if we need to deal with different implementations at run-time
• it has a run-time cost
In my case I don't want to switch implementations at run-time. We have the option to do everything statically.

Traits - static dispatch
    pub struct Runner<I: LoxImplementation> {
        lox: I,
    }

    impl<I: LoxImplementation> Runner<I> {
        fn run_prompt(&mut self) -> Result<(), RunError> {
            let mut source = String::new();
            loop {
                println!("> ");
                io::stdout().flush().unwrap();
                let _ = io::stdin().read_line(&mut source);
                self.lox.run(&source).unwrap_or_else(|error| {
                    println!("{:?}", error);
                    io::stdout().flush().unwrap()
                });
                source.clear();
            }
        }
    }

impl Trait
What if I want to return something implementing a trait from a function? For this case only, there is some special syntax sugar. It's especially useful when returning closures or other types that can become very annoying to type.
impl Trait works only if you return a single concrete type, so it can statically dispatch calls to it.
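
The closure case mentioned above might look like the following sketch. `make_adder` and `evens_up_to` are made-up names; the point is that neither return type could be written out by hand, since closures have anonymous types.

```rust
/// Returns a closure; its concrete type is anonymous,
/// so `impl Fn` is the only way to name it without boxing.
pub fn make_adder(amount: i32) -> impl Fn(i32) -> i32 {
    move |x| x + amount
}

/// Returns an iterator adaptor chain whose full type
/// (a `Filter` wrapping a range and a closure) would be unwieldy to spell.
pub fn evens_up_to(limit: u32) -> impl Iterator<Item = u32> {
    (0..=limit).filter(|n| n % 2 == 0)
}
```

Both functions return exactly one concrete type on every path, which is what lets the compiler dispatch statically. Returning different closures from different branches would require `Box<dyn Fn(i32) -> i32>` instead.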

impl Trait example
    impl<'a> Iterator for TokensIterator<'a> {
        type Item = Result<TokenWithContext, ScannerError>;
        fn next(&mut self) -> Option<Self::Item> {
            self.scanner.scan_next()
        }
    }

    pub fn scan_into_iterator<'a>(
        source: &'a str,
    ) -> impl Iterator<Item = Result<TokenWithContext, ScannerError>> + 'a {
        TokensIterator {
            scanner: Scanner::initialize(source),
        }
    }

No boilerplate, lots of control
    #[derive(Debug, PartialEq, Clone)]
    pub enum Value {
        Nil,
        Boolean(bool),
        Number(f64),
        String(String),
        Callable(Callable),
        Instance(Instance),
    }
The compiler will happily write code for you, as long as you ask it to. This is based on macros so it's extensible and very flexible.
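
To make the "as long as you ask" part concrete, here is a small sketch with a hypothetical `DemoValue` type (a cut-down stand-in, not the project's `Value`). Three traits are derived, while `Display` is written by hand because the desired output differs from what a derive would produce.

```rust
use std::fmt;

/// Hypothetical cut-down value type for illustration.
#[derive(Debug, PartialEq, Clone)]
pub enum DemoValue {
    Nil,
    Boolean(bool),
    Number(f64),
}

/// Hand-written where we want control over the output.
impl fmt::Display for DemoValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoValue::Nil => write!(f, "nil"),
            DemoValue::Boolean(b) => write!(f, "{}", b),
            DemoValue::Number(n) => write!(f, "{}", n),
        }
    }
}
```

With this in place `{:?}` formatting, `==` and `.clone()` all work without any further code, and `{}` uses the custom implementation.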

unsafe
I've never had to use it.
It does not switch the borrow checker off, it adds new features (e.g. raw pointers). Normally it's not needed but it has specific use cases:
• interop, e.g. with C libraries;
• interacting with hardware;
• implementation of base libraries.
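
For completeness, this is roughly what the raw-pointer feature looks like. It is an illustration only: `swap_first_last` is a hypothetical function, and the safe `values.swap(0, last)` from the standard library does the same job and should be preferred in real code.

```rust
/// Swaps the first and last element through raw pointers.
/// Illustrative only: the safe `slice::swap` is the right tool here.
pub fn swap_first_last(values: &mut [i32]) {
    if values.len() < 2 {
        return;
    }
    let last = values.len() - 1;
    let ptr = values.as_mut_ptr(); // creating a raw pointer is safe
    // SAFETY: both pointers are within the bounds of `values`
    // and refer to distinct, properly aligned elements.
    unsafe {
        std::ptr::swap(ptr, ptr.add(last)); // dereferencing requires `unsafe`
    }
}
```

The function signature still takes `&mut [i32]`, so callers remain subject to the usual borrowing rules. Only the operations inside the `unsafe` block are exempt from compiler verification.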

Tooling
• cargo build (don't forget about --release)
• cargo check only runs the type checker
• cargo test runs all the tests
• cargo bench runs benchmarks
• cargo fmt auto-formats the source code
• cargo clippy lints and style checks
• cargo fix fixes the code affected by version upgrades

Cargo.toml
    [package]
    name = "rulox"
    version = "0.1.0"
    authors = ["Mario Sangiorgio <[email protected]>"]

    [dependencies]
    itertools = "0.5.9"
    fnv = "1.0.6"
    num-traits = "0.2"
    num-derive = "0.2"

    [dev-dependencies]
    proptest = "0.7.0"

    [profile.release]
    debug = true
Everything else by convention.

Not only crates.io
It's possible to depend on code not published on crates.io
    [dependencies]
    rand = { git = "https://github.com/rust-lang-nursery/rand" }
    bar = { git = "https://github.com/foo/bar", branch = "baz" }
    hello_utils = { path = "hello_utils" }

Travis CI on GitHub
    language: rust
    sudo: required
    rust:
      - stable
      - beta
      - nightly
    matrix:
      allow_failures:
        - rust: nightly
    # Dependencies of kcov, used by coverage
    addons:
      apt:
        packages:
          - libcurl4-openssl-dev
          - libelf-dev
          - libdw-dev
          - binutils-dev
          - cmake
        sources:
          - kalakris-cmake
    cache: cargo
    before_script: ((cargo install cargo-travis && cargo install rustfmt) || true)
    script:
      - |
        cargo build &&
        cargo test
    after_success:
      - cargo coveralls

Tests
Just add a module in the same file with the source code.
    #[cfg(test)] // Compiled only in test mode
    mod tests {
        use frontend::scanner::*;

        #[test]
        fn single_token() {
            let (tokens, _) = scan(&"+");
            assert_eq!(tokens[0].token, Token::Plus);
        }
    }

Doctest
Or embed them in the documentation
    impl Chunk {
        /// Adds a new instruction to the chunk
        /// # Example
        /// ```
        /// use rulox::vm::bytecode::*;
        /// let mut chunk = Chunk::default();
        /// let line = 1;
        /// chunk.add_instruction(OpCode::Return, line);
        /// ```
        pub fn add_instruction(&mut self, instruction: OpCode, line: Line) -> () {
            self.instructions.push(instruction);
            self.lines.push(line);
        }
    }

Was it a good choice?
• Compared to Java:
  • enum saved me from lots of casting
  • pattern matching is much better than using a visitor
  • the error handling story is much better
• Compared to C it feels like writing code in easy mode:
  • the book implements its own data structures and enum
  • I am confident I didn't mess up with pointers

What if you used X instead?
• modern C++: it has a lot in common with Rust, but it doesn't enforce good practices. Error messages are way worse.
• F#: Rust code can feel very functional. I would have written similar code but I wouldn't have cared as much about performance.

Has using Rust slowed me down?
Whenever I thought in another language and wrote Rust code, yes. The more I learnt, the faster I became.
Release builds are a bit slow, but they're rarely needed.
Error messages try to be as helpful as possible and they are really good.
I don't miss having a GC.
Overall, a clear ownership model leads to a better design ⭐

Would I use it in production?
Yes. Rust feels like a great tool to build robust, reliable and fast software, but:
• Access to crates.io is pretty much required (rust-lang issue #44931)
• I wouldn't rewrite everything in Rust just for the sake of it
• It is evolving quickly, and the newest features are in the unstable channel.

Thanks!
• Rust book https://doc.rust-lang.org/book/
• Crafting Interpreters http://www.craftinginterpreters.com/
• My code https://github.com/mariosangiorgio/rulox

A story about performance

Profiling
Being based on LLVM, all the normal tools* work (once you add the debug symbols to the release build).
*I tried:
• Xcode Instruments
• FlameGraph on dtrace output

Tree-walk interpreter benchmark
    fun fib(n) {
      if (n < 2) return n;
      return fib(n - 1) + fib(n - 2);
    }

    var before = clock();
    print fib(20);
    var after = clock();
    print after - before;
    // Repeat for different input values
My version: faster for smaller values. jlox: faster for bigger values.

Some profiling
My implementation:
• profiled CPU usage: nothing interesting;
• profiled memory usage: almost flat (~1.3MB)
jlox:
• the bigger the values, the more memory it consumed

It's always the GC
The benchmark program keeps calling functions, which causes allocation and deallocation of environments (a generalization of stack frames to support closures).
My interpreter was doing both actual work and memory clean-up.
jlox was only allocating, waiting for the Java GC to eventually kick in.
With a bit of -Xmx tuning I got the results I was looking for.

Property based testing

Property based testing
    proptest! {
        #[test]
        fn interpret_doesnt_crash(ref chunk in arb_chunk(10, 20)) {
            let _ = interpret(chunk);
        }
    }

    proptest! {
        #[test]
        fn trace_doesnt_crash(ref chunk in arb_chunk(10, 20)) {
            let mut writer = LineWriter::new(sink());
            let _ = trace(chunk, &mut writer);
        }
    }
Note: this is not very meaningful and it will break once I add loops to the VM.

Arb values
    fn arb_instruction(max_offset: usize) -> BoxedStrategy<OpCode> {
        prop_oneof![
            (0..max_offset).prop_map(OpCode::Constant),
            Just(OpCode::Return),
            Just(OpCode::Negate),
            prop_oneof![
                Just(BinaryOp::Add),
                Just(BinaryOp::Subtract),
                Just(BinaryOp::Multiply),
                Just(BinaryOp::Divide),
            ].prop_map(OpCode::Binary),
        ].boxed()
    }
A bit verbose, but not too bad.

A better plan
1. Generate random ASTs
2. Pretty print them (code is already there)
3. Interpret them with the two implementations (warning: what if they get stuck in infinite loops?)
4. Compare the results
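
The plan can be sketched on a toy scale with the standard library alone. Everything below is an assumption made for illustration: the three-variant `Ast`, the `Lcg` random generator (standing in for a proptest strategy), and the two evaluators (standing in for the tree-walk interpreter and the VM). Unlike the plan above, both evaluators consume the AST directly, and the pretty-printer is used only to report a failing case. The toy language has no loops, so the infinite-loop problem does not arise here.

```rust
/// Hypothetical miniature language: literals, negation, addition.
#[derive(Debug, Clone)]
pub enum Ast {
    Num(i64),
    Neg(Box<Ast>),
    Add(Box<Ast>, Box<Ast>),
}

/// Tiny linear congruential generator, a stand-in for a real strategy.
pub struct Lcg(pub u64);

impl Lcg {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Step 1: generate a random AST of bounded depth.
pub fn random_ast(rng: &mut Lcg, depth: u32) -> Ast {
    let choice = if depth == 0 { 0 } else { rng.next_u64() % 3 };
    match choice {
        0 => Ast::Num((rng.next_u64() % 10) as i64),
        1 => Ast::Neg(Box::new(random_ast(rng, depth - 1))),
        _ => {
            let lhs = random_ast(rng, depth - 1);
            let rhs = random_ast(rng, depth - 1);
            Ast::Add(Box::new(lhs), Box::new(rhs))
        }
    }
}

/// Step 2: pretty print.
pub fn pretty(ast: &Ast) -> String {
    match ast {
        Ast::Num(n) => n.to_string(),
        Ast::Neg(e) => format!("(-{})", pretty(e)),
        Ast::Add(l, r) => format!("({} + {})", pretty(l), pretty(r)),
    }
}

/// Implementation A: direct tree walk.
pub fn eval_tree(ast: &Ast) -> i64 {
    match ast {
        Ast::Num(n) => *n,
        Ast::Neg(e) => -eval_tree(e),
        Ast::Add(l, r) => eval_tree(l) + eval_tree(r),
    }
}

enum Op {
    Push(i64),
    Neg,
    Add,
}

fn compile(ast: &Ast, ops: &mut Vec<Op>) {
    match ast {
        Ast::Num(n) => ops.push(Op::Push(*n)),
        Ast::Neg(e) => {
            compile(e, ops);
            ops.push(Op::Neg);
        }
        Ast::Add(l, r) => {
            compile(l, ops);
            compile(r, ops);
            ops.push(Op::Add);
        }
    }
}

/// Implementation B: compile to stack operations, then run them.
pub fn eval_stack(ast: &Ast) -> i64 {
    let mut ops = Vec::new();
    compile(ast, &mut ops);
    let mut stack: Vec<i64> = Vec::new();
    for op in ops {
        match op {
            Op::Push(n) => stack.push(n),
            Op::Neg => {
                let v = stack.pop().expect("operand");
                stack.push(-v);
            }
            Op::Add => {
                let r = stack.pop().expect("rhs");
                let l = stack.pop().expect("lhs");
                stack.push(l + r);
            }
        }
    }
    stack.pop().expect("result")
}

/// Steps 3 and 4: run both and report the first program they disagree on.
pub fn first_disagreement(seed: u64, cases: usize) -> Option<String> {
    let mut rng = Lcg(seed);
    for _ in 0..cases {
        let ast = random_ast(&mut rng, 4);
        if eval_tree(&ast) != eval_stack(&ast) {
            return Some(pretty(&ast));
        }
    }
    None
}
```

A real version would need shrinking to find minimal failing programs, which is one reason to build it on proptest rather than a hand-rolled generator, plus a step limit or timeout once the language has loops.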
