Slide 1

Slide 1 text

Learning Rust by Crafting Interpreters
Mario Sangiorgio

Slide 2

Slide 2 text

This presentation
I'll cover what I learnt about Rust.
Lots of nice things, but by no means exhaustive.
There is not enough time to talk much about interpreters.
Code snippets are copied and pasted from my project.

Slide 3

Slide 3 text

Why Rust?
"Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety" (rust-lang.org)
and also
"Rust won first place for 'most loved programming language' in the Stack Overflow Developer Survey in 2016, 2017, and 2018." (Wikipedia.org)

Slide 4

Slide 4 text

My development environment
• Visual Studio Code
• Rust Language Server (rls)
• Debugger (based on lldb)
• cargo from the command line
• other non-Rust-specific tools
It all runs fine on my old laptop.

Slide 5

Slide 5 text

Crafting Interpreters
Ongoing work by Bob Nystrom.
Describes Lox, a toy programming language, and implements
• a tree-walking interpreter in Java (complete)
• a bytecode VM in C (in progress)
Tries to keep things as simple as possible, but shows all the code.
It also has nice illustrations.

Slide 6

Slide 6 text

Lox
    class Cake {
      taste() {
        var adjective = "delicious";
        print "The " + this.flavor + " cake is " + adjective + "!";
      }
    }
    var cake = Cake();
    cake.flavor = "German chocolate";
    cake.taste(); // Prints "The German chocolate cake is delicious!".
This is an actual example from the book.

Slide 7

Slide 7 text

A tree-walk interpreter
    fn run(&mut self, source: &str) -> Result<(), LoxError> {
        // Handcrafted recursive descent parser
        let statements = self.scan_and_parse(source).map_err(LoxError::Input)?;
        let lexical_scope = self.lexical_scope_resolver
            .resolve_all(&statements)
            .map_err(LoxError::LexicalScopesResolution)?;
        for statement in &statements {
            self.interpreter
                .execute(&lexical_scope, &statement)
                .map_err(LoxError::Runtime)?;
        }
        Ok(())
    }

Slide 8

Slide 8 text

Lexical scoping
    var a = "global";
    {
      fun showA() {
        print a;
      }
      showA();
      var a = "block";
      showA();
    }
Which a is captured depends only on the text of the program.

Slide 9

Slide 9 text

A bytecode virtual machine
    fn run(&mut self, source: &str) -> Result<(), RunError> {
        // Single-pass Pratt parser emitting LoxVm bytecode
        let chunk = compiler::compile(source).map_err(|_| RunError::Error)?;
        let stdout = stdout();
        let handle = stdout.lock();
        let mut writer = LineWriter::new(handle);
        bytecode::disassemble(&chunk, "Test", &mut writer).map_err(|_| RunError::Error)?;
        interpreter::trace(&chunk, &mut writer).map_err(|_| RunError::Error)?;
        Ok(())
    }
Still work in progress. Produces lots of debug output.

Slide 10

Slide 10 text

How to write Rust code
It is important to be somewhat idiomatic. Attempting to "write Java or C in Rust" often won't even compile.
My process is more like:
1. read the full chapter, trying to understand the concepts
2. think about how to represent them in Rust
3. go back to the code snippets and 'translate' them

Slide 11

Slide 11 text

Expressive and for systems programming
    pub enum Expr {
        Literal(Literal),
        Identifier(Identifier),
        Unary(Box),
        Binary(Box),
        ...
    }
You have very expressive constructs (e.g. enum) but they don't hide what happens under the hood (e.g. memory).

Slide 12

Slide 12 text

Values
Value types imply:
• They can live on the stack
• RAII (Resource acquisition is initialization) style of memory management
• Smart pointers are values too
• Mutability is defined for a particular value, not for a class member
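A minimal sketch of the RAII point above. `Noisy` is a hypothetical type invented for this illustration (it is not part of the project); it records the moment it is dropped so the order of clean-up becomes visible.

```rust
use std::cell::RefCell;

/// Hypothetical type for illustration: logs its own destruction.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the sequence of events recorded while values go out of scope.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // A plain value living on the stack.
        let _outer = Noisy { name: "outer", log: &log };
        {
            // A smart pointer is a value too: the Box lives on the stack,
            // its contents on the heap.
            let _inner = Box::new(Noisy { name: "inner", log: &log });
        } // `_inner` goes out of scope: the heap allocation is freed here
        log.borrow_mut().push("between scopes".to_string());
    } // `_outer` is dropped here
    log.into_inner()
}
```

Calling `drop_order()` yields the inner drop first, then the marker, then the outer drop: clean-up follows scopes, with no garbage collector involved.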

Slide 13

Slide 13 text

Ownership
Follow these rules:
1. Each value in Rust has a variable that’s called its owner.
2. There can only be one owner at a time.
3. When the owner goes out of scope, the value will be dropped.
Values can be borrowed:
• mutably: only one mutable borrow at a time, and only if the value is mutable;
• immutably: any number of times.
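The rules above can be sketched with three small functions. All names here (`consume`, `measure`, `shout`, `ownership_demo`) are made up for the example and do not come from the project.

```rust
/// Takes ownership: the caller can no longer use the value afterwards.
fn consume(s: String) -> usize {
    s.len()
} // `s` is dropped here

/// Immutable borrow: many of these may coexist.
fn measure(s: &str) -> usize {
    s.len()
}

/// Mutable borrow: only one may exist at a time.
fn shout(s: &mut String) {
    s.push('!');
}

fn ownership_demo() -> (usize, usize, String) {
    let mut greeting = String::from("hello");

    let a = &greeting;
    let b = &greeting; // two shared borrows at once are fine
    let shared = measure(a) + measure(b);

    shout(&mut greeting); // allowed: `a` and `b` are no longer used
    let after = greeting.clone();

    let consumed = consume(greeting); // ownership moves into `consume`
    // measure(&greeting); // would not compile: `greeting` was moved
    (shared, consumed, after)
}
```

Uncommenting the last `measure` call makes the compiler reject the program, which is the point: the mistake is caught before anything runs.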

Slide 14

Slide 14 text

References
    impl LoxImplementation for LoxVm {
        fn run(&mut self, source: &str) -> Result<(), LoxError> {
            let chunk = compiler::compile(source).unwrap();
            let stdout = stdout();
            let handle = stdout.lock();
            let mut writer = LineWriter::new(handle);
            bytecode::disassemble(&chunk, "Test", &mut writer)?; // shared borrow of chunk, mutable borrow of writer
            interpreter::trace(&chunk, &mut writer)?;
            Ok(())
        }
    }

Slide 15

Slide 15 text

Lifetimes
    struct Vm<'a> {
        chunk: &'a Chunk,
        program_counter: usize,
        stack: Vec,
        objects: Vec,
    }
    impl<'a> Vm<'a> {
        fn new(chunk: &'a Chunk) -> Vm<'a> {
            Vm {
                chunk,
                program_counter: 0,
                stack: vec![],
                objects: vec![],
            }
        }
    }
References must live long enough: a Vm cannot outlive the Chunk it points to. 'a denotes a lifetime.

Slide 16

Slide 16 text

Explicit lifetimes
In most cases you won't need to specify lifetimes:
• if a function gets a value, it owns it and can do whatever it wants;
• if a function gets a reference and only uses it, the compiler infers the lifetime.
They are required when we need to show that a reference doesn't outlive the value it refers to:
• reference stored in a data structure;
• reference returned from a method/function.
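A short sketch of the three situations listed above. `count_lines`, `Cursor` and `longer` are hypothetical names chosen for the example, not code from the project.

```rust
/// Only uses the reference: no lifetime annotation needed.
fn count_lines(source: &str) -> usize {
    source.lines().count()
}

/// Stores a reference: the struct must name the lifetime it depends on.
struct Cursor<'a> {
    source: &'a str,
    position: usize,
}

impl<'a> Cursor<'a> {
    /// Returns a slice that borrows from `source`, not from the cursor itself.
    fn rest(&self) -> &'a str {
        &self.source[self.position..]
    }
}

/// Returns one of two references: the compiler cannot guess which one,
/// so both inputs and the output share the explicit lifetime 'a.
fn longer<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() { left } else { right }
}
```

For example, `Cursor { source: "var a;", position: 4 }.rest()` returns the slice `"a;"`, which stays valid for as long as the source string does, even after the cursor itself is gone.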

Slide 17

Slide 17 text

Lexical lifetimes
Earlier we saw what lexical scope is. Rust uses it to determine lifetimes. It is safe and fast, but sometimes too restrictive.
    pub fn compile(text: &str) -> Result> {
        let mut chunk = Chunk::default(); // Value created. chunk owns it.
        let tokens = scan_into_iterator(text);
        {
            let parser = Parser::new(&mut chunk, tokens); // parser borrows the value.
            let _ = parser.parse()?;
        } // parser goes out of scope.
        Ok(chunk) // chunk can be moved
    }
Non-lexical lifetimes are almost ready (#![feature(nll)] on nightly).

Slide 18

Slide 18 text

Tip: using .clone() is okay
Ideally references should be preferred to copies, but it's better to have code that works than code that doesn't compile.
It's always possible to go back and remove the copies once we have learnt how to do it.

Slide 19

Slide 19 text

Value vs reference types
Lox has a few different types, which behave differently:
    #[derive(Debug, PartialEq, Clone)]
    pub enum Value {
        Nil,
        Boolean(bool),
        Number(f64),
        String(String),
        Callable(Callable),
        Instance(Instance),
    }
Rust prefers value types. It uses references only when explicitly told to.

Slide 20

Slide 20 text

Lox class instances
They are references to objects on the heap. We can have them, but we need to be explicit.
    #[derive(Debug, PartialEq)]
    pub struct _Instance {
        class: Rc,
        fields: FnvHashMap,
    }
    #[derive(Debug, PartialEq, Clone)]
    pub struct Instance(Rc<RefCell<_Instance>>);

Slide 21

Slide 21 text

Compile-time vs run-time
    impl Instance {
        fn find_method(&self, property: Identifier) -> Option {
            let class = &self.0.borrow().class;
            let method = class.methods.get(&property).cloned().or_else(|| {
                let superclass = class.superclass.clone();
                superclass.and_then(|s| s.methods.get(&property).cloned())
            });
            method.map(|m| m.bind(self))
        }
    }
RefCell::borrow() is checked at run-time. It panics if the value is already mutably borrowed.
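A small sketch of that run-time check, using only the standard library. `refcell_demo` is a name invented for the example. It uses `try_borrow`, the non-panicking sibling of `borrow`, so the conflict can be observed instead of crashing.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns (borrow succeeded while a writer was alive,
///          borrow succeeded afterwards,
///          final value).
fn refcell_demo() -> (bool, bool, i32) {
    let shared = Rc::new(RefCell::new(1));
    let alias = Rc::clone(&shared); // second handle to the same cell

    let during_write = {
        let mut writer = shared.borrow_mut(); // mutable borrow active in this block
        *writer += 1;
        // `alias.borrow()` would panic here; `try_borrow` reports the conflict instead.
        alias.try_borrow().is_ok()
    };

    // The writer has been dropped, so shared borrows succeed again.
    let after_write = alias.try_borrow().is_ok();

    let value = *shared.borrow();
    (during_write, after_write, value)
}
```

The compiler accepts this code without complaint; whether a borrow is legal is decided only when the program runs.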

Slide 22

Slide 22 text

To recap
Type             Where does it live?   Can be shared?   Can be mutated?   Panic-free?
&T               Stack                 ✅               ❌                ✅
&mut T           Stack                 ❌               ✅                ✅
Box<T>           Heap                  ❌               ✅                ✅
Rc<T>            Heap                  ✅               ❌                ✅
Rc<Cell<T>>      Heap                  ✅               ✅                ✅
Rc<RefCell<T>>   Heap                  ✅               ✅                ❌
Each type gives different guarantees. Pay only for what you need!
Other types add the guarantees needed for multi-threading.

Slide 23

Slide 23 text

Error handling
Result for errors that must be handled.
    fn interpret_next(&mut self) -> Result {
        /* ... */
        match self.chunk.get(self.program_counter - 1) {
            OpCode::Negate => {
                match self.pop()? {
                    Value::Number(n) => self.stack.push(Value::Number(-n)),
                    _ => return Err(RuntimeError::TypeError),
                };
            }
            /* ... */
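A self-contained sketch of the same pattern, reduced to a single operation. The trimmed-down `Value` enum, the free functions `pop` and `negate`, and the `StackUnderflow` variant are assumptions made for this example; only `RuntimeError::TypeError` appears in the snippet above.

```rust
#[derive(Debug, PartialEq)]
enum RuntimeError {
    StackUnderflow, // hypothetical variant, added for the example
    TypeError,
}

#[derive(Debug, PartialEq, Clone)]
enum Value {
    Number(f64),
    Boolean(bool),
}

/// Popping from an empty stack is an error the caller must handle.
fn pop(stack: &mut Vec<Value>) -> Result<Value, RuntimeError> {
    stack.pop().ok_or(RuntimeError::StackUnderflow)
}

/// Negates the value on top of the stack.
fn negate(stack: &mut Vec<Value>) -> Result<(), RuntimeError> {
    // `?` returns early with the error if `pop` failed.
    match pop(stack)? {
        Value::Number(n) => stack.push(Value::Number(-n)),
        _ => return Err(RuntimeError::TypeError),
    }
    Ok(())
}
```

Ignoring the `Result` returned by `negate` triggers a compiler warning, so a failure cannot be silently dropped by accident.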

Slide 24

Slide 24 text

panic!
    impl Parser {
        /// Peeks the first *valid* token in the iterator
        fn peek(&mut self) -> Option<&TokenWithContext> {
            self.skip_to_valid();
            self.tokens.peek().map(|result| match result {
                Ok(ref token_with_context) => token_with_context,
                Err(_) => unreachable!("We already skipped errors"),
            })
        }
    }

Slide 25

Slide 25 text

Traits
    pub trait LoxImplementation {
        fn run(&mut self, source: &str) -> Result<(), RunError>;
    }
    impl LoxImplementation for TreeWalkInterpreter {
        fn run(&mut self, source: &str) -> Result<(), RunError> {
            /* ... */
        }
    }
    impl LoxImplementation for LoxVm {
        fn run(&mut self, source: &str) -> Result<(), RunError> {
            /* ... */
        }
    }

Slide 26

Slide 26 text

Traits - dynamic dispatch
    pub struct Runner {
        lox: Box<dyn LoxImplementation>,
    }
    impl Runner {
        fn run_prompt(&mut self) -> Result<(), RunError> {
            let mut source = String::new();
            loop {
                println!("> ");
                io::stdout().flush().unwrap();
                let _ = io::stdin().read_line(&mut source);
                self.lox.run(&source).unwrap_or_else(|error| {
                    println!("{:?}", error);
                    io::stdout().flush().unwrap()
                });
                source.clear();
            }
        }
    }

Slide 27

Slide 27 text

Traits - dynamic dispatch
• useful if we need to deal with different implementations at run-time
• it has a run-time cost
In my case I don't want to switch implementations at run-time. We have the option to do everything statically.
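To make the trade-off concrete, here is a minimal sketch of both styles side by side. The `Greeter` trait and its two implementations are invented for the example and stand in for the two Lox implementations.

```rust
trait Greeter {
    fn greet(&self) -> String;
}

struct English;
struct Italian;

impl Greeter for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

impl Greeter for Italian {
    fn greet(&self) -> String {
        "ciao".to_string()
    }
}

/// Dynamic dispatch: the method is looked up through a vtable at run-time.
fn greet_dyn(greeter: &dyn Greeter) -> String {
    greeter.greet()
}

/// Static dispatch: the compiler generates one copy per concrete type.
fn greet_static<G: Greeter>(greeter: &G) -> String {
    greeter.greet()
}

/// Choosing an implementation at run-time requires a trait object.
fn pick(italian: bool) -> Box<dyn Greeter> {
    if italian {
        Box::new(Italian)
    } else {
        Box::new(English)
    }
}
```

`pick` is the case where dynamic dispatch earns its cost: the concrete type is only known once the program runs. When the type is fixed at compile time, `greet_static` gives the same behaviour with a direct call.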

Slide 28

Slide 28 text

Traits - static dispatch
    pub struct Runner<I: LoxImplementation> {
        lox: I,
    }
    impl<I: LoxImplementation> Runner<I> {
        fn run_prompt(&mut self) -> Result<(), RunError> {
            let mut source = String::new();
            loop {
                println!("> ");
                io::stdout().flush().unwrap();
                let _ = io::stdin().read_line(&mut source);
                self.lox.run(&source).unwrap_or_else(|error| {
                    println!("{:?}", error);
                    io::stdout().flush().unwrap()
                });
                source.clear();
            }
        }
    }

Slide 29

Slide 29 text

impl Trait What if I want to return something implementing a trait from a function? For this case only, there is some special syntax sugar. It's especially useful when returning closures or other types that can become very annoying to type. impl Trait works only if you return a single concrete type so it can statically dispatch calls to it.
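A brief sketch of the two cases mentioned above, returning a closure and returning a type that is tedious to spell out. Both function names are made up for the example.

```rust
/// Closures have anonymous types; `impl Fn` lets us return one without boxing it.
fn make_adder(amount: i32) -> impl Fn(i32) -> i32 {
    move |x| x + amount
}

/// Iterator adaptor chains have long concrete types; `impl Iterator` hides them.
fn even_squares(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0).map(|n| n * n)
}
```

Each function returns exactly one concrete type, so calls on the result are still statically dispatched. A function that returned a closure on one branch and a different closure on another would not compile with `impl Fn`.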

Slide 30

Slide 30 text

impl Trait example
    impl<'a> Iterator for TokensIterator<'a> {
        type Item = Result;
        fn next(&mut self) -> Option<Self::Item> {
            self.scanner.scan_next()
        }
    }
    pub fn scan_into_iterator<'a>(
        source: &'a str,
    ) -> impl Iterator<Item = Result> + 'a {
        TokensIterator {
            scanner: Scanner::initialize(source),
        }
    }

Slide 31

Slide 31 text

No boilerplate, lots of control
    #[derive(Debug, PartialEq, Clone)]
    pub enum Value {
        Nil,
        Boolean(bool),
        Number(f64),
        String(String),
        Callable(Callable),
        Instance(Instance),
    }
The compiler will happily write code for you, as long as you ask it to.
This is based on macros, so it's extensible and very flexible.
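A quick sketch of what the three derives provide. The `Token` enum here is a cut-down stand-in written for the example, not the project's own type.

```rust
#[derive(Debug, PartialEq, Clone)]
enum Token {
    Plus,
    Number(f64),
    Identifier(String),
}

fn derive_demo() -> (String, bool, bool) {
    let original = Token::Identifier("cake".to_string());
    let copy = original.clone(); // provided by #[derive(Clone)]
    let debug = format!("{:?}", Token::Number(1.5)); // provided by #[derive(Debug)]
    let same = original == copy; // provided by #[derive(PartialEq)]
    let different = Token::Plus == copy;
    (debug, same, different)
}
```

Removing one of the derives makes the corresponding line fail to compile: nothing is generated unless it was asked for.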

Slide 32

Slide 32 text

unsafe
I've never had to use it.
It does not switch the borrow checker off; it adds new features (e.g. raw pointers).
Normally it's not needed, but it has specific use cases:
• interop, e.g. with C libraries;
• interacting with hardware;
• implementation of base libraries.
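For illustration only, a minimal sketch of the raw pointer feature mentioned above. The function names are hypothetical, and nothing like this is needed in the interpreter itself.

```rust
/// Reads a value through a raw pointer. Creating the pointer is safe;
/// only dereferencing it needs an `unsafe` block.
fn read_through_raw_pointer() -> i32 {
    let value = 42;
    let pointer: *const i32 = &value;
    // SAFETY: `pointer` comes from a live reference to `value`, which is still in scope.
    unsafe { *pointer }
}

/// Mutates a value through a raw pointer obtained from an exclusive borrow.
fn increment_through_raw_pointer(counter: &mut i32) {
    let pointer: *mut i32 = counter;
    // SAFETY: `pointer` is derived from a unique, valid `&mut i32`.
    unsafe { *pointer += 1 };
}
```

Ordinary references inside an `unsafe` block are still checked as usual; the block only unlocks the extra operations, and the programmer takes responsibility for their correctness.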

Slide 33

Slide 33 text

Tooling
• cargo build (don't forget about --release)
• cargo check only runs the type checker
• cargo test runs all the tests
• cargo bench runs benchmarks
• cargo fmt auto-formats the source code
• cargo clippy runs lints and style checks
• cargo fix fixes the code affected by version upgrades

Slide 34

Slide 34 text

Cargo.toml
    [package]
    name = "rulox"
    version = "0.1.0"
    authors = ["Mario Sangiorgio"]
    [dependencies]
    itertools = "0.5.9"
    fnv = "1.0.6"
    num-traits = "0.2"
    num-derive = "0.2"
    [dev-dependencies]
    proptest = "0.7.0"
    [profile.release]
    debug = true
Everything else by convention.

Slide 35

Slide 35 text

Not only crates.io
It's possible to depend on code not published on crates.io:
    [dependencies]
    rand = { git = "https://github.com/rust-lang-nursery/rand" }
    bar = { git = "https://github.com/foo/bar", branch = "baz" }
    hello_utils = { path = "hello_utils" }

Slide 36

Slide 36 text

Travis CI on GitHub
    language: rust
    sudo: required
    rust:
      - stable
      - beta
      - nightly
    matrix:
      allow_failures:
        - rust: nightly
    # Dependencies of kcov, used by coverage
    addons:
      apt:
        packages:
          - libcurl4-openssl-dev
          - libelf-dev
          - libdw-dev
          - binutils-dev
          - cmake
        sources:
          - kalakris-cmake
    cache: cargo
    before_script: ((cargo install cargo-travis && cargo install rustfmt) || true)
    script:
      - |
        cargo build &&
        cargo test
    after_success:
      - cargo coveralls

Slide 37

Slide 37 text

Tests
Just add a module in the same file as the source code.
    #[cfg(test)] // Compiled only in test mode
    mod tests {
        use frontend::scanner::*;
        #[test]
        fn single_token() {
            let (tokens, _) = scan(&"+");
            assert_eq!(tokens[0].token, Token::Plus);
        }
    }

Slide 38

Slide 38 text

Doctest
Or embed them in the documentation:
    impl Chunk {
        /// Adds a new instruction to the chunk
        /// # Example
        /// ```
        /// use rulox::vm::bytecode::*;
        /// let mut chunk = Chunk::default();
        /// let line = 1;
        /// chunk.add_instruction(OpCode::Return, line);
        /// ```
        pub fn add_instruction(&mut self, instruction: OpCode, line: Line) -> () {
            self.instructions.push(instruction);
            self.lines.push(line);
        }
    }

Slide 39

Slide 39 text

Was it a good choice?
• Compared to Java:
  • enum saved me from lots of casting
  • pattern matching is much better than using a visitor
  • the error handling story is much better
• Compared to C it feels like writing code in easy mode:
  • the book has to implement its own data structures and enums; Rust already provides them
  • I am confident I didn't mess up with pointers

Slide 40

Slide 40 text

What if you used X instead?
• modern C++: it has a lot in common with Rust, but it doesn't enforce good practices. Error messages are way worse.
• F#: Rust code can feel very functional. I would have written similar code but I wouldn't have cared as much about performance.

Slide 41

Slide 41 text

Has using Rust slowed me down?
Whenever I thought in another language and wrote Rust code, yes.
The more I learnt, the faster I became.
Release builds are a bit slow, but they're rarely needed.
Error messages try to be as helpful as possible and they are really good.
I don't miss having a GC.
Overall, a clear ownership model leads to a better design ⭐

Slide 42

Slide 42 text

Would I use it in production?
Yes. Rust feels like a great tool to build robust, reliable and fast software, but:
• Access to crates.io is pretty much required (rust-lang issue #44931)
• I wouldn't rewrite everything in Rust just for the sake of it
• It is evolving quickly; the newest features are in the unstable channel

Slide 43

Slide 43 text

Thanks!
• Rust book https://doc.rust-lang.org/book/
• Crafting Interpreters http://www.craftinginterpreters.com/
• My code https://github.com/mariosangiorgio/rulox

Slide 44

Slide 44 text

A story about performance

Slide 45

Slide 45 text

Profiling
Because Rust is based on LLVM, all the normal tools* work (once you add the debug symbols to the release build).
*I tried:
• Xcode Instruments
• FlameGraph on dtrace output

Slide 46

Slide 46 text

Tree-walk interpreter benchmark
    fun fib(n) {
      if (n < 2) return n;
      return fib(n - 1) + fib(n - 2);
    }
    var before = clock();
    print fib(20);
    var after = clock();
    print after - before;
    // Repeat for different input values
My version: faster for smaller values. jlox: faster for bigger values.

Slide 47

Slide 47 text

Some profiling
My implementation:
• profiled CPU usage: nothing interesting;
• profiled memory usage: almost flat (~1.3MB).
jlox:
• the bigger the values, the more memory it consumed.

Slide 48

Slide 48 text

It's always the GC
The benchmark program keeps calling functions, which causes allocation and deallocation of environments (a generalization of stack frames to support closures).
My interpreter was doing both the actual work and the memory clean-up.
jlox was only allocating, waiting for the Java GC to eventually kick in.
With a bit of -Xmx tuning I got the results I was looking for.

Slide 49

Slide 49 text

Property based testing

Slide 50

Slide 50 text

Property based testing
    proptest! {
        #[test]
        fn interpret_doesnt_crash(ref chunk in arb_chunk(10, 20)) {
            let _ = interpret(chunk);
        }
    }
    proptest! {
        #[test]
        fn trace_doesnt_crash(ref chunk in arb_chunk(10, 20)) {
            let mut writer = LineWriter::new(sink());
            let _ = trace(chunk, &mut writer);
        }
    }
Note: this is not very meaningful and it will break once I add loops to the VM.

Slide 51

Slide 51 text

Arb values
    fn arb_instruction(max_offset: usize) -> BoxedStrategy<OpCode> {
        prop_oneof![
            (0..max_offset).prop_map(OpCode::Constant),
            Just(OpCode::Return),
            Just(OpCode::Negate),
            prop_oneof![
                Just(BinaryOp::Add),
                Just(BinaryOp::Subtract),
                Just(BinaryOp::Multiply),
                Just(BinaryOp::Divide),
            ].prop_map(OpCode::Binary),
        ].boxed()
    }
A bit verbose, but not too bad.

Slide 52

Slide 52 text

A better plan
1. Generate random ASTs
2. Pretty print them (the code is already there)
3. Interpret them with the two implementations (⚠ what if they get stuck in infinite loops?)
4. Compare the results
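The plan can be sketched on a toy scale with the standard library only. Everything below is an assumption made for illustration: the tiny `Expr` language stands in for Lox, a hand-rolled linear congruential generator stands in for proptest, and both evaluators consume the AST directly instead of re-parsing the pretty-printed text. The toy language has no loops, so the infinite-loop concern does not arise here.

```rust
/// Tiny expression AST standing in for a Lox program.
#[derive(Debug, Clone)]
enum Expr {
    Number(i64),
    Negate(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Multiply(Box<Expr>, Box<Expr>),
}

/// Step 1: generate a random AST from a seed (simple LCG, not a real RNG).
fn generate(seed: &mut u64, depth: u32) -> Expr {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    let choice = (*seed >> 33) % 4;
    if depth == 0 || choice == 0 {
        return Expr::Number(((*seed >> 40) % 10) as i64);
    }
    let left = Box::new(generate(seed, depth - 1));
    match choice {
        1 => Expr::Negate(left),
        2 => Expr::Add(left, Box::new(generate(seed, depth - 1))),
        _ => Expr::Multiply(left, Box::new(generate(seed, depth - 1))),
    }
}

/// Step 2: pretty print, used here to report a failing program.
fn pretty(expr: &Expr) -> String {
    match expr {
        Expr::Number(n) => n.to_string(),
        Expr::Negate(e) => format!("(-{})", pretty(e)),
        Expr::Add(l, r) => format!("({} + {})", pretty(l), pretty(r)),
        Expr::Multiply(l, r) => format!("({} * {})", pretty(l), pretty(r)),
    }
}

/// Step 3a: first implementation, a tree-walking evaluator.
fn tree_walk(expr: &Expr) -> i64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Negate(e) => tree_walk(e).wrapping_neg(),
        Expr::Add(l, r) => tree_walk(l).wrapping_add(tree_walk(r)),
        Expr::Multiply(l, r) => tree_walk(l).wrapping_mul(tree_walk(r)),
    }
}

/// Step 3b: second implementation, compile to stack-machine code and run it.
enum Op {
    Push(i64),
    Negate,
    Add,
    Multiply,
}

fn compile(expr: &Expr, code: &mut Vec<Op>) {
    match expr {
        Expr::Number(n) => code.push(Op::Push(*n)),
        Expr::Negate(e) => {
            compile(e, code);
            code.push(Op::Negate);
        }
        Expr::Add(l, r) => {
            compile(l, code);
            compile(r, code);
            code.push(Op::Add);
        }
        Expr::Multiply(l, r) => {
            compile(l, code);
            compile(r, code);
            code.push(Op::Multiply);
        }
    }
}

fn run_vm(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Negate => {
                let value = stack.pop()?;
                stack.push(value.wrapping_neg());
            }
            Op::Add => {
                let right = stack.pop()?;
                let left = stack.pop()?;
                stack.push(left.wrapping_add(right));
            }
            Op::Multiply => {
                let right = stack.pop()?;
                let left = stack.pop()?;
                stack.push(left.wrapping_mul(right));
            }
        }
    }
    stack.pop()
}

/// Step 4: compare. Returns the source of the first disagreement, if any.
fn find_mismatch(cases: u64) -> Option<String> {
    for case in 0..cases {
        let mut seed = case;
        let ast = generate(&mut seed, 4);
        let mut code = Vec::new();
        compile(&ast, &mut code);
        if run_vm(&code) != Some(tree_walk(&ast)) {
            return Some(pretty(&ast));
        }
    }
    None
}
```

`find_mismatch(200)` returns `None` when the two evaluators agree on every generated program. In a real setup proptest would replace the LCG and add shrinking, so that a disagreement is reported on the smallest failing program.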