Slides from the first public presentation on lowRISC, a project to produce a fully open-source SoC. This presentation was given at ORConf in Munich on Sat 11th October 2014.
What is lowRISC?
● An open-source SoC with a RISC-V CPU
  ○ Initially targeting 40 or 28nm
  ○ Released under an open, permissive license
  ○ Novel security features
  ○ Programmable I/O
  ○ AMBA bus
  ○ Performance: run Linux ‘well’
● A Community Interest Company (i.e., we are not-for-profit)
  ○ Intend to manufacture the SoC in volume and produce low-cost development boards
Who are we?
● Robert Mullins - Computer Laboratory, University of Cambridge, co-founder of Raspberry Pi
● Gavin Ferris - Dreamworks, Radioscape (co-founder), Aspect Capital (former CIO)
● Alex Bradbury - Computer Laboratory, University of Cambridge and Raspberry Pi
Technical advisory board:
● Krste Asanovic (UC Berkeley)
● Julius Baxter (OpenRISC)
● Bunnie Huang (Hacker)
● Dominic Rizzo (Google ATAP)
● Michael Taylor (UCSD)
lowRISC motivation
● A belief in open-source hardware
● Encourage innovation and semiconductor startups
● Research platform
● The opportunity for contributors to see their HDL used in a mass-produced SoC
  ○ Regular tape-outs
How are we going to do this?
● Received an initial private donation
● Work with collaborators (e.g. Berkeley)
● Additional funding (e.g. research councils)
● Community
● OpenCores IP and tools
● Build the core dev team. Just advertised and filled two positions
Timeline to V1
● Release of an initial FPGA version: next 6 months
● Production of a test chip: end of 2015
● Tape-out of production silicon: 2016
● Produce low-cost development boards. “Raspberry Pi for grown-ups” :)
Target market and philosophy
● 100-200k boards is ~1 month of Raspberry Pi sales
● Hackers, tinkerers, researchers, the OSHW community
● Target the embedded, connected world. IoT
● Security is essential. We would be negligent not to consider how to improve on the security features available in shipping processors
  ○ Tagged memory
  ○ Traditional features: RNG, crypto accelerator, encrypted off-chip memory, secure boot
● Flexible I/O. Flexibility of a Zynq-like platform, but in software.
  ○ Vendors are incentivized to make low-level peripherals arbitrarily different for lock-in and ‘differentiation’.
Shellcode injection via buffer overflow
What happens if we pass an argument larger than the buffer?

    #include <string.h>

    int main(int argc, char **argv) {
        char buffer[512];
        /* Deliberately vulnerable: strcpy does not check the length of argv[1] */
        if (argc > 1)
            strcpy(buffer, argv[1]);
        return 0;
    }
Shellcode injection via buffer overflow
Solution: NX bit. Mark pages as being executable/not executable. All stack pages are non-executable.
What about overflows on the heap?
And the attacker responds with new tricks...
● Return-to-libc: overwrite the return address (RA) so that it points to a handy function such as system(const char*)
● Return-Oriented Programming (ROP) and variants: JOP etc.
Respond with more software countermeasures
● ASLR: randomize locations of the stack, heap, functions
● Stack canaries: put a secret value on the stack and detect if it is overwritten
● Software sandboxing (e.g. Android, iPhone, browser sandboxes)
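The stack canary idea can be sketched as a small software model. This is only an illustration: `struct frame`, `CANARY_VALUE` and the helper names are hypothetical, the frame layout is an assumption, and real compilers insert the equivalent check automatically with a canary that is random per process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a stack frame: a local buffer, then the canary,
 * then the saved return address, at increasing addresses. */
struct frame {
    char     buffer[16];
    uint64_t canary;
    uint64_t return_address;
};

/* Fixed value for illustration only; a real canary is chosen at random. */
#define CANARY_VALUE 0x5ca1ab1edeadbeefULL

/* Models the function prologue: place the canary below the return address. */
void frame_init(struct frame *f, uint64_t return_address) {
    memset(f->buffer, 0, sizeof f->buffer);
    f->canary = CANARY_VALUE;
    f->return_address = return_address;
}

/* Models an unchecked copy such as strcpy: n may exceed the buffer size,
 * in which case the bytes spill into the canary and return address. */
void frame_unsafe_copy(struct frame *f, const void *src, size_t n) {
    assert(n <= sizeof *f);              /* stay inside the modelled frame */
    memcpy((unsigned char *)f, src, n);
}

/* Models the function epilogue: returns 1 if the canary is untouched. */
int frame_canary_intact(const struct frame *f) {
    return f->canary == CANARY_VALUE;
}
```

In this model a 16-byte copy leaves the canary intact, while any longer copy overwrites it before reaching the return address, so the epilogue check fails before the corrupted address would be used.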
Time for a new hardware countermeasure
● We call this class of attacks control flow hijack attacks
● All of these attacks (so far) require violating spatial memory safety, i.e. writing beyond the bounds of an object
  ○ More specifically, they require overwriting a code pointer
Aim: protect code pointers from overwrites
Does this problem matter?
[Chart: vulnerabilities in the CVE database with ‘high’ severity]
Source: 25 Years of Vulnerabilities: 1988-2012 - Sourcefire, www.sourcefire.com/25yearsofvulns
Solution: tagged memory
● 2 tag bits per word. Storage overhead: 2/64 ≈ 3%
● Tag cache logically extends the width of a word to 66 bits
● Tag bits are copied into L2 and L1 cache lines
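The overhead figure follows directly from the word and tag widths. A minimal sketch of the arithmetic, assuming 64-bit words and tags packed densely with no padding (the function name is hypothetical):

```c
#include <stdint.h>

enum { WORD_BITS = 64, TAG_BITS = 2 };

/* Bytes of tag storage needed to cover mem_bytes of data memory,
 * assuming tags are packed densely (an assumption, not from the slides). */
uint64_t tag_storage_bytes(uint64_t mem_bytes) {
    uint64_t words = mem_bytes / (WORD_BITS / 8);
    return (words * TAG_BITS + 7) / 8;   /* round up to whole bytes */
}
```

Under these assumptions, 1 GiB of data memory needs 32 MiB of tag storage, which is the 2/64 = 3.125% quoted above.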
Protecting the return address
Apply tag bits to the return address. If the buffer overflows, an exception is triggered.
Do the same for vtable pointers.
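The effect can be sketched with a small software model of tagged memory. Everything here is an assumption made for illustration: the tag encoding, the choice to trap on the store rather than on a later load, and all names are hypothetical and not taken from the lowRISC design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 8

/* Hypothetical tag encoding; the slides do not define one. */
enum tag { TAG_NONE = 0, TAG_CODE_PTR = 1 };

struct tagged_mem {
    uint64_t data[MEM_WORDS];
    uint8_t  tag[MEM_WORDS];   /* only the low 2 bits are meaningful */
};

/* Ordinary store. Returns false, modelling a hardware exception, if the
 * target word is tagged as a code pointer; the word is left unchanged. */
bool store_word(struct tagged_mem *m, size_t i, uint64_t value) {
    assert(i < MEM_WORDS);
    if (m->tag[i] == TAG_CODE_PTR)
        return false;
    m->data[i] = value;
    return true;
}

/* Models the compiler-generated prologue: save the return address and tag it. */
void store_code_ptr(struct tagged_mem *m, size_t i, uint64_t value) {
    assert(i < MEM_WORDS);
    m->data[i] = value;
    m->tag[i] = TAG_CODE_PTR;
}

/* Models the epilogue: clear the tag once the return address has been used. */
void clear_tag(struct tagged_mem *m, size_t i) {
    assert(i < MEM_WORDS);
    m->tag[i] = TAG_NONE;
}
```

If words 0 to 3 stand for a buffer and word 4 holds the tagged return address, a linear overflow written with `store_word` succeeds for the first four words and is stopped at word 4, leaving the return address intact.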
But what about use-after-free?
● A temporal memory safety issue
● Problem: we can protect the vtable pointer for the lifetime of an object, but if the object is used after it was freed, the attacker could control the contents of that memory location.
● Solution: check for the presence of tag bits. Augment with existing segregated allocator techniques
● A good example of the effort attackers are willing to go to: http://blog.exodusintel.com/2013/11/26/browser-weakest-byte/
Other uses of tagged memory
We implement general-purpose tagged memory that can be configured for use in a wide range of different scenarios:
● Infinite memory watchpoints
● Better version of traditional canaries
● Garbage collection
● Accelerate AddressSanitizer/ThreadSanitizer/MemorySanitizer
  ○ If larger tags are required, update shadow memory in the exception handler
● Locks on every word
● Apply tag bits to instructions to mark valid targets of indirect branches
Compatibility and implementation tasks
Requirements:
● Addition of tag memory cache and widening of cache lines
● New instructions to manipulate tags
● Compiler modifications to protect and check RA and vtable pointers
● Modify memory allocator to clear tags upon free, and modify memcpy and memmove to copy tags
● Update kernel virtual memory system to persist tag bits when moving a page to secondary storage
Compatibility: metadata in binaries can be used to rewrite instructions at load time
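The allocator and memcpy/memmove requirements can be illustrated with a word-granularity software model. This is a sketch under stated assumptions: `struct tagged_heap` and both helpers are hypothetical, and a real implementation would use the new tag-manipulation instructions rather than a separate tag array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_WORDS 16

/* Hypothetical model: each 64-bit word has a 2-bit tag stored alongside it. */
struct tagged_heap {
    uint64_t data[HEAP_WORDS];
    uint8_t  tag[HEAP_WORDS];
};

/* memmove analogue: the tags travel with the data, so a copied code
 * pointer stays protected at its new location. Overlap is allowed. */
void tagged_copy(struct tagged_heap *h, size_t dst, size_t src, size_t n) {
    assert(dst + n <= HEAP_WORDS && src + n <= HEAP_WORDS);
    memmove(&h->data[dst], &h->data[src], n * sizeof h->data[0]);
    memmove(&h->tag[dst],  &h->tag[src],  n * sizeof h->tag[0]);
}

/* free analogue: clear the tags so that stale protection does not
 * outlive the object that owned these words. */
void tagged_release(struct tagged_heap *h, size_t start, size_t n) {
    assert(start + n <= HEAP_WORDS);
    memset(&h->tag[start], 0, n * sizeof h->tag[0]);
}
```

Clearing tags on release is also what makes the use-after-free check possible in this model: a freed word no longer carries the tag that a valid vtable pointer would have.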
Programmable I/O motivation
● Flexibility and ability to add completely new interfaces
● Ease of programming (vs programmable logic)
● Off-load work from main core
  ○ Do more work close to I/O
  ○ Combine with tagged memory for more complex security policies/checks
● Filter I/O to only wake up the main core when needed
● Avoid writing HDL for all controllers and interfaces.
Related work
● Idea goes back at least to the CDC6600 multi-threaded I/O processors (‘peripheral processor units’)
● Motorola 68332, 68302 both had configurable, programmable ‘timer units’
● TI PRUs
● XMOS
● NXP LPC4370 (M4 and 2xM0 for peripherals)
● ...
Minion architectural considerations
● I/O ‘shim’
  ○ Bit-banging pins directly would be painful.
  ○ Provide a small amount of configurable logic, buffers, timers, and clocks to reduce overheads
  ○ Support routing physical pins to different cores
● Timing/events
  ○ Execute out of scratchpads to help provide bounded execution time
  ○ Precise timing (e.g. wait for counter)
● Considering multi-threaded operation and low-latency communication with main core (e.g. FIFO links)
● Minions are not coherent between themselves, but are coherent with main processors.
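As an illustration of the FIFO link idea, here is a single-threaded software model of a bounded queue between a minion and the main core. The depth, names and word width are assumptions; the slides do not specify the link, and a hardware FIFO or a concurrent software queue would need proper synchronization that this sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 4   /* hypothetical depth */

struct fifo {
    uint32_t slot[FIFO_DEPTH];
    unsigned head;    /* index of the next slot to read */
    unsigned count;   /* number of occupied slots */
};

/* Producer side (e.g. a minion): returns false when the FIFO is full,
 * which models back-pressure instead of overwriting unread data. */
bool fifo_push(struct fifo *f, uint32_t value) {
    if (f->count == FIFO_DEPTH)
        return false;
    f->slot[(f->head + f->count) % FIFO_DEPTH] = value;
    f->count++;
    return true;
}

/* Consumer side (e.g. the main core): returns false when the FIFO is empty. */
bool fifo_pop(struct fifo *f, uint32_t *out) {
    if (f->count == 0)
        return false;
    *out = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

A zero-initialized `struct fifo` is an empty queue. The fixed depth keeps both operations constant-time, which fits the bounded execution time goal listed above.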
Roadmap
● Test chip: Rocket x (1 or 2) + I/O + memory controller + tagged memory. Reusable research BGA package from UCSD
● V1: Complete, secure embedded SoC appropriate for headless applications.
● Ultimate ambition is an SoC with a broad set of features including GPU, appropriate for a mobile phone or set-top box. V1 is a stepping stone towards that goal.
What is success for a project like lowRISC?
How you can help
● Join our efforts: http://www.lowrisc.org
● Join our announcement and discussion lists
● Help direct our plans
● FPGA version to come soon
Questions?