Slide 1

Slide 1 text

WHAT HAVE THE ARCHITECTS EVER DONE FOR US?

Slide 2

Slide 2 text

SOME CONTEXT

I'm not a software engineer
... but I write software daily

I'm not a hardware engineer?
... I rarely write RTL

I'm actually an architect
... defining the instruction set - HW/SW interface

Slide 3

Slide 3 text

[Diagram labels: SOFTWARE INNOVATION, ARCHITECTURE INNOVATION, PROFILING/ANALYSIS, IMPLEMENTATIONS, EVOLUTION; one ME marker and three YOU markers]

Slide 4

Slide 4 text

SIMULATION

[Diagram labels: Binary, Command Line Args, ELF loader, Entry, Memory, CPU (x4), CACHES (x4), DEVICES (x4), I/O, Terminal]

Slide 5

Slide 5 text

DECODE/EXECUTE LOOP

[Diagram labels: Read Instruction, Decode, Execute, Write Reg/Mem, Update PC, Extract Fields, Read Reg/Mem]

core piece of functional simulators

speed is important, but ...
so is readable, extendable code

challenges:
 - large, frequently accessed decode table
 - clean interface for the many instruction forms
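
The stages named on this slide can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the speaker's simulator: the `CPU` type, the toy 32-bit encoding (opcode in the top 8 bits, then three register fields), the `encode` helper and the `decodeTable` map are all hypothetical.

```go
package main

import "fmt"

// CPU is a hypothetical, minimal machine state: four registers, a PC,
// and word-addressed instruction memory.
type CPU struct {
	Regs [4]uint32
	PC   uint32
	Mem  []uint32
}

// handler executes one decoded instruction; rd, rn, rm are register indices.
type handler func(c *CPU, rd, rn, rm uint32)

// decodeTable maps a made-up 8-bit opcode to its handler.
var decodeTable = map[uint32]handler{
	0x01: func(c *CPU, rd, rn, rm uint32) { c.Regs[rd] = c.Regs[rn] + c.Regs[rm] }, // ADD
	0x02: func(c *CPU, rd, rn, rm uint32) { c.Regs[rd] = c.Regs[rn] ^ c.Regs[rm] }, // EOR
}

// encode builds an instruction word in the toy encoding (illustration only).
func encode(op, rd, rn, rm uint32) uint32 { return op<<24 | rd<<16 | rn<<8 | rm }

// Step runs one loop iteration. It returns false when the PC runs off the
// end of memory or the opcode is unknown.
func (c *CPU) Step() bool {
	if int(c.PC) >= len(c.Mem) {
		return false
	}
	inst := c.Mem[c.PC]      // read instruction
	op := inst >> 24         // decode
	rd := (inst >> 16) & 0x3 // extract fields
	rn := (inst >> 8) & 0x3
	rm := inst & 0x3
	h, ok := decodeTable[op]
	if !ok {
		return false
	}
	h(c, rd, rn, rm) // read registers, execute, write result
	c.PC++           // update PC
	return true
}

func main() {
	c := &CPU{Mem: []uint32{
		encode(0x01, 2, 0, 1), // r2 = r0 + r1
		encode(0x02, 3, 2, 0), // r3 = r2 ^ r0
	}}
	c.Regs[0], c.Regs[1] = 5, 7
	for c.Step() {
	}
	fmt.Println(c.Regs[2], c.Regs[3]) // prints: 12 9
}
```

A map keeps the sketch short; for a large, frequently accessed table a dense slice indexed by opcode bits may be a better fit, though that is a guess rather than something the slides state.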

Slide 6

Slide 6 text

ANONYMOUS FUNCTIONS AND CLOSURES

Go makes handling functions trivial
 - integral to my decode/execute loop

use closures to provide a clean/uniform interface
 - hide all of the differences within the surrounding function
 - one interface for executing any instruction form

[Diagram labels: Closure (x2), Anonymous Function (x2), ADD, EOR]
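
One way to read the closure idea: each decoder returns a function value that has already captured its operands, so the execute side sees a single signature whatever the instruction form. The sketch below assumes hypothetical names (`CPU`, `Executor`, `decodeAddImm`, `decodeEorReg`); it is not taken from the talk's code.

```go
package main

import "fmt"

// CPU is a hypothetical register file.
type CPU struct{ X [32]uint64 }

// Executor is the single interface every decoded instruction presents.
type Executor func(c *CPU)

// decodeAddImm handles an immediate form: rd, rn and imm are captured
// by the returned closure.
func decodeAddImm(rd, rn int, imm uint64) Executor {
	return func(c *CPU) { c.X[rd] = c.X[rn] + imm }
}

// decodeEorReg handles a register form with a different operand shape,
// yet returns the same Executor type.
func decodeEorReg(rd, rn, rm int) Executor {
	return func(c *CPU) { c.X[rd] = c.X[rn] ^ c.X[rm] }
}

func main() {
	c := &CPU{}
	c.X[4] = 0xf0
	prog := []Executor{
		decodeAddImm(1, 1, 1), // add x1, x1, #1
		decodeEorReg(4, 1, 4), // eor x4, x1, x4
	}
	for _, exec := range prog {
		exec(c) // the caller never inspects the instruction form
	}
	fmt.Printf("x1=%d x4=%#x\n", c.X[1], c.X[4]) // prints: x1=1 x4=0xf1
}
```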

Slide 7

Slide 7 text

BASIC BLOCKS

simple interfaces make complex algorithms simple

programs are mostly linear
 - until we hit a {br, jmp, ret}
 - linear chunks a.k.a. "basic blocks"

surprised by performance from a "clean" implementation
 - 25-50x slowdown over native

 // Execute a basic block
 for _, v := range bb {
     v.Executor(c)
 }

 0x00400de8 .... ldr w4, [x2], #4
 0x00400dec .... movz w3, #0xa121
 0x00400df0 .... add w1, w1, #0x1
 0x00400df4 .... movk w3, #0x7, lsl #16
 0x00400df8 .... eor w4, w1, w4
 0x00400dfc .... cmp w1, w3
 0x00400e00 .... eor w19, w19, w4
 0x00400e04 .... b.ne 400de8
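
A rough sketch of how a decoded instruction stream might be cut into basic blocks and then run with the range loop shown on this slide. The `Inst` type, its `Branch` flag and `SplitBlocks` are hypothetical names. The splitting rule is deliberately simplified: it only ends a block after a control-flow instruction and ignores branch targets that land mid-block.

```go
package main

import "fmt"

// CPU is a hypothetical, minimal machine state.
type CPU struct {
	PC   int
	Regs [4]uint64
}

// Inst pairs an executor closure with a flag marking control-flow
// instructions (the {br, jmp, ret} cases).
type Inst struct {
	Executor func(c *CPU)
	Branch   bool
}

// SplitBlocks cuts prog into linear chunks. Each block ends after a
// branch, or at the end of the program.
func SplitBlocks(prog []Inst) [][]Inst {
	var blocks [][]Inst
	start := 0
	for i, in := range prog {
		if in.Branch || i == len(prog)-1 {
			blocks = append(blocks, prog[start:i+1])
			start = i + 1
		}
	}
	return blocks
}

func main() {
	inc := Inst{Executor: func(c *CPU) { c.Regs[1]++ }}
	br := Inst{Executor: func(c *CPU) { c.PC = 0 }, Branch: true}

	blocks := SplitBlocks([]Inst{inc, inc, br, inc})

	c := &CPU{}
	// Execute the first basic block.
	for _, v := range blocks[0] {
		v.Executor(c)
	}
	fmt.Println(len(blocks), len(blocks[0]), len(blocks[1]), c.Regs[1]) // prints: 2 3 1 2
}
```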

Slide 8

Slide 8 text

SIMULATING SIMD

SIMD (e.g. AVX/NEON) allows a single instruction to operate on multiple elements of wide registers

used in Go runtime & stdlib for performance

simulating them feels ugly
 - need reflect & unsafe

 // Get8bSlice reinterprets a []uint64 as a []uint8 over the same memory.
 // Needs the "reflect" and "unsafe" packages imported.
 func Get8bSlice(v []uint64) []uint8 {
     length := len(v) * 8
     b := *(*[]uint8)(unsafe.Pointer(&v))
     (*reflect.SliceHeader)(unsafe.Pointer(&b)).Cap = length
     (*reflect.SliceHeader)(unsafe.Pointer(&b)).Len = length
     return b
 }

[Diagram: register V0, bits 0 to 127, viewed as V0.2D, V0.4S, V0.8H and V0.16B]
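
On Go 1.17 and later, a similar reinterpretation can be written with `unsafe.Slice`, which avoids `reflect.SliceHeader` (marked deprecated in recent Go releases). This is an alternative sketch, not the code from the talk; `bytesView` is a hypothetical name. The byte order seen through the view follows the host's endianness, so no lane values are asserted here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesView returns a []uint8 that aliases the same memory as v.
// Writes through either slice are visible through the other.
func bytesView(v []uint64) []uint8 {
	if len(v) == 0 {
		return nil
	}
	return unsafe.Slice((*uint8)(unsafe.Pointer(&v[0])), len(v)*8)
}

func main() {
	v0 := make([]uint64, 2) // one 128-bit register as two 64-bit lanes (V0.2D)
	b := bytesView(v0)      // the same register as sixteen byte lanes (V0.16B)
	for i := range b {
		b[i] = uint8(i)
	}
	fmt.Println(len(b)) // prints: 16
	fmt.Printf("%#x %#x\n", v0[0], v0[1])
}
```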

Slide 9

Slide 9 text

FINALLY

compile & runtime performance
stdlib, tools & dep-free binaries
grokkable code, months later
super helpful/inclusive community

@maver
[email protected]

Gopher wrench image - credit Renee French

[Diagram labels: Software, Hardware, Architecture]