Slide 1

Slide 1 text

Bare Metal Gophers
Can you write an OS Kernel in Go?
Achilleas Anagnostopoulos
github.com/achilleasa
Sr. Software Engineer, Geckoboard

Slide 2

Slide 2 text

A little bit of theory: ring-based security
● Ring-3: normal code (userland)
● Ring-1
● Ring-0: OS kernel
● Userland code crosses into the kernel via a syscall

Slide 3

Slide 3 text

Running Go applications at Ring-0
● Why would we want to do this?
  ○ Performance boost for some types of applications
  ○ Exclusive access to (shared) resources
  ○ Remove some of the abstraction layers between our code and the real hardware
● What’s the easiest way to do this?
  ○ Lots of interest around the concept of uni-kernels
  ○ Bundle your Go app with a minimal OS-in-a-package
  ○ Running on the cloud? How about a hypervisor-aware kernel?

Slide 4

Slide 4 text

Is Go suitable for such a task?
● Arguments against
  ○ Go uses a GC
  ○ You need to re-implement the entire runtime
● Arguments in favor
  ○ Compile-time checks
  ○ Bounds-checking for slices
  ○ Interfaces

Slide 5

Slide 5 text

Let’s build something simple
● A simple 32-bit hello world “Kernel” in Go!
  ○ Target arch x86
  ○ No paging
  ○ Minimal ASM code for low-level initialization
● Our host machine runs a 64-bit OS
  ○ Go toolchain supports cross-compilation out of the box
● Run on VirtualBox (or alternatively, qemu or real hardware)

Slide 6

Slide 6 text

How do we load our kernel into memory?
● We need to use a bootloader
  ○ GRUB2
● How does the kernel communicate with the bootloader?
  ○ Kernel must define a special header that begins with a magic value
  ○ Header must be defined in the first 4K of the kernel image
● We need to tell the linker where to place each section of the binary
  ○ We can do this with GNU ld
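
The "special header that begins with a magic value" can be illustrated with a small sketch. The slides do not say which header format is used, so this assumes the Multiboot 1 format, whose magic value is 0x1BADB002 and whose checksum field must make magic + flags + checksum equal zero modulo 2^32. The flag value below is a hypothetical choice for illustration only.

```go
package main

import "fmt"

// multibootMagic is the Multiboot 1 header magic value.
// Assumption: the slides do not state which Multiboot version the kernel uses.
const multibootMagic uint32 = 0x1BADB002

// checksum returns the value that makes magic + flags + checksum
// wrap around to zero in 32-bit unsigned arithmetic.
func checksum(flags uint32) uint32 {
	return 0 - (multibootMagic + flags)
}

func main() {
	// Hypothetical flags: bit 0 (page-align modules) and bit 1 (memory info).
	flags := uint32(0x3)
	sum := checksum(flags)
	fmt.Printf("magic=%#x flags=%#x checksum=%#x\n", multibootMagic, flags, sum)
	fmt.Println("header valid:", multibootMagic+flags+sum == 0)
}
```

In the real kernel these three words would be emitted by the assembly bootstrap code into the header section, not computed at run time.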

Slide 7

Slide 7 text

What does a linker script look like?

    ENTRY(_rt0_entry)

    SECTIONS {
        . = 1M; /* load kernel at 1M */

        .multiboot : { *(.multiboot_header) }

        /* executable code */
        .text BLOCK(4K) : ALIGN(4K) { *(.text) }

        /* read-only data */
        .rodata BLOCK(4K) : ALIGN(4K) { *(.rodata) }

        /* read-write data (initialized) */
        .data BLOCK(4K) : ALIGN(4K) { *(.data) }

        /* read-write data (not initialized) */
        .bss BLOCK(4K) : ALIGN(4K) { *(COMMON) *(.bss) }
    }

Kernel image memory layout: 0x00 ... 1 MB → kernel image (entry point: _rt0_entry)

Slide 8

Slide 8 text

The kernel is loaded into memory; what’s next?
● There is no stack
● Streaming SIMD Extensions (SSE) are disabled
  ○ SIMD → Single Instruction, Multiple Data
  ○ Allows us to apply one operation to multiple values at once
● CPU is in 32-bit protected mode

Slide 9

Slide 9 text

Accessing memory
● Flat memory model
  ○ Memory: 0x00 0x01 0x02 0x03 0x04 ... 4 GB
  ○ A pointer (uintptr) addresses a location directly
● Segmented addressing
  ○ Memory: 0x00 0x01 0x02 0x03 0x04 ... 4 GB
  ○ gs:0x02 → segment register (gs) plus offset (0x02)
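
A minimal sketch of how a segment:offset pair such as gs:0x02 resolves to an address, compared with a flat pointer. The segment type and the base address 0x1000 are hypothetical and exist only for this illustration; real segment descriptors also carry a limit and access flags.

```go
package main

import "fmt"

// segment is a simplified, hypothetical stand-in for a segment descriptor.
// Only the base address matters for this illustration.
type segment struct {
	base uintptr
}

// linear resolves a segment-relative offset to a linear address.
func (s segment) linear(offset uintptr) uintptr {
	return s.base + offset
}

func main() {
	flat := segment{base: 0}    // flat model: the offset is the address
	gs := segment{base: 0x1000} // hypothetical base for the gs segment

	fmt.Printf("flat:0x02 -> %#x\n", flat.linear(0x02))
	fmt.Printf("gs:0x02   -> %#x\n", gs.linear(0x02))
}
```

With a base of zero the two models coincide, which is why a flat setup lets Go pointers be used as plain addresses.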

Slide 10

Slide 10 text

What happens when a Go function runs?
● A small bit of code (prologue) executes before the actual function
● Stack growth check

Code calls foo():
- Fetch pointer to current g
- If SP < g.stackguard0 { runtime.GrowStack() }
- Jump to foo() code

$GOROOT/src/runtime/runtime2.go:

    type g struct {
        stack       stack
        stackguard0 uintptr
        …
    }

    type stack struct {
        lo uintptr
        hi uintptr
    }

Slide 11

Slide 11 text

Stack growth check: behind the scenes

GOOS=linux GOARCH=386 go build
objdump -d output (Intel syntax)

    func foo() { print("hello") }

    0808aae0 :
    mov ecx,DWORD PTR gs:0x0
    mov ecx,DWORD PTR [ecx-0x4]
    cmp esp,DWORD PTR [ecx+0x8]
    jbe 909ab19 (grow stack)

● gs:0x00 → Ptr to Thread Control Block (TCB)
● [TCB - 0x04] → Ptr to g
● Current g layout
  ○ 0x00 stack.lo
  ○ 0x04 stack.hi
  ○ 0x08 stackguard0

What could possibly go wrong here?

Slide 12

Slide 12 text

Bootstrap code: allocate stack and initialize g
● Reserve 16K in the .bss (uninitialized data) section of the kernel image
  ○ Load the stack pointer with the address of the end of this block
● Set up the gs register according to the TLS ABI
● Populate a g struct
  ○ Runtime package defines a g instance called g0
  ○ Set g0.stack.hi / g0.stack.lo to our stack block
  ○ Set g0.stackguard0 = g0.stack.lo → bypass stack growth checks

Now we can safely jump to the Go code
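
The effect of setting g0.stackguard0 = g0.stack.lo can be modelled in ordinary Go. This is a hosted simulation, not the bootstrap code itself: the g and stack types copy only the fields shown on the slides, the base address 0x100000 is a hypothetical value, and needsGrowth and initG0 are illustrative names. The comparison follows the jbe (below or equal) instruction from the disassembly.

```go
package main

import "fmt"

// stack and g mirror only the fields shown on the slides;
// the real runtime.g has many more fields.
type stack struct {
	lo uintptr
	hi uintptr
}

type g struct {
	stack       stack
	stackguard0 uintptr
}

// needsGrowth models the prologue check: cmp esp, stackguard0 followed by jbe.
func needsGrowth(sp uintptr, gp *g) bool {
	return sp <= gp.stackguard0
}

// initG0 models the bootstrap step: describe the reserved block in g and
// return the initial stack pointer (the end of the block, since the stack
// grows downwards).
func initG0(gp *g, base, size uintptr) uintptr {
	gp.stack.lo = base
	gp.stack.hi = base + size
	gp.stackguard0 = gp.stack.lo
	return gp.stack.hi
}

func main() {
	var g0 g
	sp := initG0(&g0, 0x100000, 16*1024) // hypothetical base, 16K block

	fmt.Println("at top of stack:", needsGrowth(sp, &g0))
	fmt.Println("8K deep:", needsGrowth(sp-8*1024, &g0))
	fmt.Println("stack exhausted:", needsGrowth(g0.stack.lo, &g0))
}
```

In this model the check only fires once the stack pointer reaches the bottom of the reserved block, so code that stays within the 16K never reaches the growth path.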

Slide 13

Slide 13 text

Overriding the Go build process
● go build cross-compiles everything for us
  ○ Including the Go runtime
● We must perform a separate link step
● Idea: intercept go build steps, tweak and execute them
  ○ -n flag to go build outputs the build script for our package

Slide 14

Slide 14 text

Let’s automate the work!

    GOARCH=386 GOOS=linux go build -n 2>&1 | sed \
        -e "1s|^|set -e\n|" \
        -e "1s|^|export GOOS=linux\n|" \
        -e "1s|^|export GOARCH=386\n|" \
        -e "1s|^|WORK='$(BUILD_ABS_DIR)'\n|" \
        -e "s|-extld|-linkmode=external -tmpdir='$(BUILD_ABS_DIR)' -extldflags='-nostdlib' -extld|g" \
        | sh

Slide 15

Slide 15 text

Linking the final kernel image
● Use GNU ld to link
  ○ The object files produced by nasm
  ○ The go.o file produced by the modified go build step
● We invoke the linker and...

    build/arch/x86/asm/rt0.o: In function `_rt0_entry':
    arch/x86/asm/rt0.s:61: undefined reference to `runtime.g0'

● Why did this happen?
  ○ Objcopy to the rescue!

    $ objcopy --globalize-symbol runtime.g0 \
        $(BUILD_DIR)/go.o $(BUILD_DIR)/go.o

Slide 16

Slide 16 text

Success!

    $ ls -logh build/*.bin
    -rwxr-xr-x 1 1.0M Aug 17 11:40 build/kernel-x86.bin

Slide 17

Slide 17 text

Screen output without an OS
● VGA 80x25 text-mode → Linear framebuffer located at 0xb8000
  ○ 2 bytes per character: the character (‘H’, ‘i’, ...) followed by its attribute byte
  ○ R:0, C:0 → fb + 0
  ○ R:0, C:79 → fb + 158
  ○ R:1, C:0 → fb + 160
● Attribute byte
  ○ 4 bits each for fg / bg color
  ○ 16 available colors
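
The offsets above follow from (row * 80 + col) * 2. A small sketch of that arithmetic and of the attribute byte, assuming the standard VGA text-mode packing (background color in the high nibble, foreground in the low nibble). The function names are illustrative, and an ordinary byte slice stands in for the memory at 0xb8000 so the example runs on a regular OS.

```go
package main

import "fmt"

const (
	cols = 80
	rows = 25
)

// cellOffset returns the byte offset of a cell from the framebuffer start.
// Each cell occupies 2 bytes: the character and its attribute.
func cellOffset(row, col int) int {
	return (row*cols + col) * 2
}

// attr packs two 4-bit colors into one attribute byte.
// Assumption: standard VGA layout, bg in the high nibble, fg in the low one.
func attr(fg, bg uint8) uint8 {
	return bg<<4 | fg&0x0f
}

// putChar stores a character and its attribute at the given cell.
func putChar(fb []byte, row, col int, ch byte, a uint8) {
	off := cellOffset(row, col)
	fb[off] = ch
	fb[off+1] = a
}

func main() {
	// Stand-in for the framebuffer; a kernel would address 0xb8000 instead.
	fb := make([]byte, rows*cols*2)

	lightGreyOnBlack := attr(7, 0)
	putChar(fb, 0, 0, 'H', lightGreyOnBlack)
	putChar(fb, 0, 1, 'i', lightGreyOnBlack)

	fmt.Println(cellOffset(0, 0), cellOffset(0, 79), cellOffset(1, 0))
	fmt.Printf("%q %#02x\n", fb[0], fb[1])
}
```

The make call is only acceptable here because the sketch runs hosted; in the kernel the slice would have to be built over the fixed address without touching the allocator.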

Slide 18

Slide 18 text

Before we begin coding: limitations
● Unsupported Go features
  ○ Maps
  ○ Interfaces
  ○ Goroutines
  ○ defer
● Go memory allocator is not available
  ○ Calls to the allocator will cause a triple-fault

No variable should escape to the heap
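
One way to keep values off the heap is to have helpers write into caller-owned, fixed-size buffers instead of returning freshly built slices or strings. The sketch below is an illustration of that style, not code from the talk; itoa is a hypothetical helper. Building with go build -gcflags=-m prints the compiler's escape-analysis decisions, which helps confirm that nothing in such a function is moved to the heap.

```go
package main

import "fmt"

// itoa writes the decimal digits of v right-aligned into buf and returns
// the index of the first digit. It uses only the caller's fixed-size array,
// so it needs no heap allocation. 20 bytes fit the largest uint64.
func itoa(buf *[20]byte, v uint64) int {
	i := len(buf)
	for {
		i--
		buf[i] = byte('0' + v%10)
		v /= 10
		if v == 0 {
			return i
		}
	}
}

func main() {
	var buf [20]byte
	start := itoa(&buf, 1234)

	// Printing via fmt allocates; it is used here only because this
	// demonstration runs on a regular OS rather than inside the kernel.
	fmt.Println(string(buf[start:]))
}
```

Inside the kernel, the digits in buf[start:] would be copied straight to the framebuffer rather than converted to a string.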

Slide 19

Slide 19 text

Let’s write some code!

Slide 20

Slide 20 text

How to overcome these limitations
● Bootstrap the Go memory allocator
  ○ Allocator is implemented entirely in Go
  ○ Requires some OS-specific hooks
  ○ Our Kernel must provide its own implementation for these hooks
● Unlock more Go features
  ○ Maps, interfaces and defer
  ○ Package init() functions

Can we actually DO this without triggering memory allocations?

Slide 21

Slide 21 text

Bare Metal Gophers We enabled SSE. Let’s do some math!

Slide 22

Slide 22 text

Final remarks
Can you write an OS Kernel in Go? YES! But...
● It is harder compared to other languages
● Initial code must be designed to avoid memory allocations
● Things get easier once the allocator is up and running

Slides and code for this talk: https://github.com/achilleasa/bare-metal-gophers
Bonus! A work-in-progress 64-bit Kernel using the concepts from this talk: https://github.com/achilleasa/gopher-os
P.S. We are hiring: https://geckoboard.com/careers