Slide 1

Slide 1 text

Road to your goroutines - My summer vacation -
Yusuke Miyake / Pepabo R&D Institute, GMO Pepabo, Inc.
2018.10.04 Fukuoka.go#12

Slide 2

Slide 2 text

Principal Engineer
Yusuke Miyake (@monochromegane)
GMO Pepabo, Inc., Pepabo R&D Institute
https://blog.monochromegane.com

Slide 3

Slide 3 text

Agenda

1. Motivation
2. Overview of goroutines and scheduler
3. Road to your goroutines

Slide 4

Slide 4 text

1. Motivation

Slide 5

Slide 5 text

Motivation

• “The Platinum Searcher” v3 is now under development.
• In v3, I plan to speed it up by reviewing both the algorithm and how it uses goroutines.
• So I decided to spend my summer vacation getting to know goroutines more deeply.

Slide 6

Slide 6 text

2. Overview of goroutines and scheduler

Slide 7

Slide 7 text

Overview

[Figure: G, M, and P with a LocalRunQueue on the P and a shared GlobalRunQueue; an M goes through schedule and execute to run a G's fn, which ends in goexit; runtime.sysmon runs on its own M]

• G: Goroutine
• M: OS thread
• P: Processor (scheduling context)
• The runtime provides an M:N scheduler (many goroutines multiplexed onto several threads).
• Each P has a local run queue.
• We can think of goroutines as application-level threads.
• An OS thread behaves like a worker that takes goroutines from a run queue.

Ref: https://speakerdeck.com/retervision/go-runtime-scheduler

Slide 8

Slide 8 text

3. Road to your goroutines

Slide 9

Slide 9 text

Roadmap

Slide 10

Slide 10 text

Roadmap

    _rt0_amd64_linux
    └── _rt0_amd64
        └── runtime.rt0_go
            ├── runtime.osinit
            ├── runtime.schedinit
            ├── runtime.newproc(runtime.main)
            │   └── runtime.newproc1
            │       └── runtime.runqput ---------------------------- # Create runtime.main goroutine
            ├── runtime.mstart
            │   └── runtime.mstart1
            │       └── runtime.schedule --------------------------- # Fetch runtime.main goroutine
            │           └── runtime.execute
            │               └── runtime.gogo
            │                   └── runtime.main
            │                       ├── runtime.newm(runtime.sysmon) # Create thread for sysmon
            │                       ├── runtime.init                 # Create forcegc helper goroutine
            │                       ├── runtime.gcenable             # Create background sweeper goroutine
            │                       ├── main.init                    # Create finalizer goroutine
            │                       ├── main.main
            │                       │   ├── runtime.newproc(main.work-1)
            │                       │   │   └── runtime.newproc1
            │                       │   │       └── runtime.runqput  # Create your goroutine
            │                       │   │
            │                       │   └── runtime.newproc(main.work-2)
            │                       │       └── runtime.newproc1
            │                       │           └── runtime.runqput  # Create your goroutine
            │                       └── exit(0)
            └── runtime.mexit

Slide 11

Slide 11 text

Let’s Go

Slide 12

Slide 12 text

Target application

    package main

    import (
        "fmt"
        "sync"
    )

    func work(i int, wg *sync.WaitGroup) {
        fmt.Printf("hello, world in goroutine %d\n", i)
        wg.Done()
    }

    func main() {
        var wg sync.WaitGroup
        wg.Add(2)
        go work(1, &wg)
        go work(2, &wg)
        wg.Wait()
        fmt.Println("hello, world in goroutine main")
    }

Slide 13

Slide 13 text

Build and run

    $ go env
    GOARCH="amd64"
    GOOS="linux"

    $ go version
    go version go1.11 linux/amd64

    # Build debugger-friendly: disable optimizations (-N) and inlining (-l)
    $ go build -gcflags '-N -l'

    # Run under gdb
    $ GOMAXPROCS=1 gdb work

Slide 14

Slide 14 text

Entry point

    (gdb) info file
    Symbols from "/home/vagrant/go/src/work/work".
    Local exec file:
        `/home/vagrant/go/src/work/work', file type elf64-x86-64.
        Entry point: 0x451f70
        0x0000000000401000 - 0x0000000000486c79 is .text
        0x0000000000487000 - 0x00000000004cb415 is .rodata
        0x00000000004cb5c0 - 0x00000000004cc124 is .typelink
        0x00000000004cc128 - 0x00000000004cc168 is .itablink
        0x00000000004cc168 - 0x00000000004cc168 is .gosymtab
        0x00000000004cc180 - 0x0000000000539e8f is .gopclntab
        0x000000000053a000 - 0x0000000000546b9c is .noptrdata
        0x0000000000546ba0 - 0x000000000054d850 is .data
        0x000000000054d860 - 0x0000000000569ef0 is .bss
        0x0000000000569f00 - 0x000000000056c638 is .noptrbss
        0x0000000000400f9c - 0x0000000000401000 is .note.go.buildid

Slide 15

Slide 15 text

Entry point -> _rt0_amd64 -> runtime.rt0_go

    (gdb) x 0x451f70
    0x451f70 <_rt0_amd64_linux>:    0xffc6cbe9
    (gdb) disas 0x451f70
    Dump of assembler code for function _rt0_amd64_linux:
       0x0000000000451f70 <+0>:     jmpq   0x44e640 <_rt0_amd64>
    End of assembler dump.

    Dump of assembler code for function _rt0_amd64:
       0x000000000044e640 <+0>:     mov    (%rsp),%rdi
       0x000000000044e644 <+4>:     lea    0x8(%rsp),%rsi
       0x000000000044e649 <+9>:     jmpq   0x44e650
    End of assembler dump.

Slide 16

Slide 16 text

runtime.rt0_go

    (gdb) disas 0x44e650
    Dump of assembler code for function runtime.rt0_go:
    (snip)
       0x000000000044e760 <+272>:   callq  0x4250a0
       0x000000000044e765 <+277>:   callq  0x429890
       0x000000000044e76a <+282>:   lea    0x7af4f(%rip),%rax        # 0x4c96c0
       0x000000000044e771 <+289>:   push   %rax
       0x000000000044e772 <+290>:   pushq  $0x0
       0x000000000044e774 <+292>:   callq  0x430240
       0x000000000044e779 <+297>:   pop    %rax
       0x000000000044e77a <+298>:   pop    %rax
       0x000000000044e77b <+299>:   callq  0x42b6f0
    (snip)

• Code: runtime/asm_amd64.s
• runtime.osinit -> runtime.schedinit -> runtime.newproc(0, runtime.mainPC) -> runtime.mstart

Slide 17

Slide 17 text

runtime.osinit

    func osinit() {
        ncpu = getproccount()
    }

• Code: runtime/os_linux.go
• Gets the number of logical CPU cores (the value reported by runtime.NumCPU).

Slide 18

Slide 18 text

runtime.schedinit

    (snip)
    procs := ncpu
    if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
        procs = n
    }
    if procresize(procs) != nil {
        throw("unknown runnable goroutine during bootstrap")
    }
    (snip)

• Code: runtime/proc.go
• Overrides procs with GOMAXPROCS if it is set.
• Initializes the new Ps.

Slide 19

Slide 19 text

runtime.newproc(0, runtime.mainPC)

    (gdb) x 0x4c96c0
    0x4c96c0:       0x004285c0
    (gdb) x 0x004285c0
    0x4285c0:       0x0c8b4864

    func newproc(siz int32, fn *funcval) {
        argp := add(unsafe.Pointer(&fn), sys.PtrSize)
        gp := getg()
        pc := getcallerpc()
        systemstack(func() {
            newproc1(fn, (*uint8)(argp), siz, gp, pc)
        })
    }

• Code: runtime/proc.go
• runtime.mainPC is a label for runtime.main.

Slide 20

Slide 20 text

runtime.newproc1

    func newproc1(fn *funcval, argp *uint8, narg int32, callergp *g, callerpc uintptr) {
        (snip)
        _g_ := getg()
        (snip)
        _p_ := _g_.m.p.ptr()
        newg := gfget(_p_)
        (snip)
        gostartcallfn(&newg.sched, fn)
        (snip)
        runqput(_p_, newg, true)
        (snip)
    }

• Code: runtime/proc.go
• Creates a new “g” (for runtime.main) and puts it on the run queue.

Slide 21

Slide 21 text

New “g”

[Figure: the new g's stack (2048 bytes) drawn from High to Low addresses, with markers for stktop, stktopsp, sched.sp, and stackguard]

Sched (type gobuf)

    type gobuf struct {
        sp   uintptr        // 824633919448 -> 0xc0000307d8
        pc   uintptr        // 4359616 -> 0x4285c0
        g    guintptr       // newg
        ctxt unsafe.Pointer // 0x4c96c0
        ret  sys.Uintreg    // 0
        lr   uintptr        // 0
        bp   uintptr        // 0
    }

• The runtime allocates a 2 kB stack for the new “g”.
• “g” has a gobuf named “sched”.
• “sched” holds the goroutine's state (e.g. stack pointer, program counter).
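The decimal values gdb prints for sp and pc can be checked against the hex addresses used on the other slides with a few lines of Go. This is only an arithmetic check; the helper name hex is made up for the example.

```go
package main

import "fmt"

// hex formats v the way the slides write addresses (lowercase, 0x prefix).
// hex is a hypothetical helper for this check.
func hex(v uint64) string {
	return fmt.Sprintf("%#x", v)
}

func main() {
	fmt.Println(hex(824633919448)) // gobuf.sp -> 0xc0000307d8
	fmt.Println(hex(4359616))      // gobuf.pc -> 0x4285c0
}
```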

Slide 22

Slide 22 text

runtime.mstart -> runtime.mstart1 -> runtime.schedule

    func mstart() {
        (snip)
        mstart1()
        (snip)
    }

    func mstart1() {
        (snip)
        schedule()
    }

    func schedule() {
        (snip)
        gp, inheritTime = runqget(_g_.m.p.ptr())
        (snip)
        execute(gp, inheritTime)
    }

• Code: runtime/proc.go
• Gets the “g” (for runtime.main) and executes it.

Slide 23

Slide 23 text

runtime.execute

    func execute(gp *g, inheritTime bool) {
        (snip)
        gogo(&gp.sched)
    }

• Code: runtime/proc.go
• Calls runtime.gogo with the “g”'s sched.

Slide 24

Slide 24 text

runtime.gogo

    // void gogo(Gobuf*)
    // restore state from Gobuf; longjmp
    TEXT runtime·gogo(SB), NOSPLIT, $16-8
        MOVQ    buf+0(FP), BX
        MOVQ    gobuf_g(BX), DX
        MOVQ    0(DX), CX
        get_tls(CX)
        MOVQ    DX, g(CX)
        MOVQ    gobuf_sp(BX), SP
        MOVQ    gobuf_ret(BX), AX
        MOVQ    gobuf_ctxt(BX), DX
        MOVQ    gobuf_bp(BX), BP
        MOVQ    $0, gobuf_sp(BX)
        MOVQ    $0, gobuf_ret(BX)
        MOVQ    $0, gobuf_ctxt(BX)
        MOVQ    $0, gobuf_bp(BX)
        MOVQ    gobuf_pc(BX), BX
        JMP     BX

    (gdb) i registers
    rax            0x0                 0                // <- gobuf.ret
    rbx            0x4285c0            4359616          // <- gobuf.pc   # runtime.main
    rcx            0xc000030000        824633917440     // <- g
    rdx            0x4c96c0            5019328          // <- gobuf.ctxt # runtime.mainPC
    rsi            0x0                 0
    rdi            0x1                 1
    rbp            0x0                 0x0              // <- gobuf.bp
    rsp            0xc0000307d8        0xc0000307d8     // <- gobuf.sp
    r8             0x1c6               454
    r9             0x0                 0
    r10            0x8                 8
    r11            0x202               514
    r12            0xffffffffffffffff  -1
    r13            0x2                 2
    r14            0x1                 1
    r15            0x400               1024
    (snip)

• Code: runtime/asm_amd64.s
• JMP to the address in the BX register (= gobuf.pc = runtime.main).

Slide 25

Slide 25 text

Current status

• The runtime creates a new “g” (for runtime.main) and puts it on the run queue.
• The runtime gets the “g” (for runtime.main) and executes it.

Slide 26

Slide 26 text

runtime.main (i)

    func main() {
        (snip)
        systemstack(func() {
            newm(sysmon, nil)
        })
        (snip)
        runtime_init()
        (snip)
        gcenable()
        (snip)
        …

• newm(sysmon, nil) issues the clone system call with the sysmon function (creating a thread).
• runtime.init() creates the forcegc helper goroutine (runtime.forcegchelper).
• gcenable() creates the background sweeper goroutine (runtime.bgsweep).

Slide 27

Slide 27 text

runtime.main (ii)

• main.init() (-> fmt.init -> os.init…) creates the finalizer goroutine (runtime.runfinq).

    fn := main_init
    fn()

    $ objdump -S work
    (snip)
       0x0000000000428774 <+436>:   lea    0x94a25(%rip),%rdx        # 0x4bd1a0
       0x000000000042877b <+443>:   callq  *%rax
    (snip)

    (gdb) x 0x4bd1a0
    0x4bd1a0:       0x00486c10
    (gdb) disas 0x00486c10
    Dump of assembler code for function main.init:

Slide 28

Slide 28 text

runtime.main (iii)

• The runtime finally calls main.main() !!!

    fn := main_main
    fn()

    $ objdump -S work
    (snip)
       0x00000000004287be <+510>:   lea    0x949e3(%rip),%rdx        # 0x4bd1a8
       0x00000000004287c5 <+517>:   callq  *%rax
    (snip)

    (gdb) x 0x4bd1a8
    0x4bd1a8:       0x00486ad0
    (gdb) disas 0x00486ad0
    Dump of assembler code for function main.main:

Slide 29

Slide 29 text

main.main

    (gdb) disas 0x00486ad0
    Dump of assembler code for function main.main:
    (snip)
       0x0000000000486b31 <+97>:    mov    0x38(%rsp),%rax
       0x0000000000486b36 <+102>:   mov    %rax,0x18(%rsp)
       0x0000000000486b3b <+107>:   movq   $0x1,0x10(%rsp)
       0x0000000000486b44 <+116>:   movl   $0x10,(%rsp)
       0x0000000000486b4b <+123>:   lea    0x36436(%rip),%rax        # 0x4bcf88
       0x0000000000486b52 <+130>:   mov    %rax,0x8(%rsp)
       0x0000000000486b57 <+135>:   callq  0x430240
       0x0000000000486b5c <+140>:   mov    0x38(%rsp),%rax
       0x0000000000486b61 <+145>:   mov    %rax,0x18(%rsp)
       0x0000000000486b66 <+150>:   movq   $0x2,0x10(%rsp)
       0x0000000000486b6f <+159>:   movl   $0x10,(%rsp)
       0x0000000000486b76 <+166>:   lea    0x3640b(%rip),%rax        # 0x4bcf88
       0x0000000000486b7d <+173>:   mov    %rax,0x8(%rsp)
       0x0000000000486b82 <+178>:   callq  0x430240
       0x0000000000486b87 <+183>:   mov    0x38(%rsp),%rax
       0x0000000000486b8c <+188>:   mov    %rax,(%rsp)
       0x0000000000486b90 <+192>:   callq  0x45fc10
    (snip)

Slide 30

Slide 30 text

main.main

    go work(1, &wg) =

       0x0000000000486b4b <+123>:   lea    0x36436(%rip),%rax        # 0x4bcf88
       0x0000000000486b52 <+130>:   mov    %rax,0x8(%rsp)
       0x0000000000486b57 <+135>:   callq  0x430240

• “go work(1, &wg)” becomes runtime.newproc(16, main.work) in the build step (the size argument is the $0x10 stored at (%rsp) in the main.main disassembly, i.e. the 16 bytes of arguments).

    (gdb) x 0x4bcf88
    0x4bcf88:       0x004869a0
    (gdb) disas 0x004869a0
    Dump of assembler code for function main.work:

Slide 31

Slide 31 text

Current status

• runtime.main creates a thread for sysmon.
• runtime.main creates three goroutines.
• runtime.main calls main.main.
• main.main creates goroutines for main.work using runtime.newproc.

Slide 32

Slide 32 text

How do your goroutines run?

Slide 33

Slide 33 text

Switching goroutines

• Goroutines have to be switched because G >> M > P.
• The scheduler switches:
  • long-running goroutines.
  • goroutines running a system call (I/O based).
  • goroutines blocked on networking, channel send/receive, time.Sleep, etc.
  • exiting goroutines.

Slide 34

Slide 34 text

Long running goroutines

Slide 35

Slide 35 text

Detect long running goroutines

    func sysmon() {
        (snip)
        for {
            (snip)
            // retake P's blocked in syscalls
            // and preempt long running G's
            if retake(now) != 0 {
                idle = 0
            } else {
                idle++
            }
            (snip)
        }
    }

    func retake(now int64) uint32 {
        (snip)
        for i := 0; i < len(allp); i++ {
            (snip)
            if s == _Psyscall {
                (snip)
            } else if s == _Prunning {
                (snip)
                preemptone(_p_)
            }
        }
        (snip)
    }

• Code: runtime/proc.go
• The thread running runtime.sysmon detects long-running goroutines.

Slide 36

Slide 36 text

Mark long running goroutines

    func preemptone(_p_ *p) bool {
        (snip)
        gp.preempt = true
        (snip)
        gp.stackguard0 = stackPreempt
        return true
    }

    const (
        uintptrMask  = 1<<(8*sys.PtrSize) - 1
        stackPreempt = uintptrMask & -1314
    )
    // p/x 18446744073709550302
    // -> 0xfffffffffffffade

[Figure: goroutine stack (stktop, Stack, Stackguard) with stackPreempt drawn above the top of the stack]

• Code: runtime/proc.go, stack.go
• To make the split-stack check fail, stackguard0 is set to a value greater than any real SP.

Slide 37

Slide 37 text

Switch long running goroutines

    (gdb) disas 0x04869a0
    Dump of assembler code for function main.work:
    (snip)
       0x00000000004869a9 <+9>:     lea    -0x10(%rsp),%rax
       0x00000000004869ae <+14>:    cmp    0x10(%rcx),%rax
       0x00000000004869b2 <+18>:    jbe    0x486abb
    (snip)
       0x0000000000486abb <+283>:   callq  0x44e9d0

    (gdb) i registers
    rcx            0xc000030000        824633917440     // <- g
    rsp            0xc0000307d8        0xc0000307d8     // <- gobuf.sp

[Figure: fields of g (lo, hi, stackguard0) with their byte offsets; 0x10(%rcx) reads stackguard0 when %rcx holds g]

• The compiler inserts a stack check at the start of each function.
• If the stack pointer is below or equal to stackguard0, execution jumps to runtime.morestack_noctxt.
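The three-instruction prologue (lea, cmp, jbe) boils down to one unsigned comparison. The sketch below restates it in Go; needsMoreStack and the normalGuard value are hypothetical, chosen only so that the two cases can be compared, while the sp value and the 0x10 adjustment come from the gdb output above.

```go
package main

import "fmt"

// stackPreempt is the sentinel the runtime stores in stackguard0 (64-bit value).
const stackPreempt uint64 = 0xfffffffffffffade

// needsMoreStack mirrors "lea -adj(%rsp),%rax; cmp stackguard0,%rax; jbe":
// an unsigned below-or-equal comparison of the adjusted SP with the guard.
// It is a hypothetical helper, not runtime code.
func needsMoreStack(sp, adj, stackguard0 uint64) bool {
	return sp-adj <= stackguard0
}

func main() {
	sp := uint64(0xc0000307d8)          // rsp from the gdb session
	normalGuard := uint64(0xc000030370) // assumption: some guard inside the 2 kB stack

	fmt.Println(needsMoreStack(sp, 0x10, normalGuard))  // false: the function body runs
	fmt.Println(needsMoreStack(sp, 0x10, stackPreempt)) // true: detour via morestack
}
```

With the sentinel in place the check fails no matter how much stack is actually free, which is what lets the runtime use it as a preemption request.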

Slide 38

Slide 38 text

Switch long running goroutines

    TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
        MOVL    $0, DX
        JMP     runtime·morestack(SB)

    TEXT runtime·morestack(SB),NOSPLIT,$0-0
        (snip)
        CALL    runtime·newstack(SB)

    func newstack() {
        (snip)
        preempt := atomic.Loaduintptr(&gp.stackguard0) == stackPreempt
        (snip)
        if preempt {
            (snip)
            gopreempt_m(gp) // never return
        }
    }

• Code: runtime/asm_amd64.s, runtime/stack.go

Slide 39

Slide 39 text

Switch long running goroutines

    func gopreempt_m(gp *g) {
        (snip)
        goschedImpl(gp)
    }

    func goschedImpl(gp *g) {
        (snip)
        casgstatus(gp, _Grunning, _Grunnable)
        dropg()
        lock(&sched.lock)
        globrunqput(gp)
        unlock(&sched.lock)
        schedule()
    }

• Code: runtime/proc.go
• The scheduler preempts long-running goroutines and puts them on the global run queue.
• After that, the scheduler finds a runnable goroutine by calling runtime.schedule.

Slide 40

Slide 40 text

Goroutines running system calls

Slide 41

Slide 41 text

Mark running system call goroutines

    TEXT ·Syscall(SB),NOSPLIT,$0-56
        CALL    runtime·entersyscall(SB)
        MOVQ    a1+8(FP), DI
        MOVQ    a2+16(FP), SI
        MOVQ    a3+24(FP), DX
        MOVQ    $0, R10
        MOVQ    $0, R8
        MOVQ    $0, R9
        MOVQ    trap+0(FP), AX  // syscall entry
        SYSCALL
        CMPQ    AX, $0xfffffffffffff001
        JLS     ok
        MOVQ    $-1, r1+32(FP)
        MOVQ    $0, r2+40(FP)
        NEGQ    AX
        MOVQ    AX, err+48(FP)
        CALL    runtime·exitsyscall(SB)
        RET

• Code: syscall/asm_linux_amd64.s

Slide 42

Slide 42 text

Mark running system call goroutines

    func entersyscall() {
        reentersyscall(getcallerpc(), getcallersp())
    }

    func reentersyscall(pc, sp uintptr) {
        (snip)
        save(pc, sp)
        (snip)
        casgstatus(_g_, _Grunning, _Gsyscall)
        (snip)
        atomic.Store(&_g_.m.p.ptr().status, _Psyscall)
        (snip)
    }

• Code: runtime/proc.go

Slide 43

Slide 43 text

Detect running system call goroutines

    func sysmon() {
        (snip)
        for {
            (snip)
            // retake P's blocked in syscalls
            // and preempt long running G's
            if retake(now) != 0 {
                idle = 0
            } else {
                idle++
            }
            (snip)
        }
    }

    func retake(now int64) uint32 {
        (snip)
        for i := 0; i < len(allp); i++ {
            (snip)
            if s == _Psyscall {
                (snip)
                handoffp(_p_)
            } else if s == _Prunning {
                (snip)
            }
        }
        (snip)
    }

• Code: runtime/proc.go
• The thread running runtime.sysmon detects goroutines that are running system calls.

Slide 44

Slide 44 text

Handoff running system call goroutines

    func startm(_p_ *p, spinning bool) {
        (snip)
        newm(fn, _p_)
        return
        (snip)
    }

    func newm(fn func(), _p_ *p) {
        mp := allocm(_p_, fn)
        (snip)
        newm1(mp)
    }

    func newm1(mp *m) {
        (snip)
        newosproc(mp)
        (snip)
    }

    func newosproc(mp *m) {
        (snip)
        ret := clone(cloneFlags, stk, unsafe.Pointer(mp), unsafe.Pointer(mp.g0), unsafe.Pointer(funcPC(mstart)))
        (snip)
    }

• Code: runtime/proc.go, runtime/os_linux.go
• runtime.handoffp calls runtime.startm.
• runtime.allocm disassociates the P from the current M (runtime.releasep).
• The P is associated with the new M.
• The new M gets a new thread that starts in runtime.mstart (where the scheduler finds runnable goroutines).
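The handoff can be observed indirectly from user code. In the sketch below, which assumes a Unix-like system where syscall.Pipe is available, one goroutine blocks inside a raw read(2) while only one P exists; the main goroutine can still run and write to the pipe, which would not be possible if the blocked system call kept the P. The helper name readWhileBlocked and the "ping" payload are made up for the example.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// readWhileBlocked blocks one goroutine in read(2) on a pipe while a single P
// is available, then unblocks it from the calling goroutine.
// It is a hypothetical helper (assumption: Unix-like OS).
func readWhileBlocked() (string, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	type result struct {
		data string
		err  error
	}
	done := make(chan result, 1)
	go func() {
		buf := make([]byte, 16)
		n, err := syscall.Read(fds[0], buf) // blocks inside the system call
		if err != nil {
			done <- result{err: err}
			return
		}
		done <- result{data: string(buf[:n])}
	}()

	runtime.Gosched() // give the reader a chance to enter read(2) first
	if _, err := syscall.Write(fds[1], []byte("ping")); err != nil {
		return "", err
	}
	r := <-done
	return r.data, r.err
}

func main() {
	fmt.Println(readWhileBlocked())
}
```

The raw syscall package is used on purpose: files from os.Pipe may go through the network poller instead of blocking a thread, which is the case this talk skips.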

Slide 45

Slide 45 text

Networking, channel send/receive, time.Sleep, etc. -> Skipped in this session.

Slide 46

Slide 46 text

Exit goroutines

Slide 47

Slide 47 text

Stack for goroutine

    (gdb) info goroutines
      1 waiting  runtime.gopark
      2 runnable runtime.forcegchelper
      3 waiting  runtime.gopark
      4 runnable runtime.runfinq
      5 runnable main.work
    * 6 running  main.work
    (gdb) goroutine 6 bt
    #0  main.work (i=2, wg=0xc0000120c0)
    #1  0x00000000004507d1 in runtime.goexit ()
    (snip)

[Figure: goroutine stack (stktop, Stack, Stackguard) with the return address 0x4507d1 of runtime.goexit at the top]

• Code: runtime/proc.go
• runtime.newproc1 pushes the address of runtime.goexit onto the new “g”'s stack.
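The same bottom frame is visible without gdb. This sketch walks the call stack of a freshly started goroutine with runtime.Callers and reports the outermost function; bottomFrame is a hypothetical helper name, and the expectation that the last frame is runtime.goexit is based on the backtrace above.

```go
package main

import (
	"fmt"
	"runtime"
)

// bottomFrame starts a goroutine and returns the name of the outermost
// function on its call stack. It is a hypothetical helper.
func bottomFrame() string {
	out := make(chan string)
	go func() {
		pcs := make([]uintptr, 32)
		n := runtime.Callers(0, pcs)
		frames := runtime.CallersFrames(pcs[:n])

		last := ""
		for {
			f, more := frames.Next()
			if f.Function != "" {
				last = f.Function // remember the deepest named frame seen so far
			}
			if !more {
				break
			}
		}
		out <- last
	}()
	return <-out
}

func main() {
	fmt.Println(bottomFrame())
}
```

When the goroutine's function returns, it returns into this frame, which is how control gets back to the scheduler.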

Slide 48

Slide 48 text

Exit goroutines

    TEXT runtime·goexit(SB),NOSPLIT,$0-0
        BYTE    $0x90   // NOP
        CALL    runtime·goexit1(SB)

    func goexit1() {
        (snip)
        mcall(goexit0)
    }

    func goexit0(gp *g) {
        (snip)
        casgstatus(gp, _Grunning, _Gdead)
        (snip)
        dropg()
        (snip)
        gfput(_g_.m.p.ptr(), gp)
        (snip)
        schedule()
    }

• Code: runtime/asm_amd64.s
• Code: runtime/proc.go
• When a goroutine created by runtime.newproc1 exits, runtime.goexit is called.
• After that, the scheduler finds a runnable goroutine by calling runtime.schedule.

Slide 49

Slide 49 text

Summary

Slide 50

Slide 50 text

Summary

• We learned to read assembly.
• We learned how goroutines work by reading the runtime code and the assembly.
• The Go scheduler turns I/O-blocking tasks into CPU-bound tasks, so it seems to be a good match for The Platinum Searcher and for servers.
• At the same time, using too many goroutines that make system calls becomes a bottleneck.
• Therefore, I should review the algorithm “and” the goroutine use.

Slide 51

Slide 51 text

See also

• Scalable Go Scheduler Design Doc
  https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit
• A Quick Guide to Go's Assembler
  https://golang.org/doc/asm
• Debugging Go Code with GDB
  https://golang.org/doc/gdb
