
Road to your goroutines


monochromegane

October 04, 2018

Transcript

  1. Road to your goroutines - My summer vacation -

    monochromegane / Pepabo R&D Institute, GMO Pepabo, Inc.
    2018.10.04 Fukuoka.go#12
  2. Motivation

    • "The platinum searcher" v3 is now under development.
    • v3 will be made faster by reviewing both the algorithm and the use of goroutines.
    • So I decided to spend my summer vacation getting to know goroutines more deeply.
  3. Overview

    [Diagram: G, M and P with a LocalRunQueue per P and a GlobalRunQueue; runtime.sysmon runs on its own M; each M loops through schedule -> execute -> G (fn) -> goexit]

    • G: Goroutine
    • M: OS thread
    • P: Processor (scheduling context)
    • The runtime provides an M:N scheduler (many goroutines multiplexed onto several threads).
    • Each P has a local run queue.
    • We can think of goroutines as application-level threads.
    • An OS thread behaves like a worker that takes goroutines from a run queue.
    • Ref: "Go runtime scheduler" by retervision (speakerdeck.com)
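The M:N idea on this slide can be observed from user code. The following is a minimal sketch, not taken from the deck: the helper name `runOnOneP` is hypothetical, and it assumes only the public `runtime.GOMAXPROCS` API. It limits the program to a single P and still completes many goroutines, because they are queued and run one after another rather than each needing its own thread.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runOnOneP (hypothetical helper) starts n goroutines while only one P
// is available and returns how many of them ran to completion.
func runOnOneP(n int) int64 {
	prev := runtime.GOMAXPROCS(1) // one scheduling context for all goroutines
	defer runtime.GOMAXPROCS(prev)

	var wg sync.WaitGroup
	var finished int64
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&finished, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&finished)
}

func main() {
	fmt.Println("goroutines finished on a single P:", runOnOneP(1000))
}
```

The count of OS threads is not exposed directly, so the sketch only shows that the number of goroutines is independent of the number of Ps.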
  4. Roadmap

    _rt0_amd64_linux
    └── _rt0_amd64
        └── runtime.rt0_go
            ├── runtime.osinit
            ├── runtime.schedinit
            ├── runtime.newproc(runtime.main)
            │   └── runtime.newproc1
            │       └── runtime.runqput                # Create runtime.main goroutine
            ├── runtime.mstart
            │   └── runtime.mstart1
            │       └── runtime.schedule               # Fetch runtime.main goroutine
            │           └── runtime.execute
            │               └── runtime.gogo
            │                   └── runtime.main
            │                       ├── runtime.newm(runtime.sysmon)   # Create thread for sysmon
            │                       ├── runtime.init                   # Create forcegc helper goroutine
            │                       ├── runtime.gcenable               # Create background sweeper goroutine
            │                       ├── main.init                      # Create finalizer goroutine
            │                       ├── main.main
            │                       │   ├── runtime.newproc(main.work-1)
            │                       │   │   └── runtime.newproc1
            │                       │   │       └── runtime.runqput    # Create your goroutine
            │                       │   └── runtime.newproc(main.work-2)
            │                       │       └── runtime.newproc1
            │                       │           └── runtime.runqput    # Create your goroutine
            │                       └── exit(0)
            └── runtime.mexit
  5. Target application

        package main

        import (
            "fmt"
            "sync"
        )

        func work(i int, wg *sync.WaitGroup) {
            fmt.Printf("hello, world in goroutine %d\n", i)
            wg.Done()
        }

        func main() {
            var wg sync.WaitGroup
            wg.Add(2)
            go work(1, &wg)
            go work(2, &wg)
            wg.Wait()
            fmt.Println("hello, world in goroutine main")
        }
  6. Build and run

        $ go env
        GOARCH="amd64"
        GOOS="linux"

        $ go version
        go version go1.11 linux/amd64

        # Build in a debugger-friendly way (optimizations and inlining disabled)
        $ go build -gcflags '-N -l'

        # Run under gdb
        $ GOMAXPROCS=1 gdb work
  7. Entry point

        (gdb) info file
        Symbols from "/home/vagrant/go/src/work/work".
        Local exec file:
            `/home/vagrant/go/src/work/work', file type elf64-x86-64.
            Entry point: 0x451f70
            0x0000000000401000 - 0x0000000000486c79 is .text
            0x0000000000487000 - 0x00000000004cb415 is .rodata
            0x00000000004cb5c0 - 0x00000000004cc124 is .typelink
            0x00000000004cc128 - 0x00000000004cc168 is .itablink
            0x00000000004cc168 - 0x00000000004cc168 is .gosymtab
            0x00000000004cc180 - 0x0000000000539e8f is .gopclntab
            0x000000000053a000 - 0x0000000000546b9c is .noptrdata
            0x0000000000546ba0 - 0x000000000054d850 is .data
            0x000000000054d860 - 0x0000000000569ef0 is .bss
            0x0000000000569f00 - 0x000000000056c638 is .noptrbss
            0x0000000000400f9c - 0x0000000000401000 is .note.go.buildid
  8. Entry point -> _rt0_amd64 -> runtime.rt0_go

        (gdb) x 0x451f70
        0x451f70 <_rt0_amd64_linux>: 0xffc6cbe9
        (gdb) disas 0x451f70
        Dump of assembler code for function _rt0_amd64_linux:
           0x0000000000451f70 <+0>: jmpq 0x44e640 <_rt0_amd64>
        End of assembler dump.

        Dump of assembler code for function _rt0_amd64:
           0x000000000044e640 <+0>: mov (%rsp),%rdi
           0x000000000044e644 <+4>: lea 0x8(%rsp),%rsi
           0x000000000044e649 <+9>: jmpq 0x44e650 <runtime.rt0_go>
        End of assembler dump.
  9. runtime.rt0_go

        (gdb) disas 0x44e650
        Dump of assembler code for function runtime.rt0_go:
           (snip)
           0x000000000044e760 <+272>: callq 0x4250a0 <runtime.osinit>
           0x000000000044e765 <+277>: callq 0x429890 <runtime.schedinit>
           0x000000000044e76a <+282>: lea 0x7af4f(%rip),%rax # 0x4c96c0 <runtime.mainPC>
           0x000000000044e771 <+289>: push %rax
           0x000000000044e772 <+290>: pushq $0x0
           0x000000000044e774 <+292>: callq 0x430240 <runtime.newproc>
           0x000000000044e779 <+297>: pop %rax
           0x000000000044e77a <+298>: pop %rax
           0x000000000044e77b <+299>: callq 0x42b6f0 <runtime.mstart>
           (snip)

    • Code: runtime/asm_amd64.s
    • runtime.osinit -> runtime.schedinit -> runtime.newproc(0, runtime.mainPC) -> runtime.mstart
  10. runtime.osinit

        func osinit() {
            ncpu = getproccount()
        }

    • Code: runtime/os_linux.go
    • Gets the number of logical CPU cores (= runtime.NumCPU).
  11. runtime.schedinit

    • Code: runtime/proc.go
    • Overrides procs with GOMAXPROCS if it is set.
    • Initializes the new Ps.

        (snip)
        procs := ncpu
        if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
            procs = n
        }
        if procresize(procs) != nil {
            throw("unknown runnable goroutine during bootstrap")
        }
        (snip)
  12. runtime.newproc(0, runtime.mainPC)

        (gdb) x 0x4c96c0
        0x4c96c0 <runtime.mainPC>: 0x004285c0
        (gdb) x 0x004285c0
        0x4285c0 <runtime.main>: 0x0c8b4864

        func newproc(siz int32, fn *funcval) {
            argp := add(unsafe.Pointer(&fn), sys.PtrSize)
            gp := getg()
            pc := getcallerpc()
            systemstack(func() {
                newproc1(fn, (*uint8)(argp), siz, gp, pc)
            })
        }

    • Code: runtime/proc.go
    • runtime.mainPC is a label for runtime.main.
  13. runtime.newproc1

    • Code: runtime/proc.go
    • Creates a new "g" (for runtime.main) and puts it on the queue.

        func newproc1(fn *funcval, argp *uint8, narg int32, callergp *g, callerpc uintptr) {
            (snip)
            _g_ := getg()
            (snip)
            _p_ := _g_.m.p.ptr()
            newg := gfget(_p_)
            (snip)
            gostartcallfn(&newg.sched, fn)
            (snip)
            runqput(_p_, newg, true)
            (snip)
        }
  14. New "g"

    [Diagram: stack of the new g (2048 bytes) drawn from High to Low addresses, marking stktopsp, sched.sp, stackguard and the StackGuard area; the addresses were lost in extraction]

    Sched (type gobuf)

        type gobuf struct {
            sp   uintptr        // 824633919448 -> 0xc0000307d8
            pc   uintptr        // 4359616 -> 0x4285c0 <runtime.main>
            g    guintptr       // newg
            ctxt unsafe.Pointer // 0x4c96c0 <runtime.mainPC>
            ret  sys.Uintreg    // 0
            lr   uintptr        // 0
            bp   uintptr        // 0
        }

    • The runtime allocates a 2 kB stack for the "g".
    • The "g" has a gobuf named "sched".
    • "sched" holds the goroutine's state (e.g. stack pointer, program counter, ...).
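A 2 kB starting stack is only workable because the runtime grows it on demand. The sketch below is an illustration, not code from the deck: the function names are hypothetical, and the 2 kB figure is the one reported on this slide for go1.11. It recurses far deeper inside a goroutine than 2 kB could hold and still returns normally.

```go
package main

import "fmt"

// depth recurses n levels. Each frame carries a small buffer, so the
// total stack needed is far larger than the initial allocation.
func depth(n int) int {
	var pad [64]byte
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0]) // 0 at the deepest level
	}
	return depth(n-1) + 1
}

// deepInGoroutine (hypothetical helper) runs the recursion on a freshly
// created goroutine, which starts with a small stack.
func deepInGoroutine(n int) int {
	result := make(chan int)
	go func() { result <- depth(n) }()
	return <-result
}

func main() {
	fmt.Println("levels reached:", deepInGoroutine(10000))
}
```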
  15. runtime.mstart -> runtime.mstart1 -> runtime.schedule

        func mstart() {
            (snip)
            mstart1()
            (snip)
        }

        func mstart1() {
            (snip)
            schedule()
        }

        func schedule() {
            (snip)
            gp, inheritTime = runqget(_g_.m.p.ptr())
            (snip)
            execute(gp, inheritTime)
        }

    • Code: runtime/proc.go
    • Gets the "g" (for runtime.main) and executes it.
  16. runtime.execute

        func execute(gp *g, inheritTime bool) {
            (snip)
            gogo(&gp.sched)
        }

    • Code: runtime/proc.go
    • Calls runtime.gogo with "g".sched.
  17. runtime.gogo

        // void gogo(Gobuf*)
        // restore state from Gobuf; longjmp
        TEXT runtime·gogo(SB), NOSPLIT, $16-8
            MOVQ buf+0(FP), BX
            MOVQ gobuf_g(BX), DX
            MOVQ 0(DX), CX
            get_tls(CX)
            MOVQ DX, g(CX)
            MOVQ gobuf_sp(BX), SP
            MOVQ gobuf_ret(BX), AX
            MOVQ gobuf_ctxt(BX), DX
            MOVQ gobuf_bp(BX), BP
            MOVQ $0, gobuf_sp(BX)
            MOVQ $0, gobuf_ret(BX)
            MOVQ $0, gobuf_ctxt(BX)
            MOVQ $0, gobuf_bp(BX)
            MOVQ gobuf_pc(BX), BX
            JMP BX

        (gdb) i registers
        rax 0x0          0              // <- gobuf.ret
        rbx 0x4285c0     4359616        // <- gobuf.pc # runtime.main
        rcx 0xc000030000 824633917440   // <- g
        rdx 0x4c96c0     5019328        // <- gobuf.ctxt # runtime.mainPC
        rsi 0x0          0
        rdi 0x1          1
        rbp 0x0          0x0            // <- gobuf.bp
        rsp 0xc0000307d8 0xc0000307d8   // <- gobuf.sp
        r8  0x1c6        454
        r9  0x0          0
        r10 0x8          8
        r11 0x202        514
        r12 0xffffffffffffffff -1
        r13 0x2          2
        r14 0x1          1
        r15 0x400        1024
        (snip)

    • Code: runtime/asm_amd64.s
    • JMP to the address in the BX register (= gobuf.pc = runtime.main).
  18. Current status

    • The runtime creates a new "g" (for runtime.main) and puts it on the queue.
    • The runtime gets the "g" (for runtime.main) and executes it.
  19. runtime.main (i)

        func main() {
            (snip)
            systemstack(func() {
                newm(sysmon, nil)
            })
            (snip)
            runtime_init()
            (snip)
            gcenable()
            (snip)
            …

    • newm(sysmon, nil) calls the clone system call with the sysmon function (creates a thread).
    • runtime.init() creates the forcegc helper goroutine (runtime.forcegchelper).
    • gcenable() creates the background sweeper goroutine (runtime.bgsweep).
  20. runtime.main (ii)

    • main.init() (-> fmt.init -> os.init…) creates the finalizer goroutine (runtime.runfinq).

        fn := main_init
        fn()

        $ objdump -S work
        (snip)
        0x0000000000428774 <+436>: lea 0x94a25(%rip),%rdx # 0x4bd1a0
        0x000000000042877b <+443>: callq *%rax
        (snip)

        (gdb) x 0x4bd1a0
        0x4bd1a0: 0x00486c10
        (gdb) disas 0x00486c10
        Dump of assembler code for function main.init:
  21. runtime.main (iii)

    • The runtime finally calls main.main()!

        fn := main_main
        fn()

        $ objdump -S work
        (snip)
        0x00000000004287be <+510>: lea 0x949e3(%rip),%rdx # 0x4bd1a8
        0x00000000004287c5 <+517>: callq *%rax
        (snip)

        (gdb) x 0x4bd1a8
        0x4bd1a8: 0x00486ad0
        (gdb) disas 0x00486ad0
        Dump of assembler code for function main.main:
  22. main.main

        (gdb) disas 0x00486ad0
        Dump of assembler code for function main.main:
           (snip)
           0x0000000000486b31 <+97>:  mov 0x38(%rsp),%rax
           0x0000000000486b36 <+102>: mov %rax,0x18(%rsp)
           0x0000000000486b3b <+107>: movq $0x1,0x10(%rsp)
           0x0000000000486b44 <+116>: movl $0x10,(%rsp)
           0x0000000000486b4b <+123>: lea 0x36436(%rip),%rax # 0x4bcf88
           0x0000000000486b52 <+130>: mov %rax,0x8(%rsp)
           0x0000000000486b57 <+135>: callq 0x430240 <runtime.newproc>
           0x0000000000486b5c <+140>: mov 0x38(%rsp),%rax
           0x0000000000486b61 <+145>: mov %rax,0x18(%rsp)
           0x0000000000486b66 <+150>: movq $0x2,0x10(%rsp)
           0x0000000000486b6f <+159>: movl $0x10,(%rsp)
           0x0000000000486b76 <+166>: lea 0x3640b(%rip),%rax # 0x4bcf88
           0x0000000000486b7d <+173>: mov %rax,0x8(%rsp)
           0x0000000000486b82 <+178>: callq 0x430240 <runtime.newproc>
           0x0000000000486b87 <+183>: mov 0x38(%rsp),%rax
           0x0000000000486b8c <+188>: mov %rax,(%rsp)
           0x0000000000486b90 <+192>: callq 0x45fc10 <sync.(*WaitGroup).Wait>
           (snip)
  23. main.main

    go work(1, &wg) =

        0x0000000000486b4b <+123>: lea 0x36436(%rip),%rax # 0x4bcf88
        0x0000000000486b52 <+130>: mov %rax,0x8(%rsp)
        0x0000000000486b57 <+135>: callq 0x430240 <runtime.newproc>

    • "go work(1, &wg)" becomes a call to runtime.newproc(siz, main.work) at build time. In the disassembly of main.main, siz is 0x10: the 16 bytes of arguments (the int and the *sync.WaitGroup).

        (gdb) x 0x4bcf88
        0x4bcf88: 0x004869a0
        (gdb) disas 0x004869a0
        Dump of assembler code for function main.work:
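Because the arguments are handed to `runtime.newproc` at the `go` statement, they are evaluated in the calling goroutine at that point, not when the new goroutine eventually runs. A small sketch of the visible consequence (the function name is hypothetical; the behaviour follows from the Go language specification for go statements):

```go
package main

import "fmt"

// valueSeenByGoroutine (hypothetical helper) shows that the argument of a
// go statement is captured when the statement executes.
func valueSeenByGoroutine() (seen, after int) {
	x := 1
	got := make(chan int)
	go func(v int) { got <- v }(x) // x is evaluated here; the value 1 is copied
	x = 2                          // does not affect the copy passed above
	return <-got, x
}

func main() {
	seen, after := valueSeenByGoroutine()
	fmt.Println("goroutine saw:", seen, "caller now has:", after)
}
```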
  24. Current status

    • runtime.main creates a thread for sysmon.
    • runtime.main creates three goroutines.
    • runtime.main calls main.main.
    • main.main creates goroutines for main.work using runtime.newproc.
  25. Switching goroutines

    • Goroutines have to be switched because G >> M > P.
    • The scheduler switches away from:
      • long-running goroutines.
      • goroutines running a system call (I/O based).
      • goroutines waiting on networking, channel receive/send, time.Sleep, etc.
      • goroutines that exit.
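Switching on channel operations is the easiest case to observe. The following sketch is an illustration under stated assumptions (the helper name `pingPong` is hypothetical): with a single P, two goroutines exchange values over unbuffered channels, so each send or receive that cannot proceed parks the current goroutine and lets the other one run.

```go
package main

import (
	"fmt"
	"runtime"
)

// pingPong (hypothetical helper) records the order in which the main
// goroutine and a worker run when they hand off via unbuffered channels.
func pingPong(rounds int) []string {
	prev := runtime.GOMAXPROCS(1) // a single P, so only one goroutine runs at a time
	defer runtime.GOMAXPROCS(prev)

	ping, pong := make(chan int), make(chan int)
	var trace []string // accesses are ordered by the channel operations

	go func() {
		for n := range ping {
			trace = append(trace, fmt.Sprintf("worker %d", n))
			pong <- n
		}
	}()

	for i := 0; i < rounds; i++ {
		ping <- i // main blocks here until the worker receives
		<-pong    // and here until the worker answers
		trace = append(trace, fmt.Sprintf("main %d", i))
	}
	close(ping)
	return trace
}

func main() {
	for _, line := range pingPong(2) {
		fmt.Println(line)
	}
}
```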
  26. Detect long running goroutines

        func sysmon() {
            (snip)
            for {
                (snip)
                // retake P's blocked in syscalls
                // and preempt long running G's
                if retake(now) != 0 {
                    idle = 0
                } else {
                    idle++
                }
                (snip)
            }
        }

        func retake(now int64) uint32 {
            (snip)
            for i := 0; i < len(allp); i++ {
                (snip)
                if s == _Psyscall {
                    (snip)
                } else if s == _Prunning {
                    (snip)
                    preemptone(_p_)
                }
            }
            (snip)
        }

    • Code: runtime/proc.go
    • The thread running runtime.sysmon detects long-running goroutines.
  27. Mark long running goroutines

    • Code: runtime/proc.go, runtime/stack.go
    • To make the split-stack check fail, stackguard0 is set to a value greater than any real SP.

        func preemptone(_p_ *p) bool {
            (snip)
            gp.preempt = true
            (snip)
            gp.stackguard0 = stackPreempt
            return true
        }

        const (
            uintptrMask  = 1<<(8*sys.PtrSize) - 1
            stackPreempt = uintptrMask & -1314
        )
        // p/x 18446744073709550302
        // -> 0xfffffffffffffade

    [Diagram: the goroutine stack (stktop, Stack, StackGuard) with stackPreempt lying above the top of the stack]
  28. Switch long running goroutines

        (gdb) disas 0x04869a0
        Dump of assembler code for function main.work:
           (snip)
           0x00000000004869a9 <+9>:  lea -0x10(%rsp),%rax
           0x00000000004869ae <+14>: cmp 0x10(%rcx),%rax
           0x00000000004869b2 <+18>: jbe 0x486abb <main.work+283>
           (snip)
           0x0000000000486abb <+283>: callq 0x44e9d0 <runtime.morestack_noctxt>

        (gdb) i registers
        rcx 0xc000030000 824633917440   // <- g
        rsp 0xc0000307d8 0xc0000307d8   // <- gobuf.sp

    [Diagram: layout of g (rcx): stack.lo, stack.hi, then stackguard0 at offset 0x10 (16 bytes)]

    • The compiler injects a stack check into each function.
    • If the stack pointer is below or equal to stackguard0, execution jumps to runtime.morestack_noctxt.
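The arithmetic behind `stackPreempt` and the effect it has on the injected check can be reproduced in ordinary Go. This is a simplified model, not runtime code: `ptrSize = 8` assumes a 64-bit target such as the linux/amd64 build used in the slides, `mustCallMorestack` is a hypothetical name, and the real check also subtracts a frame-dependent offset from SP before comparing.

```go
package main

import "fmt"

const (
	ptrSize      = 8 // assumption: 64-bit target (linux/amd64)
	uintptrMask  = 1<<(8*ptrSize) - 1
	stackPreempt = uintptrMask & -1314 // same expression as in runtime/stack.go
)

// stackPreemptHex formats the constant the way gdb's p/x would.
func stackPreemptHex() string {
	return fmt.Sprintf("%#x", uint64(stackPreempt))
}

// mustCallMorestack (hypothetical) models the injected check:
// jump to morestack when SP <= stackguard0 (the jbe in the disassembly).
func mustCallMorestack(sp, stackguard0 uint64) bool {
	return sp <= stackguard0
}

func main() {
	const sp = 0xc0000307d8 // SP value shown in the register dump
	fmt.Println("stackPreempt =", stackPreemptHex())
	fmt.Println("ordinary guard below SP:", mustCallMorestack(sp, sp-1024))
	fmt.Println("guard set to stackPreempt:", mustCallMorestack(sp, stackPreempt))
}
```

With an ordinary guard the check passes and the function body runs; once the guard is `stackPreempt`, every real SP compares as below it, so the next function call is diverted into `morestack`.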
  29. Switch long running goroutines

        TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
            MOVL $0, DX
            JMP runtime·morestack(SB)

        TEXT runtime·morestack(SB),NOSPLIT,$0-0
            (snip)
            CALL runtime·newstack(SB)

        func newstack() {
            (snip)
            preempt := atomic.Loaduintptr(&gp.stackguard0) == stackPreempt
            (snip)
            if preempt {
                (snip)
                gopreempt_m(gp) // never return
            }
        }

    • Code: runtime/asm_amd64.s, runtime/stack.go
  30. Switch long running goroutines

    • Code: runtime/proc.go

        func gopreempt_m(gp *g) {
            (snip)
            goschedImpl(gp)
        }

        func goschedImpl(gp *g) {
            (snip)
            casgstatus(gp, _Grunning, _Grunnable)
            dropg()
            lock(&sched.lock)
            globrunqput(gp)
            unlock(&sched.lock)
            schedule()
        }

    • The scheduler preempts the long-running goroutine and puts it on the global run queue.
    • After that, the scheduler finds a runnable goroutine by calling runtime.schedule.
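The effect of putting a preempted goroutine on the global run queue can be illustrated with a toy model. Everything in this sketch is hypothetical (the types `g` and `toySched` only borrow the runtime's vocabulary), and it deliberately simplifies: the real `schedule` also polls the global queue periodically for fairness and can steal work from other Ps.

```go
package main

import "fmt"

// g is a toy goroutine descriptor, not the runtime's g.
type g struct {
	id     int
	status string
}

// toySched models one P's local run queue plus the global run queue.
type toySched struct {
	local  []*g
	global []*g
}

// preempt mirrors the shape of goschedImpl: mark runnable, enqueue globally.
func (s *toySched) preempt(gp *g) {
	gp.status = "runnable"
	s.global = append(s.global, gp)
}

// schedule picks the next goroutine: local queue first, then global.
func (s *toySched) schedule() *g {
	var gp *g
	switch {
	case len(s.local) > 0:
		gp, s.local = s.local[0], s.local[1:]
	case len(s.global) > 0:
		gp, s.global = s.global[0], s.global[1:]
	default:
		return nil
	}
	gp.status = "running"
	return gp
}

// runOrder preempts goroutine 1 while 2 and 3 wait in the local queue.
func runOrder() []int {
	running := &g{id: 1, status: "running"}
	s := &toySched{local: []*g{{id: 2, status: "runnable"}, {id: 3, status: "runnable"}}}
	s.preempt(running)

	var order []int
	for gp := s.schedule(); gp != nil; gp = s.schedule() {
		order = append(order, gp.id)
	}
	return order
}

func main() {
	fmt.Println("run order after preempting g1:", runOrder())
}
```

In this model the preempted goroutine runs again only after the locally queued ones, which is the intent of moving it off the P it was monopolising.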
  31. Mark running system call goroutines

        TEXT ·Syscall(SB),NOSPLIT,$0-56
            CALL runtime·entersyscall(SB)
            MOVQ a1+8(FP), DI
            MOVQ a2+16(FP), SI
            MOVQ a3+24(FP), DX
            MOVQ $0, R10
            MOVQ $0, R8
            MOVQ $0, R9
            MOVQ trap+0(FP), AX // syscall entry
            SYSCALL
            CMPQ AX, $0xfffffffffffff001
            JLS ok
            MOVQ $-1, r1+32(FP)
            MOVQ $0, r2+40(FP)
            NEGQ AX
            MOVQ AX, err+48(FP)
            CALL runtime·exitsyscall(SB)
            RET

    • Code: syscall/asm_linux_amd64.s
  32. Mark running system call goroutines

        func entersyscall() {
            reentersyscall(getcallerpc(), getcallersp())
        }

        func reentersyscall(pc, sp uintptr) {
            (snip)
            save(pc, sp)
            (snip)
            casgstatus(_g_, _Grunning, _Gsyscall)
            (snip)
            atomic.Store(&_g_.m.p.ptr().status, _Psyscall)
            (snip)
        }

    • Code: runtime/proc.go
  33. Detect running system call goroutines

        func sysmon() {
            (snip)
            for {
                (snip)
                // retake P's blocked in syscalls
                // and preempt long running G's
                if retake(now) != 0 {
                    idle = 0
                } else {
                    idle++
                }
                (snip)
            }
        }

        func retake(now int64) uint32 {
            (snip)
            for i := 0; i < len(allp); i++ {
                (snip)
                if s == _Psyscall {
                    (snip)
                    handoffp(_p_)
                } else if s == _Prunning {
                    (snip)
                }
            }
            (snip)
        }

    • Code: runtime/proc.go
    • The thread running runtime.sysmon detects goroutines that are in a system call.
  34. Handoff running system call goroutines

        func startm(_p_ *p, spinning bool) {
            (snip)
            newm(fn, _p_)
            return
            (snip)
        }

        func newm(fn func(), _p_ *p) {
            mp := allocm(_p_, fn)
            (snip)
            newm1(mp)
        }

        func newm1(mp *m) {
            (snip)
            newosproc(mp)
            (snip)
        }

        func newosproc(mp *m) {
            (snip)
            ret := clone(cloneFlags, stk, unsafe.Pointer(mp), unsafe.Pointer(mp.g0), unsafe.Pointer(funcPC(mstart)))
            (snip)
        }

    • Code: runtime/proc.go, runtime/os_linux.go
    • runtime.handoffp calls runtime.startm.
    • runtime.allocm disassociates the p from the current m (runtime.releasep).
    • The p is associated with a new m.
    • The new m is bound to a new thread that starts at runtime.mstart (where the scheduler finds runnable goroutines).
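The hand-off can be observed indirectly from user code. The sketch below is an illustration under assumptions, not the deck's code: it assumes a Unix-like system where `syscall.Pipe`, `syscall.Read` and `syscall.Write` operate on raw file descriptors, and the helper name is hypothetical. With a single P, one goroutine blocks inside a `read` system call; the main goroutine can only make progress (and unblock the reader) if the P is taken away from the blocked thread and given to another one.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// readWhileOthersRun (hypothetical helper) blocks one goroutine in read(2)
// and lets the main goroutine write the data that releases it.
func readWhileOthersRun() (string, error) {
	prev := runtime.GOMAXPROCS(1) // only one P to share
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	type result struct {
		text string
		err  error
	}
	started := make(chan struct{})
	done := make(chan result, 1)

	go func() {
		close(started)
		buf := make([]byte, 16)
		n, err := syscall.Read(fds[0], buf) // raw blocking system call
		if err != nil {
			done <- result{err: err}
			return
		}
		done <- result{text: string(buf[:n])}
	}()

	<-started // the reader is running and about to enter the system call
	if _, err := syscall.Write(fds[1], []byte("wake up")); err != nil {
		return "", err
	}
	r := <-done
	return r.text, r.err
}

func main() {
	text, err := readWhileOthersRun()
	fmt.Println(text, err)
}
```

The raw `syscall` functions are used on purpose: files from the `os` package are usually served by the network poller and would not keep a thread inside the system call.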
  35. Stack for goroutine

        (gdb) info goroutines
          1 waiting runtime.gopark
          2 runnable runtime.forcegchelper
          3 waiting runtime.gopark
          4 runnable runtime.runfinq
          5 runnable main.work
        * 6 running main.work
        (gdb) goroutine 6 bt
        #0 main.work (i=2, wg=0xc0000120c0)
        #1 0x00000000004507d1 in runtime.goexit ()
        (snip)

    [Diagram: the goroutine stack (stktop, Stack, StackGuard) with the return address 0x4507d1 (runtime.goexit) at the top]

    • Code: runtime/proc.go
    • runtime.newproc1 pushes runtime.goexit onto the new "g" stack as the return address.
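The same `runtime.goexit` frame that gdb shows at the bottom of the backtrace can be seen without a debugger. A minimal sketch using the public `runtime.Callers` and `runtime.CallersFrames` APIs (the helper name `bottomFrame` is hypothetical): it walks the call stack inside a goroutine and reports the outermost function.

```go
package main

import (
	"fmt"
	"runtime"
)

// bottomFrame (hypothetical helper) returns the name of the outermost
// function on the stack of a newly started goroutine.
func bottomFrame() string {
	out := make(chan string)
	go func() {
		pcs := make([]uintptr, 32)
		n := runtime.Callers(0, pcs)
		frames := runtime.CallersFrames(pcs[:n])

		last := ""
		for {
			frame, more := frames.Next()
			if frame.Function != "" {
				last = frame.Function
			}
			if !more {
				break
			}
		}
		out <- last
	}()
	return <-out
}

func main() {
	fmt.Println("outermost frame:", bottomFrame())
}
```

When the goroutine's function returns, control therefore lands in `runtime.goexit`, which is what lets the runtime clean up and schedule the next goroutine.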
  36. Exit goroutines

        TEXT runtime·goexit(SB),NOSPLIT,$0-0
            BYTE $0x90 // NOP
            CALL runtime·goexit1(SB)

        func goexit1() {
            (snip)
            mcall(goexit0)
        }

        func goexit0(gp *g) {
            (snip)
            casgstatus(gp, _Grunning, _Gdead)
            (snip)
            dropg()
            (snip)
            gfput(_g_.m.p.ptr(), gp)
            (snip)
            schedule()
        }

    • Code: runtime/asm_amd64.s, runtime/proc.go
    • When a goroutine created by runtime.newproc1 exits, runtime.goexit is called.
    • After that, the scheduler finds a runnable goroutine by calling runtime.schedule.
  37. Summary

    • We understood assembly.
    • We understood how goroutines work by reading the runtime code and assembly.
    • The Go scheduler turns I/O-blocking tasks into CPU-bound tasks, so it seems a good match for the platinum searcher and for servers.
    • At the same time, using too many goroutines that make system calls becomes a bottleneck.
    • Therefore, I should review the algorithm "and" the use of goroutines.
  38. See also

    • Scalable Go Scheduler Design Doc
      • https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit
    • A Quick Guide to Go's Assembler
      • https://golang.org/doc/asm
    • Debugging Go Code with GDB
      • https://golang.org/doc/gdb