Road to your goroutines

monochromegane

October 04, 2018

Transcript

  1. - My summer vacation -
    Yusuke Miyake (三宅悠介) / Pepabo R&D Institute, GMO Pepabo, Inc.
    2018.10.04 Fukuoka.go#12
    Road to your goroutines

  2. Principal Engineer
    Yusuke Miyake (三宅悠介) @monochromegane
    Pepabo R&D Institute, GMO Pepabo, Inc.
    https://blog.monochromegane.com

  3. Agenda
    1. Motivation
    2. Overview of goroutines and scheduler
    3. Road to your goroutines

  4. 1. Motivation

  5. Motivation
    • “The platinum searcher” v3 is now under development.
    • v3 should become faster through a review of both the algorithm and the way it uses goroutines.
    • So I decided to spend my summer vacation learning goroutines in more depth.
    • Go

  6. 2. Overview of goroutines and scheduler

  7. Overview
    [Figure: G, M, P, LocalRunQueue, GlobalRunQueue and runtime.sysmon, with the steps schedule, execute, G.fn and goexit]
    • G: Goroutine
    • M: OS thread
    • P: Processor (scheduling context)
    • The runtime provides an M:N scheduler (many goroutines multiplexed onto several threads).
    • Each P has a local run queue.
    • We can think of goroutines as application-level threads.
    • An OS thread behaves like a worker that takes goroutines from a run queue and runs them.
    Ref: https://speakerdeck.com/retervision/go-runtime-scheduler
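
The G/P relationship above can be observed from user code. The following is a minimal sketch, not part of the talk: the helper name `spawnParked` is made up for illustration, and it only relies on `runtime.NumGoroutine`, `runtime.GOMAXPROCS(0)` (which queries the number of P's without changing it) and `runtime.NumCPU`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnParked starts n goroutines that block on a channel. It returns the
// goroutine count observed right after they were created and a function
// that releases them and waits for them to finish.
func spawnParked(n int) (int, func()) {
	release := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-release // a waiting G occupies neither an M nor a P
		}()
	}
	count := runtime.NumGoroutine()
	return count, func() {
		close(release)
		wg.Wait()
	}
}

func main() {
	count, done := spawnParked(1000)
	defer done()
	fmt.Println("goroutines (G):", count)
	fmt.Println("processors (P):", runtime.GOMAXPROCS(0))
	fmt.Println("logical CPUs:  ", runtime.NumCPU())
}
```

On a typical machine the first number is in the thousands while the other two stay in the single or double digits, which is the M:N idea in practice.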

  8. 3. Road to your goroutines

  9. Roadmap
    _rt0_amd64_linux
    └── _rt0_amd64
        └── runtime.rt0_go
            ├── runtime.osinit
            ├── runtime.schedinit
            ├── runtime.newproc(runtime.main)
            │   └── runtime.newproc1
            │       └── runtime.runqput ---------- # Create runtime.main goroutine
            ├── runtime.mstart
            │   └── runtime.mstart1
            │       └── runtime.schedule --------- # Fetch runtime.main goroutine
            │           └── runtime.execute
            │               └── runtime.gogo
            │                   └── runtime.main
            │                       ├── runtime.newm(runtime.sysmon) # Create thread for sysmon
            │                       ├── runtime.init # Create forcegc helper goroutine
            │                       ├── runtime.gcenable # Create background sweeper goroutine
            │                       ├── main.init # Create finalizer goroutine
            │                       ├── main.main
            │                       │   ├── runtime.newproc(main.work-1)
            │                       │   │   └── runtime.newproc1
            │                       │   │       └── runtime.runqput # Create your goroutine
            │                       │   │
            │                       │   └── runtime.newproc(main.work-2)
            │                       │       └── runtime.newproc1
            │                       │           └── runtime.runqput # Create your goroutine
            │                       └── exit(0)
            └── runtime.mexit

  10. Target application
    package main
    import (
        "fmt"
        "sync"
    )
    // work prints a greeting and signals completion through wg.
    func work(i int, wg *sync.WaitGroup) {
        fmt.Printf("hello, world in goroutine %d\n", i)
        wg.Done()
    }
    func main() {
        var wg sync.WaitGroup
        wg.Add(2) // two worker goroutines
        go work(1, &wg)
        go work(2, &wg)
        wg.Wait() // block until both workers have called Done
        fmt.Println("hello, world in goroutine main")
    }

  11. Build and run
    $ go env
    GOARCH="amd64"
    GOOS="linux"
    $ go version
    go version go1.11 linux/amd64
    # Build without optimizations (-N) and inlining (-l) so the binary is easier to debug
    $ go build -gcflags '-N -l'
    # Run under gdb
    $ GOMAXPROCS=1 gdb work

  12. Entry point
    (gdb) info file
    Symbols from "/home/vagrant/go/src/work/work".
    Local exec file:
    `/home/vagrant/go/src/work/work', file type elf64-x86-64.
    Entry point: 0x451f70
    0x0000000000401000 - 0x0000000000486c79 is .text
    0x0000000000487000 - 0x00000000004cb415 is .rodata
    0x00000000004cb5c0 - 0x00000000004cc124 is .typelink
    0x00000000004cc128 - 0x00000000004cc168 is .itablink
    0x00000000004cc168 - 0x00000000004cc168 is .gosymtab
    0x00000000004cc180 - 0x0000000000539e8f is .gopclntab
    0x000000000053a000 - 0x0000000000546b9c is .noptrdata
    0x0000000000546ba0 - 0x000000000054d850 is .data
    0x000000000054d860 - 0x0000000000569ef0 is .bss
    0x0000000000569f00 - 0x000000000056c638 is .noptrbss
    0x0000000000400f9c - 0x0000000000401000 is .note.go.buildid

  13. Entry point -> _rt0_amd64 -> runtime.rt0_go
    (gdb) x 0x451f70
    0x451f70 <_rt0_amd64_linux>: 0xffc6cbe9
    (gdb) disas 0x451f70
    Dump of assembler code for function _rt0_amd64_linux:
    0x0000000000451f70 <+0>: jmpq 0x44e640 <_rt0_amd64>
    End of assembler dump.
    Dump of assembler code for function _rt0_amd64:
    0x000000000044e640 <+0>: mov (%rsp),%rdi
    0x000000000044e644 <+4>: lea 0x8(%rsp),%rsi
    0x000000000044e649 <+9>: jmpq 0x44e650
    End of assembler dump.
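
The word `0xffc6cbe9` printed by `x 0x451f70` already encodes the jump shown by `disas`. As a worked check, the sketch below decodes a 5-byte `jmp rel32` (opcode `0xe9`) by hand. The helper `jmpTarget` is hypothetical, and the fifth instruction byte is not visible on the slide; `0xff` is assumed here because the displacement is negative.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// jmpTarget returns the destination of a 5-byte x86-64 "jmp rel32"
// (opcode 0xe9) located at addr. The displacement is relative to the
// address of the next instruction, addr+5.
func jmpTarget(addr uint64, insn [5]byte) (uint64, bool) {
	if insn[0] != 0xe9 {
		return 0, false
	}
	rel := int32(binary.LittleEndian.Uint32(insn[1:]))
	return uint64(int64(addr) + 5 + int64(rel)), true
}

func main() {
	// gdb printed the word 0xffc6cbe9 at 0x451f70, i.e. the bytes
	// e9 cb c6 ff in memory order. The last byte (0xff) is an assumption.
	insn := [5]byte{0xe9, 0xcb, 0xc6, 0xff, 0xff}
	target, ok := jmpTarget(0x451f70, insn)
	fmt.Printf("%#x %v\n", target, ok) // 0x44e640 true
}
```

The result matches the `jmpq 0x44e640 <_rt0_amd64>` line in the disassembly.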

  14. runtime.rt0_go
    (gdb) disas 0x44e650
    Dump of assembler code for function runtime.rt0_go:
    (snip)
    0x000000000044e760 <+272>: callq 0x4250a0
    0x000000000044e765 <+277>: callq 0x429890
    0x000000000044e76a <+282>: lea 0x7af4f(%rip),%rax # 0x4c96c0
    0x000000000044e771 <+289>: push %rax
    0x000000000044e772 <+290>: pushq $0x0
    0x000000000044e774 <+292>: callq 0x430240
    0x000000000044e779 <+297>: pop %rax
    0x000000000044e77a <+298>: pop %rax
    0x000000000044e77b <+299>: callq 0x42b6f0
    (snip)
    • Code: runtime/asm_amd64.s
    • runtime.osinit -> runtime.schedinit -> runtime.newproc(0, runtime.mainPC) ->
    runtime.mstart

  15. runtime.osinit
    func osinit() {
        ncpu = getproccount()
    }
    • Code: runtime/os_linux.go
    • Gets the number of logical CPU cores (the value that runtime.NumCPU returns).

  16. runtime.schedinit
    • Code: runtime/proc.go
    • Overrides procs with GOMAXPROCS if it is set to a positive integer.
    • Initializes the new P's.
    (snip)
    procs := ncpu
    if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
        procs = n
    }
    if procresize(procs) != nil {
        throw("unknown runnable goroutine during bootstrap")
    }
    (snip)
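
The decision in this excerpt can be replayed in user code. This is a sketch under stated assumptions: `procsFor` is an illustrative helper, not a runtime function, and it parses with `strconv.Atoi` where the runtime uses its own `atoi32`. Newer Go releases may apply further limits to the default, so the two printed numbers are not guaranteed to agree everywhere.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// procsFor mirrors the logic shown on the slide: start from the CPU count
// and let a positive integer in GOMAXPROCS override it.
func procsFor(env string, ncpu int) int {
	procs := ncpu
	if n, err := strconv.Atoi(env); err == nil && n > 0 {
		procs = n
	}
	return procs
}

func main() {
	want := procsFor(os.Getenv("GOMAXPROCS"), runtime.NumCPU())
	fmt.Println("expected P's:", want)
	fmt.Println("actual P's:  ", runtime.GOMAXPROCS(0)) // 0 only queries
}
```

Running it as `GOMAXPROCS=1 ./prog` reproduces the single-P setup used for the gdb session in this deck.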

  17. runtime.newproc(0, runtime.mainPC)
    (gdb) x 0x4c96c0
    0x4c96c0 <runtime.mainPC>: 0x004285c0
    (gdb) x 0x004285c0
    0x4285c0 <runtime.main>: 0x0c8b4864
    func newproc(siz int32, fn *funcval) {
        argp := add(unsafe.Pointer(&fn), sys.PtrSize)
        gp := getg()
        pc := getcallerpc()
        systemstack(func() {
            newproc1(fn, (*uint8)(argp), siz, gp, pc)
        })
    }
    • Code: runtime/proc.go
    • runtime.mainPC is a label whose value is the address of runtime.main.

  18. runtime.newproc1
    • Code: runtime/proc.go
    • Creates a new “g” (for runtime.main) and puts it on the run queue.
    func newproc1(fn *funcval, argp *uint8, narg int32, callergp *g, callerpc uintptr) {
        (snip)
        _g_ := getg()
        (snip)
        _p_ := _g_.m.p.ptr()
        newg := gfget(_p_)
        (snip)
        gostartcallfn(&newg.sched, fn)
        (snip)
        runqput(_p_, newg, true)
        (snip)
    }

  19. New “g”
    [Figure: stack of the new “g” (2048 bytes), drawn from High to Low addresses, with the labels stktop, stktopsp, sched.sp, Stack and Stackguard]
    Sched (type gobuf)
    type gobuf struct {
        sp   uintptr        // 824633919448 -> 0xc0000307d8
        pc   uintptr        // 4359616 -> 0x4285c0
        g    guintptr       // newg
        ctxt unsafe.Pointer // 0x4c96c0
        ret  sys.Uintreg    // 0
        lr   uintptr        // 0
        bp   uintptr        // 0
    }
    • The runtime allocates a 2 kB stack for the new “g”.
    • “g” has a gobuf field named “sched”.
    • “sched” holds the goroutine's saved state (e.g. stack pointer, program counter…).
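
A 2 kB stack would not get far on its own, so the runtime grows it on demand. The sketch below is not from the talk: the function name `depth` and the buffer size are arbitrary choices, and the initial stack size may differ in other Go versions. It recurses far deeper than 2 kB allows inside a fresh goroutine and still completes.

```go
package main

import "fmt"

// depth recurses n levels and returns n. Each frame uses a 256-byte
// buffer, so the stack needed is far beyond a goroutine's initial size.
func depth(n int) int {
	var pad [256]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0]) - 1 // 0
	}
	return depth(n-1) + int(pad[n%len(pad)]) // +1 per level
}

func main() {
	result := make(chan int)
	go func() { result <- depth(100000) }() // starts on a small stack
	fmt.Println(<-result)                   // 100000
}
```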

  20. runtime.mstart -> runtime.mstart1 -> runtime.schedule
    func mstart() {
        (snip)
        mstart1()
        (snip)
    }
    func mstart1() {
        (snip)
        schedule()
    }
    func schedule() {
        (snip)
        gp, inheritTime = runqget(_g_.m.p.ptr())
        (snip)
        execute(gp, inheritTime)
    }
    • Code: runtime/proc.go
    • Gets the “g” (for runtime.main) from the run queue and executes it.
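
The put/get pair seen so far (`runqput(_p_, newg, true)` in newproc1, `runqget` in schedule) can be modeled with a toy queue. This is a simplified sketch, not the runtime code: the types `g` and `p` are stand-ins, the slice replaces the runtime's fixed-size ring buffer, and the handling of `next=true` reflects my reading of the Go 1.11 sources.

```go
package main

import "fmt"

// g and p are toy stand-ins for the runtime types of the same name.
type g struct{ name string }

type p struct {
	runnext *g   // goroutine to run next, ahead of the queue
	runq    []*g // FIFO queue
}

// runqput enqueues gp. With next=true, gp takes the runnext slot and the
// previous occupant, if any, is moved to the tail of the queue.
func (pp *p) runqput(gp *g, next bool) {
	if next {
		gp, pp.runnext = pp.runnext, gp
		if gp == nil {
			return
		}
	}
	pp.runq = append(pp.runq, gp)
}

// runqget returns the next goroutine to run, or nil if there is none.
func (pp *p) runqget() *g {
	if gp := pp.runnext; gp != nil {
		pp.runnext = nil
		return gp
	}
	if len(pp.runq) == 0 {
		return nil
	}
	gp := pp.runq[0]
	pp.runq = pp.runq[1:]
	return gp
}

func main() {
	var pp p
	pp.runqput(&g{name: "work-1"}, true)
	pp.runqput(&g{name: "work-2"}, true)
	for gp := pp.runqget(); gp != nil; gp = pp.runqget() {
		fmt.Println(gp.name)
	}
	// Prints:
	// work-2
	// work-1
}
```

In this model the goroutine created last runs first, which would fit the later gdb output where main.work with i=2 is running while the other one is still runnable. Treat that ordering as an implementation detail, not a guarantee.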

  21. runtime.execute
    func execute(gp *g, inheritTime bool) {
        (snip)
        gogo(&gp.sched)
    }
    • Code: runtime/proc.go
    • Calls runtime.gogo with the “sched” field of the “g”.

  22. runtime.gogo
    // void gogo(Gobuf*)
    // restore state from Gobuf; longjmp
    TEXT runtime·gogo(SB), NOSPLIT, $16-8
    MOVQ buf+0(FP), BX
    MOVQ gobuf_g(BX), DX
    MOVQ 0(DX), CX
    get_tls(CX)
    MOVQ DX, g(CX)
    MOVQ gobuf_sp(BX), SP
    MOVQ gobuf_ret(BX), AX
    MOVQ gobuf_ctxt(BX), DX
    MOVQ gobuf_bp(BX), BP
    MOVQ $0, gobuf_sp(BX)
    MOVQ $0, gobuf_ret(BX)
    MOVQ $0, gobuf_ctxt(BX)
    MOVQ $0, gobuf_bp(BX)
    MOVQ gobuf_pc(BX), BX
    JMP BX
    (gdb) i registers
    rax 0x0 0 // <- gobuf.ret
    rbx 0x4285c0 4359616 // <- gobuf.pc # runtime.main
    rcx 0xc000030000 824633917440 // <- g
    rdx 0x4c96c0 5019328 // <- gobuf.ctxt # runtime.mainPC
    rsi 0x0 0
    rdi 0x1 1
    rbp 0x0 0x0 // <- gobuf.bp
    rsp 0xc0000307d8 0xc0000307d8 // <- gobuf.sp
    r8 0x1c6 454
    r9 0x0 0
    r10 0x8 8
    r11 0x202 514
    r12 0xffffffffffffffff -1
    r13 0x2 2
    r14 0x1 1
    r15 0x400 1024
    (snip)
    • Code: runtime/asm_amd64.s
    • Jumps to the address held in BX (= gobuf.pc = runtime.main).

  23. Current status
    • The runtime creates a new “g” (for runtime.main) and puts it on the run queue.
    • The runtime then gets that “g” and executes it.

  24. runtime.main (i)
    func main() {
        (snip)
        systemstack(func() {
            newm(sysmon, nil)
        })
        (snip)
        runtime_init()
        (snip)
        gcenable()
        (snip)
    • newm(sysmon, nil) issues the clone system call with the sysmon function, i.e. it creates a thread.
    • runtime.init() creates the forcegc helper goroutine (runtime.forcegchelper).
    • gcenable() creates the background sweeper goroutine (runtime.bgsweep).

  25. runtime.main (ii)
    • main.init() (-> fmt.init -> os.init…) creates the finalizer goroutine (runtime.runfinq).
    fn := main_init
    fn()
    (snip)
    0x0000000000428774 <+436>: lea 0x94a25(%rip),%rdx # 0x4bd1a0
    0x000000000042877b <+443>: callq *%rax
    (snip)
    $ objdump -S work
    (gdb) x 0x4bd1a0
    0x4bd1a0: 0x00486c10
    (gdb) disas 0x00486c10
    Dump of assembler code for function main.init:

  26. runtime.main (iii)
    • Finally, the runtime calls main.main()!
    fn := main_main
    fn()
    (snip)
    0x00000000004287be <+510>: lea 0x949e3(%rip),%rdx # 0x4bd1a8
    0x00000000004287c5 <+517>: callq *%rax
    (snip)
    $ objdump -S work
    (gdb) x 0x4bd1a8
    0x4bd1a8: 0x00486ad0
    (gdb) disas 0x00486ad0
    Dump of assembler code for function main.main:

  27. main.main
    (gdb) disas 0x00486ad0
    Dump of assembler code for function main.main:
    (snip)
    0x0000000000486b31 <+97>: mov 0x38(%rsp),%rax
    0x0000000000486b36 <+102>: mov %rax,0x18(%rsp)
    0x0000000000486b3b <+107>: movq $0x1,0x10(%rsp)
    0x0000000000486b44 <+116>: movl $0x10,(%rsp)
    0x0000000000486b4b <+123>: lea 0x36436(%rip),%rax # 0x4bcf88
    0x0000000000486b52 <+130>: mov %rax,0x8(%rsp)
    0x0000000000486b57 <+135>: callq 0x430240
    0x0000000000486b5c <+140>: mov 0x38(%rsp),%rax
    0x0000000000486b61 <+145>: mov %rax,0x18(%rsp)
    0x0000000000486b66 <+150>: movq $0x2,0x10(%rsp)
    0x0000000000486b6f <+159>: movl $0x10,(%rsp)
    0x0000000000486b76 <+166>: lea 0x3640b(%rip),%rax # 0x4bcf88
    0x0000000000486b7d <+173>: mov %rax,0x8(%rsp)
    0x0000000000486b82 <+178>: callq 0x430240
    0x0000000000486b87 <+183>: mov 0x38(%rsp),%rax
    0x0000000000486b8c <+188>: mov %rax,(%rsp)
    0x0000000000486b90 <+192>: callq 0x45fc10
    (snip)

  28. main.main
    go work(1, &wg)
    =
    0x0000000000486b4b <+123>: lea 0x36436(%rip),%rax # 0x4bcf88
    0x0000000000486b52 <+130>: mov %rax,0x8(%rsp)
    0x0000000000486b57 <+135>: callq 0x430240
    • At build time, “go work(1, &wg)” becomes a call to runtime.newproc(16, main.work): the first argument is the size of the arguments in bytes (the 0x10 stored by movl $0x10,(%rsp) on the previous slide), the second points to main.work.
    (gdb) x 0x4bcf88
    0x4bcf88: 0x004869a0
    (gdb) disas 0x004869a0
    Dump of assembler code for function main.work:
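
The size that main.main stores with `movl $0x10,(%rsp)` before calling runtime.newproc can be cross-checked from Go. A small sketch, with made-up helper names (`workArgBytes`, `wordSize`): it adds up the sizes of the two arguments of `work(i int, wg *sync.WaitGroup)`. It describes the stack-based calling convention of the Go 1.11 build used in this deck; newer compilers pass arguments differently.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// workArgBytes returns the combined size of the arguments of
// work(i int, wg *sync.WaitGroup) on the current platform.
func workArgBytes() uintptr {
	var i int
	var wg *sync.WaitGroup
	return unsafe.Sizeof(i) + unsafe.Sizeof(wg)
}

// wordSize is the size of a pointer on the current platform.
func wordSize() uintptr {
	return unsafe.Sizeof(uintptr(0))
}

func main() {
	// On amd64 this prints 16 (0x10): 8 bytes for i, 8 bytes for wg.
	fmt.Println(workArgBytes())
}
```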

  29. Current status
    • runtime.main creates a thread for sysmon.
    • runtime.main creates three goroutines.
    • runtime.main calls main.main.
    • main.main creates goroutines for main.work using runtime.newproc.

  30. How are your goroutines run?

  31. Switching goroutines
    • Goroutines have to be switched because G >> M > P.
    • The scheduler switches away from:
        • long-running goroutines,
        • goroutines running a system call (I/O based),
        • goroutines waiting on networking, channel send/receive, time.Sleep, etc.,
        • goroutines that exit.

  32. Long running goroutines

  33. Detect long running goroutines
    func sysmon() {
        (snip)
        for {
            (snip)
            // retake P's blocked in syscalls
            // and preempt long running G's
            if retake(now) != 0 {
                idle = 0
            } else {
                idle++
            }
            (snip)
        }
    }
    func retake(now int64) uint32 {
        (snip)
        for i := 0; i < len(allp); i++ {
            (snip)
            if s == _Psyscall {
                (snip)
            } else if s == _Prunning {
                (snip)
                preemptone(_p_)
            }
        }
        (snip)
    }
    • Code: runtime/proc.go
    • The thread running runtime.sysmon detects long-running goroutines.

  34. Mark long running goroutines
    • Code: runtime/proc.go, runtime/stack.go
    • To make the split-stack check fail, stackguard0 is set to a value greater than any real SP.
    func preemptone(_p_ *p) bool {
        (snip)
        gp.preempt = true
        (snip)
        gp.stackguard0 = stackPreempt
        return true
    }
    const (
        uintptrMask = 1<<(8*sys.PtrSize) - 1
        stackPreempt = uintptrMask & -1314
    )
    // (gdb) p/x 18446744073709550302
    // -> 0xfffffffffffffade
    [Figure: stack with the labels stktop, Stack and Stackguard, plus the stackPreempt marker]

  35. Switch long running goroutines
    (gdb) disas 0x04869a0
    Dump of assembler code for function main.work:
    (snip)
    0x00000000004869a9 <+9>: lea -0x10(%rsp),%rax
    0x00000000004869ae <+14>: cmp 0x10(%rcx),%rax
    0x00000000004869b2 <+18>: jbe 0x486abb
    (snip)
    0x0000000000486abb <+283>: callq 0x44e9d0
    (gdb) i registers
    rcx 0xc000030000 824633917440 // <- g
    rsp 0xc0000307d8 0xc0000307d8 // <- gobuf.sp
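    [Figure: fields of g: stack.lo, stack.hi and stackguard0; stackguard0 is read as 0x10(%rcx), where %rcx holds g]
    • The compiler injects a stack check into the prologue of each function.
    • If the stack pointer is below or equal to stackguard0, the function jumps to runtime.morestack_noctxt.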
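
Why writing stackPreempt into stackguard0 forces the jump can be checked with plain integer arithmetic. The sketch below models the prologue comparison; `needsMoreStack` is an illustrative helper, the 64-bit pointer size is assumed, and the "ordinary" guard value `0xc000030370` is hypothetical (only the rsp value comes from the gdb session).

```go
package main

import "fmt"

const (
	ptrSize      = 8 // assumption: 64-bit platform, as on the slides
	uintptrMask  = 1<<(8*ptrSize) - 1
	stackPreempt = uintptrMask & -1314 // 0xfffffffffffffade
)

// needsMoreStack models the prologue check
//   lea -frame(%rsp),%rax ; cmp stackguard0,%rax ; jbe morestack
// and reports whether the jump to runtime.morestack_noctxt is taken.
func needsMoreStack(sp, frame, stackguard0 uint64) bool {
	return sp-frame <= stackguard0
}

func main() {
	const sp = 0xc0000307d8    // rsp from the gdb session
	const guard = 0xc000030370 // hypothetical ordinary stackguard0
	fmt.Println(needsMoreStack(sp, 0x10, guard))        // false
	fmt.Println(needsMoreStack(sp, 0x10, stackPreempt)) // true
	fmt.Printf("%#x\n", uint64(stackPreempt))           // 0xfffffffffffffade
}
```

Because no real stack pointer can exceed 0xfffffffffffffade, the comparison always succeeds once the marker is set, so the next function prologue the goroutine executes diverts it into the runtime.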

  36. Switch long running goroutines
    TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
    MOVL $0, DX
    JMP runtime·morestack(SB)
    TEXT runtime·morestack(SB),NOSPLIT,$0-0
    (snip)
    CALL runtime·newstack(SB)
    func newstack() {
        (snip)
        preempt := atomic.Loaduintptr(&gp.stackguard0) == stackPreempt
        (snip)
        if preempt {
            (snip)
            gopreempt_m(gp) // never return
        }
    }
    • Code: runtime/asm_amd64.s, runtime/stack.go

  37. Switch long running goroutines
    • Code: runtime/proc.go
    func gopreempt_m(gp *g) {
        (snip)
        goschedImpl(gp)
    }
    func goschedImpl(gp *g) {
        (snip)
        casgstatus(gp, _Grunning, _Grunnable)
        dropg()
        lock(&sched.lock)
        globrunqput(gp)
        unlock(&sched.lock)
        schedule()
    }
    • The scheduler preempts long-running goroutines and puts them on the global run queue.
    • After that, the scheduler finds the next runnable goroutine by calling runtime.schedule.
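
User code can take a similar path voluntarily: as far as I can tell from runtime/proc.go, `runtime.Gosched` also ends in goschedImpl. The sketch below is not from the talk and `yieldUntilDone` is a made-up name. It pins the program to one P, starts a worker, and yields until the worker has run.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// yieldUntilDone starts a worker on a single P and yields with
// runtime.Gosched until the worker has run. It returns how many times
// the caller yielded.
func yieldUntilDone() int {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var done int32
	go func() { atomic.StoreInt32(&done, 1) }()

	yields := 0
	for atomic.LoadInt32(&done) == 0 {
		runtime.Gosched() // requeue the caller and let other goroutines run
		yields++
	}
	return yields
}

func main() {
	fmt.Println("yields before the worker finished:", yieldUntilDone())
}
```

The exact number of yields depends on the scheduler, so the program prints it without promising a value.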

  38. Goroutines running system calls

  39. Mark running system call goroutines
    TEXT ·Syscall(SB),NOSPLIT,$0-56
    CALL runtime·entersyscall(SB)
    MOVQ a1+8(FP), DI
    MOVQ a2+16(FP), SI
    MOVQ a3+24(FP), DX
    MOVQ $0, R10
    MOVQ $0, R8
    MOVQ $0, R9
    MOVQ trap+0(FP), AX // syscall entry
    SYSCALL
    CMPQ AX, $0xfffffffffffff001
    JLS ok
    MOVQ $-1, r1+32(FP)
    MOVQ $0, r2+40(FP)
    NEGQ AX
    MOVQ AX, err+48(FP)
    CALL runtime·exitsyscall(SB)
    RET
    • Code: syscall/asm_linux_amd64.s

  40. Mark running system call goroutines
    func entersyscall() {
        reentersyscall(getcallerpc(), getcallersp())
    }
    func reentersyscall(pc, sp uintptr) {
        (snip)
        save(pc, sp)
        (snip)
        casgstatus(_g_, _Grunning, _Gsyscall)
        (snip)
        atomic.Store(&_g_.m.p.ptr().status, _Psyscall)
        (snip)
    }
    • Code: runtime/proc.go

  41. Detect running system call goroutines
    func sysmon() {
        (snip)
        for {
            (snip)
            // retake P's blocked in syscalls
            // and preempt long running G's
            if retake(now) != 0 {
                idle = 0
            } else {
                idle++
            }
            (snip)
        }
    }
    func retake(now int64) uint32 {
        (snip)
        for i := 0; i < len(allp); i++ {
            (snip)
            if s == _Psyscall {
                (snip)
                handoffp(_p_)
            } else if s == _Prunning {
                (snip)
            }
        }
        (snip)
    }
    • Code: runtime/proc.go
    • The thread running runtime.sysmon detects goroutines that are blocked in a system call.

  42. Handoff running system call goroutines
    func startm(_p_ *p, spinning bool) {
        (snip)
        newm(fn, _p_)
        return
        (snip)
    }
    func newm(fn func(), _p_ *p) {
        mp := allocm(_p_, fn)
        (snip)
        newm1(mp)
    }
    func newm1(mp *m) {
        (snip)
        newosproc(mp)
        (snip)
    }
    func newosproc(mp *m) {
        (snip)
        ret := clone(cloneFlags, stk, unsafe.Pointer(mp),
            unsafe.Pointer(mp.g0), unsafe.Pointer(funcPC(mstart)))
        (snip)
    }
    • Code: runtime/proc.go, runtime/os_linux.go
    • runtime.handoffp calls runtime.startm.
    • runtime.allocm disassociates the p from the current m (runtime.releasep).
    • The p is associated with the new m.
    • The new m gets a new thread that starts in runtime.mstart, where the scheduler looks for runnable goroutines.
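
The effect of the handoff can be observed from user code. The sketch below is not from the talk and makes several assumptions: it runs on a Unix-like system, it uses the raw `syscall.Pipe`, `syscall.Read` and `syscall.Write` wrappers so that the read really blocks in the kernel, and the helper name `sumDuringBlockedRead` is made up. With GOMAXPROCS set to 1, one goroutine blocks in read(2) while a second goroutine still gets to run.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// sumDuringBlockedRead parks one goroutine in a blocking read(2) while
// GOMAXPROCS is 1, then lets a second goroutine compute a sum.
func sumDuringBlockedRead() (int, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return 0, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	entering := make(chan struct{})
	readDone := make(chan error, 1)
	go func() {
		buf := make([]byte, 1)
		close(entering)
		_, err := syscall.Read(fds[0], buf) // blocks until the write below
		readDone <- err
	}()
	<-entering

	sum := make(chan int)
	go func() {
		total := 0
		for i := 1; i <= 1000; i++ {
			total += i
		}
		sum <- total
	}()
	// Reached while the reader is, most likely, still inside read(2).
	total := <-sum

	if _, err := syscall.Write(fds[1], []byte{1}); err != nil {
		return total, err
	}
	return total, <-readDone
}

func main() {
	fmt.Println(sumDuringBlockedRead()) // 500500 <nil>
}
```

If the single P stayed tied to the thread blocked in read(2), the summing goroutine could not be scheduled and the program would hang; it completes because the P is handed to another thread.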

  43. networking, receive/send channel,
    time.Sleep, etc…
    -> Skip in this session.

  44. Exit goroutines

  45. Stack for goroutine
    (gdb) info goroutines
    1 waiting runtime.gopark
    2 runnable runtime.forcegchelper
    3 waiting runtime.gopark
    4 runnable runtime.runfinq
    5 runnable main.work
    * 6 running main.work
    (gdb) goroutine 6 bt
    #0 main.work (i=2, wg=0xc0000120c0)
    #1 0x00000000004507d1 in runtime.goexit ()
    (snip)
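    [Figure: stack with the labels stktop, Stack and Stackguard, and an entry holding the address of runtime.goexit]
    • Code: runtime/proc.go
    • runtime.newproc1 pushes the address of runtime.goexit onto the stack of the new “g”, so it acts as the return address of the goroutine's function.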
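
The same frame that gdb shows as `#1 ... in runtime.goexit ()` is visible from Go through `runtime.Callers` and `runtime.CallersFrames`. A minimal sketch, with a made-up helper name (`outermostFrame`): it walks the call stack of a fresh goroutine and returns the name of the outermost function.

```go
package main

import (
	"fmt"
	"runtime"
)

// outermostFrame starts a goroutine and returns the name of the outermost
// function in that goroutine's call stack.
func outermostFrame() string {
	name := make(chan string)
	go func() {
		pcs := make([]uintptr, 32)
		n := runtime.Callers(0, pcs)
		frames := runtime.CallersFrames(pcs[:n])
		last := ""
		for {
			frame, more := frames.Next()
			if frame.Function != "" {
				last = frame.Function
			}
			if !more {
				break
			}
		}
		name <- last
	}()
	return <-name
}

func main() {
	fmt.Println(outermostFrame()) // runtime.goexit
}
```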

  46. Exit goroutines
    TEXT runtime·goexit(SB),NOSPLIT,$0-0
    BYTE $0x90 // NOP
    CALL runtime·goexit1(SB)
    func goexit1() {
        (snip)
        mcall(goexit0)
    }
    func goexit0(gp *g) {
        (snip)
        casgstatus(gp, _Grunning, _Gdead)
        (snip)
        dropg()
        (snip)
        gfput(_g_.m.p.ptr(), gp)
        (snip)
        schedule()
    }
    • When the function of a goroutine created by runtime.newproc1 returns, runtime.goexit is called.
    • After that, the scheduler finds the next runnable goroutine by calling runtime.schedule.
    • Code: runtime/asm_amd64.s
    • Code: runtime/proc.go
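
A goroutine can also end before its function returns: the documented `runtime.Goexit` runs the deferred calls and then terminates the calling goroutine, which, as far as I can tell, continues into the same goexit1 shown above. The sketch below uses a made-up helper name (`runAndExit`) and records which steps were executed.

```go
package main

import (
	"fmt"
	"runtime"
)

// runAndExit starts a goroutine that terminates itself with
// runtime.Goexit and reports which steps were executed.
func runAndExit() []string {
	var steps []string
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last, after the append below
		defer func() { steps = append(steps, "deferred call") }()
		steps = append(steps, "before Goexit")
		runtime.Goexit()
		steps = append(steps, "after Goexit") // never reached
	}()
	<-done
	return steps
}

func main() {
	fmt.Println(runAndExit()) // [before Goexit deferred call]
}
```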

  47. Summary
    • We learned to read assembly.
    • We learned how goroutines work by reading the runtime code and the assembly.
    • The Go scheduler turns I/O-blocking tasks into CPU-bound tasks, so it seems to be a good match for the platinum searcher and for servers.
    • At the same time, running too many goroutines that make system calls becomes a bottleneck.
    • Therefore, I should review the algorithm “and” the use of goroutines.
    • Go

  48. See also
    • Scalable Go Scheduler Design Doc
      https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit
    • A Quick Guide to Go's Assembler
      https://golang.org/doc/asm
    • Debugging Go Code with GDB
      https://golang.org/doc/gdb
