Building a Scalable Multi-Machine Architecture with Go

Bo-Yi Wu
September 08, 2020

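Our company runs several separate internal network environments, each with its own restrictions and its own pool of compute servers. How can one Go codebase be designed and deployed across all of them, so that users can upload files, have the work dispatched automatically to any available backend compute server, and receive the results back in the frontend?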

Transcript

  1. Building a Scalable Multi-Machine Architecture with Go
    Bo-Yi Wu
    2020/09/08

  2. About me
    • Software Engineer at MediaTek
    • Member of the Drone CI/CD platform
    • Member of the Gitea platform
    • Member of the Gin Golang framework
    • Maintainer of several GitHub Actions plugins
    • Instructor on Udemy: Golang + Drone

  3. NeuroPilot
    MediaTek Ecosystem for AI Development
    https://neuropilot.mediatek.com/

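  4. Project Requirements
    • Standalone customer edition (Docker version)
    • Built-in simple queue mechanism
    • In-house company architecture (software + hardware)
    • Multi-machine queue mechanism + hardware simulation
    Each job consumes 2 CPU cores and 8 GB of memory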

  5. Why Go?
    • Restrictions of the company environment
    • Protecting the program logic
    • Cross-platform compilation (Windows, Linux)
    • Powerful concurrency support

  6. Standalone Customer Edition

  7. Introducing a Queue Mechanism
    RabbitMQ
    NSQ

  8. Service Components
    • Database: SQLite (no need for MySQL or Postgres)
    • Cache: in-memory (no need for Redis)
    • Queue: developed in-house

  9. Customer IT Environment

  10. How to Implement a Simple Queue Mechanism
    Each job consumes 2 CPU cores and 8 GB of memory

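Because each job needs 2 CPU cores and 8 GB of memory, the number of jobs one machine can run at once is set by whichever resource runs out first. A tiny sketch of that arithmetic; jobCapacity is a hypothetical helper and the machine sizes are example inputs, not figures from the talk.

```go
package main

import "fmt"

// Per-job cost from the slides: 2 CPU cores and 8 GB of memory.
const (
	coresPerJob = 2
	memGBPerJob = 8
)

// jobCapacity is a hypothetical helper: a machine can run as many jobs at
// once as its scarcer resource allows.
func jobCapacity(cores, memGB int) int {
	byCPU := cores / coresPerJob
	byMem := memGB / memGBPerJob
	if byCPU < byMem {
		return byCPU
	}
	return byMem
}

func main() {
	fmt.Println(jobCapacity(16, 64)) // 8: 16/2 = 8 and 64/8 = 8
	fmt.Println(jobCapacity(32, 64)) // 8: memory-bound (32/2 = 16, 64/8 = 8)
	fmt.Println(jobCapacity(8, 128)) // 4: CPU-bound (8/2 = 4, 128/8 = 16)
}
```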
  11. First, Understand Channel Blocking

  12. https://utcc.utoronto.ca/~cks/space/blog/programming/GoConcurrencyStillNotEasy

  13. Limit Concurrency Issue

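  14. found := make(chan int)
    limitCh := make(chan struct{}, concurrencyProcesses)
    for i := 0; i < jobCount; i++ {
        limitCh <- struct{}{} // acquire a slot; blocks once all slots are taken
        go func(val int) {
            defer func() {
                wg.Done()
                <-limitCh // release the slot
            }()
            found <- val // nobody reads found yet, so this blocks and the slot is never released
        }(i)
    }
    jobCount = 100
    concurrencyProcesses = 10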

  15. Same code as slide 14.

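The deadlock on slide 14 comes from found: it is unbuffered and nothing reads it until the loop ends, so once concurrencyProcesses goroutines are blocked on `found <- val`, no slot is ever released and the loop blocks on limitCh forever. A sketch showing that the same buffered-channel semaphore works once found is drained concurrently; runLimited and the peak counter are hypothetical additions for the demo.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited is a hypothetical demo wrapper: it launches jobCount goroutines
// but lets at most limit run at once, using limitCh as a counting semaphore.
// Unlike the slide, the caller drains found while jobs are still being
// launched, so a goroutine blocked on `found <- val` always makes progress.
func runLimited(jobCount, limit int) (results []int, peak int) {
	found := make(chan int)
	limitCh := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var mu sync.Mutex
	running := 0

	// Launch jobs from a separate goroutine so this function can read found.
	go func() {
		for i := 0; i < jobCount; i++ {
			limitCh <- struct{}{} // acquire a slot
			wg.Add(1)
			go func(val int) {
				defer func() {
					wg.Done()
					<-limitCh // release the slot
				}()
				mu.Lock()
				running++
				if running > peak {
					peak = running // highest concurrency observed
				}
				mu.Unlock()

				// ... the real job would run here ...

				mu.Lock()
				running--
				mu.Unlock()
				found <- val
			}(i)
		}
		wg.Wait()    // every job has sent its result
		close(found) // lets the range loop below end
	}()

	for val := range found {
		results = append(results, val)
	}
	return results, peak
}

func main() {
	results, peak := runLimited(100, 10)
	fmt.Println(len(results), peak <= 10) // 100 true
}
```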
  16. Proposed Solution
    Move the limitCh send into the background?

  17. found := make(chan int)
    limitCh := make(chan struct{}, concurrencyProcesses)
    for i := 0; i < jobCount; i++ {
        go func() {
            limitCh <- struct{}{}
        }()
        go func(val int) {
            defer func() {
                wg.Done()
                <-limitCh
            }()
            found <- val
        }(i)
    }
    jobCount = 100
    concurrencyProcesses = 10

  18. Same code as slide 17.
    This does not solve the limit-concurrency problem: the loop no longer waits for a free slot, so all jobCount goroutines start at once.

  19. Solution
    Rewrite the architecture

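  20. found := make(chan int)
    queue := make(chan int)
    // producer: feed every job into queue, then close it
    go func(queue chan<- int) {
        for i := 0; i < jobCount; i++ {
            queue <- i
        }
        close(queue)
    }(queue)
    // a fixed number of workers bounds the concurrency
    for i := 0; i < concurrencyProcesses; i++ {
        go func(queue <-chan int, found chan<- int) {
            for val := range queue {
                found <- val
                wg.Done() // once per job; a defer inside the loop would only run when the worker exits
            }
        }(queue, found)
    }
    jobCount = 100
    concurrencyProcesses = 10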

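The rewritten version bounds concurrency by the number of workers rather than by a semaphore. The slide leaves out closing found and collecting the results; a self-contained sketch that fills those in, assuming one WaitGroup entry per worker and `val * val` as stand-in work (workerPool is a hypothetical name).

```go
package main

import (
	"fmt"
	"sync"
)

// workerPool is a hypothetical wrapper around the slide's design: one
// producer, concurrencyProcesses workers, and a found channel for results.
func workerPool(jobCount, concurrencyProcesses int) []int {
	queue := make(chan int)
	found := make(chan int)

	// Producer: send every job, then close queue so the workers' loops end.
	go func(queue chan<- int) {
		for i := 0; i < jobCount; i++ {
			queue <- i
		}
		close(queue)
	}(queue)

	var wg sync.WaitGroup
	wg.Add(concurrencyProcesses)
	for i := 0; i < concurrencyProcesses; i++ {
		go func(queue <-chan int, found chan<- int) {
			defer wg.Done() // one Done per worker
			for val := range queue {
				found <- val * val // stand-in for real work
			}
		}(queue, found)
	}

	// Close found once every worker has exited.
	go func() {
		wg.Wait()
		close(found)
	}()

	var results []int
	for v := range found {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(len(workerPool(100, 10))) // 100
}
```

Because the workers themselves are the limit, there is no slot to forget to release, and closing queue is the only signal the workers need to stop.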
  21. Internal Queue
    Standalone edition

  23. Set Up the Consumer

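  24. type Consumer struct {
        inputChan chan int // input, forwarded to jobsChan by startConsumer
        jobsChan  chan int // buffered job queue read by the workers
    }
    const PoolSize = 200
    func main() {
        // create the consumer; jobsChan holds up to PoolSize pending jobs
        consumer := Consumer{
            inputChan: make(chan int, 1),
            jobsChan:  make(chan int, PoolSize),
        }
    }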

  26. func (c *Consumer) queue(input int) {
        fmt.Println("send input value:", input)
        c.jobsChan <- input // blocks when the buffer (PoolSize) is full
    }
    func (c *Consumer) worker(num int) {
        for job := range c.jobsChan {
            fmt.Println("worker:", num, " job value:", job)
        }
    }
    // start WorkerSize workers (inside main)
    for i := 0; i < WorkerSize; i++ {
        go consumer.worker(i)
    }

  27. Rewrite the queue func
    func (c *Consumer) queue(input int) bool {
        fmt.Println("send input value:", input)
        select {
        case c.jobsChan <- input:
            return true
        default:
            return false // buffer full: reject instead of blocking
        }
    }
    Prevents users from flooding the queue with data

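With the select/default form, queue returns false instead of blocking when jobsChan is full. A runnable sketch using the slide's method (without the print) and a deliberately tiny, hypothetical 2-slot buffer with no workers, so the third send is rejected. Mapping false to an HTTP error such as 503 in a handler would be an assumption, not something the slides show.

```go
package main

import "fmt"

// Consumer mirrors the slide's struct; only jobsChan matters here.
type Consumer struct {
	jobsChan chan int
}

// queue is the slide's non-blocking version: if jobsChan has room the job is
// accepted, otherwise the default branch rejects it immediately.
func (c *Consumer) queue(input int) bool {
	select {
	case c.jobsChan <- input:
		return true
	default:
		return false
	}
}

func main() {
	// Hypothetical 2-slot buffer and no workers, so the buffer fills up.
	c := &Consumer{jobsChan: make(chan int, 2)}
	for i := 1; i <= 3; i++ {
		fmt.Println(i, c.queue(i))
	}
	// prints "1 true", "2 true", "3 false"
}
```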
  28. Shutdown with SIGTERM Handling

  29. func WithContextFunc(ctx context.Context, f func()) context.Context {
        ctx, cancel := context.WithCancel(ctx)
        go func() {
            c := make(chan os.Signal, 1) // buffered: signal.Notify never blocks and can drop signals on an unbuffered channel
            signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
            defer signal.Stop(c)
            select {
            case <-ctx.Done():
            case <-c:
                f()
                cancel()
            }
        }()
        return ctx
    }

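  30. func (c Consumer) startConsumer(ctx context.Context) {
        for {
            select {
            case job := <-c.inputChan:
                if ctx.Err() != nil {
                    close(c.jobsChan)
                    return
                }
                c.jobsChan <- job
            case <-ctx.Done():
                close(c.jobsChan)
                return
            }
        }
    }
    select does not guarantee the order in which channels are read: when several cases are ready, one is picked at random, so the job case re-checks ctx.Err().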

  31. Cancel via the ctx.Done() event
    func (c *Consumer) worker(num int) {
        for job := range c.jobsChan {
            fmt.Println("worker:", num, " job value:", job)
        }
    }
    After a channel is closed, the values still buffered in it can be read until it is drained.

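The worker can keep ranging over jobsChan after startConsumer closes it because closing a channel does not discard values that are already buffered. A minimal demonstration; drain is a hypothetical helper.

```go
package main

import "fmt"

// drain is a hypothetical helper showing that range keeps receiving buffered
// values after close and stops only once the channel is closed AND empty.
func drain() []int {
	jobsChan := make(chan int, 5)
	for i := 1; i <= 3; i++ {
		jobsChan <- i
	}
	close(jobsChan) // what startConsumer does on shutdown

	var got []int
	for job := range jobsChan {
		got = append(got, job)
	}
	return got
}

func main() {
	fmt.Println(drain()) // [1 2 3]

	// A receive on a closed, empty channel returns the zero value and ok == false.
	ch := make(chan int)
	close(ch)
	v, ok := <-ch
	fmt.Println(v, ok) // 0 false
}
```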
  32. Graceful Shutdown of Workers with sync.WaitGroup

  34. wg := &sync.WaitGroup{}
    wg.Add(WorkerSize)
    // Start WorkerSize workers
    for i := 0; i < WorkerSize; i++ {
        go consumer.worker(wg)
    }

  35. WaitGroup (diagram)

  36. func (c Consumer) worker(wg *sync.WaitGroup) {
        defer wg.Done()
        for job := range c.jobsChan {
            _ = job // handle the job event here
        }
    }

  37. Wait on the WaitGroup after Calling the Cancel Function

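  38. func WithContextFunc(ctx context.Context, f func()) context.Context {
        ctx, cancel := context.WithCancel(ctx)
        go func() {
            c := make(chan os.Signal, 1) // buffered, as signal.Notify requires
            signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
            defer signal.Stop(c)
            select {
            case <-ctx.Done():
            case <-c:
                cancel()
                f()
            }
        }()
        return ctx
    }
    cancel() now runs before f(): cancelling lets startConsumer close jobsChan so the workers can exit, and only then can a wg.Wait() inside f return. With the order from slide 29, that wg.Wait() would block forever because cancel() would never run.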

  39. wg := &sync.WaitGroup{}
    wg.Add(numberOfWorkers)
    // signal is the package that defines WithContextFunc, not os/signal
    ctx := signal.WithContextFunc(
        context.Background(),
        func() {
            wg.Wait()
            close(finishChan)
        },
    )
    go consumer.startConsumer(ctx)

  40. End of Program
    select {
    case <-finishChan:
    case err := <-errChan: // channel name assumed; lost in extraction
        if err != nil {
            return err
        }
    }

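Slides 30 to 40 together form a shutdown sequence: cancel the context, let startConsumer close jobsChan, let the workers drain it, wait on the WaitGroup, then close finishChan. A runnable sketch of that sequence that calls cancel() directly instead of waiting for SIGINT/SIGTERM; run and the atomic counter are hypothetical additions. Because of the ctx.Err() check, a job that startConsumer receives just as the context is cancelled is dropped, so the count is n or n-1.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// Consumer mirrors the struct on the slides.
type Consumer struct {
	inputChan chan int
	jobsChan  chan int
}

// startConsumer follows the slide: forward input to the workers until ctx is
// cancelled, then close jobsChan so the workers' range loops can end.
func (c Consumer) startConsumer(ctx context.Context) {
	for {
		select {
		case job := <-c.inputChan:
			if ctx.Err() != nil { // select may pick this case after cancellation
				close(c.jobsChan)
				return
			}
			c.jobsChan <- job
		case <-ctx.Done():
			close(c.jobsChan)
			return
		}
	}
}

// worker drains jobsChan, including values still buffered after the close.
func (c Consumer) worker(wg *sync.WaitGroup, processed *int64) {
	defer wg.Done()
	for range c.jobsChan {
		atomic.AddInt64(processed, 1) // stand-in for handling the job
	}
}

// run is a hypothetical driver: it queues n jobs, calls cancel() in place of
// a signal, and reports how many jobs finished before finishChan closed.
func run(n, workers int) int {
	c := Consumer{inputChan: make(chan int), jobsChan: make(chan int, n)}
	var processed int64

	wg := &sync.WaitGroup{}
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go c.worker(wg, &processed)
	}

	ctx, cancel := context.WithCancel(context.Background())
	finishChan := make(chan struct{})
	// Same order as slide 38: cancellation first, then wait for the workers.
	go func() {
		<-ctx.Done()
		wg.Wait()
		close(finishChan)
	}()
	go c.startConsumer(ctx)

	for i := 0; i < n; i++ {
		c.inputChan <- i // unbuffered: returns once startConsumer has taken it
	}
	cancel()
	<-finishChan // "End of Program": every worker has exited
	return int(atomic.LoadInt64(&processed))
}

func main() {
	fmt.Println(run(50, 4)) // prints 50, or 49 when the last job races with cancel()
}
```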
  41. Limits of the Standalone Edition
    Insufficient system resources

  42. System Architecture

  43. Server - Agent

  44. Communication Between the Server and the Agent
    https://github.com/hashicorp/go-retryablehttp

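The slide links hashicorp/go-retryablehttp, a client that retries failed HTTP requests with backoff. The sketch below is not that library's API: it is a stdlib-only illustration of the same idea, assuming the Agent POSTs to a path such as /rpc/v1/request and that transport errors and 5xx responses are worth retrying. postWithRetry and demo are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// postWithRetry is a hypothetical stdlib-only helper, NOT the go-retryablehttp
// API. It retries transport errors and 5xx responses, doubling the wait each time.
func postWithRetry(client *http.Client, url string, body []byte, maxRetries int, wait time.Duration) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			time.Sleep(wait)
			wait *= 2 // exponential backoff
		}
		// Build a fresh reader each attempt: a consumed body cannot be resent.
		resp, err := client.Post(url, "application/json", bytes.NewReader(body))
		if err != nil {
			lastErr = err
			continue
		}
		if resp.StatusCode >= 500 {
			resp.Body.Close()
			lastErr = fmt.Errorf("server returned %d", resp.StatusCode)
			continue
		}
		return resp, nil // success or a non-retryable client error
	}
	return nil, fmt.Errorf("giving up after %d retries: %w", maxRetries, lastErr)
}

// demo points postWithRetry at a local test server that answers 503 for the
// first `failures` calls and 200 afterwards. It returns the final status and
// how many requests the server saw.
func demo(failures int32, maxRetries int) (int, int32, error) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= failures {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	resp, err := postWithRetry(srv.Client(), srv.URL+"/rpc/v1/request", []byte(`{}`), maxRetries, 10*time.Millisecond)
	if err != nil {
		return 0, atomic.LoadInt32(&calls), err
	}
	resp.Body.Close()
	return resp.StatusCode, atomic.LoadInt32(&calls), nil
}

func main() {
	status, calls, err := demo(2, 3)
	fmt.Println(status, calls, err) // 200 3 <nil>
}
```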
  45. r := e.Group("/rpc")
    r.Use(rpc.Check())
    {
        r.POST("/v1/healthz", web.RPCHeartbeat)
        r.POST("/v1/request", web.RPCRquest)
        r.POST("/v1/accept", web.RPCAccept)
        r.POST("/v1/details", web.RPCDetails)
        r.POST("/v1/updateStatus", web.RPCUpdateStatus)
        r.POST("/v1/upload", web.RPCUploadBytes)
        r.POST("/v1/reset", web.RPCResetStatus)
    }
    rpc.Check() checks the RPC secret

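rpc.Check() guards every /rpc route, but its implementation is not shown. The following is a hypothetical net/http equivalent, assuming the Agent sends a shared secret in a header named X-RPC-Secret; the header name, checkSecret, statusFor, and the example secret are all assumptions. subtle.ConstantTimeCompare keeps the comparison time independent of the secret's contents.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// checkSecret is a hypothetical net/http stand-in for the slide's rpc.Check()
// middleware. The header name "X-RPC-Secret" is an assumption.
func checkSecret(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-RPC-Secret")
		// Refuse everything if no secret is configured; otherwise compare in
		// constant time so response timing does not reveal the secret's contents.
		if secret == "" || subtle.ConstantTimeCompare([]byte(got), []byte(secret)) != 1 {
			http.Error(w, "invalid rpc secret", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one POST to /rpc/v1/healthz through the middleware
// (configured with the example secret "s3cr3t") and returns the status code.
func statusFor(header string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/rpc/v1/healthz", nil)
	if header != "" {
		req.Header.Set("X-RPC-Secret", header)
	}
	rec := httptest.NewRecorder()
	checkSecret("s3cr3t", ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("s3cr3t"), statusFor("wrong"), statusFor("")) // 200 401 401
}
```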
  46. /rpc/v1/accept
    Update jobs set version = (oldVersion + 1)
    where machine = "fooBar" and version = oldVersion

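The accept step is optimistic locking: the UPDATE only changes a row if version still equals the value the agent read, so when several agents try to accept the same job, exactly one update succeeds and the others see zero rows affected. A minimal in-memory simulation, assuming the job is claimed by id and that accept also records the winning machine (a slight variant of the slide's statement); store, job, accept, and race are hypothetical names, and a real implementation would run the SQL and check RowsAffected.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical in-memory stand-in for a row of the jobs table.
type job struct {
	machine string
	version int
}

type store struct {
	mu   sync.Mutex
	jobs map[int64]*job
}

// accept simulates
//   UPDATE jobs SET version = oldVersion + 1, machine = ? WHERE id = ? AND version = oldVersion
// and returns true only if the row was updated (RowsAffected == 1).
func (s *store) accept(id int64, machine string, oldVersion int) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	j, ok := s.jobs[id]
	if !ok || j.version != oldVersion {
		return false // no such job, or version is stale: zero rows affected
	}
	j.version = oldVersion + 1
	j.machine = machine
	return true
}

// race lets n agents try to accept job 1, all having read version 0, and
// returns how many of them succeeded.
func race(n int) int {
	s := &store{jobs: map[int64]*job{1: {version: 0}}}
	var wg sync.WaitGroup
	var mu sync.Mutex
	wins := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if s.accept(1, fmt.Sprintf("agent-%d", i), 0) {
				mu.Lock()
				wins++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return wins
}

func main() {
	fmt.Println(race(10)) // 1
}
```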
  47. Create Multiple Workers

  48. if r.Capacity != 0 {
        var g errgroup.Group
        for i := 0; i < r.Capacity; i++ {
            g.Go(func() error {
                return r.start(ctx, 0)
            })
            time.Sleep(1 * time.Second)
        }
        return g.Wait()
    }
    Standalone edition: configure multiple workers

  49. for {
        var (
            id  int64
            err error
        )
        if id, err = r.request(ctx); err != nil {
            time.Sleep(1 * time.Second)
            continue
        }
        go func() {
            if err := r.start(ctx, id); err != nil {
                log.Error().Err(err).Msg("runner: cannot start the job")
            }
        }()
    }
    In-house deployment + Submit Job

  50. Break out of the for and select loop
    func (r *Runner) start(ctx context.Context, id int64) error {
    LOOP:
        for {
            select {
            case <-ctx.Done():
                return ctx.Err()
            default:
                r.poll(ctx, id)
                if r.Capacity == 0 {
                    break LOOP // a plain break would only leave the select
                }
            }
            time.Sleep(1 * time.Second)
        }
        return nil
    }

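In Go, `break` inside a select only leaves the select, not the enclosing for loop, which is why start uses the LOOP label. A small runnable version of the same shape, where stopAfter is a hypothetical stand-in for the `r.Capacity == 0` check and the one-second sleep is dropped.

```go
package main

import (
	"context"
	"fmt"
)

// pollUntil has the same shape as Runner.start. stopAfter is a hypothetical
// stand-in for the slide's `r.Capacity == 0` check; the sleep is omitted.
func pollUntil(ctx context.Context, stopAfter int) (int, error) {
	polls := 0
LOOP:
	for {
		select {
		case <-ctx.Done():
			return polls, ctx.Err()
		default:
			polls++ // stand-in for r.poll(ctx, id)
			if polls >= stopAfter {
				break LOOP // a plain `break` would only leave the select
			}
		}
	}
	return polls, nil
}

// demo runs pollUntil once with a live context and once with a cancelled one.
func demo() (liveN int, liveErr error, cancelledN int, cancelledErr error) {
	liveN, liveErr = pollUntil(context.Background(), 3)

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	cancelledN, cancelledErr = pollUntil(ctx, 3)
	return
}

func main() {
	ln, lerr, cn, cerr := demo()
	fmt.Println(ln, lerr) // 3 <nil>
	fmt.Println(cn, cerr) // 0 context canceled
}
```

With a cancelled context the select never reaches default, because default only runs when no other case is ready.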
  51. Cancelling a Running Job Immediately?

  55. Context with Cancel or Timeout
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    timeout, cancel := context.WithTimeout(ctx, 60*time.Minute)
    defer cancel()
    Job03 context

  56. Context with Cancel or Timeout (same code as slide 55)
    Job03 context
    Job05 context

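Slides 55 and 56 derive one timeout context per job from a shared parent. A short sketch of the resulting tree, with illustrative names job03 and job05: cancelling one job's context leaves its sibling running, while cancelling the parent cancels every job derived from it.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// demo derives two per-job timeout contexts from one parent, as on the
// slides. The names job03/job05 are illustrative only.
func demo() (afterJobCancel, afterParentCancel [2]error) {
	parent, cancelAll := context.WithCancel(context.Background())
	defer cancelAll()

	job03, cancel03 := context.WithTimeout(parent, 60*time.Minute)
	defer cancel03()
	job05, cancel05 := context.WithTimeout(parent, 60*time.Minute)
	defer cancel05()

	// Cancelling one job's context stops only that job.
	cancel03()
	afterJobCancel = [2]error{job03.Err(), job05.Err()}

	// Cancelling the parent stops every job derived from it.
	cancelAll()
	afterParentCancel = [2]error{job03.Err(), job05.Err()}
	return
}

func main() {
	a, b := demo()
	fmt.Println(a[0], a[1]) // context canceled <nil>
	fmt.Println(b[0], b[1]) // context canceled context canceled
}
```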
  57. Watch the Cancel event (Agent)
    go func() {
        done, _ := r.Manager.Watch(ctx, id)
        if done {
            cancel()
        }
    }()

  58. Handle the cancel event on the Server
    subscribers: make(map[chan struct{}]int64),
    cancelled: make(map[int64]time.Time),

  59. User cancels a running job
    c.Lock()
    c.cancelled[id] = time.Now().Add(time.Minute * 5)
    for subscriber, build := range c.subscribers {
        if id == build {
            close(subscriber)
        }
    }
    c.Unlock()

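  60. Agent subscribes to the cancel event
    for {
        select {
        case <-ctx.Done():
            return false, ctx.Err()
        case <-time.After(time.Minute): // periodic re-check; interval assumed, lost in extraction
            c.Lock()
            _, ok := c.cancelled[id]
            c.Unlock()
            if ok {
                return true, nil
            }
        case <-subscriber:
            return true, nil
        }
    }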

  61. case <-time.After(time.Minute): // same periodic re-check as slide 60; interval assumed
        c.Lock()
        _, ok := c.cancelled[id]
        c.Unlock()
        if ok {
            return true, nil
        }

  62. Same code as slide 61
    Step 1: Cancel

  63. Same code as slide 61
    Step 1: Cancel
    Step 2: Reconnect Server

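Slides 58 to 63 combine two mechanisms: closing a per-subscriber channel wakes agents that are connected when the job is cancelled, and the cancelled map (kept for five minutes) lets an agent that reconnects afterwards still find out. A self-contained sketch under those assumptions; canceller, Cancel, Watch, the poll interval, and the immediate map check on subscribe are illustrative choices, not the project's actual code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// canceller is a hypothetical sketch of the server-side state on the slides.
type canceller struct {
	sync.Mutex
	subscribers map[chan struct{}]int64 // watcher channel -> job ID
	cancelled   map[int64]time.Time     // job ID -> expiry of the record
}

func newCanceller() *canceller {
	return &canceller{
		subscribers: make(map[chan struct{}]int64),
		cancelled:   make(map[int64]time.Time),
	}
}

// Cancel remembers the cancellation for five minutes and wakes every agent
// currently watching the job.
func (c *canceller) Cancel(id int64) {
	c.Lock()
	defer c.Unlock()
	c.cancelled[id] = time.Now().Add(5 * time.Minute)
	for sub, jobID := range c.subscribers {
		if jobID == id {
			close(sub)
			delete(c.subscribers, sub) // a second Cancel must not close it again
		}
	}
}

func (c *canceller) isCancelled(id int64) bool {
	c.Lock()
	defer c.Unlock()
	_, ok := c.cancelled[id]
	return ok
}

// Watch blocks until job id is cancelled (true) or ctx ends (false, ctx.Err()).
// poll is the fallback re-check interval; its value is an assumption.
func (c *canceller) Watch(ctx context.Context, id int64, poll time.Duration) (bool, error) {
	sub := make(chan struct{})
	c.Lock()
	c.subscribers[sub] = id
	c.Unlock()
	defer func() {
		c.Lock()
		delete(c.subscribers, sub)
		c.Unlock()
	}()

	// Subscribed first, checked second: a cancel that happened before this
	// (re)connection is found here, and any later one closes sub.
	if c.isCancelled(id) {
		return true, nil
	}
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return false, ctx.Err()
		case <-ticker.C:
			if c.isCancelled(id) {
				return true, nil
			}
		case <-sub:
			return true, nil
		}
	}
}

// scenarios: cancel while an agent watches, cancel before an agent
// reconnects, and a watch on a job nobody cancels (ends by timeout).
func scenarios() (live, reconnect, untouched bool) {
	c := newCanceller()

	done := make(chan bool)
	go func() {
		ok, _ := c.Watch(context.Background(), 1, time.Hour)
		done <- ok
	}()
	c.Cancel(1) // may run before or after the watcher subscribes; both paths return true
	live = <-done

	c.Cancel(2)
	reconnect, _ = c.Watch(context.Background(), 2, time.Hour)

	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	untouched, _ = c.Watch(ctx, 3, time.Hour)
	return live, reconnect, untouched
}

func main() {
	fmt.Println(scenarios()) // true true false
}
```

Checking the map right after subscribing catches a cancellation that happened before a reconnect without waiting for the next poll, and deleting the subscriber inside Cancel avoids the panic a second close would cause. Expired entries in cancelled are never pruned in this sketch.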
  64. Thank You for Joining