pt&Goroutines


I gave a talk on what I tried with goroutines to speed up pt (the_platinum_searcher).
http://connpass.com/event/6370/


monochromegane

May 31, 2014

Transcript

  1. pt &Goroutine - GoCon 2014 spring -

  2. MIYAKE Yusuke (@monochromegane)

  3. GMO Pepabo, Inc.

  4. Do you grep?

  5. grep?

  6. ack?

  7. ag?

  8. pt The Platinum Searcher

  9. Written in Golang

  10. Mac OSX Linux Windows

  11. UTF-8 EUC-JP Shift-JIS

  12. AND

  13. fast!
      ack go  6.24s user 1.06s system  99% cpu 7.304 total  # ack
      ag go   0.88s user 1.39s system 221% cpu 1.027 total  # ag
      pt go   1.09s user 1.01s system 235% cpu 0.892 total  # pt

  14. How?

  15. Goroutine & Channel

  16. Let's speed it up together

  17. What pattern search means:
      1. Find the files (find)
      2. Search them for the string (grep)
      3. Display the results (print)

  18. Approach-0: sequentially

  19. find

  20. find grep

  21. find grep print

  22. // find
      find := find.Find{Option: self.Option}
      find.Do(self.Root)

      // grep
      grep := grep.Grep{
          Files:   find.Files, // result
          Pattern: self.Pattern,
          Option:  self.Option}
      grep.Do()

      // print
      print := print.Print{
          Matches: grep.Matches, // result
          Pattern: self.Pattern,
          Option:  self.Option}
      print.Do()

  23. > the_simple_searcher go $GOROOT > /dev/null

  24. 0.79 seconds

  25. Approach-1: concurrently

  26. Goroutine

  27. • How Go does concurrent processing
      • Not the same as threads or coroutines
      • Concurrency vs. parallelism
      • Lightweight
      • go f()

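
The `go f()` bullet can be illustrated with a small hypothetical example. Each call runs in its own goroutine, and because `go` returns immediately, the caller has to wait explicitly. This sketch uses `sync.WaitGroup` for the waiting, the same tool that appears later in Approach-2. `squareAll` is an illustrative name, not part of pt.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll starts one goroutine per element with the go statement.
// Each goroutine writes only to its own index of out, so no lock is needed.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) { // i and n are passed in so every goroutine gets its own copy
			defer wg.Done()
			out[i] = n * n
		}(i, n)
	}
	wg.Wait() // go returns immediately, so block here until every goroutine is done
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // prints [1 4 9 16]
}
```
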
  28. (diagram: find, grep and print, each started with go)

  29. Channel

  30. • Messaging between goroutines
      • Sending and receiving values
      • Blocking controlled by the buffer
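
The three channel properties can be seen in a few lines. In this minimal sketch, one goroutine sends values and the caller receives them. With an unbuffered channel, each send blocks until the matching receive, and `close` lets the receiver's `range` loop end. `produce` and `collect` are hypothetical names.

```go
package main

import "fmt"

// produce sends each value and then closes ch so the receiver's range loop can end.
func produce(ch chan<- int, vals []int) {
	for _, v := range vals {
		ch <- v // on an unbuffered channel this blocks until a receiver takes v
	}
	close(ch)
}

// collect receives from a producer goroutine until the channel is closed.
func collect(vals []int) []int {
	ch := make(chan int) // unbuffered: every send waits for a matching receive
	go produce(ch, vals)
	var got []int
	for v := range ch { // receive
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(collect([]int{1, 2, 3})) // prints [1 2 3]
}
```
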

  31. (diagram: find, grep and print, each started with go)

  32. (diagram: find → Channel → grep → Channel → print, each stage started with go)

  33. (diagram: find → Channel → grep → Channel → print, each stage started with go)

  34. // channel
      files := make(chan *string, self.Option.Cap)
      matches := make(chan *grep.Match, self.Option.Cap)
      done := make(chan bool)

      // find
      find := find.Find{Files: files, Option: self.Option}
      go find.Do(self.Root)

      // grep
      grep := grep.Grep{
          Files:   files,
          Matches: matches,
          Pattern: self.Pattern,
          Option:  self.Option}
      go grep.Do()

      // print
      print := print.Print{
          Done:    done,
          Matches: matches,
          Pattern: self.Pattern,
          Option:  self.Option}
      go print.Do()

      <-done // block
  35. walkFunc := func(path string, info os.FileInfo, err error) error {
          if err != nil {
              return nil // skip entries that could not be read (info may be nil)
          }
          if info.IsDir() {
              return nil
          }
          self.Files <- &path // send
          return nil
      }

      filepath.Walk(root, walkFunc)
      close(self.Files) // close
  36. for file := range self.Files { // receive ( <-self.Files )
          fh, err := os.Open(*file)
          if err != nil {
              panic(err)
          }

          f := bufio.NewReader(fh)

          var buf []byte
          var lineNum = 1
          for {
              buf, _, err = f.ReadLine()
              if err != nil {
                  break
              }
              line := string(buf)
              if strings.Contains(line, self.Pattern) {
                  self.Matches <- &Match{*file, lineNum, line} // send
              }
              lineNum++
          }
          fh.Close()
      }
      close(self.Matches) // close
  37. for match := range self.Matches { // receive
          fmt.Printf("%s:%d:%s\n", match.Path, match.Num, match.Match)
      }
      self.Done <- true // send

  38. > the_simple_searcher go $GOROOT > /dev/null

  39. 0.79 -> 0.87 seconds

  40. ?

  41. buffer

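  42. • Channel capacity
      • ch := make(chan Type, capacity)
      • Sends are accepted up to the capacity
      • Once the buffer is full, the sender waits until there is room
      • Each receive frees one slot
      • With a capacity of 0, the sender always waits
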
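
The channel capacity rules on this slide can be checked without deadlocking by using `select` with a `default` case, which takes the `default` branch whenever a send would block. This is a minimal sketch; `fill` and `freeSlot` are hypothetical helpers.

```go
package main

import "fmt"

// fill tries to send n values into a channel with capacity c without ever
// blocking, and returns how many sends were accepted.
func fill(c, n int) int {
	ch := make(chan int, c)
	sent := 0
	for i := 0; i < n; i++ {
		select {
		case ch <- i:
			sent++ // there was still room in the buffer
		default:
			return sent // buffer full: a plain ch <- i would block here
		}
	}
	return sent
}

// freeSlot shows that one receive makes room for one more send.
func freeSlot() bool {
	ch := make(chan int, 1)
	ch <- 1 // the buffer is now full
	<-ch    // receiving frees one slot
	select {
	case ch <- 2:
		return true // accepted without blocking
	default:
		return false
	}
}

func main() {
	fmt.Println(fill(2, 5)) // prints 2: only up to the capacity is accepted
	fmt.Println(fill(0, 5)) // prints 0: with capacity 0 and no receiver, a send always waits
	fmt.Println(freeSlot()) // prints true
}
```
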
  43. // channel with buffer
      files := make(chan *string, self.Option.Cap)
      matches := make(chan *grep.Match, self.Option.Cap)
      done := make(chan bool) // always wait
  44. > the_simple_searcher go $GOROOT > /dev/null

  45. 0.79 -> 0.8 seconds

  46. Approach-2: even more concurrently

  47. (diagram: find → Channel → grep → Channel → print, each stage started with go)

  48. (diagram: find → Channel → several grep goroutines → Channel → print, each started with go)
  49. var wg sync.WaitGroup
      for file := range self.Files {
          wg.Add(1) // increment the number of running goroutines
          // ... (omitted)
          go func(self *Grep, file *string) {
              defer wg.Done() // decrement the count when the goroutine finishes
              for {
                  // ... (omitted)
              }
              fh.Close()
          }(self, file) // when running a closure as a goroutine, watch out for shared variables
      }
      wg.Wait() // wait until all of an unknown number of goroutines have finished
      close(self.Matches)

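
The WaitGroup pattern and the closure warning can be shown in isolation. This hypothetical sketch starts one goroutine per input, much like one goroutine per file. It passes the loop variable as an argument, which is safe on every Go version; before Go 1.22, all iterations shared a single loop variable. On the slide, `close(self.Matches)` can follow `wg.Wait()` directly because a separate print goroutine is draining the channel. Here the caller itself receives, so the wait-then-close runs in its own goroutine to avoid a deadlock.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countMatches starts one goroutine per text and sums the occurrences of pattern.
func countMatches(texts []string, pattern string) int {
	results := make(chan int)
	var wg sync.WaitGroup
	for _, t := range texts {
		wg.Add(1) // count the goroutine before it starts
		go func(t string) { // pass t explicitly instead of capturing the loop variable
			defer wg.Done() // decrement the count when this goroutine ends
			results <- strings.Count(t, pattern)
		}(t)
	}
	// Close results only after every sender is done. This runs in its own
	// goroutine because the loop below is the only receiver.
	go func() {
		wg.Wait()
		close(results)
	}()
	total := 0
	for n := range results {
		total += n
	}
	return total
}

func main() {
	fmt.Println(countMatches([]string{"go go", "gopher", "rust"}, "go")) // prints 3
}
```
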
  50. > the_simple_searcher go $GOROOT > /dev/null

  51. panic: too many open files

  52. var wg sync.WaitGroup
      sem := make(chan bool, self.Option.Cap) // channel that limits how many goroutines run at once
      for file := range self.Files {
          sem <- true // wait if the goroutine count (the channel's buffer) is full
          wg.Add(1)
          // ... (omitted)
          go func(self *Grep, file *string) {
              defer wg.Done()
              for {
                  // ... (omitted)
              }
              fh.Close()
              <-sem // free a slot in the limiting channel's buffer
          }(self, file)
      }
      wg.Wait()
      close(self.Matches)

  53. > the_simple_searcher go $GOROOT > /dev/null

  54. 0.79 -> 0.8 seconds

  55. Approach-3: in parallel

  56. GOMAXPROCS

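  57. • Degree of parallelism for goroutines
      • The default is 1 (as of Go 1.2, used here; since Go 1.5 the default is the number of CPUs)
      • runtime.NumCPU() returns the number of logical CPUs
      • runtime.GOMAXPROCS() sets the degree of parallelism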

  58. > the_simple_searcher go $GOROOT > /dev/null

  59. 0.79 -> 0.55 seconds

  60. benchmark
      • Mac OSX (10.9.3)
      • CPU: 2.5GHz Core i5 (2 cores)
      • Memory: 8GB
      • Go: 1.2.2
  61. (chart comparing the approaches; labels: Buffer, GOMAXPROCS, Approach)

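  62. (same chart, annotated; labels: Buffer, GOMAXPROCS, Approach)
      • Without concurrency, adding parallelism changes nothing
      • Setting more than the number of cores has no effect
      • If you don't measure and tune, it can even get slower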

  63. Rob Pike:
      • Concurrency is powerful.
      • Concurrency is not parallelism.
      • Concurrency enables parallelism.
      • Concurrency makes parallelism (and scaling and everything else) easy.
  64. Plug: GMO Pepabo is hiring engineers. We are looking for new teammates to create and grow services with us.
      http://pepabo.com/recruit/career/engineer/

  65. The End