ゼロから作る Protocol Buffer のパーサーとレキサー / Writing Protocol Buffer Parser/Lexer in Go from scratch

yoheimuta

April 23, 2022

Transcript

  1. Yohei Yoshimuta: Writing a Protocol Buffer parser and lexer from scratch

    Go Conference, 23 April 2022. Hashtags: #gocon #goconB
  2. About me

  3. About me (affiliation)

    Parallel:
    • 70% of users are Gen Z
    • Average call time is 3 hours per day

    Go in production:
    • WebSocket: github.com/kuiperbelt/kuiperbelt
    • Data transfer: from MySQL to BigQuery with Cloud DataFlow
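  4. Data structure mismatch

    Many programming problems come down to converting data into data with a different structure:
    • Splitting text into lines
    • Serializing and deserializing binary data
    • Regular expressions
    • Lexers
    • Parsers

    Source: Lexical Scanning in Go: https://talks.golang.org/2011/lex.slide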
  5. Lexer (lexical analyzer)

    Replaces a character stream with tokens.

    Source:

        enum MachineState {
          option allow_alias = true;
          STOPPED = 0;
          RUNNING = 1;
        }

    Tokens:

        enum MachineState { option allow_alias = true ; STOPPED = 0 ; RUNNING = 1 ; }
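
    One quick way to see this step in action is the standard library's text/scanner package. This is only a sketch: text/scanner follows Go's lexical rules rather than the Protocol Buffer grammar, and it is not what go-protoparser uses. The helper name tokenize is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokenize splits src into token texts using text/scanner.
// Assumption: Go's lexical rules are close enough for this small enum.
func tokenize(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var tokens []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		tokens = append(tokens, s.TokenText())
	}
	return tokens
}

func main() {
	src := `enum MachineState { option allow_alias = true; STOPPED = 0; RUNNING = 1; }`
	// Print the tokens separated by spaces, like the token view on the slide.
	fmt.Println(strings.Join(tokenize(src), " "))
}
```

    Whitespace disappears and every identifier, literal and punctuation mark becomes its own item, which is the input the parser works on.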
  7. Parser (syntax analyzer)

    Replaces tokens with tree-structured data.

    Tokens:

        enum MachineState { option allow_alias = true ; STOPPED = 0 ; RUNNING = 1 ; }

    Tree:

        Enum (Name = MachineState)
          Option (Name = allow_alias, Value = true)
          Field (Name = STOPPED, Value = 0)
          Field (Name = RUNNING, Value = 1)
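  9. Implementation can be hard

    Complex text formats have their own rules and syntax, and parts of that syntax are often undocumented.
    A hand-written parser can get out of hand partway through.
    Go provides a good enough environment for writing your own parser, but careful design is required.
    Let's look at the design and implementation of a parser, focusing on Protocol Buffer schemas.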

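  10. Today's topics

    • Use cases for parsers
    • Why I decided to write my own parser
    • The design and approach I took
    • Implementation details
    • Advanced topics
      • Buffer design for the lexer
      • Interface design for the abstract syntax tree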
  14. Lexers and parsers are important building blocks

    • Compilers
    • Static analysis
    • Code generation
    • Data files

    [Figure: the compiler front end]
  15. Protocol Buffer schemas

    Protocol Buffers is a language-neutral, platform-neutral mechanism developed by Google for serializing structured data.
    It can be used like XML or JSON, but it is smaller, faster and simpler.
    You define the structure of the data you want (the schema) in advance.

        message Person {
          required string name = 1;
          required int32 id = 2;
          optional string email = 3;
        }

    Source: Protocol Buffers | Google Developers: https://developers.google.com/protocol-buffers
  18. The old parser

    A parser built on regular expressions, developed for project-specific static analysis tools:
    • A linter
    • A code generator (validation)
    • A documentation generator

  20. The new parser

    I wanted to replace the old parser because it had many problems:
    • It did not support all of the grammar rules
    • The code was hard to extend

  21. go-protoparser

    A redesign that makes the roles of the lexer, the parser and the analyzer explicit.

    [Figure labels: Protocol Buffer, go-protoparser, protolint]
  22. What design and approach did I take?

  23. Today's focus is a parser for Protocol Buffer schemas

        syntax = "proto3";
        package tutorial;

        message Outer { // A message is an aggregate containing a set of typed fields.
          message Inner { // Level 2
            int64 ival = 1;
          }
          repeated Inner inner_message = 2;
          EnumAllowingAlias enum_field = 3;
          map<int32, string> my_map = 4;
        }
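  24. The Parse function

    This function takes a character stream as its argument and returns an abstract syntax tree.
    Accepting the io.Reader interface lets it handle many kinds of input.

        func Parse(input io.Reader) (*parser.Proto, error)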

  25. The Proto type

    A schema follows these grammar rules:
    • It starts with a syntax statement
    • After that, it repeats any of: import, package, option, message, enum, service or emptyStatement

        // Proto represents a protocol buffer definition.
        type Proto struct {
            Syntax    *Syntax
            ProtoBody []interface{}
        }
  26. The Proto type

        syntax = "proto3";
        package tutorial;

        message Outer {
          message Inner { // Level 2
            int64 ival = 1;
          }
          repeated Inner inner_message = 2;
          EnumAllowingAlias enum_field = 3;
          map<int32, string> my_map = 4;
        }

    [Figure: the resulting tree]

        Proto
          Syntax
          ProtoBody
            Package
            Message
              Message
                Field
              Field
              Field
              MapField
  27. How do we parse it?

    There are several approaches:
    • Use a tool such as yacc/lex
    • Use regular expressions

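  29. Tools

    There is nothing wrong with using a tool, but:
    • You often have to learn a language specific to that tool
    • The tool may not fit your use case
    • It cannot necessarily handle every kind of grammar rule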

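  32. Regular expressions

    This was the approach of the first parser I wrote.
    • Readability is poor
    • Unless the expressions cover every grammar rule, you end up needing state management anyway
    • Recursive matching is hard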

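  33. Let's write our own lexer and parser

    Splitting the implementation into a lexer and a parser is known to make complex grammar rules easier to handle.
    In grammar terms, a lexer is also called a tokenizer, and a parser is also called a syntax analyzer.
    Compilers and interpreters usually treat these as separate processes. This design makes the rest of the implementation easier.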

  34. Initializing the parser

    [Figure labels: Parser, io.Reader, Lexer, Proto]

    The parser depends on the lexer.
    The lexer depends on the character stream.

        // Parse parses a Protocol Buffer file.
        func Parse(input io.Reader) (*parser.Proto, error) {
            p := parser.NewParser(
                lexer.NewLexer(
                    input,
                ),
            )
            return p.ParseProto()
        }
  35. Implementation details (lexer)

  36. Parser io.Reader Lexer Proto

  37. The item produced by lexical analysis

    The result of lexical analysis is represented by two fields:
    • Type: for example, Number
    • Value: for example, "100"

        // Lexer is a tokenizer.
        type Lexer struct {
            Token Token // Token is the lexical type.
            Text string // Text is the lexical value. It has a cool name "lexeme."
            ...
  38. The Token type

    A token is just an integer constant.
    iota is used to enumerate all of the tokens.

        type Token int

        // Token represents a lexical type.
        const (
            // The result of Scan is one of these tokens.
            TILLEGAL Token = iota // Special tokens
            TEOF
            TIDENT // Identifiers
  39. The Token type (continued)

            TINTLIT // Literals
            TFLOATLIT
            TBOOLLIT
            TSTRLIT

            // Misc characters
            TSEMICOLON // ;
            TCOLON // :
            TEQUALS // =
            TQUOTE // " or '
            ...
            TSYNTAX // Keywords
            TPACKAGE
            TMESSAGE
            ...
  40. The lexer's entry function

    The parser repeatedly calls lexer.Next() and then reads the item held by the lexer.

        // Next scans the read buffer.
        func (lex *Lexer) Next() {
            var err error
            lex.Token, lex.Text, lex.Pos, err = lex.scan()
            if err != nil {
                log.Printf(`lexer encountered the error "%v"`, err)
            }
        }
  41. bufio.Reader

    The Lexer works on the character stream one rune at a time.
    To do that, it wraps the io.Reader in a bufio.Reader.
    bufio.Reader provides buffering and helper functions for text I/O.

        // Lexer is a tokenizer.
        type Lexer struct {
            ...
            r *bufio.Reader
        }

        // NewLexer creates a new lexer.
        func NewLexer(r io.Reader) *Lexer {
            l := &Lexer{
                r: bufio.NewReader(r),
            }
            return l
        }
  42. The read function

        var eof = rune(0)

        // read consumes one rune and returns eof when nothing more can be read.
        func (l *Lexer) read() rune {
            ch, _, err := l.r.ReadRune()
            if err != nil {
                return eof
            }
            return ch
        }
  43. The peek function

        // peek returns the next rune without consuming it.
        func (l *Lexer) peek() rune {
            ch := l.read()
            l.r.UnreadRune()
            return ch
        }
  46. The function that actually performs lexical analysis

    A state machine can be implemented with a switch statement:
    • Peek at the first rune
    • Decide the next action
    • Execute that action

        func (l *Lexer) scan() (Token, string, error) {
            ch := l.peek()

            switch {
            case unicode.IsSpace(ch):
                l.read()
                return l.scan()
            case ch == eof:
                return TEOF, "", nil
            case isQuote(ch):
                lit, err := l.scanStrLit()
                if err != nil {
                    return TILLEGAL, "", err
                }
                return TSTRLIT, lit, nil
            ...
  47. Reading an identifier

        // ident = letter { letter | decimalDigit | "_" }
        func (l *Lexer) scanIdent() string {
            ident := string(l.read())

            for {
                next := l.peek()
                switch {
                case isLetter(next), isDecimalDigit(next), next == '_':
                    ident += string(l.read())
                default:
                    return ident
                }
            }
        }
  48. Reading letters and digits

        // letter = "A" … "Z" | "a" … "z"
        func isLetter(r rune) bool {
            // ref. https://en.wikipedia.org/wiki/List_of_Unicode_characters#Basic_Latin
            if r < 'A' {
                return false
            }
            if r > 'z' {
                return false
            }
            if r > 'Z' && r < 'a' {
                return false
            }
            return true
        }

        // decimalDigit = "0" … "9"
        func isDecimalDigit(r rune) bool {
            return '0' <= r && r <= '9'
        }
  49. Demo: lexing a schema

        package main

        import (
            "fmt"
            "os"

            "github.com/yoheimuta/go-protoparser/v4/lexer"
        )

        func main() {
            lex := lexer.NewLexer(os.Stdin)
            for !lex.IsEOF() {
                lex.Next()
                fmt.Println("[", lex.Text, "]")
            }
        }
  51. Implementation details (parser)

  52. Parser io.Reader Lexer Proto

  53. Starting the parse

    Parse the syntax statement first, then the body.
    Splitting each part into its own function makes unit tests easier to write.

        // ParseProto parses the proto.
        // proto = syntax { import | package | option | topLevelDef | emptyStatement }
        func (p *Parser) ParseProto() (*Proto, error) {
            syntax, err := p.ParseSyntax()
            if err != nil {
                return nil, err
            }

            protoBody, err := p.parseProtoBody()
            if err != nil {
                return nil, err
            }

            return &Proto{
                Syntax:    syntax,
                ProtoBody: protoBody,
            }, nil
        }
  54. Syntax

    Protocol Buffer schemas come in two versions:
    • syntax = "proto3"
    • syntax = "proto2"

        // Syntax is used to define the protobuf version.
        type Syntax struct {
            ProtobufVersion string
        }
  55. Parsing a schema in four steps

    1. Lex repeatedly

  56. ParseSyntax

        func (p *Parser) ParseSyntax() (*Syntax, error) {
            p.lex.Next()
            if p.lex.Token != lexer.TSYNTAX {
                // unexpected is a helper function to report the error
                return nil, p.unexpected("syntax")
            }

            p.lex.Next()
            if p.lex.Token != lexer.TEQUALS {
                return nil, p.unexpected("=")
            }

            p.lex.Next()
            if p.lex.Token != lexer.TQUOTE {
                return nil, p.unexpected("quote")
            }

            p.lex.Next()
            if p.lex.Text != "proto3" && p.lex.Text != "proto2" {
                return nil, p.unexpected("proto3 or proto2")
            }
            version := p.lex.Text

            p.lex.Next()
            if p.lex.Token != lexer.TQUOTE {
                return nil, p.unexpected("quote")
            }

            p.lex.Next()
            if p.lex.Token != lexer.TSEMICOLON {
                return nil, p.unexpected(";")
            }

            return &Syntax{ProtobufVersion: version}, nil
        }
  57. Parsing a schema in four steps

    1. Lex repeatedly
    2. Look ahead

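  58. Lookahead parsers

    Many parsers, including those generated by yacc, decide the next action by looking ahead one token.
    Programming language designers take this into account when they define a grammar.
    It is efficient, and it is also easy to implement.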

  59. parseProtoBody

    Look ahead one token.

        // protoBody = { import | package | option | topLevelDef | emptyStatement }
        // topLevelDef = message | enum | service | extend
        func (p *Parser) parseProtoBody() ([]interface{}, error) {
            var protoBody []interface{}

            for {
                if p.IsEOF() {
                    return protoBody, nil
                }

                var stmt interface{}

                p.lex.Next()
                token := p.lex.Token
                switch token {
                // ... (the cases follow on the next slides)
  60. Parsing a schema in four steps

    1. Lex repeatedly
    2. Look ahead
    3. Transition the state
  61. parseProtoBody

    Look ahead one token.
    Once the state is determined, call the function that corresponds to it.

                case scanner.TIMPORT:
                    importValue, err := p.ParseImport()
                    if err != nil {
                        return nil, err
                    }
                    stmt = importValue
                case scanner.TPACKAGE:
                    packageValue, err := p.ParsePackage()
                    if err != nil {
                        return nil, err
                    }
                    stmt = packageValue
  62. parseProtoBody

    Look ahead one token.
    Once the state is determined, call the function that corresponds to it.

                case scanner.TOPTION:
                    option, err := p.ParseOption()
                    if err != nil {
                        return nil, err
                    }
                    stmt = option
                case scanner.TMESSAGE:
                    message, err := p.ParseMessage()
                    if err != nil {
                        return nil, err
                    }
                    stmt = message
  63. parseProtoBody

    Look ahead one token.
    Once the state is determined, call the function that corresponds to it.

                case scanner.TENUM:
                    enum, err := p.ParseEnum()
                    if err != nil {
                        return nil, err
                    }
                    stmt = enum
                case scanner.TSERVICE:
                    service, err := p.ParseService()
                    if err != nil {
                        return nil, err
                    }
                    stmt = service
  64. Parsing a schema in four steps

    1. Lex repeatedly
    2. Look ahead
    3. Transition the state
    4. Assemble the nodes of the abstract syntax tree
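  65. parseProtoBody

    Once the node for each statement is ready, it is stored in the Proto struct.

                case scanner.TEXTEND:
                    extend, err := p.ParseExtend()
                    if err != nil {
                        return nil, err
                    }
                    stmt = extend
                }
                // Append whichever node the matching case produced.
                protoBody = append(protoBody, stmt)

        // ParseProto (slide 53) then stores the nodes in the Proto struct:
        return &Proto{
            Syntax:    syntax,
            ProtoBody: protoBody,
        }, nil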
  66. Demo: parsing a schema

        syntax = "proto3";
        package tutorial;

        message Outer {
          message Inner { // Level 2
            int64 ival = 1;
          }
          repeated Inner inner_message = 2;
          EnumAllowingAlias enum_field = 3;
          map<int32, string> my_map = 4;
        }
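  68. Summary

    Designing the lexer and the parser separately made even complex grammar rules easier to handle.
    The state machines of the lexer and the parser each stayed small and could be implemented in a clear, readable way.
    Each action is small too, which made tests easier to write.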

  69. Advanced topics

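  71. Problems

    The first public release of go-protoparser had two problems:

    1. Without backtracking, the code tends to become complicated
       • I gave the lexer a buffer so that parsing can start over from the beginning when a lookahead turns out to be wrong. See the repository for details.
    2. The abstract syntax tree was hard to use
       • This is also an important point of view when customizing a static analyzer
       • Today we look at this one in detail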
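
    The buffering idea in the first problem can be sketched as follows. This is only an illustration of mark-and-rewind backtracking under simplifying assumptions: bufferedLexer, mark, rewind and parseMapField are hypothetical names, the tokens are pre-lexed strings, and the real mechanism in the go-protoparser repository may differ.

```go
package main

import "fmt"

// bufferedLexer is a hypothetical token source that keeps what it has
// handed out, so a parser can rewind after a failed attempt.
type bufferedLexer struct {
	tokens []string // stands in for tokens a real lexer would produce on demand
	pos    int
}

// next returns the next token, or "" at the end of input.
func (l *bufferedLexer) next() string {
	if l.pos >= len(l.tokens) {
		return ""
	}
	tok := l.tokens[l.pos]
	l.pos++
	return tok
}

// mark returns a position that rewind can later restore.
func (l *bufferedLexer) mark() int { return l.pos }

// rewind moves the lexer back to a previously marked position.
func (l *bufferedLexer) rewind(mark int) { l.pos = mark }

// parseMapField tries to read "map < K , V > name"; on failure it rewinds
// so the caller can try another rule from the same starting point.
func parseMapField(l *bufferedLexer) (string, bool) {
	start := l.mark()
	if l.next() == "map" && l.next() == "<" {
		key := l.next()
		if l.next() == "," {
			val := l.next()
			if l.next() == ">" {
				return fmt.Sprintf("map[%s]%s %s", key, val, l.next()), true
			}
		}
	}
	l.rewind(start) // the guess was wrong: undo everything consumed so far
	return "", false
}

func main() {
	l := &bufferedLexer{tokens: []string{"map", "ival", "=", "1"}}
	_, ok := parseMapField(l)
	// After the failed attempt the lexer is back at the first token.
	fmt.Println(ok, l.next())
}
```

    With rewind available, a parse function can simply try a rule and give up cleanly, instead of threading partial state through every caller.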
  72. Parser io.Reader Lexer Proto Analyzer

  73. The problem with the abstract syntax tree's interface

    A linter has to walk ProtoBody and identify the type of each element before it can apply its lint rules.

        // Proto represents a protocol buffer definition.
        type Proto struct {
            Syntax *Syntax

            // ProtoBody is a slice of sum type consisted of *Import, *Package, *Option, *Message, *Enum, *Service, *Extend and *EmptyStatement.
            ProtoBody []interface{}
        }
  74. Type switches

    A type switch lets you list type assertions one after another, but:
    • It needs a lot of boilerplate
    • It is easy to get wrong

        for _, s := range src {
            switch t := s.(type) {
            case *parser.Import:
            case *parser.Package:
            case *parser.Option:
            case *parser.Message:
            ...
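  75. The Visitor pattern

    One of the classic design patterns.
    • Use the Visitor pattern when you have a structure made of objects of many classes and want to define a new operation over it

    [Figure: the Proto tree from slide 26 (Proto, Syntax, ProtoBody, Package, Message, Field, MapField)]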
  76. The Visitor pattern

    One of the classic design patterns.
    • Use the Visitor pattern when you have a structure made of objects of many classes and want to define a new operation over it
    • A Visitor represents an operation on the elements of the object structure

    [Figure: class diagram]

        Visitor: VisitSyntax(Syntax), VisitMessage(Message)
          LintingVisitor: VisitSyntax(Syntax), VisitMessage(Message)
          FormatingVisitor: VisitSyntax(Syntax), VisitMessage(Message)
        Visitee: Accept(Visitor)
          Syntax: Accept(Visitor)
          Message: Accept(Visitor)
  77. Four steps to introduce the Visitor pattern

    1. Define the Visitor interface in the parser
  78. The Visitor interface

    The linter will implement this Visitor.

        // Visitor is for dispatching Protocol Buffer elements.
        type Visitor interface {
            VisitComment(*Comment)
            VisitEnum(*Enum) (next bool)
            VisitField(*Field) (next bool)
            VisitImport(*Import) (next bool)
            VisitMessage(*Message) (next bool)
            VisitPackage(*Package) (next bool)
            VisitSyntax(*Syntax) (next bool)
            ...
  79. Four steps to introduce the Visitor pattern

    1. Define the Visitor interface in the parser
    2. Define the Visitee interface in the parser
  80. The Visitee interface

    The abstract syntax tree will implement this Visitee.

        // Visitee is implemented by all Protocol Buffer elements.
        type Visitee interface {
            Accept(v Visitor)
        }

        // Proto represents a protocol buffer definition.
        type Proto struct {
            Syntax *Syntax

            // ProtoBody is a slice of sum type consisted of *Import, *Package, *Option, *Message, *Enum, *Service, *Extend and *EmptyStatement.
            ProtoBody []Visitee
        }
  81. Four steps to introduce the Visitor pattern

    1. Define the Visitor interface in the parser
    2. Define the Visitee interface in the parser
    3. The abstract syntax tree implements the Visitee interface
  82. Visitee implementation

    The linter calls the Accept function of the root node (Proto).
    The Visitor is propagated through the Accept functions.

        // Accept dispatches the call to the visitor.
        func (p *Proto) Accept(v Visitor) {
            p.Syntax.Accept(v)

            for _, body := range p.ProtoBody {
                body.Accept(v)
            }
        }
  83. Visitee implementation

    Each node calls its corresponding Visit function.
    From there, the Visitor keeps propagating through the Accept functions.

        // Accept dispatches the call to the visitor.
        func (s *Syntax) Accept(v Visitor) {
            if !v.VisitSyntax(s) {
                return
            }
            for _, comment := range s.Comments {
                comment.Accept(v)
            }
        }
  84. Four steps to introduce the Visitor pattern

    1. Define the Visitor interface in the parser
    2. Define the Visitee interface in the parser
    3. The abstract syntax tree implements the Visitee interface
    4. The static analyzer (for example, a linter) implements the Visitor interface
  86. LintingVisitor

        // LintingVisitor represents a visitor representing various lint operations.
        type LintingVisitor struct{}

        func (v *LintingVisitor) VisitSyntax(s *parser.Syntax) bool {
            // ProtobufVersion is a string such as "proto3" (see slide 54).
            if s.ProtobufVersion != "proto3" {
                v.AddFailuref(s.Meta.Pos, "Syntax should be proto3 but was %q.", s.ProtobufVersion)
            }
            return false
        }

        func (v *LintingVisitor) VisitMessage(message *parser.Message) bool {
            name := message.MessageName
            if !strs.IsUpperCamelCase(name) {
                expected := strs.ToUpperCamelCase(name)
                v.AddFailuref(message.Meta.Pos, "Message name %q must be UpperCamelCase like %q", name, expected)
            }
            return true
        }
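  87. Visit!

        proto, err := parser.Parse(reader)
        // (handle err before using proto)

        // The Visit methods have pointer receivers, so pass pointers.
        linter := &LintingVisitor{}
        proto.Accept(linter)

        formatter := &FormatingVisitor{}
        proto.Accept(formatter)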
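
    Putting the four steps together, here is a self-contained sketch that can be run as is. It assumes a reduced model with only two node types; lintVisitor, its failures slice and the simplified naming rule are inventions for this example and are not protolint's implementation.

```go
package main

import (
	"fmt"
	"unicode"
)

// Visitor and Visitee mirror the shape shown on the slides, reduced to
// two node types for brevity.
type Visitor interface {
	VisitSyntax(*Syntax) (next bool)
	VisitMessage(*Message) (next bool)
}

type Visitee interface{ Accept(v Visitor) }

type Syntax struct{ ProtobufVersion string }

type Message struct {
	MessageName string
	Body        []Visitee // nested messages
}

type Proto struct {
	Syntax    *Syntax
	ProtoBody []Visitee
}

func (s *Syntax) Accept(v Visitor) { v.VisitSyntax(s) }

// Accept visits the message, then its children unless the visitor returns false.
func (m *Message) Accept(v Visitor) {
	if !v.VisitMessage(m) {
		return
	}
	for _, child := range m.Body {
		child.Accept(v)
	}
}

func (p *Proto) Accept(v Visitor) {
	p.Syntax.Accept(v)
	for _, body := range p.ProtoBody {
		body.Accept(v)
	}
}

// lintVisitor is a hypothetical linter that collects failures in a slice.
type lintVisitor struct{ failures []string }

func (v *lintVisitor) VisitSyntax(s *Syntax) bool {
	if s.ProtobufVersion != "proto3" {
		v.failures = append(v.failures, fmt.Sprintf("syntax should be proto3 but was %q", s.ProtobufVersion))
	}
	return false
}

func (v *lintVisitor) VisitMessage(m *Message) bool {
	// Simplified rule: only checks that the first rune is upper case.
	if r := []rune(m.MessageName); len(r) == 0 || !unicode.IsUpper(r[0]) {
		v.failures = append(v.failures, fmt.Sprintf("message name %q must start with an upper-case letter", m.MessageName))
	}
	return true
}

func main() {
	proto := &Proto{
		Syntax: &Syntax{ProtobufVersion: "proto2"},
		ProtoBody: []Visitee{
			&Message{MessageName: "Outer", Body: []Visitee{&Message{MessageName: "inner"}}},
		},
	}
	linter := &lintVisitor{}
	proto.Accept(linter)
	for _, f := range linter.failures {
		fmt.Println(f)
	}
}
```

    Adding another analysis, such as a formatter, only requires another type that implements Visitor; none of the Accept methods change.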
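  89. That's all

    We now have a full-fledged parser and lexer with a convenient abstract syntax tree interface.
    It uses only the Go standard library, with no external libraries.
    The Visitor pattern makes the parser easier to use.
    Adding a new operation over the abstract syntax tree only takes a new Visitor implementation, so the parser code does not have to change.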
  90. Further information

    Lexer and parser: github.com/yoheimuta/go-protoparser
    Linter: github.com/yoheimuta/protolint
    Compilers: Compilers: Principles, Techniques, and Tools
    Visitor pattern: Design Patterns: Elements of Reusable Object-Oriented Software

  91. Thank you

    @yoheimuta
    great@parallelcorp.com

    The Go gopher was designed by Renée French. Illustrations by tottie.