Writing a Protocol Buffer Parser/Lexer in Go from Scratch (ゼロから作る Protocol Buffer のパーサーとレキサー)

yoheimuta
April 23, 2022

Transcript

  1. Yohei Yoshimuta
    Writing a Protocol Buffer Parser and Lexer from Scratch
    Go Conference
    23 April 2022
    Hashtags: #gocon #goconB

  2. Self-introduction

  3. Self-introduction (affiliation)
    Parallel:

    • 70% of users are Gen Z

    • Average call time is 3 hours per day

    Go in production:
    • WebSocket: github.com/kuiperbelt/kuiperbelt

    • Data transfer: from MySQL to BigQuery with Cloud Dataflow

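  4. Data structure mismatch
    Many programming problems come down to converting data in one structure into data with a different structure

    • Splitting text into lines

    • Serializing and deserializing binary data

    • Regular expressions

    • Lexers
    • Parsers
    Source: Lexical Scanning in Go: https://talks.golang.org/2011/lex.slide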

  5. Lexer (lexical analyzer)
    Converts a character stream into tokens
    Character stream:
    enum MachineState {
        option allow_alias = true;
        STOPPED = 0;
        RUNNING = 1;
    }
    Tokens (one per space-separated item):
    enum MachineState {
        option allow_alias = true ;
        STOPPED = 0 ;
        RUNNING = 1 ;
    }

  7. Parser (syntax analyzer)
    Converts tokens into tree-structured data
    enum MachineState {
        option allow_alias = true ;
        STOPPED = 0 ;
        RUNNING = 1 ;
    }
    Enum (Name = MachineState)
        Option (Name = allow_alias, Value = true)
        Field (Name = STOPPED, Value = 0)
        Field (Name = RUNNING, Value = 1)
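The token-to-tree step can be sketched in a few lines. The `Enum` and `EnumField` types and the `parseEnum` function below are hypothetical, simplified stand-ins (the real go-protoparser AST is richer), and the input is assumed to be already tokenized.

```go
package main

import (
	"fmt"
	"strconv"
)

// EnumField is one "NAME = number;" entry (hypothetical, simplified type).
type EnumField struct {
	Name  string
	Value int
}

// Enum is the root of the small tree (hypothetical, simplified type).
type Enum struct {
	Name    string
	Options map[string]string
	Fields  []EnumField
}

// parseEnum consumes tokens such as: enum Name { option k = v ; A = 0 ; }
func parseEnum(toks []string) (*Enum, error) {
	if len(toks) < 4 || toks[0] != "enum" || toks[2] != "{" {
		return nil, fmt.Errorf("expected an enum header")
	}
	e := &Enum{Name: toks[1], Options: map[string]string{}}
	i := 3
	for i < len(toks) && toks[i] != "}" {
		if toks[i] == "option" {
			// option <name> = <value> ;
			if i+4 >= len(toks) || toks[i+2] != "=" || toks[i+4] != ";" {
				return nil, fmt.Errorf("malformed option at token %d", i)
			}
			e.Options[toks[i+1]] = toks[i+3]
			i += 5
			continue
		}
		// <name> = <number> ;
		if i+3 >= len(toks) || toks[i+1] != "=" || toks[i+3] != ";" {
			return nil, fmt.Errorf("malformed field at token %d", i)
		}
		n, err := strconv.Atoi(toks[i+2])
		if err != nil {
			return nil, err
		}
		e.Fields = append(e.Fields, EnumField{Name: toks[i], Value: n})
		i += 4
	}
	if i >= len(toks) {
		return nil, fmt.Errorf("missing closing brace")
	}
	return e, nil
}

func main() {
	toks := []string{"enum", "MachineState", "{",
		"option", "allow_alias", "=", "true", ";",
		"STOPPED", "=", "0", ";",
		"RUNNING", "=", "1", ";", "}"}
	e, err := parseEnum(toks)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *e)
}
```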

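  9. Implementation can be hard
    Complex text has its own rules and syntax, and some of that syntax is often undocumented

    A parser can grow out of control partway through

    Go provides a good enough environment for writing your own parser!

    …but careful design is still required

    Let's look at the design and implementation of a parser, focusing on Protocol Buffer schemas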

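  10. Today's topics
    • Use cases for parsers

    • Why I decided to write my own parser

    • The design and approach I chose

    • Implementation details

    • Advanced topics

        • Buffer design for the lexer

        • Interface design for the abstract syntax tree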

  14. Lexers and parsers are important building blocks

    • Compilers

    • Static analysis

    • Code generation

    • Data files
    [Diagram: the front end of a compiler]

  15. Protocol Buffer schemas
    Protocol Buffers is a language-neutral, platform-neutral mechanism developed by Google for serializing structured data

    It can be used like XML or JSON, but is smaller, faster, and simpler

    You define the structure of the data you want (the schema) in advance
    message Person {
        required string name = 1;
        required int32 id = 2;
        optional string email = 3;
    }
    Source: Protocol Buffers | Google Developers: https://developers.google.com/protocol-buffers

  18. The old parser
    A parser built on regular expressions

    Developed for project-specific static analysis tools

    • A linter

    • A code generator (validation)

    • A documentation generator

  20. The new parser
    I wanted to replace the old parser

    It had many problems:

    • It did not support every grammar rule

    • The code was hard to extend

  21. Redesigned with clearly separated roles for the lexer, parser, and analyzer
    [Diagram labels: Protocol Buffer, go-protoparser, protolint]

  22. The design and approach I chose

  23. Today's focus is a parser for Protocol Buffer schemas
    syntax = "proto3";
    package tutorial;
    message Outer { // A message is an aggregate containing a set of typed fields.
        message Inner { // Level 2
            int64 ival = 1;
        }
        repeated Inner inner_message = 2;
        EnumAllowingAlias enum_field = 3;
        map<int32, string> my_map = 4;
    }

  24. The Parse function
    This function takes a character stream as its argument and returns an abstract syntax tree

    Using the io.Reader interface lets it accept many kinds of input
    func Parse(input io.Reader) (*parser.Proto, error)

  25. The Proto type
    A schema follows these grammar rules:

    • It starts with a syntax statement

    • After that, any of the import, package, option, message, enum, service, and emptyStatement statements may repeat
    // Proto represents a protocol buffer definition.
    type Proto struct {
        Syntax *Syntax

        ProtoBody []interface{}
    }

  26. The Proto type
    syntax = "proto3";
    package tutorial;
    message Outer {
        message Inner { // Level 2
            int64 ival = 1;
        }
        repeated Inner inner_message = 2;
        EnumAllowingAlias enum_field = 3;
        map<int32, string> my_map = 4;
    }
    Proto
        Syntax
        ProtoBody
            Package
            Message
                Message
                    Field
                Field
                Field
                MapField

  27. How do we parse?
    There are several approaches:

    • Use a tool such as yacc/lex

    • Use regular expressions

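  29. Tools
    Using a tool is not a problem in itself, but:

    • You often have to learn a language specific to that tool

    • The tool may not fit your use case

    • A tool cannot necessarily handle every grammar rule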

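  32. Regular expressions
    The approach taken by my first parser

    • Poor readability

    • If the expressions cannot cover every grammar rule, you end up needing state management anyway

    • Recursive matching is difficult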
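The recursive-matching limitation is easy to reproduce. The sketch below is only an illustration: the pattern `flatMessage` and the helper `matchBraces` are invented for this example. Go's `regexp` package has no recursion, so the pattern stops at the first closing brace, while a hand-written depth counter handles nesting.

```go
package main

import (
	"fmt"
	"regexp"
)

// flatMessage matches "message Name { ... }" only up to the FIRST closing brace.
var flatMessage = regexp.MustCompile(`message\s+\w+\s*\{[^}]*\}`)

const nested = `message Outer { message Inner { int64 ival = 1; } repeated Inner inner_message = 2; }`

// matchBraces returns the index just past the brace that closes the first "{",
// tracking nesting depth by hand. It returns -1 if the braces are unbalanced.
func matchBraces(src string) int {
	depth := 0
	for i, ch := range src {
		switch ch {
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i + 1
			}
			if depth < 0 {
				return -1
			}
		}
	}
	return -1
}

func main() {
	// The regular expression cuts the outer message short.
	fmt.Println(flatMessage.FindString(nested))
	// Explicit state (the depth counter) finds the real end.
	fmt.Println(nested[:matchBraces(nested)])
}
```

The depth counter is exactly the kind of state management the slide says you end up writing anyway.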

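  33. Let's write our own lexer and parser
    Splitting the implementation into a lexer and a parser is known to make complex grammar rules easier to handle

    In grammar terms, the lexer is also called a tokenizer, and the parser is also called a syntax analyzer.

    Compilers and interpreters usually treat these as separate phases. This design makes the later implementation easier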

  34. Initializing the parser
    [Diagram: io.Reader → Lexer → Parser → Proto]
    The parser depends on the lexer

    The lexer depends on the character stream
    // Parse parses a Protocol Buffer file.
    func Parse(input io.Reader) (*parser.Proto, error) {
        p := parser.NewParser(
            lexer.NewLexer(
                input,
            ),
        )
        return p.ParseProto()
    }

  35. Implementation details (lexer)

  36. [Diagram: io.Reader → Lexer → Parser → Proto]

  37. The item produced by lexical analysis
    The result of lexical analysis is represented by two fields

    • Type: for example, Number

    • Value: for example, "100"
    // Lexer is a tokenizer.
    type Lexer struct {
        Token Token  // Token is the lexical type.
        Text  string // Text is the lexical value. It has a cool name: "lexeme".
        ...
    }

  38. The Token type
    A token is just an integer constant

    Use iota to enumerate all the tokens
    type Token int // Token represents a lexical type.
    const ( // The result of Scan is one of these tokens.
        TILLEGAL Token = iota // Special tokens
        TEOF

        TIDENT // Identifiers

  39. The Token type (continued)
        TINTLIT // Literals
        TFLOATLIT
        TBOOLLIT
        TSTRLIT

        // Misc characters
        TSEMICOLON // ;
        TCOLON     // :
        TEQUALS    // =
        TQUOTE     // " or '
        ...
        TSYNTAX // Keywords
        TPACKAGE
        TMESSAGE
        ...
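Integer tokens are cheap to compare but unreadable in logs, so a `String` method is a common companion to an iota enumeration. The sketch below uses a shortened token list, and the `tokenNames` table is a hypothetical addition for illustration, not part of the slides.

```go
package main

import "fmt"

// Token represents a lexical type, as on the slides (shortened list).
type Token int

const (
	TILLEGAL Token = iota // 0
	TEOF                  // 1
	TIDENT                // 2
	TINTLIT               // 3
)

// tokenNames is a hypothetical lookup table used only for debug output.
var tokenNames = map[Token]string{
	TILLEGAL: "TILLEGAL",
	TEOF:     "TEOF",
	TIDENT:   "TIDENT",
	TINTLIT:  "TINTLIT",
}

// String makes Token satisfy fmt.Stringer so that %v prints a name.
func (t Token) String() string {
	if name, ok := tokenNames[t]; ok {
		return name
	}
	return fmt.Sprintf("Token(%d)", int(t))
}

func main() {
	fmt.Println(TIDENT, TINTLIT, Token(99))
}
```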

  40. The lexer's entry function
    The parser repeatedly calls lexer.Next() and then reads the lexer's current item
    // Next scans the read buffer.
    func (lex *Lexer) Next() {
        var err error
        lex.Token, lex.Text, err = lex.scan()
        if err != nil {
            log.Printf(`lexer encountered the error "%v"`, err)
        }
    }

  41. bufio.Reader
    The Lexer works on the character stream one rune at a time

    To do so, the Lexer wraps the io.Reader in a bufio.Reader

    bufio.Reader provides buffering and helper functions for text I/O
    // Lexer is a tokenizer.
    type Lexer struct {
        ...
        r *bufio.Reader
    }
    // NewLexer creates a new lexer.
    func NewLexer(r io.Reader) *Lexer {
        l := &Lexer{
            r: bufio.NewReader(r),
        }
        return l
    }

  42. The read function
    var eof = rune(0)
    func (l *Lexer) read() rune {
        ch, _, err := l.r.ReadRune()
        if err != nil {
            return eof
        }
        return ch
    }

  43. The peek function
    func (l *Lexer) peek() rune {
        ch := l.read()
        l.r.UnreadRune()
        return ch
    }

  46. The function that actually performs lexical analysis
    A state machine can be implemented with a switch statement

    • Peek at the first rune

    • Decide the next action

    • Execute that action
    func (l *Lexer) scan() (Token, string, error) {
        ch := l.peek()
        switch {
        case unicode.IsSpace(ch):
            l.read()
            return l.scan()
        case ch == eof:
            return TEOF, "", nil
        case isQuote(ch):
            lit, err := l.scanStrLit()
            if err != nil {
                return TILLEGAL, "", err
            }
            return TSTRLIT, lit, nil
        ...

  47. Reading an identifier
    // ident = letter { letter | decimalDigit | "_" }
    func (l *Lexer) scanIdent() string {
        ident := string(l.read())
        for {
            next := l.peek()
            switch {
            case isLetter(next), isDecimalDigit(next), next == '_':
                ident += string(l.read())
            default:
                return ident
            }
        }
    }

  48. Reading letters and digits
    // letter = "A" … "Z" | "a" … "z"
    // ref. https://en.wikipedia.org/wiki/List_of_Unicode_characters#Basic_Latin
    func isLetter(r rune) bool {
        if r < 'A' { return false }
        if r > 'z' { return false }
        if r > 'Z' && r < 'a' { return false }
        return true
    }
    // decimalDigit = "0" … "9"
    func isDecimalDigit(r rune) bool {
        return '0' <= r && r <= '9'
    }

  49. Demo: lexing a schema
    package main

    import (
        "fmt"
        "os"

        "github.com/yoheimuta/go-protoparser/v4/lexer"
    )

    func main() {
        lex := lexer.NewLexer(os.Stdin)
        for !lex.IsEOF() {
            lex.Next()
            fmt.Println("[", lex.Text, "]")
        }
    }

  51. Implementation details (parser)

  52. [Diagram: io.Reader → Lexer → Parser → Proto]

  53. Starting the parse
    Parse the syntax statement first, then the body

    Splitting each into its own function makes unit tests easier to write
    // ParseProto parses the proto.
    // proto = syntax { import | package | option | topLevelDef | emptyStatement }
    func (p *Parser) ParseProto() (*Proto, error) {
        syntax, err := p.ParseSyntax()
        if err != nil { return nil, err }
        protoBody, err := p.parseProtoBody()
        if err != nil { return nil, err }
        return &Proto{
            Syntax:    syntax,
            ProtoBody: protoBody,
        }, nil
    }

  54. Syntax
    Protocol Buffer schemas come in two versions

    • syntax = "proto3"

    • syntax = "proto2"
    // Syntax is used to define the protobuf version.
    type Syntax struct {
        ProtobufVersion string
    }

  55. Parsing a schema in 4 steps
    1. Lex repeatedly

  56. ParseSyntax
    // unexpected is a helper function to report the error.
    func (p *Parser) ParseSyntax() (*Syntax, error) {
        p.lex.Next()
        if p.lex.Token != lexer.TSYNTAX { return nil, p.unexpected("syntax") }
        p.lex.Next()
        if p.lex.Token != lexer.TEQUALS { return nil, p.unexpected("=") }
        p.lex.Next()
        if p.lex.Token != lexer.TQUOTE { return nil, p.unexpected("quote") }
        p.lex.Next()
        if p.lex.Text != "proto3" && p.lex.Text != "proto2" { return nil, p.unexpected("proto3 or proto2") }
        version := p.lex.Text

        p.lex.Next()
        if p.lex.Token != lexer.TQUOTE { return nil, p.unexpected("quote") }
        p.lex.Next()
        if p.lex.Token != lexer.TSEMICOLON { return nil, p.unexpected(";") }
        return &Syntax{ ProtobufVersion: version }, nil
    }

  57. Parsing a schema in 4 steps
    1. Lex repeatedly

    2. Look ahead

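  58. Lookahead parsers
    Many parsers, including those generated by yacc, decide which action to run next by looking ahead one token

    Programming language designers take this into account when they define a grammar

    It is efficient, and it is also easy to implement!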

  59. parseProtoBody
    Look ahead one token
    // protoBody = { import | package | option | topLevelDef | emptyStatement }
    // topLevelDef = message | enum | service | extend
    func (p *Parser) parseProtoBody() ([]interface{}, error) {
        var protoBody []interface{}
        for {
            if p.IsEOF() { return protoBody, nil }
            var stmt interface{}
            p.lex.Next()
            token := p.lex.Token

            switch token {
            ...

  60. Parsing a schema in 4 steps
    1. Lex repeatedly

    2. Look ahead

    3. Transition to the next state

  61. parseProtoBody
    Look ahead one token

    Once the state is determined, call the function that corresponds to it
            case lexer.TIMPORT:
                importValue, err := p.ParseImport()
                if err != nil {
                    return nil, err
                }
                stmt = importValue
            case lexer.TPACKAGE:
                packageValue, err := p.ParsePackage()
                if err != nil {
                    return nil, err
                }
                stmt = packageValue

  62. parseProtoBody
    Look ahead one token

    Once the state is determined, call the function that corresponds to it
            case lexer.TOPTION:
                option, err := p.ParseOption()
                if err != nil {
                    return nil, err
                }
                stmt = option
            case lexer.TMESSAGE:
                message, err := p.ParseMessage()
                if err != nil {
                    return nil, err
                }
                stmt = message

  63. parseProtoBody
    Look ahead one token

    Once the state is determined, call the function that corresponds to it
            case lexer.TENUM:
                enum, err := p.ParseEnum()
                if err != nil {
                    return nil, err
                }
                stmt = enum
            case lexer.TSERVICE:
                service, err := p.ParseService()
                if err != nil {
                    return nil, err
                }
                stmt = service

  64. Parsing a schema in 4 steps
    1. Lex repeatedly

    2. Look ahead

    3. Transition to the next state

    4. Assemble the nodes of the abstract syntax tree

  65. parseProtoBody
    Once the node for each statement is ready, store it in the Proto struct
            case lexer.TEXTEND:
                extend, err := p.ParseExtend()
                if err != nil {
                    return nil, err
                }
                stmt = extend
            }
            protoBody = append(protoBody, stmt)
        }
    }

    // Back in ParseProto, the collected body is stored in the Proto struct.
    return &Proto{
        Syntax:    syntax,
        ProtoBody: protoBody,
    }, nil

  66. Demo: parsing a schema
    syntax = "proto3";
    package tutorial;
    message Outer {
        message Inner { // Level 2
            int64 ival = 1;
        }
        repeated Inner inner_message = 2;
        EnumAllowingAlias enum_field = 3;
        map<int32, string> my_map = 4;
    }

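  68. Summary
    Designing the lexer and parser separately made even complex grammar rules easier to handle

    The state machines of the lexer and the parser are each small and can be implemented in an easy-to-follow way

    Each action is small too, which makes tests easier to write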

  69. Advanced topics

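  71. Problems
    The first public release of go-protoparser had two problems:

    1. Without backtracking, the code tends to become complicated

    • The lexer was given a buffering mechanism so that it can start over when a lookahead turns out to be wrong. See the repository for details

    2. The abstract syntax tree was hard to use

    • This is also an important perspective when customizing a static analyzer

    • Today's talk looks at this one in detail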

  72. [Diagram: io.Reader → Lexer → Parser → Proto → Analyzer]

  73. The problem with the abstract syntax tree's interface
    A linter has to walk ProtoBody and determine the type of each element before it can apply its lint rules
    // Proto represents a protocol buffer definition.
    type Proto struct {
        Syntax *Syntax

        // ProtoBody is a slice of sum type consisting of *Import, *Package, *Option, *Message, *Enum,
        // *Service, *Extend and *EmptyStatement.
        ProtoBody []interface{}
    }

  74. Type switches
    A type switch lets you list type assertions one after another

    • Lots of boilerplate

    • Easy to get wrong
    for _, s := range src {
        switch t := s.(type) {
        case *parser.Import:

        case *parser.Package:

        case *parser.Option:

        case *parser.Message:
        ...

  75. The Visitor pattern
    One of the design patterns

    • Use the Visitor pattern when you have a structure made up of objects of many classes and want to define a new operation over it
    Proto
        Syntax
        ProtoBody
            Package
            Message
                Message
                    Field
                Field
                Field
                MapField

  76. The Visitor pattern
    One of the design patterns

    • Use the Visitor pattern when you have a structure made up of objects of many classes and want to define a new operation over it

    • A Visitor represents an operation on the elements of an object structure
    Visitor: VisitSyntax(Syntax), VisitMessage(Message)
        LintingVisitor: VisitSyntax(Syntax), VisitMessage(Message)
        FormattingVisitor: VisitSyntax(Syntax), VisitMessage(Message)
    Visitee: Accept(Visitor)
        Syntax: Accept(Visitor)
        Message: Accept(Visitor)

  77. 4 steps to introduce the Visitor pattern
    1. Define the Visitor interface in the parser

  78. The Visitor interface
    The linter will implement this Visitor
    // Visitor is for dispatching Protocol Buffer elements.
    type Visitor interface {
        VisitComment(*Comment)
        VisitEnum(*Enum) (next bool)
        VisitField(*Field) (next bool)
        VisitImport(*Import) (next bool)
        VisitMessage(*Message) (next bool)
        VisitPackage(*Package) (next bool)
        VisitSyntax(*Syntax) (next bool)
        ...
    }

  79. 4 steps to introduce the Visitor pattern
    1. Define the Visitor interface in the parser

    2. Define the Visitee interface in the parser

  80. The Visitee interface
    The abstract syntax tree will implement this Visitee
    // Visitee is implemented by all Protocol Buffer elements.
    type Visitee interface {
        Accept(v Visitor)
    }
    // Proto represents a protocol buffer definition.
    type Proto struct {
        Syntax *Syntax

        // ProtoBody is a slice of sum type consisting of *Import, *Package, *Option, *Message, *Enum,
        // *Service, *Extend and *EmptyStatement.
        ProtoBody []Visitee
    }

  81. 4 steps to introduce the Visitor pattern
    1. Define the Visitor interface in the parser

    2. Define the Visitee interface in the parser

    3. The abstract syntax tree implements the Visitee interface

  82. Implementing Visitee
    The linter calls the Accept function of the root node (Proto)

    The Visitor is propagated through the Accept functions
    // Accept dispatches the call to the visitor.
    func (p *Proto) Accept(v Visitor) {
        p.Syntax.Accept(v)
        for _, body := range p.ProtoBody {
            body.Accept(v)
        }
    }

  83. Implementing Visitee
    Each node calls its corresponding Visit function

    The Visitor then continues to be propagated through the Accept functions
    // Accept dispatches the call to the visitor.
    func (s *Syntax) Accept(v Visitor) {
        if !v.VisitSyntax(s) {
            return
        }
        for _, comment := range s.Comments {
            comment.Accept(v)
        }
    }

  84. 4 steps to introduce the Visitor pattern
    1. Define the Visitor interface in the parser

    2. Define the Visitee interface in the parser

    3. The abstract syntax tree implements the Visitee interface

    4. The static analyzer (e.g. a linter) implements the Visitor interface

  86. LintingVisitor
    // LintingVisitor represents a visitor representing various lint operations.
    type LintingVisitor struct{}
    func (v *LintingVisitor) VisitSyntax(s *parser.Syntax) bool {
        if s.ProtobufVersion != "proto3" {
            v.AddFailuref(s.Meta.Pos, "Syntax should be proto3 but was %q.", s.ProtobufVersion)
        }
        return false
    }
    func (v *LintingVisitor) VisitMessage(message *parser.Message) bool {
        name := message.MessageName

        if !strs.IsUpperCamelCase(name) {
            expected := strs.ToUpperCamelCase(name)
            v.AddFailuref(message.Meta.Pos, "Message name %q must be UpperCamelCase like %q", name, expected)
        }
        return true
    }

  87. Visit!
    proto, err := protoparser.Parse(reader)
    if err != nil {
        return err
    }
    linter := &LintingVisitor{}
    proto.Accept(linter)
    formatter := &FormattingVisitor{}
    proto.Accept(formatter)

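  89. That's all
    We now have a full-fledged parser and lexer with a convenient abstract syntax tree interface

    It uses only the Go standard library, with no external libraries

    The Visitor pattern makes the parser easier to use

    New processing over the abstract syntax tree only requires implementing a new Visitor, so the parser's code does not have to change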

  90. Further information
    Lexer and parser: github.com/yoheimuta/go-protoparser

    Linter: github.com/yoheimuta/protolint

    Compilers: Compilers: Principles, Techniques, and Tools

    Visitor pattern: Design Patterns: Elements of Reusable Object-Oriented Software

  91. Thank you
    @yoheimuta

    The Go gopher was designed by Renée French. Illustrations by tottie.