
Implementing a Lightweight Markup Language Parser in Go / Gohn: parser written in Go

aereal
August 04, 2017

Presented at builderscon tokyo 2017


Transcript

  1. Implementing a lightweight markup language parser in Go
    id:aereal @ builderscon tokyo 2017

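  2. What I will talk about
    • Lightweight markup languages and Hatena notation
    • Text processing and why parser generators are needed
    • Introducing a Hatena notation parser built with Go/goyacc
    • Applied goyacc know-how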

  3. About me
    • id:aereal
    • GitHub: aereal
    • Hatena Co., Ltd., application engineer

  4. ⚠ Disclaimer ⚠
    • This is private work, driven by personal concerns I have had while working on our services
    • Whether it will be adopted by any service is unknown

  5. References
    • http://b.hatena.ne.jp/aereal/2017gokyoto/
    • I collect related links under this tag on Hatena Bookmark

  6. Lightweight markup languages and Hatena notation

  7. What is a lightweight markup language?
    • LML = Lightweight Markup Language
    • Sits between HTML/XML and plain text
    • Markdown, Textile, Hatena notation, etc.


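  9. What is Hatena notation?
    • An LML available in several services provided by Hatena
    • Hatena Blog, Hatena Diary, etc.
    • A convenient notation that is converted to HTML
    • Its grammar is somewhat similar to org-mode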

  10. * Heading 1
    ** Heading 2
    [http://127.0.0.1/:title=This is my IP]
    - Ruby
    - Perl
    - Go
    + Introduction
    + Development
    + Twist
    + Conclusion

  11. Heading 1
    Heading 2

    This is my IP

    Ruby
    Perl
    Go

    Introduction
    Development
    Twist
    Conclusion

  12. Various implementations
    • Hatena Blog, Hatena Diary, Hatena Group
    • Text-Hatena (CPAN)
    • Text-Xatena (CPAN)
    • chris4403/WikiTextConverter
    • motemen/pandoc

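  13. Various implementations
    • Specification ≈ implementation
    • There are many implementations
    • In other words
    • There are as many specifications as there are implementations
    • To learn the specification you have to read through Perl code and regular expressions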

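  14. The lack of a handy implementation hurts
    • I want to use Hatena notation in applications not written in Perl, but...
    • The existing code leans heavily on Perl's extended regular expressions, so porting it is hard
    • Many parsers go all the way to HTML conversion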

  15. Portability
    • We want things like live preview in the browser
    • I want to define only the output (AST) for a given input
    • I want to be able to write it in Perl, Go, Scala, JavaScript, and others

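  16. I don't want to go as far as HTML conversion
    • Many parser implementations also perform the HTML conversion
    • However, how much HTML is allowed in the input differs per service (!= per parser implementation)
    • → I want to separate the parser from the HTML conversion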

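  17. The Hatena notation parser I want
    • A plain implementation that can serve as a reference
    • = one that does not try too hard to solve everything with regular expressions
    • Parsing yields an intermediate representation rather than HTML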

  18. To sum up in three lines
    • Give me an AST!!!

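  19. Coming up next
    • Is there a decent text-processing tool that does not depend on a particular language?
    • Other than (extended) regular expressions, that is
    • Let's go looking for a promising text-processing technique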

  20. Text processing and parser generators

  21. A tour of text processing
    • An overview of several text-processing techniques
    • In some cases you do not even need to write a parser

  22. Token position
    "id:aereal".substring(3) // => "aereal"

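  23. Token position
    • If tokens appear at fixed-length positions, this is good enough
    • It breaks down once lengths are variable
    • And most grammars consist almost entirely of variable-length tokens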

  24. Regular expressions
    "id:aereal".match(/id:(.+)/)[1]
    // => "aereal"

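  25. Regular expressions
    • Unbalanced brackets cannot be detected
    • (Impossible with POSIX regular expressions; Perl's extended regular expressions should be able to do it)
    • If a single pattern cannot catch everything in one go, you need the state management described next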
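The bracket-matching limitation can be made concrete with a small Go sketch. This is only an illustration, not code from gohn: the function name `balanced` is made up here, and it handles only square brackets. The depth counter is the simplest form of the state that a plain regular expression cannot keep.

```go
package main

import "fmt"

// balanced reports whether every '[' in s has a matching ']'.
// The depth counter is the state a POSIX-style regex cannot track.
func balanced(s string) bool {
	depth := 0
	for _, r := range s {
		switch r {
		case '[':
			depth++
		case ']':
			depth--
			if depth < 0 {
				return false // a ']' appeared before its '['
			}
		}
	}
	return depth == 0
}

func main() {
	fmt.Println(balanced("[http://example.com/]"))
	fmt.Println(balanced("[[http://example.com/]"))
}
```

The first call reports a balanced input and the second an unbalanced one.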

  26. Managing state transitions
    // readChar() and readText() are assumed helpers that consume input
    var isInIdNotation = false;
    while (true) {
      if (isInIdNotation) {
        var name = readText(); // => "aereal"
      } else {
        switch (readChar()) {
          case ':':
            isInIdNotation = true;
            break;
          default:
            // ...
        }
      }
    }

  27. Managing state transitions
    var isInIdNotation = false;
    var isInHeading = false;
    var isInUnorderedList = false;
    var isInOrderedList = false;
    while (true) {
      if (isInIdNotation) { /* ... */ }
      if (isInHeading) { /* ... */ }
      if (isInUnorderedList) { /* ... */ }
      if (isInOrderedList) { /* ... */ }
    }



  29. • With all of these approaches it is hard to get an overview of the grammar
    • Modularization is difficult
    • → If only we could build it by stacking up small parts...

  30. Enter yacc
    • One of the parser generators
    • Generates a parser from syntax rules that resemble BNF
    • Builds up one rule by combining several rules
    • Turns rules into a program in a callback style (reduction, "reduce")

  31. https://tools.ietf.org/html/rfc7230
    HTTP-message = start-line
                   *( header-field CRLF )
                   CRLF
                   [ message-body ]
    start-line   = request-line / status-line

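  32. yacc
    • It is nice that the grammar can be written in an abstract form, BNF
    • More portable than an internal DSL embedded in a host language
    • The lexer (lexical analyzer) has to be implemented separately
    • The callback parts of the syntax rules are not highlighted by editors (there is probably a good way around this)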

  33. Coming up next
    • We have learned that yacc looks promising
    • Can Go and yacc be combined?
    • Can we really build a Hatena notation parser?

  34. https://git.io/v7gcD
    github.com/aereal/gohn


  35. gohn
    • Written in Go w/goyacc
    • pronounce as `gone`
    • The major notations are already implemented

  36. gohn's design
    • Reads Hatena notation from standard input,
    • and writes the AST, serialized as JSON, to standard output
    • → Conversion to HTML is implemented separately
    • Very UNIX-like

  37. AST
    • Serialized as JSON
    • A JSON schema is published
    • It may even be possible to generate an HTML converter automatically from the schema
    • https://github.com/aereal/gohn/blob/master/schema.json

  38. Go and yacc
    • There is a tool called goyacc
    • go get golang.org/x/tools/cmd/goyacc
    • Actions can be written in Go

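  39. Go and lexical analysis
    • Lexical analysis = reporting what meaning the characters just read carry
    • The standard package text/scanner is handy
    • Its behavior can be customized
    • It records the position of the last consumed token, which makes building error messages easy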

  40. Demo

  41. Advanced topics

  42. HTTP notation
    [http://example.com/]
    # => anchor link with text: http://example.com/
    [http://127.0.0.1/:title=My IP]
    # => anchor link with text: My IP

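  43. HTTP notation
    • A notation that is converted into an anchor link
    • Options for the expansion can be appended at the end, following a `:`
    • `:` can also appear as part of the URL
    • → Reading just the next character is not enough to tell whether :title is starting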

  44. Treat the first `:` as part of the scheme and ignore it
    if !l.seenColon {
      l.seenColon = true
      return false // maybe part of URL
    } else {
      return true
    }
    https://github.com/aereal/gohn/blob/master/parser/lex.go#L100
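The first-colon rule can be tried out in isolation. The sketch below is a hypothetical reduction of the idea, not gohn's lexer: the `lexer` type, `isOptionStart`, and `splitHTTP` are names invented for this example. Note that this naive rule would misread a URL containing a port (such as `http://host:8080/`), since its second `:` is still part of the URL.

```go
package main

import "fmt"

// lexer keeps the single bit of state the rule needs.
type lexer struct {
	seenColon bool
}

// isOptionStart reports whether the ':' just read starts an option.
// The first ':' is assumed to belong to the URL scheme.
func (l *lexer) isOptionStart() bool {
	if !l.seenColon {
		l.seenColon = true
		return false // maybe part of URL
	}
	return true
}

// splitHTTP splits the text inside [ ... ] into the URL and the option text.
func splitHTTP(body string) (url, opts string) {
	l := &lexer{}
	for i, r := range body {
		if r == ':' && l.isOptionStart() {
			return body[:i], body[i+1:]
		}
	}
	return body, ""
}

func main() {
	url, opts := splitHTTP("http://127.0.0.1/:title=My IP")
	fmt.Printf("url=%q opts=%q\n", url, opts)
}
```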

  45. Recursive rules
    • How to write a rule made up of N > 1 child rules
    • Just be careful not to get the order of append wrong

  46. http_options:
        http_option
        {
          $$ = []string{$1}
        }
      | http_option http_options
        {
          options := $2
          $$ = append([]string{$1}, options...)
        }
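Why the order of `append` matters can be seen by imitating the right-recursive rule with an ordinary recursive Go function. This is an analogy only: `collect` is a made-up name, and a goyacc-generated parser works with a stack rather than function recursion. The point carried over is that the tail is reduced first, so the head must be prepended.

```go
package main

import "fmt"

// collect mimics the rule: http_options: http_option | http_option http_options
func collect(options []string) []string {
	if len(options) == 0 {
		return nil // the grammar requires at least one option; guard anyway
	}
	if len(options) == 1 {
		return []string{options[0]} // $$ = []string{$1}
	}
	rest := collect(options[1:]) // $2 is already complete at this point
	// Prepend the head. Writing append(rest, options[0]) would reverse the list.
	return append([]string{options[0]}, rest...)
}

func main() {
	fmt.Println(collect([]string{"title", "image", "embed"}))
}
```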

  47. Testing
    • Table-driven tests are recommended
    • https://github.com/golang/go/wiki/TableDrivenTests
    • I think functional tests of the parser, which include the lexer, are enough
    • https://github.com/aereal/gohn/blob/master/parser/parser_test.go#L17
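The shape of a table-driven test, sketched as a plain program so it runs without `go test`. The function under test, `headingLevel`, is a toy written for this example and is not part of gohn. In a real `_test.go` file the loop body would typically call `t.Errorf` (often inside `t.Run`) instead of collecting strings.

```go
package main

import "fmt"

// headingLevel is a toy function under test (hypothetical):
// it counts the leading '*' characters of a line.
func headingLevel(line string) int {
	n := 0
	for n < len(line) && line[n] == '*' {
		n++
	}
	return n
}

// checkTable runs every case in the table and returns one message per failure.
func checkTable() []string {
	cases := []struct {
		name  string
		input string
		want  int
	}{
		{"level 1", "* Heading 1", 1},
		{"level 2", "** Heading 2", 2},
		{"plain text", "no heading", 0},
	}
	var failures []string
	for _, c := range cases {
		if got := headingLevel(c.input); got != c.want {
			failures = append(failures, fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
	}
	return failures
}

func main() {
	fmt.Println(len(checkTable()), "failures")
}
```

Adding a new notation then means adding one row to the table rather than writing a new test function.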

  48. Debugging
    • It helps to define a method that looks up a token's name (string) from its identifier (int)
    • Useful whether you print or use a debugger
    • https://github.com/aereal/gohn/blob/master/parser/lex.go#L29

  49. Summary

  50. Go/goyacc is handy
    • Go is well suited to building complex CLIs in a portable way
    • goyacc (yacc) is well suited to parsers for complex grammars

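  51. Lightweight markup languages are hard
    • What is easy for humans to read and write differs from what is easy for machines to read and write
    • Might a parser that corrects errors be more practical than one that enforces strict grammar rules?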

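  52. Writing parsers is fun
    • A goal that nicely balances what I can do, what I want to do, and what I am interested in
    • The Web and text processing
    • Small goals can be stacked up little by little
    • "Today I got the list notation implemented!"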

  53. For those who got interested
    • Writing a JSON parser looks like a good first step
    • Then extend it toward the RFC, relaxed JSON, etc.
    • Next, try writing a lexer by hand
    • After that, try writing the parser by hand too

  54. End