Let’s write a parser! [SoundCloud HQ edition]

Denis Defreyne

May 17, 2016

Transcript

  1. Let’s write
    a parser!
    DENIS DEFREYNE / SOUNDCLOUD, BERLIN / MAY 17TH, 2016

  2. 1. Language

  3. I am Denis.

  5. But how do you know that I am Denis?

    I told you. I wrote it down. You’ve
    probably seen me before. Etc.

  6. But how do you know that I am Denis?

    You understand English.

  7. Computers are stupid.

  8.
    $ git commit --message="Fix bugs"

  9.
    def greet(name)
      puts "Hello, #{name}"
    end

  10.
    def greet(name: String): Unit = {
      println(s"Hello, $name!")
    }

  11. Text forms a language,
    but computers don’t know that.

  12. 2. Parsing

  13. Basic idea:
    Parser objects that are small,
    composable, and purely functional.

  15.
    def read(input, pos)
      Success.new(pos + 1)
    end

  16.
    def read(input, pos)
      Failure.new(pos)
    end

  17.
    char("H")
    Succeeds if the next character
    is the given one.

  23.
    H e l l o
    0 1 2 3 4
    char("H").apply("Hello")
    Success(pos = 1)

  28.
    A d i ó s
    0 1 2 3 4
    char("H").apply("Adiós")
    Failure(pos = 0)

  29.
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
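
The `read` fragments above can be assembled into something runnable. This is a minimal sketch, not the code behind the slides: the `Parser` wrapper class and the `Struct`-based `Success` and `Failure` are assumptions made for illustration.

```ruby
# Hypothetical result types: each one only records a position.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block from (input, pos) to a result.
class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  # Start reading at the beginning of the input.
  def apply(input)
    read(input, 0)
  end
end

# char: succeeds (advancing by one) if the character at pos is the given one.
def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

hello = char("H").apply("Hello")  # Success with pos = 1
adios = char("H").apply("Adiós")  # Failure with pos = 0
```

Because a parser never mutates the input and only returns a new position, the same parser object can be reused at any position, which is what makes the combinators that follow possible.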

  30.
    seq(a, b)
    Succeeds if both given parsers
    succeed in sequence.

  36.
    H e l l o
    0 1 2 3 4
    seq(char("H"), char("e")).apply("Hello")
    Success(pos = 2)
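
One way `seq` could be written, as a sketch under the same assumptions as before (a hypothetical `Parser` wrapper and `Struct` results; none of this is confirmed by the slides): each parser starts where the previous one stopped, and the first failure ends the sequence.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# seq: thread the position through all parsers; stop at the first failure.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

he = seq(char("H"), char("e")).apply("Hello")  # Success with pos = 2
```

In this sketch a failing sequence reports the position where the failing parser was tried, which is what later makes error messages point at the right column.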

  37.
    seq(
      char("H"),
      char("e"),
      char("l"),
      char("l"),
      char("o"),
    )

  38.
    string(s)
    Succeeds if all characters
    in the given string
    can be read in sequence.

  42.
    H e l l o
    0 1 2 3 4
    string("Hello").apply("Hello")
    Success(pos = 5)

  43.
    eof()
    Succeeds at the end of input;
    fails otherwise.

  48.
    H e l l o
    0 1 2 3 4
    seq(string("Hello"), eof).apply("Hello")
    Success(pos = 5)

  53.
    H e l l o !
    0 1 2 3 4 5
    seq(string("Hello"), eof).apply("Hello!")
    Failure(pos = 5)
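
Both `string` and `eof` fall out of the pieces already shown. The sketch below assumes `string` is simply a `seq` of `char` parsers and that `eof` compares the position against the input length; the `Parser` wrapper and result structs are hypothetical.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

# string: one char parser per character, read in sequence.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# eof: succeeds without advancing when nothing is left to read.
def eof
  Parser.new do |input, pos|
    pos >= input.length ? Success.new(pos) : Failure.new(pos)
  end
end

complete = seq(string("Hello"), eof).apply("Hello")   # Success with pos = 5
trailing = seq(string("Hello"), eof).apply("Hello!")  # Failure with pos = 5
```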

  54.
    alt(a, b)
    Succeeds if either of the
    given parsers succeeds.

  58.
    A d i ó s
    0 1 2 3 4
    alt(char("H"), char("A")).apply("Adiós")
    Success(pos = 1)

  59.
    whitespace_char =
      alt(
        char(" "),
        char("\t"),
        char("\r"),
        char("\n"),
      )
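
A possible `alt`, sketched with the same hypothetical `Parser` wrapper. It assumes ordered choice: the alternatives are tried from left to right at the same position and the first success wins. The slides do not say how ties are resolved, so treat that ordering as an assumption.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# alt: try every parser at the same position; return the first success.
def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

whitespace_char = alt(char(" "), char("\t"), char("\r"), char("\n"))
tab = whitespace_char.apply("\tindented")  # Success with pos = 1
```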

  60.
    opt(p)
    Succeeds always, but only
    advances if p succeeds.

  61.
    repeat(p)
    Succeeds always, and attempts
    to apply p as often as possible.

  62.
    repeat(whitespace_char)
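
`opt` and `repeat` both turn a failure of `p` into a success that stays put. A minimal sketch, again with a hypothetical `Parser` wrapper; the guard against parsers that succeed without consuming input is an addition of this sketch, not something the slides mention.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# opt: keep p's success, otherwise succeed at the original position.
def opt(p)
  Parser.new do |input, pos|
    result = p.read(input, pos)
    result.is_a?(Success) ? result : Success.new(pos)
  end
end

# repeat: apply p until it fails, then succeed at the last good position.
def repeat(p)
  Parser.new do |input, pos|
    loop do
      result = p.read(input, pos)
      # Also stop if p succeeded without advancing, to avoid looping forever.
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

spaces = repeat(char(" ")).apply("   x")  # Success with pos = 3
```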

  63.
    intersperse(a, b)
    Alternates between a and b,
    always ending with a.

  69.
    a , a , b
    0 1 2 3 4
    intersperse(char("a"), char(",")).apply("a,a,b")
    Success(pos = 3)
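
The result `pos = 3` shows that `intersperse` backs off to the last complete `a`: the second comma is read, but since no `a` follows it, it is not counted. A sketch of that behaviour with a hypothetical `Parser` wrapper (requiring at least one `a` is an assumption, suggested by the later use of `opt` around `intersperse`):

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# intersperse: a, then any number of (b, a) pairs; a dangling b is not consumed.
def intersperse(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    next result if result.is_a?(Failure)
    loop do
      sep = b.read(input, result.pos)
      break if sep.is_a?(Failure)
      item = a.read(input, sep.pos)
      # Stop on failure, or when nothing was consumed (avoids endless loops).
      break if item.is_a?(Failure) || item.pos == result.pos
      result = item
    end
    result
  end
end

list = intersperse(char("a"), char(",")).apply("a,a,b")  # Success with pos = 3
```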

  70. etc.

  71. 3. Examples

  72.
    720
    6
    29530

  73.
    digit =
      alt(
        *('0'..'9')
          .map { |c| char(c) }
      )

  74.
    digit = char_in('0'..'9')

  75.
    digit = char_in('0'..'9')
    nat_number =
      seq(digit, repeat(digit))

  76.
    digit = char_in('0'..'9')
    nat_number =
      repeat1(digit)

  78.
    digit = char_in('0'..'9')

    nat_number =
      repeat1(digit)
        .capture

    Success(pos = 3, data = "720")
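
To return data as well as a position, the success value needs a second field. The sketch below assumes `capture` replaces the data with the substring between the start and end positions; the `Parser` wrapper, the two-field `Success` struct, and these `char_in` and `repeat1` bodies are illustrative guesses rather than the library's code.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # capture: on success, attach the matched substring as data.
  def capture
    Parser.new do |input, pos|
      result = read(input, pos)
      if result.is_a?(Success)
        Success.new(result.pos, input[pos...result.pos])
      else
        result
      end
    end
  end
end

# char_in: succeeds if the next character lies in the given range.
def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    if c && range.include?(c)
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end
end

# repeat1: like repeat, but p must succeed at least once.
def repeat1(p)
  Parser.new do |input, pos|
    result = p.read(input, pos)
    next result if result.is_a?(Failure)
    while result.is_a?(Success) && result.pos > pos
      pos = result.pos
      result = p.read(input, pos)
    end
    Success.new(pos)
  end
end

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture
number = nat_number.apply("720")  # Success with pos = 3, data = "720"
```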

  80.
    def read(input, pos)
      Success.new(pos + 1)
    end

  81.
    def read(input, pos)
      Success.new(pos + 1, "blahblah")
    end

  82.
    dec_number =
      seq(
        nat_number,
        char('.'),
        nat_number,
      )

  83.
    Horan,Niall,93
    Payne,Liam,93
    Tomlinson,Louis,91
    Styles,Harry,94
    Malik,Zayn,93

  87.
    field =
      repeat(char_not_in(',', "\n"))

    line =
      intersperse(field, char(','))

    file =
      seq(
        line.intersperse(char("\n")),
        eof(),
      )

  88.
    Horan,Niall,93
    Payne,Liam,93
    Tomlinson,Louis,91
    Styles,Harry,94
    Malik,Zayn,93

  89.
    [
      ["Horan", "Niall", 93],
      ["Payne", "Liam", 93],
      ["Tomlinson", "Louis", 91],
      ["Styles", "Harry", 94],
      ["Malik", "Zayn", 93],
    ]
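
The slides do not show how the grammar produces the nested arrays. One plausible route, sketched here under several assumptions: `repeat`, `seq` and `intersperse` collect their children's data in arrays, `capture` yields the matched text, and a `map` method transforms the data. The `Parser` wrapper, these data conventions, and the three-fields-per-line conversion are all hypothetical.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # capture: replace the data with the matched substring.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  # map: transform the data of a successful result.
  def map(&block)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, block.call(res.data)) : res
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def char_not_in(*chars)
  Parser.new do |input, pos|
    c = input[pos]
    c && !chars.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def eof
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

# repeat: data is the array of each repetition's data.
def repeat(p)
  Parser.new do |input, pos|
    items = []
    while (res = p.read(input, pos)).is_a?(Success) && res.pos > pos
      items << res.data
      pos = res.pos
    end
    Success.new(pos, items)
  end
end

# seq: data is the array of each child's data, in order.
def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos, [])) do |acc, p|
      res = p.read(input, acc.pos)
      break res if res.is_a?(Failure)
      Success.new(res.pos, acc.data + [res.data])
    end
  end
end

# intersperse: data is the array of the a items; separators are dropped.
def intersperse(a, b)
  seq(a, repeat(seq(b, a))).map { |first, rest| [first] + rest.map { |_sep, item| item } }
end

field = repeat(char_not_in(',', "\n")).capture
# Assumes exactly three fields per line, the last one an integer.
line = intersperse(field, char(',')).map { |last, first, year| [last, first, Integer(year)] }
file = seq(intersperse(line, char("\n")), eof).map { |data| data[0] }

csv = "Horan,Niall,93\nPayne,Liam,93\nTomlinson,Louis,91"
rows = file.apply(csv).data
```

Converting the year with `Integer` inside `map` is one way to arrive at `93` rather than `"93"`; a dedicated number parser for the last field would work as well.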

  90.
    add(1, mul(2, 3))
    sub(5, 4)

  91.
    lparen = char('(')
    rparen = char(')')
    comma = char(',')

  92.
    expr =
      alt(lazy { funcall }, nat_number)

  93.
    funcall =
      seq(
        identifier,
        lparen,
        arg_list,
        rparen,
      )

  94.
    letter =
      char_in('a'..'z')

    identifier =
      repeat1(letter)

  95.
    arg_list =
      intersperse(
        expr,
        seq(comma, whitespace),
      )

  96.
    arg_list =
      opt(
        intersperse(
          expr,
          seq(comma, whitespace),
        )
      )

  99.
    expr_list =
      intersperse(expr, char("\n"))

    program =
      seq(expr_list, eof)

  100.
    add(1, mul(2, 3))
    sub(5, 4)

  101.
    Success(pos = 27)
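
The grammar is recursive: `expr` mentions `funcall`, and `funcall` (through `arg_list`) mentions `expr`. `lazy` breaks the cycle by looking up `funcall` only when parsing actually reaches it. The sketch below is position-only and rests on assumptions: the `Parser` wrapper is hypothetical, `whitespace` (never defined on the slides) is taken to be a run of spaces, and the grammar is wrapped in a hypothetical `build_program` method.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char_in(chars)
  Parser.new do |input, pos|
    c = input[pos]
    c && chars.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def char(c)
  char_in([c])
end

def eof
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos)) do |acc, p|
      res = p.read(input, acc.pos)
      break res if res.is_a?(Failure)
      res
    end
  end
end

def alt(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Failure.new(pos)) do |_acc, p|
      res = p.read(input, pos)
      break res if res.is_a?(Success)
      res
    end
  end
end

def repeat(p)
  Parser.new do |input, pos|
    loop do
      res = p.read(input, pos)
      break if res.is_a?(Failure) || res.pos == pos
      pos = res.pos
    end
    Success.new(pos)
  end
end

def repeat1(p)
  seq(p, repeat(p))
end

def opt(p)
  alt(p, Parser.new { |_input, pos| Success.new(pos) })
end

def intersperse(a, b)
  seq(a, repeat(seq(b, a)))
end

# lazy: evaluate the block only when the parser is used, not when it is built.
def lazy(&block)
  Parser.new { |input, pos| block.call.read(input, pos) }
end

# Hypothetical helper that assembles the grammar from the slides.
def build_program
  whitespace = repeat(char(' '))  # assumption: `whitespace` means spaces
  nat_number = repeat1(char_in('0'..'9'))
  identifier = repeat1(char_in('a'..'z'))
  funcall = nil                   # declared first so the lazy block can see it
  expr = alt(lazy { funcall }, nat_number)
  arg_list = opt(intersperse(expr, seq(char(','), whitespace)))
  funcall = seq(identifier, char('('), arg_list, char(')'))
  seq(intersperse(expr, char("\n")), eof)
end

result = build_program.apply("add(1, mul(2, 3))\nsub(5, 4)")  # Success with pos = 27
```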

  102. Where’s the data!!!

  106.
    funcall =
      seq(
        identifier.capture,
        lparen,
        arg_list,
        rparen,
      ).map do |data|
        FunCall.new(data[0], data[2])
      end

  107.
    add(1, mul(2, 3))
    sub(5, 4)

  108.
    [
      FunCall.new("add", [
        1,
        FunCall.new("mul", [2, 3]),
      ]),
      FunCall.new("sub", [5, 4]),
    ]

  109. And that is how you can write a parser.

  110. And that is how you can write a parser
    using parser combinators.

  112.
    ḌPARSE
    A GOOD PARSER LIBRARY FOR RUBY

  113. github.com/ddfreyne/d-parse

  121.
    require 'd-parse'

    module JSONGrammar
      extend DParse::DSL

      DIGIT = char_in('0'..'9')

      NUMBER = repeat1(DIGIT)
    end

    res = JSONGrammar::NUMBER.apply('8700')

  125.
    case res
    when DParse::Success
      puts(res.data.inspect)
    when DParse::Failure
      $stderr.puts res.pretty_message
      exit(1)
    end

  126.
    expected identifier at line 1, column 36
    def reticulate(splines, threshold, ) {

  128.
    github.com/ddfreyne/d-parse
    PRE-ALPHA! BE AN EARLY ADOPTER!

  129.
    My name is Denis.
    Ready to parse your questions.
    Find me at [email protected], or @denis on Slack.