Ruleguard vs Semgrep vs CodeQL

```
| Topic | Ruleguard vs Semgrep vs CodeQL |
| Location | online |
| Date | October 17, 2020 |
```

Sub-topics:

- go/analysis example
- Ruleguard example
- Semgrep example
- CodeQL example
- Using ruleguard from golangci-lint
- Ruleguard guide
- How ast matching works
- Type matching examples
- Side-by-side comparison

Iskander (Alex) Sharipov

October 17, 2020

Transcript

  1. Ruleguard vs Semgrep vs CodeQL
    Iskander (pronounced as “Alex”) @quasilyte

  2. Me & static analysis
    go-critic NoVerify Ruleguard

  3. Our starting point
    We assume that:
    ● You know that static analysis is cool

  4. Our starting point
    We assume that:
    ● You know that static analysis is cool
    ● You’re using golangci-lint

  5. Our starting point
    We assume that:
    ● You know that static analysis is cool
    ● You’re using golangci-lint
    ● You want to create custom code checkers

  6. /browsing memes/
    Trying to come up with a linting idea...

  7. !

  8. Somewhere on Twitter...
    Excellent!

  9. func f(w io.Writer, b []byte) {
    - io.WriteString(w, string(b))
    + w.Write(b)
    }
    Bad code example

  10. 6 hours later...

  11. 6 hours later...
    Why?!
    WDYM AST types are not “types”?!
    No results on stackoverflow?!
    How?!

  12. Let’s create our own linter!
    We’ll use a fancy go/analysis framework

  13. var analyzer = &analysis.Analyzer{
    Name: "writestring",
    Doc: "find sloppy io.WriteString() usages",
    Run: run,
    }
    func run(pass *analysis.Pass) (interface{}, error) {
    // Analyzer implementation...
    return nil, nil
    }
    Analyzer definition

  14. for _, f := range pass.Files {
    ast.Inspect(f, func(n ast.Node) bool {
    // Check n node...
    return true // keep descending into child nodes
    })
    }
    Analyzer implementation

  15. // 1. Is it a call expression?
    call, ok := n.(*ast.CallExpr)
    if !ok || len(call.Args) != 2 {
    return true
    }
    Check n node: part 1

  16. // 2. Is it io.WriteString() call?
    fn, ok := call.Fun.(*ast.SelectorExpr)
    if !ok || fn.Sel.Name != "WriteString" {
    return true
    }
    pkg, ok := fn.X.(*ast.Ident)
    if !ok || pkg.Name != "io" {
    return true
    }
    Check n node: part 2

  17. // 3. Is second arg a string(b) expr?
    stringCall, ok := call.Args[1].(*ast.CallExpr)
    if !ok || len(stringCall.Args) != 1 {
    return true
    }
    stringFn, ok := stringCall.Fun.(*ast.Ident)
    if !ok || stringFn.Name != "string" {
    return true
    }
    Check n node: part 3

  18. // 4. Does b have the type []byte?
    b := stringCall.Args[0]
    if pass.TypesInfo.TypeOf(b).String() != "[]byte" {
    return true
    }
    Check n node: part 4

  19. // 5. Report the issue
    msg := "io.WriteString(w, string(b)) -> w.Write(b)"
    pass.Reportf(call.Pos(), msg)
    Check n node: part 5

  20. func main() {
    singlechecker.Main(analyzer)
    }
    Main function definition
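
The five checks from the previous slides can be put together in one place. The sketch below is a stdlib-only variant: it swaps go/analysis for go/parser plus go/types so that it runs without golang.org/x/tools. The function name findWriteString and the sample source are illustrative assumptions, and type-checking is done without an importer, which is enough to learn the type of b but leaves imports unresolved.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// findWriteString returns one warning per io.WriteString(w, string(b))
// call in src where b has the type []byte.
func findWriteString(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}

	// Type-check without an importer. Imports stay unresolved (those
	// errors are ignored), but parameter and local types are recorded.
	info := &types.Info{Types: map[ast.Expr]types.TypeAndValue{}}
	conf := types.Config{Error: func(error) {}}
	conf.Check("example", fset, []*ast.File{f}, info)

	var warnings []string
	ast.Inspect(f, func(n ast.Node) bool {
		// 1. Is it a call expression with two arguments?
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) != 2 {
			return true
		}
		// 2. Does it look like io.WriteString()?
		fn, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || fn.Sel.Name != "WriteString" {
			return true
		}
		pkg, ok := fn.X.(*ast.Ident)
		if !ok || pkg.Name != "io" {
			return true
		}
		// 3. Is the second argument a string(b) expression?
		conv, ok := call.Args[1].(*ast.CallExpr)
		if !ok || len(conv.Args) != 1 {
			return true
		}
		convFn, ok := conv.Fun.(*ast.Ident)
		if !ok || convFn.Name != "string" {
			return true
		}
		// 4. Does b have the type []byte?
		typ := info.TypeOf(conv.Args[0])
		if typ == nil || typ.String() != "[]byte" {
			return true
		}
		// 5. Report the issue.
		pos := fset.Position(call.Pos())
		warnings = append(warnings,
			fmt.Sprintf("%s: io.WriteString(w, string(b)) -> w.Write(b)", pos))
		return true
	})
	return warnings, nil
}

func main() {
	src := `package p
import "io"
func f(w io.Writer, b []byte) {
	io.WriteString(w, string(b))
}`
	warnings, err := findWriteString(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, w := range warnings {
		fmt.Println(w)
	}
}
```

Like the slide version, this only compares the identifier name io and prints a fixed message.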

  21. It works
    But not without problems...

  22. func f(io InputController, b []byte) {
    io.WriteString(w, string(b))
    }
    io could be something else!

  23. func f(io InputController, b []byte) {
    io.WriteString(w, string(b))
    }
    io could be something else!
    Need to check that io is a package

  24. import "github.com/quasilyte/io" // not stdlib!
    func f(b []byte) {
    io.WriteString(w, string(b))
    }
    io could be something else!
    But even if it is a package we can get confused
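
One way to tell these cases apart is to ask the type checker what the io identifier refers to, instead of comparing its name. The sketch below uses only the standard go/types package; countStdIOCalls is a hypothetical helper, and it type-checks without an importer, which is still enough to see whether io is an imported package and what its import path is.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// countStdIOCalls counts X.WriteString selectors whose X resolves to
// an imported package with the import path "io".
func countStdIOCalls(src string) (int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return 0, err
	}
	info := &types.Info{Uses: map[*ast.Ident]types.Object{}}
	conf := types.Config{Error: func(error) {}} // ignore unresolved imports
	conf.Check("example", fset, []*ast.File{f}, info)

	count := 0
	ast.Inspect(f, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "WriteString" {
			return true
		}
		id, ok := sel.X.(*ast.Ident)
		if !ok {
			return true
		}
		// A parameter or variable named io is a *types.Var, not a *types.PkgName.
		pkgName, ok := info.Uses[id].(*types.PkgName)
		if ok && pkgName.Imported().Path() == "io" {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	src := `package p
import "github.com/quasilyte/io" // not stdlib!
func f(b []byte) {
	io.WriteString(nil, string(b))
}`
	n, err := countStdIOCalls(src)
	fmt.Println(n, err)
}
```

Comparing the import path rather than the identifier also covers renamed imports such as `import stdio "io"`.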

  25. The warning message is not perfect

  26. The warning message is not perfect
    []byte variable is called “x”, not “b”

  27. It could be worse

  28. Let’s try again
    Now with ruleguard

  29. func writeString(m fluent.Matcher) {
    m.Match(`io.WriteString($w, string($b))`).
    Where(m["b"].Type.Is("[]byte")).
    Report("$$ -> $w.Write($b)")
    }
    writeString rule

  30. func writeString(m fluent.Matcher) {
    m.Match(`io.WriteString($w, string($b))`).
    Where(m["b"].Type.Is("[]byte")).
    Report("$$ -> $w.Write($b)")
    }
    writeString rule
    A rules group named writeString
    (May include several rules)

  31. func writeString(m fluent.Matcher) {
    m.Match(`io.WriteString($w, string($b))`).
    Where(m["b"].Type.Is("[]byte")).
    Report("$$ -> $w.Write($b)")
    }
    writeString rule
    AST pattern

  32. func writeString(m fluent.Matcher) {
    m.Match(`io.WriteString($w, string($b))`).
    Where(m["b"].Type.Is("[]byte")).
    Report("$$ -> $w.Write($b)")
    }
    writeString rule
    Result filter

  33. func writeString(m fluent.Matcher) {
    m.Match(`io.WriteString($w, string($b))`).
    Where(m["b"].Type.Is("[]byte")).
    Report("$$ -> $w.Write($b)")
    }
    writeString rule
    Warning message template
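
To make the template syntax concrete, here is a toy expansion of "$$ -> $w.Write($b)". It is not ruleguard's implementation, only an illustration of the idea that $$ stands for the whole matched text and $name for a captured sub-expression; the render function and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// render expands a message template: "$$" becomes the whole matched text
// and "$name" becomes the text captured for that pattern variable.
func render(template, matched string, vars map[string]string) string {
	names := make([]string, 0, len(vars))
	for name := range vars {
		names = append(names, name)
	}
	// Longer names first, so that "$buf" is not read as "$b" followed by "uf".
	sort.Slice(names, func(i, j int) bool { return len(names[i]) > len(names[j]) })

	pairs := []string{"$$", matched}
	for _, name := range names {
		pairs = append(pairs, "$"+name, vars[name])
	}
	// The replacer makes a single pass and does not rescan substituted text.
	return strings.NewReplacer(pairs...).Replace(template)
}

func main() {
	msg := render(
		"$$ -> $w.Write($b)",
		"io.WriteString(w, string(x))",
		map[string]string{"w": "w", "b": "x"},
	)
	fmt.Println(msg)
}
```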

  34. The warning message is perfect!

  35. func writeString(m fluent.Matcher) {
    m.Match(`io.WriteString($w, string($b))`).
    Where(m["b"].Type.Is("[]byte")).
    Suggest("$w.Write($b)")
    }
    writeString rule
    Auto fix replacement template
    (can be combined with Report)

  36. With -fix, suggestions are applied automagically

  37. Let’s try semgrep

  38. rules:
    - id: writestring
      patterns:
      - pattern: io.WriteString($W, string($B))
      message: "use $W.Write($B)"
      languages: [go]
      severity: ERROR
    writestring.yml

  39. Something went wrong...

  40. Something went wrong...
    False positive!

  41. rules:
    - id: writestring
      patterns:
      - pattern: io.WriteString($W, string($B))
      message: "use $W.Write($B)"
      languages: [go]
      severity: ERROR
    writestring.yml
    TODO: type filters

  42. By the way...
    Have you heard of YAML5?

  43. {
    rules: [
    {
    id: 'writestring',
    patterns: [
    {pattern: 'io.WriteString($W, string($B))'},
    ],
    message: 'use $W.Write($B)',
    languages: ['go'],
    severity: 'ERROR',
    },
    ],
    }
    Using YAML5 format for semgrep rules

  44. Let’s try CodeQL

  50. import go
    from CallExpr c,
    Expr w,
    ConversionExpr conv,
    SelectorExpr fn
    where w = c.getArgument(0)
    and conv = c.getArgument(1)
    and fn = c.getCalleeExpr()
    and fn.getSelector().getName() = "WriteString"
    and fn.getBase().toString() = "io"
    and conv.getOperand().getType() instanceof ByteSliceType
    and conv.getType() instanceof StringType
    select c, "use " + w + ".Write(" + conv.getOperand() + ")"
    CodeQL query

  51. How to run?
    ● Use the online query console
    ● Select quasilyte/codeql-test project
    ● Copy/paste query from the previous slide

  52. CodeQL pros
    ● SSA support

  53. CodeQL pros
    ● SSA support
    ● Taint analysis (source-sink)

  54. CodeQL pros
    ● SSA support
    ● Taint analysis (source-sink)
    ● Not limited by (Go) syntax rules

  55. CodeQL pros
    ● SSA support
    ● Taint analysis (source-sink)
    ● Not limited by (Go) syntax rules
    ● Real declarative programming language

  56. CodeQL pros
    ● SSA support
    ● Taint analysis (source-sink)
    ● Not limited by (Go) syntax rules
    ● Real declarative programming language
    ● Backed by GitHub

  57. CodeQL pros
    ● SSA support
    ● Taint analysis (source-sink)
    ● Not limited by (Go) syntax rules
    ● Real declarative programming language
    ● Backed by GitHub (Microsoft)

  58. CodeQL pros
    ● SSA support
    ● Taint analysis (source-sink)
    ● Not limited by (Go) syntax rules
    ● Real declarative programming language
    ● Backed by GitHub (Microsoft)
    ● 1st class GitHub integration

  59. Truth be told...
    Ruleguard
    and
    Semgrep
    CodeQL

  60. CodeQL cons
    The main points that I want to cover:
    1. Steep learning curve
    2. Simple things are not simple
    3. Non-trivial QL may look alien for many people

  61. Why Ruleguard then?
    ● Very easy to get started (just “go get” it)

  62. Why Ruleguard then?
    ● Very easy to get started (just “go get” it)
    ● Rules are written in pure Go

  63. Why Ruleguard then?
    ● Very easy to get started (just “go get” it)
    ● Rules are written in pure Go
    ● Integrated in golangci-lint and go-critic

  64. Why Ruleguard then?
    ● Very easy to get started (just “go get” it)
    ● Rules are written in pure Go
    ● Integrated in golangci-lint and go-critic
    ● Simple things are simple

  65. Why Ruleguard then?
    ● Very easy to get started (just “go get” it)
    ● Rules are written in pure Go
    ● Integrated in golangci-lint and go-critic
    ● Simple things are simple
    ● Very Go-centric (both pro and con)

  66. Using ruleguard from golangci

  67. Enabling Ruleguard
    1. Install golangci-lint in your pipeline (if you haven't yet)
    2. Prepare a rules file (a Go file with ruleguard rules)
    3. Enable ruleguard in golangci-lint config
    You can also use Ruleguard directly or via go-critic.

  68. ruleguard

  69. go-critic ruleguard

  70. go-critic
    golangci ruleguard

  71. linters:
      enable:
        - gocritic
    linters-settings:
      gocritic:
        enabled-checks:
          - ruleguard
        settings:
          ruleguard:
            rules: "rules.go"
    .golangci.yml checklist

  72. linters:
      enable:
        - gocritic
    linters-settings:
      gocritic:
        enabled-checks:
          - ruleguard
        settings:
          ruleguard:
            rules: "rules.go"
    .golangci.yml checklist
    go-critic linter
    should be enabled

  73. linters:
      enable:
        - gocritic
    linters-settings:
      gocritic:
        enabled-checks:
          - ruleguard
        settings:
          ruleguard:
            rules: "rules.go"
    .golangci.yml checklist
    ruleguard checker
    should be enabled

  74. linters:
      enable:
        - gocritic
    linters-settings:
      gocritic:
        enabled-checks:
          - ruleguard
        settings:
          ruleguard:
            rules: "rules.go"
    .golangci.yml checklist
    rules param should
    be set

  75. Ruleguard guide

  76. m.Match(`pattern1`, `pattern2`)
    Match() does the syntax matching

  77. m.Match(`pattern1`, `pattern2`)
    Match() does the syntax matching
    Matching alternations:
    pattern1|pattern2

  78. `$x = $x` pattern string

  79. `$x = $x` pattern string
    Parsed AST

  80. `$x = $x` pattern string
    Parsed AST
    Modified AST (with meta nodes)

  81. func match(pat, n ast.Node) bool
    pat is a compiled pattern
    n is a node being matched
    AST matching engine

  82. Algorithm
    ● Both pat and n are traversed
    ● Non-meta nodes are compared normally
    ● pat meta nodes are separate cases
    ● Named matches are collected (capture)
    ● Some patterns may involve backtracking
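
The algorithm can be illustrated with a toy matcher for assignment patterns such as `$x = $x`. This is a heavily simplified sketch, not the gogrep engine that ruleguard uses: it rewrites `$x` into the identifier `meta_x` so that go/parser accepts the pattern, handles only assignment statements, recognizes meta variables only as whole operands, and never backtracks. All names in it (metaPrefix, parseStmt, matchExpr, match) are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// metaPrefix turns "$x" into the valid Go identifier "meta_x" before parsing.
const metaPrefix = "meta_"

// parseStmt parses a single statement by wrapping it in a function body.
func parseStmt(src string) (ast.Stmt, error) {
	src = strings.ReplaceAll(src, "$", metaPrefix)
	file := "package p\nfunc _() {\n" + src + "\n}"
	f, err := parser.ParseFile(token.NewFileSet(), "", file, 0)
	if err != nil {
		return nil, err
	}
	body := f.Decls[0].(*ast.FuncDecl).Body.List
	if len(body) != 1 {
		return nil, fmt.Errorf("expected 1 statement, got %d", len(body))
	}
	return body[0], nil
}

// matchExpr compares one pattern operand with one target operand.
func matchExpr(pat, n ast.Expr, vars map[string]string) bool {
	text := types.ExprString(n)
	id, ok := pat.(*ast.Ident)
	if !ok || !strings.HasPrefix(id.Name, metaPrefix) {
		// Non-meta node: compared by its printed form in this sketch.
		return types.ExprString(pat) == text
	}
	name := strings.TrimPrefix(id.Name, metaPrefix)
	if bound, seen := vars[name]; seen {
		return bound == text // a repeated $x must match the same code
	}
	vars[name] = text // first occurrence: capture
	return true
}

// match reports whether an assignment pattern matches target and
// returns the captured variables.
func match(pattern, target string) (map[string]string, bool) {
	pat, err1 := parseStmt(pattern)
	n, err2 := parseStmt(target)
	p, ok1 := pat.(*ast.AssignStmt)
	t, ok2 := n.(*ast.AssignStmt)
	if err1 != nil || err2 != nil || !ok1 || !ok2 {
		return nil, false
	}
	if p.Tok != t.Tok || len(p.Lhs) != len(t.Lhs) || len(p.Rhs) != len(t.Rhs) {
		return nil, false
	}
	vars := map[string]string{}
	for i := range p.Lhs {
		if !matchExpr(p.Lhs[i], t.Lhs[i], vars) {
			return nil, false
		}
	}
	for i := range p.Rhs {
		if !matchExpr(p.Rhs[i], t.Rhs[i], vars) {
			return nil, false
		}
	}
	return vars, true
}

func main() {
	for _, target := range []string{"a += 10", "a = 10", "a = a"} {
		vars, ok := match("$x = $x", target)
		fmt.Println(target, ok, vars)
	}
}
```

The three targets in main are the same ones the following slides walk through: the operator mismatch, the failed second binding of $x, and the successful match.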

  83. ● $x is a simple “match any” named match
    ● $_ is a “match any” unnamed match
    ● $*_ matches zero or more nodes
    Meta node examples

  84. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a+=10: "+=" node with operands a and 10

  85. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a+=10: "+=" node with operands a and 10

  86. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=10: "=" node with operands a and 10

  87. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=10: "=" node with operands a and 10

  88. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=10: "=" node with operands a and 10
    $x is bound to a

  89. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=10: "=" node with operands a and 10
    a != 10

  90. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=a: "=" node with operands a and a

  91. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=a: "=" node with operands a and a

  92. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=a: "=" node with operands a and a
    $x is bound to a

  93. Pattern matching
    Pattern $x=$x: "=" node with operands $x and $x
    Target a=a: "=" node with operands a and a
    a = a, pattern matched

  94. m.Where(cond1 && cond2)
    Where() is for the match filtering

  95. m.Where(cond1 && cond2)
    Where() is for the match filtering
    Where expression

  96. m.Where(cond1 && cond2)
    Where() is for the match filtering
    Where expression operands

  97. Where() expression operands
    ● Matched text predicates

  98. Where() expression operands
    ● Matched text predicates
    ● Properties like AssignableTo/ConvertibleTo/Pure

  99. Where() expression operands
    ● Matched text predicates
    ● Properties like AssignableTo/ConvertibleTo/Pure
    ● Check whether a value implements interface

  100. Where() expression operands
    ● Matched text predicates
    ● Properties like AssignableTo/ConvertibleTo/Pure
    ● Check whether a value implements interface
    ● Type matching expressions

  101. Where() expression operands
    ● Matched text predicates
    ● Properties like AssignableTo/ConvertibleTo/Pure
    ● Check whether a value implements interface
    ● Type matching expressions
    ● File-related filters (like “file imports X”)
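
Several of these operands have direct counterparts in the standard go/types package, which gives a feel for what such filters check. The sketch below calls those go/types predicates directly; it is not ruleguard code, and the helper names (canAssign, canConvert, implementsError) are hypothetical.

```go
package main

import (
	"fmt"
	"go/types"
)

var (
	byteSlice  = types.NewSlice(types.Universe.Lookup("byte").Type())
	stringType = types.Typ[types.String]
	errorType  = types.Universe.Lookup("error").Type()
)

// canAssign reports whether a value of type from can be assigned to type to.
func canAssign(from, to types.Type) bool { return types.AssignableTo(from, to) }

// canConvert reports whether a value of type from can be converted to type to.
func canConvert(from, to types.Type) bool { return types.ConvertibleTo(from, to) }

// implementsError reports whether t implements the built-in error interface.
func implementsError(t types.Type) bool {
	iface := errorType.Underlying().(*types.Interface)
	return types.Implements(t, iface)
}

func main() {
	fmt.Println("[]byte assignable to string: ", canAssign(byteSlice, stringType))
	fmt.Println("[]byte convertible to string:", canConvert(byteSlice, stringType))
	fmt.Println("error implements error:      ", implementsError(errorType))
	fmt.Println("string implements error:     ", implementsError(stringType))
}
```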

  102. | $t | Arbitrary type |
    | []byte | Byte slice type |
    | []$t | Arbitrary slice type |
    | map[$t]$t | Map with $t key and value types |
    | map[$t]struct{} | Any set-like map |
    | func($_) $_ | Any T1->T2 function type |
    Type matching examples

  103. | struct{$*_} | Arbitrary struct |
    | struct{$x; $x} | Struct of 2 $x-typed fields |
    | struct{$_; $_} | Struct with any 2 fields |
    | struct{$x; $*_} | Struct that starts with $x field |
    | struct{$*_; $x} | Struct that ends with $x field |
    | struct{$*_; $x; $*_} | Struct that contains $x field |
    Type matching examples (cont.)
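
A toy version of type pattern matching helps to see why `map[$t]$t` accepts map[string]string but not map[string]int. The sketch below works on type syntax only (ruleguard matches real types), covers just identifiers, slices and maps, and does not support `$*_`. The names metaPrefix, matchType and typeMatches are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/types"
	"strings"
)

// metaPrefix turns "$t" into the valid Go identifier "meta_t" before parsing.
const metaPrefix = "meta_"

// matchType recursively compares a type pattern with a type expression.
func matchType(pat, typ ast.Expr, vars map[string]string) bool {
	switch p := pat.(type) {
	case *ast.Ident:
		if !strings.HasPrefix(p.Name, metaPrefix) {
			t, ok := typ.(*ast.Ident)
			return ok && t.Name == p.Name
		}
		name := strings.TrimPrefix(p.Name, metaPrefix)
		if name == "_" {
			return true // $_ matches anything and captures nothing
		}
		text := types.ExprString(typ)
		if bound, seen := vars[name]; seen {
			return bound == text // a repeated $t must be the same type
		}
		vars[name] = text
		return true
	case *ast.ArrayType:
		// Only slices are handled; fixed-size arrays never match here.
		t, ok := typ.(*ast.ArrayType)
		return ok && p.Len == nil && t.Len == nil && matchType(p.Elt, t.Elt, vars)
	case *ast.MapType:
		t, ok := typ.(*ast.MapType)
		return ok && matchType(p.Key, t.Key, vars) && matchType(p.Value, t.Value, vars)
	default:
		// Everything else (struct{}, func types, ...) is compared as text.
		return types.ExprString(pat) == types.ExprString(typ)
	}
}

// typeMatches parses both strings and runs matchType on them.
func typeMatches(pattern, typ string) (bool, error) {
	pat, err := parser.ParseExpr(strings.ReplaceAll(pattern, "$", metaPrefix))
	if err != nil {
		return false, err
	}
	t, err := parser.ParseExpr(typ)
	if err != nil {
		return false, err
	}
	return matchType(pat, t, map[string]string{}), nil
}

func main() {
	for _, typ := range []string{"map[string]string", "map[string]int"} {
		ok, err := typeMatches("map[$t]$t", typ)
		fmt.Println(typ, ok, err)
	}
}
```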

  104. // Just report a message
    m.Report("warning message")
    // Report + do an auto fix in -fix mode
    m.Suggest("autofix template")
    Report() and Suggest() handle a match

  105. More ruleguard examples

  106. func printFmt(m fluent.Matcher) {
    m.Match(`fmt.Println($s, $*_)`).
    Where(m["s"].Text.Matches(`%[sdv]`)).
    Report("found formatting directives")
    }
    Find formatting directives in non-formatting fmt calls
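
The Text.Matches filter in this rule applies a regular expression to the source text of the captured argument. The same check can be tried in isolation with the standard regexp package; hasDirective is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// directive is the same regular expression the rule passes to Text.Matches.
var directive = regexp.MustCompile(`%[sdv]`)

// hasDirective reports whether the source text of an argument contains
// a %s, %d or %v formatting directive.
func hasDirective(argText string) bool {
	return directive.MatchString(argText)
}

func main() {
	fmt.Println(hasDirective(`"user %s logged in"`))
	fmt.Println(hasDirective(`"100% done"`))
}
```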

  107. func badLock(m fluent.Matcher) {
    m.Match(`$mu.Lock(); $mu.Unlock()`).
    Report(`$mu unlocked immediately`)
    m.Match(`$mu.Lock(); defer $mu.RUnlock()`).
    Report(`maybe $mu.RLock() is intended?`)
    }
    Find mutex usage issues
    (real-world example)

  108. func sprintErr(m fluent.Matcher) {
    m.Match(`fmt.Sprint($err)`,
    `fmt.Sprintf("%s", $err)`,
    `fmt.Sprintf("%v", $err)`).
    Where(m["err"].Type.Is(`error`)).
    Suggest(`$err.Error()`)
    }
    Suggest error.Error() instead

  109. func arrayDeref(m fluent.Matcher) {
    m.Match(`(*$arr)[$i]`).
    Where(m["arr"].Type.Is(`*[$_]$_`)).
    Suggest(`$arr[$i]`)
    }
    Find redundant explicit array
    dereference expressions

  110. func osFilepath(m fluent.Matcher) {
    m.Match(`os.PathSeparator`).
    Where(m.File().Imports("path/filepath")).
    Suggest(`filepath.Separator`)
    }
    Suggest filepath.Separator instead of
    os.PathSeparator

  111. # -e runs a single inline rule
    ruleguard -e 'm.Match(`!($x != $y)`)' file.go
    Running ruleguard with -e

  112. Side-by-side comparison

  113. Written in
    | go-ruleguard | Go |
    | Semgrep | Mostly OCaml |
    | CodeQL | ??? (compiler and runtime are closed source) |
    Ruleguard vs Semgrep vs CodeQL

  114. Written in
    | go-ruleguard | Go |
    | Semgrep | Not Go |
    | CodeQL | Probably not Go |
    Ruleguard vs Semgrep vs CodeQL

  115. Matching mechanism
    | go-ruleguard | AST patterns |
    | Semgrep | AST patterns |
    | CodeQL | Dedicated query language |
    Ruleguard vs Semgrep vs CodeQL

  116. Type matching mechanism
    | go-ruleguard | Typematch patterns + predicates |
    | Semgrep | N/A (planned, but not implemented yet) |
    | CodeQL | Type assertion-like API |
    Ruleguard vs Semgrep vs CodeQL

  117. DSL
    | go-ruleguard | Go |
    | Semgrep | YAML files |
    | CodeQL | Dedicated query language |
    Ruleguard vs Semgrep vs CodeQL

  118. Supported languages
    | go-ruleguard | Go |
    | Semgrep | Go + other languages |
    | CodeQL | Go + other languages |
    Ruleguard vs Semgrep vs CodeQL

  119. How much you can do
    | go-ruleguard | Simple-medium diagnostics |
    | Semgrep | Simple-medium diagnostics |
    | CodeQL | Almost whatever you want |
    Ruleguard vs Semgrep vs CodeQL

  120. Links
    ● Ruleguard quickstart: EN, RU
    ● Ruleguard DSL documentation
    ● Ruleguard examples: one, two
    ● gogrep - AST patterns matching library for Go
    ● A list of similar tools
    ● .golangci.yml from go-critic (uses ruleguard)

  121. Ruleguard vs Semgrep vs CodeQL
    Искандер (Alex) Шарипов @quasilyte
