Can a source code analyzer be generalized?

A talk by Ivan Kochurkin (Positive Technologies) for the PDUG section at PHDays 8.

Transcript

  1. Source code analyzers: how generalizable are they?

    Ivan Kochurkin, Team Lead, Positive Technologies
  2. ABOUT ME

    Ivan Kochurkin
    Team Lead at Positive Technologies, Data Flow Source Code Analyzer
    Developer at Swiftify, Objective-C → Swift Source Code Converter
    Active Contributor on GitHub: KvanTTT
    Tech Article Writer at habr.com and other blogs
  3. ANALYZER TYPES

    1. Regular Expressions
    2. Tokens
    3. Parse Trees and AST
    4. Data & Control Flow Graphs (DFG & CFG)
    5. Binary | Intermediate Language
  4. ⏭ REGULAR EXPRESSIONS

    1. <table>(.*?)</table>
    2. Attributes? <table.*?>(.*?)</table>
    3. Elements? tr, td
    4. Comments? <!-- html comment -->
    5. ...
    6. NO NOOOO NO stop the angles are not real ZALGO IS TONY THE PONY HE COMES
  5. ㊙ REGEX DSL

    [ ] - Matches a single character that is contained within the brackets.
    [^ ] - Matches a single character that is not contained within the brackets.
    ? - Optional symbol.
    * - Zero or more occurrences. ab*c matches ac, abc, abbc.
    + - One or more occurrences.
    | - Or. gray|grey matches gray or grey.
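The constructs listed on this slide can be tried out with Python's standard `re` module. This is only an illustrative sketch: the sample patterns `gr[ae]y` and `colou?r` and the helper `full` are not from the talk.

```python
import re

# One pattern per construct from the slide.
bracket = re.compile(r"gr[ae]y")    # [ ]  one character from the set
negated = re.compile(r"[^0-9]")     # [^ ] one character not in the set
optional = re.compile(r"colou?r")   # ?    optional symbol
star = re.compile(r"ab*c")          # *    zero or more occurrences
plus = re.compile(r"ab+c")          # +    one or more occurrences
either = re.compile(r"gray|grey")   # |    alternation


def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None


star_hits = [s for s in ["ac", "abc", "abbc", "adc"] if full(star, s)]
plus_hits = [s for s in ["ac", "abc", "abbc"] if full(plus, s)]
print(star_hits)  # ['ac', 'abc', 'abbc']
print(plus_hits)  # ['abc', 'abbc']
```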
  6. REGEX PATTERNS

    Advantages:
        Very simple
        Formal model is not required
        Universal
    Disadvantages:
        Hard to maintain
        Generally not recursive
        Slow
        Hidden tokens (whitespaces, comments) cannot be skipped
  7. REGEX PATTERNS

    Floating Point Numbers: `[-+]?[0-9]*\.?[0-9]+`
    Email Addresses: `\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b`
    IP Address: Find | Validation
  8. (no text)

  9. ⏭ TOKENS

    Lexeme - Recognized char sequence
    Token = Lexeme + Type

    Grammar:
        Keyword: 'var';
        Id: [a-z]+;
        Digit: [0-9]+;
        Comment: '/*' .*? '*/';
        Semi: ';';
        Whitespace: ' '+;

    Code Sample:
        var a = 17; /* comment */
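A minimal sketch of how the grammar on this slide could drive a tokenizer, written in Python with the standard `re` module. The `Assign` rule is an assumption added here, because the code sample contains `=` while the grammar as transcribed has no rule for it. `\b` after `var` is also an addition, so that identifiers starting with "var" are not split.

```python
import re

# Token types from the slide's grammar, tried in this order.
# "Assign" is an assumption: the sample uses '=' but the grammar shown does not cover it.
TOKEN_SPEC = [
    ("Comment", r"/\*.*?\*/"),
    ("Keyword", r"var\b"),
    ("Id", r"[a-z]+"),
    ("Digit", r"[0-9]+"),
    ("Assign", r"="),
    ("Semi", r";"),
    ("Whitespace", r" +"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC), re.DOTALL
)


def tokenize(code):
    """Return (type, lexeme) pairs; raise on a character no rule covers."""
    tokens, pos = [], 0
    while pos < len(code):
        m = MASTER.match(code, pos)
        if m is None:
            raise ValueError(f"unexpected character {code[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("var a = 17; /* comment */")
# Hidden tokens can now be skipped by type, which plain regex matching cannot do.
visible = [t for t in tokens if t[0] not in ("Whitespace", "Comment")]
print(visible)
```

Each pair is a token in the slide's sense: the lexeme plus its type.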
  10. ㊙ TOKEN DSL EXAMPLE

    Regex + Additional Syntax
    <[regex]> - Id token by custom regex
    <"regex"> - String by custom regex
    <(begin..end)> - Numbers (range)
    </*regex*/> - Comments by custom regex
  11. TOKEN PATTERNS

    Simple, but still not recursive
        <[password]> = <"">
        </* password\s*=\s*"god" */>
        <[md5|sha1]>(
        <"(?i)select\s\w*"> + <~> <"\w*">
  12. ERROR IN CODE DUE TO ERROR IN PARSER?

    Grammar:
        Identifier: [A-Za-z]+
    ❌ Wrong (WTF?):
        add constraint С_PK primary key (ID);
    ✔ Right:
        add constraint C_PK primary key (ID);
    The first letter of С_PK in the wrong line is the Cyrillic С (U+0421), which looks like the Latin C but does not match [A-Za-z].
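One way to catch a look-alike character like the one on this slide is to report every non-ASCII letter before parsing. The sketch below uses Python's standard `unicodedata` module. The function name `suspicious_chars` is hypothetical, and the sample assumes the wrong identifier starts with Cyrillic capital Es (U+0421).

```python
import unicodedata


def suspicious_chars(source):
    """Return (index, char, Unicode name) for every non-ASCII letter in source."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(source)
        if ch.isalpha() and ord(ch) > 127
    ]


wrong = "add constraint \u0421_PK primary key (ID);"  # assumed: Cyrillic capital Es
right = "add constraint C_PK primary key (ID);"       # Latin capital C
print(suspicious_chars(wrong))
print(suspicious_chars(right))
```

A real analyzer would need a finer rule, since non-ASCII letters are legitimate in string literals and comments.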
  13. TEXT FINGERPRINTING WITH ZERO-LENGTH CHARACTERS

    Be careful what you copy (as displayed)
    Be c•aref•ul wh•at yo•u copy• (zero-length characters shown as •)
    Details: habr.com | Medium
    https://diffchecker.com
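A short Python sketch of revealing and stripping such characters. The transcript does not say which code points the slide used, so the `ZERO_WIDTH` table and the sample string are assumptions chosen for illustration.

```python
# Common zero-width code points; which ones the slide used is not known.
ZERO_WIDTH = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE",
}


def reveal(text, marker="\u2022"):
    """Replace every zero-width character with a visible marker."""
    return "".join(marker if ch in ZERO_WIDTH else ch for ch in text)


def strip_zero_width(text):
    """Remove every zero-width character."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


marked = "Be c\u200baref\u200cul wh\u200dat yo\u2060u copy\ufeff"
print(reveal(marked))
print(strip_zero_width(marked))
```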
  14. LIB PROTECTION

        var a = Request.Params["a"];
        var b = Request.Params["b"];
        Response.Write($"<img src='//host/1/{a}' onclick='f({b})'/>");

    Good:
        < img src = ' // host / 1 / image.jpg '
    ☹ Bad:
        < img src = ' // host / 1 / ' onerror = ' alert ( 0 ) '
    Correct:
        Response.Write(SafeString<Html>.Format($" ... "));
  15. (no text)

  16. PARSE TREE & AST

    Parse Tree - obtained from a sequence of tokens.
    AST - Abstract Syntax Tree, i.e., a parse tree without spaces, semicolons, and other non-significant tokens.
  17. LEADING & TRAILING TOKENS

        // leading 1 (var)
        // leading 2 (var)
        var foo = 42; /* trailing (;)*/
        int bar = 100500; // trailing
        // leading (EOF)
        EOF
  18. ANTLR PARSER GENERATOR

    grammars-v4: PHP, JavaScript, T-SQL, Java, PL/SQL, MySQL
  19. ㊙ PARSE TREE DSL

    Parse Tree DSL = Tokens DSL + Additional Syntax
    Invocation: method_name(expr (',' expr)*)
    Member reference expression: target.name
    Try/Catch Block: try {...} catch { }
  20. PARSE TREE PATTERNS

    Empty try/catch block (All):
        try {
            // multiple statements
        }
        catch { }
    Insecure SSL connection (Java):
        new AllowAllHostnameVerifier(...) <|> SSLSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER
    Cookie without secure attribute (PHP):
        session_set_cookie_params(#, #, #)
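The empty try/catch pattern can be sketched on a real syntax tree with Python's standard `ast` module. This is a Python analogue for illustration only, not the PT.PM implementation: it treats a handler as empty when its body holds nothing but `pass` or `...`.

```python
import ast


def empty_handlers(source):
    """Return line numbers of except handlers whose body is only `pass` or `...`."""

    def is_noop(stmt):
        if isinstance(stmt, ast.Pass):
            return True
        return (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Constant)
            and stmt.value.value is Ellipsis
        )

    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ExceptHandler)
        and all(is_noop(s) for s in node.body)
    ]


sample = """
try:
    risky()
except Exception:
    pass
try:
    risky()
except ValueError:
    log()
"""
print(empty_handlers(sample))
```

Because the match runs on tree nodes, comments and whitespace inside the block do not affect it, unlike a regex or token pattern.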
  21. (no text)

  22. ⏭ DATA FLOW GRAPH (DFG)

        if (cond)
            a = 1; // a - def
        else
            a = 2; // a - def
        b = a + 42; // b - def (2), a - use (2)
        c = b; // b - use, c - def
        c = b * 17; // b - use, c - def

    Graph: def-use graph for a, b, and c (figure not reproduced)
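The def/use comments on this slide can be computed mechanically. Below is a minimal Python sketch of reaching definitions for the same five statements. The tuple encoding of statements and the names `BRANCHES`, `TAIL`, `run_block`, and `def_use_links` are hypothetical, and the if/else is modelled simply as two alternative blocks whose definitions are merged.

```python
# Each statement: (label, defined variable, used variables).
# The if/else is modelled as two alternative branches that merge afterwards.
BRANCHES = [
    [("a = 1", "a", [])],
    [("a = 2", "a", [])],
]
TAIL = [
    ("b = a + 42", "b", ["a"]),
    ("c = b", "c", ["b"]),
    ("c = b * 17", "c", ["b"]),
]


def run_block(block, reaching, links):
    """Walk straight-line code, record def -> use links, return the new state."""
    reaching = {var: set(defs) for var, defs in reaching.items()}
    for label, defined, used in block:
        for var in used:
            for definition in sorted(reaching.get(var, ())):
                links.append((definition, label))
        reaching[defined] = {label}  # a new definition kills the previous ones
    return reaching


def def_use_links(branches, tail):
    links, merged = [], {}
    for branch in branches:
        state = run_block(branch, {}, links)
        for var, defs in state.items():
            merged.setdefault(var, set()).update(defs)  # union at the merge point
    run_block(tail, merged, links)
    return links


links = def_use_links(BRANCHES, TAIL)
for definition, use in links:
    print(f"{definition!r} -> {use!r}")
```

The use of `a` in `b = a + 42` gets two incoming links, one per branch, which is the "(2)" in the slide's comment.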
  23. DFG & AST INTERACTION

    Var Initialization:
        int b;
        int a = b;
    Multi Assignment:
        a = b = 16;
  24. CONDITIONAL EXPRESSIONS

    PHP conditional:
        echo (true ? 'true' : false ? 'true2' : 'false2');
    PHP result? true2
    C# conditional:
        Console.Write(true ? "true" : false ? "true2" : "false2");
    C# result? true
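The different results come from how the nested conditional is grouped. A small Python sketch makes the two groupings explicit. The function names are hypothetical. The left-associative grouping is what PHP applied before version 8, and PHP 8 rejects the unparenthesized form.

```python
def right_assoc(c1, v1, c2, v2, v3):
    """c1 ? v1 : (c2 ? v2 : v3), the C# grouping."""
    return v1 if c1 else (v2 if c2 else v3)


def left_assoc(c1, v1, c2, v2, v3):
    """(c1 ? v1 : c2) ? v2 : v3, the grouping PHP used before version 8."""
    return v2 if (v1 if c1 else c2) else v3


args = (True, "true", False, "true2", "false2")
print(right_assoc(*args))  # true
print(left_assoc(*args))   # true2
```

An analyzer that shares one tree model across languages has to encode this associativity per language, or the resulting data flow will be wrong for one of them.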
  25. DFG PATTERNS (SQL INJECTION)

    Project: DVWA
    ❌ Vulnerability:
        $id = $_SESSION[ 'id' ];
        $query = "SELECT * FROM users WHERE user_id = '$id'";
        $result = mysqli_query($query);
    ✔ No vulnerability (transform function):
        $id = $_SESSION[ 'id' ];
        $query = "SELECT * FROM users WHERE user_id = '$id'";
        $query = mysqli_real_escape_string($query);
        $result = mysqli_query($query);
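The two snippets differ only in whether a transform function sits on the path from source to sink. A minimal taint-tracking sketch in Python shows the idea. The tuple-based statement form and the `SOURCES`, `SANITIZERS`, and `SINKS` tables are hypothetical, hand-written stand-ins for what a real analyzer would derive from the DFG.

```python
SOURCES = {"$_SESSION"}
SANITIZERS = {"mysqli_real_escape_string"}
SINKS = {"mysqli_query"}


# Each statement: (target variable, called function or None, argument names).
# This is a hypothetical hand-written IR, not the output of a PHP parser.
def find_injections(statements):
    """Return the tainted arguments that reach a sink."""
    tainted, findings = set(), []
    for target, func, args in statements:
        if func in SINKS:
            findings.extend(arg for arg in args if arg in tainted)
        flows_taint = any(a in tainted or a in SOURCES for a in args)
        if flows_taint and func not in SANITIZERS:
            tainted.add(target)
        else:
            tainted.discard(target)  # clean value or sanitizer output
    return findings


vulnerable = [
    ("$id", None, ["$_SESSION"]),
    ("$query", None, ["$id"]),
    ("$result", "mysqli_query", ["$query"]),
]
safe = [
    ("$id", None, ["$_SESSION"]),
    ("$query", None, ["$id"]),
    ("$query", "mysqli_real_escape_string", ["$query"]),
    ("$result", "mysqli_query", ["$query"]),
]
print(find_injections(vulnerable))  # ['$query']
print(find_injections(safe))        # []
```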
  26. ㊙ DATA FLOW GRAPH DSL

    Data Flow DSL = Parse Tree DSL + Additional Syntax
    Flow operator: a ← b
    Flow variable: a: <[regex]>
    Type identifier: <[[fully.qualified.name]]>
    Flow negation: <~> a

        k = b;
        c = k + 42;
        a = c
  27. DATA FLOW PATTERNS

    PHP XSS:
        a: <[]> = _GET[#];
        <~> b: <[]> = htmlspecialchars(a, ...);
        <[print|echo]>(b);
    C# XSS:
        a: <[]> = <[[Request.Params]]>[#];
        <~> b: <[]> = <[[System.Web.HttpUtility.HtmlEncode]]>(a);
        <[[HttpResponse.Write]]>(b);
    C# Sample:
        var resp = this.Response;
        ...
        resp.Write("some string"); // typeof(resp) == HttpResponse
  28. ⏭ CONTROL FLOW GRAPH (CFG)

    [Figure: control flow graph with the conditions name == "admin" and key1 == "validkey", the assignments str1 = Encoding.UTF8.GetString(data), str1 = "Wrong key!", and str1 = "Wrong role!", followed by Response.Write(str1).]
  29. CFG PATTERNS

    Goto Fail Vulnerability
    ⚠ Detection methods:
    • Parse Tree
    • Control Flow Graph

        hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
        hashOut.length = SSL_SHA1_DIGEST_LEN;
        if ((err = SSLFreeBuffer(&hashCtx)) != 0)
            goto fail;
        // ...
        if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
            goto fail;
            goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
        err = sslRawVerify(...); // Verification!
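On a control flow graph, the goto fail mistake shows up as a node that cannot be reached from the entry. A minimal Python sketch with a hand-built graph follows. The node names and the adjacency-dict encoding are assumptions for illustration, covering only the tail of the slide's code.

```python
from collections import deque


def unreachable(cfg, entry):
    """Return the nodes of cfg that no path from entry can reach."""
    seen, queue = {entry}, deque([entry])
    while queue:
        for succ in cfg[queue.popleft()]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return sorted(set(cfg) - seen)


# Hand-built CFG: node -> successors.
GOTO_FAIL = {
    "check_update": ["goto_fail_1", "goto_fail_2"],  # if (...) true / false
    "goto_fail_1": ["fail"],
    "goto_fail_2": ["fail"],   # the duplicated, unconditional goto
    "verify": ["fail"],        # err = sslRawVerify(...)
    "fail": [],
}
FIXED = {
    "check_update": ["goto_fail_1", "verify"],
    "goto_fail_1": ["fail"],
    "verify": ["fail"],
    "fail": [],
}
print(unreachable(GOTO_FAIL, "check_update"))  # ['verify']
print(unreachable(FIXED, "check_update"))      # []
```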
  30. (no text)

  31. ⏭ CODE PROPERTY GRAPH (CPG)

    CPG = AST + DFG + CFG

    [Figure: code property graph whose nodes include string.IsNullOrEmpty(parm), data = new byte[0], data = Convert.FromBase64String(parm), name == "admin", and Encoding.UTF8.GetString(data).]
  32. CPG OPTIMIZATIONS

    ➗ Static:
        Constant Propagation
        Constant Folding:
            2 + 2 * 2 → 6
            "¯\\_(" + "ツ" + ")_/¯" → ¯\\_(ツ)_/¯
    Dynamic:
        Overrides resolving
        Overloads resolving
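Constant folding can be sketched as a bottom-up tree rewrite. The example below uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) on Python expressions. The class name `Folder`, the helper `fold`, and the small operator table are choices made for this sketch.

```python
import ast
import operator

# Only a few operators are handled; this table is an illustrative subset.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class Folder(ast.NodeTransformer):
    """Replace a binary operation on two constants with its computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first
        op = OPS.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
        ):
            try:
                value = op(node.left.value, node.right.value)
            except TypeError:
                return node  # e.g. "a" - 1: leave it for a later diagnostic
            return ast.copy_location(ast.Constant(value=value, kind=None), node)
        return node


def fold(expression):
    """Fold constants in a Python expression and return the new source text."""
    tree = Folder().visit(ast.parse(expression, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold("2 + 2 * 2"))  # 6
print(fold("x + 2 * 3"))  # x + 6
```

The same rewrite concatenates string constants, which is what lets a pattern match a string assembled from literal pieces.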
  33. CPG OPTIMIZATIONS

    Overloads resolving:
        Add(int x, int y) => x + y;
        Add(string x, string y) => string.Concat(x, y);
        ...
        Add(x, 2); // Static resolving succeeded: first method.
        Add(x, y); // Static resolving failed.
    Overrides resolving:
        if (cond)
            x = new A();
        else
            x = new B();
        Console.WriteLine(x.ToString()); // x ∈ A ∪ B
  34. (no text)

  35. IR | BINARY INSTRUCTIONS

    ❌
        mov eax, tainted_input
        mov ecx, untainted_input
        add ecx, eax ; ecx is TAINTED
    ✔
        mov eax, tainted_input
        xor eax, eax ; eax is UNTAINTED
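The two listings can be replayed with a tiny taint propagator over instruction tuples. This Python sketch covers only `mov`, `add`, and `xor`. The tuple encoding and the function name `taint_registers` are hypothetical.

```python
def taint_registers(instructions, tainted_sources):
    """Return the set of registers that hold tainted data after the sequence."""
    tainted = set()
    for op, dst, src in instructions:
        src_tainted = src in tainted or src in tainted_sources
        if op == "mov":
            # mov overwrites dst, so dst takes exactly the taint of src.
            if src_tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif op == "xor" and dst == src:
            # xor r, r always yields 0, whatever r held before.
            tainted.discard(dst)
        elif op in ("add", "xor"):
            # dst keeps its own taint and also picks up the taint of src.
            if src_tainted:
                tainted.add(dst)
    return tainted


first = [
    ("mov", "eax", "tainted_input"),
    ("mov", "ecx", "untainted_input"),
    ("add", "ecx", "eax"),
]
second = [
    ("mov", "eax", "tainted_input"),
    ("xor", "eax", "eax"),
]
print(sorted(taint_registers(first, {"tainted_input"})))   # ['eax', 'ecx']
print(sorted(taint_registers(second, {"tainted_input"})))  # []
```

The `xor eax, eax` special case is the kind of instruction-level knowledge that a source-level model never needs, which is part of why binary analyzers generalize poorly.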
  36. ㊙ TURING-COMPLETE DSL

    Implementation language (C#, Java, PHP, etc.)
    Universal actions
  37. IMPLEMENTATION LANGUAGE

    PT Pattern Matching (PT.PM)

        // Pattern: <[.+]>
        public override MatchContext Match(Token t, MatchContext c)
        {
            Regex regex = t.Root.Language.IsCaseInsensitive
                ? caseInsensitiveRegex
                : this.regex;
            string text = t.TextValue;
            TextSpan textSpan = regex.Match(text).GetTextSpan(text);
            return !textSpan.IsZero ? c.AddMatch(t) : c.Fail();
        }
  38. CONCLUSION

    Several types of source code analyzers have been described.
    Different models expose different properties of programs.
    More complex model → less generalizable analyzer.
  39. WE ARE HIRING

    Senior C# Developer (Algorithms, Compilers)
    Responsibilities:
        Data Flow Analyzer Development
        Algorithm Implementation
        Languages and Vulnerabilities Research
  40. THANKS! Made using reveal.js. Slides: kvanttt.github.io