Look Ma No Hands: Learning Input Grammar without Inputs

Rahul Gopinath

June 12, 2018

Transcript

  1. 4.

    POST /InStock HTTP/1.1
    Host: www.stock.org
    Content-Type: application/soap+xml; charset=utf-8
    Content-Length: 312

    <?xml version="1.0"?>
    <soap:Envelope
        xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
        soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
      <soap:Body xmlns:m="http://www.stock.org/stock">
        <m:GetStockPrice>
          <m:StockName>IBM</m:StockName>
        </m:GetStockPrice>
      </soap:Body>
    </soap:Envelope>
  2. 5.

    POST /InStock HTTP/1.1
    Host: www.stock.org
    Content-Type: application/soap+xml; charset=utf-8
    Content-Length: 312

    <?xml version="1.0"?>
    <soap:Envelope
        xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
        soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
      <soap:Body xmlns:m="http://www.stock.org/stock">
        <m:GetStockPrice>
          <m:StockName>IBM</m:StockName>
        </m:GetStockPrice>
      </soap:Body>
    </soap:Envelope>

    HTTP POST
  3. 6.

    POST /InStock HTTP/1.1
    Host: www.stock.org
    Content-Type: application/soap+xml; charset=utf-8
    Content-Length: 312

    <?xml version="1.0"?>
    <soap:Envelope
        xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
        soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
      <soap:Body xmlns:m="http://www.stock.org/stock">
        <m:GetStockPrice>
          <m:StockName>IBM</m:StockName>
        </m:GetStockPrice>
      </soap:Body>
    </soap:Envelope>

    HTTP POST, XML PAYLOAD
  4. 7.

    POST /InStock HTTP/1.1
    Host: www.stock.org
    Content-Type: application/soap+xml; charset=utf-8
    Content-Length: 312

    <?xml version="1.0"?>
    <soap:Envelope
        xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
        soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
      <soap:Body xmlns:m="http://www.stock.org/stock">
        <m:GetStockPrice>
          <m:StockName>IBM</m:StockName>
        </m:GetStockPrice>
      </soap:Body>
    </soap:Envelope>

    HTTP POST, XML PAYLOAD, SOAP
  5. 8.

    POST /InStock HTTP/1.1
    Host: www.stock.org
    Content-Type: application/soap+xml; charset=utf-8
    Content-Length: 312

    <?xml version="1.0"?>
    <soap:Envelope
        xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
        soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
      <soap:Body xmlns:m="http://www.stock.org/stock">
        <m:GetStockPrice>
          <m:StockName>IBM</m:StockName>
        </m:GetStockPrice>
      </soap:Body>
    </soap:Envelope>

    HTTP POST, XML PAYLOAD, SOAP, RPC Call
  6. 9.

    HTTP POST, XML PAYLOAD, SOAP, RPC Call

    HTTP Parser -> XML Parser -> SOAP Parser -> RPC Parser -> Application
  7. 11.

    A Naive Fuzzer

    $ ./fuzzit.py
    [ ; x 1 - G P Z + w c c k c ] ; , N 9 J + ? # 6 ^ 6 \ e ? ] 9 l u 2 _ % ' 4 G X " 0 V U B [ E / r ~ f A p u 6 b 8 < { % s i q 8 Z h . 6 { V , h r ? ; { Ti . r 3 P I x M M M v 6 { x S ^ + ' H q ! A x B " Y X R S @ ! Kd6;wtAMefFWM(`|J_<1~o}z3K(CCzRH JIIvHz>_*. \ > J r l U 3 2 ~ e G P ? l R = b F 3 + ; y $ 3 l o d Q < B 8 9 ! 5 " W 2 f K * v E 7 v { ' ) K C - i , c { < [ ~ m ! ] o ; { . ' } G j \ ( X } EtYetrpbY@aGZ1{P!AZU7x#4(Rtn!q4nCwqol^y6}0| Ko=*JK~;zMKV=9Nai:wxu{J&UV#HaU)*BiC<),`+t*g ka<W=Z.%T5WGHZpI30D<Pq>&]BS6R&j? # t P 7 i a V } - } ` \ ? [ _ [ Z ^ L B M P G - FKj'\xwuZ1=Q`^`5,$N$Q@[!CuRzJ2D|vBy! ^ z k h d f 3 C 5 P A k R ? V h n | 3='i2Qx]D$qs4O`1@fevnG'2\11Vf3piU37@55ap\zIy l"'f,$ee,J4Gw:cgNKLie3nx9(`efSlg6#[K"@WjhZ} r[Scun&sBCS,T[/vY'pduwgzDlVNy7'rnzxNwI) (ynBa>%|b`;`9fG]P_0hdG~$@6 3]KAeEnQ7lU)3Pn, 0)G/6N-wyzj/MTd#A;r

    HTTP Parser -> XML Parser -> SOAP Parser -> RPC Parser -> Application
  8. 12.

    A Naive Fuzzer

    HTTP Parser -> XML Parser -> SOAP Parser -> RPC Parser -> Application

    $ ./fuzzit.py
    [ ; x 1 - G P Z + w c c k c ] ; , N 9 J + ? # 6 ^ 6 \ e ? ] 9 l u 2 _ % ' 4 G X " 0 V U B [ E / r ~ f A p u 6 b 8 < { % s i q 8 Z h . 6 { V , h r ? ; { Ti . r 3 P I x M M M v 6 { x S ^ + ' H q ! A x B " Y X R S @ ! Kd6;wtAMefFWM(`|J_<1~o}z3K(CCzRH JIIvHz>_*. \ > J r l U 3 2 ~ e G P ? l R = b F 3 + ; y $ 3 l o d Q < B 8 9 ! 5 " W 2 f K * v E 7 v { ' ) K C - i , c { < [ ~ m ! ] o ; { . ' } G j \ ( X } EtYetrpbY@aGZ1{P!AZU7x#4(Rtn!q4nCwqol^y6}0| Ko=*JK~;zMKV=9Nai:wxu{J&UV#HaU)*BiC<),`+t*g ka<W=Z.%T5WGHZpI30D<Pq>&]BS6R&j? # t P 7 i a V } - } ` \ ? [ _ [ Z ^ L B M P G - FKj'\xwuZ1=Q`^`5,$N$Q@[!CuRzJ2D|vBy! ^ z k h d f 3 C 5 P A k R ? V h n | 3='i2Qx]D$qs4O`1@fevnG'2\11Vf3piU37@55ap\zIy l"'f,$ee,J4Gw:cgNKLie3nx9(`efSlg6#[K"@WjhZ} r[Scun&sBCS,T[/vY'pduwgzDlVNy7'rnzxNwI) (ynBa>%|b`;`9fG]P_0hdG~$@6 3]KAeEnQ7lU)3Pn, 0)G/6N-wyzj/MTd#A;r
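
The output above is what a naive fuzzer produces: characters picked uniformly at random. A minimal sketch of such a fuzzer follows. It is not the `fuzzit.py` from the talk; the names `naive_fuzz`, `looks_like_http_request` and `count_accepted`, the alphabet, and the toy first-layer check are all assumptions made for illustration. The point it tries to show is that random strings rarely get past even the first parser in the chain.

```python
import random
import string

# Assumed alphabet: printable ASCII plus space (the talk does not say
# which alphabet fuzzit.py draws from).
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def naive_fuzz(rng, max_len=100):
    """Return a random string of 1..max_len characters from ALPHABET."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def looks_like_http_request(data):
    """Toy stand-in for the first parser layer: checks only the request line."""
    first_line = data.split("\n", 1)[0]
    parts = first_line.split(" ")
    return (len(parts) == 3
            and parts[0] in ("GET", "POST")
            and parts[2].startswith("HTTP/"))


def count_accepted(trials=10_000, seed=0):
    """Count how many random inputs pass the toy first layer."""
    rng = random.Random(seed)
    return sum(looks_like_http_request(naive_fuzz(rng)) for _ in range(trials))


if __name__ == "__main__":
    print("accepted by the first layer:", count_accepted())
```

Inputs that fail this first check never reach the XML, SOAP or RPC layers, so the deeper parsers and the application stay untested.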
  9. 18.

    AFL Fuzz

    Parser        Time (sec)   Stmt Coverage
    JSON Parser   7942         36 (49) %
    MathExpr      77           63 (77) %
    URLParser     14           56 (62) %

    • Few valid inputs produced
    • Doesn't explore the input space very well
    • Performance is affected by complexity of grammar
  10. 19.

    KLEE

    • Explore input space symbolically
    • Very fast to explore simple input languages

    Parser        Time (sec)   Stmt Coverage
    MathExpr      5.25         99 (99) %
    URLParser     0.58         98 (99) %

    * compared with equivalent C programs
  11. 20.

    KLEE

    • Explore input space symbolically
    • Performance suffers with even slightly complex grammars

    Parser        Time (sec)   Stmt Coverage
    MathExpr      5.25         99 (99) %
    URLParser     0.58         98 (99) %
    JSON Parser   14617        31 (31) %

    * compared with equivalent C programs
  12. 23.

    AUTOGRAM

    protected void parseURL(URL u, String spec, int start, int limit) {
        String protocol = u.getProtocol();
        String authority = u.getAuthority();
        String userInfo = u.getUserInfo();
        String host = u.getHost();
        int port = u.getPort();
        int i = 0;
        boolean isUNCName = (start <= limit - 4) &&
            (spec.charAt(start) == '/') &&
            (spec.charAt(start + 1) == '/') &&
            (spec.charAt(start + 2) == '/') &&
            (spec.charAt(start + 3) == '/');
        if (!isUNCName && (start <= limit - 2) &&
            (spec.charAt(start) == '/') &&
            (spec.charAt(start + 1) == '/')) {
            start += 2;
            i = spec.indexOf('/', start);
            if (i < 0) {
                i = spec.indexOf('?', start);
                if (i < 0)
                    i = limit;
            }
            host = authority = spec.substring(start, i);
            int ind = authority.indexOf('@');
            if (ind != -1) {
                userInfo = authority.substring(0, ind);
                host = authority.substring(ind+1);
            } else
                userInfo = null;
            if (host != null) {
                if (host.length()>0 && (host.charAt(0) == '[')) {
                    if ((ind = host.indexOf(']')) > 2) {
                        String nhost = host ;
                        host = nhost.substring(0,ind+1);
                        port = -1 ;
                        if (nhost.length() > ind+1) {
                            if (nhost.charAt(ind+1) == ':') {
                                ++ind ;
                                if (nhost.length() > (ind + 1))
                                    port = Integer.parseInt(nhost.substring(ind+1));
                            }
                        }
                    }
                } else {
                    ind = host.indexOf(':');
                    port = -1;
                    if (ind >= 0) {
                        if (host.length() > (ind + 1)) {
                            port = Integer.parseInt(host.substring(ind + 1));
                        }
                        host = host.substring(0, ind);
                    }
                }
            } else
                host = "";
            start = i;
            if (host == null) host = "";
            ...
            setURL(u, protocol, host, port, authority, userInfo, ...);
  13. 24.

    AUTOGRAM

    http://admin:pass123@www.google.com:80/command?foo=bar&lorem=ipsum#fragment
    http://www.guardian.co.uk/sports/worldcup#results
    ftp://bob:12345@ftp.example.com/oss/debian7.iso

    URL       ::= PROTOCOL '://' AUTHORITY PATH ['?' QUERY] ['#' REF]
    AUTHORITY ::= [USERINFO '@'] HOST [':' PORT]
    PROTOCOL  ::= 'http' | 'ftp'
    USERINFO  ::= r{[a-z]+} ':' r{[a-z0-9]+}
    HOST      ::= r{[a-z.]+}
    PORT      ::= '80'
    PATH      ::= r{/[a-z0-9.]*}
    QUERY     ::= 'foo=bar&lorem=ipsum'
    REF       ::= r{[a-z]+}
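
Once a grammar like the one on this slide is available, producing syntactically valid inputs is mechanical. The sketch below hand-translates each production into Python; it is an illustration only, not AUTOGRAM's generator. The helper names (`chars`, `optional`, `gen_url`), the length cap of 8 and the 50% chance for optional parts are assumptions.

```python
import random
import string

LOWER = string.ascii_lowercase
ALNUM = LOWER + string.digits


def chars(rng, alphabet, min_len, max_len=8):
    """Random string over alphabet; stands in for r{[...]+} or r{[...]*}."""
    n = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(n))


def optional(rng, produce):
    """Expand a bracketed [ ... ] part of a rule half of the time."""
    return produce() if rng.random() < 0.5 else ""


def gen_url(rng):
    # PROTOCOL ::= 'http' | 'ftp'
    protocol = rng.choice(["http", "ftp"])
    # USERINFO ::= r{[a-z]+} ':' r{[a-z0-9]+}
    userinfo = optional(
        rng, lambda: chars(rng, LOWER, 1) + ":" + chars(rng, ALNUM, 1) + "@")
    # HOST ::= r{[a-z.]+}
    host = chars(rng, LOWER + ".", 1)
    # PORT ::= '80'
    port = optional(rng, lambda: ":80")
    # PATH ::= r{/[a-z0-9.]*}
    path = "/" + chars(rng, ALNUM + ".", 0)
    # QUERY ::= 'foo=bar&lorem=ipsum'
    query = optional(rng, lambda: "?foo=bar&lorem=ipsum")
    # REF ::= r{[a-z]+}
    ref = optional(rng, lambda: "#" + chars(rng, LOWER, 1))
    # URL ::= PROTOCOL '://' AUTHORITY PATH ['?' QUERY] ['#' REF]
    return f"{protocol}://{userinfo}{host}{port}{path}{query}{ref}"


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(gen_url(rng))
```

The generated URLs are only as general as the mined grammar: since every sample used port 80 and the same query string, so does every generated input.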
  14. 27.

    parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)

    parseURL   :spec        http://admin:pass123@www.google.com:80
    setURL     :protocol    http
               :authority   admin:pass123
               :port        80
               :host        www.google.com
  15. 28.

    parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)

    parseURL      :spec        http://admin:pass123@www.google.com:80
    setURL        :protocol    http
                  :authority   admin:pass123
                  :port        80
                  :host        www.google.com
    setUserInfo   admin, pass123
  16. 29.

    parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)

    parseURL      :spec        http://admin:pass123@www.google.com:80
    setURL        :protocol    http
                  :authority   admin:pass123
                  :port        80
                  :host        www.google.com
    setUserInfo   admin, pass123

    parseURL      :spec        ftp://boo:12345@ftp.example.com
    setURL        :protocol    ftp
                  :authority   boo:12345
                  :host        example.ftp.com
    setUserInfo   boo, 12345
  17. 30.

    parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)

    parseURL      :spec        SPEC
    setURL        :protocol    http | ftp
                  :authority   AUTHORITY
                  :port        [80]
                  :host        www.google.com | example.ftp.com
    setUserInfo   admin | boo, pass123 | 12345
  18. 31.

    parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)

    parseURL      :spec        SPEC
    setURL        :protocol    http | ftp
                  :authority   AUTHORITY
                  :port        [80]
                  :host        www.google.com | example.ftp.com
    setUserInfo   admin | boo, pass123 | 12345

    SPEC      ::= PROTOCOL '://' AUTHORITY '@' HOST [':' PORT]
    AUTHORITY ::= USER ':' PASSWORD
    USER      ::= r{[a-z]+}
    PASSWORD  ::= r{[a-z0-9]+}
    HOST      ::= r{[a-z]+}
    PORT      ::= r{[0-9]+}
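
The step from traced arguments to grammar rules can be approximated as follows: locate each argument value inside the string it was derived from and replace it with a nonterminal named after the parameter. This sketch uses plain substring search as a stand-in for the dynamic taint tracking that AUTOGRAM performs, so it is a simplification, not the tool's algorithm. The function names `derive_rule` and `render` are hypothetical.

```python
def derive_rule(text, args):
    """Split text into literals and nonterminals by locating argument values."""
    parts = [("lit", text)]
    # Longest values first, so "admin:pass123" is located before "admin".
    ordered = sorted(args.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, value in ordered:
        out, replaced = [], False
        for kind, s in parts:
            if kind == "lit" and not replaced and value and value in s:
                before, after = s.split(value, 1)
                if before:
                    out.append(("lit", before))
                out.append(("nt", name.upper()))
                if after:
                    out.append(("lit", after))
                replaced = True
            else:
                out.append((kind, s))
        parts = out
    return parts


def render(parts):
    """Format a rule body: nonterminals bare, literals quoted."""
    return " ".join(s if kind == "nt" else repr(s) for kind, s in parts)


spec_rule = derive_rule(
    "http://admin:pass123@www.google.com:80",
    {"protocol": "http", "authority": "admin:pass123",
     "host": "www.google.com", "port": "80"},
)
authority_rule = derive_rule(
    "admin:pass123", {"user": "admin", "password": "pass123"})

print("SPEC      ::=", render(spec_rule))
# SPEC      ::= PROTOCOL '://' AUTHORITY '@' HOST ':' PORT
print("AUTHORITY ::=", render(authority_rule))
# AUTHORITY ::= USER ':' PASSWORD
```

A single sample yields a rule with a mandatory port. Merging rules from several samples (for instance, one without a port) is what turns `':' PORT` into the optional `[':' PORT]`, and generalizing the observed values gives the regular expressions for the leaf rules; neither step is shown here.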
  19. 35.
  20. 39.

    PyChains: Fix the problem with a random character

    input = "x"

    !EOF(input[0:]) -> Yes
    isDigit(input[0]); input[0] in ['+', '-']; input[0] == '('; else -> Reject!
  21. 40.

    PyChains: Fix the problem with the choice "("

    input = "("

    !EOF(input[0:]) -> Yes
    isDigit(input[0]); input[0] in ['+', '-']; input[0] == '('; else -> Reject!
  22. 41.

    PyChains: Continue with the next character

    input = "("

    !EOF(input[0:]) -> Yes
    isDigit(input[0]); input[0] in ['+', '-']; input[0] == '('; else -> Reject!
    !EOF(input[1:])
  23. 42.

    PyChains: Continue with the next character

    input = "(y"

    !EOF(input[0:]) -> Yes
    isDigit(input[0]); input[0] in ['+', '-']; input[0] == '('; else -> Reject!
    !EOF(input[1:])
    isDigit(input[1]); input[1] in ['+', '-']; input[1] == "("; input[1] == ")"; else -> Reject!
  24. 43.

    PyChains

    input = "(1+2)"

    isDigit(input[0]); input[0] in ['+', '-']; input[0] == '('; else
    isDigit(input[1]); input[1] in ['+', '-']; input[1] == "("; input[1] == ")"
    isDigit(input[2]); input[2] in ['+', '-']; input[2] == "("; input[2] == ")"
    isDigit(input[3]); input[3] in ['+', '-']; input[3] == "("; input[3] == ")"
    isDigit(input[3]); input[3] in ['+', '-']; input[3] == "("; input[3] == ")"
    Accept!
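
The walk-through on these slides can be sketched in a few dozen lines. The code below is a simplified illustration of the idea, not PyChains itself: instead of instrumenting an existing parser, it uses a hand-written toy parser that reports every character comparison through a wrapper object. The names `TrackedInput`, `NeedMoreInput`, `Rejected` and `generate`, the toy grammar, and the step limit are all assumptions. On rejection, the last character is replaced by one the parser compared against at that position; when the parser runs out of input, a random character is appended.

```python
import random
import string

PRINTABLE = string.digits + string.ascii_letters + string.punctuation


class Rejected(Exception):
    """The parser rejected the input at some character."""


class NeedMoreInput(Rejected):
    """The parser ran out of input before it could accept."""


class TrackedInput:
    """Wraps the input and logs each comparison as (index, candidates)."""

    def __init__(self, text):
        self.text = text
        self.comparisons = []

    def at_eof(self, i):
        return i >= len(self.text)

    def is_in(self, i, candidates):
        self.comparisons.append((i, candidates))
        return self.text[i] in candidates


def parse_term(inp, i):
    # term := digit+ | '(' expr ')'
    if inp.at_eof(i):
        raise NeedMoreInput()
    if inp.is_in(i, string.digits):
        i += 1
        while not inp.at_eof(i) and inp.is_in(i, string.digits):
            i += 1
        return i
    if inp.is_in(i, "("):
        i = parse_expr(inp, i + 1)
        if inp.at_eof(i):
            raise NeedMoreInput()
        if not inp.is_in(i, ")"):
            raise Rejected()
        return i + 1
    raise Rejected()


def parse_expr(inp, i):
    # expr := term (('+' | '-') term)*
    i = parse_term(inp, i)
    while not inp.at_eof(i) and inp.is_in(i, "+-"):
        i = parse_term(inp, i + 1)
    return i


def generate(rng, start="", max_steps=10_000):
    """Grow an input until the parser accepts it, guided by its comparisons."""
    text = start or rng.choice(PRINTABLE)
    for _ in range(max_steps):
        inp = TrackedInput(text)
        try:
            if parse_expr(inp, 0) == len(text):
                return text  # Accept!
        except NeedMoreInput:
            text += rng.choice(PRINTABLE)  # continue with a random character
            continue
        except Rejected:
            pass
        # Rejected: replace the offending character with one of the
        # characters the parser compared against at that index.
        last = inp.comparisons[-1][0]
        candidates = "".join(c for idx, c in inp.comparisons if idx == last)
        text = text[:last] + rng.choice(candidates)
    raise RuntimeError("no accepted input found within max_steps")


if __name__ == "__main__":
    print(generate(random.Random(), start="("))
```

Starting from `"("` mirrors the slides: the parser asks for more input, a random character is appended and rejected, and the comparisons recorded at that index (digits, `(`, and later `+`, `-`, `)`) supply the replacement. No sample inputs are needed at any point.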
  25. 45.

    PyChains

    • Faster for complex input languages

    Parser        Time (sec)   Stmt Coverage
    JSON Parser   1713         100 (44) %
    MathExpr      122          99 (62) %
    URLParser     1665         100 (56) %

    Complexity
  26. 47.

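    Limitations

    • Problems with mezzanine validations (secondary validations in the current layer)

        class ParseException(Exception):
            pass

        def parse_num(input):
            # Lexical layer: accepts any run of digits and '.', even "10.0.1"
            i = 0
            while i < len(input) and (input[i].isdigit() or input[i] == '.'):
                i = i + 1
            return input[:i], input[i:]

        def parse_arithmetic(input):
            value1, rest = parse_num(input)
            if not rest or rest[0] not in ['+', '-']:
                raise ParseException(rest)
            op = rest[0]  # remember the operator before rest is overwritten
            value2, rest = parse_num(rest[1:])
            if rest != '':
                raise ParseException(rest)
            # Mezzanine validation: float() rejects what parse_num let through
            return (op, float(value1), float(value2))

        parse_arithmetic('10.0.1+1')  # raises ValueError (shown on the slide as 'Invalid Int')
        parse_arithmetic('99+1')      # ('+', 99.0, 1.0)
        parse_arithmetic('2.1-3')     # ('-', 2.1, 3.0)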
  27. 48.

    Limitations

    • Problems with mezzanine validations
    • Solution: Throw out accumulated characters from the point of secondary validation, and start again.

    10.0.1+1            ValueError 'Invalid Int'
    10.0?
    ...
    10.05345+563.334

    Inefficient!
  28. 49.

    Fix for speed

    PyChains | PyGram | Fuzz

    Grammar Inference Engine: PyGram (Infer Grammar)
    Sample inputs
    Generated inputs
  29. 51.

    Mezzanine Validations

    :spec       http://[ffcc:xxx/mypath?a=b
    :protocol   http
    :host       [ffcc:xxx
    :path       /mypath?a=b

    Mezzanine validation: if host[0] == '[': validateIPv6(host)

    Generate new host string by limited symbolic execution (Research in progress)
    • Not as costly as full symbolic execution
    • Not as costly as throwing out and restarting at the mezzanine validation point
  30. 52.

    Fix for Mezzanine Validations

    PyChains | PyGram | Fuzz

    Partial prefixes
    Partial decomposition of input
    Generated inputs
  31. 53.

    Toolchain: Pygmalion

    PyChains | PyGram | Fuzz

    Partial prefixes
    Partial decomposition of input
    Generated inputs

    Advantages:
    • No samples required
    • Explores the complete input space
    • Fast

    Caution:
    • Research in progress
    • Currently only in Python (3.6)
    • PyGram works only on Top-Down Recursive Descent style parsers.
  32. 54.

    Pygmalion

    PyChains | Trace | Track | Mine | Infer | Refine | Fuzz

    Grammar Inference Engine: PyGram
  33. 55.

    Pygmalion

    PyChains | Trace | Track | Mine | Infer | Refine => CFG

    PyChains: Generate inputs
    Trace:    Language specific: Comparisons and Taints
    Track:    Generate Dynamic Dataflow Graph
    Mine:     Generate Parse Tree
    Infer:    Infer Context Free Grammar
    Refine:   Generalize The Grammar
  34. 56.