Slide 1

Slide 1 text

Look Ma, No Hands: Learning Input Grammar without Inputs
Rahul Gopinath, Postdoctoral Researcher

Slide 2

Slide 2 text

Why learn input grammars?

Slide 4

Slide 4 text

POST /InStock HTTP/1.1
Host: www.stock.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 312
IBM

Slide 5

Slide 5 text

POST /InStock HTTP/1.1
Host: www.stock.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 312
IBM
Layers: HTTP POST

Slide 6

Slide 6 text

POST /InStock HTTP/1.1
Host: www.stock.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 312
IBM
Layers: HTTP POST | XML PAYLOAD

Slide 7

Slide 7 text

POST /InStock HTTP/1.1
Host: www.stock.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 312
IBM
Layers: HTTP POST | XML PAYLOAD | SOAP

Slide 8

Slide 8 text

POST /InStock HTTP/1.1
Host: www.stock.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 312
IBM
Layers: HTTP POST | XML PAYLOAD | SOAP | RPC Call

Slide 9

Slide 9 text

Layers: HTTP POST | XML PAYLOAD | SOAP | RPC Call
HTTP Parser -> XML Parser -> SOAP Parser -> RPC Parser -> Application

Slide 10

Slide 10 text

HTTP Parser -> XML Parser -> SOAP Parser -> RPC Parser -> Application
Target

Slide 11

Slide 11 text

$ ./fuzzit.py [ ; x 1 - G P Z + w c c k c ] ; , N 9 J + ? # 6 ^ 6 \ e ? ] 9 l u 2 _ % ' 4 G X " 0 V U B [ E / r ~ f A p u 6 b 8 < { % s i q 8 Z h . 6 { V , h r ? ; { Ti . r 3 P I x M M M v 6 { x S ^ + ' H q ! A x B " Y X R S @ ! Kd6;wtAMefFWM(`|J_<1~o}z3K(CCzRH JIIvHz>_*. \ > J r l U 3 2 ~ e G P ? l R = b F 3 + ; y $ 3 l o d Q < B 8 9 ! 5 " W 2 f K * v E 7 v { ' ) K C - i , c { < [ ~ m ! ] o ; { . ' } G j \ ( X } EtYetrpbY@aGZ1{P!AZU7x#4(Rtn!q4nCwqol^y6}0| Ko=*JK~;zMKV=9Nai:wxu{J&UV#HaU)*BiC<),`+t*g ka&]BS6R&j? # t P 7 i a V } - } ` \ ? [ _ [ Z ^ L B M P G - FKj'\xwuZ1=Q`^`5,$N$Q@[!CuRzJ2D|vBy! ^ z k h d f 3 C 5 P A k R ? V h n | 3='i2Qx]D$qs4O`1@fevnG'2\11Vf3piU37@55ap\zIy l"'f,$ee,J4Gw:cgNKLie3nx9(`efSlg6#[K"@WjhZ} r[Scun&sBCS,T[/vY'pduwgzDlVNy7'rnzxNwI) (ynBa>%|b`;`9fG]P_0hdG~$@6 3]KAeEnQ7lU)3Pn, 0)G/6N-wyzj/MTd#A;r A Naive Fuzzer HTTP Parser XML Parser SOAP Parser RPC Parser Application 10

Slide 12

Slide 12 text

A Naive Fuzzer HTTP Parser XML Parser SOAP Parser RPC Parser Application $ ./fuzzit.py [ ; x 1 - G P Z + w c c k c ] ; , N 9 J + ? # 6 ^ 6 \ e ? ] 9 l u 2 _ % ' 4 G X " 0 V U B [ E / r ~ f A p u 6 b 8 < { % s i q 8 Z h . 6 { V , h r ? ; { Ti . r 3 P I x M M M v 6 { x S ^ + ' H q ! A x B " Y X R S @ ! Kd6;wtAMefFWM(`|J_<1~o}z3K(CCzRH JIIvHz>_*. \ > J r l U 3 2 ~ e G P ? l R = b F 3 + ; y $ 3 l o d Q < B 8 9 ! 5 " W 2 f K * v E 7 v { ' ) K C - i , c { < [ ~ m ! ] o ; { . ' } G j \ ( X } EtYetrpbY@aGZ1{P!AZU7x#4(Rtn!q4nCwqol^y6}0| Ko=*JK~;zMKV=9Nai:wxu{J&UV#HaU)*BiC<),`+t*g ka&]BS6R&j? # t P 7 i a V } - } ` \ ? [ _ [ Z ^ L B M P G - FKj'\xwuZ1=Q`^`5,$N$Q@[!CuRzJ2D|vBy! ^ z k h d f 3 C 5 P A k R ? V h n | 3='i2Qx]D$qs4O`1@fevnG'2\11Vf3piU37@55ap\zIy l"'f,$ee,J4Gw:cgNKLie3nx9(`efSlg6#[K"@WjhZ} r[Scun&sBCS,T[/vY'pduwgzDlVNy7'rnzxNwI) (ynBa>%|b`;`9fG]P_0hdG~$@6 3]KAeEnQ7lU)3Pn, 0)G/6N-wyzj/MTd#A;r 11
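The idea of a naive fuzzer can be sketched in a few lines of Python. This is an illustration, not the `fuzzit.py` shown on the slide: the names `naive_fuzz` and `count_valid_json` are hypothetical, and Python's `json.loads` is used as a stand-in first-stage parser (an assumption; the slides use an HTTP/XML/SOAP/RPC stack). Counting how many random strings the parser accepts shows why so little random input ever reaches the deeper layers.

```python
import json
import random
import string

# Printable ASCII characters, similar to the noise shown on the slide.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def naive_fuzz(rng, max_length=100):
    """Return a random string of up to max_length printable characters."""
    length = rng.randrange(max_length + 1)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def count_valid_json(trials, seed=0):
    """Count how many random strings a JSON parser (stand-in target) accepts."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(trials):
        try:
            json.loads(naive_fuzz(rng))
            valid += 1
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            pass
    return valid


if __name__ == "__main__":
    print(naive_fuzz(random.Random(1)))
    print(count_valid_json(1000), "of 1000 random strings parsed as JSON")
```

Only inputs that pass the first parser can exercise the code behind it, which is the motivation for grammar-aware generation.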

Slide 13

Slide 13 text

What we need is the Input Grammar!

Slide 14

Slide 14 text

If you already have the grammar:
1. JSFunFuzz
2. GramFuzz
3. LangFuzz

Slide 15

Slide 15 text

What if you don't have the grammar?

Slide 16

Slide 16 text

State of the Art
AFL
Glade [PLDI 2017]

Slide 17

Slide 17 text

AFL Fuzz
• Mutate sample inputs (if available)
• Branch coverage directed

Slide 18

Slide 18 text

AFL Fuzz
Parser        Time (sec)   Stmt Coverage
JSON Parser   7942         36(49)%
MathExpr      77           63(77)%
URLParser     14           56(62)%
• Few valid inputs produced
• Doesn't explore the input space very well
• Performance is affected by the complexity of the grammar

Slide 19

Slide 19 text

KLEE
• Explores the input space symbolically
• Very fast to explore simple input languages
Parser        Time (sec)   Stmt Coverage
MathExpr      5.25         99(99)%
URLParser     0.58         98(99)%
* compared with equivalent C programs

Slide 20

Slide 20 text

KLEE
• Explores the input space symbolically
• Performance suffers with even slightly complex grammars
Parser        Time (sec)   Stmt Coverage
MathExpr      5.25         99(99)%
URLParser     0.58         98(99)%
JSON Parser   14617        31(31)%
* compared with equivalent C programs

Slide 21

Slide 21 text

AUTOGRAM
Context-free grammar from samples

Slide 22

Slide 22 text

AUTOGRAM
http://admin:pass123@www.google.com:80/command?foo=bar&lorem=ipsum#fragment
http://www.guardian.co.uk/sports/worldcup#results
ftp://bob:12345@ftp.example.com/oss/debian7.iso

Slide 23

Slide 23 text

AUTOGRAM
    protected void parseURL(URL u, String spec, int start, int limit) {
        String protocol = u.getProtocol();
        String authority = u.getAuthority();
        String userInfo = u.getUserInfo();
        String host = u.getHost();
        int port = u.getPort();
        int i = 0;
        boolean isUNCName = (start <= limit - 4) &&
            (spec.charAt(start) == '/') &&
            (spec.charAt(start + 1) == '/') &&
            (spec.charAt(start + 2) == '/') &&
            (spec.charAt(start + 3) == '/');
        if (!isUNCName && (start <= limit - 2) &&
            (spec.charAt(start) == '/') &&
            (spec.charAt(start + 1) == '/')) {
            start += 2;
            i = spec.indexOf('/', start);
            if (i < 0) {
                i = spec.indexOf('?', start);
                if (i < 0)
                    i = limit;
            }
            host = authority = spec.substring(start, i);
            int ind = authority.indexOf('@');
            if (ind != -1) {
                userInfo = authority.substring(0, ind);
                host = authority.substring(ind+1);
            } else
                userInfo = null;
            if (host != null) {
                if (host.length()>0 && (host.charAt(0) == '[')) {
                    if ((ind = host.indexOf(']')) > 2) {
                        String nhost = host ;
                        host = nhost.substring(0,ind+1);
                        port = -1 ;
                        if (nhost.length() > ind+1) {
                            if (nhost.charAt(ind+1) == ':') {
                                ++ind ;
                                if (nhost.length() > (ind + 1))
                                    port = Integer.parseInt(nhost.substring(ind+1));
                            }
                        }
                    }
                } else {
                    ind = host.indexOf(':');
                    port = -1;
                    if (ind >= 0) {
                        if (host.length() > (ind + 1)) {
                            port = Integer.parseInt(host.substring(ind + 1));
                        }
                        host = host.substring(0, ind);
                    }
                }
            } else
                host = "";
            start = i;
            if (host == null) host = "";
            ...
            setURL(u, protocol, host, port, authority, userInfo, ...);

Slide 24

Slide 24 text

AUTOGRAM
http://admin:pass123@www.google.com:80/command?foo=bar&lorem=ipsum#fragment
http://www.guardian.co.uk/sports/worldcup#results
ftp://bob:12345@ftp.example.com/oss/debian7.iso
URL ::= PROTOCOL '://' AUTHORITY PATH ['?' QUERY] ['#' REF]
AUTHORITY ::= [USERINFO '@'] HOST [':' PORT]
PROTOCOL ::= 'http' | 'ftp'
USERINFO ::= r{[a-z]+} ':' r{[a-z0-9]+}
HOST ::= r{[a-z.]+}
PORT ::= '80'
PATH ::= r{/[a-z0-9.]*}
QUERY ::= 'foo=bar&lorem=ipsum'
REF ::= r{[a-z]+}
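Once a grammar like the one above exists, producing inputs from it is straightforward. The following is a minimal sketch, not AUTOGRAM's or any listed fuzzer's implementation: the dictionary encoding, the `OPT_*` helper nonterminals, and the function `expand` are assumptions made for illustration, and the regular-expression terminals (`r{...}`) are replaced by the literal values seen in the three sample URLs.

```python
import random

# Each nonterminal maps to a list of alternatives; each alternative is a
# sequence of symbols. Symbols that are not keys are emitted literally.
# OPT_* nonterminals are hypothetical helpers encoding the optional [...] parts.
GRAMMAR = {
    "URL": [["PROTOCOL", "://", "AUTHORITY", "PATH", "OPT_QUERY", "OPT_REF"]],
    "AUTHORITY": [["OPT_USERINFO", "HOST", "OPT_PORT"]],
    "PROTOCOL": [["http"], ["ftp"]],
    "OPT_USERINFO": [[], ["USERINFO", "@"]],
    "USERINFO": [["admin:pass123"], ["bob:12345"]],
    "HOST": [["www.google.com"], ["www.guardian.co.uk"], ["ftp.example.com"]],
    "OPT_PORT": [[], [":", "80"]],
    "PATH": [["/command"], ["/sports/worldcup"], ["/oss/debian7.iso"]],
    "OPT_QUERY": [[], ["?", "foo=bar&lorem=ipsum"]],
    "OPT_REF": [[], ["#", "fragment"], ["#", "results"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol by picking a random alternative."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    alternative = rng.choice(GRAMMAR[symbol])
    return "".join(expand(part, rng) for part in alternative)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(expand("URL", rng))
```

Every generated string is structurally a URL, so it passes the parser and reaches the code behind it, unlike the random noise of a naive fuzzer.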

Slide 25

Slide 25 text

parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)
http://admin:pass123@www.google.com:80

Slide 26

Slide 26 text

parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)
parseURL    :spec = http://admin:pass123@www.google.com:80

Slide 27

Slide 27 text

parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)
parseURL    :spec = http://admin:pass123@www.google.com:80
setURL      :protocol = http, :authority = admin:pass123, :port = 80, :host = www.google.com

Slide 28

Slide 28 text

parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)
parseURL    :spec = http://admin:pass123@www.google.com:80
setURL      :protocol = http, :authority = admin:pass123, :port = 80, :host = www.google.com
setUserInfo admin, pass123

Slide 29

Slide 29 text

parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)
Sample 1:
parseURL    :spec = http://admin:pass123@www.google.com:80
setURL      :protocol = http, :authority = admin:pass123, :port = 80, :host = www.google.com
setUserInfo admin, pass123
Sample 2:
parseURL    :spec = ftp://boo:12345@ftp.example.com
setURL      :protocol = ftp, :authority = boo:12345, :host = ftp.example.com
setUserInfo boo, 12345

Slide 30

Slide 30 text

parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)
parseURL    :spec (SPEC)
setURL      :protocol = http | ftp, :authority (AUTHORITY), :port = [80], :host = www.google.com | ftp.example.com
setUserInfo admin | boo, pass123 | 12345

Slide 31

Slide 31 text

parseURL(spec) -> setURL(protocol, host, port, authority,…) -> setUserInfo(user, password)
parseURL    :spec (SPEC)
setURL      :protocol = http | ftp, :authority (AUTHORITY), :port = [80], :host = www.google.com | ftp.example.com
setUserInfo admin | boo, pass123 | 12345
SPEC ::= PROTOCOL '://' AUTHORITY '@' HOST [':' PORT]
AUTHORITY ::= USER ':' PASSWORD
USER ::= r{[a-z]+}
PASSWORD ::= r{[a-z0-9]+}
HOST ::= r{[a-z.]+}
PORT ::= r{[0-9]+}
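The step from observed variable values to grammar rules can be sketched as follows. This is a simplification and an assumption, not AUTOGRAM's actual algorithm: the tool relies on dynamic taints to know which input characters flowed into which variable, whereas the hypothetical `derive_rule` below just searches for each value as a substring of the input.

```python
def derive_rule(text, variables):
    """Replace each variable's value inside text with its nonterminal.

    Returns a list of ("lit", string) and ("nt", NAME) parts.
    """
    parts = [("lit", text)]
    # Longest values first, so that 'admin:pass123' is claimed before 'admin'.
    ordered = sorted(variables.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, value in ordered:
        next_parts = []
        for kind, content in parts:
            if kind == "lit" and value and value in content:
                before, _, after = content.partition(value)
                if before:
                    next_parts.append(("lit", before))
                next_parts.append(("nt", name.upper()))
                if after:
                    next_parts.append(("lit", after))
            else:
                next_parts.append((kind, content))
        parts = next_parts
    return parts


def show(parts):
    """Render a rule body in the notation used on the slides."""
    return " ".join(c if kind == "nt" else "'" + c + "'" for kind, c in parts)


if __name__ == "__main__":
    spec = "http://admin:pass123@www.google.com:80"
    seen = {"protocol": "http", "authority": "admin:pass123",
            "host": "www.google.com", "port": "80"}
    print("SPEC ::=", show(derive_rule(spec, seen)))
    print("AUTHORITY ::=", show(derive_rule(
        "admin:pass123", {"user": "admin", "password": "pass123"})))
```

Merging the rules derived from several samples then yields the alternatives (`http | ftp`) and the optional parts (`[':' PORT]`) shown above.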

Slide 32

Slide 32 text

But: We still need samples

Slide 33

Slide 33 text

AUTOGRAM
We still need samples
• Could result in a grammar with blind spots

Slide 34

Slide 34 text

AUTOGRAM: We still need samples
Symbolic execution does not scale to complex parsers

Slide 35

Slide 35 text

AUTOGRAM
Symbolic execution does not scale to complex parsers
Do we need full constraint solving?

Slide 36

Slide 36 text

Idea! Solve only the next character

Slide 37

Slide 37 text

PyChains
Start with an empty string
input = ""
[Diagram: EOF? Yes / No; Reject!]

Slide 38

Slide 38 text

PyChains
Fix the problem with a random character
input = "x"
[Diagram: EOF? No / Yes; Reject!]

Slide 39

Slide 39 text

PyChains
Fix the problem with a random character
input = "x"
[Diagram: !EOF(input[0:]); then isDigit(input[0]) | input[0] in ['+', '-'] | input[0] == '(' | else: Reject!]

Slide 40

Slide 40 text

PyChains
Fix the problem with the choice "("
input = "("
[Diagram: !EOF(input[0:]); then isDigit(input[0]) | input[0] in ['+', '-'] | input[0] == '(' | else: Reject!]

Slide 41

Slide 41 text

PyChains
Continue with the next character
input = "("
[Diagram: !EOF(input[0:]); then isDigit(input[0]) | input[0] in ['+', '-'] | input[0] == '(' | else: Reject!]
[Diagram, after input[0] == '(': !EOF(input[1:])]

Slide 42

Slide 42 text

PyChains
Continue with the next character
input = "(y"
[Diagram: !EOF(input[0:]); then isDigit(input[0]) | input[0] in ['+', '-'] | input[0] == '(' | else: Reject!]
[Diagram, after input[0] == '(': !EOF(input[1:]); then isDigit(input[1]) | input[1] in ['+', '-'] | input[1] == "(" | input[1] == ")" | else: Reject!]

Slide 43

Slide 43 text

PyChains
input = "(1+2)"
[Diagram: isDigit(input[0]) | input[0] in ['+', '-'] | input[0] == '(' | else]
[Diagram: isDigit(input[1]) | input[1] in ['+', '-'] | input[1] == "(" | input[1] == ")"]
[Diagram: isDigit(input[2]) | input[2] in ['+', '-'] | input[2] == "(" | input[2] == ")"]
[Diagram: isDigit(input[3]) | input[3] in ['+', '-'] | input[3] == "(" | input[3] == ")"]
[Diagram: isDigit(input[4]) | input[4] in ['+', '-'] | input[4] == "(" | input[4] == ")"]
Accept!
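The walk-through above can be condensed into a small loop: append a random character, and if the parser rejects it, replace it with one of the characters the parser compared against at that position. The sketch below is an assumption-laden toy, not the PyChains implementation. The parser (digits, `+`, `-`, parentheses, no unary sign) is hand-instrumented through a hypothetical `TracedInput.peek_in` helper instead of real taint tracking, and running off the end of the input is reported as "incomplete" rather than as a rejection.

```python
import random
import string

DIGITS = "0123456789"


class Incomplete(Exception):
    """The parser ran past the end of the input: more characters are needed."""


class Reject(Exception):
    """The parser compared a character and no alternative matched."""


class TracedInput:
    def __init__(self, text):
        self.text = text
        self.comparisons = []  # (index, characters accepted at that index)

    def peek_in(self, i, chars):
        if i >= len(self.text):
            raise Incomplete(i)
        self.comparisons.append((i, chars))
        return self.text[i] in chars


def parse_term(inp, i):
    if inp.peek_in(i, "("):
        i = parse_expr(inp, i + 1)
        if not inp.peek_in(i, ")"):
            raise Reject(i)
        return i + 1
    if not inp.peek_in(i, DIGITS):
        raise Reject(i)
    while i < len(inp.text) and inp.peek_in(i, DIGITS):
        i += 1
    return i


def parse_expr(inp, i):
    i = parse_term(inp, i)
    while i < len(inp.text) and inp.peek_in(i, "+-"):
        i = parse_term(inp, i + 1)
    return i


def parse(text):
    """Return ('accept' | 'incomplete' | 'reject', traced input)."""
    inp = TracedInput(text)
    try:
        if parse_expr(inp, 0) != len(text):
            raise Reject(len(text))
        return "accept", inp
    except Incomplete:
        return "incomplete", inp
    except Reject:
        return "reject", inp


def generate(rng, min_len=3, max_len=40):
    """Grow a valid input one character at a time; None if max_len is hit."""
    prefix = ""
    while len(prefix) < max_len:
        candidate = prefix + rng.choice(string.printable)
        status, inp = parse(candidate)
        if status == "reject":
            # Solve only the last character: pick from what was compared there.
            last = len(candidate) - 1
            options = sorted({c for i, chars in inp.comparisons
                              if i == last for c in chars})
            if not options:
                return None
            candidate = prefix + rng.choice(options)
            status, _ = parse(candidate)
            if status == "reject":
                return None
        prefix = candidate
        if status == "accept" and len(prefix) >= min_len:
            return prefix
    return None


if __name__ == "__main__":
    for seed in range(5):
        print(generate(random.Random(seed)))
```

No sample inputs and no constraint solver are involved; the only feedback used is the set of comparisons made on the last character.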

Slide 44

Slide 44 text

PyChains
• Relies on:
  • Dynamic taint tracking
  • Tracing character comparisons
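How comparisons on input characters might be traced can be illustrated with a small proxy object. This is a hedged sketch of the general idea only: the classes `TaintedStr` and `TaintedChar` are hypothetical and far simpler than a real dynamic taint tracker, which would also have to follow slices, concatenations, and conversions.

```python
class TaintedChar:
    """A single input character that remembers its index and logs comparisons."""

    def __init__(self, char, index, log):
        self.char = char
        self.index = index
        self.log = log

    def __eq__(self, other):
        # Record which input position was compared against which character.
        self.log.append((self.index, other))
        return self.char == other


class TaintedStr:
    """Wraps the input string; indexing yields TaintedChar objects."""

    def __init__(self, text):
        self.text = text
        self.log = []

    def __getitem__(self, i):
        return TaintedChar(self.text[i], i, self.log)

    def __len__(self):
        return len(self.text)


if __name__ == "__main__":
    data = TaintedStr("(y")
    data[0] == "("            # logged as (0, '(')
    data[1] in ["+", "-"]     # list membership falls back to __eq__
    print(data.log)
```

The log tells the generator which characters the parser was willing to accept at each position, which is exactly the information needed to fix a rejected last character.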

Slide 45

Slide 45 text

PyChains
• Faster for complex input languages
Parser        Time (sec)   Stmt Coverage
JSON Parser   1713         100(44)%
MathExpr      122          99(62)%
URLParser     1665         100(56)%
Complexity

Slide 46

Slide 46 text

Limitations
• Not as fast as naive fuzzers (considering #inputs produced)

Slide 47

Slide 47 text

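Limitations
• Problems with mezzanine validations (secondary validations in the current layer)

    class ParseException(Exception):
        pass

    def is_digit(c):
        return c in '0123456789'

    def parse_num(input):
        # Lexical check only: optional sign, then digits and dots.
        # '10.0.1' passes here and fails later in float().
        i = 1 if input[:1] in ['+', '-'] else 0
        while i < len(input) and (is_digit(input[i]) or input[i] == '.'):
            i = i + 1
        return input[:i], input[i:]

    def parse_arithmetic(input):
        value1, rest = parse_num(input)
        if rest[:1] not in ['+', '-']:
            raise ParseException(rest)
        op = rest[0]
        value2, rest = parse_num(rest[1:])
        if rest != '':
            raise ParseException(rest)
        # Mezzanine validation: float() rejects text that parse_num accepted.
        return (op, float(value1), float(value2))

    parse_arithmetic('10.0.1+1')   raises ValueError
    parse_arithmetic('99+1')       ('+', 99.0, 1.0)
    parse_arithmetic('2.1-3')      ('-', 2.1, 3.0)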

Slide 48

Slide 48 text

Limitations
• Problems with mezzanine validations
• Solution: Throw out accumulated characters from the point of secondary validation, and start again.
10.0.1+1 -> ValueError 'Invalid Int'
10.0? ... 10.05345+563.334
Inefficient!

Slide 49

Slide 49 text

PyChains | PyGram | Fuzz
Grammar Inference Engine: PyGram
Sample inputs
Generated inputs (Infer Grammar)
Fix for speed

Slide 50

Slide 50 text

PyChains | PyGram
Mezzanine Validations
Partial prefixes
Partial decomposition of input

Slide 51

Slide 51 text

Mezzanine Validations
:spec = http://[ffcc:xxx/mypath?a=b
:protocol = http, :host = [ffcc:xxx, :path = /mypath?a=b
if host[0] == '[': validateIPv6(host)   (mezzanine validation)
Generate new host string by limited symbolic execution (research in progress)
• Not as costly as full symbolic execution
• Not as costly as throwing out and restarting at the mezzanine validation point

Slide 52

Slide 52 text

PyChains | PyGram | Fuzz
Fix for Mezzanine Validations
Partial prefixes
Generated inputs
Partial decomposition of input

Slide 53

Slide 53 text

PyChains | PyGram | Fuzz
Toolchain: Pygmalion
Partial prefixes
Generated inputs
Partial decomposition of input
Advantages:
• No samples required
• Explores the complete input space
• Fast
Caution:
• Research in progress
• Currently only in Python (3.6)
• PyGram works only on top-down recursive-descent style parsers.

Slide 54

Slide 54 text

Pygmalion
PyChains | Trace | Track | Mine | Infer | Refine | Fuzz
Grammar Inference Engine: PyGram

Slide 55

Slide 55 text

Pygmalion
PyChains | Trace | Track | Mine | Infer | Refine => CFG
• PyChains: Generate inputs
• Trace: Language specific: comparisons and taints
• Track: Generate dynamic dataflow graph
• Mine: Generate parse tree
• Infer: Infer context-free grammar
• Refine: Generalize the grammar

Slide 56

Slide 56 text

DEMO

Slide 57

Slide 57 text

Summary