Specification for Free: Behavior-Driven Fuzzing with Inferred Specifications

Specifications for Free: Behavior-Driven Fuzzing with Inferred Specifications
Rahul Gopinath


Prerequisites
https://github.com/vrthra/summer2026-nus
• Install Python 3.12
• Install Graphviz (optional)
• Install Jupyter
• Start Jupyter
$ jupyter notebook --ip='*' --NotebookApp.token='' --NotebookApp.password=''


Security Research | Software Verification Research


https://fuzzingbook.org

https://rahul.gopinath.org/posts

http://localhost:8888/notebooks/RoadMap.ipynb

http://localhost:8888/notebooks/x0_0_Prerequisites.ipynb

Finding hidden bugs

Fuzzing
Program → Crash?
Trash deck technique: 1950s - Gerald Weinberg

Random Fuzzing
$ ./fuzz
[;x1-GPZ+wcckc];,N9J+?#6^6\e?]9lu
2_%'4GX"0VUB[E/r ~fApu6b8<{%siq8Z
h.6{V,hr?;{Ti.r3PIxMMMv6{xS^+'Hq!
AxB"YXRS@!Kd6;wtAMefFWM(`|J_<1~o}
z3K(CCzRH JIIvHz>_*.\>JrlU32~eGP?
lR=bF3+;y$3lodQ<B89!5"W2fK*vE7v{'
)KC-i,c{<[~m!]o;{.'}Gj\(X}EtYetrp
bY@aGZ1{P!AZU7x#4(Rtn!q4nCwqol^y6
}0|Ko=*JK~;zMKV=9Nai:wxu{J&UV#HaU
)*BiC<),`+t*gka<W=Z.%T5WGHZpI30D<
Pq>&]BS6R&j?#tP7iaV}-}`\?[_[Z^LBM
PG-FKj'\xwuZ1=Q`^`5,$N$Q@[!CuRzJ2
D|vBy!^zkhdf3C5PAkR?V((-%><hn|3='
i2Qx]D$qs4O`1@fevnG'2\11Vf3piU37@
5:dfd45*(7^%5ap\zIyl"'f,$ee,J4Gw:
cgNKLie3nx9(`efSlg6#[K"@WjhZ}r[Sc
un&sBCS,T[/3]KAeEnQ7lU)3Pn,0)G/6N
-wyzj/MTd#A;r*(ds./df3r8Odaf?/<#r
Program ✘

http://localhost:8888/notebooks/x1_0_GeneratingSamples.ipynb

Traditional Fuzzing
https://www.fuzzingbook.org/html/Fuzzer.html

    @app.route('/admin')
    def admin():
        username = request.cookies.get("username")
        if not username:
            return {"Error": "Specify username in Cookie"}
        username = urllib.quote(os.path.basename(username))
        url = "http://permissions:5000/permissions/{}".format(username)
        resp = requests.request(method="GET", url=url)
        # "superadmin\ud888" will be simplified to "superadmin"
        ret = ujson.loads(resp.text)
        if resp.status_code == 200:
            if "superadmin" in ret["roles"]:
                return {"OK": "Superadmin Access granted"}
            else:
                e = u"Access denied. User has following roles: {}".format(ret["roles"])
                return {"Error": e}, 401
        else:
            return {"Error": ret["Error"]}, 500

Random input (the ./fuzz output shown under Random Fuzzing) → Program

(CACM '90) No longer very effective

http://localhost:8888/notebooks/x1_0_GeneratingSamples.ipynb#A-random-fuzzer.
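A minimal sketch of a random fuzzer, in the spirit of the notebook section linked above. The names `fuzzer` and `run_fuzz` and all parameter defaults are illustrative assumptions, not the notebook's actual code.

```python
import random

def fuzzer(max_length=100, char_start=32, char_range=32, rng=random):
    """Return a string of up to max_length random characters.

    Characters are drawn from the code points
    [char_start, char_start + char_range).
    """
    length = rng.randrange(0, max_length + 1)
    return ''.join(chr(rng.randrange(char_start, char_start + char_range))
                   for _ in range(length))

def run_fuzz(program, trials=100, seed=0):
    """Feed random inputs to `program`; collect every input that raises."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        inp = fuzzer(rng=rng)
        try:
            program(inp)
        except Exception as exc:  # any exception counts as a "crash" here
            crashes.append((inp, exc))
    return crashes
```

The only oracle in this sketch is "did the program raise?", which is why purely random inputs rarely get past the first validity check of a structured-input program.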

Feedback Driven Fuzzing
https://www.fuzzingbook.org/html/MutationFuzzer.html
• Insert Instrumentation
• Generate inputs
• Collect execution feedback
  • Branches covered during execution
• Slightly Mutate Input and try again
Collect inputs obtaining new coverage

Feedback Driven Fuzzing: Insert Instrumentation
https://www.fuzzingbook.org/html/MutationFuzzer.html

Original:

    def triangle(a, b, c):
        if a == b:
            if b == c:
                return Equilateral
            else:
                return Isosceles
        else:
            if b == c:
                return Isosceles
            else:
                if a == c:
                    return Isosceles
                else:
                    return Scalene

Instrumented:

    def triangle(a, b, c):
        __probe_enter()
        if a == b:
            __probe_1()
            if b == c:
                __probe_2()
                return Equilateral
            else:
                __probe_3()
                return Isosceles
        else:
            __probe_4()
            if b == c:
                __probe_5()
                return Isosceles
            else:
                __probe_6()
                if a == c:
                    __probe_7()
                    return Isosceles
                else:
                    __probe_8()
                    return Scalene

Feedback Driven Fuzzing: Example
https://www.fuzzingbook.org/html/MutationFuzzer.html
• Seed input: triangle(1,1,1) takes the Equilateral branch.
• Mutated: triangle(1,1,2) takes the first Isosceles branch. This is new coverage, so the input is collected.
• Mutated: triangle(1,1,3) takes the same branches as triangle(1,1,2), so it adds no new coverage.
• Collected inputs: triangle(1,1,1), triangle(1,1,2)
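The feedback loop described above can be sketched as follows. This is a minimal illustration, assuming hand-inserted probes; the helper names (`probe`, `run`, `mutate`, `fuzz`) and the mutation strategy (change one argument by ±1) are assumptions made for this example.

```python
import random

covered = set()  # probes hit during the current run

def probe(n):
    covered.add(n)

def triangle(a, b, c):
    probe('enter')
    if a == b:
        probe(1)
        if b == c:
            probe(2)
            return 'Equilateral'
        probe(3)
        return 'Isosceles'
    probe(4)
    if b == c:
        probe(5)
        return 'Isosceles'
    probe(6)
    if a == c:
        probe(7)
        return 'Isosceles'
    probe(8)
    return 'Scalene'

def run(inp):
    """Execute triangle on inp and return the set of probes it hit."""
    covered.clear()
    triangle(*inp)
    return frozenset(covered)

def mutate(inp, rng):
    """Slightly mutate the input: nudge one argument by +1 or -1."""
    out = list(inp)
    out[rng.randrange(len(out))] += rng.choice([-1, 1])
    return tuple(out)

def fuzz(seed_input, iterations=200, seed=0):
    rng = random.Random(seed)
    population = [seed_input]
    seen = set(run(seed_input))
    for _ in range(iterations):
        candidate = mutate(rng.choice(population), rng)
        new = run(candidate) - seen
        if new:  # keep only inputs that reach new probes
            population.append(candidate)
            seen |= new
    return population, seen
```

Only inputs that reach a previously unseen probe are added to the population, so `triangle(1,1,3)` would be dropped once `triangle(1,1,2)` has been collected.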

Minimal Framework to Track Coverage
http://localhost:8888/notebooks/x1_0_GeneratingSamples.ipynb#Tracking-Coverage
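One possible shape for such a framework, sketched with Python's `sys.settrace` instead of inserted probes. The class name `Coverage`, its method `covered_in`, and the compact `triangle` variant are assumptions for illustration; the notebook's own implementation may differ.

```python
import sys

class Coverage:
    """Record (function name, relative line) pairs executed inside a with-block."""

    def __init__(self):
        self.lines = set()

    def _trace(self, frame, event, arg):
        if event == 'line':
            code = frame.f_code
            # Store line numbers relative to the start of the function.
            self.lines.add((code.co_name, frame.f_lineno - code.co_firstlineno))
        return self._trace

    def __enter__(self):
        self._old = sys.gettrace()
        sys.settrace(self._trace)
        return self

    def __exit__(self, *exc):
        sys.settrace(self._old)
        return False

    def covered_in(self, func_name):
        """Relative line numbers executed in the function named func_name."""
        return {line for name, line in self.lines if name == func_name}

def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'Equilateral'
        return 'Isosceles'
    if b == c or a == c:
        return 'Isosceles'
    return 'Scalene'
```

Usage: `with Coverage() as cov: triangle(1, 1, 2)` followed by `cov.covered_in('triangle')` gives the lines that input exercised, which can then be compared across inputs.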

Feedback Driven Fuzzing: Weakness

    static int is_reserved_word_token(const char *s, int len) {
      const char *reserved[] = {
          "break", "case", "catch", "continue", "debugger", "default",
          "delete", "do", "else", "false", "finally", "for", "function",
          "if", "in", "instanceof", "new", "null", "return", "switch",
          "this", "throw", "true", "try", "typeof", "var", "void",
          "while", "with", "let", "undefined", ((void *)0)};
      int i;
      if (!mjs_is_alpha(s[0])) return 0;
      for (i = 0; reserved[i] != ((void *)0); i++) {
        if (len == (int)strlen(reserved[i]) && strncmp(s, reserved[i], len) == 0)
          return i + 1;
      }
      return 0;
    }

Tokens
    if (x > 100) { }         coverage: 20%
    if (x > 100) { } e       coverage: 5%
    if (x > 100) { } el      coverage: 5%
    if (x > 100) { } els     coverage: 5%
    if (x > 100) { } else    coverage: 25%

No smooth coverage gradient in parsers with lexical tokens
Note: Constant unrolling (e.g. AFL) does not help in such lexical tokens

Feedback Driven Fuzzing: Weak points

    def json_raw(stm):
        while True:
            stm.skipspaces()
            c = stm.peek()
            if c == 't': return json_fixed(stm, 'true')
            elif c == 'f': return json_fixed(stm, 'false')
            elif c == 'n': return json_fixed(stm, 'null')
            elif c == '"': return json_string(stm)
            elif c == '{': return json_dict(stm)
            elif c == '[': return json_list(stm)
            elif c in NUMSTART: return json_number(stm)
            raise JSONError(E_MALF, stm, stm.pos)

Weak points:
• Need for smooth coverage gradient
• Coverage only provides first level guidance
  1. {"abc":[]}
  2. [{"a":[]}, {"b":[]}, {"c":["ab","c"]}]

Stages: JSON Parser, Validation, Business Logic, System Internals
Where you want to reach

Example inputs:
[{"id":1},{"id":2,"tags":["x","y"]}]
{"user":{"name":"alice","roles":["admin","dev"]}}
[{"path":["home","docs"]},{"size":{"w":1920,"h":1080}}]
{"api":{"status":200,"headers":{"content-type":"json"}}}
{"system":{"services":[{"name":"db"},{"name":"cache"}]}}

Overcoming Parsers

1y78 ✘
( ✘
(1+2) ✔
21214/91*293 ✔
&12133 ✘

Fuzzing Parsers
$ ./fuzz
[;x1-GPZ+wcckc];,N9J+?#6^6\e?]9lu
… (the same random input as shown under Random Fuzzing)
Parser: Syntax Error
Interpreter

Leveraging Instrumentation Based Feedback

    def parse_num(s,i):
        n = ''
        while s[i:] and is_digit(s[i]):
            n += s[i]
            i = i +1
        return i,n

    def parse_paren(s, i):
        assert s[i] == '('
        i, v = parse_expr(s, i+1)
        if s[i:] == '': raise Exception(s, i)
        assert s[i] == ')'
        return i+1, v

    def parse_expr(s, i = 0):
        expr = []
        is_op = True
        while s[i:]:
            c = s[i]
            if c in string.digits:
                if not is_op: raise Exception(s,i)
                i,num = parse_num(s,i)
                expr.append(num)
                is_op = False
            elif c in ['+', '-', '*', '/']:
                if is_op: raise Exception(s,i)
                expr.append(c)
                is_op = True
                i = i + 1
            elif c == '(':
                if not is_op: raise Exception(s,i)
                i, cexpr = parse_paren(s, i)
                expr.append(cexpr)
                is_op = False
            elif c == ')':
                break
            else:
                raise Exception(s,i)
        if is_op: raise Exception(s,i)
        return i, expr

Parser: Syntax Error
Interpreter

http://localhost:8888/notebooks/x1_1_TrackingAccess.ipynb

Tracker for Access
http://localhost:8888/notebooks/x1_1_TrackAccess.ipynb#xtstr
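A minimal sketch of what an access tracker can look like: a `str` subclass that logs every equality comparison together with the index of the character in the original input. The class name `TrackedStr` and its fields are hypothetical; the notebook's `xtstr` may be structured differently.

```python
class TrackedStr(str):
    """A str that logs (input index, compared value) for every == or != test."""

    def __new__(cls, value, index=0, log=None):
        obj = super().__new__(cls, value)
        obj.index = index                       # offset into the original input
        obj.log = log if log is not None else []
        return obj

    def __eq__(self, other):
        self.log.append((self.index, str(other)))
        return str.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    __hash__ = str.__hash__  # defining __eq__ would otherwise disable hashing

    def __getitem__(self, key):
        piece = str.__getitem__(self, key)
        if isinstance(key, slice):
            start = key.indices(len(self))[0]
        else:
            start = key if key >= 0 else len(self) + key
        # Pieces share the log and remember where they came from.
        return TrackedStr(piece, self.index + start, self.log)
```

Note that `c in '0123456789'` is evaluated by the container's `__contains__`, not by `c`, so this sketch does not see membership tests. That is one reason a source rewrite is useful in addition to a tracking string.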

AST Rewriting
http://localhost:8888/notebooks/x1_1_TrackAccess.ipynb#InRewriter
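A small sketch of AST rewriting with the standard `ast` module: every `x in y` comparison is replaced by a call to a tracking helper. The helper name `tracked_in` and the details of the transformer are assumptions for illustration; only the class name is borrowed from the notebook anchor.

```python
import ast

class InRewriter(ast.NodeTransformer):
    """Rewrite `x in y` into `tracked_in(x, y)`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        if len(node.ops) == 1 and isinstance(node.ops[0], ast.In):
            call = ast.Call(
                func=ast.Name(id='tracked_in', ctx=ast.Load()),
                args=[node.left, node.comparators[0]],
                keywords=[])
            return ast.copy_location(call, node)
        return node

def rewrite(source):
    """Parse source, apply the rewriter, and return the new AST."""
    tree = InRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree

accesses = []  # (item, container) pairs seen at run time

def tracked_in(item, container):
    accesses.append((item, container))
    return item in container

SOURCE = '''
def is_digit(c):
    return c in '0123456789'
'''

# Compile the rewritten module and load it into a fresh namespace.
namespace = {'tracked_in': tracked_in}
exec(compile(rewrite(SOURCE), '<rewritten>', 'exec'), namespace)
```

Calling `namespace['is_digit']('7')` now records the membership test in `accesses`, and `ast.unparse(rewrite(SOURCE))` shows the rewritten source.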

Find The Last Compared Index

Specification Free Generator

Specification Free Generator
Characters tried, in order: A ( 2 - B 9 ) 4 )
A ∉ (,+,-,1,2,3,4,5,6,7,8,9,0
B ∉ +,-,1,2,3,4,5,6,7,8,9,0,)
) ∉ +,-,1,2,3,4,5,6,7,8,9,0
Generated input: (2-94)
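The append-or-replace idea behind the example above can be sketched as follows. This is a simplification: instead of tracking the last compared index, it assumes a validator that reports whether a prefix is `complete`, `incomplete` (the parser ran off the end), or `wrong` (the parser rejected a character). The parser, the exception classes, and `generate` are all written for this illustration.

```python
import random
import string

class Incomplete(Exception):
    """The parser ran off the end of the input; the prefix can be extended."""

class Invalid(Exception):
    """The parser rejected a character inside the input."""

def peek(s, i):
    if i >= len(s):
        raise Incomplete(i)
    return s[i]

def parse_term(s, i):
    c = peek(s, i)
    if c == '(':
        i = parse_expr(s, i + 1)
        if peek(s, i) != ')':
            raise Invalid(i)
        return i + 1
    if c in string.digits:
        while i < len(s) and s[i] in string.digits:
            i += 1
        return i
    raise Invalid(i)

def parse_expr(s, i=0):
    i = parse_term(s, i)
    while i < len(s) and s[i] in '+-*/':
        i = parse_term(s, i + 1)
    return i

def validate(s):
    try:
        end = parse_expr(s)
    except Incomplete:
        return 'incomplete'
    except Invalid:
        return 'wrong'
    return 'complete' if end == len(s) else 'wrong'

def generate(rng, max_tries=10000, stop_probability=0.3):
    """Grow a valid input one character at a time, using only the validator."""
    prefix, last_complete = '', None
    for _ in range(max_tries):
        c = rng.choice(string.printable)
        status = validate(prefix + c)
        if status == 'wrong':
            continue          # rejected at the last position: try another char
        prefix += c           # accepted so far: keep the character
        if status == 'complete':
            last_complete = prefix
            if rng.random() < stop_probability:
                break
    return last_complete
```

No grammar is given to `generate`; the validator's feedback alone steers it toward inputs such as `(2-94)`.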

Limitation: Lack of control

Constraining the Input Domain with Grammars

Grammar

Formal Languages
Formal Language Descriptions (Chomsky, 1956)
• 3. Regular
• Context Free
• Recursively Enumerable
Easy to produce and parse
Argument Stack, Return Stack
Easier to reason with

http://localhost:8888/notebooks/x0_1_Grammars.ipynb

http://localhost:8888/notebooks/x0_3_Parser.ipynb

Parsing is Surprisingly Integral to Fuzzing
1. Parsing is necessary for decomposing and recomposing inputs without affecting the validity of inputs constructed
http://localhost:8888/notebooks/x0_3_Parser.ipynb#Arborist
(8 / 3) * 49
(2 + 1) * 49
1 2 + (8 / 3) * 49

Parsing is Surprisingly Integral to Fuzzing
2. Use parsers to mine the input distribution, and generate uncommon inputs
3. Use parsers to extract the complexity of an input or a set of inputs
18439249
(8/3)*49
https://rahul.gopinath.org/post/2024/03/23/k-paths-for-context-free-grammars/

Grammars For Fuzzing (Hanford 1970) (Purdom 1972)

    <start>   := <expr>
    <expr>    := <expr> '+' <expr>
               | <expr> '-' <expr>
               | <expr> '/' <expr>
               | <expr> '*' <expr>
               | '(' <expr> ')'
               | <number>
    <number>  := <integer>
               | <integer> '.' <integer>
    <integer> := <digit> <integer>
               | <digit>
    <digit>   := [0-9]

Example input:
8.2 - 27 - -9 / +((+9 * --2 + --+-+-((-1 * +(8 - 5 - 6)) * (-(a-+(((+(4))))) - ++4) / +(-+---((5.6 - --(3 * -1.8 * +(6 * +-(((-(-6) * ---+6)) / +--(+-+-7 * (-0 * (+(((((2)) + 8 - 3 - ++9.0 + ---(--+7 / (1 / +++6.37) + (1) / 482) / +++-+0)))) * -+5 + 7.513)))) - (+1 / ++((-84)))))))) * ++5 / +-(--2 - -++-9.0)))) / 5 * --++090
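A minimal sketch of grammar-based generation for the expression grammar above. The dictionary encoding, the `min_depth` cost function, and the depth budget are assumptions chosen for this example; they are one common way to keep random expansion from recursing forever.

```python
import random

GRAMMAR = {
    '<start>': [['<expr>']],
    '<expr>': [['<expr>', '+', '<expr>'], ['<expr>', '-', '<expr>'],
               ['<expr>', '/', '<expr>'], ['<expr>', '*', '<expr>'],
               ['(', '<expr>', ')'], ['<number>']],
    '<number>': [['<integer>'], ['<integer>', '.', '<integer>']],
    '<integer>': [['<digit>', '<integer>'], ['<digit>']],
    '<digit>': [[d] for d in '0123456789'],
}

def min_depth(grammar, symbol, seen=()):
    """Smallest derivation depth of symbol; infinite if it must revisit `seen`."""
    if symbol not in grammar:
        return 0                       # terminals cost nothing
    if symbol in seen:
        return float('inf')
    return 1 + min(max(min_depth(grammar, t, seen + (symbol,)) for t in rule)
                   for rule in grammar[symbol])

def generate(grammar, symbol='<start>', depth=0, max_depth=8, rng=random):
    """Expand symbol randomly; past max_depth, pick the fastest-closing rule."""
    if symbol not in grammar:
        return symbol
    rules = grammar[symbol]
    if depth >= max_depth:
        rules = [min(rules, key=lambda r: max(
            min_depth(grammar, t, (symbol,)) for t in r))]
    rule = rng.choice(rules)
    return ''.join(generate(grammar, t, depth + 1, max_depth, rng)
                   for t in rule)
```

Every string produced this way is syntactically valid by construction, which is what lets grammar-based inputs pass the parser and reach the code behind it.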

Grammars: As effective producers
[Figure: the generated expression shown on the Grammars slide is fed to the program. Labels: Interpreter, Parser, ✘, ✔]

Grammars: As efficient producers

    <start>   := <expr>
    <expr>    := <expr> '+' <expr>
               | <expr> '-' <expr>
               | <expr> '/' <expr>
               | <expr> '*' <expr>
               | '(' <expr> ')'
               | <number>
    <number>  := <integer>
               | <integer> '.' <integer>
    <integer> := <digit> <integer>
               | <digit>
    <digit>   := [0-9]

Compiled Grammar (F1)

    def start():
        expr()

    def expr():
        match (random() % 6):
            case 0: expr(); print('+'); expr()
            case 1: expr(); print('-'); expr()
            case 2: expr(); print('/'); expr()
            case 3: expr(); print('*'); expr()
            case 4: print('('); expr(); print(')')
            case 5: number()

    def number():
        match (random() % 2):
            case 0: integer()
            case 1: integer(); print('.'); integer()

    def integer():
        match (random() % 2):
            case 0: digit(); integer()
            case 1: digit()

    def digit():
        match (random() % 10):
            case 0: print('0')
            case 1: print('1')
            case 2: print('2')
            case 3: print('3')
            case 4: print('4')
            case 5: print('5')
            case 6: print('6')
            case 7: print('7')
            case 8: print('8')
            case 9: print('9')

http://localhost:8888/notebooks/x0_2_GrammarFuzzer.ipynb

Where to Get the Input Grammar From?

Where to Get the Grammar From?
AUTOGRAM
Hand-written parsers already encode the grammar
https://www.fuzzingbook.org/html/GrammarMiner.html

Where to Get the Grammar From?
Handwritten parsers contain the parse structure
Mining Grammar from a hand-written parser With Dynamic Data Flow Analysis
https://www.example.com/forum/questions/cgi?tag=networking&order=newwest#top
[Figure: the URL decomposed by the parser functions parse_url, parse_scheme, parse_hostpath, parse_querystring, parse_fragment, parse_host, parse_fslocation, parse_binaryname, and parse_parameters into scheme, host path, query string, fragment, subdomain, domain, TLD, subdirectory, binary, parameters, and key/value pairs]

Mining with Dynamic Data Flow Analysis

http://user:[email protected]:80/?q=path#ref
    urlparse:url = 'http://user:[email protected]:80/?q=path#ref'
    urlsplit:scheme = 'http'
    urlsplit:netloc = 'user:[email protected]:80'
    urlsplit:fragment = 'ref'
    urlsplit:query = 'q=path'

https://soft-eng.sydney.edu.au:80/
    urlparse:url = 'https://soft-eng.sydney.edu.au:80/'
    urlsplit:scheme = 'https'
    urlsplit:netloc = 'soft-eng.sydney.edu.au:80'

http://www.fuzzingbook.org/#News
    urlparse:url = 'http://www.fuzzingbook.org/#News'
    urlsplit:scheme = 'http'
    urlsplit:netloc = 'www.fuzzingbook.org'
    urlsplit:fragment = 'News'

Mining with Dynamic Data Flow Analysis: the mined grammar

    {
      '<urlparse:url>': [
          ['<urlsplit:scheme>', '://', '<urlsplit:netloc>', '/?', '<urlsplit:query>', '#', '<urlsplit:fragment>'],
          ['<urlsplit:scheme>', '://', '<urlsplit:netloc>', '/#', '<urlsplit:fragment>'],
          ['<urlsplit:scheme>', '://', '<urlsplit:netloc>', '/']],
      '<urlsplit:scheme>': [
          ['http'],
          ['https']],
      '<urlsplit:netloc>': [
          ['user:[email protected]:80'],
          ['www.fuzzingbook.org'],
          ['soft-eng.sydney.edu.au:80']],
      '<urlsplit:query>': [
          ['q=path']],
      '<urlsplit:fragment>': [
          ['ref'],
          ['News']],
    }

Mining with Dynamic Data Flow Analysis: Limitations
• Poor accuracy in most handwritten parsers
• Handwritten parsers are often not well formed
• Control flow is ignored
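The core step of this style of mining can be sketched in a few lines: take the values that variables held during parsing, and replace each occurrence in the input by a nonterminal named after the variable. As a stand-in for dynamic tracing, this sketch reads the fields of the result of `urllib.parse.urlsplit`; the functions `observe` and `derive_rule` are hypothetical helpers written for this example.

```python
from urllib.parse import urlsplit

def observe(url):
    """Stand-in for tracing: the string values seen while splitting url."""
    r = urlsplit(url)
    fields = {'scheme': r.scheme, 'netloc': r.netloc,
              'query': r.query, 'fragment': r.fragment}
    return {'urlsplit:' + k: v for k, v in fields.items() if v}

def derive_rule(inp, observed):
    """Replace each observed value inside inp by its nonterminal."""
    parts = [(inp, False)]               # (text, is_nonterminal)
    # Longer values first, so a short value cannot split a longer one.
    for name, value in sorted(observed.items(), key=lambda kv: -len(kv[1])):
        for pos, (text, is_nt) in enumerate(parts):
            if is_nt or value not in text:
                continue
            before, _, after = text.partition(value)
            pieces = [(before, False), ('<%s>' % name, True), (after, False)]
            parts[pos:pos + 1] = [p for p in pieces if p[0]]
            break                        # replace the first occurrence only
    return [text for text, _ in parts]
```

Because only values are compared, the sketch has no notion of which control-flow path produced them, which is the third limitation listed above.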

Where to Get the Grammar From?
MIMID
Hand-written parsers already encode the grammar
http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb

Where to Get the Grammar From?
Hand-written parsers already encode the grammar
1. Extract the input string accesses
2. Attach control flow information
Each control-flow structure gets wrapped in a context-manager
- If conditionals
- Loops
- Subroutines

Instrumentation by AST rewrite
http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#Context-Mangers

Track Where Each Character Index was Accessed
http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#Rewriting-the-source-to-track-control-flow-and-taints.

How to Extract This Grammar?
• Inputs + control flow -> Dynamic Control Dependence Trees
• DCD Trees -> Parse Tree
http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#Mining-the-Traces-Generated

Control Dependence Graph
Statement B is control dependent on A if A determines whether B executes.

    def parse_csv(s,i):
        while s[i:]:
            if is_digit(s[i]):
                n,j = num(s[i:])
                i = i+j
            else:
                comma(s[i])
                i += 1

CDG for parse_csv
while: determines whether if: executes

Dynamic Control Dependence Tree
Each statement execution is represented as a separate node
DCD Tree for call parse_csv()

DCD Tree ~ Parse Tree
Input: "12," with characters '1' '2' ','
• No tracking beyond input buffer
• Characters are attached to nodes where they are accessed last
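A minimal sketch of how the "accessed last" rule can be recorded, using hand-placed context managers around each control-flow structure of a `parse_csv`-like parser. The `Tracer` class, the scope naming scheme, and the adapted signatures of `parse_csv` and `num` are assumptions for illustration; the actual instrumentation is produced by rewriting the source.

```python
from contextlib import contextmanager

class Tracer:
    def __init__(self):
        self.stack = []        # currently open scopes, outermost first
        self.counter = 0       # gives each scope execution a unique id
        self.last_access = {}  # character index -> scope path of last access

    @contextmanager
    def scope(self, name):
        self.counter += 1
        self.stack.append('%s#%d' % (name, self.counter))
        try:
            yield
        finally:
            self.stack.pop()

    def at(self, s, i):
        """Read s[i], remembering the innermost scope that touched it."""
        self.last_access[i] = tuple(self.stack)
        return s[i]

def is_digit(c):
    return c in '0123456789'

def comma(c):
    assert c == ','

def num(s, i, t):
    with t.scope('num'):
        while i < len(s) and is_digit(t.at(s, i)):
            i += 1
        return i

def parse_csv(s, t):
    with t.scope('parse_csv'):
        i = 0
        while i < len(s):
            with t.scope('while'):
                if is_digit(t.at(s, i)):
                    with t.scope('if'):
                        i = num(s, i, t)
                else:
                    with t.scope('else'):
                        comma(t.at(s, i))
                        i += 1
```

For the input `"12,"`, the comma is first inspected inside `num` (where the digit loop stops), but its last access is in the `else` branch of the second loop iteration, so that is the node it is attached to.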

• Characters are attached to nodes where they are accessed last
http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#reparsing-behaviour

The Complete Parse Tree Miner
http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#The-Complete-Miner

    def is_digit(i):
        return i in '0123456789'

    def parse_num(s,i):
        n = ''
        while s[i:] and is_digit(s[i]):
            n += s[i]
            i = i +1
        return i,n

    def parse_paren(s, i):
        assert s[i] == '('
        i, v = parse_expr(s, i+1)
        if s[i:] == '': raise Ex(s, i)
        assert s[i] == ')'
        return i+1, v

    def parse_expr(s, i = 0):
        expr, is_op = [], True
        while s[i:]:
            c = s[i]
            if is_digit(c):
                if not is_op: raise Ex(s,i)
                i,num = parse_num(s,i)
                expr.append(num)
                is_op = False
            elif c in ['+', '-', '*', '/']:
                if is_op: raise Ex(s,i)
                expr.append(c)
                is_op, i = True, i + 1
            elif c == '(':
                if not is_op: raise Ex(s,i)
                i, cexpr = parse_paren(s, i)
                expr.append(cexpr)
                is_op = False
            elif c == ')':
                break
            else:
                raise Ex(s,i)
        if is_op: raise Ex(s,i)
        return i, expr

Parse tree for parse_expr('9+3/4')

http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#Generalize-Nodes

Identifying Compatible Nodes
Input: 9+3/4
Which nodes correspond to the same nonterminal?
http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#Generalize-Loops

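[Figure: parse trees for (9 + 1) * 3 and 3 * (9 + 1)]
[Figure: parse trees for 9 + 1 and 3 * (9 + 1)]
[Figure: parse trees for 3 and (9 + 1) * 3 * (9 + 1)]
[Figure: parse trees for 3*(1) and 1]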

Rules observed from single inputs:

    3*(1):  <parse_expr> := <while 1:1> <while 1:0> <while 1:1>
    1:      <parse_expr> := <while 1:1>

Mined grammar:

    <parse_expr>  := <while 1:1> <while 1:0> <while 1:1>
                   | <while 1:1> <while 1:0> <while 1:1> <while 1:0> <while 1:1> <while 1:0> <while 1:1>
                   | <while 1:1> <while 1:0> <while 1:1> <while 1:0> <while 1:1>
                   | <while 1:1>
    <while 1:1>   := <if 1:1>
    <if 1:1>      := <parse_num> | <parse_paren>
    <parse_num>   := <is_digit>
    <is_digit>    := '3' | '1'
    <parse_paren> := '(' <parse_expr> ')'
    <while 1:0>   := <if 1:0>
    <if 1:0>      := '*'

Generalizing Loops with Regular Inference

    <parse_expr> := <while_s>
    <while_s>    := <while_1:1> <while_1:0> <while_s>
                  | <while_1:1>

http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb#Generating-a-Grammar

Recovered Arithmetic Grammar (from calc.py, the parse_expr parser shown under The Complete Parse Tree Miner)

    <START>            := <parse_expr.0-0-c>
    <parse_expr.0-0-c> := <parse_expr.0-1-s><parse_expr.0>
                        | <parse_expr.0>
    <parse_expr.0-1-s> := <parse_expr.0><parse_expr.0-2>
                        | <parse_expr.0><parse_expr.0-2><parse_expr.0-1-s>
    <parse_expr.0>     := '(' <parse_expr.0-0-c> ')'
                        | <parse_num.0-1-s>
    <parse_expr.0-2>   := '*' | '+' | '-' | '/'
    <parse_num.0-1-s>  := <is_digit.0-0-c>
                        | <is_digit.0-0-c><parse_num.0-1-s>
    <is_digit.0-0-c>   := [0-9]

[Figure: the generated expression from the Grammars slide shown next to the recovered arithmetic grammar]

Recovered JSON grammar (from microjson.py)

    <START>         ::= <json_raw>
    <json_raw>      ::= '"' <json_string'>
                      | '[' <json_list'>
                      | '{' <json_dict'>
                      | <json_number'>
                      | 'true' | 'false' | 'null'
    <json_number'>  ::= <json_number>+
                      | <json_number>+ 'e' <json_number>+
    <json_number>   ::= '+' | '-' | '.' | [0-9] | 'E' | 'e'
    <json_string'>  ::= <json_string>* '"'
    <json_list'>    ::= ']'
                      | <json_raw> (',' <json_raw>)* ']'
                      | (',' <json_raw>)+ (',' <json_raw>)* ']'
    <json_dict'>    ::= '}'
                      | ('"' <json_string'> ':' <json_raw> ',')* '"' <json_string'> ':' <json_raw> '}'
    <json_string>   ::= ' ' | '!' | '#' | '$' | '%' | '&' | ''' | '*' | '+' | '-' | ',' | '.' | '/' | ':' | ';'
                      | '<' | '=' | '>' | '?' | '@' | '[' | ']' | '^' | '_' | ''' | '{' | '|' | '}' | '~'
                      | '[A-Za-z0-9]'
                      | '\' <decode_escape>
    <decode_escape> ::= '"' | '/' | 'b' | 'f' | 'n' | 'r' | 't'

microjson.py (excerpt)

                stm.next()
                if expect_key:
                    raise JSONError(E_DKEY, stm, stm.pos)
                if c == '}':
                    return result
                expect_key = 1
                continue
            # parse out a key/value pair
            elif c == '"':
                key = _from_json_string(stm)
                stm.skipspaces()
                c = stm.next()
                if c != ':':
                    raise JSONError(E_COLON, stm, stm.pos)
                stm.skipspaces()
                val = _from_json_raw(stm)
                result[key] = val
                expect_key = 0
                continue
            raise JSONError(E_MALF, stm, stm.pos)

    def _from_json_raw(stm):
        while True:
            stm.skipspaces()
            c = stm.peek()
            if c == '"':
                return _from_json_string(stm)
            elif c == '{':
                return _from_json_dict(stm)
            elif c == '[':
                return _from_json_list(stm)
            elif c == 't':
                return _from_json_fixed(stm, 'true', True, E_BOOL)
            elif c == 'f':
                return _from_json_fixed(stm, 'false', False, E_BOOL)
            elif c == 'n':
                return _from_json_fixed(stm, 'null', None, E_NULL)
            elif c in NUMSTART:
                return _from_json_number(stm)
            raise JSONError(E_MALF, stm, stm.pos)

    def from_json(data):
        stm = JSONStream(data)
        return _from_json_raw(stm)

The Oracle Problem
• You fuzz the program with an input.
• The program runs.
• It finishes.
Was the answer correct?

The Oracle Problem
Was the answer correct?

Option 1: Differential Testing
    Verify input/output against other implementations
    (This is the basis for regression testing)
    Constraint: Need access to different program implementations

Option 2: Property Based Testing
    assert f(x) == y
Option 3: Metamorphic Testing
    assert f(x + c) == f(x) + c
    Constraint: Need specifications (or intelligence)

Option 4: Symbolic Execution
Option 5: Fuzzing
    Constraint: Limited general oracles (e.g. sanitizers)
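To make two of these options concrete, here is a small sketch of a differential oracle and a metamorphic oracle for a sorting routine. The function under test (`my_sort`, an insertion sort written for this example) and the oracle names are assumptions; the metamorphic relation is the element-wise form of `f(x + c) == f(x) + c`.

```python
import random

def my_sort(xs):
    """Implementation under test: a simple insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def differential_oracle(xs):
    """Option 1: compare against another implementation (built-in sorted)."""
    return my_sort(xs) == sorted(xs)

def metamorphic_oracle(xs, c):
    """Option 3: shifting every input by c must shift every output by c."""
    shifted = my_sort([x + c for x in xs])
    return shifted == [y + c for y in my_sort(xs)]

def fuzz_with_oracles(trials=100, seed=0):
    """Return the random inputs on which either oracle fails."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        c = rng.randint(-5, 5)
        if not (differential_oracle(xs) and metamorphic_oracle(xs, c)):
            failures.append((xs, c))
    return failures
```

The differential oracle needs a second implementation; the metamorphic oracle needs only a relation between runs of the same implementation, but someone has to know that relation.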

The Oracle Problem: Solutions
Normal execution establishes patterns: variables stay within ranges, relationships hold.
A rare execution is one where those patterns break.
That breakage is a potential bug.

The Oracle Problem
Was the answer correct?
Option 6: Approximate Oracles
    Daikon Runtime Invariant Miner (Ernst et al. 2001)

A Runtime Invariant Miner
http://localhost:8888/notebooks/RoadMap.ipynb#x6_0_InvariantMining

115 Daikon: Core Idea

    def triangle(a, b, c):
        if a == b:
            if b == c:
                return 'Equilateral'
            else:
                return 'Isosceles'
        else:
            if b == c:
                return 'Isosceles'
            else:
                if a == c:
                    return 'Isosceles'
                else:
                    return 'Scalene'

• Run the program on a bunch of inputs.
• Watch every variable at key points.
• Find properties that always hold.

triangle(3, 3, 3) → ENTER: a=3, b=3, c=3  EXIT: return Equilateral
triangle(3, 3, 4) → ENTER: a=3, b=3, c=4  EXIT: return Isosceles
triangle(2, 2, 3) → ENTER: a=2, b=2, c=3  EXIT: return Isosceles
triangle(2, 3, 4) → ENTER: a=2, b=3, c=4  EXIT: return Scalene

After observing many runs:
a >= 0     ✓ held every time → likely invariant
a == b     ✘ held for the first three runs, violated by triangle(2, 3, 4) → discarded
a + b > c  ✓ held every time → likely invariant

• These are likely invariants, not proofs.
• They describe what the code does, not what it should do.
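The observe-and-discard loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Daikon's implementation: the `CANDIDATES` table and the `mine` helper are hypothetical names, the candidates are checked only against the arguments at function entry, and `a + b > c` (the triangle inequality) is a property picked for this sketch.

```python
def triangle(a, b, c):
    if a == b:
        return 'Equilateral' if b == c else 'Isosceles'
    if b == c or a == c:
        return 'Isosceles'
    return 'Scalene'

# Hypothetical candidate properties over the arguments at function entry.
CANDIDATES = {
    'a >= 0': lambda a, b, c: a >= 0,
    'a == b': lambda a, b, c: a == b,
    'a + b > c': lambda a, b, c: a + b > c,
}

def mine(runs):
    """Return the names of candidates that held on every observed run."""
    alive = dict(CANDIDATES)
    for args in runs:
        triangle(*args)  # the observed execution
        for name in list(alive):
            if not alive[name](*args):
                del alive[name]  # one counterexample is enough to discard
    return sorted(alive)

RUNS = [(3, 3, 3), (3, 3, 4), (2, 2, 3), (2, 3, 4)]
print(mine(RUNS[:3]))  # ['a + b > c', 'a == b', 'a >= 0']
print(mine(RUNS))      # ['a + b > c', 'a >= 0']
```

The survivors depend entirely on the inputs seen: `a == b` looks like an invariant until the first scalene input arrives.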

121 Program Points
• Variable values (and invariants) are anchored to where they were observed.
    triangle:::ENTER ← variable values when the function is called
    triangle:::EXIT ← variable values (+ return value) when it returns
• The same property at `ENTER` and at `EXIT` gives two separate invariants.
• A variable might be `>= 0` on the way in but not on the way out.
• This is Daikon's key design decision: invariants are always located.
http://localhost:8888/notebooks/x6_0_SimpleInvariantMiner.ipynb#Program-Points
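One way to record values per program point is a decorator that logs arguments on entry and arguments plus return value on exit. This is a sketch under assumptions: `observe`, `observations`, `always`, and the toy function `decrement` are hypothetical names, only the `name:::ENTER` / `name:::EXIT` labels follow the slide, and the EXIT record reuses the argument values bound at call time.

```python
import functools
import inspect
from collections import defaultdict

# Observations keyed by program point, e.g. "decrement:::ENTER".
observations = defaultdict(list)

def observe(fn):
    """Record argument values at ENTER, and arguments + return value at EXIT."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        values = dict(bound.arguments)
        observations[f'{fn.__name__}:::ENTER'].append(dict(values))
        result = fn(*args, **kwargs)
        observations[f'{fn.__name__}:::EXIT'].append({**values, 'return': result})
        return result

    return wrapper

@observe
def decrement(x):
    return x - 1

for value in (0, 1, 5):
    decrement(value)

def always(point, prop):
    """True if prop held for every observation at the given program point."""
    return all(prop(obs) for obs in observations[point])

print(always('decrement:::ENTER', lambda o: o['x'] >= 0))       # True
print(always('decrement:::EXIT', lambda o: o['return'] >= 0))   # False
```

The non-negativity property holds at `ENTER` but fails at `EXIT` (the call with 0 returns -1), which is why each invariant has to carry its location.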

122 Candidate Templates
The miner generates templates for every variable it sees, then kills them when they fail.
Unary (one variable):
- x >= 0
- type(x) is int
- x is not None
Binary (pairs of variables):
- x == y
- x <= y
- x >= y
Ternary and so on.
Any template that fails even once is discarded. Survivors are your invariants.
The template set is the limit of what the miner can discover. Add templates, discover more.
http://localhost:8888/notebooks/x6_0_SimpleInvariantMiner.ipynb#Candidate-Invariants
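The generate-then-kill scheme might look like the following. This is a rough sketch, not the notebook's code: `UNARY`, `BINARY`, `instantiate`, and `mine` are hypothetical names, and only the six templates listed on the slide are included.

```python
from itertools import combinations

# Templates from the slide; each maps a printable form to a check.
UNARY = {
    '{x} >= 0': lambda x: x >= 0,
    'type({x}) is int': lambda x: type(x) is int,
    '{x} is not None': lambda x: x is not None,
}
BINARY = {
    '{x} == {y}': lambda x, y: x == y,
    '{x} <= {y}': lambda x, y: x <= y,
    '{x} >= {y}': lambda x, y: x >= y,
}

def instantiate(names):
    """Instantiate every template for every variable and variable pair."""
    candidates = {}
    for n in names:
        for text, check in UNARY.items():
            candidates[text.format(x=n)] = ((n,), check)
    for n, m in combinations(names, 2):
        for text, check in BINARY.items():
            candidates[text.format(x=n, y=m)] = ((n, m), check)
    return candidates

def mine(observations):
    """Discard any candidate that fails (or cannot be evaluated) even once."""
    candidates = instantiate(sorted(observations[0]))
    for obs in observations:
        for text, (variables, check) in list(candidates.items()):
            try:
                ok = check(*(obs[v] for v in variables))
            except TypeError:  # e.g. comparing None with an int
                ok = False
            if not ok:
                del candidates[text]
    return set(candidates)

seen = [{'a': 3, 'b': 3, 'c': 3}, {'a': 3, 'b': 3, 'c': 4}, {'a': 2, 'b': 3, 'c': 4}]
for invariant in sorted(mine(seen)):
    print(invariant)
```

Nothing outside the template tables can ever be reported, which is the sense in which the template set bounds what the miner can discover.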

123 Suppression: Cutting the Noise
The suppression step removes weaker invariants dominated by stronger ones:
x == y → suppresses x <= y, x >= y, x is not None, y is not None
http://localhost:8888/notebooks/x6_0_SimpleInvariantMiner.ipynb#Suppression
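A sketch of the suppression rule shown on the slide, assuming invariants are stored as tuples such as `('==', 'a', 'b')` or `('is not None', 'a')`. The tuple encoding and the `suppress` helper are hypothetical, and only the single rule from the slide (equality dominates the rest) is implemented.

```python
def suppress(invariants):
    """Drop invariants that the slide lists as dominated by an equality."""
    dominated = set()
    for op, *variables in invariants:
        if op == '==':
            x, y = variables
            dominated |= {
                ('<=', x, y), ('>=', x, y),
                ('<=', y, x), ('>=', y, x),
                ('is not None', x), ('is not None', y),
            }
    return [inv for inv in invariants if inv not in dominated]

mined = [
    ('==', 'a', 'b'),
    ('<=', 'a', 'b'),
    ('>=', 'a', 'b'),
    ('is not None', 'a'),
    ('is not None', 'b'),
    ('<=', 'b', 'c'),
    ('is not None', 'c'),
]
print(suppress(mined))
# [('==', 'a', 'b'), ('<=', 'b', 'c'), ('is not None', 'c')]
```

Invariants on `c` are untouched because no surviving equality mentions `c`.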

What To Do with Invariants
Invariants become assertions in your test suite:

    def test_triangle_invariants(a, b, c):
        result = triangle(a, b, c)
        assert a >= 0
        assert b >= 0
        assert c >= 0
        assert isinstance(result, str)

Mined from triangle runs: an auto-generated oracle.

125 Leveraging Uncommon Executions
An invariant violation means the program has entered a state it has never been in before.
• Rare executions are worth investigating, because their correctness has never been verified.
• If unusual, flag the current input as a seed for further fuzzing.
• If extreme, flag for manual inspection.
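The triage described above could be approximated by counting how many mined invariants an input violates. Everything here is an assumption for illustration: the `INVARIANTS` table, the `triage` helper, the labels, and especially the threshold of two violations for "extreme", which the slides do not specify.

```python
# Hypothetical mined invariants over triangle's arguments.
INVARIANTS = {
    'a >= 0': lambda a, b, c: a >= 0,
    'a + b > c': lambda a, b, c: a + b > c,
    'a <= c': lambda a, b, c: a <= c,
}

def triage(args, invariants=INVARIANTS, extreme=2):
    """Classify an input by the number of invariants it violates."""
    violated = [name for name, holds in invariants.items() if not holds(*args)]
    if not violated:
        return 'common', violated
    if len(violated) < extreme:
        return 'seed', violated      # unusual: keep for further fuzzing
    return 'inspect', violated       # extreme: hand to a human

print(triage((3, 3, 4)))    # ('common', [])
print(triage((1, 1, 5)))    # ('seed', ['a + b > c'])
print(triage((-1, 0, -5)))  # ('inspect', ['a >= 0', 'a <= c'])
```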

127 Now for something completely different.

128 Variety of Fuzzers Publications from 2023-2024

129 Which Fuzzer Should We Use?
Exponential growth in fuzzing literature
[Figure: cumulative publications in fuzzing; cumulative published vulnerabilities]

130 Which Evaluation Metric Should I Use?
• Comparable
• Ground truth
• Unbiased
• Budget friendly

131 New CVEs Found?
CVE, short for Common Vulnerabilities and Exposures, is a list of publicly disclosed computer security flaws.

132 Which Fuzzer Should I Use?
Criteria: Comparable • Ground truth • Unbiased • Budget friendly
New CVEs: ? ?

133 Evaluate Fuzzers Based on the Specified Goal: Coverage?

134 Coverage?
• Gets saturated quickly (signal is lost)
• Is reachability sufficient?
• Limited evaluation of complex input conditions involved in expressing bugs
• Are more complex coverage techniques needed?

Which Fuzzer Should I Use?
Criteria: Comparable • Ground truth • Unbiased • Budget friendly
New CVEs: ? ?
Coverage

136 Evaluate Fuzzers Based on the Specified Goal: Synthetic Bugs?

137 Synthetic Bugs?
LAVA: Large-scale Automated Vulnerability Addition
• What kind of bugs should be seeded?
• What about unknown kinds of bugs?
• Where should these bugs be seeded?

Which Fuzzer Should I Use?
Criteria: Comparable • Ground truth • Unbiased • Budget friendly
New CVEs: ? ?
Coverage
Synthetic (LAVA): ? ?

139 Evaluate Fuzzers Based on the Specified Goal: Curated Bug Benchmarks?

140 Known Bug Benchmarks? E.g. Magma

141 Known Bug Benchmarks?
• Gets outdated quickly (overfitting by fuzzers)
• Significant effort in creating and maintaining
• Biased as to where and what kind of bugs are present
"When a measure becomes a target it ceases to be a good measure" (Goodhart's law)

Which Fuzzer Should I Use?
Criteria: Comparable • Ground truth • Unbiased • Budget friendly
New CVEs: ? ?
Coverage
Synthetic (LAVA): ? ?
Benchmarks (MAGMA): ?

143 Evaluate Fuzzers Based on the Specified Goal: New Bugs in Existing Benchmarks

144 New Bugs in Existing Benchmarks
• No ground truth
• Bug distribution is dependent on external factors
• Can lead researchers to postpone publication of vulnerabilities
• Feedback can't be used to decide budgeting.

Which Fuzzer Should I Use?
Criteria: Comparable • Ground truth • Unbiased • Budget friendly
New CVEs: ? ?
Coverage
Synthetic (LAVA): ? ?
Benchmarks (MAGMA): ?
New Bugs in old Benchmarks: ?
Can

147 Fuzzing Your Fuzzer a.k.a. Mutation Testing

148 What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

IDEA: Induce a program variation with each valid token replacement.

149 What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 1) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

Fixes for independent bugs are almost always simple.
Finite syntactic size for faults (aka competent programmer hypothesis): Gopinath, Jensen, and Groce, "Mutations: How Close are they to Real Faults?", 2014 ISSRE

150 What Is Mutation Testing? (1)
IDEA: Induce a program variation with each valid token replacement.

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

151 What Is Mutation Testing? (2)
The token `<` is replaced by `>=`:

    if (len >= MAX_BUF_LEN) {

What Is Mutation Testing? (3)
The token `MAX_BUF_LEN` is replaced by `MAX_BUF_LEN + 1`:

    if (len < MAX_BUF_LEN + 1) {

What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 16) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

Complex bugs are almost always coupled to simpler bugs.
Finite semantic depth for failures (aka coupling effect hypothesis): Gopinath, Jensen, and Groce, "Mutations: How Close are they to Real Faults?", 2017 ICST
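The token-replacement idea from these slides can be illustrated with a small Python sketch that mutates the relational operator in a condition string. This is a toy under stated assumptions: the regex tokenizer, the `OPS` list, and the `mutants` generator are hypothetical, only relational operators are mutated, and real mutation tools work on parsed code rather than raw strings.

```python
import re

# Relational operators considered interchangeable in this sketch.
OPS = ['<', '<=', '>', '>=', '==', '!=']

# Two-character operators first, so '<=' is not split into '<' and '='.
TOKEN = re.compile(r'<=|>=|==|!=|<|>|\w+|\S')

def mutants(condition):
    """Yield one mutant per single relational-operator replacement."""
    tokens = TOKEN.findall(condition)
    for i, tok in enumerate(tokens):
        if tok in OPS:
            for op in OPS:
                if op != tok:
                    yield ' '.join(tokens[:i] + [op] + tokens[i + 1:])

for m in mutants('len < MAX_BUF_LEN'):
    print(m)
# len <= MAX_BUF_LEN
# len > MAX_BUF_LEN
# len >= MAX_BUF_LEN
# len == MAX_BUF_LEN
# len != MAX_BUF_LEN
```

Each mutant differs from the original by exactly one token, matching the first-order mutants shown on the slides.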

155 Mutation Testing Process
[Figure: the original program, mutants m1 to m4, and fuzz inputs I1 to I5]
Generate Mutants, Generate Fuzz Inputs > Detect Differences from Original > Mutation Score = Detected Mutants / Generated Mutants
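The score computation can be sketched with Python functions standing in for the original and its mutants. The length check mirrors the earlier C snippet; `MAX_BUF_LEN = 8`, the mutant table, and `mutation_score` are hypothetical choices for this illustration, and a mutant counts as detected when any input makes its output differ from the original's.

```python
MAX_BUF_LEN = 8  # assumed value, for illustration only

def original(n):
    return n < MAX_BUF_LEN

# Hypothetical mutants of the length check.
MUTANTS = {
    'm1: < becomes <=': lambda n: n <= MAX_BUF_LEN,
    'm2: < becomes >=': lambda n: n >= MAX_BUF_LEN,
    'm3: MAX_BUF_LEN + 16': lambda n: n < MAX_BUF_LEN + 16,
    'm4: equivalent rewrite': lambda n: not (n >= MAX_BUF_LEN),
}

def mutation_score(original, mutants, inputs):
    """Score = detected mutants / generated mutants."""
    detected = {
        name for name, mutant in mutants.items()
        if any(mutant(i) != original(i) for i in inputs)
    }
    return len(detected) / len(mutants), detected

print(mutation_score(original, MUTANTS, [0, 3, 7])[0])     # 0.25
print(mutation_score(original, MUTANTS, [0, 3, 7, 8])[0])  # 0.75
```

Adding the boundary input 8 detects `m1` and `m3`; `m4` behaves identically to the original on every input, so no input set can detect it.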

156 Mutation Testing Challenge
[Figure: executions vs. number of mutants (M mutants, 1 input); executions vs. number of inputs (N inputs, 1 mutant)]
Total campaign effort for mutation testing = M × N program executions

157 What Is The Problem?
Computation:
• Fuzzing: the more executions, the better.
• Mutation testing: execute each input per mutation.
Solution:
1) Perform coverage analysis first; remove trivial mutants
2) Evaluate independent (non-interacting) mutations simultaneously
3) Remove redundancy in executions
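Step 1 might be read as follows: a mutant only needs to run on inputs that actually execute the mutated line. The sketch below assumes that reading; the `plan` helper, the mutant-to-line map, and the per-input coverage sets are hypothetical data, not output of a real coverage tool.

```python
def plan(mutant_lines, coverage):
    """Map each mutant to the inputs whose coverage includes its line."""
    return {
        mutant: [inp for inp, lines in coverage.items() if line in lines]
        for mutant, line in mutant_lines.items()
    }

# Hypothetical data: the line each mutant changes, and lines each input covers.
mutant_lines = {'m1': 2, 'm2': 5, 'm3': 9}
coverage = {'I1': {1, 2, 3}, 'I2': {1, 2, 5}, 'I3': {1, 4}}

schedule = plan(mutant_lines, coverage)
naive = len(mutant_lines) * len(coverage)           # M x N executions
pruned = sum(len(inputs) for inputs in schedule.values())

print(schedule)       # {'m1': ['I1', 'I2'], 'm2': ['I2'], 'm3': []}
print(naive, pruned)  # 9 3
```

Here `m3` is never reached by any input, so running it cannot reveal a difference and it can be reported as undetected without a single execution.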

158 Traditional execution
[Figure: mutants m1, m2, m3; Setup for T1; Setup for T2; Actual tests]
• Mutants are executed in parallel.
• But a majority of the time is spent in initialization (on average 7 times the test execution time) (Bell 2014).
https://rahul.gopinath.org

159 Split-Stream Execution
[Figure: Setup for T1; Setup for T2; Actual tests; T1, T2]
• Execute tests in parallel.
• Fork off mutants as they are encountered.
Gopinath, Jensen, Groce, “Topsy Turvy: A faster and smarter algorithm”, ICSE 2016
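A back-of-the-envelope cost model shows why sharing the setup matters. This is an idealized sketch, not the algorithm from the cited paper: it assumes every mutant forks immediately after setup, ignores fork overhead, and takes the 7:1 setup-to-test ratio quoted on the previous slide; both function names are hypothetical.

```python
def traditional_cost(mutants, setup, test):
    """Every mutant repeats the setup before running the test."""
    return mutants * (setup + test)

def split_stream_cost(mutants, setup, test):
    """Setup runs once; each mutant forks off and runs only the test."""
    return setup + mutants * test

test_time = 1.0
setup_time = 7.0 * test_time  # setup assumed to be 7x the test time
m = 100

print(traditional_cost(m, setup_time, test_time))   # 800.0
print(split_stream_cost(m, setup_time, test_time))  # 107.0
```

Under these assumptions the shared setup cuts total work by roughly a factor of 7.5 for 100 mutants; in practice mutants fork at the point where the mutation is first encountered, so the shared prefix can be longer than the setup alone.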

160 Split-Stream Execution

Mutation Testing
Criteria: Comparable • Ground truth • Unbiased • Budget friendly
New CVEs: ? ?
Coverage
Synthetic (LAVA): ? ?
Benchmarks (MAGMA): ?
New Bugs in old Benchmarks: ?
Mutation Testing: ?

163 Seeded Fault Benchmarks
• Easy to fine-tune a fuzzer to overfit
• Faults are rarely similar to real faults
• Based on bugs we know about!
• Human bias in bug curation
• Limited supply
• Bug interactions requiring deduplication

164 Mutation Analysis
Each limitation of seeded fault benchmarks has an answer in mutation analysis:
• Easy to fine-tune a fuzzer to overfit → Not easy to fine-tune: very large number of faults.
• Faults are rarely similar to real faults → Evidence that mutants are similar to real faults.
• Based on bugs we know about! → All possible faults, including unknown ones!
• Human bias in bug curation → No human bias in introduced faults!
• Limited supply → As many as required! Including higher order ones!
• Bug interactions requiring deduplication → All mutants are evaluated independently.
