How to Compare Fuzzers

1 How to Compare Fuzzers with mutation analysis
Rahul Gopinath https://rahul.gopinath.org


Monzo Each node is a service

Bugs ... Are a Fact of Life
https://blog.qualys.com/vulnerabilities-threat-research/2023/09/04/qualys-top-20-exploited-vulnerabilities

Bugs ... How Do We Remove Them?
Guarantees of resulting artefacts
• Static Analysis, Type Checking: Report all violations (False Alarms)
• Dynamic Analysis, Software Testing: All reported violations are true (Missing cases)

6 Software Testing -- Key to Bug Removal
Input ✓ ✘

    @app.route('/admin')
    def admin():
        username = request.cookies.get("username")
        if not username:
            return {"Error": "Specify username in Cookie"}
        username = urllib.quote(os.path.basename(username))
        url = "http://permissions:5000/permissions/{}".format(username)
        resp = requests.request(method="GET", url=url)
        # "superadmin\ud888" will be simplified to "superadmin"
        ret = ujson.loads(resp.text)
        if resp.status_code == 200:
            if "superadmin" in ret["roles"]:
                return {"OK": "Superadmin Access granted"}
            else:
                e = u"Access denied. User has following roles: {}".format(ret["roles"])
                return {"Error": e}, 401
        else:
            return {"Error": ret["Error"]}, 500

7 Testing
Input ✓ ✘

    struct { type: request; payload_length: 64k; payload[HELO]; } HeartbeatMessage;

    /* Allocate memory for the response, size is 1 byte
     * message type, plus 2 bytes payload length, plus
     * payload, plus padding */
    buffer = OPENSSL_malloc(1 + 2 + payload + padding);
    bp = buffer;
    /* Enter response type, length and copy payload */
    *bp++ = TLS1_HB_RESPONSE;
    s2n(payload, bp);
    memcpy(bp, pl, payload);
    /* Random padding */
    RAND_pseudo_bytes(p, padding);
    r = dtls1_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);

    struct { type: error; payload_length: 4; payload[HELO]; } HeartbeatMessage;

    struct { type: response; payload_length: 64k; payload[HELO..dafdf..maybe...a..secret key...in...the...memory...or...other.. sensitive...information...that...the.. attacker..is..interested...in...afdfdf adfdfdf..and..other..random..junk]; } HeartbeatMessage;

8 Fuzzing
[Figure: random character input fed to the program, the /admin handler from slide 6]
https://www.fuzzingbook.org/html/Fuzzer.html
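The random input pictured on this slide can be produced with a few lines of code. What follows is a minimal sketch and not the fuzzingbook implementation: `random_fuzz_input`, `fuzz`, and the toy target `toy_program` are hypothetical names, and the sketch assumes that a failure shows up as a raised exception.

```python
import random


def random_fuzz_input(rng, max_length=100, char_start=32, char_range=95):
    """Return up to max_length random printable ASCII characters."""
    length = rng.randrange(0, max_length + 1)
    return "".join(
        chr(rng.randrange(char_start, char_start + char_range))
        for _ in range(length)
    )


def fuzz(program, trials=1000, seed=0):
    """Run program on random inputs; collect the inputs that raise."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        data = random_fuzz_input(rng)
        try:
            program(data)
        except Exception as exc:  # assumption: an exception counts as a crash
            failures.append((data, exc))
    return failures


def toy_program(text):
    """Hypothetical target: fails on any input containing '%'."""
    if "%" in text:
        raise ValueError("unexpected '%'")
    return len(text)


if __name__ == "__main__":
    crashes = fuzz(toy_program, trials=200)
    print(f"{len(crashes)} of 200 random inputs crashed the toy program")
```

A fixed seed keeps the campaign reproducible, which matters once two fuzzers have to be compared on equal terms.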

9 Fuzzing
Trash deck technique: 1950s - Gerald Weinberg
Program: the /admin handler from slide 6
Crash?

10 Variety of Fuzzers Publications from 2023-2024

11 Which Fuzzer Should We Use?
Exponential growth in fuzzing literature
[Figures: Cumulative publications in fuzzing; Cumulative published vulnerabilities]

12 Which Evaluation Metric Should I Use?
• Comparable
• Ground truth
• Unbiased
• Budget friendly

13 New CVEs Found?
CVE, short for Common Vulnerabilities and Exposures, is a list of publicly disclosed computer security flaws.

14 Which Fuzzer Should I Use?
• Comparable • Ground truth • Unbiased • Budget friendly
New CVEs ? ?

15 Coverage?
Evaluate Fuzzers Based on the Specified Goal

16 Coverage?
• Gets saturated quickly (Signal is lost)
• Is reachability sufficient?
• Limited evaluation of complex input conditions involved in expressing bugs
• Are more complex coverage techniques needed?
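The question of whether reachability is sufficient can be illustrated with a small sketch. The functions below are hypothetical and only loosely modelled on a length check: two inputs reach both branches of a faulty check without exposing the fault, and only the boundary input does.

```python
MAX_BUF_LEN = 8


def reference_check(msg):
    """Intended behaviour: accept only messages shorter than MAX_BUF_LEN."""
    return "copied" if len(msg) < MAX_BUF_LEN else "rejected"


def faulty_check(msg, covered):
    """Faulty variant (off by one); records which branch each input reaches."""
    if len(msg) < MAX_BUF_LEN + 1:
        covered.add("then")
        return "copied"
    covered.add("else")
    return "rejected"


def run_suite(inputs):
    """Return (branches covered, inputs whose output differs from the reference)."""
    covered = set()
    failing = [m for m in inputs if faulty_check(m, covered) != reference_check(m)]
    return covered, failing


if __name__ == "__main__":
    # Both branches are reached, yet the fault is not exposed.
    print(run_suite(["hi", "x" * 20]))
    # Only the boundary input exposes the fault.
    print(run_suite(["x" * MAX_BUF_LEN]))
```

Full branch coverage here says nothing about whether the input that triggers the fault was ever tried.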

17 Which Fuzzer Should I Use?
• Comparable • Ground truth • Unbiased • Budget friendly
New CVEs ? ?
Coverage

18 Synthetic Bugs?
Evaluate Fuzzers Based on the Specified Goal

19 Synthetic Bugs?
LAVA: Large-scale Automated Vulnerability Addition
• What kind of bugs should be seeded?
• What about unknown kinds of bugs?
• Where should these bugs be seeded?

20 Which Fuzzer Should I Use?
• Comparable • Ground truth • Unbiased • Budget friendly
New CVEs ? ?
Coverage
Synthetic (LAVA) ? ?

21 Curated Bug Benchmarks?
Evaluate Fuzzers Based on the Specified Goal

22 Known Bug Benchmarks? E.g. Magma

23 Known Bug Benchmarks?
• Gets outdated quickly (overfitting by fuzzers)
• Significant effort in creating and maintaining
• Biased as to where and what kind of bugs are present
"When a measure becomes a target it ceases to be a good measure" Goodhart's law

24 Which Fuzzer Should I Use?
• Comparable • Ground truth • Unbiased • Budget friendly
New CVEs ? ?
Coverage
Synthetic (LAVA) ? ?
Benchmarks (MAGMA) ?

25 New Bugs in Existing Benchmarks
Evaluate Fuzzers Based on the Specified Goal

26 New Bugs in Existing Benchmarks
• No ground truth
• Bug distribution is dependent on external factors
• Can lead researchers to postpone publication of vulnerabilities
• Feedback can't be used to decide budgeting.

Which Fuzzer Should I Use?
• Comparable • Ground truth • Unbiased • Budget friendly
New CVEs ? ?
Coverage
Synthetic (LAVA) ? ?
Benchmarks (MAGMA) ?
New Bugs in old Benchmarks ?
Can

29 Fuzzing Your Fuzzer a.k.a. Mutation Testing

30 What Is Mutation Testing?
IDEA: Induce a program variation with each valid token replacement

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }
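The idea of one program variation per valid token replacement can be sketched as follows. The replacement table `REPLACEMENTS` and the function `generate_mutants` are illustrative assumptions; production mutation tools work on a parsed program and use far richer operator sets.

```python
import re

# Assumed, deliberately tiny replacement table; real tools use richer operator sets.
REPLACEMENTS = {
    "<": ["<=", ">", ">=", "==", "!="],
    "+": ["-"],
}

# Words, two-character comparison operators, or any other single character.
TOKEN = re.compile(r"\w+|[<>=!]=|\S")


def generate_mutants(source):
    """Yield one variant of source per valid single-token replacement."""
    tokens = TOKEN.findall(source)
    for index, token in enumerate(tokens):
        for replacement in REPLACEMENTS.get(token, []):
            yield " ".join(tokens[:index] + [replacement] + tokens[index + 1:])


if __name__ == "__main__":
    for mutant in generate_mutants("if (len < MAX_BUF_LEN)"):
        print(mutant)
```

Each yielded string differs from the original in exactly one token, so every mutant stands for one small, independent fault.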

31 What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 1) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

Fixes for independent bugs are almost always simple.
Finite syntactic size for faults (aka. competent programmer hypothesis):
Gopinath, Jensen, and Groce. Mutations: How Close are they to Real Faults? 2014 ISSRE

32 What Is Mutation Testing? (1)
IDEA: Induce a program variation with each valid token replacement

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

33 What Is Mutation Testing? (2)
IDEA: Induce a program variation with each valid token replacement

    unsigned int len = message_length(msg);
    if (len >= MAX_BUF_LEN) {    // mutant: '<' replaced by '>='
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

34 What Is Mutation Testing? (3)
IDEA: Induce a program variation with each valid token replacement

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 1) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

35 What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 16) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

Complex bugs are almost always coupled to simpler bugs.
Finite semantic depth for failures (aka. coupling effect hypothesis):
Gopinath, Jensen, and Groce. The Theory of Composite Faults. 2017 ICST

36 Parameter Interactions in Faults

37 Mutation Testing Process
• Generate Mutants from the Original (m1, m2, m3, m4)
• Generate Fuzz Inputs (I1, I2, I3, I4, I5)
• Detect Differences from Original
• Mutation Score = Detected Mutants / Generated Mutants
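The process above can be sketched in a few lines. This is an illustration under stated assumptions, not the tooling used in the talk: programs are modelled as Python callables, `observe` and `mutation_score` are hypothetical helpers, and a mutant counts as detected when any input makes it behave differently from the original.

```python
def observe(program, value):
    """Run program on value and return a comparable outcome."""
    try:
        return ("ok", program(value))
    except Exception as exc:
        return ("error", type(exc).__name__)


def mutation_score(original, mutants, inputs):
    """Detected Mutants / Generated Mutants for one set of fuzz inputs."""
    detected = sum(
        1
        for mutant in mutants
        if any(observe(mutant, i) != observe(original, i) for i in inputs)
    )
    return detected / len(mutants)


MAX_BUF_LEN = 8


def original(n):
    """Original length check from the slides, reduced to its condition."""
    return n < MAX_BUF_LEN


# Hand-written mutants of the condition, for illustration only.
MUTANTS = [
    lambda n: n <= MAX_BUF_LEN,
    lambda n: n >= MAX_BUF_LEN,
    lambda n: n < MAX_BUF_LEN + 16,
    lambda n: n != MAX_BUF_LEN,
]

if __name__ == "__main__":
    print(mutation_score(original, MUTANTS, [0, 100]))
    print(mutation_score(original, MUTANTS, [0, 8, 100]))
```

A fuzzer that never produces the boundary length misses the off-by-one style mutants, which is the kind of difference the score is meant to surface.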

38 Mutation Testing Challenge
• N inputs (1 mutant): Number of Inputs Executions
• M mutants (1 input): Number of Mutants Executions
• Total Campaign Effort for mutation testing = MxN program executions
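A quick calculation shows why the MxN product is a problem. The figures below (1,000 mutants, 1,000,000 inputs, 1,000 executions per second) are made-up assumptions chosen only to show the scale, and `campaign_executions` and `campaign_days` are hypothetical helper names.

```python
SECONDS_PER_DAY = 86_400


def campaign_executions(num_mutants, num_inputs):
    """Total program executions when every input runs on every mutant."""
    return num_mutants * num_inputs


def campaign_days(num_mutants, num_inputs, execs_per_second):
    """Wall-clock days for the campaign at a fixed, assumed throughput."""
    total = campaign_executions(num_mutants, num_inputs)
    return total / execs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print(campaign_executions(1_000, 1_000_000))
    print(round(campaign_days(1_000, 1_000_000, 1_000), 1))
```

Under these assumed numbers a fuzzing run that takes minutes on the original program takes more than eleven days once every mutant is included.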

39 What Is The Problem?
Computation:
• Fuzzing -- More executions the better
• Mutation testing -- Execute each input per mutation
Solution:
1) Perform coverage analysis first; remove trivial mutants
2) Evaluate independent (non-interacting) mutations simultaneously
3) Remove redundancy in executions
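Step 1 of the solution can be sketched as a planning pass. The interpretation used here is an assumption: a mutant whose location no input ever covers cannot be detected, so it is dropped, and each remaining mutant is only run on the inputs that reach it. `plan_executions` and the example data are hypothetical.

```python
def plan_executions(mutant_locations, input_coverage):
    """Map each reachable mutant to the inputs that cover its location.

    mutant_locations: {mutant name: mutated location}
    input_coverage:   {input name: set of locations the input covers}
    """
    plan = {}
    for mutant, location in mutant_locations.items():
        reaching = [
            name for name, covered in input_coverage.items() if location in covered
        ]
        if reaching:  # mutants no input reaches are skipped entirely
            plan[mutant] = reaching
    return plan


if __name__ == "__main__":
    mutants = {"m1": 3, "m2": 7, "m3": 9}
    coverage = {"I1": {1, 2, 3}, "I2": {1, 2, 7}, "I3": {1, 2, 3, 7}}
    plan = plan_executions(mutants, coverage)
    planned = sum(len(inputs) for inputs in plan.values())
    print(plan, planned, "executions instead of", len(mutants) * len(coverage))
```

In this toy example the plan needs 4 executions where the naive MxN schedule needs 9.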

40 Traditional execution
[Figure: mutants m1, m2, m3; Setup for T1, Setup for T2, Actual tests]
• Mutants are executed in parallel
• But a majority of time spent in initialization (an average 7 times the test execution time) (Bell 2014)

41 Split-Stream Execution
[Figure: Setup for T1, Setup for T2, Actual tests; T1, T2]
• Execute tests in parallel
• Fork off mutants as they are encountered
Gopinath, Jensen, Groce “Topsy Turvy: A faster and smarter algorithm” ICSE 2016
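The difference between the two execution styles can be emulated in plain Python. This is only a sketch under assumptions: `copy.deepcopy` stands in for forking the process at the point where a mutant is encountered, and `run_traditional`, `run_split_stream`, `make_setup`, `length_test`, and `MUTANTS` are hypothetical names.

```python
import copy


def run_traditional(mutants, setup, test):
    """Each mutant repeats the whole run, including the setup."""
    return {name: test(setup(), check) for name, check in mutants.items()}


def run_split_stream(mutants, setup, test):
    """Run the shared setup once, then branch off one state copy per mutant."""
    shared_state = setup()
    return {
        name: test(copy.deepcopy(shared_state), check)  # stand-in for fork()
        for name, check in mutants.items()
    }


def make_setup(calls):
    """Build a setup function that counts how often it is invoked."""
    def setup():
        calls["setup"] += 1
        return {"limit": 8, "log": []}
    return setup


def length_test(state, check):
    """Hypothetical test: is a message of length 8 accepted?"""
    state["log"].append("checked length 8")  # mutates state, hence the copy
    return check(8, state["limit"])


MUTANTS = {
    "original": lambda n, limit: n < limit,
    "m1": lambda n, limit: n < limit + 1,
    "m2": lambda n, limit: n >= limit,
}

if __name__ == "__main__":
    calls = {"setup": 0}
    print(run_traditional(MUTANTS, make_setup(calls), length_test), calls)
    calls = {"setup": 0}
    print(run_split_stream(MUTANTS, make_setup(calls), length_test), calls)
```

Both runs give the same verdict per mutant, but the split-stream variant pays for the setup once instead of once per mutant.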

42 Split-Stream Execution

Mutation Testing
• Comparable • Ground truth • Unbiased • Budget friendly
New CVEs ? ?
Coverage
Synthetic (LAVA) ? ?
Benchmarks (MAGMA) ?
New Bugs in old Benchmarks ?
Mutation Testing ?

45 Seeded Fault Benchmarks
• Easy to fine-tune a fuzzer to overfit
• Faults are rarely similar to real faults
• Based on bugs we know about!
• Human bias in bug curation
• Limited supply
• Bug interactions requiring deduplication

46 Mutation Analysis
• Easy to fine-tune a fuzzer to overfit → Not easy to fine-tune (Very large number of faults)
• Faults are rarely similar to real faults → Evidence that mutants are similar to real faults.
• Based on bugs we know about! → All possible faults, including unknown ones!
• Human bias in bug curation → No human bias in introduced faults!
• Limited supply → As many as required! Including higher order ones!
• Bug interactions requiring deduplication → All mutants are evaluated independently
